Tag Archives: Architecture

Building a Controlled Environment Agriculture Platform

2020-12-22 Ashu Joshi

Post Syndicated from Ashu Joshi original https://aws.amazon.com/blogs/architecture/building-a-controlled-environment-agriculture-platform/

This post was co-written by Michael Wirig, Software Engineering Manager at Grōv Technologies.

A substantial percentage of the world’s habitable land is used for livestock farming for dairy and meat production. The dairy industry has leveraged technology to gain insights that have led to drastic improvements and are continuing to accelerate. A gallon of milk in 2017 involved 30% less water, 21% less land, a 19% smaller carbon footprint, and 20% less manure than it did in 2007 (US Dairy, 2019). By focusing on smarter water usage and sustainable land usage, livestock farming can grow to provide sustainable and nutrient-dense food for consumers and livestock alike.

Grōv Technologies (Grōv) has pioneered the Olympus Tower Farm, a fully automated Controlled Environment Agriculture (CEA) system. Unique amongst vertical farming startups, Grōv is growing cattle feed to improve that sustainable use of land for livestock farming while increasing the economic margins for dairy and beef producers.

The challenges of CEA

The set of growing conditions for a CEA is called a “recipe,” which is a combination of ingredients like temperature, humidity, light, carbon dioxide levels, and water. The optimal recipe is dynamic and is sensitive to its ingredients. Crops must be monitored in near-real time, and CEAs should be able to self-correct in order to maintain the recipe. To build a system with these capabilities requires answers to the following questions:

What parameters are needed to measure for indoor cattle feed production?
What sensors enable the accuracy and price trade-offs at scale?
Where do you place the sensors to ensure a consistent crop?
How do you correlate the data from sensors to the nutrient value?

To progress from a passively monitored system to a self-correcting, autonomous one, the CEA platform also needs to address:

How to maintain optimum crop conditions
How the system can learn and adapt to new seed varieties
How to communicate key business drivers such as yield and dry matter percentage

Grōv partnered with AWS Professional Services (AWS ProServe) to build a digital CEA platform addressing the challenges posed above.

Olympus Tower - Grov Technologies

Tower automation and edge platform

The Olympus Tower is instrumented for measuring recipe ingredients by combining the mechanical, electrical, and domain expertise of the Grōv team with the IoT edge and sensor expertise of the AWS ProServe team. The teams identified a primary set of features such as height, weight, and evenness of the growth to be measured at multiple stages within the Tower. Sensors were also added to measure secondary features such as water level, water pH, temperature, humidity, and carbon dioxide.

The teams designed and developed a purpose-built modular and industrial sensor station. Each sensor station has sensors for direct measurement of the features identified. The sensor stations are extended to support indirect measurement of features using a combination of Computer Vision and Machine Learning (CV/ML).

The trays with the growing cattle feed circulate through the Olympus Tower. A growth cycle starts on a tray with seeding, circulates through the tower over the cycle, and returns to the starting position to be harvested. The sensor station at the seeding location on the Olympus Tower tags each new growth cycle in a tray with a unique “Grow ID.” As trays pass by, each sensor station in the Tower collects the feature data. The firmware, jointly developed for the sensor station, uses AWS IoT SDK to stream the sensor data along with the Grow ID and metadata that’s specific to the sensor station. This information is sent every five minutes to an on-site edge gateway powered by AWS IoT Greengrass. Dedicated AWS Lambda functions manage the lifecycle of the Grow IDs and the sensor data processing on the edge.

The Grōv team developed AWS Greengrass Lambda functions running at the edge to ingest critical metrics from the operation automation software running the Olympus Towers. This information provides the ability to not just monitor the operational efficiency, but to provide the hooks to control the feedback loop.

The two sources of data were augmented with site-level data by installing sensor stations at the building level or site level to capture environmental data such as weather and energy consumption of the Towers.

All three sources of data are streamed to AWS IoT Greengrass and are processed by AWS Lambda functions. The edge software also fuses the data and correlates all categories of data together. This enables two major actions for the Grōv team – operational capability in real-time at the edge and enhanced data streamed into the cloud.

Grov Technologies - Architecture

Cloud pipeline/platform: analytics and visualization

As the data is streamed to AWS IoT Core via AWS IoT Greengrass. AWS IoT rules are used to route ingested data to store in Amazon Simple Sotrage Service (Amazon S3) and Amazon DynamoDB. The data pipeline also includes Amazon Kinesis Data Streams for batching and additional processing on the incoming data.

A ReactJS-based dashboard application is powered using Amazon API Gateway and AWS Lambda functions to report relevant metrics such as daily yield and machine uptime.

A data pipeline is deployed to analyze data using Amazon QuickSight. AWS Glue is used to create a dataset from the data stored in Amazon S3. Amazon Athena is used to query the dataset to make it available to Amazon QuickSight. This provides the extended Grōv tech team of research scientists the ability to perform a series of what-if analyses on the data coming in from the Tower Systems beyond what is available in the react-based dashboard.

Data pipeline - Grov Technologies

Completing the data-driven loop

Now that the data has been collected from all sources and stored it in a data lake architecture, the Grōv CEA platform established a strong foundation for harnessing the insights and delivering the customer outcomes using machine learning.

The integrated and fused data from the edge (sourced from the Olympus Tower instrumentation, Olympus automation software data, and site-level data) is co-related to the lab analysis performed by Grōv Research Center (GRC). Harvest samples are routinely collected and sent to the lab, which performs wet chemistry and microbiological analysis. Trays sent as samples to the lab are associated with the results of the analysis with the sensor data by corresponding Grow IDs. This serves as a mechanism for labeling and correlating the recipe data with the parameters used by dairy and beef producers – dry matter percentage, micro and macronutrients, and the presence of myco-toxins.

Grōv has chosen Amazon SageMaker to build a machine learning pipeline on its comprehensive data set, which will enable fine tuning the growing protocols in near real-time. Historical data collection unlocks machine learning use cases for future detection of anomalous sensors readings and sensor health monitoring, as well.

Because the solution is flexible, the Grōv team plans to integrate data from animal studies on their health and feed efficiency into the CEA platform. Machine learning on the data from animal studies will enhance the tuning of recipe ingredients that impact the animals’ health. This will give the farmer an unprecedented view of the impact of feed nutrition on the end product and consumer.

Conclusion

Grōv Technologies and AWS ProServe have built a strong foundation for an extensible and scalable architecture for a CEA platform that will nourish animals for better health and yield, produce healthier foods and to enable continued research into dairy production, rumination and animal health to empower sustainable farming practices.

Field Notes: Managing an Amazon EKS Cluster Using AWS CDK and Cloud Resource Property Manager

2020-12-16 Raj Seshadri

Post Syndicated from Raj Seshadri original https://aws.amazon.com/blogs/architecture/field-notes-managing-an-amazon-eks-cluster-using-aws-cdk-and-cloud-resource-property-manager/

This post is contributed by Bill Kerr and Raj Seshadri

For most customers, infrastructure is hardly done with CI/CD in mind. However, Infrastructure as Code (IaC) should be a best practice for DevOps professionals when they provision cloud-native assets. Microservice apps that run inside an Amazon EKS cluster often use CI/CD, so why not the cluster and related cloud infrastructure as well?

This blog demonstrates how to spin up cluster infrastructure managed by CI/CD using CDK code and Cloud Resource Property Manager (CRPM) property files. Managing cloud resources is ultimately about managing properties, such as instance type, cluster version, etc. CRPM helps you organize all those properties by importing bite-sized YAML files, which are stitched together with CDK. It keeps all of what’s good about YAML in YAML, and places all of the logic in beautiful CDK code. Ultimately this improves productivity and reliability as it eliminates manual configuration steps.

Architecture Overview

In this architecture, we create a six node Amazon EKS cluster. The Amazon EKS cluster has a node group spanning private subnets across two Availability Zones. There are two public subnets in different Availability Zones available for use with an Elastic Load Balancer.

EKS architecture diagram

Changes to the primary (master) branch triggers a pipeline, which creates CloudFormation change sets for an Amazon EKS stack and a CI/CD stack. After human approval, the change sets are initiated (executed).

CloudFormation sets

Prerequisites

Get ready to deploy the CloudFormation stacks with CDK

First, to get started with CDK you spin up a AWS Cloud9 environment, which gives you a code editor and terminal that runs in a web browser. Using AWS Cloud9 is optional but highly recommended since it speeds up the process.

Create a new AWS Cloud9 environment

Navigate to Cloud9 in the AWS Management Console.
Select Create environment.
Enter a name and select Next step.
Leave the default settings and select Next step again.
Select Create environment.

Download and install the dependencies and demo CDK application

In a terminal, let’s review the code used in this article and install it.

# Install TypeScript globally for CDK
npm i -g typescript

# If you are running these commands in Cloud9 or already have CDK installed, then
skip this command
npm i -g aws-cdk

# Clone the demo CDK application code
git clone https://github.com/shi/crpm-eks

# Change directory
cd crpm-eks

# Install the CDK application
npm i

Create the IAM service role

When creating an EKS cluster, the IAM role that was used to create the cluster is also the role that will be able to access it afterwards.

Deploy the CloudFormation stack containing the role

Let’s deploy a CloudFormation stack containing a role that will later be used to create the cluster and also to access it. While we’re at it, let’s also add our current user ARN to the role, so that we can assume the role.

# Deploy the EKS management role CloudFormation stack
cdk deploy role --parameters AwsArn=$(aws sts get-caller-identity --query Arn --output text)

# It will ask, "Do you wish to deploy these changes (y/n)?"
# Enter y and then press enter to continue deploying

Notice the Outputs section that shows up in the CDK deploy results, which contains the role name and the role ARN. You will need to copy and paste the role ARN (ex. arn:aws:iam::123456789012:role/eks-role-us-east-
1) from your Outputs when deploying the next stack.

Example Outputs:
role.ExportsOutputRefRoleFF41A16F = eks-role-us-east-1
role.ExportsOutputFnGetAttRoleArnED52E3F8 = arn:aws:iam::123456789012:role/eksrole-us-east-1

Create the EKS cluster

Now that we have a role created, it’s time to create the cluster using that role.

Deploy the stack containing the EKS cluster in a new VPC

Expect it to take over 20 minutes for this stack to deploy.

# Deploy the EKS cluster CloudFormation stack
# REPLACE ROLE_ARN WITH ROLE ARN FROM OUTPUTS IN ROLE STACK CREATED ABOVE
cdk deploy eks -r ROLE_ARN

# It will ask, "Do you wish to deploy these changes (y/n)?"
# Enter y and then press enter to continue deploying

Notice the Outputs section, which contains the cluster name (ex. eks-demo) and the UpdateKubeConfigCommand. The UpdateKubeConfigCommand is useful if you already have kubectl installed somewhere and would rather use your own to interact with the cluster instead of using Cloud9’s.

Example Outputs:
eks.ExportsOutputRefControlPlane70FAD3FA = eks-demo
eks.UpdateKubeConfigCommand = aws eks update-kubeconfig --name eks-demo --region
us-east-1 --role-arn arn:aws:iam::123456789012:role/eks-role-us-east-1
eks.FargatePodExecutionRoleArn = arn:aws:iam::123456789012:role/eks-cluster-
FargatePodExecutionRole-U495K4DHW93M

Navigate to this page in the AWS console if you would like to see your cluster, which is now ready to use.

Configure `kubectl` with access to cluster

If you are following along in Cloud9, you can skip configuring kubectl.

If you prefer to use kubectl installed somewhere else, now would be a good time to configure access to the newly created cluster by running the UpdateKubeConfigCommand mentioned in the Outputs section above. It requires that you have the AWS CLI installed and configured.

aws eks update-kubeconfig --name eks-demo --region us-east-1 --role-arn
arn:aws:iam::123456789012:role/eks-role-us-east-1

# Test access to cluster
kubectl get nodes

Leveraging Infrastructure CI/CD

Now that the VPC and cluster have been created, it’s time to turn on CI/CD. This will create a cloned copy of github.com/shi/crpm-eks in CodeCommit. Then, an AWS CloudWatch Events rule will start watching the CodeCommit repo for changes and triggering a CI/CD pipeline that builds and validates CloudFormation templates, and executes CloudFormation change sets.

Deploy the stack containing the code repo and pipeline

# Deploy the CI/CD CloudFormation stack
cdk deploy cicd

# It will ask, "Do you wish to deploy these changes (y/n)?"
# Enter y and then press enter to continue deploying

Notice the Outputs section, which contains the CodeCommit repo name (ex. eks-ci-cd). This is where the code now lives that is being watched for changes.

Example Outputs:
cicd.ExportsOutputFnGetAttLambdaRoleArn275A39EB =
arn:aws:iam::774461968944:role/eks-ci-cd-LambdaRole-6PFYXVSLTQ0D
cicd.ExportsOutputFnGetAttRepositoryNameC88C868A = eks-ci-cd

Review the pipeline for the first time

Navigate to this page in the AWS console and you should see a new pipeline in progress. The pipeline is automatically run for the first time when it is created, even though no changes have been made yet. Open the pipeline and scroll down to the Review stage. You’ll see that two change sets were created in parallel (one for the EKS stack and the other for the CI/CD stack).

CICD image

Select Review to open an approval popup where you can enter a comment.
Select Reject or Approve. Following the Review button, the blue link to the left of Fetch: Initial commit by AWS CodeCommit can be selected to see the infrastructure code changes that triggered the pipeline.

review screen

Go ahead and approve it.

Clone the new AWS CodeCommit repo

Now that the golden source that is being watched for changes lives in a AWS CodeCommit repo, we need to clone that repo and get rid of the repo we’ve been using up to this point.

If you are following along in AWS Cloud9, you can skip cloning the new repo because you are just going to discard the old AWS Cloud9 environment and start using a new one.

Now would be a good time to clone the newly created repo mentioned in the preceding Outputs section Next, delete the old repo that was cloned from GitHub at the beginning of this blog. You can visit this repository to get the clone URL for the repo.

Review this documentation for help with accessing your private AWS CodeCommit repo using HTTPS.

Review this documentation for help with accessing your repo using SSH.

# Clone the CDK application code (this URL assumes the us-east-1 region)
git clone https://git-codecommit.us-east-1.amazonaws.com/v1/repos/eks-ci-cd

# Change directory
cd eks-ci-cd

# Install the CDK application
npm i

# Remove the old repo
rm -rf ../crpm-eks

Deploy the stack containing the Cloud9 IDE with `kubectl` and CodeCommit repo

If you are NOT using Cloud9, you can skip this section.

To make life easy, let’s create another Cloud9 environment that has kubectl preconfigured and ready to use, and also has the new CodeCommit repo checked out and ready to edit.

# Deploy the IDE CloudFormation stack
cdk deploy ide

Configuring the new Cloud9 environment

Although kubectl and the code are now ready to use, we still have to manually configure Cloud9 to stop using AWS managed temporary credentials in order for kubectl to be able to access the cluster with the management role. Here’s how to do that and test kubectl:

1. Navigate to this page in the AWS console.
2. In Your environments, select Open IDE for the newly created environment (possibly named eks-ide).
3. Once opened, navigate at the top to AWS Cloud9 -> Preferences.
4. Expand AWS SETTINGS, and under Credentials, disable AWS managed temporary credentials by selecting the toggle button. Then, close the Preferences tab.
5. In a terminal in Cloud9, enter aws configure. Then, answer the questions by leaving them set to None and pressing enter, except for Default region name. Set the Default region name to the current region that you created everything in. The output should look similar to:

AWS Access Key ID [None]:
AWS Secret Access Key [None]:
Default region name [None]: us-east-1
Default output format [None]:

6. Test the environment

kubectl get nodes

If everything is working properly, you should see two nodes appear in the output similar to:

NAME                           STATUS ROLES  AGE   VERSION
ip-192-168-102-69.ec2.internal Ready  <none> 4h50m v1.17.11-ekscfdc40
ip-192-168-69-2.ec2.internal   Ready  <none> 4h50m v1.17.11-ekscfdc40

You can use kubectl from this IDE to control the cluster. When you close the IDE browser window, the Cloud9 environment will automatically shutdown after 30 minutes and remain offline until the next time you reopen it from the AWS console. So, it’s a cheap way to have a kubectl terminal ready when needed.

Delete the old Cloud9 environment

If you have been following along using Cloud9 the whole time, then you should have two Cloud9 environments running at this point (one that was used to initially create everything from code in GitHub, and one that is now ready to edit the CodeCommit repo and control the cluster with kubectl). It’s now a good time to delete the old Cloud9 environment.

Navigate to this page in the AWS console.
In Your environments, select the radio button for the old environment (you named it when creating it) and select Delete.
In the popup, enter the word Delete and select Delete.

Now you should be down to having just one AWS Cloud9 environment that was created when you deployed the ide stack.

Trigger the pipeline to change the infrastructure

Now that we have a cluster up and running that’s defined in code stored in a AWS CodeCommit repo, it’s time to make some changes:

We’ll commit and push the changes, which will trigger the pipeline to update the infrastructure.
We’ll go ahead and make one change to the cluster nodegroup and another change to the actual CI/CD build process, so that both the eks-cluster stack as well as the eks-ci-cd stack get changed.

1. In the code that was checked out from AWS CodeCommit, open up res/compute/eks/nodegroup/props.yaml. At the bottom of the file, try changing minSize from 1 to 4, desiredSize from 2 to 6, and maxSize from 3 to 6 as seen in the following screenshot. Then, save the file and close it. The res (resource) directory is your well organized collection of resource properties files.

AWS Cloud9 screenshot

2. Next, open up res/developer-tools/codebuild/project/props.yaml and find where it contains computeType: ‘BUILD_GENERAL1_SMALL’. Try changing BUILD_GENERAL1_SMALL to BUILD_GENERAL1_MEDIUM. Then, save the file and close it.

3. Commit and push the changes in a terminal.

cd eks-ci-cd
git add .
git commit -m "Increase nodegroup scaling config sizes and use larger build
environment"
git push

4. Visit https://console.aws.amazon.com/codesuite/codepipeline/pipelines in the AWS console and you should see your pipeline in progress.

5. Wait for the Review stage to become Pending.

a. Following the Approve action box, click the blue link to the left of “Fetch: …” to see the infrastructure code changes that triggered the pipeline. You should see the two code changes you committed above.

3. After reviewing the changes, go back and select Review to open an approval popup.

4. In the approval popup, enter a comment and select Approve.

5. Wait for the pipeline to finish the Deploy stage as it executes the two change sets. You can refresh the page until you see it has finished. It should take a few minutes.

6. To see that the CodeBuild change has been made, scroll up to the Build stage of the pipeline and click on the AWS CodeBuild link as shown in the following screenshot.

AWS Codebuild screenshot

7. Next, select the Build details tab, and you should determine that your Compute size has been upgraded to 7 GB memory, 4 vCPUs as shown in the following screenshot.

project config

8. By this time, the cluster nodegroup sizes are probably updated. You can confirm with kubectl in a terminal.

# Get nodes
kubectl get nodes

If everything is ready, you should see six (desired size) nodes appear in the output similar to:

NAME                            STATUS   ROLES     AGE     VERSION
ip-192-168-102-69.ec2.internal  Ready    <none>    5h42m   v1.17.11-ekscfdc40
ip-192-168-69-2.ec2.internal    Ready    <none>    5h42m   v1.17.11-ekscfdc40
ip-192-168-43-7.ec2.internal    Ready    <none>    10m     v1.17.11-ekscfdc40
ip-192-168-27-14.ec2.internal   Ready    <none>    10m     v1.17.11-ekscfdc40
ip-192-168-36-56.ec2.internal   Ready    <none>    10m     v1.17.11-ekscfdc40
ip-192-168-37-27.ec2.internal   Ready    <none>    10m     v1.17.11-ekscfdc40

Cluster is now manageable by code

You now have a cluster than can be maintained by simply making changes to code! The only resources not being managed by CI/CD in this demo, are the management role, and the optional AWS Cloud9 IDE. You can log into the AWS console and edit the role, adding other Trust relationships in the future, so others can assume the role and access the cluster.

Clean up

Do not try to delete all of the stacks at once! Wait for the stack(s) in a step to finish deleting before moving onto the next step.

1. Navigate to this page in the AWS console.
2. Delete the two IDE stacks first (the ide stack spawned another stack).
3. Delete the ci-cd stack.
4. Delete the cluster stack (this one takes a long time).
5. Delete the role stack.

Additional resources

Cloud Resource Property Manager (CRPM) is an open source project maintained by SHI, hosted on GitHub, and available through npm.

Conclusion

In this blog, we demonstrated how you can spin up an Amazon EKS cluster managed by CI/CD using CDK code and Cloud Resource Property Manager (CRPM) property files. Making updates to this cluster is easy as modifying the property files and updating the AWS CodePipline. Using CRPM can improve productivity and reliability because it eliminates manual configurations steps.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers

Introducing Spot Blueprints, a template generator for frameworks like Kubernetes and Apache Spark

2020-12-12 Chad Schmutzer

Post Syndicated from Chad Schmutzer original https://aws.amazon.com/blogs/compute/introducing-spot-blueprints-a-template-generator-for-frameworks-like-kubernetes-and-apache-spark/

This post is authored by Deepthi Chelupati, Senior Product Manager for Amazon EC2 Spot Instances, and Chad Schmutzer, Principal Developer Advocate for Amazon EC2

Customers have been using EC2 Spot Instances to save money and scale workloads to new levels for over a decade. Launched in late 2009, Spot Instances are spare Amazon EC2 compute capacity in the AWS Cloud available for steep discounts off On-Demand Instance prices. One thing customers love about Spot Instances is their integration across many services on AWS, the AWS Partner Network, and open source software. These integrations unlock the ability to take advantage of the deep savings and scale Spot Instances provide for interruptible workloads. Some of the most popular services used with Spot Instances include Amazon EC2 Auto Scaling, Amazon EMR, AWS Batch, Amazon Elastic Container Service (Amazon ECS), and Amazon Elastic Kubernetes Service (EKS). When we talk with customers who are early in their cost optimization journey about the advantages of using Spot Instances with their favorite workloads, they typically can’t wait to get started. They often tell us that while they’d love to start using Spot Instances right away, flipping between documentation, blog posts, and the AWS Management Console is time consuming and eats up precious development cycles. They want to know the fastest way to start saving with Spot Instances, while feeling confident they’ve applied best practices. Customers have overwhelmingly told us that the best way for them to get started quickly is to see complete workload configuration examples in infrastructure as code templates for AWS CloudFormation and Hashicorp Terraform. To address this feedback, we launched Spot Blueprints.

Spot Blueprints overview

Today we are excited to tell you about Spot Blueprints, an infrastructure code template generator that lives right in the EC2 Spot Console. Built based on customer feedback, Spot Blueprints guides you through a few short steps. These steps are designed to gather your workload requirements while explaining and configuring Spot best practices along the way. Unlike traditional wizards, Spot Blueprints generates custom infrastructure as code in real time within each step so you can easily understand the details of the configuration. Spot Blueprints makes configuring EC2 instance type agnostic workloads easy by letting you express compute capacity requirements as vCPU and memory. Then the wizard automatically expands those requirements to a flexible list of EC2 instance types available in the AWS Region you are operating. Spot Blueprints also takes high-level Spot best practices like Availability Zone flexibility, and applies them to your workload compute requirements. For example, automatically including all of the required resources and dependencies for creating a Virtual Private Cloud (VPC) with all Availability Zones configured for use. Additional workload-specific Spot best practices are also configured, such as graceful interruption handling for load balancer connection draining, automatic job retries, or container rescheduling.

Spot Blueprints makes keeping up with the latest and greatest Spot features (like the capacity-optimized allocation strategy and Capacity Rebalancing for EC2 Auto Scaling) easy. Spot Blueprints is continually updated to support new features as they become available. Today, Spot Blueprints supports generating infrastructure as code workload templates for some of the most popular services used with Spot Instances: Amazon EC2 Auto Scaling, Amazon EMR, AWS Batch, and Amazon EKS. You can tell us what blueprint you’d like to see next right in Spot Blueprints. We are excited to hear from you!

What are Spot best practices?

We’ve mentioned Spot best practices a few times so far in this blog. Let’s quickly review the best practices and how they relate to Spot Blueprints. When using Spot Instances, it is important to understand a couple of points:

Spot Instances are interruptible and must be returned when EC2 needs the capacity back
The location and amount of spare capacity available at any given moment is dynamic and continually changes in real time

For these reasons, it is important to follow best practices when using Spot Instances in your workloads. We like to call these “Spot best practices.” These best practices can be summarized as follows:

Only run workloads that are truly interruption tolerant, meaning interruptible at both the individual instance level and overall application level
Spot workloads should be flexible, meaning they can be shifted in real time to where the spare capacity currently is, or otherwise be paused until spare capacity is available again
In practice, being flexible means qualifying a workload to run on multiple EC2 instance types (think big: multiple families, sizes, and generations), and in multiple Availability Zones, at any given time

Over the last few years, we’ve focused on making it easier to follow these best practices by adding features such as the following:

EC2 Instance rebalance recommendation for Spot Instances: a signal that is sent when a Spot Instance is at elevated risk of interruption
Mixed instances policy: an Auto Scaling group configuration to enhance availability by deploying across multiple instance types running in multiple Availability Zones
Capacity Rebalancing for Amazon EC2 Auto Scaling: for proactively managing the Amazon EC2 Spot Instance lifecycle in an Auto Scaling group
Capacity-optimized allocation strategy: designed to help find the most optimal spare capacity
Amazon EC2 Instance Selector: a CLI tool and go library that recommends instance types based on resource criteria like vCPUs and memory

As mentioned prior, in addition to the Spot best practices we reviewed, there is an additional set of Spot best practices native to each workload that has integrated support for Spot Instances. For example, the ability to implement graceful interruption handling for:

Load balancer connection draining
Automatic job retries
Container draining and rescheduling

Spot Blueprints are designed to quickly explain and generate templates with Spot best practices for each specific workload. The custom-generated workload templates can be downloaded in either AWS CloudFormation or HashiCorp Terraform format, allowing for further customization and learning before being deployed in your environment.

Next, let’s walk through configuring and deploying an example Spot blueprint.

Example tutorial

In this example tutorial, we use Spot Blueprints to configure an Apache Spark environment running on Amazon EMR, deploy the template as a CloudFormation stack, run a sample job, and then delete the CloudFormation stack.

First, we navigate to the EC2 Spot console and click on “Spot Blueprints”:

In the next screen, we select the EMR blueprint, and then click “Configure blueprint”:

A couple of notes here:

If you are in a hurry, you can grab a preconfigured template to get started quickly. The preconfigured template has default best practices in place and can be further customized as needed.
If you have a suggestion for a new blueprint you’d like to see, you can click on “Don’t see a blueprint you need?” to give us feedback. We’d love to hear from you!

In Step 1, we give the blueprint a name and configure permissions. We have the blueprint create new IAM roles to allow Amazon EMR and Amazon EC2 compute resources to make calls to the AWS APIs on your behalf:

We see a summary of resources that will be created, along with a code snippet preview. Unlike traditional wizards, you can see the code generated in every step!

In Step 2, the network is configured. We create a new VPC with public subnets in all Availability Zones in the Region. This is a Spot best practice because it increases the number of Spot capacity pools, which increases the possibility for EC2 to find and provision the required capacity:

We see a summary of resources that will be created, along with a code snippet preview. Here is the CloudFormation code for our template, where you can see the VPC creation, including all dependencies such as the internet gateway, route table, and subnets:

attachGateway:
  DependsOn:
    - vpc
    - internetGateway
  Properties:
    InternetGatewayId:
      Ref: internetGateway
    VpcId:
      Ref: vpc
  Type: AWS::EC2::VPCGatewayAttachment
internetGateway:
  DependsOn:
    - vpc
  Type: AWS::EC2::InternetGateway
publicRoute:
  DependsOn:
    - publicRouteTable
    - internetGateway
    - attachGateway
  Properties:
    DestinationCidrBlock: 0.0.0.0/0
    GatewayId:
      Ref: internetGateway
    RouteTableId:
      Ref: publicRouteTable
  Type: AWS::EC2::Route
publicRouteTable:
  DependsOn:
    - vpc
    - attachGateway
  Properties:
    Tags:
      - Key: Name
        Value: Public Route Table
    VpcId:
      Ref: vpc
  Type: AWS::EC2::RouteTable
vpc:
  Properties:
    CidrBlock:
      Fn::FindInMap:
        - CidrMappings
        - vpc
        - CIDR
    EnableDnsHostnames: true
    EnableDnsSupport: true
    Tags:
      - Key: Name
        Value:
          Ref: AWS::StackName
  Type: AWS::EC2::VPC
publicSubnet1RouteTableAssociation:
  DependsOn:
    - publicRouteTable
    - publicSubnet1
    - attachGateway
  Type: AWS::EC2::SubnetRouteTableAssociation
  Properties:
    RouteTableId:
      Ref: publicRouteTable
    SubnetId:
      Ref: publicSubnet1
publicSubnet1:
  DependsOn:
    - attachGateway
  Type: AWS::EC2::Subnet
  Properties:
    VpcId:
      Ref: vpc
    AvailabilityZone: us-east-1a
    CidrBlock: 10.0.0.0/24
    MapPublicIpOnLaunch: true
    Tags:
      - Key: Name
        Value:
          Fn::Sub: ${EnvironmentName} Public Subnet (AZ1)
publicSubnet2RouteTableAssociation:
  DependsOn:
    - publicRouteTable
    - publicSubnet2
    - attachGateway
  Type: AWS::EC2::SubnetRouteTableAssociation
  Properties:
    RouteTableId:
      Ref: publicRouteTable
    SubnetId:
      Ref: publicSubnet2
publicSubnet2:
  DependsOn:
    - attachGateway
  Type: AWS::EC2::Subnet
  Properties:
    VpcId:
      Ref: vpc
    AvailabilityZone: us-east-1b
    CidrBlock: 10.0.1.0/24
    MapPublicIpOnLaunch: true
    Tags:
      - Key: Name
        Value:
          Fn::Sub: ${EnvironmentName} Public Subnet (AZ2)
publicSubnet3RouteTableAssociation:
  DependsOn:
    - publicRouteTable
    - publicSubnet3
    - attachGateway
  Type: AWS::EC2::SubnetRouteTableAssociation
  Properties:
    RouteTableId:
      Ref: publicRouteTable
    SubnetId:
      Ref: publicSubnet3
publicSubnet3:
  DependsOn:
    - attachGateway
  Type: AWS::EC2::Subnet
  Properties:
    VpcId:
      Ref: vpc
    AvailabilityZone: us-east-1c
    CidrBlock: 10.0.2.0/24
    MapPublicIpOnLaunch: true
    Tags:
      - Key: Name
        Value:
          Fn::Sub: ${EnvironmentName} Public Subnet (AZ3)
publicSubnet4RouteTableAssociation:
  DependsOn:
    - publicRouteTable
    - publicSubnet4
    - attachGateway
  Type: AWS::EC2::SubnetRouteTableAssociation
  Properties:
    RouteTableId:
      Ref: publicRouteTable
    SubnetId:
      Ref: publicSubnet4
publicSubnet4:
  DependsOn:
    - attachGateway
  Type: AWS::EC2::Subnet
  Properties:
    VpcId:
      Ref: vpc
    AvailabilityZone: us-east-1d
    CidrBlock: 10.0.3.0/24
    MapPublicIpOnLaunch: true
    Tags:
      - Key: Name
        Value:
          Fn::Sub: ${EnvironmentName} Public Subnet (AZ4)
publicSubnet5RouteTableAssociation:
  DependsOn:
    - publicRouteTable
    - publicSubnet5
    - attachGateway
  Type: AWS::EC2::SubnetRouteTableAssociation
  Properties:
    RouteTableId:
      Ref: publicRouteTable
    SubnetId:
      Ref: publicSubnet5
publicSubnet5:
  DependsOn:
    - attachGateway
  Type: AWS::EC2::Subnet
  Properties:
    VpcId:
      Ref: vpc
    AvailabilityZone: us-east-1e
    CidrBlock: 10.0.4.0/24
    MapPublicIpOnLaunch: true
    Tags:
      - Key: Name
        Value:
          Fn::Sub: ${EnvironmentName} Public Subnet (AZ5)
publicSubnet6RouteTableAssociation:
  DependsOn:
    - publicRouteTable
    - publicSubnet6
    - attachGateway
  Type: AWS::EC2::SubnetRouteTableAssociation
  Properties:
    RouteTableId:
      Ref: publicRouteTable
    SubnetId:
      Ref: publicSubnet6
publicSubnet6:
  DependsOn:
    - attachGateway
  Type: AWS::EC2::Subnet
  Properties:
    VpcId:
      Ref: vpc
    AvailabilityZone: us-east-1f
    CidrBlock: 10.0.5.0/24
    MapPublicIpOnLaunch: true
    Tags:
      - Key: Name
        Value:
          Fn::Sub: ${EnvironmentName} Public Subnet (AZ6)

In Step 3, we configure the compute environment. Spot Blueprints makes this easy by allowing us to simply define our capacity units and required compute capacity for each type of node in our EMR cluster. In this walkthrough we will use vCPUs. We select 4 vCPUs for the core node capacity, and 12 vCPUs for the task node capacity. Based on these configurations, Spot Blueprints will apply best practices to the EMR cluster settings. These include using On-Demand Instances for core nodes since interruptions to core nodes can cause instability in the EMR cluster, and using Spot Instances for task nodes because the EMR cluster is typically more tolerant to task node interruptions. Finally, task nodes are provisioned using the capacity-optimized allocation strategy because it helps find the most optimal Spot capacity:

You’ll notice there is no need to spend time thinking about which instance types to select – Spot Blueprints takes our application’s minimum vCPUs per instance and minimum vCPU to memory ratio requirements and automatically selects the optimal instance types. This instance type selection process applies Spot best practices by a) using instance types across different families, generations, and sizes, and b) using the maximum number of instance types possible for core (5) and task (15) nodes:

Here is the CloudFormation code for our template. You can see the EMR cluster creation, the applications being installed (Spark, Hadoop, and Ganglia), the flexible list of instance types and Availability Zones, and the capacity-optimized allocation strategy enabled (along with all dependencies):

emrSparkInstanceFleetCluster:
  DependsOn:
    - vpc
    - publicRoute
    - publicSubnet1RouteTableAssociation
    - publicSubnet2RouteTableAssociation
    - publicSubnet3RouteTableAssociation
    - publicSubnet4RouteTableAssociation
    - publicSubnet5RouteTableAssociation
    - publicSubnet6RouteTableAssociation
    - spotBlueprintsEmrRole
  Properties:
    Applications:
      - Name: Spark
      - Name: Hadoop
      - Name: Ganglia
    Instances:
      CoreInstanceFleet:
        InstanceTypeConfigs:
          - InstanceType: c5.xlarge
            WeightedCapacity: 4
          - InstanceType: c4.xlarge
            WeightedCapacity: 4
          - InstanceType: c3.xlarge
            WeightedCapacity: 4
          - InstanceType: m5.xlarge
            WeightedCapacity: 4
          - InstanceType: m3.xlarge
            WeightedCapacity: 4
        Name:
          Ref: AWS::StackName
        TargetOnDemandCapacity: "4"
        LaunchSpecifications:
          OnDemandSpecification:
            AllocationStrategy: lowest-price
      Ec2SubnetIds:
        - Ref: publicSubnet1
        - Ref: publicSubnet2
        - Ref: publicSubnet3
        - Ref: publicSubnet4
        - Ref: publicSubnet5
        - Ref: publicSubnet6
      MasterInstanceFleet:
        InstanceTypeConfigs:
          - InstanceType: c5.xlarge
          - InstanceType: c4.xlarge
          - InstanceType: c3.xlarge
          - InstanceType: m5.xlarge
          - InstanceType: m3.xlarge
        Name:
          Ref: AWS::StackName
        TargetOnDemandCapacity: "1"
    JobFlowRole:
      Ref: spotBlueprintsEmrEc2InstanceProfile
    Name:
      Ref: AWS::StackName
    ReleaseLabel: emr-5.30.1
    ServiceRole:
      Ref: spotBlueprintsEmrRole
    Tags:
      - Key: Name
        Value:
          Ref: AWS::StackName
    VisibleToAllUsers: true
  Type: AWS::EMR::Cluster
emrSparkInstanceTaskFleet:
  DependsOn:
    - emrSparkInstanceFleetCluster
  Properties:
    ClusterId:
      Ref: emrSparkInstanceFleetCluster
    InstanceFleetType: TASK
    InstanceTypeConfigs:
      - InstanceType: c5.xlarge
        WeightedCapacity: 4
      - InstanceType: c4.xlarge
        WeightedCapacity: 4
      - InstanceType: c3.xlarge
        WeightedCapacity: 4
      - InstanceType: m5.xlarge
        WeightedCapacity: 4
      - InstanceType: m3.xlarge
        WeightedCapacity: 4
      - InstanceType: m4.xlarge
        WeightedCapacity: 4
      - InstanceType: r4.xlarge
        WeightedCapacity: 4
      - InstanceType: r5.xlarge
        WeightedCapacity: 4
      - InstanceType: r3.xlarge
        WeightedCapacity: 4
      - InstanceType: c4.2xlarge
        WeightedCapacity: 8
      - InstanceType: c3.2xlarge
        WeightedCapacity: 8
      - InstanceType: c5.2xlarge
        WeightedCapacity: 8
      - InstanceType: m5.2xlarge
        WeightedCapacity: 8
      - InstanceType: m4.2xlarge
        WeightedCapacity: 8
      - InstanceType: m3.2xlarge
        WeightedCapacity: 8
    LaunchSpecifications:
      SpotSpecification:
        TimeoutAction: TERMINATE_CLUSTER
        TimeoutDurationMinutes: 60
        AllocationStrategy: capacity-optimized
    Name: TaskFleet
    TargetSpotCapacity: "12"
  Type: AWS::EMR::InstanceFleetConfig

In Step 4, we have the option of enabling EMR managed scaling on the cluster. Enabling EMR managed scaling is a Spot best practice because this allows EMR to automatically increase or decrease the compute capacity in the cluster based on the workload, further optimizing performance and cost. EMR continuously evaluates cluster metrics to make scaling decisions that optimize the cluster for cost and speed. Spot Blueprints automatically configures minimum and maximum scaling values based on the compute requirements we defined in the previous step:

Here is the updated CloudFormation code for our template with managed scaling (ManagedScalingPolicy) enabled:

emrSparkInstanceFleetCluster:
  DependsOn:
    - vpc
    - publicRoute
    - publicSubnet1RouteTableAssociation
    - publicSubnet2RouteTableAssociation
    - publicSubnet3RouteTableAssociation
    - publicSubnet4RouteTableAssociation
    - publicSubnet5RouteTableAssociation
    - publicSubnet6RouteTableAssociation
    - spotBlueprintsEmrRole
  Properties:
    Applications:
      - Name: Spark
      - Name: Hadoop
      - Name: Ganglia
    Instances:
      CoreInstanceFleet:
        InstanceTypeConfigs:
          - InstanceType: c5.xlarge
            WeightedCapacity: 4
          - InstanceType: c4.xlarge
            WeightedCapacity: 4
          - InstanceType: c3.xlarge
            WeightedCapacity: 4
          - InstanceType: m5.xlarge
            WeightedCapacity: 4
          - InstanceType: m3.xlarge
            WeightedCapacity: 4
        Name:
          Ref: AWS::StackName
        TargetOnDemandCapacity: "4"
        LaunchSpecifications:
          OnDemandSpecification:
            AllocationStrategy: lowest-price
      Ec2SubnetIds:
        - Ref: publicSubnet1
        - Ref: publicSubnet2
        - Ref: publicSubnet3
        - Ref: publicSubnet4
        - Ref: publicSubnet5
        - Ref: publicSubnet6
      MasterInstanceFleet:
        InstanceTypeConfigs:
          - InstanceType: c5.xlarge
          - InstanceType: c4.xlarge
          - InstanceType: c3.xlarge
          - InstanceType: m5.xlarge
          - InstanceType: m3.xlarge
        Name:
          Ref: AWS::StackName
        TargetOnDemandCapacity: "1"
    JobFlowRole:
      Ref: spotBlueprintsEmrEc2InstanceProfile
    Name:
      Ref: AWS::StackName
    ReleaseLabel: emr-5.30.1
    ServiceRole:
      Ref: spotBlueprintsEmrRole
    Tags:
      - Key: Name
        Value:
          Ref: AWS::StackName
    VisibleToAllUsers: true
    ManagedScalingPolicy:
      ComputeLimits:
        MaximumCapacityUnits: "32"
        MinimumCapacityUnits: "16"
        MaximumOnDemandCapacityUnits: "4"
        MaximumCoreCapacityUnits: "4"
        UnitType: InstanceFleetUnits
  Type: AWS::EMR::Cluster

In Step 5, we can review and download the template code for further customization in either CloudFormation or Terraform format, review the instance configuration summary, and review a summary of the resources that would be created from the template. Spot Blueprints will also upload the CloudFormation template to an Amazon S3 bucket so we can deploy the template directly from the CloudFormation console or CLI. Let’s go ahead and click on “Deploy in CloudFormation,” copy the URL, and then click on “Deploy in CloudFormation” again:

Having copied the S3 URL, we go to the CloudFormation console to launch the CloudFormation stack:

We walk through the CloudFormation stack creation steps and the stack launches, creating all of the resources as configured in the blueprint. It takes roughly 15-20 minutes for our stack creation to complete:

Once the stack creation is complete, we navigate to the Amazon EMR console to view the EMR cluster configured with Spot best practices:

Next, let’s run a sample Spark application written in Python to calculate the value of pi on the cluster. We’ll do this by uploading the sample application code to an S3 bucket in our account and then adding a step to the cluster referencing the application location code with arguments:

The step runs and completes successfully:

The results of the calculation are sent to our S3 bucket as configured in the arguments:

{"tries":1600000,"hits":1256253,"pi":3.1406325}

Cleanup

Now that our job is done, we delete the CloudFormation stack in order to remove the AWS resources created. Please note that as a part of the EMR cluster creation, EMR creates some EC2 security groups that cannot be removed by CloudFormation since they were created by the EMR cluster and not by CloudFormation. As a result, the deletion of the CloudFormation stack will fail to delete the VPC on the first try. To solve this, we have the option of deleting the VPC manually by hand, or we can let the CloudFormation stack ignore the VPC and leave it (along with the security groups) in place for future use. We can then delete the CloudFormation stack a final time:

Conclusion

No matter if you are a first-time Spot user learning how to take advantage of the savings and scale offered by Spot Instances, or a veteran Spot user expanding your Spot usage into a new architecture, Spot Blueprints has you covered. We hope you enjoy using Spot Blueprints to quickly get you started generating example template frameworks for workloads like Kubernetes and Apache Spark with Spot best practices in place. Please tell us which blueprint you’d like to see next right in Spot Blueprints. We can’t wait to hear from you!

Field Notes: Ingest and Visualize Your Flat-file IoT Data with AWS IoT Services

2020-12-09 Paul Ramsey

Post Syndicated from Paul Ramsey original https://aws.amazon.com/blogs/architecture/field-notes-ingest-and-visualize-your-flat-file-iot-data-with-aws-iot-services/

Customers who maintain manufacturing facilities often find it challenging to ingest, centralize, and visualize IoT data that is emitted in flat-file format from their factory equipment. While modern IoT-enabled industrial devices can communicate over standard protocols like MQTT, there are still some legacy devices that generate useful data but are only capable of writing it locally to a flat file. This results in siloed data that is either analyzed in a vacuum without the broader context, or it is not available to business users to be analyzed at all.

AWS provides a suite of IoT and Edge services that can be used to solve this problem. In this blog, I walk you through one method of leveraging these services to ingest hard-to-reach data into the AWS cloud and extract business value from it.

Overview of solution

This solution provides a working example of an edge device running AWS IoT Greengrass with an AWS Lambda function that watches a Samba file share for new .csv files (presumably containing device or assembly line data). When it finds a new file, it will transform it to JSON format and write it to AWS IoT Core. The data is then sent to AWS IoT Analytics for processing and storage, and Amazon QuickSight is used to visualize and gain insights from the data.

Samba file share solution diagram

Since we don’t have an actual on-premises environment to use for this walkthrough, we’ll simulate pieces of it:

In place of the legacy factory equipment, an EC2 instance running Windows Server 2019 will generate data in .csv format and write it to the Samba file share.
- We’re using a Windows Server for this function to demonstrate that the solution is platform-agnostic. As long as the flat file is written to a file share, AWS IoT Greengrass can ingest it.
An EC2 instance running Amazon Linux will act as the edge device and will host AWS IoT Greengrass Core and the Samba share.
- In the real world, these could be two separate devices, and the device running AWS IoT Greengrass could be as small as a Raspberry Pi.

Prerequisites

For this walkthrough, you should have the following prerequisites:

An AWS Account
Access to provision and delete AWS resources
Basic knowledge of Windows and Linux server administration
If you’re unfamiliar with AWS IoT Greengrass concepts like Subscriptions and Cores, review the AWS IoT Greengrass documentation for a detailed description.

Walkthrough

First, we’ll show you the steps to launch the AWS IoT Greengrass resources using AWS CloudFormation. The AWS CloudFormation template is derived from the template provided in this blog post. Review the post for a detailed description of the template and its various options.

Create a key pair. This will be used to access the EC2 instances created by the CloudFormation template in the next step.
Launch a new AWS CloudFormation stack in the N. Virginia (us-east-1) Region using iot-cfn.yml, which represents the simulated environment described in the preceding bullet.
- Parameters:
  - Name the stack IoTGreengrass.
  - For EC2KeyPairName, select the EC2 key pair you just created from the drop-down menu.
  - For SecurityAccessCIDR, use your public IP with a /32 CIDR (i.e. 1.1.1.1/32).
  - You can also accept the default of 0.0.0.0/0 if you can have SSH and RDP open to all sources on the EC2 instances in this demo environment.
  - Accept the defaults for the remaining parameters.

View the Resources tab after stack creation completes. The stack creates the following resources:
- A VPC with two subnets, two route tables with routes, an internet gateway, and a security group.
- Two EC2 instances, one running Amazon Linux and the other running Windows Server 2019.
- An IAM role, policy, and instance profile for the Amazon Linux instance.
- A Lambda function called GGSampleFunction, which we’ll update with code to parse our flat-files with AWS IoT Greengrass in a later step.
- An AWS IoT Greengrass Group, Subscription, and Core.
- Other supporting objects and custom resource types.
View the Outputs tab and copy the IPs somewhere easy to retrieve. You’ll need them for multiple provisioning steps below.

3. Review the AWS IoT Greengrass resources created on your behalf by CloudFormation:

- Search for IoT Greengrass in the Services drop-down menu and select it.
- Click Manage your Groups.
- Click file_ingestion.
- Navigate through the Subscriptions, Cores, and other tabs to review the configurations.

Leveraging a device running AWS IoT Greengrass at the edge, we can now interact with flat-file data that was previously difficult to collect, centralize, aggregate, and analyze.

Set up the Samba file share

Now, we set up the Samba file share where we will write our flat-file data. In our demo environment, we’re creating the file share on the same server that runs the Greengrass software. In the real world, this file share could be hosted elsewhere as long as the device that runs Greengrass can access it via the network.

Follow the instructions in setup_file_share.md to set up the Samba file share on the AWS IoT Greengrass EC2 instance.
Keep your terminal window open. You’ll need it again for a later step.

Configure Lambda Function for AWS IoT Greengrass

AWS IoT Greengrass provides a Lambda runtime environment for user-defined code that you author in AWS Lambda. Lambda functions that are deployed to an AWS IoT Greengrass Core run in the Core’s local Lambda runtime. In this example, we update the Lambda function created by CloudFormation with code that watches for new files on the Samba share, parses them, and writes the data to an MQTT topic.

Update the Lambda function:

- Search for Lambda in the Services drop-down menu and select it.
- Select the file_ingestion_lambda function.
- From the Function code pane, click Actions then Upload a .zip file.
- Upload the provided zip file containing the Lambda code.
- Select Actions > Publish new version > Publish.

2. Update the Lambda Alias to point to the new version.

- Select the Version: X drop-down (“X” being the latest version number).
- Choose the Aliases tab and select gg_file_ingestion.
- Scroll down to Alias configuration and select Edit.
- Choose the newest version number and click Save.
- Do NOT use $LATEST as it is not supported by AWS IoT Greengrass.

3. Associate the Lambda function with AWS IoT Greengrass.

- Search for IoT Greengrass in the Services drop-down menu and select it.
- Select Groups and choose file_ingestion.
- Select Lambdas > Add Lambda.
- Click Use existing Lambda.
- Select file_ingestion_lambda > Next.
- Select Alias: gg_file_ingestion > Finish.
- You should now see your Lambda associated with the AWS IoT Greengrass group.
- Still on the Lambda function tab, click the ellipsis and choose Edit configuration.
- Change the following Lambda settings then click Update:
  - Set Containerization to No container (always).
  - Set Timeout to 25 seconds (or longer if you have large files to process).
  - Set Lambda lifecycle to Make this function long-lived and keep it running indefinitely.

Deploy AWS IoT Greengrass Group

Restart the AWS IoT Greengrass daemon:

- A daemon restart is required after changing containerization settings. Run the following commands on the Greengrass instance to restart the AWS IoT Greengrass daemon:

 cd /greengrass/ggc/core/

 sudo ./greengrassd stop

 sudo ./greengrassd start

2. Deploy the AWS IoT Greengrass Group to the Core device.

- Return to the file_ingestion AWS IoT Greengrass Group in the console.
- Select Actions > Deploy.
- Select Automatic detection.
- After a few minutes, you should see a Status of Successfully completed. If the deployment fails, check the logs, fix the issues, and deploy again.

Generate test data

You can now generate test data that is ingested by AWS IoT Greengrass, written to AWS IoT Core, and then sent to AWS IoT Analytics and visualized by Amazon QuickSight.

Follow the instructions in generate_test_data.md to generate the test data.
Verify that the data is being written to AWS IoT Core following these instructions (Use iot/data for the MQTT Subscription Topic instead of hello/world).

screenshot

Setup AWS IoT Analytics

Now that our data is in IoT Cloud, it only takes a few clicks to configure AWS IoT Analytics to process, store, and analyze our data.

Search for IoT Analytics in the Services drop-down menu and select it.
Set Resources prefix to file_ingestion and Topic to iot/data. Click Quick Create.
Populate the data set by selecting Data sets > file_ingestion_dataset >Actions > Run now. If you don’t get data on the first run, you may need to wait a couple of minutes and run it again.

Visualize the Data from AWS IoT Analytics in Amazon QuickSight

We can now use Amazon QuickSight to visualize the IoT data in our AWS IoT Analytics data set.

Search for QuickSight in the Services drop-down menu and select it.
If your account is not signed up for QuickSight yet, follow these instructions to sign up (use Standard Edition for this demo)
Build a new report:

- Click New analysis > New dataset.
- Select AWS IoT Analytics.
- Set Data source name to iot-file-ingestion and select file_ingestion_dataset. Click Create data source.
- Click Visualize. Wait a moment while your rows are imported into SPICE.
- You can now drag and drop data fields onto field wells. Review the QuickSight documentation for detailed instructions on creating visuals.
- Following is an example of a QuickSight dashboard you can build using the demo data we generated in this walkthrough.

Cleaning up

Be sure to clean up the objects you created to avoid ongoing charges to your account.

In Amazon QuickSight, Cancel your subscription.
In AWS IoT Analytics, delete the datastore, channel, pipeline, data set, role, and topic rule you created.
In CloudFormation, delete the IoTGreengrass stack.
In Amazon CloudWatch, delete the log files associated with this solution.

Conclusion

Gaining valuable insights from device data that was once out of reach is now possible thanks to AWS’s suite of IoT services. In this walkthrough, we collected and transformed flat-file data at the edge and sent it to IoT Cloud using AWS IoT Greengrass. We then used AWS IoT Analytics to process, store, and analyze that data, and we built an intelligent dashboard to visualize and gain insights from the data using Amazon QuickSight. You can use this data to discover operational anomalies, enable better compliance reporting, monitor product quality, and many other use cases.

For more information on AWS IoT services, check out the overviews, use cases, and case studies on our product page. If you’re new to IoT concepts, I’d highly encourage you to take our free Internet of Things Foundation Series training.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Migrating File Servers to Amazon FSx and Integrating with AWS Managed Microsoft AD

2020-12-02 Kyaw Soe Hlaing

Post Syndicated from Kyaw Soe Hlaing original https://aws.amazon.com/blogs/architecture/field-notes-migrating-file-servers-to-amazon-fsx-and-integrating-with-aws-managed-microsoft-ad/

Amazon FSx provides AWS customers with the native compatibility of third-party file systems with feature sets for workloads such as Windows-based storage, high performance computing (HPC), machine learning, and electronic design automation (EDA). Amazon FSx automates the time-consuming administration tasks such as hardware provisioning, software configuration, patching, and backups. Since Amazon FSx integrates the file systems with cloud-native AWS services, this makes them even more useful for a broader set of workloads.

Amazon FSx for Windows File Server provides fully managed file storage that is accessible over the industry-standard Server Message Block (SMB) protocol. Built on Windows Server, Amazon FSx delivers a wide range of administrative features such as data deduplication, end-user file restore, and Microsoft Active Directory (AD) integration.

In this post, I explain how to migrate files and file shares from on-premises servers to Amazon FSx with AWS DataSync in a domain migration scenario. Customers are migrating their file servers to Amazon FSx as part of their migration from an on-premises Active Directory to AWS managed Active Directory. Their plan is to replace their file servers with Amazon FSx during Active Directory migration to AWS Managed AD.

Arhictecture diagram

Prerequisites

Before you begin, perform the steps outlined in this blog to migrate the user accounts and groups to the managed Active Directory.

Walkthrough

There are numerous ways to perform the Active Directory migration. Generally, the following five steps are taken:

Establish two-way forest trust between on-premises AD and AWS Managed AD
Migrate user accounts and group with the ADMT tool
Duplicate Access Control List (ACL) permissions in the file server
Migrate files and folders with existing ACL to Amazon FSx using AWS DataSync
Migrate User Computers

In this post, I focus on duplication of ACL permissions and migration of files and folders using Amazon FSx and AWS DataSync. In order to perform duplication of ACL permission in file servers, I use SubInACL tool, which is available from the Microsoft website.

Duplication of the ACL is required because users want to seamlessly access file shares once their computers are migrated to AWS Managed AD. Thus all migrated files and folders have permission with Managed AD users and group objects. For enterprises, the migration of user computers does not happen overnight. Normally, migration takes place in batches or phases. With ACL duplication, both migrated and non-migrated users can access their respective file shares seamlessly during and after migration.

Duplication of Access Control List (ACL)

Before we proceed with ACL duplication, we must ensure that the migration of user accounts and groups was completed. In my demo environment, I have already migrated on-premises users to the Managed Active Directory. In the meantime, we presume that we are migrating identical users to the Managed Active Directory. There might be a scenario where migrated user accounts have different naming such as samAccount name. In this case, we will need to handle this during ACL duplication with SubInACL. For more information about syntax, refer to the SubInACL documentation.

As indicated in following screenshots, I have two users created in the on-premises Active Directory (onprem.local) and those two identical users have been created in the Managed Active Directory too (corp.example.com).

Screenshot of on-premises Active Directory (onprem.local)

Screenshot of Active Directory

In the following screenshot, I have a shared folder called “HR_Documents” in an on-premises file server. Different users have different access rights to that folder. For example, John Smith has “Full Control” but Onprem User1 only have “Read & Execute”. Our plan is to add same access right to identical users from the Managed Active Directory, here corp.example.com, so that once John Smith is migrated to managed AD, he can access to shared folders in Amazon FSx using his Managed Active Directory credential.

Let’s verify the existing permission in the “HR_Documents” folder. Two users from onprem.local are found with different access rights.

Screenshot of HR docs

Now it’s time to install SubInACL.

We install it in our on-premises file server. After the SubInACL tool is installed, it can be found under “C:\Program Files (x86)\Windows Resource Kits\Tools” folder by default. To perform an ACL duplication, run command prompt as administrator and run the following command;

Subinacl /outputlog=C:\temp\HR_document_log.txt /errorlog=C:\temp\HR_document_Err_log.txt /Subdirectories C:\HR_Documents\* /migratetodomain=onprem=corp

There are several parameters that I am using in the command:

Outputlog = where log file is saved
ErrorLog = where error log file is saved
Subdirectories = to apply permissions including subfolders and files
Migratetodomain= NetBIOS name of source domain and destination domain

Screenshot windows resources kits

screenshot of windows resources kit

If the command is run successfully, you should able to see a summary of the results. If there is no error or failure, you can verify whether ACL permissions are duplicated as expected by looking at the folders and files. In our case, we can see that there is one ACL entry of identical account from corp.example.com is added.

Note: you will always see two ACL entries, one from onprem.local and another one from corp.example.com domain in all the files and folders that you used during migration. Permissions are now applied to both at the folder and file level.

screenshot of payroll properties

screenshot of doc 1 properties

Migrate files and folders using AWS DataSync

AWS DataSync is an online data transfer service that simplifies, automates, and accelerates moving data between on-premises storage systems and AWS Storage services such as Amazon S3, Amazon Elastic File System (Amazon EFS), or Amazon FSx for Windows File Server. Manual tasks related to data transfers can slow down migrations and burden IT operations. AWS DataSync reduces or automatically handles many of these tasks, including scripting copy jobs, scheduling and monitoring transfers, validating data, and optimizing network utilization.

Create an AWS DataSync agent

An AWS DataSync agent deploys as a virtual machine in an on-premises data center. An AWS DataSync agent can be run on ESXi, KVM, and Microsoft Hyper-V hypervisors. The AWS DataSync agent is used to access on-premises storage systems and transfer data to the AWS DataSync managed service running on AWS. AWS DataSync always performs incremental copies by comparing from a source to a destination and only copying files that are new or have changed.

AWS DataSync supports the following SMB (Server Message Block) locations to migrate data from:

Network File System (NFS)
Server Message Block (SMB)

In this blog, I use SMB as the source location, since I am migrating from an on-premises Windows File server. AWS DataSync supports SMB 2.1 and SMB 3.0 protocols.

AWS DataSync saves metadata and special files when copying to and from file systems. When files are copied from a SMB file share and Amazon FSx for Windows File Server, AWS DataSync copies the following metadata:

File timestamps: access time, modification time, and creation time
File owner and file group security identifiers (SIDs)
Standard file attributes
NTFS discretionary access lists (DACLs): access control entries (ACEs) that determine whether to grant access to an object

Data Synchronization with AWS DataSync

When a task starts, AWS DataSyc goes through different stages. It begins with examining file system follows by data transfer to destination. Once data transfer is completed, it performs verification for consistency between source and destination file systems. You can review detailed information about the data synchronization stages.

DataSync Endpoints

You can activate your agent by using one of the following endpoint types:

Public endpoints – If you use public endpoints, all communication from your DataSync agent to AWS occurs over the public internet.
Federal Information Processing Standard (FIPS) endpoints – If you need to use FIPS 140-2 validated cryptographic modules when accessing the AWS GovCloud (US-East) or AWS GovCloud (US-West) Region, use this endpoint to activate your agent. You use the AWS CLI or API to access this endpoint.
Virtual private cloud (VPC) endpoints – If you use a VPC endpoint, all communication from AWS DataSync to AWS services occurs through the VPC endpoint in your VPC in AWS. This approach provides a private connection between your self-managed data center, your VPC, and AWS services. It increases the security of your data as it is copied over the network.

In my demo environment, I have implemented AWS DataSync as indicated in following diagram. The DataSync Agent can be run either on VMware or Hyper-V and KVM platform in a customer on-premises data center.

Datasync Agent Arhictecture

Once the AWS DataSync Agent setup is completed and the task that defined the source file servers and destination Amazon FSx server is added, you can verify agent status in the AWS Management Console.

Console screenshot

Select Task and then choose Start to start copying files and folders. This will start the replication task (or you can wait until the task runs hourly). You can check the History tab to see a history of the replication task executions.

Console screenshot

Congratulations! You have replicated the contents of an on-premises file server to Amazon FSx. Let’s look and make sure the ACL permissions are still intact in their destination after migration. As shown in the following screenshots, the ACL permissions in the Payroll folder still remains as is, both on-premises users and Managed AD users are inside. Once the user’s computers are migrated to the Managed AD, they can access the same file share in Amazon FSx server using Managed AD credentials.

Payroll properties screenshot

Cleaning up

If you are performing testing by following the preceding steps in your own account, delete the following resources, to avoid incurring future charges:

EC2 instances
Managed AD
Amazon FSx file system
AWS Datasync

Conclusion

You have learned how to duplicate ACL permissions and shared folder permissions during migration of file servers to Amazon FSx. This process provides a seamless migration experience for users. Once the user’s computers are migrated to the Managed AD, they only need to remap shared folders from Amazon FSx. This can be automated by pushing down shared folders mapping with a Group Policy. If new files or folders are created in the source file server, AWS Datasync will synchronize to Amazon FSx server.

For customers who are planning to do a domain migration from on-premises to AWS Managed Microsoft AD, migration of resources like file servers are common. Handling ACL permissions plays a vital role in providing a seamless migration experience. The duplication of ACL can be an option, otherwise, the ADMT tool can be used to migrate SID information from the source Domain to destination Domain. To migrate SID history, SID filtering needs to be disabled during migration.

If you want to provide feedback about this post, you are welcome to submit in the comments section below.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Real-Time In-Stream Inference with AWS Kinesis, SageMaker & Apache Flink

2020-11-27 Shawn Sachdev

Post Syndicated from Shawn Sachdev original https://aws.amazon.com/blogs/architecture/realtime-in-stream-inference-kinesis-sagemaker-flink/

As businesses race to digitally transform, the challenge is to cope with the amount of data, and the value of that data diminishes over time. The challenge is to analyze, learn, and infer from real-time data to predict future states, as well as to detect anomalies and get accurate results. In this blog post, we’ll explain the architecture for a solution that can achieve real-time inference on streaming data. We’ll also cover the integration of Amazon Kinesis Data Analytics (KDA) with Apache Flink to asynchronously invoke any underlying services (or databases).

Managed real-time in-stream data inference is quite a mouthful; let’s break it up:

In-stream data refers to the capability of processing a data stream that collects, processes, and analyzes data.
Real-time inference refers to the ability to use data from the feed to project future state for the underlying data.

Consider a streaming application that captures credit card transactions along with the other parameters (such as source IP to capture the geographic details of the transaction as well as the amount). This data can then be used to be used to infer fraudulent transactions instantaneously. Compare that to a traditional batch-oriented approach that identifies fraudulent transactions at the end of every business day and generates a report when it’s too late, after bad actors have already committed fraud.

Architecture overview

In this post, we discuss how you can use Amazon Kinesis Data Analytics for Apache Flink (KDA), Amazon SageMaker, Apache Flink, and Amazon API Gateway to address the challenges such as real-time fraud detection on a stream of credit card transaction data. We explore how to build a managed, reliable, scalable, and highly available streaming architecture based on managed services that substantially reduce the operational overhead compared to a self-managed environment. Our particular focus is on how to prepare and run Flink applications with KDA for Apache Flink applications.

The following diagram illustrates this architecture:

Run Apache Flink applications with KDA for Apache Flink applications

In above architecture, data is ingested in AWS Kinesis Data Streams (KDS) using Amazon Kinesis Producer Library (KPL), and you can use any ingestion patterns supported by KDS. KDS then streams the data to an Apache Flink-based KDA application. KDA manages the required infrastructure for Flink, scales the application in response to changing traffic patterns, and automatically recovers from underlying failures. The Flink application is configured to call an API Gateway endpoint using Asynchronous I/O. Residing behind the API Gateway is an AWS SageMaker endpoint, but any endpoints can be used based on your data enrichment needs. Flink distributes the data across one or more stream partitions, and user-defined operators can transform the data stream.

Let’s talk about some of the key pieces of this architecture.

What is Apache Flink?

Apache Flink is an open source distributed processing framework that is tailored to stateful computations over unbounded and bounded datasets. The architecture uses KDA with Apache Flink to run in-stream analytics and uses Asynchronous I/O operator to interact with external systems.

KDA and Apache Flink

KDA for Apache Flink is a fully managed AWS service that enables you to use an Apache Flink application to process streaming data. With KDA for Apache Flink, you can use Java or Scala to process and analyze streaming data. The service enables you to author and run code against streaming sources. KDA provides the underlying infrastructure for your Flink applications. It handles core capabilities like provisioning compute resources, parallel computation, automatic scaling, and application backups (implemented as checkpoints and snapshots).

Flink Asynchronous I/O Operator

Flink Asynchronous I/O Operator

Flink’s Asynchronous I/O operator allows you to use asynchronous request clients for external systems to enrich stream events or perform computation. Asynchronous interaction with the external system means that a single parallel function instance can handle multiple requests and receive the responses concurrently. In most cases this leads to higher streaming throughput. Asynchronous I/O API integrates well with data streams, and handles order, event time, fault tolerance, etc. You can configure this operator to call external sources like databases and APIs. The architecture pattern explained in this post is configured to call API Gateway integrated with SageMaker endpoints.

Please refer code at kda-flink-ml, a sample Flink application with implementation of Asynchronous I/O operator to call an external Sagemaker endpoint via API Gateway. Below is the snippet of code of StreamingJob.java from sample Flink application.

DataStream<HttpResponse<RideRequest>> predictFareResponse =
            // Asynchronously call predictFare Endpoint
            AsyncDataStream.unorderedWait(
                predictFareRequests,
                new Sig4SignedHttpRequestAsyncFunction<>(predictFareEndpoint, apiKeyHeader),
                30, TimeUnit.SECONDS, 20
            )
            .returns(newTypeHint<HttpResponse<RideRequest>() {});

The operator code above requires following inputs:

An input data stream
An implementation of AsyncFunction that dispatches the requests to the external system
Timeout, which defines how long an asynchronous request may take before it considered failed
Capacity, which defines how many asynchronous requests may be in progress at the same time

How Amazon SageMaker fits into this puzzle

In our architecture we are proposing a SageMaker endpoint for inferencing that is invoked via API Gateway, which can detect fraudulent transactions.

Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly. SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to build and develop high quality models. You can use these trained models in an ingestion pipeline to make real-time inferences.

You can set up persistent endpoints to get predictions from your models that are deployed on SageMaker hosting services. For an overview on deploying a single model or multiple models with SageMaker hosting services, see Deploy a Model on SageMaker Hosting Services.

Ready for a test drive

To help you get started, we would like to introduce an AWS Solution: AWS Streaming Data Solution for Amazon Kinesis (Option 4) that is available as a single-click cloud formation template to assist you in quickly provisioning resources to get your real-time in-stream inference pipeline up and running in a few minutes. In this solution we leverage AWS Lambda, but that can be switched with a SageMaker endpoint to achieve the architecture discussed earlier in this post. You can also leverage the pre-built AWS Solutions Construct, which implements an Amazon API Gateway connected to an Amazon SageMaker endpoint pattern that can replace AWS Lambda in the below solution. See the implementation guide for this solution.

The following diagram illustrates the architecture for the solution:

architecture for the solution

Conclusion

In this post we explained the architecture to build a managed, reliable, scalable, and highly available application that is capable of real-time inferencing on a data stream. The architecture was built using KDS, KDA for Apache Flink, Apache Flink, and Amazon SageMaker. The architecture also illustrates how you can use managed services so that you don’t need to spend time provisioning, configuring, and managing the underlying infrastructure. Instead, you can spend your time creating insights and inference from your data.

We also talked about the AWS Streaming Data Solution for Amazon Kinesis, which is an AWS vetted solution that provides implementations for applications you can automatically deploy directly into your AWS account. The solution automatically configures the AWS services necessary to easily capture, store, process, and infer from streaming data.

Serving Content Using a Fully Managed Reverse Proxy Architecture in AWS

2020-11-25 Leonardo Machado

Post Syndicated from Leonardo Machado original https://aws.amazon.com/blogs/architecture/serving-content-using-fully-managed-reverse-proxy-architecture/

With the trends to autonomous teams and microservice style architectures, web frontend tiers are challenged to become more flexible and integrate different components with independent architectures and technology stacks. Two scenarios are prominent:

Micro-Frontends, where there is a single page application and components within this page are owned by different teams
Web portals, where there is a landing page and subsections of the presence are owned by different teams. In the following we will refer to these as components as well.

What these scenarios have in common is that they consist of loosely coupled components that are seamlessly hidden to the end user behind a common interface. Often, a reverse proxy serves content from one single entry domain but retrieves the content from different origins. In the example in Figure 1 (below) we want to address one specific domain name, and depending on the path prefix, we retrieve the content from an on-premises webserver, from a webserver running on Amazon Elastic Cloud Compute (EC2), or from Amazon S3 Static Hosting, in the figure represented by the prefixes /hotels, /pets, and /cars, respectively. If we forward the path to the webserver without the path prefix, the component would not know what prefix it is run under and the prefix could be changed any time without impacting the component, thus making the component context-unaware.

Figure 1: Architecture, AWS Amplify Console

Some common requirements to these approaches are:

Components should be technology-agnostic, each component should be able to choose the technology stack independently.
Each component can be maintained by a dedicated autonomous team without depending on other teams.
All components are served from the same domain name. For example, this could have implications on search engine optimization.
Components should be unaware of the context where it is used.

The traditional approach would be to run a reverse proxy tier with rewrite rules to different origins. In this post we look into managed alternatives in AWS that take away the heavy lifting of running and scaling the proxy infrastructure.

Note: AWS Application Load Balancer can be used as a reverse proxy, but it only supports static targets (fixed IP address), no dynamic targets (domain name). Thus, we do not consider it here.

AWS Amplify Console

The AWS Amplify Console provides a Git-based workflow for hosting fullstack serverless web apps with continuous deployment. Amplify Console also offers a rewrites and redirects feature, which can be used for forwarding incoming requests with different path patterns to different origins (see Figure 2).

Figure 2: Dashboard, AWS Amplify Console (rewrites and redirects feature)

Note: In Figure 2, <*> stands for a wildcard that matches any pattern. Target addresses must be HTTPS (no HTTP allowed).

This architectural option is the simplest to setup and manage and is the best approach for teams looking for the least management effort. AWS Amplify Console offers a simple interface for easily mapping incoming patterns to target addresses. It also makes it easy to serve additional static content if needed. Configuration options are limited and more complex scenarios cannot be implemented.

If you want to rewrite paths to remove the path prefix, you can accomplish this by using the wildcard pattern. The source address would contain the path prefix, but the target address would omit the prefix as seen in Figure 2.

When looking at pricing compared to the other approaches it is important to look at the outgoing traffic. With higher volumes, this can get expensive.

Amazon API Gateway

Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. API Gateway’s REST API type allows users to setup HTTP proxy integrations, which can be used for forwarding incoming requests with different path patterns to different origin servers according to the API specifications (Figure 3).

Figure 3: Dashboard, Amazon API Gateway (HTTP proxy integration)

Note: In Figure 3, {proxy+} and {proxy} stand for the same wildcard pattern.

API Gateway, in comparison to Amplify Console, is better suited when looking for a higher customization degree. API Gateway offers multiple customization and monitoring features, such as custom gateway responses and dashboard monitoring.

Similar to Amplify Console, API Gateway provides a feature to rewrite paths and thus remove context from the path using the {proxy} wildcard.

API Gateway REST API pricing is based on the number of API calls as well as any external data transfers. External data transfers are charged at the EC2 data transfer rate.

Note: The HTTP integration type in API Gateway REST APIs does not support forwarding trailing slashes. If this is needed for your application, consider other integration types such as AWS Lambda integration or AWS service integration.

Amazon CloudFront and AWS Lambda@Edge

Amazon CloudFront is a fast content delivery network (CDN) service that securely delivers data, videos, applications, and APIs to customers globally with low latency and high transfer speeds. CloudFront is able to route incoming requests with different path patterns to different origins or origin groups by configuring its cache behavior rules (Figure 4).

Figure 4: Dashboard, CloudFront (Cache Behavior)

Additionally, Amazon CloudFront allows for integration with AWS Lambda@Edge functions. Lambda@Edge runs your code in response to events generated by CloudFront. In this scenario we can use Lambda@Edge to change the path pattern before forwarding a request to the origin and thus removing the context. For details on see this detailed re:Invent session.

This approach offers most control over caching behavior and customization. Being able to add your own custom code through a custom Lambda function adds an entire new range of possibilities when processing your request. This enables you to do everything from simple HTTP request and response processing at the edge to more advanced functionality, such as website security, real-time image transformation, intelligent bot mitigation, and search engine optimization.

Amazon CloudFront is charged by request and by Lambda@Edge invocation. The data traffic out is charged with the CloudFront regional data transfer out pricing.

Conclusion

With AWS Amplify Console, Amazon API Gateway, and Amazon CloudFront, we have seen three approaches to implement a reverse proxy pattern using managed services from AWS. The easiest approach to start with is AWS Amplify Console. If you run into more complex scenarios consider API Gateway. For most flexibility and when data traffic cost becomes a factor look into Amazon CloudFront with Lambda@Edge.

Field Notes: Setting Up Disaster Recovery in a Different Seismic Zone Using AWS Outposts

2020-11-24 Vijay Menon

Post Syndicated from Vijay Menon original https://aws.amazon.com/blogs/architecture/field-notes-setting-up-disaster-recovery-in-a-different-seismic-zone-using-aws-outposts/

Recovering your mission-critical workloads from outages is essential for business continuity and providing services to customers with little or no interruption. That’s why many customers replicate their mission-critical workloads in multiple places using a Disaster Recovery (DR) strategy suited for their needs.

With AWS, a customer can achieve this by deploying multi Availability Zone High-Availability setup or a multi-region setup by replicating critical components of an application to another region. Depending on the RPO and RTO of the mission-critical workload, the requirement for disaster recovery ranges from simple backup and restore, to multi-site, active-active, setup. In this blog post, I explain how AWS Outposts can be used for DR on AWS.

In many geographies, it is possible to set up your disaster recovery for a workload running in one AWS Region to another AWS Region in the same country (for example in US between us-east-1 and us-west-2). For countries where there is only one AWS Region, it’s possible to set up disaster recovery in another country where AWS Region is present. This method can be designed for the continuity, resumption and recovery of critical business processes at an agreed level and limits the impact on people, processes and infrastructure (including IT). Other reasons include to minimize the operational, financial, legal, reputational and other material consequences arising from such events.

However, for mission-critical workloads handling critical user data (PII, PHI or financial data), countries like India and Canada have regulations which mandate to have a disaster recovery setup at a “safe distance” within the same country. This ensures compliance with any data sovereignty or data localization requirements mandated by the regulators. “Safe distance” means the distance between the DR site and the primary site is such that the business can continue to operate in the event of any natural disaster or industrial events affecting the primary site. Depending on the geography, this safe distance could be 50KM or more. These regulations limit the options customers have to use another AWS Region in another country as a disaster recovery site of their primary workload running on AWS.

In this blog post, I describe an architecture using AWS Outposts which helps set up disaster recovery on AWS within the same country at a distance that can meet the requirements set by regulators. This architecture also helps customers to comply with various data sovereignty regulations in a given country. Another advantage of this architecture is the homogeneity of the primary and disaster recovery site. Your existing IT teams can set up and operate the disaster recovery site using familiar AWS tools and technology in a homogenous environment.

Prerequisites

Readers of this blog post should be familiar with basic networking concepts like WAN connectivity, BGP and the following AWS services:

Amazon EC2
Amazon VPC
AWS Outposts
AWS Transit Gateway
AWS Managed VPN
AWS Direct Connect
AWS Marketplace Amazon Machine Images (AMI) for Network Infrastructure

Architecture Overview

I explain the architecture using an example customer scenario in India, where a customer is using AWS Mumbai Region for their mission-critical workload. This workload needs a DR setup to comply with local regulation and the DR setup needs to be in a different seismic zone than the one for Mumbai. Also, because of the nature of the regulated business, the user/sensitive data needs to be stored within India.

Following is the architecture diagram showing the logical setup.

This solution is similar to a typical AWS Outposts use case where a customer orders the Outposts to be installed in their own Data Centre (DC) or a CoLocation site (Colo). It will follow the shared responsibility model described in AWS Outposts documentation.

The only difference is that the AWS Outpost parent Region will be the closest Region other than AWS Mumbai, in this case Singapore. Customers will then provision an AWS Direct Connect public VIF locally for a Service Link to the Singapore Region. This ensures that the control plane stays available via the AWS Singapore Region even if there is an outage in AWS Mumbai Region affecting control plane availability. You can then launch and manage AWS Outposts supported resources in the AWS Outposts rack.

For data plane traffic, which should not go out of the country, the following options are available:

Provision a self-managed Virtual Private Network (VPN) between an EC2 instances running router AMI in a subnet of AWS Outposts and AWS Transit Gateway (TGW) in the primary Region.
Provision a self-managed Virtual Private Network (VPN) between an EC2 instances running router AMI in a subnet of AWS Outposts and Virtual Private Gateway (VGW) in the primary Region.

Note: The Primary Region in this example is AWS Mumbai Region. This VPN will be provisioned via Local Gateway and DX public VIF. This ensures that data plane traffic will not traverse any network out of the country (India) to comply with data localization mandated by the regulators.

Architecture Walkthrough

Make sure your data center (DC) or the choice of collocate facility (Colo) meets the requirements for AWS Outposts.
Create an Outpost and order Outpost capacity as described in the documentation. Make sure that you do this step while logged into AWS Outposts console of the AWS Singapore Region.
Provision connectivity between AWS Outposts and network of your DC/Colo as mentioned in AWS Outpost documentation. This includes setting up VLANs for service links and Local Gateway (LGW).
Provision an AWS Direct Connect connection and public VIF between your DC/Colo and the primary Region via the closest AWS Direct Connect location.
- For the WAN connectivity between your DC/Colo and AWS Direct Connect location you can choose any telco provider of your choice or work with one of AWS Direct Connect partners.
- This public VIF will be used to attach AWS Outposts to its parent Region in Singapore over AWS Outposts service link. It will also be used to establish an IPsec GRE tunnel between AWS Outposts subnet and a TGW or VGW for data plane traffic (explained in subsequent steps).
- Alternatively, you can provision separate Direct Connect connection and public VIFs for Service Link and data plane traffic for better segregation between the two. You will have to provision sufficient bandwidth on Direct Connect connection for the Service Link traffic as well as the Data Plane traffic (like data replication between primary Region and AWS outposts).
- For an optimal experience and resiliency, AWS recommends that you use dual 1Gbps connections to the AWS Region. This connectivity can also be achieved over Internet transit; however, I recommend using AWS Direct Connect because it provides private connectivity between AWS and your DC/Colo environment, which in many cases can reduce your network costs, increase bandwidth throughput, and provide a more consistent network experience than Internet-based connections.
Create a subnet in AWS Outposts and launch an EC2 instance running a router AMI of your choice from AWS Marketplace in this subnet. This EC2 instance is used to establish the IPsec GRE tunnel to the TGW or VGW in primary Region.
- Choose an EC2 instance which can support your bandwidth requirement as per the AMI provider and disable source/destination check for this EC2 instance.
- Assign an Elastic IP address to this EC2 instance from the customer-owned pool provisioned for your AWS Outposts.
Add rules in security group of these EC2 instances to allow ISAKMP (UDP 500), NAT Traversal (UDP 4500), and ESP (IP Protocol 50) from VGW or TGW endpoint public IP addresses.
NAT (Network Address Translation) the EIP assigned in step 5 to a public IP address at your edge router connecting to AWS Direct connect or internet transit. This public IP will be used as the customer gateway to establish IPsec GRE tunnel to the primary Region.
Create a customer gateway using the public IP address used to NAT the EC2 instances step 7. Follow the steps in similar process found at Create a Customer Gateway.
Create a VPN attachment for the transit gateway using the customer gateway created in step 8. This VPN must be a dynamic route-based VPN. For steps, review Transit Gateway VPN Attachments. If you are connecting the customer gateway to VPC using VGW in primary Region then follow the steps mentioned at How do I create a secure connection between my office network and Amazon Virtual Private Cloud?.
Configure the customer gateway (EC2 instance running a router AMI in AWS Outposts subnet) side for VPN connectivity. You can base this configuration suggested by AWS during the creation of VPN in step 9. This suggested sample configuration can be downloaded from AWS console post VPN setup as discussed in this document.
Modify the route table of AWS outpost Subnets to point to the EC2 instance launched in step 5 as the target for any destination in your VPCs in the primary Region, which is AWS Mumbai in this example.

At this point, you will have end-to-end connectivity between VPCs in a primary Region and resources in an AWS Outposts. This connectivity can now be used to replicate data from your primary site to AWS Outposts for DR purposes. This keeps the setup compliant with any internal or external data localization requirements.

Conclusion

In this blog post, I described an architecture using AWS Outposts for Disaster Recovery on AWS in countries without a second AWS Region. To set up disaster recovery, your existing IT teams can set up and operate the disaster recovery site using the familiar AWS tools and technology in a homogeneous environment. To learn more about AWS Outposts, refer to the documentation and FAQ.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Fast and Cost-Effective Image Manipulation with Serverless Image Handler

2020-11-23 Ajay Swamy

Post Syndicated from Ajay Swamy original https://aws.amazon.com/blogs/architecture/fast-and-cost-effective-image-manipulation-with-serverless-image-handler/

As a modern company, you most likely have both a web-based and mobile app platform to provide content to customers who view it on a range of devices. This means you need to store multiple versions of images, depending on the device. The resulting image management can be a headache as it can be expensive and cumbersome to manage.

Serverless Image Handler (SIH) is an AWS Solution Implementation you use to store a single version of every image featured in your content, while dynamically delivering different versions at runtime based on your end user’s device. The solution simplifies code, saves on storage costs, and is ideal for use with web applications and mobile apps. SIH features include the ability to resize images, change background colors, apply formatting, and add watermarks.

Architecture overview

The SIH solution utilizes an AWS CloudFormation template to deploy the solution within minutes, and it’s for those of you who have multiple image assets needing an option to dynamically change or manipulate customer-facing images. SIH deploys best-in-class AWS services such as Amazon CloudFront, Amazon API Gateway, and AWS Lambda functions, and it connects to your Amazon Simple Storage Service (Amazon S3) bucket for storage.

Deploying this solution with the default parameters builds the following environment in AWS Cloud:

SIH: Emvironment in AWS Cloud-2

SIH uses the following AWS services:

Amazon CloudFront to quickly and securely deliver images to your end users at scale
AWS Lambda to run code for image manipulation without the need for provisioning or managing servers (thereby reducing costs and overhead)
Your Amazon S3 bucket for storage of your image assets
AWS Secrets Manager to support the signing of image URLs so that image access is protected

How does Serverless Image Handler work?

When an HTTP request is received from a customer device, it is passed from CloudFront to API Gateway, and then forwarded to the Lambda function for processing. If the image is cached by CloudFront because of an earlier request, CloudFront will return the cached image instead of forwarding the request to the API Gateway. This reduces latency and eliminates the cost of reprocessing the image.

Requests that are not cached are passed to the API Gateway, and the entire request is forwarded to the Lambda function. The Lambda function retrieves the original image from your Amazon S3 bucket and uses Sharp (the open source image processing software) to return a modified version of the image to the API Gateway. SIH also utilizes Thumbor to apply dynamic filters on the fly. Additionally, the solution generates a CloudFront domain name that supports caching in CloudFront. The newly manipulated image is now cached at CloudFront for easy access and retrieval. The end-to-end request and response can be secured by using the solution’s signed URL feature via AWS Secrets Manager, which allows you to prevent unauthorized use of your proprietary images.

Lastly, SIH uses Amazon Rekognition for face detection in images submitted for smart cropping, allowing for easy cropping for specific content and image needs.

Code example of image manipulation

Please refer to the SIH implementation guide to quickly set up and use SIH. Using Node.js, you can create an image request as illustrated below. The code block specifies the image location as myImageBucket and specifies edits of grayscale :true to change the image to grayscale.

const imageRequest = JSON.stringify({
    bucket: “myImageBucket”,
    key: “myImage.jpg”,
    edits: {
        grayscale: true
    }
});

const url = `${CloudFrontUrl}/${Buffer.from(imageRequest).toString(‘base64’)}`;

With the generated URL, SIH can serve the grayscale image.

Conclusion

If you’re looking for a fast and cost-effective solution for image management, Serverless Image Handler provides a great way to manipulate and serve images on the fly with speed and security. Learn more about SIH and watch the accompanying Solving with AWS Solutions video below.

Field Notes: Automating Migration Requests for Reserved Instances and Savings Plans in Closed Accounts

2020-11-19 Daniel Li

Post Syndicated from Daniel Li original https://aws.amazon.com/blogs/architecture/field-notes-automating-migration-requests-for-reserved-instances-and-savings-plans-in-closed-accounts/

Enterprise AWS customers are often managing many accounts under a payer account, and sometimes accounts are closed before Reserved Instances (RI) or Savings Plans (SP) are fully used. Manually tracking account closures and requesting RI and SP migration from the closed accounts can become complex and error prone.

This blog post describes a solution for automating the process of requesting migration of unused RI and SP from closed accounts that are linked to a payer account. The solution helps reduce manual work and loss of unused RI and SP due to human error.

The solution automatically detects newly closed accounts that are linked to a payer account. If the closed accounts have unused RI and SP attached, it creates support cases with the required information for RI and SP migration. It can optionally send email notifications with the support case IDs. After a support case is created, the AWS Support team starts a workflow for processing the request, and updates the support case when the request is completed.

Solution Overview

The following diagram shows the overview of the solution.

Walkthrough

An Amazon CloudWatch Events time-based rule will create an event with the configured interval, such as one hour.
The event triggers an AWS Lambda function (refresh.js), which will retrieve the current status of all the linked accounts for the payer account from AWS Organizations. It saves the account status as an item in an Amazon DynamoDB table.
The DynamoDB table has Streams enabled and it triggers another Lambda function (process.js) when there are updates in the table. The stream view type is set to NEW_AND_OLD_IMAGES.
The Lambda function performs the following steps:
- Compares the differences between the new and the old images in the stream record. If an account’s status is SUSPENDED in the new image and is ACTIVE in the old image, the account is newly closed.
- Retrieves the RI and SP utilization information from AWS Cost Explorer for the closed account.
- If there are unused RI and SP in the account, it creates support cases to request RI and SP migration and/or send email notifications through Amazon SNS.

Next, let’s take a deeper look at the Lambda functions, which were written in JavaScript. The code for the Lambda functions can be downloaded from this repository.

Lambda function: refresh.js

This Lambda function is triggered by a CloudWatch Events rule at the predefined interval, for example, one hour. It retrieves the account statuses of all the linked accounts under a payer account from AWS Organizations. Then it saves the account statuses to a DynamoDB table.

It calls the AWS Organizations ListAccounts API to get the current status of all the linked accounts for the given payer account. The following sample code is for the ListAccounts API call.

const AWS = require('aws-sdk')
// Environment variable
const organizationsEndpoint = process.env.ORGANIZATIONS_ENDPOINT; 
const orgEndpoint = new AWS.Endpoint(organizationsEndpoint);
// Get region from the endpoint hostname, e.g. organizations.us-east-1.amazonaws.com
const orgRegion = organizationsEndpoint.split(".")[1];

const organizations = new AWS.Organizations({
    endpoint: orgEndpoint,
    region: orgRegion 
});

// Call the ListAccounts API to retrieve the status for all the linked accounts
function refreshAccounts() {
    console.log("Enter refreshAccounts");

    var newData = {Accounts:[]};
    
    var handleError = function(response) {
        console.log(response.err, response.err.stack); // an error occurred
        throw response.err;
    };
    
    var params = {};
    organizations.listAccounts(params).on('success', function handlePage(response) {
        console.log("ListAccounts response: \n" + JSON.stringify(response.data));    
        var account;
            
        // extract each account's Id and Status 
        for (account of response.data.Accounts) {
            var accountStatus = {Id: account.Id, Status: account.Status};
            newData.Accounts.push(accountStatus);
        }
    
        // handle response pagination
        if (response.hasNextPage()) {
            response.nextPage().on('success', handlePage).on('error', handleError)
            .send();
        } else {
            console.log("Account Status: \n" + JSON.stringify(newData)); 
            // insert code here to save the data to DynamoDB
        }
    }).on('error', handleError).send();
}

The following sample output is from the ListAccounts API call.

{
  Accounts: [
    {
      Id: '123456789000',
      Arn: 'arn:aws:organizations::123456789000:account/o-x3ub1dr53f/123456789000',
      Email: '[email protected]',
      Name: 'Payer',
      Status: 'ACTIVE',
      JoinedMethod: 'INVITED',
      JoinedTimestamp: 2020-01-09T06:49:36.035Z
    },
    {
      Id: '987654321000',
      Arn: 'arn:aws:organizations::123456789000:account/o-x3ub1dr53f/987654321000',
      Email: '[email protected]',
      Name: 'Linked Account 1',
      Status: 'SUSPENDED',
      JoinedMethod: 'INVITED',
      JoinedTimestamp: 2020-08-11T23:46:18.560Z
    }
  ]
}

From the API output, the Lambda function creates a smaller JSON with only the account ID and status information.

{
    "Accounts": [
        {
            "Id": "123456789000",
            "Status": "ACTIVE"
        },
        {
            "Id": "987654321000",
            "Status": "SUSPENDED"
        }
    ]
}

Next, it calls the DynamoDB UpdateItem API to save the current account status data as an item in the DynamoDB table. The API is configured with conditional update.

If the DynamoDB table already has an item with the same key, the item is not updated unless there is a different value. When the table is updated, DynamoDB Streams triggers a call to the next Lambda function. Below is the sample code for the UpdateItem API call.

const tableName = process.env.TABLE_NAME; // environment variable
const dynamodb = new AWS.DynamoDB();

// Save the account status JSON to the DynamoDB table 
// if it is different than the existing data in the table
function saveDataToDynamoDB(data) {
    console.log("Enter saveDataToDynamoDB");

    var params = {
        ExpressionAttributeNames: {
            "#ACC": "Accounts"
        }, 
        ExpressionAttributeValues: {
            ":t": {
                S: data
            } 
        }, 
        Key: {
            "Organization": {
                S: "default"
            }
        }, 
        ConditionExpression: "#ACC <> :t",
        ReturnValues: "ALL_NEW", 
        TableName: tableName, 
        UpdateExpression: "SET #ACC = :t"
    };
    
    dynamodb.updateItem(params, function(err, data) {
        console.log("DynamoDB callback...");  
        if (err) {
            console.log(err, err.stack); // an error occurred
        
            if (err.code != "ConditionalCheckFailedException") {
                throw err;
            }
        } else {
            console.log(data);           // successful response
        }
    });
}

Lambda function: process.js

This Lambda function processes DynamoDB stream records and compares the old and new item data in the stream record. If any of the linked accounts changed statuses to SUSPENDED from ACTIVE, it calls the Cost Explorer APIs – GetReservationUtilization API and GetSavingsPlansUtilizationDetails API– to check if the newly closed accounts have unused RI and SP attached. The following sample code is for the GetReservationUtilization API call.

const costexplorerEndpoint = process.env.COST_EXPLORER_ENDPOINT; // environment variable
const ceEndpoint = new AWS.Endpoint(costexplorerEndpoint);
// Get region from the endpoint hostname, e.g. ce.us-east-1.amazonaws.com
const ceRegion = costexplorerEndpoint.split(".")[1];

const costexplorer = new AWS.CostExplorer({
    endpoint:ceEndpoint,
    region:ceRegion
});

// Check if the source account has unused RI. If yes, send email notification 
// or create support case based on ACTION_TYPE
// Params
//   startDate: String with 'YYYY-MM-DD' format
//   endDate: String with 'YYYY-MM-DD' format
//   sourceAccountId: the AWS account to migrate RI from
//   destinationAccountId: the AWS account to migrate RI to
function processReservationUtilization(startDate,endDate,
            sourceAccountId,destinationAccountId) {
  
  console.log("Enter processReservationUtilization");
  
  // array to hold all the unused RI lease IDs from the source account
  var riLeaseIds = []; 
  
  const now = new Date();

  var params = {
    TimePeriod: {
      End: endDate, 
      Start: startDate
    },
    Filter: { 
      "Dimensions": {
          "Key": "LINKED_ACCOUNT",
          "Values": [
              sourceAccountId
          ]
      }
    },
    GroupBy: [
      {
        Key: 'SUBSCRIPTION_ID',
        Type: "DIMENSION"
      }
    ]
  };
    
  var handleError = function(response) {
      console.log(response.err, response.err.stack); // an error occurred
      throw response.err;
  };

  costexplorer.getReservationUtilization(params)
    .on('success', function handlePage(response) {
    console.log("GetReservationUtilization response \n" + response.data); 
    
    var utilizationsByTime;
    var group;
    
    // Find unused RI in the results
    for (utilizationsByTime of response.data.UtilizationsByTime) {
      for (group of utilizationsByTime.Groups) {
        const endDateTime = new Date(group.Attributes.endDateTime);
        if (endDateTime.getTime() > now.getTime()) {
          riLeaseIds.push(group.Attributes.leaseId);
        }        
      }
    }
    
    // Handle response pagination
    if (response.hasNextPage()) {
        response.nextPage().on('success', handlePage)
          .on('error', handleError).send();
    } else if (riLeaseIds.length > 0) {
      var requestBody = riRequestBody + "\nSource Account: " + sourceAccountId 
        + "\nDestination Account: " + destinationAccountId + "\nSavings Plans Arns: " + JSON.stringify(riLeaseIds);
      // Insert code here to create support case and/or send message to SNS
    }      
  }).on('error', handleError).send();
}

The following sample code is for the GetSavingsPlansUtilizationDetails API call.

// Check if the source account has unused SP. If yes, send email notification 
// or create support case based on ACTION_TYPE
// Params
//   startDate: String with 'YYYY-MM-DD' format
//   endDate: String with 'YYYY-MM-DD' format
//   sourceAccountId: the AWS account to migrate SP from
//   destinationAccountId: the AWS account to migrate SP to
function processSavingsPlansUtilization(startDate,endDate,
        sourceAccountId,destinationAccountId) {
  
  console.log("Enter processSavingsPlansUtilization");
  
  //array to hold all the unused SP ARNs from the source account
  var savingsPlansArns = []; 
  
  const now = new Date();  
  
  var params = {
    TimePeriod: { /* required */
      End: endDate, 
      Start: startDate    
    },
    Filter: { 
      "Dimensions": {
          "Key": "LINKED_ACCOUNT",
          "Values": [
            sourceAccountId
          ]
      }
    }
  };

  var handleError = function(response) {
      console.log(response.err, response.err.stack); // an error occurred
      throw response.err;
  };

  costexplorer.getSavingsPlansUtilizationDetails(params).on('success', function handlePage(response) {
    console.log("getSavingsPlansUtilizationDetails response \n" + response.data);  

    var utilizationDetail;
    
    // Find unused SP in the results
    for (utilizationDetail of response.data.SavingsPlansUtilizationDetails) {
      const endDateTime = new Date(utilizationDetail.Attributes.EndDateTime);
      if (endDateTime.getTime() > now.getTime()) {
        savingsPlansArns.push(utilizationDetail.SavingsPlanArn);
      }
    }

    // handle response pagination
    if (response.hasNextPage()) {
        response.nextPage().on('success', handlePage).on('error', handleError).send();
    } else if (savingsPlansArns.length > 0) {
      var requestBody = spRequestBody + "\nSource Account: " + sourceAccountId + "\nDestination Account: " + destinationAccountId + "\nSavings Plans Arns: " + JSON.stringify(savingsPlansArns);
      // insert code here to create support case and/or send message to SNS
    }
  }).on('error', handleError).send();
}

For those accounts that require RI or SP migration, it calls the AWS Support CreateCase API to create support cases. It can either/or send email notifications based on the value of the environment variable ACTION_TYPE.

EMAIL_ONLY: Send the RI and SP details to an SNS topic for email notification.
SUPPORT_CASE_ONLY: Create support cases to request RI and SP migration.
SUPPORT_CASE_AND_EMAIL: Create support cases to request RI and SP migration and send the request details to an SNS topic for email notification. The support case IDs are included in the email subject.

The following sample code is for the CreateCase API call.

const supportEndpoint = process.env.SUPPORT_ENDPOINT; // environment variable
const supEndpoint = new AWS.Endpoint(supportEndpoint);
// get region from the endpoint hostname, e.g. support.us-east-1.amazonaws.com
const supRegion = supportEndpoint.split(".")[1];

const support = new AWS.Support({
    endpoint:supEndpoint,
    region:supRegion
});

// Create an AWS Support Case
// Params:
//   emailAddress: email address for the support case
//   caseSubject: the subject of the support case
//   caseBody: the body of the support case
function createSupportCase(emailAddresses, caseSubject, caseBody, notify) {
  
  console.log("Enter createSupportCase");
  
  var params = {
    communicationBody: caseBody, /* required */
    subject: caseSubject, /* required */
    categoryCode: 'reserved-instances',
    ccEmailAddresses: [
      emailAddresses
    ],
    issueType: 'customer-service',
    serviceCode: 'billing',
    severityCode: 'low'
  };
  
  support.createCase(params, function(err, data) {
    console.log("Support callback...");  
    
    if (err) {
      console.log(err, err.stack); // an error occurred
      throw err;
    } else {
      console.log(data);           // successful response
      if (notify) {
        // insert code here to publish the message to a SNS topic
      }
    }
  });
}

The following sample support case is content from the Lambda function:

Subject:

Savings Plans Migration

Body:

Please migrate the following Savings Plans from the source account to the destination account.

Source Account: 987654321000

Destination Account: 123456789000

Savings Plans Arns: [“arn:aws:savingsplans::987654321000:savingsplan/bb242919-11ba-429c-b1dc-101010101010”]

AWS Support case guidelines

The AWS RI Operations Team has provided the following guidelines for RI and SP migration support cases.

1) If a closed account is still linked to a payer account, create one support case from the payer account. Otherwise, create one support case from each account – source and destination.

2) When creating a support case, use a user ID or a role with RI and SP purchase IAM permissions.

3) In the support case body, provide RI lease IDs or SP ARNs, source account ID, destination account ID, and an explicit request to migrate the RI and SP. If there are many RI lease IDs or SP ARNs, provide the data in CSV format.

Note: Do not use PDF or images.

4) Set severity code to low, service code to billing, category code to reserved-instances, and issue type to customer-service.

Deploy the stack

Download the template for deploying the solution used in this post.

Use this template to launch the solution and all associated components. The default configuration deploys AWS Lambda functions, a DynamoDB table, an Amazon CloudWatch Events rule, an SNS topic, and AWS Identity and Access Management (IAM) roles necessary to set up the solution in your account, but you can customize the template to meet your specific needs.

Before you launch the solution, review the architecture, configuration, network security, and other considerations. Follow the step-by-step instructions in this section to configure and deploy the solution into your account. You need to have authority to launch resources in the payer account before launching the stack.

Note: You are responsible for the cost of the AWS services used while running this solution. For more details, refer to the pricing web page for each AWS service used in this solution.

Time to deploy: Approximately five minutes.

1. Sign in to the AWS Management Console and use the following button to launch the AWS CloudFormation stack. Optionally, you can download the template as a starting point for your own implementation. Note: the CloudFormation stack needs to be created in the payer account.

2. The template launches in the US East (N. Virginia) Region by default. You can choose to launch it in other regions.

3. On the Create stack page, verify that the correct template URL is in the Amazon S3 URL text box and choose Next.

4. On the Specify stack details page, assign a name to your solution stack.

5. Under Parameters, review the parameters for this solution template and modify them as necessary. This solution uses the following default values.

Recommended documentation for reviewing these parameters:

6. Choose Next.

7. On the Configure stack options page, choose Next.

8. On the Review page, review and confirm the settings. Check the box acknowledging that the template will create AWS Identity and Access Management (IAM) resources.

9. Choose Create stack to deploy the stack.

You can view the status of the stack in the AWS CloudFormation console in the Status column. You should receive a CREATE_COMPLETE status in approximately five minutes.

Clean Up

To clean up the stack, select the stack in the AWS CloudFormation console and click the Delete button. In the popup window, choose Delete stack.

Conclusion

This post described a solution for automating the process of detecting account closures and requesting RI and SP migration. The solution will help enterprise customers reduce manual work and avoid losing unused RI and SP due to human errors.

If you have any comments or questions, please don’t hesitate to leave them in the comments section.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Restricting Amazon WorkSpaces Users to Run Amazon Athena Queries

2020-11-18 Somdeb Bhattacharjee

Post Syndicated from Somdeb Bhattacharjee original https://aws.amazon.com/blogs/architecture/field-notes-restricting-amazon-workspaces-users-to-run-amazon-athena-queries/

One of the use cases we hear from customers is that they want to provide very limited access to Amazon Workspaces users (for example contractors, consultants) in an AWS account. At the same time they want to allow them to query Amazon Simple Storage Service (Amazon S3) data in another account using Amazon Athena over a JDBC connection.

For example, marketing companies might provide private access to the first party data to media agencies through this mechanism.

The restrictions they want to put in place are:

For security reasons these Amazon WorkSpaces should not have internet connectivity. So the access to Amazon Athena must be over AWS PrivateLink.
Public access to Athena is not allowed using the credentials used for the JDBC connection. This is to prevent the users from leveraging the credentials to query the data from anywhere else.

In this post, we show how to use Amazon Virtual Private Cloud (Amazon VPC) endpoints for Athena, along with AWS Identity and Access Management (AWS IAM) policies. This provides private access to query the Amazon S3 data while preventing users from querying the data from outside their Amazon WorkSpaces or using the Athena public endpoint.

Let’s review the steps to achieve this:

Initial setup of two AWS accounts (AccountA and AccountB)
Set up Amazon S3 bucket with sample data in AccountA
Set up an IAM user with Amazon S3 and Athena access in AccountA
Create an Amazon VPC endpoint for Athena in AccountA
Set up Amazon WorkSpaces for a user in AccountB
Install a SQL client tool (we will use DbVisualizer Free) and Athena JDBC driver in Amazon WorkSpaces in AccountB
Use DbVisualizer to the query the Amazon S3 data in AccountA using the Athena public endpoint
Update IAM policy for user in AccountA to restrict private only access

Prerequisites

To follow the steps in this post, you need two AWS Accounts. The Amazon VPC and subnet requirements are specified in the detailed steps.

Note: The AWS CloudFormation template used in this blog post is for US-EAST-1 (N. Virginia) Region so ensure the Region setting for both the accounts are set to US-EAST-1 (N. Virginia).

Walkthrough

The two AWS accounts are:

AccountA – Contains the Amazon S3 bucket where the data is stored. For AccountA you can create a new Amazon VPC or use the default Amazon VPC.

AccountB – Amazon WorkSpaces account. Use the following AWS CloudFormation template for AccountB:

The AWS CloudFormation template will create a new Amazon VPC in AccountB with CIDR 10.10.0.0/16 and set up one public subnet and two private subnets.
It will also create a NAT Gateway in the public subnet and create both public and private route tables.
Since we will be launching Amazon WorkSpaces in these private subnets and not all Availability Zones (AZ) are supported by Amazon WorkSpaces, it is important to choose the right AZ when creating them.

Review the documentation to learn which AWS Regions/AZ are supported.

We have provided two parameters in the AWS CloudFormation template:

AZName1
AZName2

Step 1

Before launching the CloudFormation stack:

Log in to AccountB
Search for AWS Resource Access Manager
On the right-hand side, you will notice the AZ ID to AZ Name mapping. Note down the AZ Name corresponding to AZ ID use1-az2 and use1-az4
Now launch the CloudFormation template and remember to choose the AZ names you noted down earlier
- https://athena-workspaces-blogpost.s3.amazonaws.com/vpc.yaml
Enter the CloudFormation Stack Name as – ‘AthenaWorkspaces’ and leave everything default.
Once the CloudFormation stack creation is complete, create a peering connection from AccountB to AccountA.
Update the associated route tables for the private subnets with the new peering connection.

For information on how to create a VPC peering connection, refer to AWS documentation on VPC Peering.

AccountB VPC Route Table:

AccountA VPC Route Table:

AccountA VPC Route Table

Step 2

Create a new Amazon S3 bucket in AccountA with a bucket name that starts with ‘athena-’.
Next, you can download a sample file and upload it to the Amazon S3 bucket you just created.
Use the following statements to create AWS Glue database. Use an external table for the data in the Amazon S3 bucket so that you can query it from Athena.
Go to Athena console and define a new database:

CREATE DATABASE IF NOT EXISTS sampledb

Once the database is created, create a new table in sampledb (by selecting sampledb from the “Database” drop down menu). Replace the <<your bucket name>> with the bucket you just created:

CREATE EXTERNAL TABLE IF NOT EXISTS sampledb.amazon_reviews_tsv(
  marketplace string, 
  customer_id string, 
  review_id string, 
  product_id string, 
  product_parent string, 
  product_title string,
  product_category string,
  star_rating int, 
  helpful_votes int, 
  total_votes int, 
  vine string, 
  verified_purchase string, 
  review_headline string, 
  review_body string, 
  review_date string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  ESCAPED BY '\\'
  LINES TERMINATED BY '\n'
LOCATION
  's3://<<your bucket name>>/'
TBLPROPERTIES ("skip.header.line.count"="1")

Step 3

In AccountA, create a new IAM user with programmatic access.
Save the access key and secret access key.
For the same user add an Inline Policy which allows the following actions:

IAM summary

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowAthenaReadActions",
            "Effect": "Allow",
            "Action": [
                "athena:ListWorkGroups",
                "athena:ListDataCatalogs",
                "athena:GetExecutionEngine",
                "athena:GetExecutionEngines",
                "athena:GetNamespace",
                "athena:GetCatalogs",
                "athena:GetNamespaces",
                "athena:GetTables",
                "athena:GetTable"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AllowAthenaWorkgroupActions",
            "Effect": "Allow",
            "Action": [
                "athena:StartQueryExecution",
                "athena:GetQueryResults",
                "athena:DeleteNamedQuery",
                "athena:GetNamedQuery",
                "athena:ListQueryExecutions",
                "athena:StopQueryExecution",
                "athena:GetQueryResultsStream",
                "athena:ListNamedQueries",
                "athena:CreateNamedQuery",
                "athena:GetQueryExecution",
                "athena:BatchGetNamedQuery",
                "athena:BatchGetQueryExecution",
                "athena:GetWorkGroup"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AllowGlueActionsViaVPCE",
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:CreateDatabase",
                "glue:GetTables",
                "glue:GetTable"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AllowGlueActionsViaAthena",
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:CreateDatabase",
                "glue:GetTables",
                "glue:GetTable"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AllowS3ActionsViaAthena",
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:ListMultipartUploadParts",
                "s3:AbortMultipartUpload",
                "s3:CreateBucket",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::athena-*"
            ]
        }
    ]
}

Step 4

In this step, we create an Interface VPC endpoint (AWS PrivateLink) for Athena in AccountA. When you use an interface VPC endpoint, communication between your Amazon VPC and Athena is conducted entirely within the AWS network.
Each VPC endpoint is represented by one or more Elastic Network Interfaces (ENIs) with private IP addresses in your VPC subnets.
To create an Interface VPC endpoint follow the instructions and select Athena in the AWS Services list. Do not select the checkbox for Enable Private DNS Name.
Ensure the security group that is attached to the Amazon VPC endpoint is open to inbound traffic on port 443 and 444 for source AccountB VPC CIDR 10.10.0.0/16. Port 444 is used by Athena to stream query results.
Once you create the VPC endpoint, you will get a DNS endpoint name which is in the following format. We are going to use this in JDBC connection from the SQL client.

VPC_Endpoint_ID.athena.Region.vpce.amazonaws.com

Step 5

In this step we set up Amazon WorkSpaces in AccountB.
Each Amazon WorkSpace is associated with the specific Amazon VPC and AWS Directory Service construct that you used to create it. All Directory Service constructs (Simple AD, AD Connector, and Microsoft AD) require two subnets to operate, each in different Availability Zones. This is why we created 2 private subnets at the beginning.
For this blog post I have used Simple AD as the directory service for the Amazon WorkSpaces.
By default, IAM users don’t have permissions for Amazon WorkSpaces resources and operations.
To allow IAM users to manage Amazon WorkSpaces resources, you must create an IAM policy that explicitly grants them permissions
Then attach the policy to the IAM users or groups that require those permissions.
To start, go to the Amazon WorkSpaces console and select Advanced Setup.
- Set up a new directory using the SimpleAD option.
- Use the “small” directory size and choose the Amazon VPC and private subnets you created in Step 1 for AccountB.
- Once you create the directory, register the directory with Amazon WorkSpaces by selecting “Register” from the “Action” menu.
- Select private subnets you created in Step 1 for AccountB.

Directory info

Next, launch Amazon WorkSpaces by following the Launch WorkSpaces button.
Select the directory you created and create a new user.
For the bundle, choose Standard with Windows 10 (PCoIP).
After the Amazon WorkSpaces is created, you can log in to the Amazon WorkSpaces using a client software. You can download it from https://clients.amazonworkspaces.com/
Login to your Amazon WorkSpace, install a SQL Client of your choice. At this point your Amazon WorkSpace still has Internet access via the NAT Gateway
I have used DbVisualizer (the free version) as the SQL client. Once you have that installed, install the JDBC driver for Athena following the instructions
Now you can set up the JDBC connections to Athena using the access key and secret key you set up for an IAM user in AccountA.

Step 6

To test out both the Athena public endpoint and the Athena VPC endpoint, create two connections using the same credentials.

For the Athena public endpoint, you need to use athena.us-east-1.amazonaws.com service endpoint. (jdbc:awsathena://athena.us-east-1.amazonaws.com:443;S3OutputLocation=s3://<athena-bucket-name>/)

Athena public

For the VPC Endpoint Connection, use the VPC Endpoint you created in Step 4 (jdbc:awsathena://vpce-<>.athena.us-east-1.vpce.amazonaws.com:443;S3OutputLocation=s3://<athena-bucket-name>/)

Database connection Athena

Now run a simple query to select records from the amazon_reviews_tsv table using both the connections.

SELECT * FROM sampledb.amazon_reviews_tsv limit 10

You should be able to see results using both the connections. Since the private subnets are still connected to the internet via the NAT Gateway, you can query using the Athena public endpoint.

Run the AWS Command Line Interface (AWS CLI) command using the credentials used for the JDBC connection from your workstation. You should be able to access the Amazon S3 bucket objects and the Athena query run list using the following commands.

aws s3 ls s3://athena-workspaces-blogpost

aws athena list-query-executions

Step 7

Now we lock down the access as described in the beginning of this blog post by taking the following actions:
Update the route table for the private subnets by removing the route for the internet so access to the Athena public endpoint is restricted from the Amazon WorkSpaces. The only access will be allowed through the Athena VPC Endpoint.
Add conditional checks to the IAM user access policy that will restrict access to the Amazon S3 buckets and Athena only if:
- The request came in through the VPC endpoint. For this we use the “aws:SourceVpce” check and provide the VPC Endpoint ID value.
- The request for Amazon S3 data is through Athena. For this we use the condition “aws:CalledVia” and provide a value of “athena.amazonaws.com”.
In the IAM access policy below replace <<your vpce id>> with your VPC endpoint id and update the previous inline policy which was added to the IAM user in Step 3.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowAthenaReadActions",
            "Effect": "Allow",
            "Action": [
                "athena:ListWorkGroups",
                "athena:ListDataCatalogs",
                "athena:GetExecutionEngine",
                "athena:GetExecutionEngines",
                "athena:GetNamespace",
                "athena:GetCatalogs",
                "athena:GetNamespaces",
                "athena:GetTables",
                "athena:GetTable"
            ],
            "Resource": "*",
            "Condition":{
               "StringEquals":{
                  "aws:SourceVpce":[
                     "<<your vpce id>>"
                  ]
               }
            }
        },
        {
            "Sid": "AllowAthenaWorkgroupActions",
            "Effect": "Allow",
            "Action": [
                "athena:StartQueryExecution",
                "athena:GetQueryResults",
                "athena:DeleteNamedQuery",
                "athena:GetNamedQuery",
                "athena:ListQueryExecutions",
                "athena:StopQueryExecution",
                "athena:GetQueryResultsStream",
                "athena:ListNamedQueries",
                "athena:CreateNamedQuery",
                "athena:GetQueryExecution",
                "athena:BatchGetNamedQuery",
                "athena:BatchGetQueryExecution",
                "athena:GetWorkGroup"
            ],
            "Resource": "*",
            "Condition":{
               "StringEquals":{
                  "aws:SourceVpce":[
                     "<<your vpce id>>"
                  ]
               }
            }
        },
        {
            "Sid": "AllowGlueActionsViaVPCE",
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:CreateDatabase",
                "glue:GetTables",
                "glue:GetTable"
            ],
            "Resource": "*",
            "Condition":{
               "StringEquals":{
                  "aws:SourceVpce":[
                     "<<your vpce id>>"
                  ]
               }
            }
        },
        {
            "Sid": "AllowGlueActionsViaAthena",
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:CreateDatabase",
                "glue:GetTables",
                "glue:GetTable"
            ],
            "Resource": "*",
            "Condition":{
               "ForAnyValue:StringEquals":{
                  "aws:CalledVia":[
                     "athena.amazonaws.com"
                  ]
               }
            }
        },
        {
            "Sid": "AllowS3ActionsViaAthena",
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:ListMultipartUploadParts",
                "s3:AbortMultipartUpload",
                "s3:CreateBucket",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::athena-*"
            ],
            "Condition":{
               "ForAnyValue:StringEquals":{
                  "aws:CalledVia":[
                     "athena.amazonaws.com"
                  ]
               }
            }
        }
    ]
}

Once you applied the changes, try to reconnect using both the Athena VPC endpoint as well Athena public endpoint connections. The Athena VPC endpoint connection should work but the public endpoint connection will time out. Also try the same Amazon S3 and Athena AWS CLI commands. You should get access denied for both the operations.

Clean Up

To avoid incurring costs, remember to delete the resources that you created.

For AWS AccountA:

Delete the S3 buckets
Delete the database you created in AWS Glue
Delete the Amazon VPC endpoint you created for Amazon Athena

For AccountB:

Delete the Amazon Workspace you created along with the Simple AD directory. You can review more information on how to delete your Workspaces.

Conclusion

In this blog post, I showed how to leverage Amazon VPC endpoints and IAM policies to privately connect to Amazon Athena from Amazon Workspaces that don’t have internet connectivity.

Give this solution a try and share your feedback in the comments!

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Architecture Monthly Magazine: Open Source

2020-11-17 Annik Stahl

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/aws-architecture-monthly-magazine-open-source/

According to the Open Source Initiative, the term “open source” was created at a strategy session held in 1998 in Palo Alto, California, shortly after the announcement of the release of the Netscape source code. Stakeholders at that session realized that this announcement created an opportunity to educate and advocate for the superiority of an open development process.

We’ve witnessed big changes in open source in the past 22 years and this year-end issue of Architecture Monthly looks at a few trends, including the shift from businesses only consuming open source to contributing to it. You’ll also going to learn how AWS open source projects are one of the ways we’re making technology less cost-prohibitive and more accessible to everyone.

In this month’s Open Source issue

Ask an Expert: Ricardo Sueiras, Principal Advocate for Open Source at AWS
Blog: How Amazon Retail Systems Run Machine Learning Predictions with Apache Spark using Deep Java Library
Case Study: Absa Transforms IT and Fosters Tech Talent Using AWS
Reference Architecture: Running WordPress on AWS
Blog: Simplifying Serverless Best Practices with Lambda Powertools
Whitepaper: Modernizing the Amazon Database Infrastructure: Migrating from Oracle to AWS
Quick Start: Magento on AWS
Related Videos: Wix, Viber, UC Santa Cruz, & Redfin

Download the magazine

How to access the magazine

View and download past issues as PDFs on the AWS Architecture Monthly webpage.
Readers in the US, UK, Germany, and France can subscribe to the Kindle version of the magazine at Kindle Newsstand.
Visit Flipboard, a personalized mobile magazine app that you can also read on your computer.

We hope you’re enjoying Architecture Monthly, and we’d like to hear from you—leave us star rating and comment on the Amazon Kindle Newsstand page or contact us anytime at [email protected].

Snowflake: Running Millions of Simulation Tests with Amazon EKS

2020-11-16 Keith Joelner

Post Syndicated from Keith Joelner original https://aws.amazon.com/blogs/architecture/snowflake-running-millions-of-simulation-tests-with-amazon-eks/

This post was co-written with Brian Nutt, Senior Software Engineer and Kao Makino, Principal Performance Engineer, both at Snowflake.

Transactional databases are a key component of any production system. Maintaining data integrity while rows are read and written at a massive scale is a major technical challenge for these types of databases. To ensure their stability, it’s necessary to test many different scenarios and configurations. Simulating as many of these as possible allows engineers to quickly catch defects and build resilience. But the Holy Grail is to accomplish this at scale and within a timeframe that allows your developers to iterate quickly.

Snowflake has been using and advancing FoundationDB (FDB), an open-source, ACID-compliant, distributed key-value store since 2014. FDB, running on Amazon Elastic Cloud Compute (EC2) and Amazon Elastic Block Storage (EBS), has proven to be extremely reliable and is a key part of Snowflake’s cloud services layer architecture. To support its development process of creating high quality and stable software, Snowflake developed Project Joshua, an internal system that leverages Amazon Elastic Kubernetes Service (EKS), Amazon Elastic Container Registry (ECR), Amazon EC2 Spot Instances, and AWS PrivateLink to run over one hundred thousand of validation and regression tests an hour.

About Snowflake

Snowflake is a single, integrated data platform delivered as a service. Built from the ground up for the cloud, Snowflake’s unique multi-cluster shared data architecture delivers the performance, scale, elasticity, and concurrency that today’s organizations require. It features storage, compute, and global services layers that are physically separated but logically integrated. Data workloads scale independently from one another, making it an ideal platform for data warehousing, data lakes, data engineering, data science, modern data sharing, and developing data applications.

Snowflake architecture

Developing a simulation-based testing and validation framework

Snowflake’s cloud services layer is composed of a collection of services that manage virtual warehouses, query optimization, and transactions. This layer relies on rich metadata stored in FDB.

Prior to the creation of the simulation framework, Project Joshua, FDB developers ran tests on their laptops and were limited by the number they could run. Additionally, there was a scheduled nightly job for running further tests.

Joshua at Snowflake

Amazon EKS as the foundation

Snowflake’s platform team decided to use Kubernetes to build Project Joshua. Their focus was on helping engineers run their workloads instead of spending cycles on the management of the control plane. They turned to Amazon EKS to achieve their scalability needs. This was a crucial success criterion for Project Joshua since at any point in time there could be hundreds of nodes running in the cluster. Snowflake utilizes the Kubernetes Cluster Autoscaler to dynamically scale worker nodes in minutes to support a tests-based queue of Joshua’s requests.

With the integration of Amazon EKS and Amazon Virtual Private Cloud (Amazon VPC), Snowflake is able to control access to the required resources. For example: the database that serves Joshua’s test queues is external to the EKS cluster. By using the Amazon VPC CNI plugin, each pod receives an IP address in the VPC and Snowflake can control access to the test queue via security groups.

To achieve its desired performance, Snowflake created its own custom pod scaler, which responds quicker to changes than using a custom metric for pod scheduling.

The agent scaler is responsible for monitoring a test queue in the coordination database (which, coincidentally, is also FDB) to schedule Joshua agents. The agent scaler communicates directly with Amazon EKS using the Kubernetes API to schedule tests in parallel.
Joshua agents (one agent per pod) are responsible for pulling tests from the test queue, executing, and reporting results. Tests are run one at a time within the EKS Cluster until the test queue is drained.

Achieving scale and cost savings with Amazon EC2 Spot

A Spot Fleet is a collection—or fleet—of Amazon EC2 Spot instances that Joshua uses to make the infrastructure more reliable and cost effective. Spot Fleet is used to reduce the cost of worker nodes by running a variety of instance types.

With Spot Fleet, Snowflake requests a combination of different instance types to help ensure that demand gets fulfilled. These options make Fleet more tolerant of surges in demand for instance types. If a surge occurs it will not significantly affect tasks since Joshua is agnostic to the type of instance and can fall back to a different instance type and still be available.

For reservations, Snowflake uses the capacity-optimized allocation strategy to automatically launch Spot Instances into the most available pools by looking at real-time capacity data and predicting which are the most available. This helps Snowflake quickly switch instances reserved to what is most available in the Spot market, instead of spending time contending for the cheapest instances, at the cost of a potentially higher price.

Overcoming hurdles

Snowflake’s usage of a public container registry posed a scalability challenge. When starting hundreds of worker nodes, each node needs to pull images from the public registry. This can lead to a potential rate limiting issue when all outbound traffic goes through a NAT gateway.

For example, consider 1,000 nodes pulling a 10 GB image. Each pull request requires each node to download the image across the public internet. Some issues that need to be addressed are latency, reliability, and increased costs due to the additional time to download an image for each test. Also, container registries can become unavailable or may rate-limit download requests. Lastly, images are pulled through public internet and other services in the cluster can experience pulling issues.

For anything more than a minimal workload, a local container registry is needed. If an image is first pulled from the public registry and then pushed to a local registry (cache), it only needs to pull once from the public registry, then all worker nodes benefit from a local pull. That’s why Snowflake decided to replicate images to ECR, a fully managed docker container registry, providing a reliable local registry to store images. Additional benefits for the local registry are that it’s not exclusive to Joshua; all platform components required for Snowflake clusters can be cached in the local ECR Registry. For additional security and performance Snowflake uses AWS PrivateLink to keep all network traffic from ECR to the workers nodes within the AWS network. It also resolved rate-limiting issues from pulling images from a public registry with unauthenticated requests, unblocking other cluster nodes from pulling critical images for operation.

Conclusion

Project Joshua allows Snowflake to enable developers to test more scenarios without having to worry about the management of the infrastructure. Snowflake’s engineers can schedule thousands of test simulations and configurations to catch bugs faster. FDB is a key component of the Snowflake stack and Project Joshua helps make FDB more stable and resilient. Additionally, Amazon EC2 Spot has provided non-trivial cost savings to Snowflake vs. running on-demand or buying reserved instances.

If you want to know more about how Snowflake built its high performance data warehouse as a Service on AWS, watch the This is My Architecture video below.

Field Notes: Optimize your Java application for Amazon ECS with Quarkus

2020-11-11 Sascha Moellering

Post Syndicated from Sascha Moellering original https://aws.amazon.com/blogs/architecture/field-notes-optimize-your-java-application-for-amazon-ecs-with-quarkus/

In this blog post, I show you an interesting approach to implement a Java-based application and compile it to a native image using Quarkus. This native image is the main application, which is containerized, and runs in an Amazon Elastic Container Service and Amazon Elastic Kubernetes Service cluster on AWS Fargate.

Amazon ECS is a fully managed container orchestration service, Amazon EKS is a fully managed Kubernetes service, both services support Fargate to provide serverless compute for containers. Fargate removes the need to provision and manage servers, lets you specify and pay for resources per application, and improves security through application isolation by design. AWS Lambda is a serverless compute service that runs your code in response to events and automatically manages the underlying compute resources for you.

Quarkus is a Supersonic Subatomic Java framework that uses OpenJDK HotSpot as well as GraalVM and over fifty different libraries like RESTEasy, Vertx, Hibernate, and Netty. In a previous blog post, I demonstrated how GraalVM can be used to optimize the size of Docker images. GraalVM is an open source, high-performance polyglot virtual machine from Oracle. I use it to compile native images ahead of time to improve startup performance, and reduce the memory consumption and file size of Java Virtual Machine (JVM)-based applications. The framework that allows ahead-of-time-compilation (AOT) is called Substrate.

Application Architecture

First, review the GitHub repository containing the demo application.

Our application is a simple REST-based Create Read Update Delete (CRUD) service that implements basic user management functionalities. All data is persisted in an Amazon DynamoDB table. Quarkus offers an extension for Amazon DynamoDB that is based on AWS SDK for Java V2. This Quarkus extension supports two different programming models: blocking access and asynchronous programming. For local development, DynamoDB Local is also supported. DynamoDB Local is the downloadable version of DynamoDB that lets you write and test applications without accessing the DynamoDB service. Instead, the database is self-contained on your computer. When you are ready to deploy your application in production, you can make a few minor changes to the code so that it uses the DynamoDB service.

The REST-functionality is located in the class UserResource which uses the JAX-RS implementation RESTEasy. This class invokes the UserService that implements the functionalities to access a DynamoDB table with the AWS SDK for Java. All user-related information is stored in a Plain Old Java Object (POJO) called User.

Building the application

To create a Docker container image that can be used in the task definition of my ECS cluster, follow these three steps: build the application, create the Docker Container Image, and push the created image to my Docker image registry.

To build the application, I used Maven with different profiles. The first profile (default profile) uses a standard build to create an uber JAR – a self-contained application with all dependencies. This is very useful if you want to run local tests with your application, because the build time is much shorter compared to the native-image build. When you run the package command, it also execute all tests, which means you need DynamoDB Local running on your workstation.

$ docker run -p 8000:8000 amazon/dynamodb-local -jar DynamoDBLocal.jar -inMemory -sharedDb

$ mvn package

The second profile uses GraalVM to compile the application into a native image. In this case, you use the native image as base for a Docker container. The Dockerfile can be found under src/main/docker/Dockerfile.native and uses a build-pattern called multi-stage build.

$ mvn package -Pnative -Dquarkus.native.container-build=true

An interesting aspect of multi-stage builds is that you can use multiple FROM statements in your Dockerfile. Each FROM instruction can use a different base image, and begins a new stage of the build. You can pick the necessary files and copy them from one stage to another, thereby limiting the number of files you have to copy. Use this feature to build your application in one stage and copy your compiled artifact and additional files to your target image. In this case, you use ubi-quarkus-native-image:20.1.0-java11 as base image and copy the necessary TLS-files (SunEC library and the certificates) and point your application to the necessary files with JVM properties.

FROM quay.io/quarkus/ubi-quarkus-native-image:20.1.0-java11 as nativebuilder
RUN mkdir -p /tmp/ssl-libs/lib \
  && cp /opt/graalvm/lib/security/cacerts /tmp/ssl-libs \
  && cp /opt/graalvm/lib/libsunec.so /tmp/ssl-libs/lib/

FROM registry.access.redhat.com/ubi8/ubi-minimal
WORKDIR /work/
COPY target/*-runner /work/application
COPY --from=nativebuilder /tmp/ssl-libs/ /work/
RUN chmod 775 /work
EXPOSE 8080
CMD ["./application", "-Dquarkus.http.host=0.0.0.0", "-Djava.library.path=/work/lib", "-Djavax.net.ssl.trustStore=/work/cacerts"]

In the second and third steps, I have to build and push the Docker image to a Docker registry of my choice which is straight forward:

$ docker build -f src/main/docker/Dockerfile.native -t

$ docker push <repo/image:tag>

Setting up the infrastructure

You’ve compiled the application to a native-image and have built a Docker image. Now, you set up the basic infrastructure consisting of an Amazon Virtual Private Cloud (VPC), an Amazon ECS or Amazon EKS cluster with on AWS Fargate launch type, an Amazon DynamoDB table, and an Application Load Balancer.

Figure 1: Architecture of the infrastructure (for Amazon ECS)

Codifying your infrastructure allows you to treat your infrastructure just as code. In this case, you use the AWS Cloud Development Kit (AWS CDK), an open source software development framework, to model and provision your cloud application resources using familiar programming languages. The code for the CDK application can be found in the demo application’s code repository under eks_cdk/lib/ecs_cdk-stack.ts or ecs_cdk/lib/ecs_cdk-stack.ts. Set up the infrastructure in the AWS Region us-east-1:

$ npm install -g aws-cdk // Install the CDK
$ cd ecs_cdk
$ npm install // retrieves dependencies for the CDK stack
$ npm run build // compiles the TypeScript files to JavaScript
$ cdk deploy  // Deploys the CloudFormation stack

The output of the AWS CloudFormation stack is the load balancer’s DNS record. The heart of our infrastructure is an Amazon ECS or Amazon EKS cluster with AWS Fargate launch type. The Amazon ECS cluster is set up as follows:

const cluster = new ecs.Cluster(this, "quarkus-demo-cluster", {
      vpc: vpc
    });
    
    const logging = new ecs.AwsLogDriver({
      streamPrefix: "quarkus-demo"
    })

    const taskRole = new iam.Role(this, 'quarkus-demo-taskRole', {
      roleName: 'quarkus-demo-taskRole',
      assumedBy: new iam.ServicePrincipal('ecs-tasks.amazonaws.com')
    });
    
    const taskDef = new ecs.FargateTaskDefinition(this, "quarkus-demo-taskdef", {
      taskRole: taskRole
    });
    
    const container = taskDef.addContainer('quarkus-demo-web', {
      image: ecs.ContainerImage.fromRegistry("<repo/image:tag>"),
      memoryLimitMiB: 256,
      cpu: 256,
      logging
    });
    
    container.addPortMappings({
      containerPort: 8080,
      hostPort: 8080,
      protocol: ecs.Protocol.TCP
    });

    const fargateService = new ecs_patterns.ApplicationLoadBalancedFargateService(this, "quarkus-demo-service", {
      cluster: cluster,
      taskDefinition: taskDef,
      publicLoadBalancer: true,
      desiredCount: 3,
      listenerPort: 8080
    });

Cleaning up

After you are finished, you can easily destroy all of these resources with a single command to save costs.

$ cdk destroy

Conclusion

In this post, I described how Java applications can be implemented using Quarkus, compiled to a native-image, and ran using Amazon ECS or Amazon EKS on AWS Fargate. I also showed how AWS CDK can be used to set up the basic infrastructure. I hope I’ve given you some ideas on how you can optimize your existing Java application to reduce startup time and memory consumption. Feel free to submit enhancements to the sample template in the source repository or provide feedback in the comments.

We also encourage you to explore how you to optimize your Java application for AWS Lambda with Quarkus.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

The Satellite Ear Tag that is Changing Cattle Management

2020-11-10 Karen Hildebrand

Post Syndicated from Karen Hildebrand original https://aws.amazon.com/blogs/architecture/the-satellite-ear-tag-that-is-changing-cattle-management/

Most cattle are not raised in cities—they live on cattle stations, large open plains, and tracts of land largely unpopulated by humans. It’s hard to keep connected with the herd. Cattle don’t often carry their own mobile phones, and they don’t pay a mobile phone bill. Naturally, the areas in which cattle live, often do not have cellular connectivity or reception. But they now have one way to stay connected: a world-first satellite ear tag.

Ceres Tag co-founders Melita Smith and David Smith recognized the problem given their own farming background. David explained that they needed to know simple things to begin with, such as:

Where are they?
How many are out there?
What are they doing?
What condition are they in?
Are they OK?

Later, the questions advanced to:

Which are the higher performing animals that I want to keep?
Where do I start when rounding them up?
As assets, can I get better financing and insurance if I can prove their location, existence, and condition?

To answer these questions, Ceres Tag first had to solve the biggest challenge, and it was not to get cattle to carry their mobile phones and pay mobile phone bills to generate the revenue needed to get greater coverage. David and Melita knew they needed help developing a new method of tracking, but in a way that aligned with current livestock practices. Their idea of a satellite connected ear tag came to life through close partnership and collaboration with CSIRO, Australia’s national science agency. They brought expertise to the problem, and rallied together teams of experts across public and private partnerships, never accepting “that’s not been done before” as a reason to curtail their innovation.

Figure 1: How Ceres Tag works in practice

Thinking Big: Ceres Tag Protocol

Melita and David constructed their idea and brought the physical hardware to reality. This meant finding strategic partners to build hardware, connectivity partners that provided global coverage at a cost that was tenable to cattle operators, integrations with existing herd management platforms and a global infrastructure backbone that allowed their solution to scale. They showed resilience, tenacity and persistence that are often traits attributed to startup founders and lifelong agricultural advocates. Explaining the purpose of the product often requires some unique approaches to defining the value proposition while fundamentally breaking down existing ways of thinking about things. As David explained, “We have an internal saying, ‘As per Ceres Tag protocol …..’ to help people to see the problem through a new lens.” This persistence led to the creation of an easy to use ear tagging applicator and a two-prong smart ear tag. The ear tag connects via satellite for data transmission, providing connectivity to more than 120 countries in the world and 80% of the earth’s surface.

Figure 2: The Ceres Tag applicator, smart tag, and global satellite connectivity

Unlocking the blocker: data-driven insights

With the hardware and connectivity challenges solved, Ceres Tag turned to how the data driven insights would be delivered. The company needed to select a technology partner that understood their global customer base, and what it means to deliver a low latency solution for web, mobile and API-driven solutions. David, once again knew the power in leveraging the team around him to find the best solution. The evaluation of cloud providers was led by Lewis Frost, COO, and Heidi Perrett, Data Platform Manager. Ceres Tag ultimately chose to partner with AWS and use the AWS Cloud as the backbone for the Ceres Tag Management System.

Figure 3: Ceres Tag conceptual diagram

The Ceres Tag Management System houses the data and metadata about each tag, enabling the traceability of that tag throughout each animal’s life cycle. This includes verification as to whom should have access to their health records and history. Based on the nature of the data being stored and transmitted, security of the application is critical. As a startup, it was important for Ceres Tag to keep costs low, but to also to be able to scale based on growth and usage as it expands globally.

Ceres Tag is able to quickly respond to customers regardless of geography, routing traffic to the appropriate end point. They accomplish this by leveraging Amazon CloudFront as the Content Delivery Network (CDN) for traffic distribution of front-end requests and Amazon Route 53 for DNS routing. A multi-Availability Zone deployment and AWS Application Load Balancer distribute incoming traffic across multiple targets, increasing the availability of your application.

Ceres Tag is using AWS Fargate to provide a serverless compute environment that matches the pay-as-you-go usage-based model. AWS also provides many advanced security features and architecture guidance that has helped to implement and evaluate best practice security posture across all of the environments. Authentication is handled by Amazon Cognito, which allows Ceres Tag to scale easily by supporting millions of users. It leverages easy-to-use features like sign-in with social identity providers, such as Facebook, Google, and Amazon, and enterprise identity providers via SAML 2.0.

The data captured from the ear tag on the cattle is will be ingested via AWS PrivateLink. By providing a private endpoint to access your services, AWS PrivateLink ensures your traffic is not exposed to the public internet. It also makes it easy to connect services across different accounts and VPCs to significantly simplify your network architecture. In leveraging a satellite connectivity provider running on AWS, Ceres Tag will benefit from the AWS Ground Station infrastructure leveraged by the provider in addition to the streaming IoT database.

How Netflix Scales its API with GraphQL Federation (Part 1)

2020-11-09 Netflix Technology Blog

Post Syndicated from Netflix Technology Blog original https://netflixtechblog.com/how-netflix-scales-its-api-with-graphql-federation-part-1-ae3557c187e2

Netflix is known for its loosely coupled and highly scalable microservice architecture. Independent services allow for evolving at different paces and scaling independently. Yet they add complexity for use cases that span multiple services. Rather than exposing 100s of microservices to UI developers, Netflix offers a unified API aggregation layer at the edge.

UI developers love the simplicity of working with one conceptual API for a large domain. Back-end developers love the decoupling and resilience offered by the API layer. But as our business has scaled, our ability to innovate rapidly has approached an invisible asymptote. As we’ve grown the number of developers and increased our domain complexity, developing the API aggregation layer has become increasingly harder.

In order to address this rising problem, we’ve developed a federated GraphQL platform to power the API layer. This solves many of the consistency and development velocity challenges with minimal tradeoffs on dimensions like scalability and operability. We’ve successfully deployed this approach for Netflix’s studio ecosystem and are exploring patterns and adaptations that could work in other domains. We’re sharing our story to inspire others and encourage conversations around applicability elsewhere.

Case Study: Studio Edge

Intro to Studio Ecosystem

Netflix is producing original content at an accelerated pace. From the time a TV show or a movie is pitched to when it’s available on Netflix, a lot happens behind the scenes. This includes but is not limited to talent scouting and casting, deal and contract negotiations, production and post-production, visual effects and animations, subtitling and dubbing, and much more. Studio Engineering is building hundreds of applications and tools that power these workflows.

Netflix Studio Content Lifecycle — Content Lifecycle

Studio API

Looking back to a few years ago, one of the pains in the studio space was the growing complexity of the data and its relationships. The workflows depicted above are inherently connected but the data and its relationships were disparate and existed in myriads of microservices. The product teams solved for this with two architectural patterns.

1) Single-use aggregation layers — Due to the loose coupling, we observed that many teams spent considerable effort building duplicative data-fetching code and aggregation layers to support their product needs. This was either done by UI teams via BFF (Backend For Frontend) or by a backend team in a mid-tier service.

2) Materialized views for data from other teams — some teams used a pattern of building a materialized view of another service’s data for their specific system needs. Materialized views had performance benefits, but data consistency lagged by varying degrees. This was not acceptable for the most important workflows in the Studio. Inconsistent data across different Studio applications was the top support issue in Studio Engineering in 2018.

Graph API: To better address the underlying needs, our team started building a curated graph API called “Studio API”. Its goal was to provide an unified abstraction on top of data and relationships. Studio API used GraphQL as its underlying API technology and created significant leverage for accessing core shared data. Consumers of Studio API were able to explore the graph and build new features more quickly. We also observed fewer instances of data inconsistency across different UI applications, as every field in GraphQL resolves to a single piece of data-fetching code.

Studio API Architecture Diagram — **Studio API Architecture**

Bottlenecks of Studio API

The One Graph exposed by Studio API was a runaway success; product teams loved the reusability and easy, consistent data access. But new bottlenecks emerged as the number of consumers and amount of data in the graph increased.

First, the Studio API team was disconnected from the domain expertise and the product needs, which negatively impacted the schema’s health. Second, connecting new elements from a back-end into the graph API was manual and ran counter to the rapid evolution promised by a microservice architecture. Finally, it was hard for one small team to handle the increasing operational and support burden for the expanding graph.

We knew that there had to be a better way — unified but decoupled, curated but fast moving.

Returning to Core Principles

To address these bottlenecks, we leaned into our rich history of microservices and breaking monoliths apart. We still wanted to keep the unified GraphQL schema of Studio API but decentralize the implementation of the resolvers to their respective domain teams.

As we were brainstorming the new architecture back in early 2019, Apollo released the GraphQL Federation Specification. This promised the benefits of a unified schema with distributed ownership and implementation. We ran a test implementation of the spec with promising results, and reached out to collaborate with Apollo on the future of GraphQL Federation. Our next generation architecture, “Studio Edge”, emerged with federation as a critical element.

GraphQL Federation Primer

The goal of GraphQL Federation is two-fold: provide a unified API for consumers while also giving backend developers flexibility and service isolation. To achieve this, schemas need to be created and annotated to indicate how ownership is distributed. Let’s look at an example with three core entities:

Movie: At Netflix, we make titles (shows, films, shorts etc.). For simplicity, let’s assume each title is a Movie object.
Production: Each Movie is associated with a Studio Production. A Production object tracks everything needed to make a Movie including shooting location, vendors, and more.
Talent: the people working on a Movie are the Talent, including actors, directors, and so on.

These three domains are owned by three separate engineering teams responsible for their own data sources, business logic, and corresponding microservices. In an unfederated implementation, we would have this simple Schema and Resolvers owned and implemented by the Studio API team. The GraphQL Framework would take in queries from clients and orchestrate the calls to the resolvers in a breadth-first traversal.

Schemas & Resolvers for Studio API — **Schema & Resolvers for Studio API**

To transition to a federated architecture, we need to transfer ownership of these resolvers to their respective domains without sacrificing the unified schema. To achieve this, we need to extend the Movie type across GraphQL service boundaries:

Federating the movie type — **Federating Movie**

This ability to extend a Movie type across GraphQL service boundaries makes Movie a Federated Type. Resolving a given field requires delegation by a gateway layer down to the owning domain services.

Studio Edge Architecture

Using the ability to federate a type, we envisioned the following architecture:

Key Architectural Components

Domain Graph Service (DGS) is a standalone spec-compliant GraphQL service. Developers define their own federated GraphQL schema in a DGS. A DGS is owned and operated by a domain team responsible for that subsection of the API. A DGS developer has the freedom to decide if they want to convert their existing microservice to a DGS or spin up a brand new service.

Schema Registry is a stateful component that stores all the schemas and schema changes for every DGS. It exposes CRUD APIs for schemas, which are used by developer tools and CI/CD pipelines. It is responsible for schema validation, both for the individual DGS schemas and for the combined schema. Last, the registry composes together the unified schema and provides it to the gateway.

GraphQL Gateway is primarily responsible for serving GraphQL queries to the consumers. It takes a query from a client, breaks it into smaller sub-queries (a query plan), and executes that plan by proxying calls to the appropriate downstream DGSs.

Implementation Details

There are 3 main business logic components that power GraphQL Federation.

Schema Composition

Composition is the phase that takes all of the federated DGS schemas and aggregates them into a single unified schema. This composed schema is exposed by the Gateway to the consumers of the graph.

Whenever a new schema is pushed by a DGS, the Schema Registry validates that:

New schema is a valid GraphQL schema
New schema composes seamlessly with the rest of the DGSs schemas to create a valid composed schema
New schema is backwards compatible

If all of the above conditions are met, then the schema is checked into the Schema Registry.

Query Planning and Execution

The federation config consists of all the individual DGS schemas and the composed schema. The Gateway uses the federation config and the client query to generate a query plan. The query plan breaks down the client query into smaller sub-queries that are then sent to the downstream DGSs for execution, along with an execution ordering that includes what needs to be done in sequence versus run in parallel.

Let’s build a simple query from the schema referenced above and see what the query plan might look like.

For this query, the gateway knows which fields are owned by which DGS based on the federation config. Using that information, it breaks the client query into three separate queries to three DGSs. The first query is sent to Movie DGS since the root field movies is owned by that DGS. This results in retrieving the movieId and title fields for the first 10 movies in the dataset. Then using the movieIds it got from the previous request, the gateway executes two parallel requests to Production DGS and Talent DGS to fetch the production and actors fields for those 10 movies. Upon completion, the sub-query responses are merged together and the combined data response is returned to the caller.

A note on performance: Query Planning and Execution adds a ~10ms overhead in the worst case. This includes the compute for building the query plan, as well as the deserialization of DGS responses and the serialization of merged gateway response.

Entity Resolver

Now you might be wondering, how do the parallel sub-queries to Production and Talent DGS actually work? That’s not something that the DGS supports. This is the final piece of the puzzle.

Let’s go back to our federated type Movie. In order for the gateway to join Movie seamlessly across DGSs, all the DGSs that define and extend the Movie need to agree on one or more fields that define the primary key (e.g. movieId). To make this work, Apollo introduced the @key directive in the Federation Spec. Second, DGSs have to implement a resolver for a generic Query field, _entities. The _entities query returns a union type of all the federated types in that DGS. The gateway uses the _entities query to look up Movie by movieId.

Let’s take a look at how the query plan actually looks like

Detailed federated query plan — Detailed Federated Query Plan

The representation object consists of the movieId and is generated from the response of the first request to Movie DGS. Since we requested for the first 10 movies, we would have 10 representation objects to send to Production and Talent DGS.

This is similar to Relay’s Object Identification with a few differences. _Entity is a union type, while Relay’s Node is an interface. Also, with @key, there is support for variable key names and types as well as composite keys while in Relay, the id is a single opaque ID field.

Combined together, these are the ingredients that power the core of a federated API architecture.

The Journey, Summarized

Our Studio Ecosystem architecture has evolved in distinct phases, all motivated by reducing the time between idea and implementation, improving the developer experience, and streamlining operations. The architectural phases look like:

Stay Tuned

Over the past year we’ve implemented the federated API architecture components in our Studio Edge. Getting here required rapid iteration, lots of cross-functional collaborations, a few pivots, and ongoing investment. We’re live with 70 DGSes and hundreds of developers contributing to and using the Studio Edge architecture. In our next Netflix Tech Blog post, we’ll share what we learned along the way, including the cross-cutting concerns necessary to build a holistic solution.

We want to thank the entire GraphQL open-source community for all the generous contributions and paving the path towards the promise of GraphQL. If you’d like to be a part of solving complex and interesting problems like this at Netflix scale, check out our jobs page or reach out to us directly.

By Tejas Shikhare

Additional Credits: Stephen Spalding, Jennifer Shin, Philip Fisher-Ogden, Robert Reta, Antoine Boyer, Bruce Wang, David Simmer

How Netflix Scales its API with GraphQL Federation (Part 1) was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Field Notes: How to Identify and Block Fake Crawler Bots Using AWS WAF

2020-11-04 Vijay Menon

Post Syndicated from Vijay Menon original https://aws.amazon.com/blogs/architecture/field-notes-how-to-identify-and-block-fake-crawler-bots-using-aws-waf/

In this blog post, we focus on how to identify fake bots using these AWS services: AWS WAF, Amazon Kinesis Data Firehose, Amazon S3 and AWS Lambda. We use fake Google/Bing bots to demonstrate, but the principles can be applied to other popular crawlers like Slurp Bot from Yahoo, DuckDuckBot from DuckDuckGo, Alexa crawler from Alexa internet ranking service.

For industries like media, online retailors, news or social websites, content is critical and often sets them apart from other competitors. These companies put in a significant amount of effort to make the content as visible and accessible as possible. To do that these companies rely on crawler bots, so that legitimate users searching for content can find the content easily. Crawler bots are useful for indexing the site pages and helping make the content more searchable and improve rankings.

However, this capability can be misused. So it is important to distinguish between genuine crawler bots and fake ones that are doing more than just indexing your site. It’s important to properly identify good and bad actors so that you can stop the bad ones without impacting the ability of good ones, and at scale. This helps in driving more traffic, visitors, and more revenue from your websites.

Identifying bots

There are two primary sources of information required to identify a fake bot:

HTTP Header User-Agent: Fake bots try to present themselves as real bots, for example as Google or Bing, by using the same user agent string used by Google or Bing.
IP Address: You can look at the source IP address of the incoming request and determine if it belongs to the search engine provider network like Google or Bing. You can do this by performing a forward and reverse look up and comparing the results. These methods are well documented by the search engine providers Google and Bing.

Solution Overview

The solution leverages the capabilities of AWS WAF. Our demonstration application is a static website hosted on Amazon S3 fronted by Amazon CloudFront. This means that we can provide permission of access to the S3 bucket only to CloudFront using origin access identity.

The logs are streamed in near real time using Amazon Kinesis Data Firehose, inspected using a Lambda function to help identify fake bots before storing the logs on Amazon S3. The Lambda function does two things:

Inspect the traffic using the rules to identify bad or fake bots. In this case it uses forward and reverse DNS lookup results of the Client IP address of packets with User-Agent string resembling GoogleBot or BingBot.
Once identified as a fake bot, the Lambda function updates AWS WAF IP-Set to permanently block the requests coming from IP addresses of fake bots.

Note: For the sake of this demonstration, we are using a static website hosted on Amazon S3 with CloudFront. This requires the AWS WAF and IP-Set used by AWS WAF to be of scope ‘CLOUDFRONT’. You can modify it to use scope ‘REGIONAL’ if you chose to protect your web properties behind Application Load Balancer or API Gateway.

Solution Architecture

Prerequisites

Readers of this blog post should be familiar with HTTP and the following AWS services:

For this walkthrough, you should have the following:

AWS Account
AWS Command Line Interface (AWS CLI): You need AWS CLI installed and configured on the workstation from where you are going to try the steps mentioned below.
Credentials configured in AWS CLI should have the required IAM permissions to spin up and modify the resources mentioned in this post.
Make sure that you deploy the solution to us-east-1 Region and your AWS CLI default Region is us-east-1. If us-east-1 is not the default Region, reference the Regions explicitly while executing AWS CLI commands using --region us-east-1 switch.
Amazon S3 bucket in us-east-1 region

Walkthrough

1. Create an Amazon S3 static Website and put it behind an Amazon CloudFront distribution. You can follow these steps. Note down the CloudFront distribution ID. We will use it in subsequent steps.

2. Download and unzip the file containing CloudFormation template and lambda function code from here to a folder in your local workstation. You will need to run all of the subsequent commands from this folder.

3. Zip the lambda function lambda_function.py and upload it to an Amazon S3 bucket of your choice in us-east-1. Note the bucket name as it will be used in the subsequent steps. The blog uses waf-logs-fake-bots-us-east-1 for reference as the S3 bucket name.

$ zip lambda_function.py lambda_function.py.zip $ aws s3 cp lambda_function.py.zip s3://waf-logs-fake-bots-us-east-1/

4. Create the resources required for this blog post by deploying the AWS CloudFormation template and running the below command:

aws cloudformation create-stack \
--stack-name FakeBotBlockBlog \
--template-body file://BotBlog.yml \
--parameters ParameterKey=KinesisBufferIntervalSeconds,ParameterValue=900 ParameterKey=KinesisBufferSizeMB,ParameterValue=3 ParameterKey=IPSetName,ParameterValue=BlockFakeBotIPSet ParameterKey=IPSetScope,ParameterValue=CLOUDFRONT ParameterKey=S3BucketWithDeploymentPackage,ParameterValue=waf-logs-fake-bots-us-east-1 ParameterKey=DeploymentPackageZippedFilename,ParameterValue=lambda_function.py.zip \
--capabilities CAPABILITY_IAM \
--region us-east-1

You need to provide the following information, and you can change the parameters based on your specific needs:

a. KinesisBufferIntervalSeconds and KinesisBufferSizeMB. These will define the interval at which Kinesis Firehose ships the logs to Amazon S3, the Default is 900 seconds and 3MB respectively, whichever is met first.

b. IPSetName. Name of the IP Set which will be used to record the client IP address of fake bots. Default value is BlockFakeBotIPSet.

c. IPSetScope. Scope of the IP Set. I am using CLOUDFRONT and associate it with the CloudFront distribution created in step 1. You can choose to make it REGIONAL in which case WebACLassociation will need to be with an ALB or an API Gateway.

d. S3BucketWithDeploymentPackage. Name of S3 bucket used in step 3. The blog assumes waf-logs-fake-bots-us-east-1.

e. DeploymentPackageZipedFilename. Lambda function filename without the file extension. For example, the blog assumes lambda_function.py.zip is available on the Amazon S3 bucket and uses this value for this parameter.

Some stack templates might include resources that can affect permissions in your AWS account, for example, by creating new AWS Identity and Access Management (IAM) role. For those stacks, you must explicitly acknowledge this by specifying CAPABILITY_IAM or CAPABILITY_NAMED_IAM value for the –capabilities parameter.

Stack creation will take you approximately 5-7 minutes. Check the status of the stack by executing the below command every few minutes. You should see StackStatus value as CREATE_COMPLETE.

Example:

aws cloudformation describe-stacks --stack-name FakeBotBlockBlog | grep StackStatus

The CloudFormation template will create the following resources:

IP Set for AWS WAF
WebACL with rules to block the client IP addresses of fake bots, and an AWS-managed common rule set.
Lambda function to help detect fake bots and modify the AWS WAF IP Set to block them
Kinesis Firehose delivery stream, which will use the above Lambda function for processing
IAM roles with required permissions for the Lambda function and Kinesis Firehose
S3 bucket for AWS WAF logs

5. Enable logging for the WebACL using AWS CLI. For this you need the ARN of the WebACL and Kinesis Firehose. You can find that information from the output of the CloudFormation stack created in step 4 using the below AWS CLI command

aws cloudformation describe-stacks --stack-name FakeBotBlockBlog

Please note the 2 ARNs and run the following commands by replacing (1) ResourceArn value with WebACL ARN and (2) LogDestinationConfigs value with Kinesis Firehose delivery stream ARN.

Example:

aws wafv2 put-logging-configuration –-logging-configuration ResourceArn=arn:aws:wafv2:us-east-1:123456789012:global/webacl/FakeBotWebACL/259ea98f-24ba-4acd-8803-3e7d02e8d482,LogDestinationConfigs=arn:aws:firehose:us-east-1:123456789012:deliverystream/aws-waf-logs-FakeBotBlockBlog --region us-east-1

6. Associate CloudFront distribution with this WebACL: Sign in to the AWS Management Console and open the AWS WAF and Shield console at https://console.aws.amazon.com/wafv2/homev2/web-acls?region=global .

Click on the WebACL created earlier in this procedure
Navigate to ‘Associated AWS resources’ tab and select Add AWS resources
In the subsequent screen, select the CloudFront distribution created in step 1
Select Add

Note: If you are using an ALB or API Gateway for your web property, then you need to use REGIONAL WebACL and IP Set. Review the procedure to associate an ALB or API Gateway to the WebACL.

You can monitor WebACL performance from the Overview section of WebACL from the AWS WAF and Shield console.

Testing

To test, you will need to generate some traffic which will trigger the lambda function to detect and block the fake bots created earlier in this blog. The web traffic can be generated from the local machine or from an EC2 instance with access to the internet using curl. Manually set the user agent to resemble Googlebot by running the following command from shell:

Replace http://www.awsdemodesign.com/ with the URL of your CloudFront distribution you created in step 1 of the walkthrough.

for i in {1..1000}; do curl -I -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" http://www.awsdemodesign.com/; done

Initially you will see HTTP1.1/ 200 OK response. This will trigger Lambda and modify the IP Set to include your public IP address to be blocked. You can verify that by inspecting the IP set from AWS WAF and Shield console.

Sign in to the AWS Management Console and open the AWS WAF and Shield console
Click on IP Set created earlier in this blog. In the subsequent screen you can see your public IP address in the list of IP addresses.

If you run the curl command again, you will see that the response now is HTTP/1.1 403 Forbidden.

Clean Up

Disassociate the CloudFront Distribution from WebACL
Delete the S3 bucket and CloudFront Distribution created in Step 1
Empty and delete the S3 bucket created by the CloudFormation stack for AWS WAF logs.
Delete CloudFormation stack created in Step 4.

Conclusion

In this blog post, we demonstrated how you can set up and inspect incoming web traffic using AWS Lambda, AWS WAF native logging capabilities, and Kinesis Firehose to help detect and block bad or fake bots at scale. Furthermore, the solution outlined in this post provides a framework which can be extended to identify similar unwanted traffic impersonating as other good bots.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Architecting for Reliable Scalability

2020-11-03 Marwan Al Shawi

Post Syndicated from Marwan Al Shawi original https://aws.amazon.com/blogs/architecture/architecting-for-reliable-scalability/

Cloud solutions architects should ideally “build today with tomorrow in mind,” meaning their solutions need to cater to current scale requirements as well as the anticipated growth of the solution. This growth can be either the organic growth of a solution or it could be related to a merger and acquisition type of scenario, where its size is increased dramatically within a short period of time.

Still, when a solution scales, many architects experience added complexity to the overall architecture in terms of its manageability, performance, security, etc. By architecting your solution or application to scale reliably, you can avoid the introduction of additional complexity, degraded performance, or reduced security as a result of scaling.

Generally, a solution or service’s reliability is influenced by its up time, performance, security, manageability, etc. In order to achieve reliability in the context of scale, take into consideration the following primary design principals.

Modularity

Modularity aims to break a complex component or solution into smaller parts that are less complicated and easier to scale, secure, and manage.

Figure 1: Monolithic architecture vs. modular architecture

Modular design is commonly used in modern application developments. where an application’s software is constructed of multiple and loosely coupled building blocks (functions). These functions collectively integrate through pre-defined common interfaces or APIs to form the desired application functionality (commonly referred to as microservices architecture).

Figure 2: Scalable modular applications

For more details about building highly scalable and reliable workloads using a microservices architecture, refer to Design Your Workload Service Architecture.

This design principle can also be applied to different components of the solution’s architecture. For example, when building a cloud solution on a single Amazon VPC, it may reach certain scaling limits and make it harder to introduce changes at scale due to the higher level of dependencies. This single complex VPC can be divided into multiple smaller and simpler VPCs. The architecture based on multiple VPCs can vary. For example, the VPCs can be divided based on a service or application building block, a specific function of the application, or on organizational functions like a VPC for various departments. This principle can also be leveraged at a regional level for very high scale global architectures. You can make the architecture modular at a global level by distributing the multiple VPCs across different AWS Regions to achieve global scale (facilitated by AWS Global Infrastructure).

In addition, modularity promotes separation of concerns by having well-defined boundaries among the different components of the architecture. As a result, each component can be managed, secured, and scaled independently. Also, it helps you avoid what is commonly known as “fate sharing,” where a vertically scaled server hosts a monolithic application, and any failure to this server will impact the entire application.

Horizontal scaling

Horizontal scaling, commonly referred to as scale-out, is the capability to automatically add systems/instances in a distributed manner in order to handle an increase in load. Examples of this increase in load could be the increase of number of sessions to a web application. With horizontal scaling, the load is distributed across multiple instances. By distributing these instances across Availability Zones, horizontal scaling not only increases performance, but also improves the overall reliability.

In order for the application to work seamlessly in a scale-out distributed manner, the application needs to be designed to support a stateless scaling model, where the application’s state information is stored and requested independently from the application’s instances. This makes the on-demand horizontal scaling easier to achieve and manage.

This principle can be complemented with a modularity design principle, in which the scaling model can be applied to certain component(s) or microservice(s) of the application stack. For example, only scale-out Amazon Elastic Cloud Compute (EC2) front-end web instances that reside behind an Elastic Load Balancing (ELB) layer with auto-scaling groups. In contrast, this elastic horizontal scalability might be very difficult to achieve for a monolithic type of application.

Leverage the content delivery network

Leveraging Amazon CloudFront and its edge locations as part of the solution architecture can enable your application or service to scale rapidly and reliably at a global level, without adding any complexity to the solution. The integration of a CDN can take different forms depending on the solution use case.

For example, CloudFront played an important role to enable the scale required throughout Amazon Prime Day 2020 by serving up web and streamed content to a worldwide audience, which handled over 280 million HTTP requests per minute.

Go serverless where possible

As discussed earlier in this post, modular architectures based on microservices reduce the complexity of the individual component or microservice. At scale it may introduce a different type of complexity related to the number of these independent components (microservices). This is where serverless services can help to reduce such complexity reliably and at scale. With this design model you no longer have to provision, manually scale, maintain servers, operating systems, or runtimes to run your applications.

For example, you may consider using a microservices architecture to modernize an application at the same time to simplify the architecture at scale using Amazon Elastic Kubernetes Service (EKS) with AWS Fargate.

Figure 3: Example of a serverless microservices architecture

In addition, an event-driven serverless capability like AWS Lambda is key in today’s modern scalable cloud solutions, as it handles running and scaling your code reliably and efficiently. See How to Design Your Serverless Apps for Massive Scale and 10 Things Serverless Architects Should Know for more information.

Secure by design

To avoid any major changes at a later stage to accommodate security requirements, it’s essential that security is taken into consideration as part of the initial solution design. For example, if the cloud project is new or small, and you don’t consider security properly at the initial stages, once the solution starts to scale, redesigning the entire cloud project from scratch to accommodate security best practices is usually not a simple option, which may lead to consider suboptimal security solutions that may impact the desired scale to be achieved. By leveraging CDN as part of the solution architecture (as discussed above), using Amazon CloudFront, you can minimize the impact of distributed denial of service (DDoS) attacks as well as perform application layer filtering at the edge. Also, when considering serverless services and the Shared Responsibility Model, from a security lens you can delegate a considerable part of the application stack to AWS so that you can focus on building applications. See The Shared Responsibility Model for AWS Lambda.

Design with security in mind by incorporating the necessary security services as part of the initial cloud solution. This will allow you to add more security capabilities and features as the solution grows, without the need to make major changes to the design.

Design for failure

The reliability of a service or solution in the cloud depends on multiple factors, the primary of which is resiliency. This design principle becomes even more critical at scale because the failure impact magnitude typically will be higher. Therefore, to achieve a reliable scalability, it is essential to design a resilient solution, capable of recovering from infrastructure or service disruptions. This principle involves designing the overall solution in such a way that even if one or more of its components fail, the solution is still be capable of providing an acceptable level of its expected function(s). See AWS Well-Architected Framework – Reliability Pillar for more information.

Conclusion

Designing for scale alone is not enough. Reliable scalability should be always the targeted architectural attribute. The design principles discussed in this blog act as the foundational pillars to support it, and ideally should be combined with adopting a DevOps model.

Field Notes: Integrating IoT and ITSM using AWS IoT Greengrass and AWS Secrets Manager – Part 2

2020-10-29 Gary Emmerton

Post Syndicated from Gary Emmerton original https://aws.amazon.com/blogs/architecture/field-notes-integrating-iot-and-itsm-using-aws-iot-greengrass-and-aws-secrets-manager-part-2/

In part 1 of this blog I introduced the need for organizations to securely connect thousands of IoT devices with many different systems in the hyperconnected world that exists today, and how that can be addressed using AWS IoT Greengrass and AWS Secrets Manager. We walked through the creation of ServiceNow credentials in AWS Secrets Manager, the creation of IAM roles and the Lambda functions that will run on our edge device (a Raspberry Pi).

In this second part of the blog, we will setup AWS IoT Greengrass, on our Raspberry Pi, and AWS IoT Core so that we can run the AWS Lambda functions and access our ServiceNow credentials, retrieved securely from AWS Secrets Manager.

Setting up AWS IoT Core and AWS IoT Greengrass

The overall sequence for configuring AWS IoT Core and AWS IoT Greengrass is:

Create a certificate, and IoT Thing and link them
Create AWS IoT Greengrass group
Associate IAM role to the AWS IoT Greengrass group
Create and attach a policy to the certificate
Create an AWS IoT Greengrass Resource Definition for our ‘Secret’
Create an AWS IoT Greengrass Function Definition for our Lambda functions
Create an AWS IoT Greengrass Subscription Definition for IoT Topics to be used
Finally associate our Resource, Function and Subscription Definitions with our AWS IoT Greengrass Core

Steps

For this walkthrough, I have selected the AWS region “eu-west-1”, however, feel free to use other Regions where AWS IoT Core and AWS IoT Greengrass are available.

First, let’s install Greengrass on the Raspberry Pi:

Follow the instructions to configure the pre-requisites on the Raspberry Pi
Then we download the AWS IoT Greengrass software
And then we unzip the AWS IoT Greengrass software using the following command (note, this command is for version 1.10.0 of Greengrass and will change as later versions are released):

sudo tar -xzvf greengrass-linux-armv6l-1.10.0.tar.gz -C /

Note that AWS IoT Greengrass must be compatible with the version of the AWS Greengrass SDK installed to identify what versions are compatible and use sudo pip3 install greengrasssdk==<version_number> to install the SDK compatible with the version of AWS IoT Greengrass that we installed.

Our AWS IoT Greengrass core will authenticate with AWS IoT Core in AWS using certificates, so we need to generate these first using the following command:

aws iot create-keys-and-certificate --set-as-active --certificate-pem-outfile "iot-ge.cert.pem" --public-key-outfile "iot-ge.public.key" --private-key-outfile "iot-ge.private.key"

This command will generate three files containing the private key, public key and certificate. All of these files need to be copied to the /greengrass/certs folder on the Raspberry Pi. Also, the output of the preceding command will give the ARN of the certificate – we need to make a note of this ARN as we will use it in the next steps.

We also need to download a copy of the Amazon Root CA into the /greegrass/certs folder using the command below:

sudo wget -O root.ca.pem https://www.amazontrust.com/repository/AmazonRootCA1.pem

For the next step we need our AWS account number and IoT Host address unique to our account – we get the IoT Host address using the command:

aws iot describe-endpoint --endpoint-type iot:Data-ATS

Now we need to create a config.json file on the Raspberry Pi in the /greengrass/config folder, with the account number and IoT Host address obtained in the previous step;

{
  "coreThing" : {
    "caPath" : "root.ca.pem",
    "certPath" : "iot-ge.cert.pem",
    "keyPath" : "iot-ge.private.key",
    "thingArn" : "arn:aws:iot:eu-west-1:<aws_account_number>:thing/IoT-blog_Core",
    "iotHost" : "<endpoint_address>",
    "ggHost" : "greengrass-ats.iot.eu-west-1.amazonaws.com",
    "keepAlive" : 600
  },
  "runtime" : {
    "cgroup" : {
      "useSystemd" : "yes"
    },
    "allowFunctionsToRunAsRoot" : "yes"
  },
  "managedRespawn" : false,
  "crypto" : {
    "principals" : {
      "SecretsManager" : {
        "privateKeyPath" : "file:///greengrass/certs/iot-ge.private.key"
      },
      "IoTCertificate" : {
        "privateKeyPath" : "file:///greengrass/certs/iot-ge.private.key",
        "certificatePath" : "file:///greengrass/certs/iot-ge.cert.pem"
      }
    },
    "caPath" : "file:///greengrass/certs/root.ca.pem"
  }
}

Note that the line "allowFunctionsToRunAsRoot" : "yes" allows the Lambda functions to easily access the SenseHat on the Raspberry Pi. This configuration should normally be avoided in Production environments for security reasons but has been used here for simplicity.

Next we create the IoT Thing to represent our Raspberry Pi to match the entry we added into the config.json file previously:

aws iot create-thing --thing-name IoT-blog_Core

Now that our config.json file is in place and our IoT ‘thing’ created we can start the AWS IoT Greengrass software using the following commands:

cd /greengrass/ggc/core/
sudo ./greengrassd start

Then we attach the certificate to our new Thing – we need the ARN of the certificate that was noted in the earlier steps when we created the certificates:

aws iot attach-thing-principal --thing-name "IoT-blog_Core" --principal "<certificate_arn>"

Now we create the AWS IoT Greengrass group – make a note of the Group ID in the output of this command as we use it later:

aws greengrass create-group --name IoT-blog-group

Next we create the AWS IoT Greengrass Core definition file – create this using a text editor and save as core-def.json

{
  "Cores": [
    {
      "CertificateArn": "<certificate_arn>",
      "Id": "<IoT Thing Name>",
      "SyncShadow": true,
      "ThingArn": "<thing_arn>"
    }
  ]
}

Then, using the preceding file we just created, we create the core definition using the following command:

aws greengrass create-core-definition --name "IoT-blog_Core" --initial-version file://core-def.json

Now we associate the AWS IoT Greengrass core with the AWS IoT Greengrass group – we need the LatestVersionARN from the output of the command above and the group ID of your existing AWS IoT Greengrass group (in the output from the command for creation of the group in previous steps):

aws greengrass create-group-version --group-id "<greengrass_group_id>" --core-definition-version-arn "<core_definition_version_arn>"

Then we associate the IAM role (created earlier) to the AWS IoT Greengrass group;

aws greengrass associate-role-to-group --group-id "<greengrass_group_id>" --role-arn "arn:aws:iam::<aws_account_number>:role/IoTGGRole"

We need to create a policy to associate with the certificate so that our AWS IoT Greengrass Core (authenticated/authorized by our certificates) has rights to interact with AWS IoT Core. To do this we create the policy.json file:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "iot:Publish",
        "iot:Subscribe",
        "iot:Connect",
        "iot:Receive"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "iot:GetThingShadow",
        "iot:UpdateThingShadow",
        "iot:DeleteThingShadow"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "greengrass:*"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}

Then create the policy using the policy file using the command below:

aws iot create-policy --policy-name myGGPolicy --policy-document file://policy.json

And finally attach our new policy to the certificate – as the certificate is attached to our AWS IoT Greengrass Core, this gives the rights defined in the policy to our AWS IoT Greengrass Core;

aws iot attach-policy --target "<certificate_arn>" --policy-name "myGGPolicy"

Now we have the AWS IoT Greengrass Core and permissions in place, it’s time to add our Secret as a resource for AWS IoT Greengrass.

First, we need to create a resource definition that refers to the ARN of the secret we created earlier. Get the ARN of the secret using the following command:

aws secretsmanager describe-secret --secret-id "greengrass-snow-creds"

And then we create a text file containing the following and save it as resource.json:

{
"Resources": [
    {
      "Id": "SNOW-Credentials",
      "Name": "SNOW-Credentials",
      "ResourceDataContainer": {
        "SecretsManagerSecretResourceData": {
          "ARN": "<secret_arn>"
        }
      }
    }
  ]
}

Now we command to create the resource reference in IoT to the Secret:

aws greengrass create-resource-definition --name "MySNOWSecret" --initial-version file://resource.json

Note the Resource ID from the output as it is needed as it has to be added to the Lambda definition json file in the next steps. The function definition file contains the details of the Lambda function(s) that we will attach to our AWS IoT Greengrass group. We create a text file with the content below and save as lambda-def.json.

We also specify a couple of variables in the definition file; these are the same as the environment variables that can be specified for Lambda, but they make the variables available in AWS IoT Greengrass.

Note, if we specify environment variables for the functions in the Lambda console then these will NOT be available when the function is running under AWS IoT Greengrass. We will need our ServiceNow API URL to add to the configuration below, and this will be in the form of https://devXXXXX.service-now.com/api/now/table/incident, where XXXXX is the developer instance number assigned by ServiceNow when our instance is created.

We need the ARNs of the Lambda functions that we created in part 1 of the blog – these appear in the output after successfully creating the functions from the command line, or can be obtained using the aws lambda list-functions command – we need to have the ‘:1’ at the end of the ARN as AWS IoT Greengrass needs to reference published function versions.

{
  "DefaultConfig": {
    "Execution": {
      "IsolationMode": "NoContainer",
      "RunAs": {
        "Gid": 0,
        "Uid": 0
      }
    }
  },
  "Functions": [
    {
      "FunctionArn": "<lambda_function1_arn>:1",
      "FunctionConfiguration": {
        "EncodingType": "json",
        "Environment": {
          "Execution": {
            "IsolationMode": "NoContainer"
          },
          "Variables": { 
            "tempLimit": "30",
            "humidLimit": "50"
          }
        },
        "ExecArgs": "string",
        "Executable": "lambda_function.lambda_handler",
        "Pinned": true,
        "Timeout": 10
      },
    "Id": "sensorLambda"
    },
    {
      "FunctionArn": "<lambda_function2_arn>:1",
      "FunctionConfiguration": {
        "EncodingType": "json",
        "Environment": {
          "Execution": {
            "IsolationMode": "NoContainer"
          },
          "ResourceAccessPolicies": [
            {
              "Permission": "ro",
              "ResourceId": "SNOW-Credentials"
            }
          ],
          "Variables": { 
            "snowUrl": "<service_now_api_url>"
          }
        },
        "ExecArgs": "string",
        "Executable": "lambda_function.lambda_handler",
        "Pinned": false,
        "Timeout": 10
      },
    "Id": "anomalyLambda"
    }
  ]
}

The Lambda function now needs to be registered within our AWS IoT Greengrass core using the definition file just created, using the following command:

aws greengrass create-function-definition --name "IoT-blog-lambda" --initial-version file://lambda-def.json

Create Subscriptions

We now need to create some IoT Topics to pass data between the two Lambda functions and also to submit all sensor data to AWS IoT Core, which gives us visibility of the successful collection of sensor data.cd.

First, let’s create a subscription configuration file (subscriptions.json) for sensor data and anomaly data:

{
  "Subscriptions": [
    {
      "Id": "SensorData",
      "Source": "<lambda_function1_arn>:1",
      "Subject": "IoTBlog/sensorData",
      "Target": "cloud"
    },
    {
      "Id": "AnomalyData",
      "Source": "<lambda_function1_arn>:1",
      "Subject": "IoTBlog/anomaly",
      "Target": "<lambda_function2_arn>:1"
    },
    {
      "Id": "AnomalyDataB",
      "Source": "<lambda_function1_arn>:1",
      "Subject": "IoTBlog/anomaly",
      "Target": "cloud"
    }
  ]
}

And next, we run the command to create the subscription from this configuration:

aws greengrass create-subscription-definition --name "IoT-sensor-subs" --initial-version file://subscriptions.json

Update AWS IoT Greengrass Group Associations and Deploy

Now that the functions, subscriptions and resources have been defined, we run the following command to update our AWS IoT Greengrass group to the new version with those components included:

aws greengrass create-group-version --group-id <gg_group_id> --core-definition-version-arn "<core_def_version_arn>" --function-definition-version-arn "<function_def_version_arn>" --resource-definition-version-arn "<resource_def_version_arn>" --subscription-definition-version-arn "<subscription_def_version_arn>"

And finally, we can deploy our configuration. Use the following command to deploy the Greengrass group to our device, using the group-version-id from the output of the previous command and also the group-id:

aws greengrass create-deployment --deployment-type NewDeployment --group-id <gg_group_id> --group-version-id <gg_group_version_id>

Summarized below is the integration between the different functions and components that we have now deployed to get from our sensor data through to an incident being raised in ServiceNow:

Raspberry PI

Create an Incident

Everything is setup now from an IoT perspective, so we can attempt to trigger a threshold breach on the sensors to trigger the creation of an incident in ServiceNow. In order to trigger the incident creation, let’s raise the humidity around the sensor so that it breaches the threshold defined in the environment variables of the Lambda function.

Under normal conditions we will just see the data published by the first Lambda function in the IoTBlog/sensorData topic:

IoTblog sensordata

However, when a threshold is breached (in our example, humidity above 50%), the data is published to the IoTBlog/anomaly topic as shown below:

ioTblog Anomaly

Via the AWS IoT Greengrass subscriptions created earlier, this message arriving in the anomaly topic also triggers the second Lambda function to create the ticket in ServiceNow.

The log for the second Lambda function on AWS IoT Greengrass (stored in /greengrass/ggc/var/log/user/<region>/<aws_account_number>/ on the Raspberry Pi) will show a ‘201’ return code if the incident is successfully created in ServiceNow.

Now let’s log on to ServiceNow and check out our new incident. Good news, our new incident appears correctly:

And when we click on our incident we can see the detail, including the full data from the IoT topic in the Activities section;

This is only a basic use of the ServiceNow API and there are many other parameters that you can use to increase the richness of the incident, refer to the ServiceNow API documentation for more details.

Cleaning up

To avoid incurring future charges, delete the resources that you created in the walkthrough.

Conclusion

We have built an IoT device (Raspberry Pi), running AWS IoT Greengrass, AWS Lambda, and using ServiceNow credentials managed in AWS Secrets Manager. Using this we have triggered an anomaly event that has created an incident automatically in ServiceNow, directly from the Lambda function running on our Pi. You can use this architecture as the foundation to integrate your edge devices and ITSM solution to automate ticket generation in your organization.

Look out for follow-up blogs that will extend this solution to provide a real-time dashboard for the sensor data and store the sensor data in a Data Lake for historical visualization.

Find out more about deploying Secrets to AWS IoT Greengrass Core.

Check out the AWS IoT Blog for more examples of how to use AWS to integrate your edge devices with the AWS Cloud.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Integrating IoT and ITSM using AWS IoT Greengrass and AWS Secrets Manager – Part 1

2020-10-28 Gary Emmerton

Post Syndicated from Gary Emmerton original https://aws.amazon.com/blogs/architecture/field-notes-integrating-iot-and-itsm-using-aws-iot-greengrass-and-aws-secrets-manager-part-1/

IT Security is a hot topic in every organization, and in a hyper connected world the need to integrate thousands of IoT devices securely with many different systems at scale is critical.

AWS Secrets Manager helps customers manage their system credentials securely in the AWS Cloud, and with its integration with AWS IoT Greengrass, that capability now extends out to your edge-connected IoT devices.

In this two part blog post, I will walk through the steps to use this integration to give edge devices the capability to connect to and log incidents directly into ServiceNow. The credentials for connecting to ServiceNow are created in AWS Secrets Manager, and deployed locally (encrypted) to the edge device via AWS IoT Greengrass.

Part 1 (this post) gives an overview of the whole solution and the steps for setting up AWS Secrets Manager, creating the required IAM roles and AWS Lambda functions. In part 2 of the blog we will then set up AWS IoT Greengrass and AWS IoT Core so that we can run the functions and access the secret on our edge device (Raspberry Pi).

Enabling edge devices to automatically raise incidents in your organization’s ITSM toolset ensures that you can use existing workflows and incident escalation paths for your edge devices. Previously this would have been challenging to integrate. Additionally, by running this capability at the edge, it enables quicker responses and reduces the need to make calls back to the AWS Cloud.

Overview of solution

The solution makes use of a Raspberry Pi, running AWS IoT Greengrass. AWS Lambda functions on AWS IoT Greengrass capture temperature and humidity sensor data and make calls directly to ServiceNow when thresholds are breached. The integration of AWS Secrets Manager with AWS IoT Core enables the credentials required for the ServiceNow API calls to be available locally. These calls are encrypted and available on the Pi for the Lambda function to use.

The sensors for temperature and humidity in this example are on a Raspberry Pi Sense Hat, which illustrates sensors that could be used in an industrial or manufacturing use case. You can use any type of sensor such as vibration, strain gauges, or other electro-mechanical sensors.

One of the Lambda functions running on AWS IoT Greengrass on the RPi captures the sensor readings, and should a threshold on either be exceeded then it triggers a second Lambda function (again running on AWS IoT Greengrass). This then makes a ‘create incident’ API call to ServiceNow, using the credentials stored in AWS Secrets Manager.

In order to have visibility of the sensor data, and to manage communications between the first and second Lambda functions, all data from the sensors is published to one IoT Topic Data related to any threshold breaches is published to another IoT Topic.

Following is a high-level diagram for the architecture used in this blog.

ServiceNow RA

Prerequisites

To complete the steps in this blog, you need:

An AWS Account
A ServiceNow developer instance or other test ServiceNow instance that you can access – You can sign-up for a free ServiceNow developer account
A Raspberry Pi (I used a Pi 3B with Raspbian Buster)
A Raspberry Pi SenseHat
A workstation with the latest AWS Command Line Interface (CLI) installed

Additionally, ensure that you have Python 3.7.x installed with the following Python modules on the Raspberry Pi (Raspian Buster includes Python 3.7 by default). Install the following packages as the root user (sudo pip):

greengrasssdk
boto3
requests
sense_hat
datetime

I have taken the approach of having these modules installed on the Pi in order to simplify the creation of the Lambda function. This ensures the function will run locally on the Pi rather than having to build all of the Python modules on the Pi and then zip them to run the Lambda in an AWS IoT Greengrass container.

Walkthrough

The steps in the walkthrough can be achieved from the AWS console. I have focused on the command line approach, using the AWS CLI, as this will give a more detailed view of what is happening and the dependencies between the different components. The overall sequence for the steps is:

Create secret in AWS Secrets Manager – this will contain the credentials required to access ServiceNow for Lambda running on AWS IoT Greengrass
Create IAM role – provides permissions for AWS IoT Greengrass to other AWS services, including AWS Secrets Manager
Create Lambda functions – the functions that will capture sensor data and create the Service Now ticket
Configure and Deploy IoT Core – deploy our configuration to the Raspberry Pi, covered in Part 2 of this blog

I’ve structured the order of the steps in a logical sequence so that any dependencies of later steps are created first. There are a number of places where a value in the output of one command (such as an ARN) needs to be noted as required for subsequent commands. At times the steps may seem counter-intuitive, but whilst developing this blog, I found this sequence has proven to be the most effective.

Create Secret

First, we create our ‘Secret’ in AWS Secrets Manager. This consists of a secret string containing a JSON object for the username and password required for authentication to the API of my ServiceNow developer instance. For the purposes of this example, we will use the default encryption key for the AWS Secrets Manager service.

The following command, from the AWS CLI, is used to create the new secret, entering the relevant username and password for the ServiceNow instance.

IMPORTANT: the name of the secret must start with “greengrass-“(specified in the command after the “–name” parameter) because the IAM Greengrass managed service role (which we will use later) has permission by default to access secrets that start with this text.

aws secretsmanager create-secret --name "greengrass-snow-creds" --description "Credentials for ServiceNow API access" --secret-string '{"username":"&lt;username&gt;","password":"&lt;password&gt;"}'

The successful completion of the *preceding command results in an output on your terminal screen, containing the ARN (Amazon Resource Name) for the new secret.

Create IAM roles

In order for AWS IoT Greengrass to access our new secret and download it securely to AWS IoT Greengrass running on the Pi, we need to give it an IAM role. If we do not set this up correctly then there will be an error when trying to deploy the AWS IoT Greengrass group later in the walkthrough.

Our new IAM role needs a policy document that describes the permissions that we will give the role. The first step is to create the IAM policy document, containing only the permissions for the Greengrass service this new role – to do this we create a text file called assume-role.json containing the following:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "greengrass.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Then we run the following command, referencing the file just created:

aws iam create-role --role-name "IoTGGRole" --assume-role-policy-document file://assume-role.json

And finally we need to attach the IAM managed AWS IoT Greengrass service role to our new custom role:

aws iam attach-role-policy --role-name "IoTGGRole" --policy-arn arn:aws:iam::aws:policy/service-role/AWSGreengrassResourceAccessRolePolicy

We also need a role for our Lambda functions with basic execution permissions; create a text file called lambda-role.json containing the following:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Then we run the following command, referencing the file just created:

aws iam create-role --role-name "IoTLambdaRole" --assume-role-policy-document file://lambda-role.json

And finally we need to attach the IAM managed Lambda basic execution role to our new custom role:

aws iam attach-role-policy --role-name "IoTLambdaRole" --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole

Create Lambda Functions

We now create the Lambda functions that will be deployed to Greengrass – these functions manage the interactions between the sensors and ServiceNow.

Lambda Function 1 – Get/Publish Sensor Data

The first Lambda function reads the sensor data and publishes the data to an IoT Topic (IoTBlog/sensorData), every 5 seconds, which can then be used by downstream services for analytics. This function also determines whether a threshold has been breached and if so, it publishes the data to a separate IoT Topic (IotBlog/anomaly) to which our second Lambda function is subscribed.

import os
import json
from datetime import datetime
import time
import sys
from sense_hat import SenseHat
import greengrasssdk
import boto3

client = greengrasssdk.client('iot-data')
secClient = greengrasssdk.client('secretsmanager')
sense = SenseHat()
sense.clear()

t_threshold = int(os.environ['tempLimit'])
h_threshold = int(os.environ['humidLimit'])

def lambda_handler(event, context):
    return
 
# Get sensor data and check against thresholds
def getSensorData():
    while True:
        eventTitle = "no event"
        anomaly = False
        ts = time.time()
        dt = datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S')
        
        temp     = round(sense.get_temperature(),2)
        humidity = round(sense.get_humidity(),2)
        
        time.sleep(5)

        if (temp > int(t_threshold)):
            anomaly = True
            eventTitle = "Temperature breach"
            
        if (humidity > int(h_threshold)):
            anomaly = True
            eventTitle = "Humidity breach"

        sensorData = { 'title': eventTitle,'dt':dt,'ts':ts,'t':temp,'h':humidity }
        publishData(anomaly,sensorData)

# Publish sensor data to IoT topic(s)
def publishData(anomaly,myData):
    response = client.publish(
        topic = 'IoTBlog/sensorData',
        payload = json.dumps(myData) )

We create a text file lambda_function.py containing the code above and then compress into a zip file named lambda_function_1.zip. Once this is done we can then create the function in AWS using the following command:

aws lambda create-function --function-name "1-IoT-GetSensorData" --runtime python3.7 --zip-file fileb://lambda_function_1.zip --handler lambda_function.lambda_handler –role arn:aws:iam::&lt;aws_account_number&gt;:role/IoTLambdaRole

In order to make use of Lambda functions in AWS IoT Greengrass we then need to publish a version of the function using the following command:

aws lambda publish-version --function-name "1-IoT-GetSensorData"

Lambda Function 2 – Create Anomaly Ticket

The second Lambda function is triggered through its subscription to the anomaly data published to the anomaly IoT Topic by the first Lambda function. This then makes a call to the ServiceNow API to create an incident. Prior to making the API call, the function obtains the ServiceNow credentials from the secret that has been made available to AWS IoT Greengrass.

As this is a Resource within the AWS IoT Greengrass Core, it is automatically downloaded to the Raspberry Pi as part of the deployment of the AWS IoT Greengrass Core.

import os
import json
import sys
import greengrasssdk
import requests

client = greengrasssdk.client('iot-data')
secClient = greengrasssdk.client('secretsmanager')
secret_name = "greengrass-snow-creds"
snow_instance = os.environ['snowUrl']
msg = ""

def publishAnomaly(msg,title):
    auth = getLocalSecret()
    createTicket(auth,msg,title)

# Get secret from GG
def getLocalSecret():
    secret = secClient.get_secret_value(SecretId=secret_name)
    rawSecret = secret.get('SecretString')
    return json.loads(str(rawSecret))

# Create ticket in ServiceNow            
def createTicket(auth,eventData,title):
    API_ENDPOINT = snow_instance
    HEADERS = {"Content-Type":"application/json","Accept":"application/json"}
    PARAMS = { 
        "short_description":title,
        "assignment_group":"sensor_team",
        "urgency":"2",
        "impact":"2",
        "comments":eventData
    } 

    request = requests.post(url = API_ENDPOINT, auth=(str(auth["username"]),str(auth["password"])), headers=HEADERS, data = json.dumps(PARAMS))
    print("Response:",request)

def lambda_handler(event, context):
    msg = json.dumps(event)
    msg = json.loads(msg)
    title = "Sensor Threshold - " + msg["title"]
    
    publishAnomaly(msg,title)
    return

We create another text file, lambda_function.py containing the code above and then compress into a zip file named lambda_function_2.zip. Once this is done we can then create the function in AWS using the following command:

aws lambda create-function --function-name "2-IoT-ServiceNow" --runtime python3.7 --zip-file fileb://lambda_function_2.zip --handler lambda_function.lambda_handler –role arn:aws:iam::&lt;account_number&gt;:role/IoTLambdaRole

In order to make use of Lambda functions in Greengrass we then need to publish a version of the function using the following command:

aws lambda publish-version --function-name "2-IoT-ServiceNow"

Conclusion

In this post, I showed you the steps for integrating IoT and ITSM by setting up AWS Secrets Manager, creating the required IAM roles and AWS Lambda functions. Now you can proceed to part 2 of the blog to set up AWS IoT-Core and AWS IoT Greengrass to make use of the secret and functions that you created in this post.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

The challenges of CEA

Tower automation and edge platform

Cloud pipeline/platform: analytics and visualization

Completing the data-driven loop

Conclusion

Architecture Overview

Prerequisites

Get ready to deploy the CloudFormation stacks with CDK

Create a new AWS Cloud9 environment

Download and install the dependencies and demo CDK application

Create the IAM service role

Create the EKS cluster

Deploy the stack containing the EKS cluster in a new VPC

Configure kubectl with access to cluster

Leveraging Infrastructure CI/CD

Deploy the stack containing the code repo and pipeline

Review the pipeline for the first time

Clone the new AWS CodeCommit repo

Deploy the stack containing the Cloud9 IDE with kubectl and CodeCommit repo

Configuring the new Cloud9 environment

Delete the old Cloud9 environment

Trigger the pipeline to change the infrastructure

Cluster is now manageable by code

Clean up

Additional resources

Conclusion

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers

Spot Blueprints overview

What are Spot best practices?

Example tutorial

Cleanup

Conclusion

Overview of solution

Prerequisites

Walkthrough

Set up the Samba file share

Configure Lambda Function for AWS IoT Greengrass

Deploy AWS IoT Greengrass Group

Generate test data

Setup AWS IoT Analytics

Visualize the Data from AWS IoT Analytics in Amazon QuickSight

Cleaning up

Conclusion

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Prerequisites

Walkthrough

Duplication of Access Control List (ACL)

Migrate files and folders using AWS DataSync

Cleaning up

Conclusion

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Architecture overview

What is Apache Flink?

KDA and Apache Flink

How Amazon SageMaker fits into this puzzle

Ready for a test drive

Conclusion

AWS Amplify Console

Amazon API Gateway

Amazon CloudFront and AWS Lambda@Edge

Conclusion

Prerequisites

Architecture Overview

Architecture Walkthrough

Conclusion

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Architecture overview

How does Serverless Image Handler work?

Code example of image manipulation

Conclusion

Solution Overview

Clean Up

Conclusion

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Prerequisites

Walkthrough

Step 1

Step 2

Step 3

Step 4

Configure `kubectl` with access to cluster

Deploy the stack containing the Cloud9 IDE with `kubectl` and CodeCommit repo