Tag Archives: Technical How-to

Field Notes: Set Up a Highly Available Database on AWS with IBM Db2 Pacemaker

2021-09-24 Sai Parthasaradhi

Post Syndicated from Sai Parthasaradhi original https://aws.amazon.com/blogs/architecture/field-notes-set-up-a-highly-available-database-on-aws-with-ibm-db2-pacemaker/

Many AWS customers need to run mission-critical workloads, such as traffic control systems and online booking systems, using the IBM Db2 LUW database server. Typically, these workloads require a high availability (HA) solution so that the database remains available in the event of a host or Availability Zone failure.

HA with automatic failover for the Db2 LUW database is typically managed using IBM Tivoli System Automation for Multiplatforms (Tivoli SA MP) technology together with the IBM Db2 high availability instance configuration utility (db2haicu). However, this solution is not supported on AWS Cloud deployments because the automatic failover may not work as expected.

In this blog post, we will go through the steps to set up an HA two-host Db2 cluster with automatic failover managed by IBM Db2 Pacemaker, with a quorum device set up on a third EC2 instance. We will also set up an overlay IP as a virtual IP that initially points to the primary instance. Clients connect through this overlay IP, and in case of failover it automatically points to the new primary instance.

IBM Db2 Pacemaker is HA cluster manager software integrated with Db2 Advanced Edition and Standard Edition on Linux (RHEL 8.1 and SLES 15). Pacemaker can provide HA and disaster recovery capabilities on AWS, and is an alternative to Tivoli SA MP technology.

Note: The IBM Db2 v11.5.5 database server implemented in this blog post is a fully featured 90-day trial version. After the trial period ends, you can select the required Db2 edition when purchasing and installing the associated license files. Advanced Edition and Standard Edition are supported by this implementation.

Overview of solution

For this solution, we will go through the steps to install and configure IBM Db2 Pacemaker along with overlay IP as virtual IP for the clients to connect to the database. This blog post also includes prerequisites, and installation and configuration instructions to achieve an HA Db2 database on Amazon Elastic Compute Cloud (Amazon EC2).

Figure 1. Cluster management using IBM Db2 Pacemaker

Prerequisites for installing Db2 Pacemaker

To set up IBM Db2 Pacemaker on a two-node HADR (high availability disaster recovery) cluster, the following prerequisites must be met.

Set up instance user ID and group ID.

The instance user ID and group IDs must be set up as part of the Db2 server installation. Verify them as follows:

grep db2iadm1 /etc/group
grep db2inst1 /etc/group

Set up host names for all the hosts in /etc/hosts file on all the hosts in the cluster.

For both of the hosts in the HADR cluster, ensure that the host names are set up as follows.

Format: ipaddress fully_qualified_domain_name alias
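
As an illustration of this format, the following Python sketch checks that a line follows the ipaddress, fully qualified domain name, alias layout. The addresses and host names are hypothetical examples, not values from this setup, and the "contains a dot" test is only a rough assumption about what counts as fully qualified.

```python
import ipaddress


def parse_hosts_entry(line):
    """Split an /etc/hosts line into (ip, fqdn, alias) and validate it."""
    fields = line.split("#", 1)[0].split()  # ignore trailing comments
    if len(fields) != 3:
        raise ValueError(f"expected 'ip fqdn alias', got: {line!r}")
    ip, fqdn, alias = fields
    ipaddress.ip_address(ip)  # raises ValueError for a malformed address
    if "." not in fqdn:
        raise ValueError(f"{fqdn!r} does not look fully qualified")
    return ip, fqdn, alias


# Hypothetical entries for the two Db2 hosts.
entries = [
    "10.0.1.10 db2-primary.example.internal db2-primary",
    "10.0.2.10 db2-standby.example.internal db2-standby",
]
for entry in entries:
    print(parse_hosts_entry(entry))
```

The same entries must be present on every host in the cluster, so a check like this can be run against each host's file.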

Install kornshell (ksh) on both of the hosts.

sudo yum install ksh -y

Ensure that all instances have TCP/IP connectivity between their ethernet network interfaces.
Enable passwordless secure shell (ssh) for the root and instance user IDs across both instances. After passwordless root ssh is enabled, verify it using the following command (the host name is either an alias or a fully qualified domain name).

ssh <host name> -l root ls

Activate HADR for the Db2 database cluster.
Make the IBM Db2 Pacemaker binaries available in the /tmp folder on both hosts for installation. The binaries can be downloaded from the IBM download location (login required).

Installation steps

After completing all prerequisites, run the following commands as the root user to install IBM Db2 Pacemaker on both the primary and standby hosts.

cd /tmp
tar -zxf Db2_v11.5.5.0_Pacemaker_20201118_RHEL8.1_x86_64.tar.gz
cd Db2_v11.5.5.0_Pacemaker_20201118_RHEL8.1_x86_64/RPMS/

dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm -y
dnf install */*.rpm -y

cp /tmp/Db2_v11.5.5.0_Pacemaker_20201118_RHEL8.1_x86_64/Db2/db2cm /home/db2inst1/sqllib/adm

chmod 755 /home/db2inst1/sqllib/adm/db2cm

Run the following command, replacing the -host parameter value with the alias name you set up in the prerequisites.

/home/db2inst1/sqllib/adm/db2cm -copy_resources \
/tmp/Db2_v11.5.5.0_Pacemaker_20201118_RHEL8.1_x86_64/Db2agents -host <host>

After the installation is complete, verify that all required resources are created as shown in Figure 2.

ls -alL /usr/lib/ocf/resource.d/heartbeat/db2*

Figure 2. List of heartbeat resources

Configuring Pacemaker

After IBM Db2 Pacemaker is installed on both the primary and standby hosts, run the following configuration commands as the root user from only one of the hosts (either primary or standby).

Create the Pacemaker cluster using the db2cm utility with the following command. Before running the command, replace the -domain and -host values appropriately.

/home/db2inst1/sqllib/adm/db2cm -create -cluster -domain <anydomainname> -publicEthernet eth0 -host <primary host alias> -publicEthernet eth0 -host <standby host alias>

Note: Run ifconfig to get the -publicEthernet value and replace it in the preceding command.

Create the instance resource model using the following commands. Modify the -instance and -host parameter values before running them.

/home/db2inst1/sqllib/adm/db2cm -create -instance db2inst1 -host <primary host alias>
/home/db2inst1/sqllib/adm/db2cm -create -instance db2inst1 -host <standby host alias>

Create the database resource using the db2cm utility. Modify the -db parameter value accordingly.

/home/db2inst1/sqllib/adm/db2cm -create -db TESTDB -instance db2inst1

After configuring Pacemaker, run the crm status command from both the primary and standby hosts to check that Pacemaker is running with automatic failover activated.

Figure 3. Pacemaker cluster status

Quorum device setup

Next, we set up a third lightweight EC2 instance as a quorum device (QDevice), which acts as a tie breaker and avoids a potential split-brain scenario. We need to install only the corosync-qnetd* package from the Db2 Pacemaker cluster software.

Prerequisites (quorum device setup)

Update the /etc/hosts file on the Db2 primary and standby instances to include the host details of the QDevice EC2 instance.
Set up passwordless root ssh access between the Db2 instances and the QDevice instance.
Ensure TCP/IP connectivity between the Db2 instances and the QDevice instance on port 5403.

Steps to set up quorum device

Run the following commands on the quorum device EC2 instance.

cd /tmp
tar -zxf Db2_v11.5.5.0_Pacemaker_20201118_RHEL8.1_x86_64.tar.gz
cd Db2_v11.5.5.0_Pacemaker_20201118_RHEL8.1_x86_64/RPMS/
dnf install */corosync-qnetd* -y

Run the following command from one of the Db2 instances to join the quorum device to the cluster by replacing the QDevice value appropriately.

/home/db2inst1/sqllib/adm/db2cm -create -qdevice <hostnameofqdevice>

Verify the setup using the following commands.

From either Db2 server:

/home/db2inst1/sqllib/adm/db2cm -list

From QDevice instance:

corosync-qnetd-tool -l

Figure 4. Quorum device status

Setting up overlay IP as virtual IP

For HADR-activated databases, a virtual IP provides a common connection point for the clients, so that in case of failover there is no need to update the connection strings with the actual IP addresses of the hosts. Furthermore, the clients can continue to establish connections to the new primary instance.

We can use the overlay IP address routing on AWS to send the network traffic to HADR database servers within Amazon Virtual Private Cloud (Amazon VPC) using a route table so that the clients can connect to the database using the overlay IP from the same VPC (any Availability Zone) where the database exists. aws-vpc-move-ip is a resource agent from AWS which is available along with the Pacemaker software that helps to update the route table of the VPC.

If you need to connect to the database using overlay IP from on-premises or outside of the VPC (different VPC than database servers), then additional setup is needed using either AWS Transit Gateway or Network Load Balancer.

Prerequisites (setting up overlay IP as virtual IP)

Choose the overlay IP address that needs to be configured. This IP should not be used anywhere in the VPC or on premises, and should be part of the private IP address ranges defined in RFC 1918. If the VPC is configured in the range of 10.0.0.0/8 or 172.16.0.0/12, we can use an overlay IP from the range of 192.168.0.0/16. We will use the following IP and ethernet settings.

192.168.1.81/32
eth0
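
As a quick sanity check of this choice, the following Python sketch verifies that a candidate overlay IP falls inside an RFC 1918 private range and outside the VPC address space. The VPC CIDR shown is a hypothetical example, and the check does not cover on-premises ranges, which you would add to the same list.

```python
import ipaddress

# Private address ranges defined in RFC 1918.
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def overlay_ip_is_usable(overlay_ip, vpc_cidrs):
    """True if the address is private (RFC 1918) and outside every VPC CIDR."""
    addr = ipaddress.ip_address(overlay_ip)
    is_private = any(addr in net for net in RFC1918)
    in_vpc = any(addr in ipaddress.ip_network(cidr) for cidr in vpc_cidrs)
    return is_private and not in_vpc


vpc_cidrs = ["10.0.0.0/16"]  # hypothetical VPC CIDR
print(overlay_ip_is_usable("192.168.1.81", vpc_cidrs))  # True
print(overlay_ip_is_usable("10.0.1.81", vpc_cidrs))     # False: inside the VPC
```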

To route traffic through the overlay IP, we need to disable source/destination checks on the primary and standby EC2 instances.

aws ec2 modify-instance-attribute --profile <AWS CLI profile> --instance-id <EC2-instance-id> --no-source-dest-check

Steps to configure overlay IP

The following commands can be run as root user on the primary instance.

Create the following AWS Identity and Access Management (IAM) policy and attach it to the instance profile. Update region, account_id, and routetableid values.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt0",
      "Effect": "Allow",
      "Action": "ec2:ReplaceRoute",
      "Resource": "arn:aws:ec2:<region>:<account_id>:route-table/<routetableid>"
    },
    {
      "Sid": "Stmt1",
      "Effect": "Allow",
      "Action": "ec2:DescribeRouteTables",
      "Resource": "*"
    }
  ]
}
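
If you prefer to generate the policy document rather than edit it by hand, a minimal Python sketch could look like the following. The Region, account ID, and route table ID passed in are hypothetical placeholders; the statements themselves mirror the policy above.

```python
import json


def overlay_ip_policy(region, account_id, route_table_id):
    """Return the IAM policy document with the route table ARN filled in."""
    arn = f"arn:aws:ec2:{region}:{account_id}:route-table/{route_table_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": "Stmt0", "Effect": "Allow",
             "Action": "ec2:ReplaceRoute", "Resource": arn},
            {"Sid": "Stmt1", "Effect": "Allow",
             "Action": "ec2:DescribeRouteTables", "Resource": "*"},
        ],
    }


# Hypothetical values; substitute your own.
policy = overlay_ip_policy("us-east-1", "111122223333", "rtb-0123456789abcdef0")
print(json.dumps(policy, indent=2))
```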

Add the overlay IP on the primary instance.

ip address add 192.168.1.81/32 dev eth0

Update the route table (used in Step 1) with the overlay IP specifying the node with the Db2 primary instance. The following command returns True.

aws ec2 create-route --route-table-id <routetableid> --destination-cidr-block 192.168.1.81/32 --instance-id <primarydb2instanceid>

Create a file named overlayip.txt with the following commands, which create the resource manager for the overlay IP.

overlayip.txt

primitive db2_db2inst1_db2inst1_TESTDB_AWS_primary-OIP ocf:heartbeat:aws-vpc-move-ip \
params ip=192.168.1.81 routing_table=<routetableid> interface=<ethernet> profile=<AWS CLI profile name> \
op start interval=0 timeout=180s \
op stop interval=0 timeout=180s \
op monitor interval=30s timeout=60s

colocation db2_db2inst1_db2inst1_TESTDB_AWS_primary-colocation inf: \
db2_db2inst1_db2inst1_TESTDB_AWS_primary-OIP:Started \
db2_db2inst1_db2inst1_TESTDB-clone:Master

order order-rule-db2_db2inst1_db2inst1_TESTDB-then-primary-oip Mandatory: \
db2_db2inst1_db2inst1_TESTDB-clone db2_db2inst1_db2inst1_TESTDB_AWS_primary-OIP

location prefer-node1_db2_db2inst1_db2inst1_TESTDB_AWS_primary-OIP \
db2_db2inst1_db2inst1_TESTDB_AWS_primary-OIP 100: <primaryhostname>

location prefer-node2_db2_db2inst1_db2inst1_TESTDB_AWS_primary-OIP \
db2_db2inst1_db2inst1_TESTDB_AWS_primary-OIP 100: <standbyhostname>

The following parameters must be replaced in the resource manager create command in the file.

- Name of the database resource agent (this can be found using the crm configure show | grep primitive | grep DBNAME command; for this example, we will use db2_db2inst1_db2inst1_TESTDB)
- Overlay IP address (created earlier)
- Routing table ID (used earlier)
- AWS command-line interface (CLI) profile name
- Primary and standby host names
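
Because the same resource name appears many times in the file, substituting by hand is error-prone. The following Python sketch renders the file contents from the parameters listed above. It assumes standard crmsh syntax with each constraint written on one line, and the route table ID, profile name, and host aliases in the example call are hypothetical.

```python
TEMPLATE = """\
primitive {oip} ocf:heartbeat:aws-vpc-move-ip \\
    params ip={ip} routing_table={rtb} interface={iface} profile={profile} \\
    op start interval=0 timeout=180s \\
    op stop interval=0 timeout=180s \\
    op monitor interval=30s timeout=60s
colocation {db}_AWS_primary-colocation inf: {oip}:Started {db}-clone:Master
order order-rule-{db}-then-primary-oip Mandatory: {db}-clone {oip}
location prefer-node1_{oip} {oip} 100: {primary}
location prefer-node2_{oip} {oip} 100: {standby}
"""


def render_overlay_config(db, ip, rtb, iface, profile, primary, standby):
    """Fill the crm configuration template for the overlay IP resource."""
    return TEMPLATE.format(db=db, oip=f"{db}_AWS_primary-OIP", ip=ip, rtb=rtb,
                           iface=iface, profile=profile,
                           primary=primary, standby=standby)


# Hypothetical route table ID, profile, and host aliases.
text = render_overlay_config(
    db="db2_db2inst1_db2inst1_TESTDB", ip="192.168.1.81",
    rtb="rtb-0123456789abcdef0", iface="eth0", profile="default",
    primary="db2-primary", standby="db2-standby")
print(text)
```

The printed text can be saved as overlayip.txt and reviewed before it is loaded into the cluster.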

After the file with commands is ready, run the following command to create the overlay IP resource manager.

crm configure load update overlayip.txt

The overlay IP resource manager is created in an unmanaged state. Run the following command to manage and start the resource.

crm resource manage db2_db2inst1_db2inst1_TESTDB_AWS_primary-OIP

Validate the setup with the crm status command.

Figure 5. Pacemaker cluster status along with overlay IP resource

Test failover with client connectivity

For this test, launch another EC2 instance with the Db2 client installed, and catalog the Db2 database server using the overlay IP.

Figure 6. Database directory list

Establish a connection with the Db2 primary instance using the alias that was cataloged earlier with the overlay IP address.

Figure 7. Connect to database

If we connect to the primary instance and check the applications connected, we can see the active connection from the client’s IP as shown in Figure 8.

Figure 8. Check client connections before failover

Next, let’s stop the primary Db2 instance and check if the Pacemaker cluster promoted the standby to primary and we can still connect to the database using the overlay IP, which now points to the new primary instance.

If we check the CRM status from the new primary instance, we can see that the Pacemaker cluster has promoted the standby database to new primary database as shown in Figure 9.

Figure 9. Automatic failover to standby

Let’s go back to our client and reestablish the connection using the cataloged DB alias created using overlay IP.
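
A client application can automate this reconnection. The following Python sketch retries a connection attempt until the overlay IP points to the new primary. The connect callable is a hypothetical stand-in for whatever Db2 driver call the application uses, and the retry count and delay are arbitrary assumptions rather than recommended values.

```python
import time


def connect_with_retry(connect, attempts=5, delay_seconds=2.0, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts are exhausted."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay_seconds)  # wait for the failover to complete
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error


# Stand-in for a real driver call: fails twice, as if failover were in progress.
calls = {"n": 0}


def fake_connect():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("overlay IP not yet routed to the new primary")
    return "connected"


print(connect_with_retry(fake_connect, sleep=lambda seconds: None))
```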

Figure 10. Database reconnection after failover

If we connect to the new promoted primary instance and check the applications connected, we can see the active connection from the client’s IP as shown in Figure 11.

Figure 11. Check client connections after failover

Cleaning up

To avoid incurring future charges, terminate all EC2 instances which were created as part of the setup referencing this blog post.

Conclusion

In this blog post, we have set up automatic failover using IBM Db2 Pacemaker with overlay (virtual) IP to route traffic to secondary database instance during failover, which helps to reconnect to the database without any manual intervention. In addition, we can also enable automatic client reroute using the overlay IP address to achieve a seamless failover connectivity to the database for mission-critical workloads.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Managing federated schema with AWS Lambda and Amazon S3

2021-09-22 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/managing-federated-schema-with-aws-lambda-and-amazon-s3/

This post is written by Krzysztof Lis, Senior Software Development Engineer, IMDb.

GraphQL schema management is one of the biggest challenges in the federated setup. IMDb has 19 subgraphs (graphlets) – each of them owns and publishes a part of the schema as a part of an independent CI/CD pipeline.

To manage federated schema effectively, IMDb introduced a component called Schema Manager. This is responsible for fetching the latest schema changes and validating them before publishing them to the Gateway.

Part 1 presents the migration from a monolithic REST API to a federated GraphQL (GQL) endpoint running on AWS Lambda. This post focuses on schema management in federated GQL systems. It shows the challenges that the teams faced when designing this component and how we addressed them. It also shares best practices and processes for schema management, based on our experience.

Comparing monolithic and federated GQL schema

In the standard, monolithic implementation of GQL, there is a single file used to manage the whole schema. This makes it easier to ensure that there are no conflicts between the new changes and the earlier schema. Everything can be validated at the build time and there is no risk that external changes break the endpoint during runtime.

This is not true for the federated GQL endpoint. The gateway fetches service definitions from the graphlets at runtime and composes the overall schema. If any of the graphlets introduces a breaking change, the gateway fails to compose the schema and won’t be able to serve the requests.

The more graphlets we federate to, the higher the risk of introducing a breaking change. In enterprise scale systems, you need a component that protects the production environment from potential downtime. It must notify graphlet owners that they are about to introduce a breaking change, preferably during development before releasing the change.

Federated schema challenges

There are other aspects of handling federated schema to consider. If you use AWS Lambda, the default schema composition increases the gateway startup time, which impacts the endpoint’s performance. If any of the declared graphlets are unavailable at the time of schema composition, there may be gateway downtime or at least an incomplete overall schema. If schemas are pre-validated and stored in a highly available store such as Amazon S3, you mitigate both of these issues.

Another challenge is schema consistency. Ideally, you want to propagate the changes to the gateway in a timely manner after a schema change is published. You also need to consider handling field deprecation and field transfer across graphlets (change of ownership). To catch potential errors early, the system should support dry-run-like functionality that will allow developers to validate changes against the current schema during the development stage.

The Schema Manager

To mitigate these challenges, the Gateway/Platform team introduces a Schema Manager component to the workload. Whenever there’s a deployment in any of the graphlet pipelines, the schema validation process is triggered.

Schema Manager fetches the most recent sub-schemas from all the graphlets and attempts to compose an overall schema. If there are no errors and conflicts, a change is approved and can be safely promoted to production.

In the case of a validation failure, the breaking change is blocked in the graphlet deployment pipeline and the owning team must resolve the issue before they can proceed with the change. Deployments of graphlet code changes also depend on this approval step, so there is no risk that schema and backend logic can get out of sync, when the approval step blocks the schema change.
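
The approval step can be pictured with a small Python sketch. This is a deliberately simplified model, not the composition logic that the Gateway or Schema Manager actually runs: each sub-schema is represented as a plain dictionary of types and fields, and the only conflict detected is two graphlets declaring the same field with different types. The graphlet names and fields are hypothetical.

```python
def compose(subschemas):
    """Merge per-graphlet schemas and return (schema, errors).

    subschemas maps graphlet name -> {type name -> {field name -> field type}}.
    """
    composed, owners, errors = {}, {}, []
    for graphlet, types in subschemas.items():
        for type_name, fields in types.items():
            for field, field_type in fields.items():
                key = (type_name, field)
                existing = composed.get(type_name, {}).get(field)
                if existing is not None and existing != field_type:
                    errors.append(
                        f"{type_name}.{field}: {owners[key]} declares {existing}, "
                        f"{graphlet} declares {field_type}")
                    continue
                composed.setdefault(type_name, {})[field] = field_type
                owners.setdefault(key, graphlet)
    return composed, errors


def approve_change(current, graphlet, new_types):
    """Dry-run a graphlet change against the current set of sub-schemas."""
    candidate = {**current, graphlet: new_types}
    _, errors = compose(candidate)
    return not errors, errors


current = {
    "titles": {"Title": {"id": "ID!", "titleText": "String"}},
    "ratings": {"Title": {"id": "ID!", "aggregateRating": "Float"}},
}
ok, errors = approve_change(current, "ratings",
                            {"Title": {"id": "ID!", "titleText": "Int"}})
print(ok, errors)
```

In this model a change is approved only when the whole candidate schema composes without errors, which matches the blocking behavior described above.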

Integration with the Gateway

To handle versioning of the composed schema, a manifest file stores the locations of the latest approved set of graphlet schemas. The manifest is a JSON file mapping the name of the graphlet to the S3 key of the schema file, in addition to the endpoint of the graphlet service.

The file name of each graphlet schema is a hash of the schema. The Schema Manager pulls the current manifest and uses the hash of the validated schema to determine if it has changed:

{
   "graphlets": {
     "graphletOne": {
        "schemaPath": "graphletOne/1a3121746e75aafb3ca9cccb94f23d89",
        "endpoint": "arn:aws:lambda:us-east-1:123456789:function:GraphletOne"
     },
     "graphletTwo": { 
        "schemaPath": "graphletTwo/213362c3684c89160a9b2f40cd8f191a",
        "endpoint": "arn:aws:lambda:us-east-1:123456789:function:GraphletTwo"
     },
     ...
  }
}
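
The change check can be sketched in Python as follows. The post does not name the hash algorithm; MD5 is assumed here only because the 32-character names in the example are consistent with a 128-bit digest. The function names are hypothetical, and the manifest is held in memory rather than read from S3.

```python
import hashlib


def schema_path(graphlet, sdl):
    """Content-addressed key: '<graphlet>/<hex digest of the schema>'."""
    digest = hashlib.md5(sdl.encode("utf-8")).hexdigest()  # assumed algorithm
    return f"{graphlet}/{digest}"


def schema_changed(manifest, graphlet, sdl):
    """True if the validated schema differs from the one in the manifest."""
    entry = manifest["graphlets"].get(graphlet)
    return entry is None or entry["schemaPath"] != schema_path(graphlet, sdl)


sdl = "type Query { hello: String }"
manifest = {"graphlets": {"graphletOne": {
    "schemaPath": schema_path("graphletOne", sdl),
    "endpoint": "arn:aws:lambda:us-east-1:123456789:function:GraphletOne",
}}}
print(schema_changed(manifest, "graphletOne", sdl))          # False
print(schema_changed(manifest, "graphletOne", sdl + "\n"))   # True
```

Naming each file after its content hash means an unchanged schema always maps to the same key, so the manifest only needs updating when the hash differs.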

Based on these details, the Gateway fetches the graphlet schemas from S3 as part of service startup and stores them in the in-memory cache. It later polls for the updates every 5 minutes.

Using S3 as the schema store addresses the latency, availability, and validation concerns of fetching schemas directly from the graphlets at runtime.

Eventual schema consistency

Since there are multiple graphlets that can be updated at the same time, there is no guarantee that one schema validation workflow will not overwrite the results of another.

For example:

SchemaUpdater 1 runs for graphlet A.
SchemaUpdater 2 runs for graphlet B.
SchemaUpdater 1 pulls the manifest v1.
SchemaUpdater 2 pulls the manifest v1.
SchemaUpdater 1 uploads manifest v2 with the change to graphlet A.
SchemaUpdater 2 uploads manifest v3 that overwrites the changes in v2. It contains only the changes to graphlet B.

This is not a critical issue because, no matter which version of the manifest wins in this scenario, both manifests represent a valid schema and the gateway does not have any issues. When SchemaUpdater is run for graphlet A again, it sees that the current manifest does not contain the changes uploaded before, so it uploads them again.
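
The scenario can be replayed with a small in-memory simulation. In this Python sketch the ManifestStore class stands in for the manifest object in S3 with last-writer-wins semantics, the manifest is reduced to a mapping of graphlet name to schema hash, and all names and hash values are hypothetical.

```python
import copy


class ManifestStore:
    """In-memory stand-in for the manifest in S3 (last writer wins)."""

    def __init__(self, manifest):
        self._manifest = copy.deepcopy(manifest)

    def get(self):
        return copy.deepcopy(self._manifest)

    def put(self, manifest):
        self._manifest = copy.deepcopy(manifest)


def run_updater(store, graphlet, schema_hash, stale_manifest=None):
    """Upload a manifest with the graphlet's hash unless already recorded."""
    manifest = stale_manifest if stale_manifest is not None else store.get()
    if manifest.get(graphlet) == schema_hash:
        return False  # manifest already reflects this schema
    manifest[graphlet] = schema_hash
    store.put(manifest)
    return True


store = ManifestStore({"A": "a1", "B": "b1"})
pulled_by_1 = store.get()                      # both updaters pull manifest v1
pulled_by_2 = store.get()
run_updater(store, "A", "a2", pulled_by_1)     # uploads v2
run_updater(store, "B", "b2", pulled_by_2)     # uploads v3, overwriting v2
after_race = store.get()
print(after_race)                              # {'A': 'a1', 'B': 'b2'}
run_updater(store, "A", "a2")                  # next run restores the change to A
print(store.get())                             # {'A': 'a2', 'B': 'b2'}
```

Every intermediate manifest is a valid combination of approved schemas, which is why the overwrite delays a change without breaking the gateway.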

To reduce the risk of schema inconsistency, Schema Manager polls for schema changes every 15 minutes and the Gateway polls every 5 minutes.

Local schema development

Schema validation runs automatically for any graphlet change as a part of deployment pipelines. However, that feedback loop happens too late for an efficient schema development cycle. To reduce friction, the team uses a tool that performs this validation step without publishing any changes. Instead, it outputs the results of the validation to the developer.

The Schema Validator script can be added as a dependency to any of the graphlets. It fetches the graphlet’s schema definition, described in Schema Definition Language (SDL), and passes it as a payload to Schema Manager. Schema Manager performs the full schema validation and returns any validation errors (or success codes) to the user.

Best practices for federated schema development

Schema Manager addresses the most critical challenges that come from federated schema development. However, there are other issues when organizing work processes at your organization.

It is crucial for long term maintainability of the federated schema to keep a high-quality bar for the incoming schema changes. Since there are multiple owners of sub-schemas, it’s good to keep a communication channel between the graphlet teams so that they can provide feedback for planned schema changes.

It is also good to extract common parts of the graph to a shared library and generate typings and the resolvers. This lets the graphlet developers benefit from strongly typed code. We use open-source libraries to do this.

Conclusion

Schema management is a non-trivial challenge in federated GQL systems. The highest risk to your system availability comes from one of the graphlets introducing a breaking schema change, after which your system cannot serve any requests. There are also the problems of the delayed feedback loop for the engineers working on schema changes and the impact of runtime schema composition on service latency.

IMDb addresses these issues with a Schema Manager component running on Lambda, using S3 as the schema store. We have put guardrails in our deployment pipelines to ensure that no breaking change is deployed to production. Our graphlet teams are using common schema libraries with automatically generated typings and review the planned schema changes during schema working group meetings to streamline the development process.

These factors enable us to have stable and highly maintainable federated graphs, with automated change management. Next, our solution must provide mechanisms to prevent still-in-use fields from getting deleted and to allow schema changes coordinated between multiple graphlets. There are still plenty of interesting challenges to solve at IMDb.

For more serverless learning resources, visit Serverless Land.

Building federated GraphQL on AWS Lambda

2021-09-20 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-federated-graphql-on-aws-lambda/

This post is written by Krzysztof Lis, Senior Software Development Engineer, IMDb.

IMDb is the world’s most popular source for movie, TV, and celebrity content. It deals with a complex business domain including movies, shows, celebrities, industry professionals, events, and a distributed ownership model. There are clear boundaries between systems and data owned by various teams.

Historically, IMDb used a monolithic REST gateway system that serves clients. Over the years, it became challenging to manage effectively. There are thousands of files, business logic that lacks clear ownership, and unreliable integration tests tied to the data. To fix this, the team used GraphQL (GQL). This is a query language for APIs that lets you request only the data that you need and a runtime for fulfilling those queries with your existing data.

It’s common to implement this protocol by creating a monolithic service that hosts the complete schema and resolves all fields requested by the client. It is good for applications with a relatively small domain and clear, single-threaded ownership. IMDb chose the federated approach, which allows us to federate GQL requests to all existing data teams. This post shows how to build federated GraphQL on AWS Lambda.

Overview

This article covers migration from a monolithic REST API and monolithic frontend to a federated backend system powering a modern frontend. It enumerates challenges in the earlier system and explains why federated GraphQL addresses these problems.

I present the architecture overview and describe the decisions made when designing the new system. I also present our experiences with developing and running high-traffic and high-visibility pages on the new endpoint, including improvements in IMDb’s ownership model and development lifecycle, and ease of scaling.

Comparing GraphQL with federated GraphQL

Federated GraphQL allows you to combine GraphQL APIs from multiple microservices into a single API. Clients can make a single request and fetch data from multiple sources, including joining across data sources, without additional support from the source services.

This is an example schema fragment:

type TitleQuote {
  "Quote ID"
  id: ID!
  "Is this quote a spoiler?"
  isSpoiler: Boolean!
  "The lines that make up this quote"
  lines: [TitleQuoteLine!]!
  "Votes from users about this quote..."
  interestScore: InterestScore!
  "The language of this quote"
  language: DisplayableLanguage!
}
"A specific line in the Title Quote. Can be a verbal line with characters speaking or stage directions"
type TitleQuoteLine {
  "The characters who speak this line, e.g.  'Rick'. Not required: a line may be non-verbal"
  characters: [TitleQuoteCharacter!]
  "The body of the quotation line, e.g 'Here's looking at you kid. '.  Not required: you may have stage directions with no dialogue."
  text: String
  "Stage direction, e.g. 'Rick gently places his hand under her chin and raises it so their eyes meet'. Not required."
  stageDirection: String
}

This is an example monolithic query: “Get the 2 top quotes from The A-Team (title identifier: tt0084967)”:

{ 
  title(id:"tt0084967"){ 
    quotes(first:2){ 
      lines { text } 
    } 
  }
}

Here is an example response:

{ 
  "data": { 
    "title": { 
      "quotes": { 
        "lines": [
          { 
            "text": "I love it when a plan comes together!"
          },
          {
            "text": "10 years ago a crack commando unit was sent to prison by a military court for a crime they didn't commit..."
          }
        ]
      } 
    }
  }
}

This is an example federated query: “What is Jackie Chan (id nm0000329) known for? Get the text, rating and image for each title”

{
  name(id: "nm0000329") {
    knownFor(first: 4) {
      title {
        titleText {
          text
        }
        ratingsSummary {
          aggregateRating
        }
        primaryImage {
          url
        }
      }
    }
  }
}

The monolithic example fetches quotes from a single service. In the federated example, knownFor, titleText, ratingsSummary, primaryImage are fetched transparently by the gateway from separate services. IMDb federates the requests across 19 graphlets, which are transparent to the clients that call the gateway.

Architecture overview

IMDb supports three channels for users: website, iOS, and Android apps. Each of the channels can request data from a single federated GraphQL gateway, which federates the request to multiple graphlets (sub-graphs). Each of the invoked graphlets returns a partial response, which the gateway merges with responses returned by other graphlets. The client receives only the data that they requested, in the shape specified in the query. This can be especially useful when the developers must be conscious of network usage (for example, over mobile networks).
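
To make the merge step concrete, here is a minimal Python sketch that combines partial responses into a single result. It is only a conceptual model: the real gateway is built on Apollo Gateway and follows a query plan, whereas this sketch simply deep-merges dictionaries. The field names follow the federated query shown earlier, and the values are placeholders.

```python
import copy


def deep_merge(target, partial):
    """Recursively merge one graphlet's partial response into target."""
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            deep_merge(target[key], value)
        else:
            target[key] = copy.deepcopy(value)  # avoid aliasing the input
    return target


def merge_responses(partials):
    """Combine the partial responses returned by all invoked graphlets."""
    merged = {}
    for partial in partials:
        deep_merge(merged, partial)
    return merged


# Placeholder partial responses from three hypothetical graphlets.
partials = [
    {"title": {"titleText": {"text": "Example Title"}}},
    {"title": {"ratingsSummary": {"aggregateRating": 7.5}}},
    {"title": {"primaryImage": {"url": "https://example.com/poster.jpg"}}},
]
merged = merge_responses(partials)
print(merged)
```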

This is the architecture diagram:

There are two core components in the architecture: the Gateway and Schema Manager, which run on Lambda. The Gateway is a Node.js-based Lambda function that is built on top of open-source Apollo Gateway code. It is customized with code responsible predominantly for handling authentication, caching, metrics, and logging.

Other noteworthy components are Upstream Graphlets and an A/B Testing Service that enables A/B tests in the graph. The Gateway is connected to an Application Load Balancer, which is protected by AWS WAF and fronted by Amazon CloudFront as our CDN layer. The Gateway uses Amazon ElastiCache with Redis as the engine to cache partial responses coming from graphlets. All logs and metrics generated by the system are collected in Amazon CloudWatch.

Choosing the compute platform

The workload uses Lambda, since it scales on demand. IMDb uses Lambda’s Provisioned Concurrency to reduce cold start latency. The infrastructure is abstracted away so there is no need for us to manage our own capacity and handle patches.

Additionally, Application Load Balancer (ALB) has support for directing HTTP traffic to Lambda. This is an alternative to API Gateway. The workload does not need many of the features that API Gateway provides, since the gateway has a single endpoint, making ALB a better choice. ALB also supports AWS WAF.

Using Lambda, the team designed a way to fetch schema updates without needing to fetch the schema with every request. This is addressed with the Schema Manager component. This component improves latency and improves the overall customer experience.

Integration with legacy data services

The main purpose of the federated GQL migration is to deprecate a monolithic service that aggregates data from multiple backend services before sending it to the clients. Some of the data in the federated graph comes from brand new graphlets that are developed with the new system in mind.

However, much of the data powering the GQL endpoint is sourced from the existing backend services. One benefit of running on Lambda is the flexibility to choose the runtime environment that works best with the underlying data sources and data services.

For the graphlets relying on the legacy services, IMDb uses lightweight Java Lambda functions with the provided client libraries written in Java. They connect to legacy backends via AWS PrivateLink, fetch the data, and shape it in the format expected by the GQL request. For the modern graphlets, we recommend that the graphlet teams explore Node.js as the first option due to improved performance and ease of development.

Caching

The gateway supports two caching modes for graphlets: static and dynamic. Static caching allows graphlet owners to specify a default TTL for responses returned by a graphlet. Dynamic caching calculates TTL based on a custom caching extension returned with the partial response. It allows graphlet owners to decide on the optimal TTL for content returned by their graphlet. For example, it can be 24 hours for queries containing only static text.
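
The two modes can be sketched as a single TTL decision. In the following Python sketch, the extension key cacheTtlSeconds and the fallback to the default TTL when no hint is present are assumptions made for illustration; the post does not describe the format of the caching extension.

```python
def resolve_ttl(mode, default_ttl, extensions=None):
    """Pick the cache TTL in seconds for a graphlet's partial response."""
    if mode == "static":
        return default_ttl
    if mode == "dynamic":
        hint = (extensions or {}).get("cacheTtlSeconds")  # hypothetical key
        # Fall back to the default when the graphlet sends no usable hint.
        return hint if isinstance(hint, int) and hint >= 0 else default_ttl
    raise ValueError(f"unknown caching mode: {mode}")


print(resolve_ttl("static", 300))                                # 300
print(resolve_ttl("dynamic", 300, {"cacheTtlSeconds": 86400}))   # 86400 (24 hours)
print(resolve_ttl("dynamic", 300))                               # 300
```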

Permissions

Each of the graphlets runs in a separate AWS account. The graphlet accounts grant the gateway AWS account (as AWS principal) invoke permissions on the graphlet Lambda function. The development environment uses a cross-account IAM role that is assumed by stacks deployed in engineers’ personal accounts.

Experience with developing on federated GraphQL

The migration to federated GraphQL delivered the expected results. We moved the business logic closer to the teams that have the right expertise – the graphlet teams. At the same time, a dedicated platform team owns and develops the core technical pieces of the ecosystem. This includes the Gateway and Schema Manager, in addition to the common libraries and CDK constructs that can be reused by the graphlet teams. As a result, there is a clear ownership structure, which is aligned with the way IMDb teams are organized.

In terms of operational excellence of the platform team, this reduced support tickets related to business logic. Previously, these were routed to an appropriate data service team with a delay. Integration tests are now stable and independent from underlying data, which increases our confidence in the Continuous Deployment process. It also eliminates changing data as a potential root cause for failing tests and blocked pipelines.

The graphlet teams now have full ownership of the data available in the graph. They own the partial schema and the resolvers that provide data for that part of the graph. Since they have the most expertise in that area, the potential issues are identified early on. This leads to a better customer experience and overall health of the system.

The web and app developer groups were also impacted by the migration. The learning curve was aided by tools like GraphQL Playground and Apollo Client. The teams covered the learning gap quickly and started delivering new features.

One of the main pages at IMDb.com is the Title Page (for example, Shutter Island). This was successfully migrated to use the new GQL endpoint. This proves that the new, serverless federated system can serve production traffic at over 10,000 TPS.

Conclusion

A single, highly discoverable, and well-documented backend endpoint enabled our clients to experiment with the data available in the graph. We were able to clean up the backend API layer, introduce clear ownership boundaries, and give our clients powerful tools to speed up their development cycle.

The infrastructure uses Lambda to remove the burden of managing, patching, and scaling our EC2 fleets. The team dedicated this time to work on features and operational excellence of our systems.

Part two will cover how IMDb manages the federated schema and the guardrails used to ensure high availability of the federated GraphQL endpoint.

For more serverless learning resources, visit Serverless Land.

Field Notes: Tracking Overall Equipment Effectiveness with AWS IoT Analytics and Amazon QuickSight

2021-09-18 Shailaja Suresh

Post Syndicated from Shailaja Suresh original https://aws.amazon.com/blogs/architecture/field-notes-tracking-overall-equipment-effectiveness-with-aws-iot-analytics-and-amazon-quicksight/

This post was co-written with Michael Brown, Senior Solutions Architect, Manufacturing at AWS.

Overall equipment effectiveness (OEE) is a measure of how well a manufacturing operation is utilized (facilities, time and material) compared to its full potential, during the periods when it is scheduled to run. Measuring OEE provides a way to obtain actionable insights into manufacturing processes to increase the overall productivity along with reduction in waste.

To drive process efficiencies and optimize costs, manufacturing organizations need a scalable approach to accessing data held in disparate silos across their organization. In this blog post, we will demonstrate how OEE can be calculated, monitored, and scaled out using two key services: AWS IoT Analytics and Amazon QuickSight.

Overview of solution

We will use the standard OEE formulas for this example:

Table 1. OEE Calculations
Availability = Actual Time / Planned Time (in minutes)
Performance = (Total Units/Actual Time) / Ideal Run Rate
Quality = Good Units Produced / Total Units Produced
OEE = Availability * Performance * Quality
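As a quick check of the formulas in Table 1, here is a minimal Python sketch. The function name and the sample shift numbers are made up for illustration and are not taken from the factory data used later in this post.

```python
def calculate_oee(planned_minutes, actual_minutes, total_units, good_units,
                  ideal_rate_per_minute):
    """Return (availability, performance, quality, oee) per Table 1."""
    if min(planned_minutes, actual_minutes, total_units, ideal_rate_per_minute) <= 0:
        raise ValueError("time, unit, and rate inputs must be positive")
    availability = actual_minutes / planned_minutes
    performance = (total_units / actual_minutes) / ideal_rate_per_minute
    quality = good_units / total_units
    return availability, performance, quality, availability * performance * quality


# Hypothetical shift: 480 planned minutes, 432 minutes actually running,
# 81,000 units produced of which 76,950 were good, ideal rate 200 units/minute.
a, p, q, oee = calculate_oee(480, 432, 81_000, 76_950, 200)
print(f"Availability={a:.4f} Performance={p:.4f} Quality={q:.4f} OEE={oee:.4f}")
```

For these inputs, Availability is 0.9, Performance is 0.9375, and Quality is 0.95, which gives an OEE of roughly 0.80.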

To calculate OEE, identify the following data for the calculation and its source:

Table 2. Source of supporting data
Supporting Data	Method of Ingest
Total Planned Scheduled Production Time	Manufacturing Execution Systems (MES)
Ideal Production Rate of Machine in Units	MES
Total Production for the Scheduled time	Programmable Logic Controller (PLC), MES
Total Number of Off-Quality units produced	PLC, Quality DB
Total Unplanned Downtime in minutes	PLC

For the purpose of this exercise, we assume that the supporting data is ingested as an MQTT message.

As indicated in Figure 1, the data is ingested into AWS IoT Core and then sent to AWS IoT Analytics by an IoT rule to calculate the OEE metrics. These IoT data insights can then be viewed from a QuickSight dashboard. Specific machine states, like machine idling, can be reported to the technicians through email or SMS by Amazon Simple Notification Service (Amazon SNS). All OEE metrics can then be republished to AWS IoT Core so any other processes can consume them.

Figure 1. Tracking OEE using PLCs with AWS IoT Analytics and QuickSight

Walkthrough

The components of this solution are:

PLC – An industrial computer that has been ruggedized and adapted for the control of manufacturing processes, such as assembly lines, robotic devices, or any activity that requires high reliability, ease of programming, and process fault diagnosis.
AWS IoT Greengrass – Provides a secure way to seamlessly connect your edge devices to any AWS service and to third-party services.
AWS IoT Core – Subscribes to the IoT topics and ingests data into the AWS Cloud for analysis.
AWS IoT rule – Rules give your devices the ability to interact with AWS services. Rules are analyzed and actions are performed based on the MQTT topic stream.
Amazon SNS – Sends notifications to the operations team when the machine downtime is greater than the rule threshold.
AWS IoT Analytics – Filters, transforms, and enriches IoT data before storing it in a time-series data store for analysis. You can set up the service to collect only the data you need from your PLC and sensors and apply mathematical transforms to process the data.
QuickSight – Helps you to visualize the OEE data across multiple shifts from AWS IoT Analytics.
Amazon Kinesis Data Streams – Enables you to build custom applications that process and analyze streaming data for specialized needs.
AWS Lambda – Lets you run code without provisioning or managing servers. In this example, it gets the JSON data records from Kinesis Data Streams and passes them to AWS IoT Analytics.
AWS Database Migration Service (AWS DMS) – Helps migrate your databases to AWS with nearly no downtime. All data changes to the source database (MES, Quality Databases) that occur during the data sync are continuously replicated to the target, allowing the source database to be fully operational during the migration process.

Follow these steps to build an AWS infrastructure to track OEE:

Collect data from the factory floor using PLCs and sensors.

Here is a sample of the JSON data which will be ingested into AWS IoT Core.

In AWS IoT Analytics, create a data store, which is needed to query and gather insights for the OEE calculation. Refer to Getting started with AWS IoT Analytics to create a channel, pipeline, and data store. Note that AWS IoT Analytics receives data from the factory sensors and PLCs, as well as through Kinesis Data Streams from AWS DMS. In this blog post, we focus on how the data from AWS IoT Analytics is integrated with QuickSight to calculate OEE.

Create a dataset in AWS IoT Analytics. In this example, one of our targets is to determine the total number of good units produced per shift to calculate the OEE over a one-day time period across shifts. For this purpose, only the necessary data is stored in AWS IoT Analytics as datasets and partitioned for performant analysis. Because the ingested data includes data across all machine states, we want to selectively collect data only when the machine is in a running state. AWS IoT Analytics helps to gather this specific data through SQL queries as shown in Figure 2.

Figure 2. SQL query in IoT Analytics to create a dataset

Cron expressions are expressions that indicate a schedule such that the tasks can be run automatically based on a schedule and frequency. AWS IoT Analytics provides options to query for the datasets at a frequency based on cron expressions.

Because we want to produce daily reports in QuickSight, set the Cron expression as shown in Figure 3.

Figure 3. Cron expression to query data daily
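For readers without access to the figure, the following sketch shows what a daily schedule can look like. It assumes the six-field AWS schedule expression form (minutes, hours, day-of-month, month, day-of-week, year). The helper name is hypothetical, and the time of day is a placeholder rather than the value used in Figure 3.

```python
def daily_schedule(hour: int, minute: int = 0) -> str:
    """Build a once-a-day AWS-style cron schedule expression.

    Field order: minutes hours day-of-month month day-of-week year.
    '?' is used for day-of-week because day-of-month is already specified.
    """
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    return f"cron({minute} {hour} * * ? *)"


print(daily_schedule(12))  # cron(0 12 * * ? *)
```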

Create an Amazon QuickSight dashboard to analyze the AWS IoT Analytics data.

To connect the AWS IoT Analytics dataset in this example to QuickSight, follow the steps contained in Visualizing AWS IoT Analytics data. After you have created a dataset under QuickSight, you can add calculated fields (Figure 4) as needed. We are creating the following fields to enable the dashboard to show the sum of units produced across shifts.

Figure 4. Adding calculated fields in Amazon QuickSight

We first add a calculated field as DateFromEpoch to produce a date from the ‘epochtime’ key of the JSON as shown in Figure 5.

Figure 5. DateFromEpoch calculated field

Similarly, you can create the following fields using the built-in functions available in QuickSight dataset as shown in Figures 6, 7, and 8.

Figure 6. HourofDay calculated field

Figure 7. ShiftNumber calculated field

Figure 8. ShiftSegregator calculated field
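The logic behind the first three calculated fields can be sketched outside QuickSight as follows. This is a Python illustration, not the QuickSight formula syntax shown in the figures. It assumes that `epochtime` is in seconds, that times are read as UTC, and that the day is split into three 8-hour shifts starting at midnight. The actual shift boundaries are defined in Figure 7 and may differ.

```python
from datetime import datetime, timezone

SHIFT_LENGTH_HOURS = 8  # assumption: three 480-minute shifts per day


def date_from_epoch(epochtime: int) -> datetime:
    """Convert the 'epochtime' key (assumed seconds) to a UTC datetime."""
    return datetime.fromtimestamp(epochtime, tz=timezone.utc)


def hour_of_day(epochtime: int) -> int:
    """Hour of the day (0-23) in which the record was produced."""
    return date_from_epoch(epochtime).hour


def shift_number(epochtime: int) -> int:
    """Map the hour of day to shift 1, 2, or 3 (assumed boundaries)."""
    return hour_of_day(epochtime) // SHIFT_LENGTH_HOURS + 1


sample = 1625040000  # 2021-06-30 08:00:00 UTC
print(date_from_epoch(sample).date(), hour_of_day(sample), shift_number(sample))
```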

To determine the total number of good units produced, use the formula shown in Figure 9.

Figure 9. Formula for total number of good units produced

After the fields are calculated, save the dataset and create a new analysis with this dataset. Choose the stacked bar combo chart and add the dimensions and measures from Figure 10 to produce the visualization. This analysis shows the sum of good units produced across shifts using the calculated field GoodUnits.

Figure 10. Good units across shifts on Amazon QuickSight dashboard

Calculate OEE. To calculate OEE across shifts, we need to determine the values stated in Table 1. For the sake of simplicity, determine the OEE for Shift 1 and Shift 2 on 6/30.

Let us introduce the calculated field ShiftQuality as in Figure 11.

1. Calculate Quality

Good Units Produced / Total Units Produced

Figure 11. Quality calculation

Add a filter to include only Shift 1 and 2 on 6/30. Change the Range value for the bar to be from .90 to .95 to see the differences in Quality across shifts as in Figure 12.

Figure 12. Quality differences across shifts

2. Calculate Performance

(Total Units/Actual Time) / Ideal Run Rate

For this factory, we know the Ideal Production Rate is approximately 208 units per minute per shift (100,000 units/480 minutes). We already know the actual run time for each shift by excluding the idle and stopped state times. We add a calculated field for ShiftPerformance using the previous formula. Change the range of the bar in the visual to be able to see the differences in performances across the shifts as in Figure 13.

Figure 13. Performance calculation

3. Calculate Availability

Actual Time / Planned Time (in minutes)

The planned time for a shift is 480 minutes. Add a calculated field using the previous formula.

4. Calculate OEE

OEE = Performance * Availability * Quality

Finally, add a calculated field for ShiftOEE as in Figure 14. Include this field as the Bar to be able to see the OEE differences across shifts as in Figure 15.

Figure 14. OEE calculation

Figure 15. OEE across shifts

Shift 3 on 6/28 has the higher of the two OEEs compared in this example.

Note that you can schedule a dataset refresh in QuickSight for every day as shown in Figure 16. This way you get to see the dataset and the visuals with the most recent data.

Figure 16. Dataset refresh schedule

All the above is a one-time setup to calculate OEE.

Enable an AWS IoT rule to invoke Amazon SNS notifications when a machine is idle beyond the threshold time.

You can create rules to invoke alerts over an Amazon SNS topic by adding an action under AWS IoT Core as shown in Figures 17 and 18. In our example, we can invoke alerts to the factory operations team whenever a machine is in idle state. Refer to the AWS IoT SQL reference for more information on creating rules through the AWS IoT Core rule query statement.
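The "idle beyond the threshold" condition can be reasoned about with a small Python sketch before encoding it in a rule. This is not AWS IoT SQL. It assumes each message carries a `state` field alongside the `epochtime` key (in seconds). The field name `state`, the value `"idle"`, and the threshold are illustrative assumptions.

```python
def idle_alerts(readings, threshold_seconds):
    """Return (idle_start, idle_duration) for idle stretches over the threshold.

    readings: iterable of dicts with 'epochtime' (seconds) and 'state' keys.
    """
    alerts = []
    idle_start = None
    last_time = None
    for reading in sorted(readings, key=lambda r: r["epochtime"]):
        now = reading["epochtime"]
        if reading["state"] == "idle":
            if idle_start is None:
                idle_start = now  # first idle message of this stretch
        else:
            # The machine left the idle state; close the stretch if one is open.
            if idle_start is not None and now - idle_start > threshold_seconds:
                alerts.append((idle_start, now - idle_start))
            idle_start = None
        last_time = now
    # Handle a stretch that is still open at the last message seen.
    if idle_start is not None and last_time - idle_start > threshold_seconds:
        alerts.append((idle_start, last_time - idle_start))
    return alerts


messages = [
    {"epochtime": 0, "state": "running"},
    {"epochtime": 60, "state": "idle"},
    {"epochtime": 120, "state": "idle"},
    {"epochtime": 500, "state": "running"},
]
print(idle_alerts(messages, threshold_seconds=300))  # [(60, 440)]
```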

Figure 17. Send messages through SNS

Figure 18. Set up IoT rules

Conclusion

In this blog post, we showed you how to calculate the OEE on factory IoT data by using two AWS IoT services: AWS IoT Core and AWS IoT Analytics. We used the seamless integration of QuickSight with AWS IoT Analytics and also the calculated fields feature of QuickSight to run calculations on industry data with field level formulas.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Amazon SES configuration for an external SMTP provider with Auth0

2021-09-15 Raghavarao Sodabathina

Post Syndicated from Raghavarao Sodabathina original https://aws.amazon.com/blogs/messaging-and-targeting/amazon-ses-configuration-for-an-external-smtp-provider-with-auth0/

Many organizations are using an external identity provider to manage user identities. With an identity provider (IdP), customers can manage their user identities outside of AWS and give these external user identities permissions to use AWS resources in customer AWS accounts. The most common requirement when setting up an external identity provider is sending outgoing emails, such as verification e-mails using a link or code, welcome e-mails, MFA enrollment, password changes and blocked account e-mails. This said, most external identity providers’ existing e-mail infrastructure is limited to testing e-mails only and customers need to set up an external SMTP provider for outgoing e-mails.

Managing and running e-mail servers on-premises or deploying an EC2 instance dedicated to running an SMTP server is costly and complex. Customers have to manage operational issues such as hardware, software installation, configuration, patching, and backups.

In this blog post, we will provide step-by-step guidance showing how you can set up Amazon SES as an external SMTP provider with Auth0 to take advantage of Amazon SES capabilities like sending email securely, globally, and at scale.

Amazon Simple Email Service (SES) is a cost-effective, flexible, and scalable email service that enables developers to send email from within any application. You can configure Amazon SES quickly to support several email use cases, including transactional, marketing, or mass email communications.

Auth0 is an identity provider that provides a flexible, drop-in solution to add authentication and authorization services (Identity as a Service, or IDaaS) to customer applications. Auth0’s built-in email infrastructure should be used for testing emails only. Auth0 allows you to configure your own SMTP email provider so you can more completely manage, monitor, and troubleshoot your email communications.

Overview of solution

In this blog post, we’ll show you how to perform the following steps to complete the integration between Amazon SES and Auth0:

Amazon SES setup for sending emails with SMTP credentials and API credentials
Auth0 setup to configure Amazon SES as an external SMTP provider
Testing the Configuration

The following diagram shows the architecture of the solution.

Prerequisites

You must have an Auth0 account.
To ensure that emails can be sent from Auth0 to your Amazon SES SMTP, open ports and allow inbound connections from specific IP addresses. To update the list of IPs and ports, navigate to Dashboard > Branding > Email Provider. See Add IP Addresses to Allow list for details.

Amazon SES Setup

As a first step, you must configure a “Sandbox” account within Amazon SES and verify a sender email address for initial testing. Once all the setup steps are successful, you can convert this account into Production, at which point Amazon SES will accept all emails. For more details on this topic, please see the Amazon SES documentation.

1. Log in to the Amazon SES console and choose the Verify a New Email Address button.

2. Once the verification is completed, the Verification Status will change to green.

3. You need to create SMTP credentials which will be used by Auth0 for sending emails. To create the credentials, click on SMTP settings from left menu and press the Create My SMTP Credentials button.

Please note down the Server Name as it will be required during Auth0 setup.

4. Enter a meaningful username like autho-ses-user and click the Create button at the bottom right of the page.

5. You can see the SMTP username and password on the screen, and you can also download the SMTP credentials into a csv file as shown below.

Please note the SMTP User name and SMTP Password, as they will be required during Auth0 setup.

6. You need the Access key ID and Secret access key of the SES IAM user autho-ses-user created in step 4 to configure Amazon SES with API credentials in Auth0.

Navigate to the AWS IAM console and click on Users in the left menu.
Double-click the autho-ses-user IAM user, and then click on Security credentials.

Choose the Create access key button to create a new Access key ID and Secret access key. You can see the Access key ID and Secret access key on the screen, and you can also download them into a csv file as shown below.

Please note down the Access key ID and Secret access key, as they will be required during Auth0 setup.

Auth0 Setup

To ensure that emails can be sent from Auth0 to your Amazon SES SMTP, you need to configure Amazon SES details into Auth0. There are two ways you can use Amazon SES credentials with Auth0, one with SMTP and the other with API credentials.

1. Navigate to the Auth0 Dashboard, select Branding, and then select Email Provider from the left menu. Enable the Use my own email provider button as shown below.

2. Let us start with Auth0 configuration with Amazon SES SMTP credentials.

Click on SMTP Provider option as shown below

Provide the SMTP Provider settings as shown below, and then click on the Save button to complete the setup.
- From: Your from email address.
- Host: Your Amazon SES Server Name as noted in step 3 of Amazon SES setup. For this example, it is email-smtp.us-west-1.amazonaws.com
- Port: 465
- User Name: Your Amazon SES SMTP user name as created in step 5 of Amazon SES setup.
- Password: Your Amazon SES SMTP password as created in step 5 of Amazon SES setup.

Choose the Send test email button to test the Auth0 configuration with Amazon SES SMTP credentials.
You can look at the Auth0 logs to validate your test as shown below.

If you have configured it successfully, you should receive an email from Auth0 as shown below.
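If the Auth0 test fails, it can help to try the same SMTP settings outside Auth0. The following is a minimal sketch using Python's standard `smtplib`, assuming the example host and port 465 (TLS wrapper) from the settings above. The function names and the message text are made up for illustration. While the account is in the sandbox, both the sender and the recipient address need to be verified in Amazon SES.

```python
import smtplib
from email.message import EmailMessage

SMTP_HOST = "email-smtp.us-west-1.amazonaws.com"  # Server Name from the SES console
SMTP_PORT = 465  # TLS wrapper port, matching the Auth0 setting above


def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """Create a plain-text test email (hypothetical subject and body)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Amazon SES SMTP test"
    msg.set_content("Test message sent with Amazon SES SMTP credentials.")
    return msg


def send_test_message(msg: EmailMessage, smtp_username: str, smtp_password: str) -> None:
    """Send the message over an implicit-TLS connection to Amazon SES."""
    with smtplib.SMTP_SSL(SMTP_HOST, SMTP_PORT) as server:
        server.login(smtp_username, smtp_password)
        server.send_message(msg)


# Example (requires real SMTP credentials and verified addresses):
# send_test_message(build_test_message("sender@example.com", "you@example.com"),
#                   "<SMTP user name>", "<SMTP password>")
```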

3. Now, complete Auth0 configuration with Amazon SES API credentials.

Click on Amazon SES as shown below

Provide the Amazon SES settings as shown below, and then click on the Save button to complete the setup.
- From: Your from email address.
- Key Id: Your autho-ses-user IAM user’s Access key ID as created in step 6 of Amazon SES setup.
- Secret access key: Your autho-ses-user IAM user’s Secret access key as created in step 6 of Amazon SES setup.
- Region: For this example, choose us-west-1.

Click on the Send test email button to test the Auth0 configuration with Amazon SES API credentials.
You can look at the Auth0 logs. If you have configured it successfully, you should receive an email from Auth0 as illustrated in the Auth0 configuration with Amazon SES SMTP credentials section.

Conclusion

In this blog post, we have demonstrated how to set up Amazon SES as an external SMTP email provider with Auth0, because Auth0’s built-in email infrastructure is limited to testing emails. We have also demonstrated how quickly and easily you can set up Amazon SES with SMTP credentials and API credentials. With this solution, you can set up your own Amazon SES with Auth0 as an email provider. You can also get a jump start by checking the Amazon SES Developer Guide, which provides guidance on using Amazon SES as an easy, cost-effective way to send and receive email using your own email addresses and domains.

Field Notes: How to Enable Cross-Account Access for Amazon Kinesis Data Streams using Kinesis Client Library 2.x

2021-09-14 Uday Narayanan

Post Syndicated from Uday Narayanan original https://aws.amazon.com/blogs/architecture/field-notes-how-to-enable-cross-account-access-for-amazon-kinesis-data-streams-using-kinesis-client-library-2-x/

Businesses today are dealing with vast amounts of real-time data they need to process and analyze to generate insights. Real-time delivery of data and insights enables businesses to quickly make decisions in response to sensor data from devices, clickstream events, user engagement, and infrastructure events, among many others.

Amazon Kinesis Data Streams offers a managed service that lets you focus on building and scaling your streaming applications for near real-time data processing, rather than managing infrastructure. Customers can write Kinesis Data Streams consumer applications to read data from Kinesis Data Streams and process them per their requirements.

Often, the Kinesis Data Streams and consumer applications reside in the same AWS account. However, there are scenarios where customers follow a multi-account approach resulting in Kinesis Data Streams and consumer applications operating in different accounts. Some reasons for using the multi-account approach are to:

Allocate AWS accounts to different teams, projects, or products for rapid innovation, while still maintaining unique security requirements.
Simplify AWS billing by mapping AWS costs specific to product or service line.
Isolate accounts for specific security or compliance requirements.
Scale resources and mitigate hard AWS service limits constrained to a single account.

The following options allow you to access Kinesis Data Streams across accounts.

Amazon Kinesis Client Library (KCL) for Java or using the MultiLang Daemon for KCL.
Amazon Kinesis Data Analytics for Apache Flink – Cross-account access is supported for both Java and Python. For detailed implementation guidance, review the AWS documentation page for Kinesis Data Analytics.
AWS Glue Streaming – The documentation for AWS Glue describes how to configure AWS Glue streaming ETL jobs to access cross-account Kinesis Data Streams.
AWS Lambda – Lambda currently does not support cross-account invocations from Amazon Kinesis, but a workaround can be used.

In this blog post, we will walk you through the steps to configure KCL for Java and Python for cross-account access to Kinesis Data Streams.

Overview of solution

As shown in Figure 1, Account A has the Kinesis data stream and Account B has the KCL instances consuming from the Kinesis data stream in Account A. For the purposes of this blog post the KCL code is running on Amazon Elastic Compute Cloud (Amazon EC2).

Figure 1. Steps to access a cross-account Kinesis data stream

The steps to access a Kinesis data stream in one account from a KCL application in another account are:

Step 1 – Create AWS Identity and Access Management (IAM) role in Account A to access the Kinesis data stream with trust relationship with Account B.

Step 2 – Create IAM role in Account B to assume the role in Account A. This role is attached to the EC2 fleet running the KCL application.

Step 3 – Update the KCL application code to assume the role in Account A to read Kinesis data stream in Account A.

Prerequisites

KCL for Java version 2.3.4 or later.
AWS Security Token Service (AWS STS) SDK.
Create a Kinesis data stream named StockTradeStream in Account A and a producer to load data into the stream. If you do not have a producer, you can use the Amazon Kinesis Data Generator to send test data into your Kinesis data stream.

Walkthrough

Step 1 – Create IAM policies and IAM role in Account A

First, we will create an IAM role in Account A, with permissions to access the Kinesis data stream created in the same account. We will also add Account B as a trusted entity to this role.

Create the IAM policy kds-stock-trade-stream-policy to access the Kinesis data stream in Account A using the following policy definition. This policy restricts access to one specific Kinesis data stream.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt123",
            "Effect": "Allow",
            "Action": [
                "kinesis:DescribeStream",
                "kinesis:GetShardIterator",
                "kinesis:GetRecords",
                "kinesis:ListShards",
                "kinesis:DescribeStreamSummary",
                "kinesis:RegisterStreamConsumer"
            ],
            "Resource": [
                "arn:aws:kinesis:us-east-1:Account-A-AccountNumber:stream/StockTradeStream"
            ]
        },
        {
            "Sid": "Stmt234",
            "Effect": "Allow",
            "Action": [
                "kinesis:SubscribeToShard",
                "kinesis:DescribeStreamConsumer"
            ],
            "Resource": [
                "arn:aws:kinesis:us-east-1:Account-A-AccountNumber:stream/StockTradeStream/*"
            ]
        }
    ]
}

Note: The above policy assumes the name of the Kinesis data stream is StockTradeStream.

Create IAM role kds-stock-trade-stream-role in Account A.

aws iam create-role --role-name kds-stock-trade-stream-role --assume-role-policy-document "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Principal\":{\"AWS\":[\"arn:aws:iam::Account-B-AccountNumber:root\"]},\"Action\":[\"sts:AssumeRole\"]}]}"

Attach the kds-stock-trade-stream-policy IAM policy to kds-stock-trade-stream-role role.

aws iam attach-role-policy --policy-arn arn:aws:iam::Account-A-AccountNumber:policy/kds-stock-trade-stream-policy --role-name kds-stock-trade-stream-role

In the above steps, replace Account-A-AccountNumber with the AWS account number of the account that has the Kinesis data stream, and replace Account-B-AccountNumber with the AWS account number of the account that has the KCL application.
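Hand-escaping the trust policy on the command line is error-prone. One alternative is to generate the document and pass it to the AWS CLI as a file. The sketch below builds the same trust relationship used in the create-role command above. The function name, the output file name, and the sample account number are placeholders.

```python
import json


def trust_policy_for_account(trusted_account_id: str) -> str:
    """Return a trust policy letting the given account assume the role."""
    document = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": [f"arn:aws:iam::{trusted_account_id}:root"]},
                "Action": ["sts:AssumeRole"],
            }
        ],
    }
    return json.dumps(document, indent=2)


# 111122223333 stands in for Account-B-AccountNumber.
with open("trust-policy.json", "w") as handle:
    handle.write(trust_policy_for_account("111122223333"))
```

The generated file can then be supplied with --assume-role-policy-document file://trust-policy.json instead of the inline escaped string.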

Step 2 – Create IAM policies and IAM role in Account B

We will now create an IAM role in Account B to assume the role created in Account A in Step 1. This role will also grant the KCL application access to Amazon DynamoDB and Amazon CloudWatch in Account B. For every KCL application, a DynamoDB table is used to keep track of the shards in a Kinesis data stream that are being leased and processed by the workers of the KCL consumer application. The name of the DynamoDB table is the same as the KCL application name. Similarly, the KCL application needs access to emit metrics to CloudWatch. Because the KCL application is running in Account B, we want to maintain the DynamoDB table and the CloudWatch metrics in the same account as the application code. For this blog post, our KCL application name is StockTradesProcessor.

Create the IAM policy kcl-stock-trader-app-policy, with permissions to access DynamoDB and CloudWatch in Account B, and to assume the kds-stock-trade-stream-role role created in Account A.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AssumeRoleInSourceAccount",
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::Account-A-AccountNumber:role/kds-stock-trade-stream-role"
        },
        {
            "Sid": "Stmt456",
            "Effect": "Allow",
            "Action": [
                "dynamodb:CreateTable",
                "dynamodb:DescribeTable",
                "dynamodb:Scan",
                "dynamodb:PutItem",
                "dynamodb:GetItem",
                "dynamodb:UpdateItem",
                "dynamodb:DeleteItem"
            ],
            "Resource": [
                "arn:aws:dynamodb:us-east-1:Account-B-AccountNumber:table/StockTradesProcessor"
            ]
        },
        {
            "Sid": "Stmt789",
            "Effect": "Allow",
            "Action": [
                "cloudwatch:PutMetricData"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}

The above policy gives access to a DynamoDB table StockTradesProcessor. If you change your KCL application name, make sure you change the above policy to reflect the corresponding DynamoDB table name.

Create role kcl-stock-trader-app-role in Account B to assume role in Account A.

aws iam create-role --role-name kcl-stock-trader-app-role --assume-role-policy-document "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Principal\":{\"Service\":[\"ec2.amazonaws.com\"]},\"Action\":[\"sts:AssumeRole\"]}]}"

Attach the policy kcl-stock-trader-app-policy to the kcl-stock-trader-app-role.

aws iam attach-role-policy --policy-arn arn:aws:iam::Account-B-AccountNumber:policy/kcl-stock-trader-app-policy --role-name kcl-stock-trader-app-role

Create an instance profile with a name as kcl-stock-trader-app-role.

aws iam create-instance-profile --instance-profile-name kcl-stock-trader-app-role

Attach the kcl-stock-trader-app-role role to the instance profile.

aws iam add-role-to-instance-profile --instance-profile-name kcl-stock-trader-app-role --role-name kcl-stock-trader-app-role

Attach the kcl-stock-trader-app-role to the EC2 instances that are running the KCL code.

aws ec2 associate-iam-instance-profile --iam-instance-profile Name=kcl-stock-trader-app-role --instance-id <your EC2 instance id>

In the above steps, replace Account-A-AccountNumber with the AWS account number of the account that has the Kinesis data stream, Account-B-AccountNumber with the AWS account number of the account that has the KCL application, and <your EC2 instance id> with the correct EC2 instance ID. This instance profile should be added to any new EC2 instances of the KCL application that are started.
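Before changing the KCL application, you may want to confirm the role chain from an EC2 instance in Account B. The sketch below is one way to do that with boto3, which is a third-party package and is not listed in the prerequisites. It assumes the role, stream, and Region names used in this post. The helper names and the session name are made up for illustration.

```python
def role_arn(account_id: str, role_name: str) -> str:
    """Build the ARN of an IAM role in the given account."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def check_cross_account_stream(account_a_id: str, region: str = "us-east-1") -> str:
    """Assume the Account A role and return the status of StockTradeStream."""
    import boto3  # imported lazily so the helper above works without boto3

    # Uses the instance profile credentials of kcl-stock-trader-app-role.
    sts = boto3.client("sts")
    credentials = sts.assume_role(
        RoleArn=role_arn(account_a_id, "kds-stock-trade-stream-role"),
        RoleSessionName="kcl-cross-account-check",  # arbitrary session name
    )["Credentials"]

    # Call Kinesis in Account A with the temporary credentials.
    kinesis = boto3.client(
        "kinesis",
        region_name=region,
        aws_access_key_id=credentials["AccessKeyId"],
        aws_secret_access_key=credentials["SecretAccessKey"],
        aws_session_token=credentials["SessionToken"],
    )
    summary = kinesis.describe_stream_summary(StreamName="StockTradeStream")
    return summary["StreamDescriptionSummary"]["StreamStatus"]


# Example (run on the EC2 instance in Account B):
# print(check_cross_account_stream("<Account-A-AccountNumber>"))
```

If the call returns a stream status, both the trust relationship from Step 1 and the sts:AssumeRole permission from Step 2 are in place.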

Step 3 – Update KCL stock trader application to access cross-account Kinesis data stream

KCL application in Java

To demonstrate the setup for cross-account access for KCL using Java, we have used the KCL stock trader application as the starting point and modified it to enable access to a Kinesis data stream in another AWS account.

After the IAM policies and roles have been created and attached to the EC2 instance running the KCL application, we will update the main class of the consumer application to enable cross-account access.

Setting up the integrated development environment (IDE)

To download and build the code for the stock trader application, follow these steps:

Clone the source code from the GitHub repository to your computer.

$  git clone https://github.com/aws-samples/amazon-kinesis-learning
Cloning into 'amazon-kinesis-learning'...
remote: Enumerating objects: 169, done.
remote: Counting objects: 100% (77/77), done.
remote: Compressing objects: 100% (37/37), done.
remote: Total 169 (delta 16), reused 56 (delta 8), pack-reused 92
Receiving objects: 100% (169/169), 45.14 KiB | 220.00 KiB/s, done.
Resolving deltas: 100% (29/29), done.

Create a project in your integrated development environment (IDE) with the source code you downloaded in the previous step. For this blog post, we are using Eclipse as our IDE, so the instructions are specific to an Eclipse project.
1. Open Eclipse IDE. Select File -> Import.
  A dialog box will open, as shown in Figure 2.

Figure 2. Create an Eclipse project

Select Maven -> Existing Maven Projects, and select Next. Then you will be prompted to select a folder location for stock trader application.

Figure 3. Select the folder for your project

Select Browse, and navigate to the downloaded source code folder location. The IDE will automatically detect maven pom.xml.

Select Finish to complete the import. The IDE will take 2–3 minutes to download all libraries and complete the setup of the stock trader project.

After the setup is complete, the IDE will look similar to Figure 4.

Figure 4. Final view of pom.xml file after setup is complete

Open pom.xml, and replace it with the following content. This will add all the prerequisites and dependencies required to build and package the jar application.

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.amazonaws</groupId>
    <artifactId>amazon-kinesis-learning</artifactId>
    <packaging>jar</packaging>
    <name>Amazon Kinesis Tutorial</name>
    <version>0.0.1</version>
    <description>Tutorial and examples for aws-kinesis-client
    </description>
    <url>https://aws.amazon.com/kinesis</url>

    <scm>
        <url>https://github.com/awslabs/amazon-kinesis-learning.git</url>
    </scm>

    <licenses>
        <license>
            <name>Amazon Software License</name>
            <url>https://aws.amazon.com/asl</url>
            <distribution>repo</distribution>
        </license>
    </licenses>

    <properties>
        <aws-kinesis-client.version>2.3.4</aws-kinesis-client.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>software.amazon.kinesis</groupId>
            <artifactId>amazon-kinesis-client</artifactId>
            <version>2.3.4</version>
        </dependency>
        <dependency>
            <groupId>commons-logging</groupId>
            <artifactId>commons-logging</artifactId>
            <version>1.2</version>
        </dependency>
        <dependency>
            <groupId>software.amazon.awssdk</groupId>
            <artifactId>sts</artifactId>
            <version>2.16.74</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-simple</artifactId>
            <version>1.7.25</version>
        </dependency>
    </dependencies>

    <build>
        <finalName>amazon-kinesis-learning</finalName>
        <plugins>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.6.1</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>3.1.1</version>

                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>

                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
    
            </plugin>
        </plugins>
    </build>


</project>

Update the main class of the consumer application

The updated code for the StockTradesProcessor.java class is shown as follows. The changes made to the class to enable cross-account access are highlighted in bold.

package com.amazonaws.services.kinesis.samples.stocktrades.processor;

import java.util.UUID;
import java.util.logging.Level;
import java.util.logging.Logger;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.dynamodb.DynamoDbAsyncClient;
import software.amazon.awssdk.services.cloudwatch.CloudWatchAsyncClient;
import software.amazon.awssdk.services.kinesis.KinesisAsyncClient;
import software.amazon.awssdk.services.sts.StsClient;
import software.amazon.awssdk.services.sts.auth.StsAssumeRoleCredentialsProvider;
import software.amazon.awssdk.services.sts.model.AssumeRoleRequest;
import software.amazon.kinesis.common.ConfigsBuilder;
import software.amazon.kinesis.common.KinesisClientUtil;
import software.amazon.kinesis.coordinator.Scheduler;

/**
 * Uses the Kinesis Client Library (KCL) 2.x to continuously consume and process stock trade
 * records from the stock trades stream. KCL monitors the number of shards and creates
 * record processor instances to read and process records from each shard. KCL also
 * load balances shards across all the instances of this processor.
 *
 */
public class StockTradesProcessor {

    private static final Log LOG = LogFactory.getLog(StockTradesProcessor.class);

    private static final Logger ROOT_LOGGER = Logger.getLogger("");
    private static final Logger PROCESSOR_LOGGER =
            Logger.getLogger("com.amazonaws.services.kinesis.samples.stocktrades.processor.StockTradeRecordProcessor");

    private static void checkUsage(String[] args) {
        if (args.length != 5) {
            System.err.println("Usage: " + StockTradesProcessor.class.getSimpleName()
                    + " <application name> <stream name> <region> <role arn> <role session name>");
            System.exit(1);
        }
    }

    /**
     * Sets the global log level to WARNING and the log level for this package to INFO,
     * so that we only see INFO messages for this processor. This is just for the purpose
     * of this tutorial, and should not be considered as best practice.
     *
     */
    private static void setLogLevels() {
        ROOT_LOGGER.setLevel(Level.WARNING);
        // Set this to INFO for logging at INFO level. Suppressed for this example as it can be noisy.
        PROCESSOR_LOGGER.setLevel(Level.WARNING);
    }
    
    private static AwsCredentialsProvider roleCredentialsProvider(String roleArn, String roleSessionName, Region region) {
        AssumeRoleRequest assumeRoleRequest = AssumeRoleRequest.builder()
                .roleArn(roleArn)
                .roleSessionName(roleSessionName)
                .durationSeconds(900)
                .build();
        LOG.warn("Initializing assume role request session: " + assumeRoleRequest.roleSessionName());
        StsClient stsClient = StsClient.builder().region(region).build();
        StsAssumeRoleCredentialsProvider stsAssumeRoleCredentialsProvider = StsAssumeRoleCredentialsProvider
                .builder()
                .stsClient(stsClient)
                .refreshRequest(assumeRoleRequest)
                .asyncCredentialUpdateEnabled(true)
                .build();
        LOG.warn("Initializing sts role credential provider: " + stsAssumeRoleCredentialsProvider.prefetchTime().toString());
        return stsAssumeRoleCredentialsProvider;
    }

    public static void main(String[] args) throws Exception {
        checkUsage(args);

        setLogLevels();

        String applicationName = args[0];
        String streamName = args[1];
        Region region = Region.of(args[2]);
        String roleArn = args[3];
        String roleSessionName = args[4];
        
        if (region == null) {
            System.err.println(args[2] + " is not a valid AWS region.");
            System.exit(1);
        }
        
        AwsCredentialsProvider awsCredentialsProvider = roleCredentialsProvider(roleArn, roleSessionName, region);
        KinesisAsyncClient kinesisClient = KinesisClientUtil.createKinesisAsyncClient(
                KinesisAsyncClient.builder().region(region).credentialsProvider(awsCredentialsProvider));
        DynamoDbAsyncClient dynamoClient = DynamoDbAsyncClient.builder().region(region).build();
        CloudWatchAsyncClient cloudWatchClient = CloudWatchAsyncClient.builder().region(region).build();
        StockTradeRecordProcessorFactory shardRecordProcessor = new StockTradeRecordProcessorFactory();
        ConfigsBuilder configsBuilder = new ConfigsBuilder(streamName, applicationName, kinesisClient, dynamoClient, cloudWatchClient, UUID.randomUUID().toString(), shardRecordProcessor);

        Scheduler scheduler = new Scheduler(
                configsBuilder.checkpointConfig(),
                configsBuilder.coordinatorConfig(),
                configsBuilder.leaseManagementConfig(),
                configsBuilder.lifecycleConfig(),
                configsBuilder.metricsConfig(),
                configsBuilder.processorConfig(),
                configsBuilder.retrievalConfig()
        );
        int exitCode = 0;
        try {
            scheduler.run();
        } catch (Throwable t) {
            LOG.error("Caught throwable while processing data.", t);
            exitCode = 1;
        }
        System.exit(exitCode);

    }

}

Let’s review the changes made to the code to understand the key parts of how the cross-account access works.

AssumeRoleRequest assumeRoleRequest = AssumeRoleRequest.builder()
        .roleArn(roleArn)
        .roleSessionName(roleSessionName)
        .durationSeconds(900)
        .build();

The AssumeRoleRequest class is used to request the credentials needed to access the Kinesis data stream in Account A using the role that was created. The assumeRoleRequest variable is then passed to the StsAssumeRoleCredentialsProvider.

StsClient stsClient = StsClient.builder().region(region).build();

StsAssumeRoleCredentialsProvider stsAssumeRoleCredentialsProvider = StsAssumeRoleCredentialsProvider
        .builder()
        .stsClient(stsClient)
        .refreshRequest(assumeRoleRequest)
        .asyncCredentialUpdateEnabled(true)
        .build();

StsAssumeRoleCredentialsProvider periodically sends an AssumeRoleRequest to AWS STS to maintain short-lived sessions for authentication. The request passed to refreshRequest is reused for each refresh, and the sessions are updated asynchronously in the background as they get close to expiring. Because asynchronous refresh is not enabled by default, we explicitly set it to true using asyncCredentialUpdateEnabled.
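The refresh-before-expiry idea described above can be sketched in a few lines of Python. This is only an illustration of the concept, not the SDK's implementation: the class names, the fetch callback (a stand-in for an AssumeRole call), and the 300-second refresh window are all assumptions made for this example, and the refresh here happens synchronously on access rather than in a background thread.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SessionCredentials:
    access_key: str
    expires_at: float  # epoch seconds


class RefreshingCredentials:
    """Caches short-lived credentials and renews them shortly before they expire."""

    def __init__(self,
                 fetch: Callable[[], SessionCredentials],
                 refresh_window: float = 300.0,
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch                    # hypothetical stand-in for an AssumeRole call
        self._refresh_window = refresh_window  # assumed value, chosen for illustration
        self._clock = clock
        self._current: Optional[SessionCredentials] = None

    def get(self) -> SessionCredentials:
        now = self._clock()
        # Fetch on first use, or when the remaining lifetime falls inside the window.
        if self._current is None or self._current.expires_at - now <= self._refresh_window:
            self._current = self._fetch()
        return self._current


def make_fake_fetcher(clock: Callable[[], float], duration: float = 900.0):
    """Builds a fake fetcher that issues numbered credentials valid for `duration` seconds."""
    calls = {"count": 0}

    def fetch() -> SessionCredentials:
        calls["count"] += 1
        return SessionCredentials(f"key-{calls['count']}", clock() + duration)

    return fetch, calls
```

With a 900-second session, as in the durationSeconds(900) setting above, a caller that asks for credentials every few minutes would receive the cached session until it enters the refresh window, and a new session after that.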

AwsCredentialsProvider awsCredentialsProvider = roleCredentialsProvider(roleArn,roleSessionName, region);

KinesisAsyncClient kinesisClient = KinesisClientUtil.createKinesisAsyncClient(
        KinesisAsyncClient.builder().region(region).credentialsProvider(awsCredentialsProvider));

KinesisAsyncClient is the client for accessing Kinesis asynchronously. We create an instance of KinesisAsyncClient by passing it the credentials, obtained through the assumed role, to access the Kinesis data stream in Account A. The Kinesis, DynamoDB, and CloudWatch clients, along with the stream name and application name, are used to create a ConfigsBuilder instance.
The ConfigsBuilder instance is used to create the KCL scheduler (also known as KCL worker in KCL versions 1.x).
The scheduler creates a new thread for each shard (assigned to this consumer instance), which continuously loops to read records from the data stream. It then invokes the record processor instance (StockTradeRecordProcessor in this example) to process each batch of records received. This is the class which will contain your record processing logic. The Developing Custom Consumers with Shared Throughput Using KCL section of the documentation will provide more details on the working of KCL.

KCL application in Python

In this section we will show you how to configure a KCL application written in Python to access a cross-account Kinesis data stream.

A. Steps 1 and 2 from earlier remain the same and will need to be completed before moving ahead. After the IAM roles and policies have been created, log into the EC2 instance and clone the amazon-kinesis-client-python repository using the following command.

git clone https://github.com/awslabs/amazon-kinesis-client-python.git

B. Navigate to the amazon-kinesis-client-python directory and run the following commands.

sudo yum install python-pip
sudo pip install virtualenv
virtualenv /tmp/kclpy-sample-env
source /tmp/kclpy-sample-env/bin/activate
pip install amazon_kclpy

C. Next, navigate to amazon-kinesis-client-python/samples and open the sample.properties file. The properties file has properties such as streamName, application name, and credential information that lets you customize the configuration for your use case.

D. We will modify the properties file to change the stream name and application name, and to add the credentials to enable access to a Kinesis data stream in a different account. You can replace the contents of the sample.properties file with the following snippet. The bolded sections show the changes we have made.

# The script that abides by the multi-language protocol. This script will
# be executed by the MultiLangDaemon, which will communicate with this script
# over STDIN and STDOUT according to the multi-language protocol.
executableName = sample_kclpy_app.py

# The name of an Amazon Kinesis stream to process.
streamName = StockTradeStream

# Used by the KCL as the name of this application. Will be used as the name
# of an Amazon DynamoDB table which will store the lease and checkpoint
# information for workers with this application name
applicationName = StockTradesProcessor

# Users can change the credentials provider the KCL will use to retrieve credentials.
# The DefaultAWSCredentialsProviderChain checks several other providers, which is
# described here:
# http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/DefaultAWSCredentialsProviderChain.html
#AWSCredentialsProvider = DefaultAWSCredentialsProviderChain
AWSCredentialsProvider = STSAssumeRoleSessionCredentialsProvider|arn:aws:iam::Account-A-AccountNumber:role/kds-stock-trade-stream-role|kinesiscrossaccount
AWSCredentialsProviderDynamoDB = DefaultAWSCredentialsProviderChain
AWSCredentialsProviderCloudWatch = DefaultAWSCredentialsProviderChain

# Appended to the user agent of the KCL. Does not impact the functionality of the
# KCL in any other way.
processingLanguage = python/2.7

# Valid options are TRIM_HORIZON or LATEST.
# See http://docs.aws.amazon.com/kinesis/latest/APIReference/API_GetShardIterator.html#API_GetShardIterator_RequestSyntax
initialPositionInStream = LATEST

# The following properties are also available for configuring the KCL Worker that is created
# by the MultiLangDaemon.

# The KCL defaults to us-east-1
#regionName = us-east-1

In the above step, you will have to replace Account-A-AccountNumber with the AWS account number of the account that has the Kinesis data stream.

We use the STSAssumeRoleSessionCredentialsProvider class and pass to it the role created in Account A, which has permissions to access the Kinesis data stream. This gives the KCL application in Account B permissions to read the Kinesis data stream in Account A. The DynamoDB lease table and the CloudWatch metrics are in Account B. Hence, we can use the DefaultAWSCredentialsProviderChain for AWSCredentialsProviderDynamoDB and AWSCredentialsProviderCloudWatch in the properties file. You can now save the sample.properties file.
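To make the shape of the AWSCredentialsProvider value easier to see, here is a small Python sketch that reads key = value lines and splits the pipe-delimited provider entry into its provider name, role ARN, and session name. It only illustrates the format used in the properties file; it is not how the MultiLangDaemon parses its configuration. The helper names are hypothetical, and the account number 111122223333 used when exercising it is a placeholder.

```python
from typing import Dict, NamedTuple, Optional


class ProviderSpec(NamedTuple):
    provider: str
    role_arn: Optional[str]
    session_name: Optional[str]


def load_properties(text: str) -> Dict[str, str]:
    """Reads 'key = value' lines, skipping blank lines and '#' comments."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def parse_provider(value: str) -> ProviderSpec:
    """Splits 'Provider|roleArn|sessionName'; the two arguments are optional."""
    parts = [part.strip() for part in value.split("|")]
    role_arn = parts[1] if len(parts) > 1 else None
    session_name = parts[2] if len(parts) > 2 else None
    return ProviderSpec(parts[0], role_arn, session_name)
```

Applied to the snippet above, the Kinesis provider entry yields a role ARN and the session name kinesiscrossaccount, while the DynamoDB and CloudWatch entries yield only a provider name, which matches the split between Account A and Account B described in this step.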

E. Next, we will change the application code to print the data read from the Kinesis data stream to standard output (STDOUT). Edit the sample_kclpy_app.py under the samples directory. You will add all your application code logic in the process_record method. This method is called for every record in the Kinesis data stream. For this blog post, we will add a single line to the method to print the records to STDOUT, as shown in Figure 5.

Figure 5. Add custom code to process_record method

F. Save the file, and run the following command to build the project with the changes you just made.

cd amazon-kinesis-client-python/
python setup.py install

G. Now you are ready to run the application. To start the KCL application, run the following command from the amazon-kinesis-client-python directory.

`amazon_kclpy_helper.py --print_command --java /usr/bin/java --properties samples/sample.properties`

This will start the application. Your application is now ready to read the data from the Kinesis data stream in another account and display the contents of the stream on STDOUT. When the producer starts writing data to the Kinesis data stream in Account A, you will start seeing those results being printed.

Clean Up

Once you are done testing the cross-account access, make sure you clean up your environment to avoid incurring cost. As part of the cleanup, we recommend you delete the Kinesis data stream (StockTradeStream), the EC2 instances that the KCL application is running on, and the DynamoDB table that was created by the KCL application.

Conclusion

In this blog post, we discussed the techniques to configure your KCL applications written in Java and Python to access a Kinesis data stream in a different AWS account. We also provided sample code and configurations which you can modify and use in your application code to set up the cross-account access. Now you can continue to build a multi-account strategy on AWS, while being able to easily access your Kinesis data streams from applications in multiple AWS accounts.

Building a serverless GIF generator with AWS Lambda: Part 2

2021-09-13 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-a-serverless-gif-generator-with-aws-lambda-part-2/

In part 1 of this blog post, I explain how a GIF generation service can support a front-end application for video streaming. I compare the performance of a server-based and serverless approach and show how parallelization can significantly improve processing time. I introduce an example application and I walk through the solution architecture.

In this post, I explain the scaling behavior of the example application and consider alternative approaches. I also look at how to manage memory, temporary space, and files in this type of workload. Finally, I discuss the cost of this approach and how to determine if a workload can use parallelization.

To set up the example, visit the GitHub repo and follow the instructions in the README.md file. The example application uses the AWS Serverless Application Model (AWS SAM), enabling you to deploy the application more easily in your own AWS account. This walkthrough creates some resources covered in the AWS Free Tier but others incur cost.

Scaling up the AWS Lambda workers with Amazon EventBridge

There are two AWS Lambda functions in the example application. The first detects the length of the source video and then generates batches of events containing start and end times. These events are put onto the Amazon EventBridge default event bus.

An EventBridge rule matches the events and invokes the second Lambda function. This second function receives the events, which have the following structure:

{
    "version": "0",
    "id": "06a1596a-1234-1234-1234-abc1234567",
    "detail-type": "newVideoCreated",
    "source": "custom.gifGenerator",
    "account": "123456789012",
    "time": "2021-06-17T11:36:38Z",
    "region": "us-east-1",
    "resources": [],
    "detail": {
        "key": "long.mp4",
        "start": 2250,
        "end": 2279,
        "length": 3294.024,
        "tsCreated": 1623929798333
    }
}

The detail attribute contains the unique start and end time for the slice of work. Each Lambda invocation receives a different start and end time and works on a 30-second snippet of the whole video. The function then uses FFMPEG to download the original video from the source Amazon S3 bucket and perform the processing for its allocated time slice.
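As a rough illustration of how a video length could be turned into these slices, the following Python sketch builds one detail payload per 30-second block and groups the payloads into batches. The function names are hypothetical and the slicing rule (end = start + 29, clipped to the last whole second) is an assumption inferred from the sample event above, not the example application's actual code. The batch size of 10 reflects the maximum number of entries that a single EventBridge PutEvents request accepts.

```python
import math
from typing import Dict, Iterator, List


def build_slices(length_seconds: float, slice_seconds: int = 30) -> List[Dict]:
    """Returns one payload per slice, with inclusive start and end seconds."""
    last_second = math.floor(length_seconds)
    slices = []
    for start in range(0, last_second + 1, slice_seconds):
        end = min(start + slice_seconds - 1, last_second)  # clip the final slice
        slices.append({"start": start, "end": end, "length": length_seconds})
    return slices


def batch(items: List[Dict], size: int = 10) -> Iterator[List[Dict]]:
    """Yields consecutive groups of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


if __name__ == "__main__":
    slices = build_slices(3294.024)
    print(len(slices), "slices in", len(list(batch(slices))), "batches")
```

For the 3294.024-second video in the sample event, this produces a slice with start 2250 and end 2279, matching the payload shown above.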

The EventBridge rule matches events and invokes the target Lambda function asynchronously. The Lambda service scales up the number of execution environments in response to the number of events.

The first function produces batches of events almost simultaneously, but the worker function takes several seconds to process a single request. If there is no existing environment available to handle the request, Lambda scales up to process the work. As a result, you often see a high level of concurrency when running this application, which is how parallelization is achieved.

Lambda continues to scale up until it reaches the initial burst concurrency quotas in the current AWS Region. These quotas are between 500 and 3000 execution environments per minute initially. After the initial burst, concurrency scales by an additional 500 instances per minute.

If the number of events is higher, Lambda responds to EventBridge with a throttling error. The EventBridge service retries the events with exponential backoff for 24 hours. Once Lambda is scaled sufficiently or existing execution environments become available, the events are then processed.
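To give a feel for what retrying with exponential backoff inside a 24-hour window looks like, here is a generic Python sketch. The base delay, growth factor, and one-hour cap are assumptions picked for illustration; the post does not state the intervals EventBridge actually uses, so treat this as a model of the pattern rather than of the service.

```python
from typing import Iterator


def backoff_schedule(base_seconds: float = 1.0,
                     factor: float = 2.0,
                     cap_seconds: float = 3600.0,
                     window_seconds: float = 24 * 3600) -> Iterator[float]:
    """Yields retry delays until the cumulative wait would exceed the retry window."""
    elapsed = 0.0
    delay = base_seconds
    while elapsed + delay <= window_seconds:
        yield delay
        elapsed += delay
        # Grow the delay geometrically, but never beyond the cap.
        delay = min(delay * factor, cap_seconds)


if __name__ == "__main__":
    delays = list(backoff_schedule())
    print(len(delays), "retries over", sum(delays), "seconds")
```

The early retries arrive quickly, which helps when Lambda only needs a few seconds to scale, while the later, widely spaced retries are the ones that add noticeable latency under sustained throttling.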

This means that under exceptional levels of heavy load, this retry pattern adds latency to the overall GIF generation task. To manage this, you can use Provisioned Concurrency to ensure that more execution environments are available during periods of very high load.

Alternative ways to scale the Lambda workers

The asynchronous invocation mode for Lambda allows you to scale up worker Lambda functions quickly. This is the mode used by EventBridge when Lambda functions are defined as targets in rules. The other benefit of using EventBridge to decouple the two functions in this example is extensibility. Currently, the events have only a single consumer. However, you can add new capabilities to this application by building new event consumers, without changing the producer logic. Note that using EventBridge in this architecture costs $1 per million events put onto the bus (this cost varies by Region). Delivery to targets in EventBridge is free.

This design could similarly use Amazon SNS, which also invokes consuming Lambda functions asynchronously. This costs $0.50 per million messages and delivery to Lambda functions is free (this cost varies by Region). Depending on whether you need EventBridge capabilities, SNS may be a better choice for decoupling the two Lambda functions.

Alternatively, the first Lambda function could invoke the second function by using the invoke method of the Lambda API. By using the AWS SDK for JavaScript, one Lambda function can invoke another directly from the handler code. When the InvocationType is set to ‘Event’, this invocation occurs asynchronously. That means that the calling function does not wait for the target function to finish before continuing.
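The post describes this with the AWS SDK for JavaScript. As a minimal sketch of the same idea in Python, the helper below calls the Lambda invoke API through a boto3-style client with InvocationType set to Event. The helper name and the function name gif-worker in the usage comment are hypothetical, and the client is passed in as a parameter so the sketch does not need AWS credentials to load.

```python
import json
from typing import Any, Dict


def invoke_worker_async(lambda_client: Any, function_name: str, detail: Dict) -> int:
    """Queues one asynchronous invocation and returns the HTTP status code."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="Event",  # asynchronous: the caller does not wait for the result
        Payload=json.dumps(detail).encode("utf-8"),
    )
    # Lambda answers 202 when an asynchronous invocation has been accepted.
    return response["StatusCode"]


# Example wiring (not run here; needs the boto3 package and AWS credentials):
#   import boto3
#   status = invoke_worker_async(
#       boto3.client("lambda"),
#       "gif-worker",  # hypothetical function name
#       {"key": "long.mp4", "start": 2250, "end": 2279},
#   )
```

Because the call returns as soon as the event is queued, the first function can loop over all slices without waiting for any GIF to be generated.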

This direct integration between two Lambda functions is the lowest latency alternative. However, this limits the extensibility of the solution in the future without modifying code.

Managing memory, temp space, and files

You can configure the memory for a Lambda function up to 10,240 MB. However, the temporary storage available in /tmp is always 512 MB, regardless of memory. Increasing the memory allocation proportionally increases the amount of virtual CPU and network bandwidth available to the function. To learn more about how this works in detail, watch Optimizing Lambda performance for your serverless applications.

The original video files used in this workload may be several gigabytes in size. Since these may be larger than the /tmp space available, the code is designed to keep the movie file in memory. As a result, this solution works for any length of movie that can fit into the 10 GB memory limit.

The FFMPEG application expects to work with local file systems and is not designed to work with object stores like Amazon S3. It can also read video files from HTTP endpoints, so the example application loads the S3 object over HTTPS instead of downloading the file and using the /tmp space. To achieve this, the code uses the getSignedUrl method of the S3 class in the SDK:

// Configure S3
const AWS = require('aws-sdk')
AWS.config.update({ region: process.env.AWS_REGION })
const s3 = new AWS.S3({ apiVersion: '2006-03-01' })

// Get signed URL for source object
const params = {
    Bucket: record.s3.bucket.name,
    Key: record.s3.object.key,
    Expires: 300
}
const url = s3.getSignedUrl('getObject', params)

The resulting URL contains credentials to download the S3 object over HTTPS. The Expires attribute in the parameters determines how long the credentials are valid for, in seconds. The Lambda function calling this method must have appropriate IAM permissions for the target S3 bucket.

The GIF generation Lambda function stores the output GIF and JPG in the /tmp storage space. Since the function can be reused by subsequent invocations, it’s important to delete these temporary files before each invocation ends. This prevents the function from using all of the /tmp space available. This is handled by the tmpCleanup function:

const fs = require('fs')
const path = require('path')
const directory = '/tmp/'

// Deletes all files in a directory and resolves once every deletion has finished
const tmpCleanup = async () => {
    console.log('Starting tmpCleanup')
    const files = await fs.promises.readdir(directory)

    console.log('Deleting: ', files)
    // Wait for all deletions so that awaiting tmpCleanup() guarantees /tmp is empty
    await Promise.all(
        files.map(file => fs.promises.unlink(path.join(directory, file)))
    )
}

When the GenerateFrames parameter is set to true in the AWS SAM template, the worker function generates one frame per second of video. For longer videos, this results in a significant number of files. Since one of the dimensions of S3 pricing is the number of PUTs, this function increases the cost of the workload when using S3.

For applications that are handling large numbers of small files, it can be more cost effective to use Amazon EFS and mount the file system to the Lambda function. EFS charges based upon data storage and throughput, instead of number of files. To learn more about using EFS with Lambda, read this Compute Blog post.

Calculating the cost of the worker Lambda function

While parallelizing Lambda functions significantly reduces the overall processing time in this case, it’s also important to calculate the cost. To process the 3-hour video example in part 1, the function uses 345 invocations with 4096 MB of memory. Each invocation has an average duration of 4,311 ms.

Using the AWS Pricing Calculator, and ignoring the AWS Free Tier allowance, the cost to process this video is approximately $0.10.
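The arithmetic behind this estimate can be reproduced with a short Python sketch. The invocation count, memory size, and average duration come from the figures above. The two unit prices are assumptions for this example ($0.0000166667 per GB-second and $0.20 per million requests); check the current Lambda pricing for your Region before relying on them.

```python
INVOCATIONS = 345
MEMORY_GB = 4096 / 1024          # 4096 MB expressed in GB
AVG_DURATION_SECONDS = 4.311     # 4,311 ms

# Assumed unit prices for illustration; actual prices vary by Region and architecture.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_REQUEST = 0.20 / 1_000_000


def lambda_cost(invocations: int, memory_gb: float, duration_seconds: float) -> dict:
    """Returns the GB-seconds consumed and the resulting compute, request, and total cost."""
    gb_seconds = invocations * duration_seconds * memory_gb
    compute = gb_seconds * PRICE_PER_GB_SECOND
    requests = invocations * PRICE_PER_REQUEST
    return {
        "gb_seconds": gb_seconds,
        "compute": compute,
        "requests": requests,
        "total": compute + requests,
    }


if __name__ == "__main__":
    cost = lambda_cost(INVOCATIONS, MEMORY_GB, AVG_DURATION_SECONDS)
    print(f"{cost['gb_seconds']:.2f} GB-seconds, total ${cost['total']:.4f}")
```

Under these assumptions, almost all of the cost comes from the roughly 5,949 GB-seconds of compute, and the per-request charge is negligible at this volume.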

There are additional charges for other services used in the example application, such as EventBridge and S3. However, in terms of compute cost, this may compare favorably with server-based alternatives that you may have to scale manually depending on traffic. The exact cost depends upon your implementation and latency needs.

Deciding if a workload can be parallelized

The GIF generation workload is a good candidate for parallelization. This is because each 30-second block of work is independent and there is no strict ordering requirement. The end result is not impacted by the order that the GIFs are generated in. Each GIF also takes several seconds to generate, which is why the time saving comparison with the sequential, server-based approach is so significant.

Not all workloads can be parallelized and in many cases the work duration may be much shorter. This workload interacts with S3, which can scale to any level of read or write traffic created by the worker functions. You may use other downstream services that cannot scale this way, which may limit the amount of parallel processing you can use.

To learn more about designing and operating Lambda-based applications, read the Lambda Operator Guide.

Conclusion

Part 2 of this blog post expands on some of the advanced topics around scaling Lambda in parallelized workloads. It explains how the asynchronous invocation mode of Lambda scales and different ways to scale the worker Lambda function.

I cover how the example application manages memory, files, and temporary storage space. I also explain how to calculate the compute cost of using this approach, and how to decide whether you can use parallelization in a workload.

For more serverless learning resources, visit Serverless Land.

Field Notes: Automate Disaster Recovery for AWS Workloads with Druva

2021-09-03 Girish Chanchlani

Post Syndicated from Girish Chanchlani original https://aws.amazon.com/blogs/architecture/field-notes-automate-disaster-recovery-for-aws-workloads-with-druva/

This post was co-written by Akshay Panchmukh, Product Manager, Druva and Girish Chanchlani, Sr Partner Solutions Architect, AWS.

The Uptime Institute’s Annual Outage Analysis 2021 report estimated that 40% of outages or service interruptions in businesses cost between $100,000 and $1 million, while about 17% cost more than $1 million. To guard against this, it is critical for you to have a sound data protection and disaster recovery (DR) strategy to minimize the impact on your business. With the greater adoption of the public cloud, most companies are either following a hybrid model with critical workloads spread across on-premises data centers and the cloud or are all in the cloud.

In this blog post, we focus on how Druva, a SaaS based data protection solution provider, can help you implement a DR strategy for your workloads running in Amazon Web Services (AWS). We explain how to set up protection of AWS workloads running in one AWS account, and fail them over to another AWS account or Region, thereby minimizing the impact of disruption to your business.

Overview of the architecture

In the following architecture, we describe how you can protect your AWS workloads from outages and disasters. You can quickly set up a DR plan using Druva’s simple and intuitive user interface, and within minutes you are ready to protect your AWS infrastructure.

Figure 1. Druva architecture

Druva’s cloud DR is built on AWS using native services to provide a secure operating environment for comprehensive backup and DR operations. With Druva, you can:

Seamlessly create cross-account DR sites based on source sites by cloning Amazon Virtual Private Clouds (Amazon VPCs) and their dependents.
Set up backup policies to automatically create and copy snapshots of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Relational Database Service (Amazon RDS) instances to DR Regions based on recovery point objective (RPO) requirements.
Set up service level objective (SLO) based DR plans with options to schedule automated tests of DR plans and ensure compliance.
Monitor implementation of DR plans easily from the Druva console.
Generate compliance reports for DR failover and test initiation.

Other notable features include support for automated runbook initiation, selection of target AWS instance types for DR, and simplified orchestration and testing to help protect and recover data at scale. Druva provides the flexibility to adopt evolving infrastructure across geographic locations, adhere to regulatory requirements (such as GDPR and CCPA), and recover workloads quickly following disasters, helping meet your business-critical recovery time objectives (RTOs). This unified solution supports taking snapshots as frequently as every five minutes, improving RPOs. Because it is a software as a service (SaaS) solution, Druva helps lower costs by eliminating traditional administration and maintenance of storage hardware and software, upgrades, patches, and integrations.

We will show you how to set up Druva to protect your AWS workloads and automate DR.

Step 1: Log into the Druva platform and provide access to AWS accounts

After you sign in to the Druva Cloud Platform, you will need to grant Druva third-party access to your AWS account by pressing the Add New Account button and following the steps shown in Figure 2.

Figure 2. Add AWS account information

Druva uses AWS Identity and Access Management (IAM) roles to access and manage your AWS workloads. To help you with this, Druva provides an AWS CloudFormation template to create a stack or stack set that generates the following:

IAM role
IAM instance profile
IAM policy

The generated Amazon Resource Name (ARN) of the IAM role is then linked to Druva so that it can run backup and DR jobs for your AWS workloads. Note that Druva follows all security protocols and best practices recommended by AWS. All access permissions to your AWS resources and Regions are controlled by IAM.

After you have logged into Druva and set up your account, you can now set up DR for your AWS workloads. The following steps will allow you to set up DR for AWS infrastructure.

Step 2: Identify the source environment

A source environment refers to a logical grouping of Amazon VPCs, subnets, security groups, and other infrastructure components required to run your application.

Figure 3. Create a source environment

In this step, create your source environment by selecting the appropriate AWS resources you’d like to set up for failover. Druva currently supports Amazon EC2 and Amazon RDS as sources that can be protected. With Druva’s automated DR, you can fail over these resources to a secondary site with the press of a button.

Figure 4. Add resources to a source environment

Note that creating a source environment does not create or update existing resources or configurations in your AWS account. It only saves this configuration information with Druva’s service.

Step 3: Clone the environment

The next step is to clone the source environment to a Region where you want to fail over in case of a disaster. Druva supports cloning the source environment to another Region or AWS account that you have selected. Cloning essentially replicates the source infrastructure in the target Region or account, which allows the resources to be failed over quickly and seamlessly.

Figure 5. Clone the source environment

Step 4: Set up a backup policy

You can create a new backup policy or use an existing backup policy to create backups in the cloned or target Region. This enables Druva to restore instances using the backup copies.

Figure 6. Create a new backup policy or use an existing backup policy

Figure 7. Customize the frequency of backup schedules

Step 5: Create the DR plan

A DR plan is a structured set of instructions designed to recover resources in the event of a failure or disaster. DR aims to get you back to the production-ready setup with minimal downtime. Follow these steps to create your DR plan.

Create DR Plan: Press the Create Disaster Recovery Plan button to open the DR plan creation page.

Figure 8. Create a disaster recovery plan

Name: Enter the name of the DR plan.
Service Level Objective (SLO): Enter your RPO and RTO.

- Recovery Point Objective – Example: If you set your RPO as 24 hours, and your backup was scheduled daily at 8:00 PM, but a disaster occurred at 7:59 PM, you would be able to recover data that was backed up on the previous day at 8:00 PM. You would lose the data generated after the last backup (24 hours of data loss).
- Recovery Time Objective – Example: If you set your RTO as 30 hours, when a disaster occurred, you would be able to recover all critical IT services within 30 hours from the point in time the disaster occurs.
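The RPO example above comes down to simple date arithmetic. The following Python sketch is an illustration only and is not part of Druva's product or API: the function names and the calendar dates are assumptions chosen to mirror the 8:00 PM backup and 7:59 PM disaster in the example.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, disaster: datetime) -> timedelta:
    """Return the span of time whose data is lost when a disaster strikes."""
    if disaster < last_backup:
        raise ValueError("disaster time precedes the last backup")
    return disaster - last_backup


def meets_rpo(last_backup: datetime, disaster: datetime, rpo: timedelta) -> bool:
    """Check whether the data loss stays within the recovery point objective."""
    return data_loss_window(last_backup, disaster) <= rpo


# Hypothetical dates: daily backup at 8:00 PM, disaster at 7:59 PM the next day.
last_backup = datetime(2021, 9, 1, 20, 0)
disaster = datetime(2021, 9, 2, 19, 59)

loss = data_loss_window(last_backup, disaster)
print(loss)  # 23:59:00
print(meets_rpo(last_backup, disaster, timedelta(hours=24)))  # True
```

With a daily schedule, the worst case is just under 24 hours of lost data, which is why the backup frequency has to be at least as tight as the RPO you enter.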

Figure 9. Add your RTO and RPO requirements

Create your plan based off the source environment, target environment, and resources.

Environments
Source Account	By default, this is the Druva account in which you are currently creating the DR plan.
Source Environment	Select the source environment applicable within the Source Account (your Druva account in which you’re creating the DR plan).
Target Account	Select the same or a different target account.
Target Environment	Select the Target Environment, applicable within the Target Account.

Resources
Create Policy	If you do not have a backup policy, then you can create one.
Add Resources	Add resources from the source environment that you want to restore. Make sure the verification column shows a ‘Valid Backup Policy’ status. This ensures that the backup policy is frequently creating copies as per the RPO defined previously.

Figure 10. Create DR plan based off the source environment, target environment, and resources

Identify target instance type, test plan instance type, and run books for this DR plan.

Target Instance Type

You can select a target instance type. If no instance type is selected, choose one of the following fallback behaviors:

Select an instance type which is large in size.
Select an instance type which is smaller.
Fail the creation of the instance.

Test Plan Instance Type

There are many options. To reduce cost, you can select a smaller instance type from the available AWS instance types.

Run Books

Select this option if you would like to shut down the source server after failover occurs.

Figure 11. Identify target instance type, test plan instance type, and run books

Step 6: Test the DR plan

After you have defined your DR plan, it is time to test it so that you can—when necessary—initiate a failover of resources to the target Region. You can now easily try this on the resources in the cloned environment without affecting your production environment.

Figure 12. Test the DR plan

Testing your DR plan will help you answer questions such as: How long did the recovery take? Did I meet my RTO and RPO objectives?

Step 7: Initiate the DR plan

After you have successfully tested the DR plan, it can easily be initiated with the click of a button to fail over your resources from the source Region or account to the target Region or account.

Conclusion

With the growth of cloud-based services, businesses need to ensure that mission-critical applications that power their businesses are always available. Any loss of service has a direct impact on the bottom line, which makes business continuity planning a critical element to any organization. Druva offers a simple DR solution which will help you keep your business always available.

Druva provides unified backup and cloud DR with no hardware or software to manage, which reduces cost and complexity. It helps automate DR processes so that your teams are prepared for potential disasters while meeting compliance and auditing requirements.

With Druva, you can validate your RTO and RPO with automated, regular DR testing, use cross-account DR to protect against attacks and accidental deletions, and keep backups isolated from your primary production account for DR planning. With cross-Region DR, you can duplicate the entire Amazon VPC environment across Regions to protect against Region-wide failures. Druva is built to protect your native AWS workloads from disasters.

To learn more, visit: https://www.druva.com/use-cases/aws-cloud-backup/

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Building a serverless distributed application using a saga orchestration pattern

2021-09-02 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-a-serverless-distributed-application-using-a-saga-orchestration-pattern/

This post is written by Anitha Deenadayalan, Developer Specialist SA, DevAx (Developer Acceleration).

This post shows how to use the saga design pattern to preserve data integrity in distributed transactions across multiple services. In a distributed transaction, multiple services can be called before a transaction is completed. When the services store data in different data stores, it can be challenging to maintain data consistency across these data stores.

To maintain consistency in a transaction, relational databases provide two-phase commit (2PC). This consists of a prepare phase and a commit phase. In the prepare phase, the coordinating process requests the transaction’s participating processes (participants) to promise to commit or roll back the transaction. In the commit phase, the coordinating process requests the participants to commit the transaction. If the participants cannot agree to commit in the prepare phase, then the transaction is rolled back.

In distributed systems architected with microservices, two-phase commit is not an option as the transaction is distributed across various databases. In this case, one solution is to use the saga pattern.

A saga consists of a sequence of local transactions. Each local transaction in the saga updates the database and triggers the next local transaction. If a transaction fails, then the saga runs compensating transactions to revert the database changes made by the preceding transactions.
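The idea of pairing each local transaction with a compensating transaction can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the implementation used later in this post: the `run_saga` helper, the step functions, and the `state` dictionary are all hypothetical names, and a dictionary stands in for the separate databases.

```python
from typing import Callable, List, Tuple

# A step is (name, action, compensation); both callables mutate shared state.
Step = Tuple[str, Callable[[dict], None], Callable[[dict], None]]


def run_saga(steps: List[Step], state: dict) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action(state)
            completed.append(step)
        except Exception as exc:
            state["log"].append(f"{name} failed: {exc}")
            for done_name, _, undo in reversed(completed):
                undo(state)
                state["log"].append(f"compensated {done_name}")
            return False
    return True


def place_order(state: dict) -> None:
    state["orders"].append("order-1")


def remove_order(state: dict) -> None:
    state["orders"].remove("order-1")


def update_inventory(state: dict) -> None:
    if state["stock"] < 1:
        raise RuntimeError("out of stock")
    state["stock"] -= 1


def revert_inventory(state: dict) -> None:
    state["stock"] += 1


# Stock is zero, so the second step fails and the order is removed again.
state = {"orders": [], "stock": 0, "log": []}
ok = run_saga(
    [
        ("place_order", place_order, remove_order),
        ("update_inventory", update_inventory, revert_inventory),
    ],
    state,
)
print(ok, state["orders"], state["log"])
```

In this sketch a failed step is assumed to have made no change, so only the steps that completed before it are compensated. A real service may need to compensate the failing step as well if it can fail after a partial write.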

There are two types of implementations for the saga pattern: choreography and orchestration.

Saga choreography

The saga choreography pattern depends on the events emitted by the microservices. The saga participants (microservices) subscribe to the events and they act based on the event triggers. For example, the order service in the following diagram emits an OrderPlaced event. The inventory service subscribes to that event and updates the inventory when the OrderPlaced event is emitted. Similarly, the other participant services act based on the context of the emitted event.
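A toy publish/subscribe loop makes the choreography flow concrete. The sketch below is an assumption-laden illustration: the `EventBus` class is a hypothetical in-process stand-in for a real message broker, and only the OrderPlaced event name comes from the description above; the InventoryUpdated event and the payment handler are invented for the example.

```python
from collections import defaultdict


class EventBus:
    """Minimal in-process event bus; a real system would use a message broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in list(self._subscribers[event_type]):
            handler(payload)


bus = EventBus()
inventory = {"ITEM001": 1000}
payments = []


def on_order_placed(event):
    # Inventory service: reacts to OrderPlaced, then emits its own event.
    inventory[event["ItemId"]] -= 1
    bus.publish("InventoryUpdated", event)


def on_inventory_updated(event):
    # Payment service: reacts to the (hypothetical) InventoryUpdated event.
    payments.append(event["OrderId"])


bus.subscribe("OrderPlaced", on_order_placed)
bus.subscribe("InventoryUpdated", on_inventory_updated)

# Order service: emits the event that starts the chain.
bus.publish("OrderPlaced", {"OrderId": "order-1", "ItemId": "ITEM001"})
print(inventory, payments)
```

No component knows the whole sequence here. Each service only knows which events it listens for and which it emits, which is the main contrast with the orchestration pattern described next.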

Saga orchestration

The saga orchestration pattern has a central coordinator called the orchestrator. The saga orchestrator manages and coordinates the entire transaction lifecycle. It is aware of the series of steps to be performed to complete the transaction. To run a step, it sends a message to the participant microservice to perform the operation. The participant microservice completes the operation and sends a message to the orchestrator. Based on the received message, the orchestrator decides which microservice to run next in the transaction.

You can use AWS Step Functions to implement the saga orchestration when the transaction is distributed across multiple databases.

Overview

This example uses a Step Functions workflow to implement the saga orchestration pattern, using the following architecture:

When a customer calls the API, API Gateway invokes a Lambda function, which performs pre-processing. The function then starts the Step Functions workflow to process the distributed transaction.

The Step Functions workflow calls the individual services for order placement, inventory update, and payment processing to complete the transaction. It sends an event notification for further processing. The Step Functions workflow acts as the orchestrator to coordinate the transactions. If there is any error in the workflow, the orchestrator runs the compensatory transactions to ensure that the data integrity is maintained across various services.

When pre-processing is not required, you can also trigger the Step Functions workflow directly from API Gateway without the Lambda function.

The Step Functions workflow

The following diagram shows the steps that are run inside the Step Functions workflow. The green boxes show the steps that are run successfully. The order is placed, inventory is updated, and payment is processed before a Success state is returned to the caller.

The orange boxes indicate the compensatory transactions that are run when any one of the steps in the workflow fails. If the workflow fails at the Update inventory step, then the orchestrator calls the Revert inventory and Remove order steps before returning a Fail state to the caller. These compensatory transactions ensure that the data integrity is maintained. The inventory reverts to its original levels and the order is removed.

The preceding workflow is an example of a distributed transaction. The transaction data is stored across different databases and each service writes to its own database.

Prerequisites

For this walkthrough, you need:

An AWS account
An AWS user with AdministratorAccess (see the instructions on the AWS Identity and Access Management (IAM) console)
Access to the following AWS services: Amazon API Gateway, AWS Lambda, AWS Step Functions, and Amazon DynamoDB.
Node.js installed
.NET Core 3.1 SDK installed
JetBrains Rider or Microsoft Visual Studio 2017 or later (or Visual Studio Code)
Postman to make the API call

Setting up the environment

For this walkthrough, use the AWS CDK code in the GitHub repository to create the AWS resources. These include IAM roles, a REST API in API Gateway, DynamoDB tables, the Step Functions workflow, and Lambda functions.

You need an AWS access key ID and secret access key for configuring the AWS Command Line Interface (AWS CLI). To learn more about configuring the AWS CLI, follow these instructions.
Clone the repo:
```
git clone https://github.com/aws-samples/saga-orchestration-netcore-blog
```
After cloning, this is the directory structure:
The Lambda functions in the saga-orchestration directory must be packaged and copied to the cdk-saga-orchestration/lambdas directory before deployment. The dotnet lambda command is provided by the Amazon.Lambda.Tools .NET global tool. Run these commands to process the PlaceOrderLambda function:
```
cd PlaceOrderLambda/src/PlaceOrderLambda 
dotnet lambda package
cp bin/Release/netcoreapp3.1/PlaceOrderLambda.zip ../../../../cdk-saga-orchestration/lambdas
```
Repeat the same commands for all the Lambda functions in the saga-orchestration directory.

Build the CDK code before deploying:

```
cd cdk-saga-orchestration/src/CdkSagaOrchestration
dotnet build
```

Install the aws-cdk package:
```
npm install -g aws-cdk 
```
The cdk synth command causes the resources defined in the application to be translated into an AWS CloudFormation template. The cdk deploy command deploys the stacks into your AWS account. Run:
```
cd cdk-saga-orchestration
cdk synth 
cdk deploy
```
CDK deploys the environment to AWS. You can monitor the progress using the CloudFormation console. The stack name is CdkSagaOrchestrationStack:

The Step Functions configuration

The CDK creates the Step Functions workflow, DistributedTransactionOrchestrator. The following snippet defines the workflow with AWS CDK for .NET:

var stepDefinition = placeOrderTask
    .Next(new Choice(this, "Is order placed")
        .When(Condition.StringEquals("$.Status", "ORDER_PLACED"), updateInventoryTask
            .Next(new Choice(this, "Is inventory updated")
                .When(Condition.StringEquals("$.Status", "INVENTORY_UPDATED"),
                    makePaymentTask.Next(new Choice(this, "Is payment success")
                        .When(Condition.StringEquals("$.Status", "PAYMENT_COMPLETED"), successState)
                        .When(Condition.StringEquals("$.Status", "ERROR"), revertPaymentTask)))
                .When(Condition.StringEquals("$.Status", "ERROR"), waitState)))
        .When(Condition.StringEquals("$.Status", "ERROR"), failState));

Compare the Amazon States Language definition for the state machine with the definition above. Also observe the inputs and outputs for each step and how the conditions have been configured. The steps with type Task call a Lambda function for the processing. The steps with type Choice are decision-making steps that define the workflow.
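For comparison, a Choice step in Amazon States Language is a JSON object with a list of rules, each testing a variable and naming the next state. The Python sketch below builds one such object for the "Is order placed" decision. It is an illustration of the general shape only: the `status_choice` helper and the target state names (`UpdateInventory`, `OrderFailed`) are assumptions, and the definition that CDK actually generates will use its own state names and may include additional fields.

```python
import json


def status_choice(routes):
    """Build a Choice state that branches on the $.Status field."""
    return {
        "Type": "Choice",
        "Choices": [
            {"Variable": "$.Status", "StringEquals": status, "Next": next_state}
            for status, next_state in routes.items()
        ],
    }


# Target state names are placeholders, not the names CDK generates.
is_order_placed = status_choice(
    {
        "ORDER_PLACED": "UpdateInventory",
        "ERROR": "OrderFailed",
    }
)
print(json.dumps(is_order_placed, indent=2))
```

Each `Condition.StringEquals(...)` call in the CDK snippet corresponds to one entry in the `Choices` list, and the state passed as the second argument to `.When(...)` becomes that entry's `Next` target.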

Setting up the DynamoDB table

The Orders and Inventory DynamoDB tables are created using AWS CDK. The following snippet creates a DynamoDB table with AWS CDK for .NET:

var inventoryTable = new Table(this, "Inventory", new TableProps
{
    TableName = "Inventory",
    PartitionKey = new Attribute
    {
        Name = "ItemId",
        Type = AttributeType.STRING
    },
    RemovalPolicy = RemovalPolicy.DESTROY
});

Open the DynamoDB console and select the Inventory table.
Choose Create Item.

Select Text, paste the following contents, then choose Save.

{
  "ItemId": "ITEM001",
  "ItemName": "Soap",
  "ItemsInStock": 1000,
  "ItemStatus": ""
}

Create two more items in the Inventory table:

{
  "ItemId": "ITEM002",
  "ItemName": "Shampoo",
  "ItemsInStock": 500,
  "ItemStatus": ""
}

{
  "ItemId": "ITEM003",
  "ItemName": "Toothpaste",
  "ItemsInStock": 2000,
  "ItemStatus": ""
}

The Lambda functions UpdateInventoryLambda and RevertInventoryLambda decrement and increment the ItemsInStock attribute value, respectively. The Lambda functions PlaceOrderLambda and UpdateOrderLambda insert and delete items in the Orders table, respectively. These are invoked by the saga orchestration workflow.
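The stock adjustment these functions perform can be expressed as a single DynamoDB UpdateItem request. The sample repository implements its Lambda functions in .NET, so the Python sketch below is not the authors' code. It only builds the request parameters in the low-level client format, and the `build_stock_update` name and the choice to guard against negative stock with a condition expression are assumptions made for illustration.

```python
def build_stock_update(item_id: str, delta: int) -> dict:
    """Build UpdateItem parameters that change ItemsInStock by delta."""
    params = {
        "TableName": "Inventory",
        "Key": {"ItemId": {"S": item_id}},
        "UpdateExpression": "SET ItemsInStock = ItemsInStock + :delta",
        "ExpressionAttributeValues": {":delta": {"N": str(delta)}},
        "ReturnValues": "UPDATED_NEW",
    }
    if delta < 0:
        # Assumed safeguard: refuse to deduct more than is in stock.
        params["ConditionExpression"] = "ItemsInStock >= :needed"
        params["ExpressionAttributeValues"][":needed"] = {"N": str(-delta)}
    return params


deduct = build_stock_update("ITEM001", -1)  # what an update step would send
revert = build_stock_update("ITEM001", 1)   # what a compensating step would send
print(deduct["UpdateExpression"], deduct["ExpressionAttributeValues"])
```

With boto3 installed and credentials configured, the parameters could be passed to `boto3.client("dynamodb").update_item(**deduct)`. Doing the arithmetic inside the update expression keeps the change atomic, so concurrent orders do not overwrite each other's stock counts.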

Triggering the saga orchestration workflow

The API Gateway endpoint, SagaOrchestratorAPI, is created using AWS CDK. To invoke the endpoint:

From the API Gateway service page, select the SagaOrchestratorAPI:
Select Stages in the left menu panel:
Select the prod stage and copy the Invoke URL:
From Postman, open a new tab. Select POST in the dropdown and enter the copied URL in the textbox. Move to the Headers tab and add a new header with the key ‘Content-Type’ and value as ‘application/json’:

In the Body tab, enter the following input and choose Send.

{
  "ItemId": "ITEM001",
  "CustomerId": "ABC/002",
  "MessageId": "",
  "FailAtStage": "None"
}

You see the output:
Open the Step Functions console and view the execution. The graph inspector shows that the execution has completed successfully.
Check the items in the Orders and Inventory DynamoDB tables. You can see an item in the Orders table indicating that an order is placed. The ItemsInStock value in the Inventory table has been reduced.
To simulate the failure workflow in the saga orchestrator, send the following JSON as body in the Postman call. The FailAtStage parameter injects the failure in the workflow. Select Send in Postman after updating the Body:
```
{
  "ItemId": "ITEM002",
  "CustomerId": "DEF/002",
  "MessageId": "",
  "FailAtStage": "UpdateInventory"
}
```
Open the Step Functions console to see the execution.
While the workflow waits in the wait state, look at the items in the DynamoDB tables. A new item is added to the Orders table and the stock for Shampoo is reduced in the Inventory table.
Once the wait completes, the compensatory transaction steps are run:
In the graph inspector, select the Update Inventory step. On the right pane, click on the Step output tab. The status is ERROR, which changes the control flow to run the compensatory transactions.
Look at the items in the DynamoDB table again. The data is now back to a consistent state, as the compensatory transactions have run to preserve data integrity:

The Step Functions workflow implements the saga orchestration pattern. It performs the coordination across distributed services and runs the transactions. It also performs compensatory transactions to preserve the data integrity.
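The Postman calls used in this section can also be scripted. The following Python sketch builds the same POST request with the standard library. The `build_order_request` helper is a hypothetical name, and the URL is a placeholder that you would replace with the Invoke URL copied from the prod stage. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_order_request(invoke_url, item_id, customer_id, fail_at_stage="None"):
    """Build the POST request that triggers the saga workflow."""
    body = {
        "ItemId": item_id,
        "CustomerId": customer_id,
        "MessageId": "",
        "FailAtStage": fail_at_stage,
    }
    return urllib.request.Request(
        invoke_url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL: substitute the Invoke URL of your own deployment.
request = build_order_request(
    "https://example.execute-api.us-east-1.amazonaws.com/prod/",
    "ITEM002",
    "DEF/002",
    fail_at_stage="UpdateInventory",
)
print(request.get_method(), request.data.decode("utf-8"))
```

To send the request, pass it to `urllib.request.urlopen(request)` and read the response body. Leaving `fail_at_stage` at its default reproduces the successful run, while `"UpdateInventory"` reproduces the failure path.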

Cleaning up

To avoid incurring additional charges, clean up all the resources that have been created. Run the following command from a terminal window. This deletes all the resources that were created as part of this example.

```
cdk destroy
```

Conclusion

This post showed how to implement the saga orchestration pattern using API Gateway, Step Functions, Lambda, DynamoDB, and .NET Core 3.1. This can help maintain data integrity in distributed transactions across multiple services. Step Functions makes it easier to implement the orchestration in the saga pattern.

To learn more about developing microservices on AWS, refer to the whitepaper on microservices. To learn more about the features, refer to the AWS CDK Features page.

Field Notes: How to Deploy End-to-End CI/CD in the China Regions Using AWS CodePipeline

2021-09-01 Ashutosh Pateriya

Post Syndicated from Ashutosh Pateriya original https://aws.amazon.com/blogs/architecture/field-notes-how-to-deploy-end-to-end-ci-cd-in-the-china-regions-using-aws-codepipeline/

This post was co-authored by Ravi Intodia, Cloud Architect, Infosys Technologies Ltd, Nirmal Tomar, Principal Consultant, Infosys Technologies Ltd and Ashutosh Pateriya, Solution Architect, AWS.

Today’s businesses must contend with fast-changing competitive environments, expanding security needs, and scalability issues. Businesses must find a way to reconcile the need for operational stability with the need for quick product development. Continuous integration and continuous delivery (CI/CD) enables rapid software iterations while maintaining system stability and security.

With an increase in AWS Cloud and DevOps adoption, many organizations seek solutions which go beyond geographical boundaries. AWS CodePipeline, along with its related services, lets you integrate and deploy your solutions across multiple AWS accounts and Regions. However, it becomes more challenging when you want to deploy your application in multiple AWS Regions as well as in China, due to the unavailability of AWS CodePipeline in the Beijing and Ningxia Regions.

In this blog post, you will learn how to overcome the unique challenges when deploying applications across many parts of the world, including China. For this solution, we will use the power and flexibility of AWS CodeBuild to implement AWS Command Line Interface (AWS CLI) commands to perform custom actions that are not directly supported by CodePipeline or AWS CodeDeploy.

CodePipeline for multi-account and multi-Region deployment consists of the following components:

ArtifactStore and encryption keys – In the AWS account which hosts CodePipeline, there should be an Amazon Simple Storage Service (Amazon S3) bucket and an AWS Key Management Service (AWS KMS) key for each Region where resources need to be deployed.
CodeBuild and CodePipeline roles – In the AWS account which hosts CodePipeline there should be roles created that can be used or assumed by CodeBuild and CodePipeline projects for performing required actions.
Cross-account roles – In each AWS account where cross-account deployments are required, an AWS role with the necessary permissions must be created. The CodePipeline role of the deploying account must be allowed to assume this role for all required accounts. Cross-account roles will also have access to the required S3 buckets and AWS KMS keys for deploying accounts.

Figure 1. High-level solution for AWS Regions

Although the solution works for most Regions, we encounter challenges when we try to expand our current worldwide solutions into the China Regions.

The challenges are as follows:

Cross-account roles – Cross-account roles cannot be created between accounts in non-China Regions and the China Regions. This means that CodeDeploy will be unable to assume the target account role necessary to complete component deployment.
Availability of services – Services required to configure a cloud native CI/CD pipeline are unavailable in the China Regions.
Connectivity – There is no direct network connectivity available between the China Regions and other AWS Regions.
User management – User accounts in the China Regions are distinct from user accounts in other AWS Regions, and must be maintained independently.

Due to the lack of cross-account roles and the CodePipeline service, setting up a worldwide CI/CD pipeline that includes the China Regions is not automatically supported.

High-level solution

In the proposed solution, we build and deploy the application to both Regions using the AWS CI/CD services in the non-China Region. We create an access key in a China Region with permission to deploy the application using services such as AWS Lambda, AWS Elastic Beanstalk, Amazon Elastic Container Service, and Amazon Elastic Kubernetes Service. This access key is stored in the non-China account as an encrypted SSM parameter. When code is committed, CodePipeline in the non-China Region starts, builds the package, and deploys the application to both Regions from a single place.

Solution architecture

Figure 2. High-level solution for cross-account deployment from AWS Regions to a China Region

In this architecture, AWS CLI commands are used to set up an AWS profile on the CodeBuild instance with the China credentials (retrieved from the AWS Systems Manager Parameter Store). This enables the CodeBuild instance to run the AWS CloudFormation package and deploy commands directly against the China account, thereby deploying the required resources in the desired China Region.

This solution does not rely on any AWS CI/CD services, such as CodeDeploy, in the China Region. With this solution, we can create a complete CI/CD pipeline running in an AWS Region that can deploy an application in both Regions.
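The credential hand-off described above can be sketched in Python. This post performs the step with AWS CLI commands inside CodeBuild, so the code below is only an analogous illustration: the function names are hypothetical, the second parameter name is an assumption, and `FakeSsmClient` is a stand-in so that the sketch runs without AWS access. The `get_parameter` call mirrors the Systems Manager API, which returns the decrypted value under `Parameter.Value` when `WithDecryption` is set.

```python
def read_secure_parameter(ssm_client, name: str) -> str:
    """Fetch and decrypt one secure string parameter."""
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]


def load_china_credentials(ssm_client, access_key_param: str, secret_key_param: str) -> dict:
    """Collect the settings needed to talk to the China Region account."""
    return {
        "aws_access_key_id": read_secure_parameter(ssm_client, access_key_param),
        "aws_secret_access_key": read_secure_parameter(ssm_client, secret_key_param),
        "region_name": "cn-north-1",
    }


class FakeSsmClient:
    """Stand-in for an SSM client so the sketch runs without AWS access."""

    def __init__(self, values):
        self._values = values

    def get_parameter(self, Name, WithDecryption=False):
        return {"Parameter": {"Name": Name, "Value": self._values[Name]}}


# Parameter names and values are placeholders for illustration only.
fake_ssm = FakeSsmClient(
    {
        "/China/Dev/UserAccessKey": "AKIAEXAMPLE",
        "/China/Dev/UserSecretAccessKey": "example-secret",
    }
)
credentials = load_china_credentials(
    fake_ssm, "/China/Dev/UserAccessKey", "/China/Dev/UserSecretAccessKey"
)
print(sorted(credentials))
```

In a real build, the client would come from `boto3.client("ssm")` in the pipeline Region, and the returned dictionary could be passed to `boto3.Session(**credentials)` to create clients that act on the China account. Avoid printing or logging the secret values themselves.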

The following key components are needed for deployment:

AWS Identity and Access Management (IAM) user credentials – An IAM user needs to be created in the target account in China.
SSM parameters (secure string) – The China IAM user access key and secret access key need to be saved as secure string SSM parameters in the deployment AWS account.
Update CloudFormation templates – CloudFormation templates need to be updated to support China Region mappings (such as using “arn:aws-cn” instead of “arn:aws”).
Enhance CodeBuild to support build and deployment – CodeBuild buildspec.yml needs to be enhanced to perform build and deployment to China accounts, as mentioned in the following.
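The ARN difference mentioned in the list above follows from the AWS partition that a Region belongs to. The Python sketch below shows the mapping for illustration. The helper names and the account ID are hypothetical, and the prefix check is a simplification that covers the standard, China, and AWS GovCloud (US) partitions.

```python
def partition_for_region(region: str) -> str:
    """Return the ARN partition for a Region name."""
    if region.startswith("cn-"):
        return "aws-cn"
    if region.startswith("us-gov-"):
        return "aws-us-gov"
    return "aws"


def lambda_function_arn(region: str, account_id: str, function_name: str) -> str:
    """Compose a Lambda function ARN with the correct partition."""
    partition = partition_for_region(region)
    return f"arn:{partition}:lambda:{region}:{account_id}:function:{function_name}"


# The account ID and function name are placeholders.
print(lambda_function_arn("us-east-1", "123456789012", "demo"))
print(lambda_function_arn("cn-north-1", "123456789012", "demo"))
```

Inside a CloudFormation template, the `AWS::Partition` pseudo parameter resolves to the same value, so an ARN written as `!Sub "arn:${AWS::Partition}:..."` can work in both partitions without a separate mapping.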

Prerequisites

Two AWS accounts: One AWS account outside of China, and one account in China.
Practical experience in deploying Lambda functions using CodeBuild, CodeDeploy, and CodePipeline, and in using the AWS CLI. Because this example focuses specifically on extending CodePipeline from Regions outside of China to deploy in a China Region, we are not going to explore a standard CodePipeline setup.

Detailed Implementation

This solution is built using CodePipeline, CodeCommit, CodeBuild, AWS CloudFormation templates, and IAM.

Steps

One-time key generation in an account in China with necessary access to deploy application, including creation of one S3 bucket for CodeBuild artifacts.
Note: As a best practice, we suggest rotation of the access key every 30 days.
Complete the setup of CodePipeline to deploy the application in Regions outside of China as well as in the China Region.

As a demonstration, let’s deploy a Lambda function in us-east-1 and cn-north-1 and discuss the steps in detail. The same steps can be followed to deploy any other AWS service.

Part 1 – In the account based in China Region: cn-north-1

Create an S3 bucket with default encryption enabled for CodeBuild artifacts.

Create an IAM user (with programmatic access only) with the required permissions to deploy Lambda functions and related resources. The IAM user will also have access to the S3 bucket created for CodeBuild artifacts. To create an IAM policy, refer to the AWS IAM Policy resource.

Part 2 – In AWS account based in non-China Region: us-east-1

Create two SSM parameters of type secure string.

SSM Parameter Name	SSM Parameter Value
/China/Dev/UserAccessKey	<Value of China IAM User Access Key>
/China/Dev/UserSecretAccessKey	<Value of China IAM User Secret Access Key>

To create secure SSM parameters using the AWS CLI, refer to the Create a Systems Manager parameter (AWS CLI) tutorial.

To create secure SSM parameters using the AWS Management Console, refer to the Create a Systems Manager parameter (console) tutorial.

Note: Creating secure SSM parameters is not supported by CloudFormation templates. Also, as a security best practice, you should not have any sensitive information as part of CloudFormation templates to avoid any possible security breach.

Create an AWS KMS key for encrypting CodeBuild or CodePipeline artifacts (for cross-Region deployments, create AWS KMS key in all Regions, and create SSM parameters for each in the Region having CodePipeline).
Create artifacts S3 bucket for CodeBuild or CodePipeline artifacts.
Create CI/CD related roles. For CI/CD service roles, refer to:
https://docs.aws.amazon.com/codepipeline/latest/userguide/pipelines-create-service-role.html
https://docs.aws.amazon.com/codebuild/latest/userguide/setting-up.html#setting-up-service-role
Create a CodeCommit repository.
Create SSM parameters for the following.

SSM Parameter Name	SSM Parameter Value
/China/Dev/DeploymentS3Bucket	<Artifact Bucket Name in China Region>
/US/Dev/CodeBuildRole	<Role ARN of CodeBuild Service Role>
/US/Dev/CodePipelineRole	<Role ARN of CodePipeline Service Role>
/US/Dev/CloudformationRole	<Role ARN of CloudFormation Service Role>
/US/Dev/DeploymentS3Bucket	<Artifact Bucket Name in Pipeline Region>
/US/Dev/CodeBuildImage	<Code Build Image Details>
/US/Stage/CrossAccountStageRole	<Role ARN for Cross Account Service Role for Stage>
/US/Prod/CrossAccountStageRole	<Role ARN for Cross Account Service Role for Prod>

In CodeCommit, push the Lambda code and CloudFormation template for deploying Lambda resources (Lambda function, Lambda role, Lambda log group, and so forth).
In CodeCommit, push two buildspec yml files, one for us-east-1, and one for cn-north-1.
1. buildspec.yml: For us-east-1

# Buildspec Reference Doc: https://docs.aws.amazon.com/codebuild/latest/userguide/build-spec-ref.html
version: 0.2
phases:
  install:
    runtime-versions:
      python: 3.7
  pre_build:
    commands:
      - echo "[+] Updating PIP...."
      - pip install --upgrade pip
      - echo "[+] Installing dependencies...."
      #- Commands To Install required dependencies
      - yum install zip unzip -y -q
      - pip install awscli --upgrade
      
  build:
    commands:
      - echo "Starting build `date` in `pwd`"
      - echo "Starting SAM packaging `date` in `pwd`"
      - aws cloudformation package --template-file cloudformation_template.yaml --s3-bucket ${S3_BUCKET} --output-template-file transform-packaged.yaml
      # Additional package commands for cross-region deployments
      - echo "SAM packaging completed on `date`"
      - echo "Build completed `date` in `pwd`"
     
artifacts:
  type: zip
  files:
    - transform-packaged.yaml
    # - additional artifacts for cross-region deployments 

  discard-paths: yes

cache:
  paths:
    - '/root/.cache/pip'

2. buildspec-china.yml: For cn-north-1
  buildspec-china.yml is customized to perform both build and deployment. Refer to the following for details.

# Buildspec Reference Doc: https://docs.aws.amazon.com/codebuild/latest/userguide/build-spec-ref.html
version: 0.2    
phases:
  install:
    runtime-versions:
      python: 3.7

  pre_build:
    commands:
      - echo "[+] Updating PIP...."
      - pip install --upgrade pip
      - echo "[+] Installing dependencies...."
      #- Commands To Install required dependencies
      - yum install zip unzip -y -q
      - pip install awscli --upgrade
      # Setting China Region IAM User Profile
      - echo "Start setting User Profile  `date` in `pwd`"
      - USER_ACCESS_KEY=`aws ssm get-parameter --name ${USER_ACCESS_KEY_SSM} --with-decryption --query Parameter.Value --output text`
      - USER_SECRET_KEY=`aws ssm get-parameter --name ${USER_SECRET_KEY_SSM} --with-decryption --query Parameter.Value --output text`
      - aws configure --profile china set aws_access_key_id ${USER_ACCESS_KEY}
      - aws configure --profile china set aws_secret_access_key ${USER_SECRET_KEY}
      - echo "Setting User Profile Completed `date` in `pwd`"

  build:
    commands:
      # Creating Deployment Package 
      - echo "Start build/packaging  `date` in `pwd`"
      # Export so the child bash -c shell below can read ${S3_BUCKET}
      - export S3_BUCKET=`aws ssm get-parameter --name ${S3_BUCKET_SSM} --query Parameter.Value --output text`
      - zip -q -r package.zip *
      - >
        bash -c '
        aws cloudformation package 
        --template-file cloudformation_template_china.yaml
        --s3-bucket ${S3_BUCKET}
        --output-template-file transformed-template-china.yaml
        --profile china
        --region cn-north-1'
      - echo "Completed build/packaging  `date` in `pwd`"
  post_build:
    commands:
      # Deploying 
      - echo "Start deployment  `date` in `pwd`"
      - >
        bash -c '
        aws cloudformation deploy 
        --capabilities CAPABILITY_NAMED_IAM
        --template-file transformed-template-china.yaml
        --stack-name ${ProjectName}-app-stack-dev
        --profile china
        --region cn-north-1'
      - echo "Completed deployment  `date` in `pwd`"
artifacts:
  type: zip
  files:
    - package.zip
    - transformed-template-china.yaml

Environment Variables: USER_ACCESS_KEY_SSM, USER_SECRET_KEY_SSM and S3_BUCKET_SSM

After creating and committing the previous files, your CodeCommit repository will look like the following.

Now that we have a CodeCommit repository, next we will create a CodePipeline for Lambda with the following stages:

Source – Use the previously created CodeCommit repository (CodeRepository-US-East-1) as the source.
Build – The CodeBuild project uses buildspec.yml by default, and takes output of Source stage as input and builds artifacts for us-east-1.
Deploy
1. Code-Deploy Project for deploying to us-east-1
  This takes output of the previous CodeBuild stage as input and performs deployment in two steps: create-changeset and execute-changeset (assuming the required role attached to the Code-Deploy for deployment).
2. CodeBuild Project for deploying to cn-north-1
  This takes the output of the Source stage as input and performs both the build and the deployment to cn-north-1 using buildspec-china.yml. It also uses the China IAM user credentials and bucket SSM parameters from environment variables. CodeBuild project details are outlined in the following image.

Optional – Add further steps like manual approval, deployment to higher environment, and so forth, as required.

Congratulations! You have just created a CodePipeline with Lambda deployed in both a non-China Region and a China Region. Your CodePipeline should appear similar to the following.

Figure 3. CodePipeline implementation for both Regions

Note: The actual CodePipeline view is vertical only, with the environment deployments running one after the other. For the purpose of this example, we have placed them side by side to more easily showcase multiple environments.

CodePipeline Implementation Steps

We have created this pipeline with the following high-level steps, and you can add or remove steps as needed.

Step 1. After you commit the source code, CodePipeline will launch in the non-China Region and fetch the source code.

Step 2. Build the package using buildspec.yml.

Step 3. Deploy the application in both Regions by following the subsection steps for development environment.

1. Create the changeset for the development environment.
2. Implement the changeset for the development environment.

Step 4. Repeat step 3, but deploy the application in the staging environment.

Step 5. Wait for approval from your administrator or application owner before deploying the application in the production environment.

Step 6. Repeat steps 3 and 4 to deploy the application in the production environment.

Cleaning up

To avoid incurring future charges, clean up the resources created as part of this blog post.

Delete the CloudFormation stack created in the non-China Region.
Delete the SSM parameter created to store the access key.
Delete the access key created in the China Region.

Conclusion

In this blog post, we have explored the question: how can you use AWS services to implement CI/CD in a China Region and keep them in sync with an AWS Region? Although we are using us-east-1 as an example here, this solution will work for any Region where CodePipeline services are available, including the China Region.

The question has been answered by dividing it into three problem statements as follows.

Problem 1: CodePipeline is not available in the China Regions.
Solution: Set up CodePipeline in a non-China Region and deploy to a China Region.

Problem 2: AWS cross-account roles are not possible between a non-China Region and the China Regions.
Solution: Use the power and flexibility of CodeBuild to build your application and also deploy your application using the AWS CLI.

Problem 3: Keep a non-China Region and the China Regions in sync.
Solution: Maintain all code and manage deployments from a common deployment AWS account.

Reference:

AWS CodeCommit | Managed Source Control Service

AWS CodePipeline | Continuous Integration and Continuous Delivery

AWS CodeBuild – Fully Managed Build Service

Field Notes: How to Boost Your Search Results Using Relevance Tuning with Amazon Kendra

2021-08-31 Sam Palani

Post Syndicated from Sam Palani original https://aws.amazon.com/blogs/architecture/field-notes-how-to-boost-your-search-results-using-amazon-kendra-relevance-tuning/

One challenge enterprises face when they implement an intelligent search solution for their large data sources is the ability to quickly provide relevant search results. When working with large data sources, not all features or attributes within your data will be equally relevant to all your users. We want to prioritize identifying and boosting specific attributes for your users to provide the most relevant search results.

Relevance tuning in Amazon Kendra allows you to give a boost to a result in the response when the query includes terms that match the attribute. For example, you might have two similar documents, but one was created more recently. A good practice is to boost the relevance for the newer (or earlier) document.

Relevance tuning in Amazon Kendra can be performed manually at the index level or at the query level. In this blog post, we show how to tune an existing index that is connected to external data sources, and ultimately optimize your internal search results.

Solution overview

We will walk through how you can manually tune your index using boosting techniques to achieve the best results. This enables you to prioritize the results from a specific data source so your users get the most relevant results when they perform searches.

Figure 1. Amazon Kendra setup

Figure 1 illustrates a standard Amazon Kendra setup. An Amazon Kendra index is connected to different Amazon Simple Storage Service (Amazon S3) buckets with multiple data sources.

There are two types of user personas. The first persona is administrators who are responsible for managing the index and performing administrative tasks such as access control, index tuning, and so forth. The second persona is users who access Amazon Kendra either directly or through a custom application that can make API search requests on an Amazon Kendra index. You can use relevance tuning to boost results from one of these data sources to provide a more relevant search result.

Prerequisites

This solution requires the following:

AWS account
IAM users and roles
Amazon Kendra index
Amazon Kendra data source that is synced and available

If you do not have these prerequisites set up, see Create and query an index, which walks you through how to create and query an index in Amazon Kendra.

Furthermore, the AWS services you use in this tutorial are within the AWS Free Tier under a 30-day trial.

Step 1 – Check facet definition

First, review your facet definition and confirm it is facetable, displayable, searchable, and sortable.

In the Amazon Kendra console, select your Amazon Kendra index, then select Facet definition in the Data management panel. Confirm that _data_source_id has all of its attributes checked.

Figure 2. Check facet definition

Step 2 – Review data sources

Next, verify that you have at least two data sources for your Amazon Kendra index.

In the Amazon Kendra console, select your Amazon Kendra index, and then select Data sources in the Data management panel. Confirm that your data sources are correctly synced and available. In our example, data-source-2 contains an earlier version of the documents, which are unprocessed, whereas sample-datasource contains newer versions with more relevant content.

Figure 3. Verify data sources

Step 3 – Perform a regular Amazon Kendra search

Next, we will test a regular search without any relevance tuning. Select Search console, and enter the search term Amazon Kendra VPC. Review your search results.

Figure 4. Regular Amazon Kendra search

In our example search results, the document from the second data source 39_kendra-dg_kendra-dg appears as the third result.

Step 4 – Relevance tuning through boosting

Now we will boost the first data source so documents from the first data source are displayed ahead of the other data sources.

Select data source, and boost the first data source sample-datasource to 8. Press the Save button to save your tuning. Wait several seconds for the changes to propagate.

Figure 5. Boost sample-datasource
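The same boost can, in principle, be applied programmatically through the UpdateIndex operation of the Amazon Kendra API. The following is a minimal sketch, assuming that the console's data source boost corresponds to a ValueImportanceMap on the built-in _data_source_id attribute. The function name buildDataSourceBoost and both IDs are hypothetical placeholders, and the code only builds the request parameters without calling AWS.

```javascript
// Sketch: build UpdateIndex parameters that boost one data source.
// Assumption: the console's data source boost maps to a ValueImportanceMap
// on the built-in _data_source_id attribute. The IDs used are placeholders.
const buildDataSourceBoost = (indexId, dataSourceId, importance) => {
    if (!Number.isInteger(importance) || importance < 1 || importance > 10) {
        throw new RangeError('importance must be an integer from 1 to 10')
    }
    return {
        Id: indexId,
        DocumentMetadataConfigurationUpdates: [
            {
                Name: '_data_source_id',
                Type: 'STRING_VALUE',
                Relevance: {
                    ValueImportanceMap: { [dataSourceId]: importance }
                },
                // Mirrors the four settings confirmed in Step 1
                Search: {
                    Facetable: true,
                    Searchable: true,
                    Displayable: true,
                    Sortable: true
                }
            }
        ]
    }
}

const params = buildDataSourceBoost('example-index-id', 'example-data-source-id', 8)
console.log(JSON.stringify(params, null, 2))
```

With the AWS SDK for JavaScript, an object like this would be passed to the Kendra client's updateIndex call. Verify the attribute configuration against your own index before applying it.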

Step 5 – Perform the search after boosting

Next, we will test the search with relevance tuning applied. In the search text box enter the search term Amazon Kendra VPC. Review your search results.

Figure 6. Searching after boosting

Notice that the search result no longer contains the document from the second data source.

Cleaning up

To avoid incurring any future charges, remove any index created specifically for this tutorial. In the Amazon Kendra console, select your index. Then select Actions, and choose Delete.

Figure 7. Delete index

Conclusion

In this blog post, we showed you how relevance tuning can be used to produce results ranked by their relevance. We also walked you through an example regarding how to manually perform relevance tuning at the index level in Amazon Kendra to boost your search results.

In addition to relevance tuning at the index level, you can also perform relevance tuning at the query level. Finally, check out the What is Amazon Kendra? and Relevance tuning with Amazon Kendra blog posts to learn more.
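As a rough illustration of query-level tuning, the Query operation accepts a DocumentRelevanceOverrideConfigurations parameter that overrides the index-level settings for a single request. The sketch below only builds such a request. The function name buildBoostedQuery and the IDs are hypothetical, and we assume the same _data_source_id boost used earlier in this post.

```javascript
// Sketch: build Query parameters with a per-query relevance override.
// Assumption: the index and data source IDs are placeholders; nothing is sent to AWS.
const buildBoostedQuery = (indexId, queryText, dataSourceId, importance) => ({
    IndexId: indexId,
    QueryText: queryText,
    // Applies to this query only; the index-level tuning is left unchanged
    DocumentRelevanceOverrideConfigurations: [
        {
            Name: '_data_source_id',
            Relevance: { ValueImportanceMap: { [dataSourceId]: importance } }
        }
    ]
})

const query = buildBoostedQuery('example-index-id', 'Amazon Kendra VPC', 'example-data-source-id', 8)
console.log(JSON.stringify(query, null, 2))
```

This approach suits experiments, because different boost values can be compared per request without changing the index for every user.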

Create a custom Amazon S3 Storage Lens metrics dashboard using Amazon QuickSight

2021-08-30 Jignesh Gohel

Post Syndicated from Jignesh Gohel original https://aws.amazon.com/blogs/big-data/create-amazon-s3-storage-lens-metrics-dashboard-amazon-quicksight/

Companies use Amazon Simple Storage Service (Amazon S3) for its flexibility, durability, scalability, and ability to perform many things besides storing data. This has led to an exponential rise in the usage of S3 buckets across numerous AWS Regions, across tens or even hundreds of AWS accounts. To optimize costs and analyze security posture, Amazon S3 Storage Lens provides a single view of object storage usage and activity across your entire Amazon S3 storage. S3 Storage Lens includes an embedded dashboard to understand, analyze, and optimize storage with over 29 usage and activity metrics, aggregated for your entire organization, with drill-downs for specific accounts, Regions, buckets, or prefixes. In addition to being accessible in a dashboard on the Amazon S3 console, the raw data can also be scheduled for export to an S3 bucket.

For most customers, the S3 Storage Lens dashboard will cover all your needs. However, you may require specialized views of your S3 Storage Lens metrics, including combining data across multiple AWS accounts, or with external data sources. For such cases, you can use Amazon QuickSight, which is a scalable, serverless, embeddable, machine learning (ML)-powered business intelligence (BI) service built for the cloud. QuickSight lets you easily create and publish interactive BI dashboards that include ML-powered insights.

In this post, you learn how to use QuickSight to create simple customized dashboards to visualize S3 Storage Lens metrics. Specifically, this solution demonstrates two customization options:

Combining S3 Storage Lens metrics with external sources and being able to filter and visualize the metrics based on one or multiple accounts
Restricting users to view Amazon S3 metrics data only for specific accounts

You can further customize these dashboards based on your needs.

Solution architecture

The following diagram shows the high-level architecture of this solution. In addition to S3 Storage Lens and QuickSight, we use other AWS Serverless services like AWS Glue and Amazon Athena.

Solution Architecture for Amazon S3 Storage Lens custom metrics

The data flow includes the following steps:

S3 Storage Lens collects the S3 metrics and exports them daily to a user-defined S3 bucket. Note that first we need to activate S3 Storage Lens from the Amazon S3 console and configure it to export the file either in CSV or Apache Parquet format.
An AWS Glue crawler scans the data from the S3 bucket and populates the AWS Glue Data Catalog with tables. It automatically infers schema, format, and data types from the S3 bucket.
You can schedule the crawler to run at regular intervals to keep metadata, table definitions, and schemas in sync with data in the S3 bucket. It automatically detects new partitions in Amazon S3 and adds the partition’s metadata to the AWS Glue table.
Athena performs the following actions:
- Uses the table populated by the crawler in Data Catalog to fetch the schema.
- Queries and analyzes the data in Amazon S3 directly using standard SQL.
QuickSight performs the following actions:
- Uses the Athena connector to import the Amazon S3 metrics data.
- Fetches the external data from a custom CSV file.

To demonstrate this, we have a sample CSV file that contains the mapping of AWS account numbers to team names owning these accounts. QuickSight combines these datasets using the data source join feature.

When the combined data is available in QuickSight, users can create custom analysis and dashboards, apply appropriate QuickSight permissions, and share dashboards with other users.

At a high level, this solution requires you to complete the following steps:

Enable S3 Storage Lens in your organization’s payer account or designate a member account. For instructions to have a member account as a delegated administrator, see Enabling a delegated administrator account for Amazon S3 Storage Lens.
Set up an AWS Glue crawler, which populates the Data Catalog to query S3 Storage Lens data using Athena.
Use QuickSight to import data (using the Athena connector) and create custom visualizations and dashboards that can be shared across multiple QuickSight users or groups.

Enable and configure the S3 Storage Lens dashboard

S3 Storage Lens includes an interactive dashboard available on the Amazon S3 console. It provides organization-wide visibility into object storage usage and activity trends, and makes actionable recommendations to improve cost-efficiency and apply data protection best practices. First you need to activate S3 Storage Lens via the Amazon S3 console. After it’s enabled, you can access an interactive dashboard containing preconfigured views to visualize storage usage and activity trends, with contextual recommendations. Most importantly, it also provides the ability to export metrics in CSV or Parquet format to an S3 bucket of your choice for further use. We use this export metrics feature in our solution. The following steps provide details on how you can enable this feature in your account.

On the Amazon S3 console, under Storage Lens in the navigation pane, choose Dashboards.
Choose Create dashboard.

Create S3 Storage Dashboard

Provide the appropriate details in the Create dashboard
- Make sure to select Include all accounts in your organization, Include Regions, and Include all Regions.

S3 Storage Lens Dashboard Configure

S3 Storage Lens has two tiers: Free metrics, which is free of charge, automatically available for all Amazon S3 customers, and contains 15 usage-related metrics; and Advanced metrics and recommendations, which has an additional charge, but includes all 29 usage and activity metrics with 15-month data retention, and contextual recommendations. For this solution, we select Free metrics. If you need additional metrics, you may select Advanced metrics.

For Metrics export, select Enable.
For Choose an output format, select Apache Parquet.
For Destination bucket, select This account.
For Destination, enter your S3 bucket path.

S3 Storage Lens Metrics Export Configuration

We highly recommend following security best practices for the S3 bucket you use, along with server-side encryption available with export. You can use an Amazon S3 key (SSE-S3) or AWS Key Management Service key (SSE-KMS) as encryption key types.

Choose Create dashboard.

The data population process can take up to 48 hours. Proceed to the next steps only after the dashboard is available.

Set up the AWS Glue crawler

AWS Glue is a serverless, fully managed extract, transform, and load (ETL) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores and data streams. AWS Glue consists of a central metadata repository known as the AWS Glue Data Catalog, an ETL engine, and a flexible scheduler that handles dependency resolution, job monitoring, and retries. We can use AWS Glue to discover data, transform it, and make it available for search and querying. The AWS Glue Data Catalog is an index to the location, schema, and runtime metrics of your data. Athena uses this metadata definition to query data available in Amazon S3 using simple SQL statements.

The AWS Glue crawler populates the Data Catalog with tables from various sources, including Amazon S3. When the crawler runs, it classifies data to determine the format, schema, and associated properties of the raw data, performs grouping of data into tables or partitions, and writes metadata to the Data Catalog. You can configure the crawler to run at specific intervals to make sure the Data Catalog is in sync with the underlying source data.

For our solution, we use these services to catalog and query exported S3 Storage Lens metrics data. First, we create the crawler via the AWS Glue console. For the purpose of this example, we provide an AWS CloudFormation template that deploys the required AWS resources. This template creates the CloudFormation stack with three AWS resources in your AWS account:

AWS Identity and Access Management (IAM) role
AWS Glue database
AWS Glue crawler

When you create your stack with the CloudFormation template, provide the following information:

AWS Glue database name
AWS Glue crawler name
S3 URL path pointing to the reports folder where S3 Storage Lens has exported metrics data. For example, s3://[Name of the bucket]/StorageLens/o-lcpjprs6wq/s3-storage-lense-parquet-v1/V_1/reports/.

After the stack is complete, navigate to the AWS Glue console and confirm that a new crawler job is listed on the Crawlers page. When the crawler runs for the first time, it creates the table reports in the Data Catalog. The Data Catalog may need to be periodically refreshed, so this job is configured to run every day at midnight to sync the data. You can change this configuration to your desired schedule.

After the crawler job runs, we can confirm that the data is accessible using the following query in Athena (make sure to run this query in the database provided in the CloudFormation template):

select * from reports limit 10

Running this query should return results similar to the following screenshot.

Query Results

Create a QuickSight dashboard

When the data is available to access using Athena, we can use QuickSight to create customized analytics and publish dashboards across multiple users. This process involves creating a new QuickSight dataset, creating the analysis using this dataset, creating the dashboard, and configuring user permissions and security.

To get started, you must be signed in to QuickSight using the same payer account. If you’re signing into QuickSight for the first time, you’re prompted to complete the initial signup process (for example, choosing QuickSight Enterprise Edition). You’re also required to provide QuickSight access to your S3 bucket and Athena. For instructions on adding permissions, see Insufficient Permissions When Using Athena with Amazon QuickSight.

In the QuickSight navigation pane, choose Datasets.
Choose New dataset and select Athena.

QuickSight Create Dataset

For Data source name, enter a name.
Choose Create data source.

QuickSight create Athena Dataset

For Catalog, choose AwsDataCatalog.
For Database, choose the AWS Glue database that contains the table for S3 Storage Lens.
For Tables, select your table (for this post, reports).

QuickSight table selection

Choose Edit dataset and choose the query mode SPICE.
Change the format of report_date and dt to Date.
Choose Save.

We can use the cross data source join feature in QuickSight to connect external data sources to the S3 Storage Lens dataset. For this example, let’s say we want to visualize the number of S3 buckets mapped to the internal teams. This data is external to S3 Storage Lens and stored in a CSV file, which contains the mapping between the account numbers and internal team names.

Account to Team Mapping

To import this data into our existing dataset, choose Add data.

QuickSight add external data

Choose Upload a file to import the CSV file to the dataset.

QuickSight Upload External File

We’re redirected to the join configuration screen. From here, we can configure the join type and join clauses to connect these two data sources. For more information about the cross data source join functionality, see Joining across data sources on Amazon QuickSight.

For this example, we need to perform the left join on columns aws_account_number (from the reports table) and Account (from the Account-to-Team-mapping table). This left join returns all records from the reports table and matching records from Account-to-Team-mapping.
Choose Apply after selecting this configuration.
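To make the left join semantics concrete, the following sketch reproduces them on a few in-memory rows. It is not how QuickSight implements the join. The function name leftJoin, the Team column, and all sample values are made up for illustration, and we assume at most one mapping row per account.

```javascript
// Sketch of left-join semantics on aws_account_number = Account.
// The rows and team names below are made-up sample data.
const leftJoin = (reports, mapping) => {
    // Index the mapping rows by account number for fast lookup
    const teamByAccount = new Map(mapping.map((m) => [m.Account, m.Team]))
    return reports.map((row) => ({
        ...row,
        // Unmatched report rows are kept, with a null team
        Team: teamByAccount.has(row.aws_account_number)
            ? teamByAccount.get(row.aws_account_number)
            : null
    }))
}

const reports = [
    { aws_account_number: '111111111111', bucket_name: 'logs' },
    { aws_account_number: '222222222222', bucket_name: 'media' },
    { aws_account_number: '333333333333', bucket_name: 'scratch' }
]
const mapping = [
    { Account: '111111111111', Team: 'Platform' },
    { Account: '222222222222', Team: 'Video' }
]

console.log(leftJoin(reports, mapping))
```

All three report rows survive the join. The third row has no entry in the mapping file, so its team is empty rather than the row being dropped, which is what distinguishes a left join from an inner join.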

QuickSight DataSet Join

Choose Save & visualize.

From here, you can create various analyses and visualizations on the imported datasets. For instructions on creating visualizations, see Creating an Amazon QuickSight Visual. We provide a sample template you can use to get the basic dashboard. This dashboard provides metrics for total Amazon S3 storage size, object count, S3 bucket by internal team, and more. It also allows authorized users to filter the metrics based on accounts and report dates. This is a simple report that can be further customized based on your needs.

Quicksight Final Dashboard

S3 Storage Lens’s IAM security policies don’t apply to data imported into QuickSight. So before you share this dashboard with anyone, you might want to restrict access according to the security requirement and business role of the user. For a comprehensive set of security features, see AWS Security in Amazon QuickSight. For implementation examples, see Applying row-level and column-level security on Amazon QuickSight dashboards. In our example, instead of all users having access to view S3 Storage Lens data for all accounts, you might want to restrict user access to only specific accounts.

QuickSight provides a feature called row-level security that can restrict user access to only a subset of table rows (or records). You can base the selection of these subsets of rows on filter conditions defined on specific columns.

For our current example, we want to allow user access to view the Amazon S3 metrics dashboard only for a few accounts. For this, we can use the column aws_account_number as filter criteria with account number values. We can implement this by creating a CSV file with columns named UserName and aws_account_number, and adding the rows for users and a list of account numbers (comma-separated). In the following example file, we have added a sample value for the user awslabs-qs-1 with a specific account. This means that user awslabs-qs-1 can only see the rows (or records) that match with the corresponding aws_account_number values specified in the permission CSV.

QuickSight Permissions file
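The effect of such a permissions file can be sketched as a simple filter. This only illustrates the idea and is not QuickSight's implementation. The function name visibleRows, the object_count values, and the account numbers are hypothetical sample data.

```javascript
// Sketch: how a permissions file narrows the rows a user can see.
// Assumption: each rule lists account numbers as a comma-separated string.
const visibleRows = (rows, rules, userName) => {
    const allowed = new Set(
        rules
            .filter((rule) => rule.UserName === userName)
            .flatMap((rule) => rule.aws_account_number.split(','))
            .map((account) => account.trim())
    )
    return rows.filter((row) => allowed.has(row.aws_account_number))
}

const rules = [{ UserName: 'awslabs-qs-1', aws_account_number: '111111111111' }]
const rows = [
    { aws_account_number: '111111111111', object_count: 42 },
    { aws_account_number: '222222222222', object_count: 7 }
]

console.log(visibleRows(rows, rules, 'awslabs-qs-1'))
```

In this sketch, a user with no matching rule sees no rows at all.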

For instructions on applying a permission rule file, see Using Row-Level Security (RLS) to Restrict Access to a Dataset.

You can further customize this QuickSight analysis to produce additional visualizations, apply additional permissions, and publish it to enterprise users and groups with various levels of security.

Conclusion

Harnessing the knowledge of S3 Storage Lens metrics with other custom data enables you to discover anomalies and identify cost-efficiencies across accounts. In this post, we used serverless components to build a workflow to use this data for real-time visualization. You can use this workflow to scale up and design an enterprise-level solution with a multi-account strategy and control fine-grained access to its data using the QuickSight row-level security feature.

About the Authors

Jignesh Gohel is a Technical Account Manager at AWS. In this role, he provides advocacy and strategic technical guidance to help plan and build solutions using best practices, and proactively keep customers’ AWS environments operationally healthy. He is passionate about building modular and scalable enterprise systems on AWS using serverless technologies. Besides work, Jignesh enjoys spending time with family and friends, traveling and exploring the latest technology trends.

Suman Koduri is a Global Category Lead for the Data & Analytics category in AWS Marketplace. He is focused on business development activities to further expand the presence and success of Data & Analytics ISVs in AWS Marketplace. In this role, he leads the scaling and evolution of new and existing ISVs, as well as field enablement and strategic customer advisement for the same. In his spare time, he loves running half marathons and riding his motorcycle.

Building a serverless GIF generator with AWS Lambda: Part 1

2021-08-30 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-a-serverless-gif-generator-with-aws-lambda-part-1/

Many video streaming services show GIF animations in the frontend when users fast forward and rewind throughout a video. This helps customers see a preview and makes the user interface more intuitive.

Generating these GIF files is a compute-intensive operation that becomes more challenging to scale when there are many videos. Over a typical 2-hour movie with GIF previews every 30 seconds, you need 240 separate GIF animations to support this functionality.

In a server-based solution, you can use a library like FFmpeg to load the underlying MP4 file and create the GIF exports. However, this may be a serial operation that slows down for longer videos or when there are many different videos to process. A service providing thousands of videos to customers needs a solution where the encoding process keeps pace with the number of videos.

In this two-part blog post, I show how you can use a serverless approach to create a scalable GIF generation service. I explain how you can use parallelization in AWS Lambda-based workloads to reduce processing time significantly.

The example application uses the AWS Serverless Application Model (AWS SAM), enabling you to deploy the application more easily in your own AWS account. This walkthrough creates some resources covered in the AWS Free Tier but others incur cost. To set up the example, visit the GitHub repo and follow the instructions in the README.md file.

Overview

To show the video player functionality, visit the demo front end. This loads the assets for an example video that is already processed. In the example, there are GIF animations every 30 seconds and freeze frames for every second of video. Move the slider to load the frame and GIF animation at that point in the video:

The demo site defaults to an existing video. After deploying the backend application, you can test your own videos here.
Move the slider to select a second in the video playback. This simulates a typical playback bar used in video application frontends.
This displays the frame at the chosen second, which is a JPG file created by the backend application.
The GIF animation for the selected playback point is a separate GIF file, created by the backend application.

Comparing server-based and serverless solutions

The example application uses FFmpeg, an open source application to record and process video. FFmpeg creates the GIF animations and individual frames for each second of video. You can use FFmpeg directly from a terminal or in scripts and applications. In comparing the performance, I use the AWS re:Invent 2019 keynote video, which is almost 3 hours long.

To show the server-based approach, I use a Node.js application in the GitHub repo. This loops through the video length in 30-second increments, calling FFmpeg with the necessary command line parameters:

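const { promisify } = require('util')

// Assumption: execPromise wraps child_process.exec in a promise
const execPromise = promisify(require('child_process').exec)
// Assumption: the ffmpeg binary is on the PATH; adjust this for your setup
const ffmpegPath = 'ffmpeg'

const main = async () => {
	const length = 10323 // Video length in seconds
	const inputVideo = 'test.mp4'
	const ffTmp = './output' // Output directory
	const snippetSize = 30 // Seconds per GIF
	const baseFilename = inputVideo.split('.')[0]

	console.time('task')
	for (let start = 0; start < length; start += snippetSize) {
		const gifName = `${baseFilename}-${start}.gif`
		const end = start + snippetSize - 1

		console.log('Now creating: ', gifName)
		// Generates the GIF in the local output directory
		await execPromise(`${ffmpegPath} -loglevel error -ss ${start} -to ${end} -y -i "${inputVideo}" -vf "fps=10,scale=240:-1:flags=lanczos,split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse" -loop 0 ${ffTmp}/${gifName}`)
		// Generates one JPG frame per second of the snippet
		await execPromise(`${ffmpegPath} -loglevel error -ss ${start} -to ${end} -i "${inputVideo}" -vf fps=1 ${ffTmp}/${baseFilename}-${start}-frame-%d.jpg`)
	}
	console.timeEnd('task')
}

main().catch(console.error)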

When I run this script on my local development machine, it takes almost 21 minutes to complete the operation and creates over 10,000 output files.

After deploying the example serverless application in the GitHub repo, I upload the video to the source Amazon S3 bucket using the AWS CLI:

aws s3 cp ./reinvent.mp4 s3://source-bucket-name --acl public-read

The completed S3 PutObject operation starts the GIF generation process. The function that processes the GIF files emits the following metrics on the Monitor tab in the Lambda console:

There are 345 invocations to process the source file into 30-second GIF animations.
The average duration for each invocation is 4,311 ms. The longest is 9,021 ms.
No errors occurred in processing the video.
Lambda scaled up to 344 concurrent execution environments.
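The invocation count follows directly from the video length and the snippet size. As a quick check, this small sketch (the function name snippetCount is my own) computes the number of snippets for the 10,323-second keynote and for the 2-hour movie mentioned earlier:

```javascript
// Number of fixed-size snippets needed to cover a video of a given length
const snippetCount = (lengthSeconds, snippetSize = 30) =>
    Math.ceil(lengthSeconds / snippetSize)

console.log(snippetCount(2 * 60 * 60)) // 240 for a 2-hour movie
console.log(snippetCount(10323))       // 345 for the keynote video
```

The final snippet is shorter than 30 seconds whenever the length is not an exact multiple of the snippet size, which is why the count rounds up.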

After approximately 10 seconds, the conversion is complete and there are over 10,000 objects in the application’s output S3 bucket:

The main reason that the task duration is reduced from nearly 21 minutes to around 10 seconds is parallelization. In the server-based approach, each 30-second GIF is processed sequentially. In the Lambda-based solution, all of the 30-second clips are generated in parallel, at around the same time.
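A rough way to reason about this difference is that sequential time is the sum of the individual task durations, while fully parallel time is bounded by the slowest task. The sketch below applies that model to illustrative numbers based on the Lambda metrics above. It ignores cold starts, invocation overhead, and concurrency limits, so treat the output as an approximation rather than a measurement.

```javascript
// Rough model: sequential time is the sum of task durations,
// fully parallel time is bounded by the slowest task.
const sequentialMs = (durations) => durations.reduce((sum, d) => sum + d, 0)
const parallelMs = (durations) => Math.max(...durations)

// Illustrative durations only: 344 tasks at the reported average plus
// one task at the reported maximum.
const durations = [...Array(344).fill(4311), 9021]

console.log(sequentialMs(durations)) // 1492005 (about 25 minutes)
console.log(parallelMs(durations))   // 9021 (about 9 seconds)
```

Under this model, the same work that would take roughly 25 minutes back to back completes in about the time of the longest single invocation.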

Solution architecture

The example application uses the following serverless architecture:

When the original MP4 video is put into the source S3 bucket, it invokes the first Lambda function.
The Snippets Lambda function detects the length of the video. It determines the number of 30-second snippets and then puts events onto the default event bus. There is one event for each snippet.
An Amazon EventBridge rule matches events created by the first Lambda function. It invokes the second Lambda function.
The Process MP4 Lambda function receives the event as a payload. It loads the original video using FFmpeg to generate the GIF and per-second frames.
The resulting files are stored in the output S3 bucket.

The first Lambda function uses the following code to detect the length of the video and create events for EventBridge:

const createSnippets = async (record) => {
	// Get signed URL for source object
	const params = {
		Bucket: record.s3.bucket.name, 
		Key: record.s3.object.key, 
		Expires: 300
	}
	const url = s3.getSignedUrl('getObject', params)

	// Get length of source video
	const metadata = await ffProbe(url)
	const length = metadata.format.duration
	console.log('Length (seconds): ', length)

	// Build the array of snippet events for EventBridge
	const items = []
	const snippetSize = parseInt(process.env.SnippetSize, 10)

	for (let start = 0; start < length; start += snippetSize) {
		items.push({
			key: record.s3.object.key,
			start,
			end: (start + snippetSize - 1),
			length,
			tsCreated: Date.now()
		})
	}
	// Send events to EventBridge
	await writeBatch(items)
}

The eventbridge.js file contains a function that sends the event array to the default bus in EventBridge. It uses the putEvents method in the EventBridge JavaScript SDK to send events in batches of 10:

const writeBatch = async (items) => {

    console.log('writeBatch items: ', items.length)

    for (let i = 0; i < items.length; i += BATCH_SIZE ) {
        const tempArray = items.slice(i, i + BATCH_SIZE)

        // Create new params array
        const paramsArray = tempArray.map((item) => {
            return {
                DetailType: 'newVideoCreated',
                Source: 'custom.gifGenerator',
                Detail: JSON.stringify ({
                    ...item
                })
            }
        })

        // Create params object for the EventBridge putEvents call
        const params = {
            Entries: paramsArray
        }

        // Send this batch of events to EventBridge
        const result = await eventbridge.putEvents(params).promise()
        console.log('Result: ', result)
    }
}
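The batching logic can be checked without calling AWS. The following Python sketch only builds the request payloads that would be passed to putEvents; the function name build_batches and the sample items are assumptions for illustration, while the DetailType and Source values are taken from the code above.

```python
import json

BATCH_SIZE = 10  # batch size used by the example application


def build_batches(items, batch_size=BATCH_SIZE):
    """Group items into putEvents-style payloads of at most batch_size entries."""
    batches = []
    for i in range(0, len(items), batch_size):
        entries = [
            {
                "DetailType": "newVideoCreated",
                "Source": "custom.gifGenerator",
                "Detail": json.dumps(item),  # Detail must be a JSON string
            }
            for item in items[i:i + batch_size]
        ]
        batches.append({"Entries": entries})
    return batches


# Hypothetical input: 25 snippet events for one video
items = [{"key": "video.mp4", "start": s, "end": s + 9} for s in range(0, 250, 10)]
batches = build_batches(items)
print([len(b["Entries"]) for b in batches])
```

With 25 items, the sketch produces three payloads containing 10, 10, and 5 entries.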

The second Lambda function is invoked by an EventBridge rule, matching on the Source and DetailType values. Both the Lambda function and the rule are defined in the AWS SAM template:

  GifsFunction:
    Type: AWS::Serverless::Function 
    Properties:
      CodeUri: gifsFunction/
      Handler: app.handler
      Runtime: nodejs14.x
      Timeout: 30
      MemorySize: 4096
      Layers:
        - !Ref LayerARN
      Environment:
        Variables:
          GenerateFrames: !Ref GenerateFrames
          GifsBucketName: !Ref GifsBucketName
          SourceBucketName: !Ref SourceBucketName
      Policies:
        - S3ReadPolicy:
            BucketName: !Ref SourceBucketName
        - S3CrudPolicy:
            BucketName: !Ref GifsBucketName
      Events:
        Trigger:
          Type: EventBridgeRule 
          Properties:
            Pattern:        
              source:
                - custom.gifGenerator
              detail-type:
                - newVideoCreated

This Lambda function receives an event payload specifying the start and end time for the video clip it should process. The function then calls the FFmpeg application to generate the output files. It stores these in the local /tmp storage before uploading to the output S3 bucket:

const processMP4 = async (event) => {
    // Get settings from the incoming event
    // (createSnippets sends the object key in the lowercase "key" property)
    const originalMP4 = event.detail.key
    const start = event.detail.start
    const end = event.detail.end

    // Get signed URL for source object
    const params = {
        Bucket: process.env.SourceBucketName, 
        Key: originalMP4, 
        Expires
    }
    const url = s3.getSignedUrl('getObject', params)
    console.log('processMP4: ', { url, originalMP4, start, end })

    const baseFilename = params.Key.split('.')[0]

    // Create GIF in local /tmp storage
    console.log('Create GIF')
    const gifName = `${baseFilename}-${start}.gif`
    await execPromise(`${ffmpegPath} -loglevel error -ss ${start} -to ${end} -y -i "${url}" -vf "fps=10,scale=240:-1:flags=lanczos,split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse" -loop 0 ${ffTmp}/${gifName}`)

    // Generate frames (1 per second) in local /tmp storage
    if (process.env.GenerateFrames === 'true') {    
        console.log('Capturing frames')
        await execPromise(`${ffmpegPath} -loglevel error -ss ${start} -to ${end} -i "${url}" -vf fps=1 ${ffTmp}/${baseFilename}-${start}-frame-%d.jpg`)
    }

    // Upload all generated files
    await uploadFiles(`${baseFilename}/`)

    // Cleanup temp files
    await tmpCleanup()
}

Creating and using the FFmpeg Lambda layer

FFmpeg uses operating system-specific binaries that may be different on your development machine from the Lambda execution environment. The easiest way to test the code on a local machine and deploy to Lambda with the appropriate binaries is to use a Lambda layer.

As described in the example application’s README file, you can create the FFmpeg Lambda layer by deploying the ffmpeg-lambda-layer application in the AWS Serverless Application Repository. After deployment, the Layers menu in the Lambda console shows the new layer. Copy the version ARN and use this as a parameter in the AWS SAM deployment:

On your local machine, download and install the FFmpeg binaries for your operating system. The package.json file for the Lambda functions uses two npm installer packages to help ensure the Node.js code uses the correct binaries when tested locally:

{
  "name": "gifs",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "James Beswick",
  "license": "MIT-0",
  "dependencies": {
  },
  "devDependencies": {
    "@ffmpeg-installer/ffmpeg": "^1.0.20",
    "@ffprobe-installer/ffprobe": "^1.1.0",
    "aws-sdk": "^2.927.0"
  }
}

Any npm packages specified in the devDependencies section of package.json are not deployed to Lambda by AWS SAM. As a result, local testing uses local ffmpeg binaries and Lambda uses the previously deployed Lambda layer. This approach also helps to reduce the size of deployment uploads and can help make the testing and deployment cycle faster in development.

Using FFmpeg from a Lambda function

FFmpeg is an application that’s generally used via a terminal. It takes command line parameters as options to determine input files, output locations, and the type of processing needed. To use terminal commands from Lambda, both functions use the asynchronous child_process module in Node.js. The execPromise function wraps this module in a Promise so the main function can use async/await syntax:

const { exec } = require('child_process')

const execPromise = (command) => {
    console.log(command)
    return new Promise((resolve, reject) => {
        // Settle the Promise in the exec callback, which runs after the
        // process has exited and its output streams have closed
        exec(command, (error, stdout, stderr) => {
            if (stdout) console.log('stdout: ', stdout)
            if (stderr) console.error('stderr: ', stderr)
            if (error) {
                console.error('Error: ', error)
                return reject(error)
            }
            console.log('execPromise finished')
            resolve()
        })
    })
}

As a result, you can call FFmpeg by constructing a command line with the required options and passing it to execPromise:

await execPromise(`${ffmpegPath} -loglevel error -ss ${start} -to ${end} -i "${url}" -vf fps=1 ${ffTmp}/${baseFilename}-${start}-frame-%d.jpg`)

Alternatively, you can also use the fluent-ffmpeg npm library, which exposes the command line options as methods. The example application uses this in the ffmpeg-promisify.js file:

const ffmpeg = require("fluent-ffmpeg")
   ffmpeg(source)
      .noAudio()
      .size(`${IMG_WIDTH}x${IMG_HEIGHT}`)
      .setStartTime(startFormatted)
      .setDuration(snippetSize - 1)
      .output(outputFile)
      .on("end", async (err) => {
        // do work
      })
      .on("error", function (err) {
        console.error('FFMPEG error: ', err)
      })
      .run()

Deploying the application

In the GitHub repository, there are detailed deployment instructions for the example application. The repo contains separate directories for the demo frontend application, server-based script, and the two versions of backend service.

After deployment, you can test the application by uploading an MP4 video to the source S3 bucket. The output GIF and JPG files are written to the application’s destination S3 bucket. The files from each MP4 file are grouped in a folder in the bucket:

Frontend application

The frontend application allows you to visualize the outputs of the backend application. There is also a hosted version of this application. This accepts custom parameters to load graphics resources from S3 buckets in your AWS account. Alternatively, you can run the frontend application locally.

To launch the frontend application:

After cloning the repo, change to the frontend directory.
Run npm install to install Vue.js and all the required npm modules from package.json.
Run npm run serve to start the development server. After building the modules in the project, the terminal shows the local URL where the application is running:
Open a web browser and navigate to http://localhost:8080 to see the application:

Conclusion

In part 1 of this blog post, I explain how a GIF generation service can support a front-end application for video streaming. I compare the performance of a server-based and serverless approach and show how parallelization can significantly improve processing time. I walk through the solution architecture used in the example application and show how you can use FFmpeg in Lambda functions.

Part 2 covers advanced topics around this implementation. It explains the scaling behavior and considers alternative approaches, and looks at the cost of using this service.

For more serverless learning resources, visit Serverless Land.

Use Amazon ECS Fargate Spot with CircleCI to deploy and manage applications in a cost-effective way

2021-08-26 Pritam Pal

Post Syndicated from Pritam Pal original https://aws.amazon.com/blogs/devops/deploy-apps-cost-effective-way-with-ecs-fargate-spot-and-circleci/

This post is written by Pritam Pal, Sr EC2 Spot Specialist SA & Dan Kelly, Sr EC2 Spot GTM Specialist

Customers are using Amazon Web Services (AWS) to build CI/CD pipelines and follow DevOps best practices in order to deliver products rapidly and reliably. AWS services simplify infrastructure provisioning and management, application code deployment, software release processes automation, and application and infrastructure performance monitoring. Builders are taking advantage of low-cost, scalable compute with Amazon EC2 Spot Instances, as well as AWS Fargate Spot to build, deploy, and manage microservices or container-based workloads at a discounted price.

Amazon EC2 Spot Instances let you take advantage of unused Amazon Elastic Compute Cloud (Amazon EC2) capacity at steep discounts as compared to on-demand pricing. Fargate Spot is an AWS Fargate capability that can run interruption-tolerant Amazon Elastic Container Service (Amazon ECS) tasks at up to a 70% discount off the Fargate price. Since tasks can still be interrupted, only fault tolerant applications are suitable for Fargate Spot. However, for flexible workloads that can be interrupted, this feature enables significant cost savings over on-demand pricing.

CircleCI provides continuous integration and delivery for any platform, as well as your own infrastructure. CircleCI can automatically trigger low-cost, serverless tasks with AWS Fargate Spot in Amazon ECS. Moreover, CircleCI Orbs are reusable packages of CircleCI configuration that help automate repeated processes, accelerate project setup, and ease third-party tool integration. Currently, over 1,100 organizations are utilizing the CircleCI Amazon ECS Orb to power/run 250,000+ jobs per month.

Customers are utilizing Fargate Spot for a wide variety of workloads, such as Monte Carlo simulations and genomic processing. In this blog post, I use Python code with the TensorFlow library, packaged as a container image, to train a simple linear model. It runs the training steps in a loop on a data batch and periodically writes checkpoints to S3. If there is a Fargate Spot interruption, the replacement task restores the latest checkpoint from S3 and continues training. We will deploy this on Amazon ECS with Fargate Spot for low-cost, serverless task deployment using CircleCI.

Concepts

Before looking at the solution, let’s revisit some of the concepts we’ll be using.

Capacity Providers: Capacity providers let you manage computing capacity for Amazon ECS containers. This allows the application to define its requirements for how it utilizes the capacity. With capacity providers, you can define flexible rules for how containerized workloads run on different compute capacity types and manage the capacity scaling. Furthermore, capacity providers improve the availability, scalability, and cost of running tasks and services on Amazon ECS. In order to run tasks, the default capacity provider strategy will be utilized, or an alternative strategy can be specified if required.

AWS Fargate and AWS Fargate Spot capacity providers don’t need to be created. They are available to all accounts and only need to be associated with a cluster for utilization. When a new cluster is created via the Amazon ECS console, along with the Networking-only cluster template, the FARGATE and FARGATE_SPOT capacity providers are automatically associated with the new cluster.

CircleCI Orbs: Orbs are reusable CircleCI configuration packages that help automate repeated processes, accelerate project setup, and ease third-party tool integration. Orbs can be found in the developer hub on the CircleCI orb registry. Each orb listing has usage examples that can be referenced. Moreover, each orb includes a library of documented components that can be utilized within your config for more advanced purposes. Since the 2.0.0 release, the AWS ECS Orb supports the capacity provider strategy parameter for running tasks allowing you to efficiently run any ECS task against your new or existing clusters via Fargate Spot capacity providers.

Solution overview

Fargate Spot helps cost-optimize services that can handle interruptions, such as containerized workloads, CI/CD, or web services behind a load balancer. When Fargate Spot needs to interrupt a running task, it sends a SIGTERM signal. It is best practice to build applications that respond to the signal and shut down gracefully.
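A minimal Python sketch of graceful SIGTERM handling in a checkpointing loop might look like the following. The ShutdownFlag class, the train function, and the save_checkpoint callback are hypothetical names for illustration and are not taken from the sample application; the sketch only records step numbers instead of writing real checkpoints.

```python
import signal


class ShutdownFlag:
    """Records that a SIGTERM was received so the main loop can stop cleanly."""

    def __init__(self):
        self.requested = False

    def handle(self, signum, frame):
        # Keep the handler tiny: just set a flag for the main loop to check
        self.requested = True

    def install(self):
        # Must be called from the main thread of the process
        signal.signal(signal.SIGTERM, self.handle)


def train(steps, flag, save_checkpoint, checkpoint_every=10):
    """Run up to `steps` iterations, checkpointing periodically and on shutdown.

    Returns the number of completed steps.
    """
    completed = 0
    for step in range(1, steps + 1):
        if flag.requested:
            save_checkpoint(completed)  # final checkpoint before exiting
            break
        completed = step  # placeholder for one real training step
        if step % checkpoint_every == 0:
            save_checkpoint(step)
    return completed
```

In a container entry point, you would create a ShutdownFlag, call install() once at startup, and pass the flag to the training loop so that an interruption triggers a final checkpoint before the task stops.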

This walkthrough will utilize a capacity provider strategy leveraging Fargate and Fargate Spot, which mitigates risk if multiple Fargate Spot tasks get terminated simultaneously. If you’re unfamiliar with Fargate Spot, capacity providers, or capacity provider strategies, read our previous blog about Fargate Spot best practices here.

Prerequisites

Our walkthrough will utilize the following services:

GitHub as a code repository
AWS Fargate/Fargate Spot for running your containers as ECS tasks
CircleCI for demonstrating a CI/CD pipeline. We will utilize CircleCI Cloud Free version, which allows 2,500 free credits/week and can run 1 job at a time.

We will run a Job with CircleCI ECS Orb in order to deploy 4 ECS Tasks on Fargate and Fargate Spot. You should have the following prerequisites:

An AWS account
A GitHub account

Walkthrough

Step 1: Create AWS Keys for Circle CI to utilize.

Head to the AWS IAM console, create a new user (for example, circleci), and select only the Programmatic access checkbox. On the Set permissions page, select Attach existing policies directly. For the sake of simplicity, we added the managed policy AmazonECS_FullAccess to this user. However, for production workloads, apply a stricter least-privilege policy. Download the access key file, which will be used to connect CircleCI to AWS in the next steps.

Step 2: Create an ECS Cluster, Task definition, and ECS Service

2.1 Open the Amazon ECS console

2.2 From the navigation bar, select the Region to use

2.3 In the navigation pane, choose Clusters

2.4 On the Clusters page, choose Create Cluster

2.5 Create a Networking only Cluster (Powered by AWS Fargate)

Amazon ECS Create Cluster

This option lets you launch a cluster in your existing VPC to utilize for Fargate tasks. The FARGATE and FARGATE_SPOT capacity providers are automatically associated with the cluster.

2.6 Click Update Cluster to define a default capacity provider strategy for the cluster, then add the FARGATE and FARGATE_SPOT capacity providers, each with a weight of 1. This ensures that tasks are divided equally between the capacity providers. You can define other ratios for splitting your tasks between Fargate and Fargate Spot, e.g., 1:2 or 3:1.

ECS Update Cluster Capacity Providers

2.7 Create a Task Definition using the Fargate launch type, give it a name, and specify the task memory and CPU needed to run the task. Feel free to use any Fargate task definition. You can use your own code, package it in a container, and host the container image in Docker Hub or Amazon ECR. In the container definition, provide a name and the image URI, and specify the port mappings. Click Add and then click Create.

We also show an example of Python code using the TensorFlow library that can run as a container image to train a simple linear model. It runs the training steps in a loop on a batch of data, and it periodically writes checkpoints to S3. Please find the complete code here. Use a Dockerfile to create a container from the code.

Sample Dockerfile to create a container image from the code mentioned above:

FROM ubuntu:18.04
WORKDIR /app
COPY . /app
RUN pip install -r requirements.txt
EXPOSE 5000
CMD python tensorflow_checkpoint.py

Below is the code snippet we use with TensorFlow to train the model and checkpoint the training job.


def train_and_checkpoint(net, manager):
  ckpt.restore(manager.latest_checkpoint).expect_partial()
  if manager.latest_checkpoint:
    print("Restored from {}".format(manager.latest_checkpoint))
  else:
    print("Initializing from scratch.")
  for _ in range(5000):
    example = next(iterator)
    loss = train_step(net, example, opt)
    ckpt.step.assign_add(1)
    if int(ckpt.step) % 10 == 0:
        save_path = manager.save()
        list_of_files = glob.glob('tf_ckpts/*.index')
        latest_file = max(list_of_files, key=os.path.getctime)
        upload_file(latest_file, 'pythontfckpt', object_name=None)
        list_of_files = glob.glob('tf_ckpts/*.data*')
        latest_file = max(list_of_files, key=os.path.getctime)
        upload_file(latest_file, 'pythontfckpt', object_name=None)
        upload_file('tf_ckpts/checkpoint', 'pythontfckpt', object_name=None)

2.8 Next, we will create an ECS Service, which will be used to fetch cluster information while running the job from CircleCI. In the ECS console, navigate to your cluster, open the Services tab, and click Create. Create an ECS service by choosing Cluster default strategy from the Capacity provider strategy dropdown. For the Task Definition field, choose webapp-fargate-task, which is the one we created earlier, enter a service name, set the number of tasks to zero at this point, and leave everything else as default. Click Next step, select an existing VPC and two or more subnets, keep everything else default, and create the service.

Step 3: GitHub and CircleCI Configuration

Create a GitHub repository (for example, circleci-fargate-spot), and then create a .circleci folder containing a config file named config.yml. If you’re unfamiliar with GitHub or adding a repository, check the user guide here.

For this project, the config.yml file contains the following lines of code that configure and run your deployments.

version: '2.1'
orbs:
  aws-ecs: circleci/[email protected]
  aws-cli: circleci/[email protected]
  orb-tools: circleci/[email protected]
  shellcheck: circleci/[email protected]
  jq: circleci/[email protected]

jobs:  

  test-fargatespot:
      docker:
        - image: cimg/base:stable
      steps:
        - aws-cli/setup
        - jq/install
        - run:
            name: Get cluster info
            command: |
              SERVICES_OBJ=$(aws ecs describe-services --cluster "${ECS_CLUSTER_NAME}" --services "${ECS_SERVICE_NAME}")
              VPC_CONF_OBJ=$(echo $SERVICES_OBJ | jq '.services[].networkConfiguration.awsvpcConfiguration')
              SUBNET_ONE=$(echo "$VPC_CONF_OBJ" |  jq '.subnets[0]')
              SUBNET_TWO=$(echo "$VPC_CONF_OBJ" |  jq '.subnets[1]')
              SECURITY_GROUP_IDS=$(echo "$VPC_CONF_OBJ" |  jq '.securityGroups[0]')
              CLUSTER_NAME=$(echo "$SERVICES_OBJ" |  jq '.services[].clusterArn')
              echo "export SUBNET_ONE=$SUBNET_ONE" >> $BASH_ENV
              echo "export SUBNET_TWO=$SUBNET_TWO" >> $BASH_ENV
              echo "export SECURITY_GROUP_IDS=$SECURITY_GROUP_IDS" >> $BASH_ENV
              echo "export CLUSTER_NAME=$CLUSTER_NAME" >> $BASH_ENV
        - run:
            name: Associate cluster
            command: |
              aws ecs put-cluster-capacity-providers \
                --cluster "${ECS_CLUSTER_NAME}" \
                --capacity-providers FARGATE FARGATE_SPOT  \
                --default-capacity-provider-strategy capacityProvider=FARGATE,weight=1 capacityProvider=FARGATE_SPOT,weight=1 \
                --region ${AWS_DEFAULT_REGION}
        - aws-ecs/run-task:
              cluster: $CLUSTER_NAME
              capacity-provider-strategy: capacityProvider=FARGATE,weight=1 capacityProvider=FARGATE_SPOT,weight=1
              launch-type: ""
              task-definition: webapp-fargate-task
              subnet-ids: '$SUBNET_ONE, $SUBNET_TWO'
              security-group-ids: $SECURITY_GROUP_IDS
              assign-public-ip : ENABLED
              count: 4

workflows:
  run-task:
    jobs:
      - test-fargatespot

Now, create a CircleCI account. Choose Login with GitHub. Once you’re logged in, click Add Project from the CircleCI dashboard and add the project circleci-fargate-spot from the list shown.

When working with CircleCI Orbs, you will need the config.yml file and environment variables under Project Settings.

The config file uses CircleCI version 2.1 and various orbs, e.g., AWS-ECS, AWS-CLI, and JQ. We use a job named test-fargatespot, which runs in a Docker image, and set up the environment there. In config.yml, we use the jq tool to parse JSON and fetch the ECS cluster information, such as the VPC config, subnets, and security groups, needed to run an ECS task. Because we are using capacity-provider-strategy, we set the launch-type parameter to an empty string.
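For readers less familiar with jq, the following Python sketch extracts the same fields from a describe-services response. The SAMPLE_RESPONSE document is a made-up, heavily trimmed example (the account ID, subnet IDs, and security group ID are placeholders), and cluster_info is a hypothetical helper; only the JSON paths mirror the jq filters in config.yml.

```python
import json

# Trimmed, made-up example of `aws ecs describe-services` output
SAMPLE_RESPONSE = """
{
  "services": [
    {
      "clusterArn": "arn:aws:ecs:us-west-2:111122223333:cluster/demo-cluster",
      "networkConfiguration": {
        "awsvpcConfiguration": {
          "subnets": ["subnet-aaaa1111", "subnet-bbbb2222"],
          "securityGroups": ["sg-cccc3333"],
          "assignPublicIp": "ENABLED"
        }
      }
    }
  ]
}
"""


def cluster_info(raw_json):
    """Pull the cluster ARN, first two subnets, and first security group
    from the first service in a describe-services response."""
    service = json.loads(raw_json)["services"][0]
    vpc_conf = service["networkConfiguration"]["awsvpcConfiguration"]
    return {
        "cluster": service["clusterArn"],
        "subnet_one": vpc_conf["subnets"][0],
        "subnet_two": vpc_conf["subnets"][1],
        "security_group": vpc_conf["securityGroups"][0],
    }


info = cluster_info(SAMPLE_RESPONSE)
print(info)
```

This is why the service must be created with at least two subnets in step 2.8: the job reads the first two entries of the subnets array.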

In order to run a task, we will demonstrate how to override the default Capacity Provider strategy with Fargate & Fargate Spot, both with a weight of 1, and to divide tasks equally among Fargate & Fargate Spot. In our example, we are running 4 tasks, so 2 should run on Fargate and 2 on Fargate Spot.
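The expected split can be worked out with simple proportional arithmetic. The Python sketch below is only an approximation for reasoning about weights: split_by_weight is a hypothetical helper, it ignores the base setting of a capacity provider strategy, and the placement that Amazon ECS actually performs is not guaranteed to match it in every case.

```python
def split_by_weight(count, weights):
    """Divide `count` tasks across providers in proportion to their weights.

    Uses the largest-remainder method so that the parts always add up to
    `count` when the division is not exact.
    """
    total = sum(weights.values())
    shares = {name: count * weight / total for name, weight in weights.items()}
    result = {name: int(share) for name, share in shares.items()}
    leftover = count - sum(result.values())
    # Give any remaining tasks to the providers with the largest fractions
    by_fraction = sorted(shares, key=lambda n: shares[n] - result[n], reverse=True)
    for name in by_fraction[:leftover]:
        result[name] += 1
    return result


# The strategy used in this walkthrough: 4 tasks, both weights set to 1
print(split_by_weight(4, {"FARGATE": 1, "FARGATE_SPOT": 1}))
```

With 4 tasks and equal weights, the sketch assigns 2 tasks to each capacity provider, matching the result described above. A 1:3 ratio would assign 1 task to Fargate and 3 to Fargate Spot.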

Parameters like ECS_SERVICE_NAME, ECS_CLUSTER_NAME and other AWS access specific details are added securely under Project Settings and can be utilized by other jobs running within the project.

Add the following environment variables under Project Settings

- AWS_ACCESS_KEY_ID – From Step 1
- AWS_SECRET_ACCESS_KEY – From Step 1
- AWS_DEFAULT_REGION – e.g., us-west-2
- ECS_CLUSTER_NAME – From Step 2
- ECS_SERVICE_NAME – From Step 2
- SECURITY_GROUP_IDS – Security Group that will be used to run the task

Circle CI Environment Variables

Step 4: Run Job

Now in the CircleCI console, navigate to your project, choose the branch, and click Edit Config to verify that config.yml is correctly populated. Check for the ribbon at the bottom. A green ribbon means that the config file is valid and ready to run. Click Commit & Run from the top-right menu.

Click the build status to check its progress as it runs.

CircleCI Project Dashboard

A successful build should look like the one below. Expand each section to see the output.

CircleCI Job Configuration

Return to the ECS console, go to the Tasks Tab, and check that 4 new tasks are running. Click each task for the Capacity provider details. Two tasks should have run with FARGATE_SPOT as a Capacity provider, and two should have run with FARGATE.

Congratulations!

You have successfully deployed ECS tasks utilizing CircleCI on AWS Fargate and Fargate Spot. If you used a sample web application, use the task's public IP address to see the page. If you used the sample code that we provided, you should see TensorFlow training jobs running as Fargate tasks. If there is a Fargate Spot interruption, the replacement task restores the checkpoint from S3 and continues training.

Cleaning up

In order to avoid incurring future charges, delete the resources utilized in the walkthrough. Go to the ECS console and Task tab.

Delete any running Tasks.
Delete ECS cluster.
Delete the circleci user from IAM console.

Cost analysis in Cost Explorer

In order to demonstrate a cost breakdown between the tasks running on Fargate and Fargate Spot, we left the tasks running for a day. Then, we used Cost Explorer with the following filters and groups in order to discover the savings from running on Fargate Spot.

Apply a filter on Service for ECS on the right-side filter, set Group by to Usage Type, and change the time period to the specific day.

Cost analysis in Cost Explorer

The cost breakdown demonstrates how Fargate Spot usage (indicated by “SpotUsage”) was significantly less expensive than non-Spot Fargate usage. Current Fargate Spot Pricing can be found here.

Conclusion

In this blog post, we have demonstrated how to utilize CircleCI to deploy and manage ECS tasks and run applications in a cost-effective serverless approach by using Fargate Spot.

Author bio

	Pritam is a Sr. Specialist Solutions Architect on the EC2 Spot team. For the last 15 years, he has evangelized DevOps and Cloud adoption across industries and verticals. He likes to deep dive and find solutions to everyday problems.
	Dan is a Sr. Spot GTM Specialist on the EC2 Spot Team. He works closely with Amazon Partners to ensure that their customers can optimize and modernize their compute with EC2 Spot.

Sending mobile push notifications and managing device tokens with serverless applications

2021-08-26 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/sending-mobile-push-notifications-and-managing-device-tokens-with-serverless-application/

This post is written by Rafa Xu, Cloud Architect, Serverless and Joely Huang, Cloud Architect, Serverless.

Amazon Simple Notification Service (SNS) is a fast, flexible, fully managed push messaging service in the cloud. SNS can send mobile push notifications directly to applications on mobile devices such as message alerts and badge updates. SNS sends push notifications to a mobile endpoint created by supplying a mobile token and platform application.

When publishing mobile push notifications, a device token is used to generate an endpoint. This identifies where the push notification is sent (target destination). To push notifications successfully, the token must be up to date and the endpoint must be validated and enabled.

A common challenge when pushing notifications is keeping the token up to date. Tokens can automatically change due to reasons such as mobile operating system (OS) updates and application store updates.

This post provides a serverless solution to this challenge. It also provides a way to publish push notifications to specific end users by maintaining a mapping between users, endpoints, and tokens.

Overview

To publish mobile push notifications using SNS, generate an SNS endpoint to use as a destination target for the push notification. To create the endpoint, you must supply:

A mobile application token: The mobile operating system (OS) issues the token to the application. It is a unique identifier for the application and mobile device pair.
Platform Application Amazon Resource Name (ARN): SNS provides this ARN when you create a platform application object. The platform application object requires a valid set of credentials issued by the mobile platform, which you provide to SNS.

Once the endpoint is generated, you can store and reuse it. This prevents the application from creating endpoints indefinitely, which could exhaust the SNS endpoint limit.

To reuse the endpoints and successfully push notifications, there are a number of challenges:

Mobile application tokens can change due to a number of reasons, such as application updates. As a result, the publisher must update the platform endpoint to ensure it uses an up-to-date token.
Mobile application tokens can become invalid. When this happens, messages won’t be published, and SNS disables the endpoint with the invalid token. To resolve this, publishers must retrieve a valid token and re-enable the platform endpoint.
Mobile applications can have many users, each user can have multiple devices, and one device can have multiple users. To send a push notification to a specific user, a mapping between the user, device, and platform endpoints should be maintained.

For more information on best practices for managing mobile tokens, refer to this post.

Follow along with this blog post to learn how to implement a serverless workflow for managing and maintaining valid endpoints and user mappings.

Solution overview

The solution uses the following AWS services:

Amazon API Gateway: Provides a token registration endpoint URL used by the mobile application. Once called, it invokes an AWS Lambda function via the Lambda integration.
Amazon SNS: Generates and maintains the target endpoint and manages platform application objects.
Amazon DynamoDB: Serverless database for storing endpoints that also maintains a mapping between the user, endpoint, and mobile operating system.
AWS Lambda: Retrieves endpoints from DynamoDB, validates and generates endpoints, and publishes notifications by making requests to SNS.

The following diagram represents a simplified interaction flow between the AWS services:

To register the token, the mobile app invokes the registration token endpoint URL generated by Amazon API Gateway. The token registration happens every time a user logs in or opens the application. This ensures that the token and endpoints are always valid during the application usage.

The mobile application passes the token, user, and mobileOS as parameters to API Gateway, which forwards the request to the Lambda function.

The Lambda function validates the token and endpoint for the user by making API calls to DynamoDB and SNS:

The Lambda function checks DynamoDB to see if the endpoint has been previously created.
    1. If the endpoint does not exist, it creates a platform endpoint via SNS.
The function obtains the endpoint attributes from SNS:
    1. It checks the “enabled” endpoint attribute and sets it to “true” to enable the platform endpoint, if necessary.
    2. It validates the “token” endpoint attribute against the token provided in the API Gateway request. If it does not match, it updates the “token” attribute.
    3. It sends a request to SNS to update the endpoint attributes.
If a new endpoint is created, the function updates DynamoDB with the new endpoint.
The function returns a successful response to API Gateway.
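The steps above can be sketched end to end in Python with in-memory stand-ins, which makes the control flow easy to test without AWS. Everything here is an assumption for illustration: FakeSns and the plain dictionary used as the table are not boto3 objects, and register_token is a hypothetical name rather than the function used in the sample repository.

```python
class FakeSns:
    """In-memory stand-in for the SNS endpoint calls used by the workflow."""

    def __init__(self):
        self.endpoints = {}  # endpoint ARN -> attributes

    def create_platform_endpoint(self, token):
        arn = "endpoint/{}".format(len(self.endpoints) + 1)
        self.endpoints[arn] = {"Token": token, "Enabled": "true"}
        return arn


def register_token(table, sns, username, appos, token):
    """Return a valid, enabled endpoint for the user and OS, creating or
    repairing it as needed. `table` maps (username, appos) to an endpoint."""
    key = (username, appos)
    created = key not in table
    # Reuse the stored endpoint, or create one if none exists
    endpoint = sns.create_platform_endpoint(token) if created else table[key]

    # Re-enable the endpoint and refresh the token if either is stale
    attributes = sns.endpoints[endpoint]
    if attributes["Enabled"].lower() != "true" or attributes["Token"] != token:
        attributes.update({"Token": token, "Enabled": "true"})

    # Persist the mapping only when a new endpoint was created
    if created:
        table[key] = endpoint
    return endpoint


table, sns = {}, FakeSns()
first = register_token(table, sns, "alice", "ios", "token-1")
second = register_token(table, sns, "alice", "ios", "token-2")
print(first == second, sns.endpoints[first]["Token"])
```

The second call reuses the stored endpoint and only updates its token, which is the behavior that keeps the number of SNS endpoints from growing with every login.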

Deploying the AWS Serverless Application Model (AWS SAM) template

Use the AWS SAM template to deploy the infrastructure for this workflow. Before deploying the template, first create a platform application in SNS.

Navigate to the SNS console. Select Push Notifications on the left-hand menu to create a platform application:
This shows the creation of a platform application for iOS applications:
To install AWS SAM, visit the installation page.

To deploy the AWS SAM template, navigate to the directory where the template is located. Run the commands in the terminal:

git clone https://github.com/aws-samples/serverless-mobile-push-notification
cd serverless-mobile-push-notification
sam build
sam deploy --guided

Lambda function code snippets

The following section explains code from the Lambda function for the workflow.

Create the platform endpoint

If the endpoint exists, store it as a variable in the code. If the platform endpoint does not exist in the DynamoDB database, create a new endpoint:

        need_update_ddb = False
        response = table.get_item(Key={'username': username, 'appos': appos})
        if 'Item' not in response:
            # create endpoint
            response = snsClient.create_platform_endpoint(
                PlatformApplicationArn=SUPPORTED_PLATFORM[appos],
                Token=token,
            )
            devicePushEndpoint = response['EndpointArn']
            need_update_ddb = True
        else:
            # reuse the endpoint stored in DynamoDB
            devicePushEndpoint = response['Item']['endpoint']

Check and update endpoint attributes

Check that the token attribute for the platform endpoint matches the token received from the mobile application through the request. This also checks for the endpoint “enabled” attribute and re-enables the endpoint if necessary:

response = snsClient.get_endpoint_attributes(
                EndpointArn=devicePushEndpoint
            )
            endpointAttributes = response['Attributes']

            previousToken = endpointAttributes['Token']
            previousStatus = endpointAttributes['Enabled']
            if previousStatus.lower() != 'true' or previousToken != token:
                snsClient.set_endpoint_attributes(
                    EndpointArn=devicePushEndpoint,
                    Attributes={
                        'Token': token,
                        'Enabled': 'true'
                    }
                )
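
The check above can also be isolated as a small pure function, which makes it easier to test without calling SNS. This is only a sketch: the name endpoint_needs_update is hypothetical and is not part of the sample repository. It assumes the attribute dictionary has the shape returned by get_endpoint_attributes, where Enabled is a string such as 'true' or 'false'.

```python
def endpoint_needs_update(attributes, token):
    """Return True when the endpoint is disabled or holds a different token."""
    enabled = attributes.get('Enabled', 'false').lower() == 'true'
    same_token = attributes.get('Token') == token
    return not (enabled and same_token)


# Illustrative attribute dictionaries (made-up token values).
print(endpoint_needs_update({'Enabled': 'true', 'Token': 'abc'}, 'abc'))   # False
print(endpoint_needs_update({'Enabled': 'False', 'Token': 'abc'}, 'abc'))  # True
print(endpoint_needs_update({'Enabled': 'true', 'Token': 'old'}, 'abc'))   # True
```

When the function returns True, the Lambda code would call set_endpoint_attributes as shown above.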

Update the DynamoDB table with the newly generated endpoint

If a platform endpoint is newly created, meaning there is no item in the DynamoDB table, create a new item in the table:

        if need_update_ddb:
            table.update_item(
                Key={
                    'username': username,
                    'appos': appos
                },
                UpdateExpression="set endpoint=:e",
                ExpressionAttributeValues={
                    ':e': devicePushEndpoint
                },
                ReturnValues="UPDATED_NEW"
            )

As best practice, the code cleans up the table, in case there are multiple entries for the same endpoint mapped to different users. This can happen when the mobile application is used by multiple users on the same device. When one user logs out and a different user logs in, this creates a new entry in the DynamoDB table to map the endpoint with the new user.

As a result, you must remove the entry that maps the same endpoint to the previously logged in user. This way, you only keep the endpoint that matches the user provided by the mobile application through the request.

        # Requires: from boto3.dynamodb.conditions import Key
        result = table.query(
            # Add the name of the index you want to use in your query.
            IndexName="endpoint-index",
            KeyConditionExpression=Key('endpoint').eq(devicePushEndpoint),
        )
        for item in result['Items']:
            if item['username'] != username and item['appos'] == appos:
                # Log the orphaned user's name, not the current user's.
                print(f"deleting orphan item: username {item['username']}, os {appos}")
                table.delete_item(
                    Key={
                        'username': item['username'],
                        'appos': appos
                    },
                )

Conclusion

This blog shows how to deploy a serverless solution for validating and managing SNS platform endpoints and tokens. To publish push notifications successfully, use SNS to check the endpoint attribute and ensure it is mapped to the correct token and the endpoint is enabled.

This approach uses DynamoDB to store the device token and platform endpoints for each user. This allows you to send push notifications to specific users and to retrieve and reuse previously created endpoints. You create a Lambda function to facilitate the workflow, including validating the DynamoDB item for storing an enabled and up-to-date token.

Visit this link to learn more about Amazon SNS mobile push notifications: http://docs.aws.amazon.com/sns/latest/dg/SNSMobilePush.html

For more serverless learning resources, visit Serverless Land.

Field Notes: Deploy and Visualize ROS Bag Data on AWS using rviz and Webviz for Autonomous Driving

2021-08-26 Aubrey Oosthuizen

Post Syndicated from Aubrey Oosthuizen original https://aws.amazon.com/blogs/architecture/field-notes-deploy-and-visualize-ros-bag-data-on-aws-using-rviz-and-webviz-for-autonomous-driving/

In the automotive industry, ROS bag files are frequently used to capture drive data from test vehicles configured with cameras, LIDAR, GPS, and other input devices. The data for each device is stored as a topic in the ROS bag file. Developers and engineers need to visualize and inspect the contents of ROS bag files to identify issues or replay the drive data.

There are a couple of challenges to be addressed by migrating the visualization workflow into Amazon Web Services (AWS):

Search, identify, and stream scenarios for ADAS engineers. The visualization tool should be ready instantly, load the data for a given scenario through a search API, and show the first result through data streaming, to provide a good user experience.
Native integration with the tool chain. Many customers implement the Data Catalog, data storage, and search API in AWS native services. This visualization tool should be integrated into such a tool chain directly.

Overview of solutions

This blog post describes three solutions on how to deploy and visualize ROS bag data on AWS by using two popular visualization tools:

rviz is the standard open-source visualization tool used by the ROS community and has a large set of tools and plugin support.
Webviz is an open-source tool created by Cruise that provides modular and configurable browser-based visualization.

In the autonomous driving data lake reference architecture, both visualization tools are covered in step 10: Provide an advanced analytics and visualization toolchain including search function for particular scenarios using AWS AppSync, Amazon QuickSight (KPI reporting and monitoring), and Webviz, rviz, or other tooling for visualization.

Prerequisites

Confirm you have followed the guide for working with the AWS CDK in TypeScript
Install the AWS Command Line Interface (AWS CLI) v2
Configure the AWS CLI
For solution 2: Any Virtual Network Computing (VNC) compatible viewer can be used. You can find the binaries for your OS for TigerVNC from GitHub.

Solution 1 – Visualize ROS bag files using rviz on AWS RoboMaker virtual desktops

AWS RoboMaker provides simulation and testing infrastructure for robotics as a managed service. This includes out of the box support for virtual desktops with ROS tooling preconfigured and installed. When you launch a virtual desktop, AWS RoboMaker launches the NICE DCV web browser client. This client provides access to your AWS Cloud9 desktop and streaming applications.

Launch rviz on AWS RoboMaker virtual desktop

Follow the guide on creating a new development environment to provision a new integrated development environment (IDE) and open it. After your AWS Cloud9 IDE is open, you can launch a new virtual desktop by pressing the Virtual Desktop button at the top center of the IDE, and selecting Launch Virtual Desktop. This might take a couple of seconds to open in a new browser tab.

After your virtual desktop is loaded, you can run rviz by opening the terminal and running the following commands:

$ source /opt/ros/melodic/setup.bash
$ roscore &
$ rosrun rviz rviz

Solution 2 – Visualize ROS bag files using rviz on EC2 and TigerVNC

Note: We strongly recommend using the AWS RoboMaker managed solution for provisioning virtual desktops for your visualization needs. In cases where this is not possible due to different Linux distributions or versions, this solution allows an alternative method for setting up a virtual desktop on EC2.

In this solution (source code) we use AWS Cloud Development Kit (AWS CDK) to deploy a new Ubuntu 18.04 AMI EC2 instance to your AWS account and preconfigure it with rviz, TigerVNC, and Ubuntu Desktop.

Figure 1. Architecture for solution 2 (visualize ROS bag files using rviz on EC2 and TigerVNC)

Open a shell terminal that has your AWS CLI configured. Run the following commands to clone the code and install the corresponding nodejs dependencies.

$ git clone https://github.com/aws-samples/aws-autonomous-driving-data-lake-ros-bag-visualization-using-rviz.git rviz-infra
$ cd rviz-infra
$ npm install

Note: Review the README to understand the project structure and commands.

Next, configure your project-specific settings, like your AWS Account, Region, VNC password, VPC to deploy the Amazon EC2 machine into, and EC2 instance type by running the bootstrap script:

$ npm run project-bootstrap

You will be prompted for various inputs to bootstrap the project, including the VNC password to use. Most of these input values will be stored into your cdk.json file, and the VNC password will be stored in the AWS Systems Manager Parameter Store, a capability of AWS Systems Manager.

Run the following command to deploy your stack into your AWS account.

$ cdk deploy

After the stack has been deployed, and the EC2 instance provisioned, its user-data script will initiate and install TigerVNC and the required ROS tooling.

To see the progress, let’s connect to the instance using SSH and then tail the bootstrapping log.

$ ./ssm.sh ssh
$ sudo su ubuntu
$ tail -f /var/log/cloud-init-output.log

It takes approximately 15 minutes for the user-data bootstrapping script to finish. When it finishes, you will see the message “rviz-setup bootstrapping completed. You can now log in via VNC”.

Open a new shell terminal in the project root and start a port forwarding session using SSM through the ssm helper script:

$ ./ssm.sh vnc

After you see the waiting for connections output, you can open your VNC viewer and connect to localhost:5901.

When prompted for a password, enter the one you used when previously running the bootstrapping script.

You now have access to your Ubuntu Desktop environment.

Running rviz

If you opted to install sample data, the Ford AV Sample Data has been downloaded and installed on the EC2 instance already. To visualize it in rviz, a few helper scripts have been created and can be run by opening a new terminal and initiating the following commands:

$ cd /home/ubuntu/catkin_ws
$ ./0-run-all.sh

This helper script will open and run rviz on the Ford sample data.

Figure 2. Ford AV Dataset visualized with rviz on Ubuntu Desktop

You should now be able to use this server to run and visualize your ROS bag files using rviz.

Solution 3 – Visualize ROS bag files using Webviz on Amazon Elastic Container Service (Amazon ECS)

The third solution (source code) uses AWS CDK to deploy Webviz as a container running on AWS Fargate, fronted by an Application Load Balancer (ALB). In addition, the Infrastructure as Code (IaC) can either create a new Amazon S3 bucket or import an existing one. The S3 bucket would have its cross-origin resource sharing (CORS) rules updated to allow streaming bag files from your ALB domain.

The bucket and ROS bag files won’t need to be made public since we will use presigned Amazon S3 URLs for authorizing the streaming of the files.

Finally, the AWS CDK code deploys an AWS Lambda function that can be invoked to generate a properly formatted Webviz streaming URL that contains the HTTP encoded and presigned URL for streaming your bag files directly in the browser.
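
To make the shape of that URL concrete, here is a minimal sketch of how a presigned URL could be embedded in a Webviz link. It is not the deployed function's code, which the post does not show. The function name build_streaming_url and the example host names are hypothetical; the query parameter names remote-bag-url and seek-to are taken from the response format shown later in this post.

```python
from urllib.parse import quote


def build_streaming_url(webviz_base_url, presigned_url, seek_to=None):
    """Embed a presigned S3 URL in a Webviz link as the remote-bag-url parameter."""
    # safe='' percent-encodes ':', '/', '?', '&' and '=' so the presigned URL's
    # own query string cannot be confused with the Webviz parameters.
    url = f"{webviz_base_url}?remote-bag-url={quote(presigned_url, safe='')}"
    if seek_to is not None:
        url += f"&seek-to={seek_to}"
    return url


# Hypothetical values for illustration only.
presigned = ("https://example-bucket.s3.amazonaws.com/drive.bag"
             "?X-Amz-Signature=abc&X-Amz-Expires=3600")
print(build_streaming_url("http://webviz.example.com", presigned, seek_to=1543944890))
```

Encoding the whole presigned URL keeps its signature parameters intact when the browser passes the value to Webviz.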

Figure 3. Architecture for solution 3 (visualize ROS bag files using Webviz on Amazon ECS)

Clone the repository and install its dependencies:

$ git clone https://github.com/aws-samples/aws-autonomous-driving-data-lake-ros-bag-visualization-using-webviz.git webviz-infra
$ cd webviz-infra
$ npm install

Note: Review the project README to understand the different files, project structures, and commands.

Next, modify the cdk.json to specify your specific project configurations (for example, region, bucket name, and whether you wish to import an existing bucket or create a new one).

{
  "context": {
    ....
    "bucketName": "<bucket_name>", // [required] Name of bucket to use or create
    "bucketExists": true, // [required] Should create or update existing bucket
    "generateUrlFunctionName": "generate_ros_streaming_url", // [required] Name of lambda function
    "scenarioDB": { // [optional] Configuration of SceneDescription table
      "partitionKey": "bag_file",
      "sortKey": "scene_id",
      "region": "eu-west-1",
      "tableName": "SceneDescriptions"
    }
  }
}

To deploy the CDK stack into your AWS account run the following command.

$ cdk deploy

You can now access your Webviz instance by opening the URL value for the webvizserviceServiceURL output from the previous deploy command.

Figure 4. Example showing Webviz being served by our ALB

Custom layouts for Webviz can be imported through json configs. The solution contains an example config in the project root. This custom layout contains the topic configurations and window layouts specific to our ROS bag format and should be modified according to your ROS bag topics.

Select Config → Import/Export Layout
Copy and paste the contents of layout.json

Figure 5. Configuring custom layout for Webviz

Next, you need some ROS bag files to stream into your S3 bucket. If you configured AWS CDK to use an existing bucket, and the bucket already contains some ROS bag files, then you can skip the next step.

Upload ROS bag files

You can use the AWS CLI to copy a local bag file to the specified S3 bucket using the aws s3 cp command. You can also copy files between S3 buckets with the aws s3 cp command.

Generate streaming URL with helper script

The code repository contains a Python helper script in the project root to invoke your deployed Lambda function generate_ros_streaming_url locally with the correct payload.

To run the helper script, run the following command in your terminal:

$ python get_url.py \
--bucket-name <bucket_name> \
--key <ros_bag_key>

Response format: http://webviz-lb-<account>.<region>.elb.amazonaws.com?remote-bag-url=<presigned-url>

The response URL can be opened directly in your browser to visualize your targeted ROS bag file.

Generate a streaming URL through Lambda function in the AWS Console

You can generate a streaming URL by invoking your Lambda function generate_ros_streaming_url with the following example payload in the AWS console.

{
"key": "<ros_bag_key>",
"bucket": "<bucket_name>",
"seek_to": "<ros_timestamp>"
}

The seek_to value informs the Lambda function to add a parameter to jump to the specified ROS timestamp when generating the streaming URL.

Example output:

{
"statusCode": 200,
"body":
"{\"url\": \"http://webviz-lb-<domain>.<region>.elb.amazonaws.com?remote-bag-url=<PRESIGNED_ENCODED_URL>&seek-to=<ros_timestamp>\"}"
}

This body output URL can be opened in your browser to start visualizing your bag files directly.
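
Note that the body field in the example output is itself a JSON string, so it has to be decoded a second time to get the URL. A minimal sketch of that step follows, assuming the response has exactly the statusCode and body shape shown above. The helper name extract_streaming_url and the sample URL are hypothetical.

```python
import json


def extract_streaming_url(lambda_response):
    """Return the 'url' value from a response shaped like the example output."""
    if lambda_response.get("statusCode") != 200:
        raise ValueError(f"unexpected status: {lambda_response.get('statusCode')}")
    # The body is a JSON-encoded string, so decode it before reading 'url'.
    body = json.loads(lambda_response["body"])
    return body["url"]


# Sample response with a made-up URL.
sample = {
    "statusCode": 200,
    "body": json.dumps({"url": "http://webviz.example.com?remote-bag-url=abc&seek-to=1"}),
}
print(extract_streaming_url(sample))
```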

By using a Lambda function to generate the streaming URL, you have the flexibility to integrate it with other user interfaces or dashboards. For example, if you use Amazon QuickSight to visualize different detected scenarios, you can define a customer action to invoke the Lambda function through API Gateway to get a streaming URL for the target scenario.

Similarly, custom web applications can be used to visualize the scenes and their corresponding ROS bag files stored in a metadata store. Invoke the Lambda function from your web server to generate and return a visualization URL that can be used by the web application.

Using the streaming URL

Open the streaming URL in your browser. If you added a seek_to value while generating the URL, it should jump to that point in the bag file.

Figure 6. Example visualization of ROS bag file streamed from Amazon S3

That’s it. You should now start to see your ROS bag file being streamed directly from Amazon S3 in your browser.

Deploying Webviz as part of the Autonomous Driving Data Lake Reference Solution

This solution forms part of the autonomous driving data lake solution, which consists of a reference architecture and corresponding field notes and open-source code modules.

If this Webviz solution is deployed in conjunction with Building an automated scene detection pipeline for Autonomous Driving – ADAS Workflow (ASD), it supports some out-of-the-box integration with solution 3.

You can configure the solution’s cdk.json to specify the relevant values for the SceneDescription table created by the ASD CDK code. Redeploy the stack after changing this using $ cdk deploy.

With these values, the Lambda function generate_ros_streaming_url now supports an additional payload format:

{
"record_id": "<scene_description_table_partition_key>",
"scene_id": "<scene_description_table_sort_key>"
}

The get_url.py script also supports the additional scene lookup parameters. To look up a scene stored in your SceneDescription table run the following commands in your terminal:

$ python get_url.py --record <scene_description_table_partition_key> --scene <scene_description_table_sort_key>

Invoking generate_ros_streaming_url with the record and scene parameters results in a lookup of the ROS bag file for the scene from DynamoDB, presigning the ROS bag file, and returning a URL to stream the file directly in your browser.

Cleaning Up

To clean up the AWS RoboMaker development environment, review Deleting an Environment.

For the CDK application, you can destroy your CDK stack by running $ cdk destroy from your terminal. Some buckets will need to be manually emptied and deleted from the AWS console.

Conclusion

This blog post illustrated how to deploy two common tools used to visualize ROS bag files, using three different solutions. First, we showed you how to set up an AWS RoboMaker development environment and run rviz. Second, we showed you how to deploy an Amazon EC2 machine that automatically configures Ubuntu-Desktop with TigerVNC and rviz preinstalled. Third, we showed you how to deploy Webviz on Fargate and configure a bucket to allow streaming bag files. Finally, you learned how streaming URLs can be generated and integrated into your custom scenario detection and visualization tools.

We hope you found this post interesting and helpful in extending your autonomous vehicle solutions, and invite your comments and feedback.

Field Notes: How to Scale OpenTravel Messaging Architecture with Amazon Kinesis Data Streams

2021-08-25 Danny Gagne

Post Syndicated from Danny Gagne original https://aws.amazon.com/blogs/architecture/field-notes-how-to-scale-opentravel-messaging-architecture-with-amazon-kinesis-data-streams/

The travel industry relies on OpenTravel messaging systems to collect and distribute travel data—like hotel inventory and pricing—to many independent ecommerce travel sites. These travel sites need immediate access to the most current hotel inventory and pricing data. This allows shoppers access to the available rooms at the right prices. Each time a room is reserved, unreserved, or there is a price change, an ordered message with minimal latency must be sent from the hotel system to the travel site.

Overview of Solution

In this blog post, we will describe the architectural considerations needed to scale a low latency FIFO messaging system built with Amazon Kinesis Data Streams.

The architecture must satisfy the following constraints:

Messages must arrive in order to each destination, with respect to their hotel code.
Messages are delivered to multiple destinations independently.

Kinesis Data Streams is a managed service that enables real-time processing of streaming records. The service provides ordering of records, as well as the ability to read and replay records in the same order to multiple Amazon Kinesis applications. Kinesis data stream applications can also consume data in parallel from a stream through parallelization of consumers.

We will start with an architecture that supports one hotel with one destination, and scale it to support 1,000 hotels and 100 destinations (38,051 records per second). With the OpenTravel use case, the order of messages matters only within a stream of messages from a given hotel.

Smallest scale: One hotel with one processed delivery destination. One million messages a month.

Figure 1: Architecture showing a system sending messages from 1 hotel to 1 destination (.3805 RPS)

Full Scale: 1,000 Hotels with 100 processed delivery destinations. 100 billion messages a month.

Figure 2: Architecture showing a system sending messages from 1,000 hotels to 100 destinations (38,051 RPS)

In the example application, OpenTravel XML message data is the record payload. The record is written into the stream with a producer. The record is read from the stream shard by the consumer and sent to several HTTP destinations.

Each Kinesis data stream shard supports writes up to 1,000 records per second, up to a maximum data write total of 1 MB per second. Each shard supports reads up to five transactions per second, up to a maximum data read total of 2 MB per second. There is no upper quota on the number of streams you can have in an account.

Streams	Shards	Writes/Input limit	Reads/Output limit
1	1	1 MB per second; 1,000 records per second	2 MB per second
1	500	500 MB per second; 500,000 records per second	1,000 MB per second

OpenTravel message

In the following OpenTravel message, there is only one field that is important to our use case: HotelCode. In OTA (OpenTravel Alliance) messaging, order matters, but it only matters in the context of a given hotel, specified by the HotelCode. As we scale up our solution, we will use the HotelCode as a partition key. Each destination receives the same messages. If we are sending to 100 destinations, then we will send the message 100 times. The average message size is 4 KB.

<HotelAvailNotif>
   <request>
   <POS>
   <Source>
   <RequestorID ID = "1000">
   </RequestorID>
   <BookingChannel>
   <CompanyName Code = "ClientTravelAgency1">
   </CompanyName>
   </BookingChannel>
   </Source>
   </POS>
   <AvailStatusMessages HotelCode="100">
   <AvailStatusMessage BookingLimit="9">
   <StatusApplicationControl Start="2013-12-20" End="2013-12-25" RatePlanCode="BAR" InvCode="APT" InvType="ROOM" Mon="true" Tue="true" Weds="true" Thur="false" Fri="true" Sat="true" Sun="true"/>
   <LengthsOfStay ArrivalDateBased="true">
   <LengthOfStay Time="2" TimeUnit="Day" MinMaxMessageType="MinLOS"/>
   <LengthOfStay Time="8" TimeUnit="Day" MinMaxMessageType="MaxLOS"/>
   </LengthsOfStay>
   <RestrictionStatus Status="Open" SellThroughOpenIndicator="false" MinAdvancedBookingOffset="5"/>
   </AvailStatusMessage>
   </AvailStatusMessages>
</request>
</HotelAvailNotif>

Source: https://github.com/XML-Travelgate/xtg-content-articles-pub/blob/master/docs/hotel-push/HotelPUSH.md

Producer and consumer message error handling

Asynchronous message processing presents challenges with error handling, because an error can be logged to the producer or consumer application logs. Short-lived errors can be resolved with a delayed retry. However, there are cases when the data is bad and the message will always cause an error; these messages should be added to a dead letter queue (DLQ).

Producer retries and consumer retries are some of the reasons why records may be delivered more than once. Kinesis Data Streams does not automatically remove duplicate records. If the destination needs a strict guarantee of uniqueness, the message must have a primary key to remove duplicates when processed in the client (Handling Duplicate Records). In most cases, the destination can mitigate duplicate messages by processing the same message multiple times in a way that produces the same result (idempotence). If a destination endpoint becomes unavailable or is not performant, the consumer should back off and retry until the endpoint is available. The failure of a single destination endpoint should not impact the performance of delivery of messages to other destinations.
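
The two ideas in the paragraph above, removing duplicates by primary key and backing off while a destination is unavailable, can be sketched as follows. This is an illustration under stated assumptions, not code from the post: the class IdempotentDestination, the function deliver_with_backoff, and the choice of ConnectionError as the retryable error are all hypothetical.

```python
import time


class IdempotentDestination:
    """Applies each message at most once, keyed by a caller-supplied message ID."""

    def __init__(self):
        self.seen_ids = set()
        self.applied = []

    def deliver(self, message_id, payload):
        if message_id in self.seen_ids:
            return False  # duplicate delivery: ignore it
        self.seen_ids.add(message_id)
        self.applied.append(payload)
        return True


def deliver_with_backoff(send, message, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send(message) until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return send(message)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # retries exhausted; the caller can route the message to a DLQ
            sleep(base_delay * (2 ** attempt))
```

Running one such retry loop per destination keeps a slow endpoint from delaying delivery to the others.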

Messaging with one hotel and one destination

When starting at the smallest scale—one hotel and one destination—the system must support one million messages per month. This volume expects input of 4 GB of message data per month at a rate of 0.3805 records per second, and .0015 MB per second both input and output. For the system to process one hotel to one destination, it needs at least one producer, one stream, one shard, and a single consumer.
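
The rates quoted above can be reproduced with a short calculation. The sketch below assumes an average month of 365/12 days and 1 MB = 1,024 KB; these conventions are assumptions, chosen because they match the figures in this post. The function name monthly_to_rates is hypothetical.

```python
SECONDS_PER_MONTH = 365 * 24 * 3600 / 12  # 2,628,000 seconds in an average month


def monthly_to_rates(messages_per_month, avg_record_kb=4):
    """Convert a monthly message count into records per second and MB per second."""
    records_per_second = messages_per_month / SECONDS_PER_MONTH
    mb_per_second = records_per_second * avg_record_kb / 1024
    return records_per_second, mb_per_second


rps, mbps = monthly_to_rates(1_000_000)
print(f"{rps:.4f} records/s, {mbps:.4f} MB/s")  # 0.3805 records/s, 0.0015 MB/s
```

Passing 1_000_000_000 instead gives roughly 380 records per second and 1.486 MB per second.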

Hotels/Destinations	Streams	Shards	Consumers (standard)	Input	Output
1 hotel, 1 destination (4-KB average record size)	1	1	1	0.3805 records per second, or .0015 MB per second	0.3805 records per second, or .0015 MB per second
Maximum stream limits	1	1	1	250 (4-KB records) per second; 1 MB per second (per stream)	500 (4-KB records) per second; 2 MB per second (per stream)

With this design, the single shard supports writes up to 250 records per second, up to a maximum data write total of 1 MB per second. A PutRecords request supports up to 500 records, and each record in the request can be as large as 1 MB. Because the average message size is 4 KB, 250 records per second can be written to the shard.

Figure 3: Architecture using Amazon Kinesis Data Streams (1 hotel, 1 destination)

The single shard can also support consumer reads up to five transactions per second, up to a maximum data read total of 2 MB per second. The producer writes to the shard in FIFO order. A consumer pulls from the shard and delivers to the destination in the same order as received. In this one hotel and one destination example, the consumer read capacity would be 500 records per second.

Figure 4: Records passing through the architecture

Messaging with 1,000 hotels and one destination

Next, we will maintain a single destination and scale the number of hotels from one hotel to 1,000 hotels. This scale increases the producer inputs from one million messages to one billion messages a month. This volume expects an input of 4 TB of message data per month at a rate of 380 records per second, and 1.486 MB per second.

Hotels/Destinations	Streams	Shards	Consumers (standard)	Input	Output
1,000 hotels, 1 destination (4-KB average record size)	1	2	1	380 records per second, or 1.486 MB per second	380 records per second, or 1.486 MB per second
Maximum stream limits	1	2	1	500 (4-KB records) per second; 2 MB per second (per stream)	1,000 (4-KB records) per second; 4 MB per second (per stream)

The volume of incoming messages increased to 1.486 MB per second, requiring one additional shard. A shard can be split using the API to increase the overall throughput capacity. The write capacity is increased by distributing messages to the shards. We will use the HotelCode as a partition key, to maintain order of messages with respect to a given hotel. This will distribute the writes into the two shards. The messages for each HotelCode will be written into the same shard and maintain FIFO order. By using the HotelCode as a partition key, it doubles the stream write capacity to 2 MB per second.
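
Kinesis Data Streams maps a partition key to a shard by taking the MD5 hash of the key as a 128-bit integer and finding the shard whose hash key range contains it. The sketch below imitates that mapping locally to show why all messages for one HotelCode land in the same shard. The helper names and the assumption that the shards split the hash space evenly are illustrative; real shard ranges depend on how the stream was created and split.

```python
import hashlib

MAX_HASH_KEY = 2**128 - 1


def even_shard_ranges(shard_count):
    """Split the 128-bit hash key space into contiguous, equal-sized ranges."""
    size = (MAX_HASH_KEY + 1) // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        end = MAX_HASH_KEY if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key, ranges):
    """Return the index of the shard whose range contains the key's MD5 hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return index
    raise ValueError("hash key outside all shard ranges")


ranges = even_shard_ranges(2)
# The same HotelCode always maps to the same shard, which preserves its order.
print(shard_for("100", ranges) == shard_for("100", ranges))  # True
```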

A single consumer can read the records from each shard at 2 MB per second per shard. The consumer will iterate through each shard, processing the records in FIFO order based on the HotelCode. It then sends messages to the destination in the same order they were sent from the producer.

Messaging with 1,000 hotels and five destinations

As we scale up further, we maintain 1,000 hotels and scale the delivery to five destinations. This increases the consumer reads from one billion messages to five billion messages a month. It has the same input of 4 TB of message data per month at a rate of 380 records per second, and 1.486 MB per second. The output is now about 19.5 TB of message data per month (roughly five times the input) at a rate of 1,902.5 records per second, and 7.43 MB per second.

Hotels/Destinations	Streams	Shards	Consumers (standard)	Input	Output
1,000 hotels, 5 destinations (4-KB average record size)	1	4	5	380 records per second, or 1.486 MB per second	1,902.5 records per second, or 7.43 MB per second
Maximum stream capacity	1	4	5	1,000 (4-KB) records per second (per stream); 4 MB per second (per stream)	2,000 (4-KB) records per second per stream with a standard consumer; 8 MB per second (per stream)

To support this scale, we increase the number of consumers to match the destinations. A dedicated consumer for each destination allows the system to have committed capacity to each destination. If a destination fails or becomes slow, it will not impact other destination consumers.

Figure 5: Four Kinesis shards to support read throughput

We increase to four shards to support additional read throughput required for the increased destination consumers. Each destination consumer iterates through the messages in each shard in FIFO order based on the HotelCode. It then sends the messages to the assigned destination maintaining the hotel specific ordering. Like in the previous 1,000-to-1 design, the messages for each HotelCode are written into the same shard while maintaining its FIFO order.
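
The shard counts used so far can be estimated from the per-shard limits quoted earlier: 1 MB per second of writes, and 2 MB per second of reads shared by all standard consumers. The following is a rough sketch, not a sizing tool. The function name min_shards is hypothetical, and it ignores the records-per-second and read-transaction limits, so treat the result as a lower bound.

```python
import math

SHARD_WRITE_MB_PER_S = 1  # write limit per shard
SHARD_READ_MB_PER_S = 2   # read limit per shard, shared by standard consumers


def min_shards(input_mb_per_s, destinations):
    """Estimate the fewest shards that cover both write and read throughput."""
    write_shards = math.ceil(input_mb_per_s / SHARD_WRITE_MB_PER_S)
    # Each destination consumer reads the full input, so reads scale with destinations.
    read_shards = math.ceil(input_mb_per_s * destinations / SHARD_READ_MB_PER_S)
    return max(write_shards, read_shards, 1)


print(min_shards(0.0015, 1))  # 1
print(min_shards(1.486, 1))   # 2
print(min_shards(1.486, 5))   # 4
```

With one destination, writes are the constraint; with five destinations, the shared read limit becomes the constraint.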

The distribution of messages per HotelCode should be monitored with metrics such as WriteProvisionedThroughputExceeded and ReadProvisionedThroughputExceeded to ensure an even distribution between shards, because HotelCode is the partition key. If there is uneven distribution, it may require a dedicated shard for each HotelCode. In the following examples, it is assumed that there is an even distribution of messages.

Figure 6: Architecture showing 1,000 Hotels, 4 Shards, 5 Consumers, and 5 Destinations

Messaging with 1,000 hotels and 100 destinations

With 1,000 hotels and 100 destinations, the system processes 100 billion messages per month. It continues to have the same number of incoming messages, 4 TB of message data per month, at a rate of 380 records per second, and 1.486 MB per second. The number of destinations has increased from five to 100, resulting in 390 TB of message data per month, at a rate of 38,051 records per second, and 148.6 MB per second.

Hotels/Destinations	Streams	Shards	Consumers (standard)	Input	Output
1,000 hotels, 100 destinations (4-KB average record size)	1	78	100	380 records per second, or 1.486 MB per second	38,051 records per second, or 148.6 MB per second
Maximum stream capacity	1	78	100	19,500 (4-KB) records per second (per stream); 78 MB per second (per stream)	39,000 (4-KB) records per second per stream with a standard consumer; 156 MB per second (per stream)

The number of shards increases from four to 78 to support the read throughput needed for 100 consumers. With 78 shards, the stream attains a read capacity of 156 MB per second.

Figure 7: Architecture with 1,000 hotels, 78 Kinesis shards, 100 consumers, and 100 destinations

Next, we will look at an architecture that uses a different type of consumer, the enhanced fan-out consumer, which improves stream throughput and processing efficiency.

Lambda enhanced fan-out consumers

Enhanced fan-out consumers can increase the per shard read consumption throughput through event-based consumption, reduce latency with parallelization, and support error handling. The enhanced fan-out consumer increases the read capacity of consumers from a shared 2 MB per second, to a dedicated 2 MB per second for each consumer.

When using Lambda as an enhanced fan-out consumer, you can use the Event Source Mapping Parallelization Factor to have one Lambda pull from one shard concurrently with up to 10 parallel invocations per consumer. Each parallelized invocation contains messages with the same partition key (HotelCode) and maintains order. The invocations complete each message before processing with the next parallel invocation.

Figure 8: Lambda fan-out consumers with parallel invocations, maintaining order

Consumers can gain performance improvements by setting a larger batch size for each invocation. When setting the Event Source Mapping batch size, the consumer can set the maximum number of records that the Lambda will retrieve from the stream at the time of invoking the function. The Event Source Mapping batch window can be used to adjust the time to wait to gather records for the batch before invoking the function.

The Lambda Event Source Mapping has a setting ReportBatchItemFailures that can be used to keep track of the last successfully processed record. When the next invocation of the Lambda starts the batch from the checkpoint, it starts with the failed record. This will occur until the maximum number of retries for the failed record occurs and it is expired. If this feature is enabled and a failure occurs, Lambda will prioritize checkpointing, over other set mechanisms, to minimize duplicate processing.
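
When ReportBatchItemFailures is enabled, the function reports a failure by returning the sequence number of the failed record in a batchItemFailures list. The handler below is a minimal sketch of that contract. The process function is a hypothetical placeholder for delivery to a destination, and its payload check is for illustration only.

```python
import base64


def process(payload):
    """Hypothetical delivery step; raises when the payload cannot be handled."""
    if b"<HotelAvailNotif" not in payload:
        raise ValueError("not an OpenTravel availability message")


def handler(event, context):
    for record in event["Records"]:
        # Kinesis record data arrives base64-encoded in the Lambda event.
        payload = base64.b64decode(record["kinesis"]["data"])
        try:
            process(payload)
        except Exception:
            # Report the first failed record so Lambda retries from this
            # sequence number instead of replaying the whole batch.
            return {"batchItemFailures": [
                {"itemIdentifier": record["kinesis"]["sequenceNumber"]}
            ]}
    # An empty list tells Lambda the whole batch succeeded.
    return {"batchItemFailures": []}
```

Stopping at the first failure keeps the remaining records in the batch in order for the retry.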

Lambda has built-in support for sending old or exhausted retries to an on-failure destination with the option of Amazon Simple Queue Service as a DLQ or SNS topic. The Lambda consumer can be configured with maximum record retries and maximum record age, so repeated failures are sent to a DLQ and handled with a separate Lambda function.

Figure 9: Lambda fan-out consumers with parallel invocations, and error handling

Messaging with Lambda fan-out consumers at scale

In the 1,000 hotel and 100 destination scenario, we will scale by shard and stream. Lambda fan-out consumers have a hard quota of 20 fan-out consumers per stream. With one consumer per destination, 100 fan-out consumers will need five streams. The producer will write to one stream, which has a consumer that writes to the five streams that our destination consumers are reading from.
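
The stream count follows directly from the quota of 20 fan-out consumers per stream. The sketch below groups destinations into streams under that quota. The function name assign_destinations and the destination labels are hypothetical.

```python
import math

MAX_FAN_OUT_CONSUMERS_PER_STREAM = 20  # quota stated above


def assign_destinations(destinations):
    """Group destinations so no stream has more consumers than the quota allows."""
    stream_count = math.ceil(len(destinations) / MAX_FAN_OUT_CONSUMERS_PER_STREAM)
    streams = [[] for _ in range(stream_count)]
    for i, destination in enumerate(destinations):
        streams[i // MAX_FAN_OUT_CONSUMERS_PER_STREAM].append(destination)
    return streams


streams = assign_destinations([f"destination-{n}" for n in range(100)])
print(len(streams))  # 5
```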

Hotels/Destinations	Streams	Shards	Consumers (fan-out)	Input	Output
1,000 hotels, 100 destinations (4-KB average record size)	5	10 (2 each stream)	100	380 records per second, or 74.3 MB per second	38,051 records per second, or 148.6 MB per second
Maximum stream capacity	5	10 (2 each stream)	100	500 (4-KB records) per second per stream; 2 MB per second per stream	40,000 (4-KB records) per second per stream; 200,000 (4-KB records) per second with 5 streams; 40 MB per second per stream; 200 MB per second with 5 streams

Figure 10: Architecture 1,000 hotels, 100 destinations, multiple streams, and fan-out consumers

This architecture increases throughput to 200 MB per second, with only 12 shards, compared to 78 shards with standard consumers.

Conclusion

In this blog post, we explained how to use Kinesis Data Streams to process low latency ordered messages at scale with OpenTravel data. We reviewed the aspects of efficient processing and message consumption, scaling considerations, and error handling scenarios. We explored multiple architectures, one dimension of scale at a time, to demonstrate several considerations for scaling the OpenTravel messaging system.

Build Next-Generation Microservices with .NET 5 and gRPC on AWS

2021-08-23 Matt Cline

Post Syndicated from Matt Cline original https://aws.amazon.com/blogs/devops/next-generation-microservices-dotnet-grpc/

Modern architectures use multiple microservices in conjunction to drive customer experiences. At re:Invent 2015, AWS senior project manager Rob Brigham described Amazon’s architecture of many single-purpose microservices – including ones that render the “Buy” button, calculate tax at checkout, and hundreds more.

Microservices commonly communicate with JSON over HTTP/1.1. These technologies are ubiquitous and human-readable, but they aren’t optimized for communication between dozens or hundreds of microservices.

Next-generation Web technologies, including gRPC and HTTP/2, significantly improve communication speed and efficiency between microservices. AWS offers the most compelling experience for builders implementing microservices. Moreover, the addition of HTTP/2 and gRPC support in Application Load Balancer (ALB) provides an end-to-end solution for next-generation microservices. ALBs can inspect and route gRPC calls, enabling features like health checks, access logs, and gRPC-specific metrics.

This post demonstrates .NET microservices communicating with gRPC via Application Load Balancers. The microservices run on AWS Graviton2 instances, utilizing a custom-built 64-bit Arm processor to deliver up to 40% better price/performance than x86.

Architecture Overview

Modern Tacos is a new restaurant offering delivery. Customers place orders via mobile app, then they receive real-time status updates as their order is prepared and delivered.

The tutorial includes two microservices: “Submit Order” and “Track Order”. The Submit Order service receives orders from the app, then it calls the Track Order service to initiate order tracking. The Track Order service provides streaming updates to the app as the order is prepared and delivered.

Each microservice is deployed in an Amazon EC2 Auto Scaling group. Each group is behind an ALB that routes gRPC traffic to instances in the group.

Shows the communication flow of gRPC traffic from users through an ALB to EC2 instances.

This architecture is simplified to focus on ALB and gRPC functionality. Microservices are often deployed in containers for elastic scaling, improved reliability, and efficient resource utilization. ALB, gRPC, and .NET all work equally effectively in these architectures.

Comparing gRPC and JSON for microservices

Microservices typically communicate by sending JSON data over HTTP. As a text-based format, JSON is readable, flexible, and widely compatible. However, JSON also has significant weaknesses as a data interchange format. JSON’s flexibility makes enforcing a strict API specification difficult — clients can send arbitrary or invalid data, so developers must write rigorous data validation code. Additionally, performance can suffer at scale due to JSON’s relatively high bandwidth and parsing requirements. These factors also impact performance in constrained environments, such as smartphones and IoT devices. gRPC addresses all of these issues.

gRPC is an open-source framework designed to efficiently connect services. Instead of JSON, gRPC sends messages via a compact binary format called Protocol Buffers, or protobuf. Although protobuf messages are not human-readable, they utilize less network bandwidth and are faster to encode and decode. Operating at scale, these small differences multiply to a significant performance gain.
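To make the size difference concrete, here is a small sketch that hand-encodes a single integer field using protobuf's varint wire format and compares it with the equivalent JSON. It is an illustration of the encoding only, not a replacement for a protobuf library, and the field name "quantity" is a hypothetical example.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Encode one integer field: the tag (field number and wire type 0), then the value."""
    tag = (field_number << 3) | 0  # wire type 0 = varint
    return encode_varint(tag) + encode_varint(value)


protobuf_bytes = encode_int_field(1, 150)
json_bytes = json.dumps({"quantity": 150}).encode()
print(protobuf_bytes.hex(), len(protobuf_bytes), "bytes vs", len(json_bytes), "bytes of JSON")
```

The binary message carries a field number instead of the field name, which is why both sides need the shared protobuf specification to interpret it.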

gRPC APIs define a strict contract that is automatically enforced for all messages. Based on this contract, gRPC implementations generate client and server code libraries in multiple programming languages. This allows developers to use higher-level constructs to call services, rather than programming against “raw” HTTP requests.

gRPC also benefits from being built on HTTP/2, a major revision of the HTTP protocol. In addition to the foundational performance and efficiency improvements from HTTP/2, gRPC utilizes the new protocol to support bi-directional streaming data. Implementing real-time streaming prior to gRPC typically required a completely separate protocol (such as WebSockets) that might not be supported by every client.

gRPC for .NET developers

Several recent updates have made gRPC more useful to .NET developers. .NET 5 includes significant performance improvements to gRPC, and AWS has broad support for .NET 5. In May 2021, the .NET team announced their focus on a gRPC implementation written entirely in C#, called “grpc-dotnet”, which follows C# conventions very closely.

Instead of working with JSON, dynamic objects, or strings, C# developers calling a gRPC service use a strongly-typed client, automatically generated from the protobuf specification. This obviates much of the boilerplate validation required by JSON APIs, and it enables developers to use rich data structures. Additionally, the generated code enables full IntelliSense support in Visual Studio.

For example, the “Submit Order” microservice executes this code in order to call the “Track Order” microservice:

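// Namespaces this snippet relies on; place these directives at the top of the file.
using Grpc.Net.Client;                 // GrpcChannel
using Google.Protobuf.WellKnownTypes;  // Timestamp

// Open a channel to the Track Order service through its load balancer.
using var channel = GrpcChannel.ForAddress("https://track-order.example.com");

// Strongly-typed client generated from the Track Order protobuf specification.
var trackOrderClient = new TrackOrder.Protos.TrackOrder.TrackOrderClient(channel);

// The remote call reads like a local async method call.
var reply = await trackOrderClient.StartTrackingOrderAsync(new TrackOrder.Protos.Order
{
    DeliverTo = "Address",
    LastUpdated = Timestamp.FromDateTime(DateTime.UtcNow),
    OrderId = order.OrderId,
    PlacedOn = order.PlacedOn,
    Status = TrackOrder.Protos.OrderStatus.Placed
});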

This code calls the StartTrackingOrderAsync method on the Track Order client, which looks just like a local method call. The method takes a data structure that supports rich data types like DateTime and enumerations, instead of loosely-typed JSON. The methods and data structures are defined by the Track Order service’s protobuf specification, and the .NET gRPC tools automatically generate the client and data structure classes without requiring any developer effort.

Configuring ALB for gRPC

To make gRPC calls to targets behind an ALB, create a load balancer target group and select gRPC as the protocol version. You can do this through the AWS Management Console, AWS Command Line Interface (CLI), AWS CloudFormation, or AWS Cloud Development Kit (CDK).

Screenshot of the AWS Management Console, showing how to configure a load balancer's target group for gRPC communication.

This CDK code creates a gRPC target group:

var targetGroup = new ApplicationTargetGroup(this, "TargetGroup", new ApplicationTargetGroupProps
{
    Protocol = ApplicationProtocol.HTTPS,
    ProtocolVersion = ApplicationProtocolVersion.GRPC,
    Vpc = vpc,
    Targets = new IApplicationLoadBalancerTarget[] {...}
});

gRPC requests work with target groups utilizing HTTP/2, but the gRPC protocol enables additional features including health checks, request count metrics, access logs that differentiate gRPC requests, and gRPC-specific response headers. gRPC also works with native ALB features like stickiness, multiple load balancing algorithms, and TLS termination.
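For readers who script against the API instead of using the CDK, the same target group settings can be sketched as the keyword arguments for the Elastic Load Balancing CreateTargetGroup call (boto3's elbv2 create_target_group). The name, port, and VPC ID are placeholders. Using the service's HealthCheck method as the health check path, and treating gRPC status code 0 (OK) as healthy, are assumptions about how the sample is configured.

```python
def build_grpc_target_group(name: str, vpc_id: str, port: int = 443) -> dict:
    """Return keyword arguments for a gRPC target group."""
    return {
        "Name": name,
        "Protocol": "HTTPS",
        "ProtocolVersion": "GRPC",
        "Port": port,
        "VpcId": vpc_id,
        "TargetType": "instance",
        "HealthCheckEnabled": True,
        # gRPC health check paths take the form /package.Service/Method.
        "HealthCheckPath": "/modern_taco_shop.TrackOrder/HealthCheck",
        # Assumed: a gRPC status of 0 (OK) marks the target healthy.
        "Matcher": {"GrpcCode": "0"},
    }


target_group_params = build_grpc_target_group("track-order-grpc", "vpc-0123456789abcdef0")
```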

Deploy the Tutorial

The sample provisions AWS resources via the AWS Cloud Development Kit (CDK). The CDK code is provided in C# so that .NET developers can use a familiar language.

The solution deployment steps include:

Configuring a domain name in Route 53.
Deploying the microservices.
Running the mobile app on AWS Device Farm.

The source code is available on GitHub.

Prerequisites

For this tutorial, you should have these prerequisites:

Sign up for an AWS account.
Complete the AWS CDK Getting Started guide.
Install the AWS CLI and set up your AWS credentials for command-line use – or, you can use the AWS Tools for PowerShell and set up your AWS credentials for PowerShell.
Create a public hosted zone in Amazon Route 53 for a domain name that you control. This will be the “parent” domain name for the microservices.
Install Visual Studio 2019.
Clone the GitHub repository to your computer.
Open a terminal (such as Bash) or a PowerShell prompt.

Configure the environment variables needed by the CDK. In the sample commands below, replace AWS_ACCOUNT_ID with your numeric AWS account ID. Replace AWS_REGION with the name of the region where you will deploy the sample, such as us-east-1 or us-west-2.

If you’re using a *nix shell such as Bash, run these commands:

export CDK_DEFAULT_ACCOUNT=AWS_ACCOUNT_ID
export CDK_DEFAULT_REGION=AWS_REGION

If you’re using PowerShell, run these commands:

$Env:CDK_DEFAULT_ACCOUNT="AWS_ACCOUNT_ID"
$Env:CDK_DEFAULT_REGION="AWS_REGION"
Set-DefaultAWSRegion -Region AWS_REGION

Throughout this tutorial, replace the uppercase placeholders (such as AWS_ACCOUNT_ID and AWS_REGION) with the appropriate values.

Save the directory path where you cloned the GitHub repository. In the sample commands below, replace EXAMPLE_DIRECTORY with this path.

In your terminal or PowerShell, run these commands:

cd EXAMPLE_DIRECTORY/src/ModernTacoShop/Common/cdk
cdk bootstrap --context domain-name=PARENT_DOMAIN_NAME
cdk deploy --context domain-name=PARENT_DOMAIN_NAME

The CDK output includes the name of the S3 bucket that will store deployment packages. Save the name of this bucket. In the sample commands below, replace SHARED_BUCKET_NAME with this name.

Deploy the Track Order microservice

Compile the Track Order microservice for the Arm microarchitecture utilized by AWS Graviton2 processors. The TrackOrder.csproj file includes a target that automatically packages the compiled microservice into a ZIP file. You will upload this ZIP file to S3 for use by CodeDeploy. Next, you will utilize the CDK to deploy the microservice’s AWS infrastructure, and then install the microservice on the EC2 instance via CodeDeploy.

The CDK stack deploys these resources:

An Amazon EC2 Auto Scaling group.
An Application Load Balancer (ALB) using gRPC, targeting the Auto Scaling group and configured with microservice health checks.
A subdomain for the microservice, targeting the ALB.
A DynamoDB table used by the microservice.
CodeDeploy infrastructure to deploy the microservice to the Auto Scaling group.

If you’re using the AWS CLI, run these commands:

cd EXAMPLE_DIRECTORY/src/ModernTacoShop/TrackOrder/src/
dotnet publish --runtime linux-arm64 --self-contained
aws s3 cp ./bin/TrackOrder.zip s3://SHARED_BUCKET_NAME
etag=$(aws s3api head-object --bucket SHARED_BUCKET_NAME \
    --key TrackOrder.zip --query ETag --output text)
cd ../cdk
cdk deploy

The CDK output includes the name of the CodeDeploy deployment group. Use this name to run the next command:

aws deploy create-deployment --application-name ModernTacoShop-TrackOrder \
    --deployment-group-name TRACK_ORDER_DEPLOYMENT_GROUP_NAME \
    --s3-location bucket=SHARED_BUCKET_NAME,bundleType=zip,key=TrackOrder.zip,etag=$etag \
    --file-exists-behavior OVERWRITE

If you’re using PowerShell, run these commands:

cd EXAMPLE_DIRECTORY/src/ModernTacoShop/TrackOrder/src/
dotnet publish --runtime linux-arm64 --self-contained
Write-S3Object -BucketName SHARED_BUCKET_NAME `
    -Key TrackOrder.zip `
    -File ./bin/TrackOrder.zip
Get-S3ObjectMetadata -BucketName SHARED_BUCKET_NAME `
    -Key TrackOrder.zip `
    -Select ETag `
    -OutVariable etag
cd ../cdk
cdk deploy

The CDK output includes the name of the CodeDeploy deployment group. Use this name to run the next command:

New-CDDeployment -ApplicationName ModernTacoShop-TrackOrder `
    -DeploymentGroupName TRACK_ORDER_DEPLOYMENT_GROUP_NAME `
    -S3Location_Bucket SHARED_BUCKET_NAME `
    -S3Location_BundleType zip `
    -S3Location_Key TrackOrder.zip `
    -S3Location_ETag $etag[0] `
    -RevisionType S3 `
    -FileExistsBehavior OVERWRITE

Deploy the Submit Order microservice

The steps to deploy the Submit Order microservice are identical to those for the Track Order microservice. See that section for details.

If you’re using the AWS CLI, run these commands:

cd EXAMPLE_DIRECTORY/src/ModernTacoShop/SubmitOrder/src/
dotnet publish --runtime linux-arm64 --self-contained
aws s3 cp ./bin/SubmitOrder.zip s3://SHARED_BUCKET_NAME
etag=$(aws s3api head-object --bucket SHARED_BUCKET_NAME \
    --key SubmitOrder.zip --query ETag --output text)
cd ../cdk
cdk deploy

The CDK output includes the name of the CodeDeploy deployment group. Use this name to run the next command:

aws deploy create-deployment --application-name ModernTacoShop-SubmitOrder \
    --deployment-group-name SUBMIT_ORDER_DEPLOYMENT_GROUP_NAME \
    --s3-location bucket=SHARED_BUCKET_NAME,bundleType=zip,key=SubmitOrder.zip,etag=$etag \
    --file-exists-behavior OVERWRITE

If you’re using PowerShell, run these commands:

cd EXAMPLE_DIRECTORY/src/ModernTacoShop/SubmitOrder/src/
dotnet publish --runtime linux-arm64 --self-contained
Write-S3Object -BucketName SHARED_BUCKET_NAME `
    -Key SubmitOrder.zip `
    -File ./bin/SubmitOrder.zip
Get-S3ObjectMetadata -BucketName SHARED_BUCKET_NAME `
    -Key SubmitOrder.zip `
    -Select ETag `
    -OutVariable etag
cd ../cdk
cdk deploy

The CDK output includes the name of the CodeDeploy deployment group. Use this name to run the next command:

New-CDDeployment -ApplicationName ModernTacoShop-SubmitOrder `
    -DeploymentGroupName SUBMIT_ORDER_DEPLOYMENT_GROUP_NAME `
    -S3Location_Bucket SHARED_BUCKET_NAME `
    -S3Location_BundleType zip `
    -S3Location_Key SubmitOrder.zip `
    -S3Location_ETag $etag[0] `
    -RevisionType S3 `
    -FileExistsBehavior OVERWRITE

Data flow diagram

Architecture diagram showing the complete data flow of the sample gRPC microservices application.

The app submits an order via gRPC.

The Submit Order ALB routes the gRPC call to an instance.

The Submit Order instance stores order data.

The Submit Order instance calls the Track Order service via gRPC.

The Track Order ALB routes the gRPC call to an instance.

The Track Order instance stores tracking data.

The app calls the Track Order service, which streams the order’s location during delivery.

Test the microservices

Once the CodeDeploy deployments have completed, test both microservices.

First, check the load balancers’ status. Go to Target Groups in the AWS Management Console, which will list one target group for each microservice. Click each target group, then click “Targets” in the lower details pane. Every EC2 instance in the target group should have a “healthy” status.

Next, verify each microservice via gRPCurl. This tool lets you invoke gRPC services from the command line. Install gRPCurl using the instructions, and then test each microservice:

grpcurl submit-order.PARENT_DOMAIN_NAME:443 modern_taco_shop.SubmitOrder/HealthCheck
grpcurl track-order.PARENT_DOMAIN_NAME:443 modern_taco_shop.TrackOrder/HealthCheck

If a service is healthy, it will return an empty JSON object.

Run the mobile app

You will run a pre-compiled version of the app on AWS Device Farm, which lets you test on a real device without managing any infrastructure. Alternatively, compile your own version via the AndroidApp.FrontEnd project within the solution located at EXAMPLE_DIRECTORY/src/ModernTacoShop/AndroidApp/AndroidApp.sln.

Go to Device Farm in the AWS Management Console. Under “Mobile device testing projects”, click “Create a new project”. Enter “ModernTacoShop” as the project name, and click “Create Project”. In the ModernTacoShop project, click the “Remote access” tab, then click “Start a new session”. Under “Choose a device”, select the Google Pixel 3a running OS version 10, and click “Confirm and start session”.

Screenshot of the AWS Device Farm showing a Google Pixel 3a.

Once the session begins, click “Upload” in the “Install applications” section. Unzip and upload the APK file located at EXAMPLE_DIRECTORY/src/ModernTacoShop/AndroidApp/com.example.modern_tacos.grpc_tacos.apk.zip, or upload an APK that you created.

Screenshot of the gRPC microservices demo Android app, showing the map that displays streaming location data.

Screenshot of the gRPC microservices demo Android app, on the order preparation screen.

Once the app has uploaded, drag up from the bottom of the device screen in order to reach the “All apps” screen. Click the ModernTacos app to launch it.

Once the app launches, enter the parent domain name in the “Domain Name” field. Click the “+” and “-” buttons next to each type of taco in order to create your order, then click “Submit Order”. The order status will initially display as “Preparing”, and will switch to “InTransit” after about 30 seconds. The Track Order service will stream a random route to the app, updating with new position data every 5 seconds. After approximately 2 minutes, the order status will change to “Delivered” and the streaming updates will stop.
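The timeline above can be modeled with a generator, which is a rough local analogue of a server-streaming call: the caller iterates and receives updates as they are produced. This is a simulation sketch, not the Track Order service's code. It assumes the 2 minutes are measured from order submission, and the starting coordinates and step size are arbitrary.

```python
import random
from typing import Iterator, Tuple

Update = Tuple[int, str, Tuple[float, float]]


def track_order(prepare_s: int = 30, deliver_s: int = 120,
                interval_s: int = 5, seed: int = 0) -> Iterator[Update]:
    """Yield (elapsed_seconds, status, (lat, lon)) updates for one order."""
    rng = random.Random(seed)
    position = (40.4406, -79.9959)  # arbitrary starting point
    yield 0, "Preparing", position
    for elapsed in range(prepare_s, deliver_s, interval_s):
        # A random walk stands in for the service's random route.
        position = (position[0] + rng.uniform(-0.001, 0.001),
                    position[1] + rng.uniform(-0.001, 0.001))
        yield elapsed, "InTransit", position
    yield deliver_s, "Delivered", position


for elapsed, status, _ in track_order():
    print(f"{elapsed:>3}s {status}")
```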

Once you’ve run a successful test, click “Stop session” in the console.

Cleaning up

To avoid incurring charges, use the cdk destroy command to delete the stacks in the reverse order that you deployed them.
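As a sketch of the reverse-order teardown, the snippet below prints one cdk destroy command per stack directory, starting with the stack deployed last. The directory names come from the deployment steps earlier in this post. Whether the Common stack also needs the --context domain-name flag on destroy is not covered here, so the commands are left plain.

```python
# Stack directories in the order they were deployed in this tutorial.
DEPLOY_ORDER = [
    "src/ModernTacoShop/Common/cdk",
    "src/ModernTacoShop/TrackOrder/cdk",
    "src/ModernTacoShop/SubmitOrder/cdk",
]


def destroy_commands(example_directory: str) -> list:
    """Return shell commands that destroy the stacks in reverse deployment order."""
    return [
        f"cd {example_directory}/{stack_dir} && cdk destroy"
        for stack_dir in reversed(DEPLOY_ORDER)
    ]


for command in destroy_commands("EXAMPLE_DIRECTORY"):
    print(command)
```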

You can also delete the resources via CloudFormation in the AWS Management Console.

In addition to deleting the stacks, you must delete the Route 53 hosted zone and the Device Farm project.

Conclusion

This post demonstrated multiple next-generation technologies for microservices, including end-to-end HTTP/2 and gRPC communication over Application Load Balancer, AWS Graviton2 processors, and .NET 5. These technologies enable builders to create microservices applications with new levels of performance and efficiency.

Matt Cline

Matt Cline is a Solutions Architect at Amazon Web Services, supporting customers in his home city of Pittsburgh PA. With a background as a full-stack developer and architect, Matt is passionate about helping customers deliver top-quality applications on AWS. Outside of work, Matt builds (and occasionally finishes) scale models and enjoys running a tabletop role-playing game for his friends.

Ulili Nhaga

Ulili Nhaga is a Cloud Application Architect at Amazon Web Services in San Diego, California. He helps customers modernize, architect, and build highly scalable cloud-native applications on AWS. Outside of work, Ulili loves playing soccer, cycling, Brazilian BBQ, and enjoying time on the beach.

AWS Config RDK: Deploying the Custom Rules using the Terraform

2021-08-21 Madhu Sarma

Post Syndicated from Madhu Sarma original https://aws.amazon.com/blogs/devops/aws-config-rdk-deploying-the-custom-rules-using-the-terraform/

To help customers who use Terraform for multi-cloud infrastructure deployment, we have introduced a new feature in the AWS Config Rule Development Kit (RDK) that allows you to export custom AWS Config rules to Terraform files so that you can deploy the RDK rules with Terraform.

This blog post is a complement to the previous post – How to develop custom AWS Config rules using the Rule Development Kit. Here I will show you how to prototype, develop, and deploy custom AWS Config rules. The steps for prototyping and developing the custom AWS Config rules remain identical, while a variation exists in the deployment step, which I’ll walk you through in detail. I would encourage you to review the previous blog post, so that you can follow along here.

In this post, you will learn how to export a custom AWS Config rule to Terraform files and deploy it to AWS using Terraform.

Background

Until now, RDK did not support Terraform for rule deployment, which affected customers who use Terraform (infrastructure as code) to provision AWS infrastructure. We have therefore added an option to deploy the rules with Terraform.

Getting Started

The first step is to make sure that you have installed the latest RDK version. After you have defined an AWS Config rule and prototyped it using the AWS Config RDK as described in the previous blog post, follow the steps below to deploy the various AWS Config components across the compliance and satellite accounts.

Prerequisites

Validate that the installed RDK supports “export” by running “rdk export -h”; you should see the output below. If the installed RDK doesn’t support the export feature, update it with “pip install --upgrade rdk”.

(venv) 8c85902e4110:7RDK test$ rdk export -h 
 
usage: rdk export [-h] [-s RULESETS] [--all] [--lambda-layers LAMBDA_LAYERS]  
                  [--lambda-subnets LAMBDA_SUBNETS]  
                  [--lambda-security-groups LAMBDA_SECURITY_GROUPS]  
                  [--lambda-role-arn LAMBDA_ROLE_ARN]  
                  [--rdklib-layer-arn RDKLIB_LAYER_ARN] -v {0.11,0.12} -f  
                  {terraform}  
                  [<rulename> [<rulename> ...]]  
  
Used to export the Config Rule to terraform file.  
  
positional arguments:  
  <rulename>            Rule name(s) to export to a file.  
  
optional arguments:  
  -h, --help            show this help message and exit  
  -s RULESETS, --rulesets RULESETS  
                        comma-delimited list of RuleSet names  
  --all, -a             All rules in the working directory will be deployed.  
  --lambda-layers LAMBDA_LAYERS  
                        [optional] Comma-separated list of Lambda Layer ARNs  
                        to deploy with your Lambda function(s).  
  --lambda-subnets LAMBDA_SUBNETS  
                        [optional] Comma-separated list of Subnets to deploy  
                        your Lambda function(s).  
  --lambda-security-groups LAMBDA_SECURITY_GROUPS  
                        [optional] Comma-separated list of Security Groups to  
                        deploy with your Lambda function(s).  
  --lambda-role-arn LAMBDA_ROLE_ARN  
                        [optional] Assign existing iam role to lambda  
                        functions. If omitted, new lambda role will be  
                        created.  
  --rdklib-layer-arn RDKLIB_LAYER_ARN  
                        [optional] Lambda Layer ARN that contains the desired  
                        rdklib. Note that Lambda Layers are region-specific.  
  -v {0.11,0.12}, --version {0.11,0.12}  
                        Terraform version  
  -f {terraform}, --format {terraform}  
                        Export Format

Create your rule

Create your rule by using the command below which creates the MY_FIRST_RULE rule.

7RDK test$ rdk create MY_FIRST_RULE  --runtime python3.6 --resource-types AWS::EC2::SecurityGroup  
Running create!  
Local Rule files created.

This creates the three files below. Edit the “MY_FIRST_RULE.py” as per your business requirement, as described in the “Edit” section of this blog.

7RDK test$ cd MY_FIRST_RULE/ 
(venv) 8c85902e4110:MY_FIRST_RULE test$ ls
MY_FIRST_RULE.py        MY_FIRST_RULE_test.py   parameters.json
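To give a sense of the kind of logic that goes into MY_FIRST_RULE.py, here is a standalone sketch that flags security groups with an inbound rule open to the internet. It is not the file that RDK generates, and the function name is hypothetical. The configuration item field names (ipPermissions, ipv4Ranges, cidrIp) are assumptions about how AWS Config records security groups, so check them against a real configuration item.

```python
OPEN_CIDR = "0.0.0.0/0"


def evaluate_security_group(configuration_item: dict) -> str:
    """Return an AWS Config compliance type for one configuration item."""
    if configuration_item.get("resourceType") != "AWS::EC2::SecurityGroup":
        return "NOT_APPLICABLE"
    permissions = configuration_item.get("configuration", {}).get("ipPermissions", [])
    for permission in permissions:
        for ip_range in permission.get("ipv4Ranges", []):
            if ip_range.get("cidrIp") == OPEN_CIDR:
                return "NON_COMPLIANT"
    return "COMPLIANT"


sample_item = {
    "resourceType": "AWS::EC2::SecurityGroup",
    "configuration": {"ipPermissions": [{"ipv4Ranges": [{"cidrIp": "0.0.0.0/0"}]}]},
}
print(evaluate_security_group(sample_item))  # NON_COMPLIANT
```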

Export your rule to Terraform

Use the command below to export your rule to Terraform files. The export supports two versions of Terraform (0.11 and 0.12); use the “-v” argument to specify the version.

test$ cd ..  
7RDK test$ rdk export MY_FIRST_RULE -f terraform -v 0.12  
Running export  
Found Custom Rule.  
Zipping MY_FIRST_RULE  
Zipping complete.  
terraform version: 0.12  
Export completed.This will generate three .tf files.  
7RDK test$

This creates the following four files (three Terraform files and the zipped rule code):

<< rule-name >>_rule.tf: Terraform script that uploads the rule to the Amazon S3 bucket, deploys the Lambda function, and creates the AWS Config rule and the required IAM roles and policies.
<< rule-name >>_variables.tf: Terraform variable definitions.
<< rule-name >>.tfvars.json: Terraform variable values.
<< rule-name >>.zip: Compiled rule code.

7RDK test$ cd MY_FIRST_RULE/  
(venv) 8c85902e4110:MY_FIRST_RULE test$ ls -1  
MY_FIRST_RULE.py  
MY_FIRST_RULE.zip  
MY_FIRST_RULE_test.py  
my_first_rule.tfvars.json  
my_first_rule_rule.tf  
my_first_rule_variables.tf  
parameters.json

Deploy your rule using the Terraform

Initialize Terraform by running “terraform init”, which downloads the AWS provider plugin.

MY_FIRST_RULE test$ terraform init  
  
Initializing the backend...  
  
Initializing provider plugins...  
- Checking for available provider plugins...  
- Downloading plugin for provider "aws" (hashicorp/aws) 2.70.0...  
  
The following providers do not have any version constraints in configuration,  
so the latest version was installed.  
  
To prevent automatic upgrades to new major versions that may contain breaking  
changes, it is recommended to add version = "..." constraints to the  
corresponding provider blocks in configuration, with the constraint strings  
suggested below.  
  
* provider.aws: version = "~> 2.70"  
  
Terraform has been successfully initialized!

To deploy the Config rules, your role must have the required permissions, and the role ARN must be specified in the exported variables file (my_first_rule.tfvars.json in this example).

The “terraform apply” command requires two arguments:

var-file: Terraform script variable file name, created while exporting the rule using RDK.
source_bucket: Your Amazon S3 bucket name, to upload the config rule lambda code.

Make sure that the AWS provider is configured for your Terraform environment, as mentioned in the docs.

MY_FIRST_RULE test$ terraform apply -var-file=my_first_rule.tfvars.json --var source_bucket=config-bucket-xxxxx  
  
aws_iam_policy.awsconfig_policy[0]: Creating...  
aws_iam_role.awsconfig[0]: Creating...  
aws_s3_bucket_object.rule_code: Creating...  
aws_iam_role.awsconfig[0]: Creation complete after 3s [id=my_first_rule-awsconfig-role]  
aws_iam_role_policy_attachment.readonly-role-policy-attach[0]: Creating...  
aws_iam_policy.awsconfig_policy[0]: Creation complete after 4s [id=arn:aws:iam::xxxxxxxxxxxx:policy/my_first_rule-awsconfig-policy]  
aws_iam_role_policy_attachment.awsconfig_policy_attach[0]: Creating...  
aws_s3_bucket_object.rule_code: Creation complete after 5s [id=MY_FIRST_RULE.zip]  
aws_lambda_function.rdk_rule: Creating...  
aws_iam_role_policy_attachment.readonly-role-policy-attach[0]: Creation complete after 2s [id=my_first_rule-awsconfig-role-20200726023315892200000001]  
aws_iam_role_policy_attachment.awsconfig_policy_attach[0]: Creation complete after 3s [id=my_first_rule-awsconfig-role-20200726023317242000000002]  
aws_lambda_function.rdk_rule: Still creating... [10s elapsed]  
aws_lambda_function.rdk_rule: Creation complete after 18s [id=RDK-Rule-Function-MY_FIRST_RULE]  
aws_lambda_permission.lambda_invoke: Creating...  
aws_config_config_rule.event_triggered[0]: Creating...  
aws_lambda_permission.lambda_invoke: Creation complete after 2s [id=AllowExecutionFromConfig]  
aws_config_config_rule.event_triggered[0]: Creation complete after 4s [id=MY_FIRST_RULE]  
  
Apply complete! Resources: 8 added, 0 changed, 0 destroyed.

Clean up

Enter the following command to remove all the resources.

MY_FIRST_RULE test$ terraform destroy

Conclusion

With this new feature, you can export AWS Config rules developed with RDK to Terraform files, and integrate these files into your Terraform CI/CD pipeline to provision the Config rules in AWS without using the RDK.

How to authenticate private container registries using AWS Batch

2021-08-21 Ben Peven

Post Syndicated from Ben Peven original https://aws.amazon.com/blogs/compute/how-to-authenticate-private-container-registries-using-aws-batch/

This post was contributed by Clayton Thomas, Solutions Architect, AWS WW Public Sector SLG Govtech.

Many AWS Batch users choose to store and consume their AWS Batch job container images on AWS using Amazon Elastic Container Registry (Amazon ECR). AWS Batch and Amazon Elastic Container Service (Amazon ECS) natively support pulling from Amazon ECR without any extra steps. Users who store their container images in other container registries or Docker Hub often keep them private, so pulling these images requires authentication. Third-party repositories may throttle the number of requests, which impedes the ability to run workloads, and self-managed repositories require heavy tuning to offer the scale that Amazon ECR provides. This makes Amazon ECR the preferred solution for running workloads on AWS Batch.

While Amazon ECS allows you to configure repositoryCredentials in task definitions containing private registry credentials, AWS Batch does not expose this option in AWS Batch job definitions. AWS Batch does not support private registries by default, but you can enable them by configuring the Amazon ECS agent in a few steps.

This post shows how to configure an AWS Batch EC2 compute environment and the Amazon ECS agent to pull your private container images from private container registries. This gives you the flexibility to use your own private and public container registries with AWS Batch.

Overview

The solution uses AWS Secrets Manager to securely store your private container registry credentials, which are retrieved on startup of the AWS Batch compute environment. This ensures that your credentials are securely managed and accessed using IAM roles and are not persisted or stored in AWS Batch job definitions or EC2 user data. The Amazon ECS agent is then configured upon startup to pull these credentials from AWS Secrets Manager. Note that this solution only supports Amazon EC2 based AWS Batch compute environments, thus AWS Fargate cannot use this solution.

Figure 1: High-level diagram showing event flow

AWS Batch uses an Amazon EC2 Compute Environment powered by Amazon ECS. This compute environment uses a custom EC2 Launch Template to configure the Amazon ECS agent to include credentials for pulling images from private registries.
An EC2 User Data script is run upon EC2 instance startup that retrieves registry credentials from AWS Secrets Manager. The EC2 instance authenticates with AWS Secrets Manager using its configured IAM instance profile, which grants temporary IAM credentials.
AWS Batch jobs can be submitted using private images that require authentication with configured credentials.
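The core of this flow is turning the stored secret into Amazon ECS agent configuration. The sketch below shows that transformation in Python, assuming the secret is a JSON string with "username" and "password" keys (the key names are an assumption, not taken from the provided template). ECS_ENGINE_AUTH_TYPE and ECS_ENGINE_AUTH_DATA are the agent settings for private registry authentication, normally written to /etc/ecs/ecs.config.

```python
import json

DOCKER_HUB_URL = "https://index.docker.io/v1/"


def ecs_auth_config(secret_string: str, registry_url: str = DOCKER_HUB_URL) -> str:
    """Render the ecs.config lines that give the ECS agent registry credentials."""
    secret = json.loads(secret_string)
    auth_data = {
        registry_url: {
            "username": secret["username"],  # assumed secret key name
            "password": secret["password"],  # assumed secret key name
        }
    }
    return (
        "ECS_ENGINE_AUTH_TYPE=docker\n"
        f"ECS_ENGINE_AUTH_DATA={json.dumps(auth_data)}\n"
    )


# Placeholder values; a real secret comes from AWS Secrets Manager at startup.
print(ecs_auth_config('{"username": "example-user", "password": "example-token"}'))
```

On an instance, the secret string would come from a Secrets Manager GetSecretValue call made with the instance profile's temporary credentials, so nothing sensitive needs to live in the job definition or the launch template itself.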

Prerequisites

For this walkthrough, you should have the following prerequisites:

An AWS account
An Amazon Virtual Private Cloud with private and public subnets. If you do not have a VPC, this tutorial can be followed. The AWS Batch compute environment must have connectivity to the container registry.
A container registry containing a private image. This example uses Docker Hub and assumes you have created a private repository.
Registry credentials and/or an access token to authenticate with the container registry or Docker Hub.
A VPC Security Group allowing the AWS Batch compute environment egress connectivity to the container registry.

A CloudFormation template is provided to simplify setting up this example. The CloudFormation template and provided EC2 user data script can be viewed here on GitHub.

The CloudFormation template will create the following resources:

Necessary IAM roles for AWS Batch
AWS Secrets Manager secret containing container registry credentials
AWS Batch managed compute environment and job queue
EC2 launch template with user data script

Click the Launch Stack button to get started:

Launch the CloudFormation stack

After clicking the Launch stack button above, click Next to be presented with the following screen:

Figure 2: CloudFormation stack parameters

Fill in the required parameters as follows:

Stack Name: Give your stack a unique name.
Password: Your container registry password or Docker Hub access token. Note that both the user name and password are masked and will not appear in any CloudFormation logs or output. Additionally, they are securely stored in an AWS Secrets Manager secret created by CloudFormation.
RegistryUrl: If not using Docker Hub, specify the URL of the private container registry.
User name: Your container registry user name.
SecurityGroupIDs: Select your previously created security group to assign to the example Batch compute environment.
SubnetIDs: Select one or more VPC subnet IDs to assign to the example Batch compute environment.

After entering these parameters, click Next twice and then create the stack, which will take a few minutes to complete. Note that you must acknowledge on the review page that the template creates IAM resources before submitting.
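If you prefer to script the deployment instead of using the console, the same inputs can be expressed as a CreateStack request. The following is a minimal sketch, not the template's documented interface: the parameter keys, the stack name, and the template URL are assumptions for illustration and must be checked against the actual template. The `CAPABILITY_IAM` entry corresponds to the IAM acknowledgement on the review page.

```python
def build_stack_request(stack_name, template_url, values):
    """Build keyword arguments for a CloudFormation CreateStack call."""
    parameters = []
    for key, value in values.items():
        # List-typed parameters are passed as one comma-delimited string.
        if isinstance(value, (list, tuple)):
            value = ",".join(value)
        parameters.append({"ParameterKey": key, "ParameterValue": value})
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        "Parameters": parameters,
        # Scripted equivalent of acknowledging that the template creates
        # IAM resources. Templates that create named IAM resources need
        # CAPABILITY_NAMED_IAM instead.
        "Capabilities": ["CAPABILITY_IAM"],
    }


request = build_stack_request(
    "batch-private-registry-example",        # hypothetical stack name
    "https://example.com/template.yaml",     # placeholder, not the real URL
    {
        # All keys below are assumed; use the names defined in the template.
        "UserName": "my-registry-user",
        "Password": "example-access-token",
        "RegistryUrl": "https://index.docker.io/v1/",
        "SecurityGroupIDs": ["sg-0123456789abcdef0"],
        "SubnetIDs": ["subnet-aaa", "subnet-bbb"],
    },
)
print([p["ParameterKey"] for p in request["Parameters"]])
```

Assuming boto3 is installed and credentials are configured, the request could then be passed as `boto3.client("cloudformation").create_stack(**request)`.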

Once the stack deployment finishes, you will be presented with a list of the created AWS resources, as shown in Figure 3, in case you would like to dig deeper.

Figure 3: CloudFormation created resources

User data script contained within launch template

AWS Batch allows you to customize the compute environment in a variety of ways such as specifying an EC2 key pair, custom AMI, or an EC2 user data script. This is done by specifying an EC2 launch template before creating the Batch compute environment. For more information on Batch launch template support, see here.

Let’s take a closer look at how the Amazon ECS agent is configured upon compute environment startup to use your registry credentials.

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

--==MYBOUNDARY==
Content-Type: text/cloud-config; charset="us-ascii"

packages:
- jq
- aws-cli

runcmd:
- /usr/bin/aws configure set region $(curl http://169.254.169.254/latest/meta-data/placement/region)
- export SECRET_STRING=$(/usr/bin/aws secretsmanager get-secret-value --secret-id your_secrets_manager_secret_id | jq -r '.SecretString')
- export USERNAME=$(echo "$SECRET_STRING" | jq -r '.username')
- export PASSWORD=$(echo "$SECRET_STRING" | jq -r '.password')
- export REGISTRY_URL=$(echo "$SECRET_STRING" | jq -r '.registry_url')
- echo "$PASSWORD" | docker login --username "$USERNAME" --password-stdin $REGISTRY_URL
- export AUTH=$(cat ~/.docker/config.json | jq -c .auths)
- echo 'ECS_ENGINE_AUTH_TYPE=dockercfg' >> /etc/ecs/ecs.config
- echo "ECS_ENGINE_AUTH_DATA=$AUTH" >> /etc/ecs/ecs.config

--==MYBOUNDARY==--

This example script installs and uses a few tools, including the AWS CLI and the open-source tool jq, to retrieve and parse the previously created Secrets Manager secret. These packages are installed using the cloud-config user data type, which is part of the cloud-init packages functionality. If you use the provided CloudFormation template, this script is dynamically rendered to reference the created secret. If you do not use the template, you must specify the correct Secrets Manager secret ID yourself.
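The user data shown earlier is a MIME multi-part archive containing a single text/cloud-config part. If you generate user data programmatically, Python's standard email package can assemble that wrapper for you. The following is a rough sketch under that assumption; the function name and the shortened cloud-config body are illustrative only, and the headers may be emitted in a different order than in the hand-written script.

```python
import base64
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Shortened cloud-config body for illustration; the full script also
# contains the runcmd section.
CLOUD_CONFIG = """\
packages:
- jq
- aws-cli
"""


def build_user_data(cloud_config, boundary="==MYBOUNDARY=="):
    """Wrap a cloud-config document in a MIME multi-part archive."""
    archive = MIMEMultipart("mixed", boundary=boundary)
    # Produces a part with Content-Type: text/cloud-config; charset="us-ascii"
    archive.attach(MIMEText(cloud_config, "cloud-config", "us-ascii"))
    return archive.as_string()


user_data = build_user_data(CLOUD_CONFIG)

# EC2 launch templates expect the user data to be base64-encoded text.
encoded_user_data = base64.b64encode(user_data.encode("ascii")).decode("ascii")

# Parse the archive back to confirm its structure.
parsed = message_from_string(user_data)
parts = parsed.get_payload()
print(parsed.get_content_type(), parts[0].get_content_type())
```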

After performing a Docker login, the generated Auth JSON object is captured and passed to the Amazon ECS agent configuration to be used on AWS Batch jobs that require private images. For an explanation of Amazon ECS agent configuration options including available Amazon ECS engine Auth types, see here. This example script can be extended or customized to fit your needs but must adhere to requirements for Batch launch template user data scripts, including being in MIME multi-part archive format.
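To make the shape of the captured auth object more concrete, the sketch below reproduces in Python what the script obtains through `docker login` and jq. It assumes that Docker is not configured with a credential store, in which case the login stores the base64 encoding of `username:password` under an `auth` key for the registry. The secret key names match those read by the script; the function name is hypothetical, and this is an illustration of the data format rather than a replacement for the script.

```python
import base64
import json


def ecs_auth_config_lines(secret_string):
    """Derive the two ecs.config lines from the secret's JSON payload."""
    secret = json.loads(secret_string)
    credentials = "{}:{}".format(secret["username"], secret["password"])
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    # Same shape as the "auths" object that jq extracts from
    # ~/.docker/config.json after a login without a credential store.
    auths = {secret["registry_url"]: {"auth": token}}
    auth_data = json.dumps(auths, separators=(",", ":"))
    return [
        "ECS_ENGINE_AUTH_TYPE=dockercfg",
        "ECS_ENGINE_AUTH_DATA=" + auth_data,
    ]


# Example values only; never hard-code real credentials.
example_secret = json.dumps({
    "username": "example-user",
    "password": "example-token",
    "registry_url": "https://index.docker.io/v1/",
})
lines = ecs_auth_config_lines(example_secret)
print(lines[0])
```

Because the auth value is only base64-encoded, not encrypted, anyone who can read /etc/ecs/ecs.config on the instance can recover the credentials, so access to the instances should be restricted accordingly.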

It’s worth noting that the AWS CLI automatically obtains temporary IAM credentials from the IAM instance profile that the CloudFormation stack created, and uses them to retrieve the Secrets Manager secret values. This example assumes you created the AWS Secrets Manager secret with the default AWS managed KMS key for Secrets Manager. However, if you choose to encrypt your secret with a customer managed KMS key, make sure to grant the kms:Decrypt permission on that key to the IAM role used by the Batch compute environment’s EC2 instances.
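As an illustration of the permissions involved, the sketch below builds an IAM policy document for the instance role. It is an assumption about what a least-privilege policy could look like, not the policy the CloudFormation template actually creates, and both ARNs are placeholders.

```python
import json


def secret_access_policy(secret_arn, kms_key_arn=None):
    """Build an IAM policy document that allows reading one secret."""
    statements = [{
        "Effect": "Allow",
        "Action": "secretsmanager:GetSecretValue",
        "Resource": secret_arn,
    }]
    if kms_key_arn:
        # Only needed when the secret is encrypted with a customer
        # managed KMS key instead of the default AWS managed key.
        statements.append({
            "Effect": "Allow",
            "Action": "kms:Decrypt",
            "Resource": kms_key_arn,
        })
    return {"Version": "2012-10-17", "Statement": statements}


policy = secret_access_policy(
    # Placeholder ARNs for illustration only.
    "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-secret",
    "arn:aws:kms:us-east-1:111122223333:key/example-key-id",
)
print(json.dumps(policy, indent=2))
```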

Submitting the AWS Batch job

Now let’s try an example Batch job that uses a private container image by creating a Batch job definition and submitting a Batch job:

Open the AWS Batch console
Navigate to the Job Definition page
Click Create
Provide a unique Name for the job definition
Select the EC2 platform
Specify your private container image in the Image field
Click Create

Figure 4: Batch job definition
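The console steps above correspond to a RegisterJobDefinition request. The following sketch shows roughly what such a request could contain; the job definition name, image name, command, and resource values are hypothetical examples, not values from this walkthrough.

```python
def job_definition_request(name, image):
    """Build keyword arguments for a Batch RegisterJobDefinition call."""
    return {
        "jobDefinitionName": name,
        "type": "container",
        # Matches selecting the EC2 platform in the console.
        "platformCapabilities": ["EC2"],
        "containerProperties": {
            # Private image; the ECS agent pulls it with the registry
            # credentials configured at instance startup.
            "image": image,
            "command": ["echo", "hello from a private image"],
            "resourceRequirements": [
                {"type": "VCPU", "value": "1"},
                {"type": "MEMORY", "value": "512"},
            ],
        },
    }


job_definition = job_definition_request(
    "private-image-example",                  # hypothetical name
    "example-user/private-repo:latest",       # hypothetical private image
)
print(job_definition["containerProperties"]["image"])
```

Assuming boto3 is installed, the request could be passed as `boto3.client("batch").register_job_definition(**job_definition)`. Note that no registry credentials appear anywhere in the job definition.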

Now you can submit an AWS Batch job that uses this job definition:

Click on the Jobs page
Click Submit New Job
Provide a Name for the job
Select the previously created job definition
Select the Batch Job Queue created by the CloudFormation stack
Click Submit

Figure 5: Submitting a new Batch job

After you submit the AWS Batch job, it will take a few minutes for the AWS Batch compute environment to create resources for scheduling the job. Once that is done, you should see a SUCCEEDED status when you view the job and filter by the AWS Batch job queue, as shown in Figure 6.

Figure 6: AWS Batch job succeeded
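Submitting the job and waiting for a terminal status can also be scripted. The sketch below is one possible approach: it accepts any client object that offers `submit_job` and `describe_jobs` with the same shape as the boto3 Batch client, and it uses a small stand-in client so that the example runs without AWS access. The job, queue, and job definition names are hypothetical.

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def submit_and_wait(batch, job_name, job_queue, job_definition,
                    poll_seconds=15, max_polls=80):
    """Submit a Batch job and poll until it succeeds or fails."""
    job_id = batch.submit_job(
        jobName=job_name, jobQueue=job_queue, jobDefinition=job_definition
    )["jobId"]
    for _ in range(max_polls):
        status = batch.describe_jobs(jobs=[job_id])["jobs"][0]["status"]
        if status in TERMINAL_STATES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("job {} did not finish in time".format(job_id))


class FakeBatchClient:
    """Stand-in for boto3.client("batch") so the sketch runs offline."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def submit_job(self, **kwargs):
        return {"jobId": "example-job-id", "jobName": kwargs["jobName"]}

    def describe_jobs(self, jobs):
        return {"jobs": [{"jobId": jobs[0], "status": next(self._statuses)}]}


final_status = submit_and_wait(
    FakeBatchClient(["RUNNABLE", "RUNNING", "SUCCEEDED"]),
    "private-image-test", "example-queue", "private-image-example",
    poll_seconds=0,
)
print(final_status)  # prints SUCCEEDED
```

With real AWS access, you would pass `boto3.client("batch")` instead of the stand-in and keep a nonzero polling interval. A job that stays in RUNNABLE for a long time often points to a compute environment that cannot launch instances or reach the registry.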

Cleaning up

To clean up the example resources, select the created CloudFormation stack in the CloudFormation console and click Delete.

Conclusion

In this blog post, you deployed a customized AWS Batch managed compute environment that was configured to pull private container images in a secure manner. As I’ve shown, AWS Batch gives you the flexibility to use both private and public container registries. I encourage you to continue to explore the many options available natively on AWS for hosting and pulling container images. Amazon ECR and the recently launched Amazon ECR public repositories (for a deeper dive, see this blog announcement) both provide a seamless experience for container workloads running on AWS.