
Distribution of BNT's programs

Post Syndicated from nellyo original https://nellyo.wordpress.com/2019/11/08/bnt-22/

Emil Koshlukov, Director General of BNT, in an interview for BNR:

We are considering taking BNT2 and BNT3 off the multiplex.

After moving the programs from one network to the other three years ago, the public broadcaster is now considering distributing only BNT1 via digital terrestrial broadcasting.

The legal basis that has to be taken into account in these decisions:

Art. 44. (1) (Amended, SG No. 14/2009) The distribution of the radio and television programs of BNR and BNT shall be carried out via electronic communications networks and/or facilities for terrestrial analogue broadcasting that are owned by BNR and BNT, or on the basis of a contract with an undertaking providing electronic communications services.
(2) (Amended, SG No. 14/2009) The State shall take the necessary measures to guarantee the distribution of the programs of BNR and BNT throughout the entire territory of the country when implementing its policy in the field of electronic communications.
(3) (New, SG No. 41/2007) Bulgarian National Television and Bulgarian National Radio shall ensure the transmission of their national programs via satellite(s) covering the territories of Europe and of other continents where there are citizens of Bulgarian origin, according to data from the Agency for Bulgarians Abroad and their own research.
(4) (New, SG No. 41/2007) The funds for carrying out the activities under paragraph 1 shall be provided by the state budget.
(5) (New, SG No. 41/2007) Bulgarian National Television and Bulgarian National Radio shall provide their national and regional programs free of charge to undertakings carrying out electronic communications via cable electronic communications networks for the distribution of radio and television programs, as well as for satellite and terrestrial digital broadcasting.

Distribution is the second question; the first question for public service media is the content and the services they provide. BNT2 is still searching for its identity; as for BNT3, I can only refer, via Eco, to Cyrano: mon panache, the proud plume waving on my hat.

It is not clear whether any thought is being given to defending the ways the content is delivered, taking new consumption patterns into account. For years we have been able to follow the BBC's argumentation, justification, and defence of its new services, but not BNT's.

 

Building Your First Journey in Amazon Pinpoint

Post Syndicated from Brent Meyer original https://aws.amazon.com/blogs/messaging-and-targeting/building-your-first-journey-in-amazon-pinpoint/

Note: This post was written by Zach Barbitta, the Product Lead for Pinpoint Journeys, and Josh Kahn, an AWS Solution Architect.


We recently added a new feature called journeys to Amazon Pinpoint. With journeys, you can create fully automated, multi-step customer engagements through an easy to use, drag-and-drop interface.

In this post, we’ll provide a quick overview of the features and capabilities of Amazon Pinpoint journeys. We’ll also provide a blueprint that you can use to build your first journey.

What is a journey?

Think of a journey as being similar to a flowchart. Every journey begins by adding a segment of participants to it. These participants proceed through steps in the journey, which we call activities. Sometimes, participants split into different branches. Some participants end their journeys early, while some proceed through several steps. The experience of going down one path in the journey can be very different from the experience of going down a different path. In a journey, you determine the outcomes that are most relevant for your customers, and then create the structure that leads to those outcomes.

Your journey can contain several different types of activities, each of which does something different when journey participants arrive on it. For example, when participants arrive on an Email activity, they receive an email. When they land on a Random Split activity, they’re randomly separated into one of up to five different groups. You can also separate customers based on their attributes, or based on their interactions with previous steps in the journey, by using a Yes/no Split or a Multivariate Split activity. There are six different types of activities that you can add to your journeys. You can find more information about these activity types in our User Guide.

The web-based Amazon Pinpoint management console includes an easy-to-use, drag-and-drop interface for creating your journeys. You can also create journeys programmatically by using the Amazon Pinpoint API, the AWS CLI, or an AWS SDK. Best of all, you can use the API to modify journeys that you created in the console, and vice-versa.
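If you prefer to start from the API side, the following Python (boto3) sketch creates an empty draft journey from an existing segment. The project ID, segment ID, and Region are placeholders, and the exact shape of WriteJourneyRequest (activities, schedule, and so on) is best confirmed against the Amazon Pinpoint API reference.

import boto3

# Placeholder IDs -- replace with your Pinpoint project (application) and segment IDs.
APPLICATION_ID = "your-pinpoint-project-id"
SEGMENT_ID = "your-segment-id"

pinpoint = boto3.client("pinpoint", region_name="us-east-1")

# Create a draft journey that starts from an existing segment. Activities are
# omitted here; you can add them later in the console or via update_journey.
response = pinpoint.create_journey(
    ApplicationId=APPLICATION_ID,
    WriteJourneyRequest={
        "Name": "My first journey (API)",
        "State": "DRAFT",
        "StartCondition": {
            "SegmentStartCondition": {"SegmentId": SEGMENT_ID}
        },
    },
)
print(response)

You can then continue building the activities in the console, which is usually the easier way to iterate on the flow.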

Planning our journey

The flexible nature of journeys makes it easy to create customer experiences that are timely, personalized, and relevant. In this post, we’ll assume the role of a music streaming service. We’ll create a journey that takes our customers through the following steps:

  1. When customers sign up, we’ll send them an email that highlights some of the cool features they’ll get by using our service.
  2. After 1 week, we'll divide customers into two separate groups based on whether or not they opened the email we sent them when they signed up.
  3. To the group of customers who opened the first message, we’ll send an email that contains additional tips and tricks.
  4. To the group who didn’t open the first message, we’ll send an email that reiterates the basic information that we mentioned in the first message.

The best part about journeys in Amazon Pinpoint is that you can set them up in just a few minutes, without having to do any coding, and without any specialized training (unless you count reading this blog post as training).

After we launch this journey, we’ll be able to view metrics through the same interface that we used to create the journey. For example, for each split activity, we’ll be able to see how many participants were sent down each path. For each Email activity, we’ll be able to view the total number of sends, deliveries, opens, clicks, bounces, and unsubscribes. We’ll also be able to view these metrics as aggregated totals for the entire journey. These metrics can help you discover what worked and what didn’t work in your journey, so that you can build more effective journeys in the future.
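If you'd rather pull these numbers programmatically, the following boto3 sketch assumes the GetJourneyExecutionMetrics operation and uses placeholder IDs; confirm the operation name and response shape in the Amazon Pinpoint API reference before relying on it.

import boto3

pinpoint = boto3.client("pinpoint", region_name="us-east-1")

# Placeholder IDs -- you can look them up in the console or via get_journeys().
APPLICATION_ID = "your-pinpoint-project-id"
JOURNEY_ID = "your-journey-id"

# Aggregated execution metrics for the whole journey.
response = pinpoint.get_journey_execution_metrics(
    ApplicationId=APPLICATION_ID,
    JourneyId=JOURNEY_ID,
)
print(response)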

First steps

There are a few prerequisites that we have to complete before we can start creating the journey.

Create Endpoints

When a new user signs up for your service, you have to create an endpoint record for that user. One way to do this is by using AWS Amplify (if you use Amplify’s built-in analytics tools, Amplify creates Amazon Pinpoint endpoints). You can also use AWS Lambda to call Amazon Pinpoint’s UpdateEndpoint API. The exact method that you use to create these endpoints is up to your engineering team.

When your app creates new endpoints, they should include a user attribute that identifies them as someone who should participate in the journey. We’ll use this attribute to create our segment of journey participants.
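For example, a Lambda function (or any backend service) could register an endpoint with boto3 along the following lines. The project ID, endpoint ID, address, and the journey_participant attribute are illustrative placeholders rather than values defined by this post.

import boto3

pinpoint = boto3.client("pinpoint")

APPLICATION_ID = "your-pinpoint-project-id"  # placeholder

pinpoint.update_endpoint(
    ApplicationId=APPLICATION_ID,
    EndpointId="user-123-email",
    EndpointRequest={
        "ChannelType": "EMAIL",
        "Address": "new-customer@example.com",
        "OptOut": "NONE",
        "User": {
            "UserId": "user-123",
            # Attribute used later to build the segment of journey participants.
            "UserAttributes": {"journey_participant": ["true"]},
        },
    },
)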

If you just want to get started without engaging your engineering team, you can import a segment of endpoints to use in your journey. To learn more, see Importing Segments in the Amazon Pinpoint User Guide.

Verify an Email Address

In Amazon Pinpoint, you have to verify the email addresses that you use to send email. By verifying an email address, you prove that you own it, and you grant Amazon Pinpoint permission to send your email from that address. For more information, see Verifying Email Identities in the Amazon Pinpoint User Guide.
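As a quick sketch, you can also start the verification process programmatically through the Amazon Pinpoint Email API; the address below is a placeholder, and Amazon sends a verification message that the mailbox owner must act on.

import boto3

# Begin verification of a sending identity (an email address in this case).
pinpoint_email = boto3.client("pinpoint-email")
pinpoint_email.create_email_identity(EmailIdentity="sender@example.com")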

Create a Template

Before you can send email from a journey, you have to create email templates. Email templates are pre-defined message bodies that you can use to send email messages to your customers. You can learn more about creating email templates in the Amazon Pinpoint User Guide.

To create the journey that we discussed earlier, we’ll need at least three email templates: one that contains the basic information that we want to share with new customers, one with more advanced content for users who opened the first email, and another that reiterates the content in the first message.

Create a Segment

Finally, you need to create the segment of users who will participate in the journey. If your service creates an endpoint record that includes a specific user attribute, you can use this attribute to create your segment. You can learn more about creating segments in the Amazon Pinpoint User Guide.
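Continuing with the placeholder journey_participant attribute from earlier, a hedged boto3 sketch of creating that segment could look like the following; check the dimension syntax against the Amazon Pinpoint API reference.

import boto3

pinpoint = boto3.client("pinpoint")
APPLICATION_ID = "your-pinpoint-project-id"  # placeholder

response = pinpoint.create_segment(
    ApplicationId=APPLICATION_ID,
    WriteSegmentRequest={
        "Name": "Journey participants",
        "Dimensions": {
            # Include endpoints whose user attribute marks them as journey participants.
            "UserAttributes": {
                "journey_participant": {
                    "AttributeType": "INCLUSIVE",
                    "Values": ["true"],
                }
            }
        },
    },
)
print(response["SegmentResponse"]["Id"])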

Building the Journey

Now that we’ve completed the prerequisites, we can start building our journey. We’ll break the process down into a few quick steps.

Set it up

Start by signing in to the Amazon Pinpoint console. In the navigation pane on the right side of the page, choose Journeys, and then choose Create a journey. In the upper left corner of the page, you’ll see an area where you can enter a name for the journey. Give the journey a name that helps you identify it.

Add participants

Every journey begins with a Journey entry activity. In this activity, you define the segment that will participate in the journey. On the Journey entry activity, choose Set entry condition. In the list, choose the segment that contains the participants for the journey.

On this activity, you can also control how often new participants are added to the journey. Because new customers could sign up for our service at any time, we want to update the segment regularly. For this example, we’ll set up the journey to look for new segment members once every hour.

Send the initial email

Now we can configure the first message that we’ll send in our journey. Under the Journey entry activity that you just configured, choose Add activity, and then choose Send email. Choose the email template that you want to use in the email, and then specify the email address that you want to send the email from.

Add the split

Under the Email activity, choose Add activity, and then choose Yes/no split. This type of activity looks for all journey participants who meet a condition and sends them all down a path (the “Yes” path). All participants who don’t meet that condition are sent down a “No” path. This is similar to the Multivariate split activity. The difference between the two is that the Multivariate split can produce up to four branches, each with its own criteria, plus an “Else” branch for everyone who doesn’t meet the criteria in the other branches. A Yes/no split activity, on the other hand, can only ever have two branches, “Yes” and “No”. However, unlike the Multivariate split activity, the “Yes” branch in a Yes/no split activity can contain more than one matching criterion.

In this case, we’ll set up the “Yes” branch to look for email open events in the Email activity that occurred earlier in the journey. We also know that the majority of recipients who open the email aren’t going to do so the instant they receive it. For this reason, we’ll adjust the Condition evaluation setting for the activity to wait for 7 days before looking for open events. This gives our customers plenty of time to receive the message, check their email, open the message, and, if they’re interested in the content, click a link in the message. When you finish setting up the split activity, it should resemble the example shown in the following image.

Continue building the branches

From here, you can continue building the branches of the journey. In each branch, you’ll send a different email template based on the needs of the participants in that branch. Participants in the “Yes” branch will receive an email template that contains more advanced information, while those in the “No” branch will receive a template that revisits the content from the original message.

To learn more about setting up the various types of journey activities, see Setting Up Journey Activities in the Amazon Pinpoint User Guide.

Review the journey

When you finish adding branches and activities to your journey, choose the Review button in the top right corner of the page. When you do, the Review your journey pane opens on the left side of the screen. The Review your journey pane shows you errors that need to be fixed in your journey. It also gives you some helpful recommendations, and provides some best practices that you can use to optimize your journey. You can click any of the notifications in the Review your journey page to move immediately to the journey activity that the message applies to. When you finish addressing the errors and reviewing the recommendations, you can mark the journey as reviewed.

Testing and publishing the journey

After you review the journey, you have a couple options. You can either launch the journey immediately, or test the journey before you launch it.

If you want to test the journey, close the Review your journey pane. Then, on the Actions menu, choose Test. When you test your journey, you have to choose a segment of testers. The segment that you choose should only contain the internal recipients on your team who you want to test the journey before you publish it. You also have the option of changing the wait times in your journey, or skipping them altogether. If you choose a wait time of 1 hour here, it makes it much easier to test the Yes/no split activity (which, in the production version of the journey, contains a 7 day wait). To learn more about testing your journey, including some tips and best practices, see Testing Your Journey in the Amazon Pinpoint User Guide.

When you finish your testing, complete the review process again. On the final page of content in the Review your journey pane, choose Publish. After a short delay, your journey goes live. That’s it! You’ve created and launched your very first journey!

Wrapping up

We’re very excited about Pinpoint Journeys and look forward to seeing what you build with it. If you have questions, leave us a message on the Amazon Pinpoint Developer Forum and check out the Amazon Pinpoint documentation.

Orchestrate big data workflows with Apache Airflow, Genie, and Amazon EMR: Part 2

Post Syndicated from Francisco Oliveira original https://aws.amazon.com/blogs/big-data/orchestrate-big-data-workflows-with-apache-airflow-genie-and-amazon-emr-part-2/

Large enterprises running big data ETL workflows on AWS operate at a scale that serves many internal end users and runs thousands of concurrent pipelines. This scale, combined with the continuous need to update and extend the big data platform to keep up with new frameworks and the latest releases of big data processing frameworks, requires an efficient architecture and organizational structure that both simplifies management of the big data platform and promotes easy access to big data applications.

In Part 1 of this post series, you learned how to use Apache Airflow, Genie, and Amazon EMR to manage big data workflows.

This post guides you through deploying the AWS CloudFormation templates, configuring Genie, and running an example workflow authored in Apache Airflow.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Solution overview

This solution uses an AWS CloudFormation template to create the necessary resources.

Users access the Apache Airflow Web UI and the Genie Web UI via SSH tunnel to the bastion host.

The Apache Airflow deployment uses Amazon ElastiCache for Redis as a Celery backend, Amazon EFS as a mount point to store DAGs, and Amazon RDS PostgreSQL for database services.

Genie uses Apache Zookeeper for leader election, an Amazon S3 bucket to store configurations (binaries, application dependencies, cluster metadata), and Amazon RDS PostgreSQL for database services. Genie submits jobs to an Amazon EMR cluster.

The architecture in this post is for demo purposes. In a production environment, the Apache Airflow and the Genie instances should be part of an Auto Scaling Group. For more information, see Deployment on the Genie Reference Guide.

The following diagram illustrates the solution architecture.

Creating and storing admin passwords in AWS Systems Manager Parameter Store

This solution uses AWS Systems Manager Parameter Store to store the passwords used in the configuration scripts. With AWS Systems Manager Parameter Store, you can create secure string parameters, which are parameters that have a plaintext parameter name and an encrypted parameter value. Parameter Store uses AWS KMS to encrypt and decrypt the parameter values of secure string parameters.

Before deploying the AWS CloudFormation templates, execute the following AWS CLI commands. These commands create AWS Systems Manager Parameter Store parameters to store the passwords for the RDS master user, the Airflow DB administrator, and the Genie DB administrator.

aws ssm put-parameter --name "/GenieStack/RDS/Settings" --type SecureString --value "ch4ng1ng-s3cr3t" --region Your-AWS-Region

aws ssm put-parameter --name "/GenieStack/RDS/AirflowSettings" --type SecureString --value "ch4ng1ng-s3cr3t" --region Your-AWS-Region

aws ssm put-parameter --name "/GenieStack/RDS/GenieSettings" --type SecureString --value "ch4ng1ng-s3cr3t" --region Your-AWS-Region

Creating an Amazon S3 Bucket for the solution and uploading the solution artifacts to S3

This solution uses Amazon S3 to store all artifacts used in the solution. Before deploying the AWS CloudFormation templates, create an Amazon S3 bucket and download the artifacts required by the solution from this link.

Unzip the artifacts required by the solution and upload the airflow and genie directories to the Amazon S3 bucket you just created. Keep a record of the Amazon S3 root path because you add it as a parameter to the AWS CloudFormation template later.

As an example, the following screenshot uses the root location geniestackbucket.

Use the value of the Amazon S3 Bucket you created for the AWS CloudFormation parameters GenieS3BucketLocation and AirflowBucketLocation.

Deploying the AWS CloudFormation stack

To launch the entire solution, choose Launch Stack.

The following list explains the parameters that the template requires. You can accept the default values for any parameters not in the list. For the full list of parameters, see the AWS CloudFormation template.

Location of the configuration artifacts:
  • GenieS3BucketLocation – The S3 bucket with the Genie artifacts and Genie’s installation scripts. For example: geniestackbucket.
  • AirflowBucketLocation – The S3 bucket with the Airflow artifacts. For example: geniestackbucket.
Networking:
  • SSHLocation – The IP address range to SSH to the Genie, Apache Zookeeper, and Apache Airflow EC2 instances.
Security:
  • BastionKeyName – An existing EC2 key pair to enable SSH access to the bastion host instance.
  • AirflowKeyName – An existing EC2 key pair to enable SSH access to the Apache Airflow instance.
  • ZKKeyName – An existing EC2 key pair to enable SSH access to the Apache Zookeeper instance.
  • GenieKeyName – An existing EC2 key pair to enable SSH access to the Genie instance.
  • EMRKeyName – An existing Amazon EC2 key pair to enable SSH access to the Amazon EMR cluster.
Logging:
  • emrLogUri – The S3 location to store the Amazon EMR cluster logs. For example: s3://replace-with-your-bucket-name/emrlogs/

Post-deployment steps

To access the Apache Airflow and Genie web interfaces, set up an SSH tunnel and configure a SOCKS proxy for your browser. Complete the following steps:

  1. On the AWS CloudFormation console, choose the stack you created.
  2. Choose the Outputs tab.
  3. Find the public DNS of the bastion host instance. The following screenshot shows the instance this post uses.
  4. Set up an SSH tunnel to the master node using dynamic port forwarding.
    Instead of using the master public DNS name of your cluster and the username hadoop, which the walkthrough references, use the public DNS of the bastion host instance and replace the user hadoop with the user ec2-user.
  5. Configure the proxy settings to view websites hosted on the master node.
    You do not need to modify any of the steps in the walkthrough.

This process configures a SOCKS proxy management tool that allows you to automatically filter URLs based on text patterns and limit the proxy settings to domains that match the form of the Amazon EC2 instance’s public DNS name.

Accessing the Web UI for Apache Airflow and Genie

To access the Web UI for Apache Airflow and Genie, complete the following steps:

  1. On the CloudFormation console, choose the stack you created.
  2. Choose the Outputs tab.
  3. Find the URLs for the Apache Airflow and Genie Web UI. The following screenshot shows the URLs that this post uses.
  4. Open two tabs in your web browser. You will use the tabs for the Apache Airflow UI and the Genie UI.
  5. For FoxyProxy, which you configured previously, choose the FoxyProxy icon added to the top right section of your browser and choose Use proxies based on their predefined patterns and priorities. The following screenshot shows the proxy options.
  6. Enter the URL for the Apache Airflow Web UI and for the Genie Web UI on their respective tabs.

You are now ready to run a workflow in this solution.

Preparing application resources

The first step as a platform admin engineer is to prepare the binaries and configurations of the big data applications that the platform supports. In this post, the Amazon EMR clusters use release 5.26.0. Because Amazon EMR release 5.26.0 has Hadoop 2.8.5 and Spark 2.4.3 installed, those are the applications you want to support in the big data platform. If you decide to use a different EMR release, prepare binaries and configurations for those versions. The following sections guide you through the steps to prepare binaries should you wish to use a different EMR release version.

To prepare a Genie application resource, create a YAML file with fields that are sent to Genie in a request to create an application resource.

This file defines metadata information about the application, such as the application name, type, version, tags, the location on S3 of the setup script, and the location of the application binaries. For more information, see Create an Application in the Genie REST API Guide.
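Under the hood, registering the resource amounts to POSTing that metadata to Genie’s REST API. The following Python sketch illustrates the idea; it assumes the Genie v3 endpoint /api/v3/applications and the requests and PyYAML packages. The provided genie_register_application_resources.py script is what actually performs this step in the walkthrough.

import requests
import yaml

GENIE_URL = "http://replace-with-your-genie-server-url:8080"  # same placeholder as elsewhere in this post

# Load the application metadata prepared earlier (for example, hadoop-2.8.5.yml).
with open("hadoop-2.8.5.yml") as f:
    application = yaml.safe_load(f)

# Create the application resource; Genie returns 201 Created on success.
response = requests.post(f"{GENIE_URL}/api/v3/applications", json=application)
response.raise_for_status()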

Tag structure for application resources

This post uses the following tags for application resources:

  • type – The application type, such as Spark, Hadoop, Hive, Sqoop, or Presto.
  • version – The version of the application, such as 2.8.5 for Hadoop.

The next section shows how the tags are defined in the YAML file for an application resource. You can add an arbitrary number of tags to associate with Genie resources. Genie also maintains its own tags in addition to the ones the platform admins define; you can see them in the ID and name fields of the files.

Preparing the Hadoop 2.8.5 application resource

This post provides an automated creation of the YAML file. The following code shows the resulting file details:

id: hadoop-2.8.5
name: hadoop
user: hadoop
status: ACTIVE
description: Hadoop 2.8.5 Application
setupFile: s3://Your_Bucket_Name/genie/applications/hadoop-2.8.5/setup.sh
configs: []
version: 2.8.5
type: hadoop
tags:
  - type:hadoop
  - version:2.8.5
dependencies:
  - s3://Your_Bucket_Name/genie/applications/hadoop-2.8.5/hadoop-2.8.5.tar.gz

The file is also available directly at s3://Your_Bucket_Name/genie/applications/hadoop-2.8.5/hadoop-2.8.5.yml.

NOTE: The following steps are for reference only, should you be completing this manually, rather than using the automation option provided.

The S3 objects referenced by the setupFile and dependencies labels are available in your S3 bucket. For your reference, the steps to prepare the artifacts used by properties setupFile and dependencies are as follows:

  1. Download hadoop-2.8.5.tar.gz from https://www.apache.org/dist/hadoop/core/hadoop-2.8.5/.
  2. Upload hadoop-2.8.5.tar.gz to s3://Your_Bucket_Name/genie/applications/hadoop-2.8.5/.

Preparing the Spark 2.4.3 application resource

This post provides an automated creation of the YAML file. The following code shows the resulting file details:

id: spark-2.4.3
name: spark
user: hadoop
status: ACTIVE
description: Spark 2.4.3 Application
setupFile: s3://Your_Bucket_Name/genie/applications/spark-2.4.3/setup.sh
configs: []
version: 2.4.3
type: spark
tags:
  - type:spark
  - version:2.4.3
  - version:2.4
dependencies:
  - s3://Your_Bucket_Name/genie/applications/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz

The file is also available directly at s3://Your_Bucket_Name/genie/applications/spark-2.4.3/spark-2.4.3.yml.

NOTE: The following steps are for reference only, should you be completing this manually, rather than using the automation option provided.

The objects in setupFile and dependencies are available in your S3 bucket. For your reference, the steps to prepare the artifacts used by properties setupFile and dependencies are as follows:

  1. Download spark-2.4.3-bin-hadoop2.7.tgz from https://archive.apache.org/dist/spark/spark-2.4.3/ .
  2. Upload spark-2.4.3-bin-hadoop2.7.tgz to s3://Your_Bucket_Name/genie/applications/spark-2.4.3/ .

Because spark-2.4.3-bin-hadoop2.7.tgz uses Hadoop 2.7 and not Hadoop 2.8.5, you need to extract the EMRFS libraries for Hadoop 2.7 from an EMR cluster running Hadoop 2.7 (release 5.11.3). This is already available in your S3 bucket. For reference, the steps to extract the EMRFS libraries are as follows:

  1. Deploy an EMR cluster with release 5.11.3.
  2. Run the following command:
aws s3 cp /usr/share/aws/emr/emrfs/lib/emrfs-hadoop-assembly-2.20.0.jar s3://Your_Bucket_Name/genie/applications/spark-2.4.3/hadoop-2.7/aws/emr/emrfs/lib/

Preparing a command resource

The next step as a platform admin engineer is to prepare the Genie commands that the platform supports.

In this post, the workflows use Apache Spark. This section shows the steps to prepare a command resource of type Apache Spark.

To prepare a Genie command resource, create a YAML file with fields that are sent to Genie in a request to create a command resource.

This file defines metadata information about the command, such as the command name, type, version, tags, the location on S3 of the setup script, and the parameters to use during command execution. For more information, see Create a Command in the Genie REST API Guide.

Tag structure for command resources

This post uses the following tag structure for command resources:

  • type – The command type, for example, spark-submit.
  • version – The version of the command, for example, 2.4.3 for Spark.

The next section shows how the tags are defined in the YAML file for a command resource. Genie also maintains its own tags in addition to the ones the platform admins define; you can see them in the ID and name fields of the files.

Preparing the spark-submit command resource

This post provides an automated creation of the YAML file. The following code shows the resulting file details:

id: spark-2.4.3_spark-submit
name: Spark Submit 
user: hadoop 
description: Spark Submit Command 
status: ACTIVE 
setupFile: s3://Your_Bucket_Name/genie/commands/spark-2.4.3_spark-submit/setup.sh
configs: [] 
executable: ${SPARK_HOME}/bin/spark-submit --master yarn --deploy-mode client 
version: 2.4.3 
tags:
  - type:spark-submit
  - version:2.4.3
checkDelay: 5000

The file is also available at s3://Your_Bucket_Name/genie/commands/spark-2.4.3_spark-submit/spark-2.4.3_spark-submit.yml.

The objects in setupFile are available in your S3 bucket.

Preparing cluster resources

This post also automates the step of preparing cluster resources; it follows a process similar to the one described previously, but applied to cluster resources.

During the startup of the Amazon EMR cluster, a custom script creates a YAML file with the metadata details about the cluster and uploads the file to S3. For more information, see Create a Cluster in the Genie REST API Guide.

The script also extracts all Amazon EMR libraries and uploads them to S3. The next section discusses the process of registering clusters with Genie.

The script is available at s3://Your_Bucket_Name/genie/scripts/genie_register_cluster.sh.

Tag structure for cluster resources

This post uses the following tag structure for cluster resources:

  • cluster.release – The Amazon EMR release name. For example, emr-5.26.0.
  • cluster.id – The Amazon EMR cluster ID. For example, j-xxxxxxxx.
  • cluster.name – The Amazon EMR cluster name.
  • cluster.role – The role associated with this cluster. For this post, the role is batch. Other possible roles would be ad hoc or Presto, for example.

You can add new tags for a cluster resource or change the values of existing tags by editing s3://Your_Bucket_Name/genie/scripts/genie_register_cluster.sh.

You could also use other combinations of tags, such as a tag to identify the application lifecycle environment or required custom jars.

Genie also maintains its own tags in addition to the ones the platform admins define; you can see them in the ID and name fields of the files. If multiple clusters share the same tag, by default, Genie distributes jobs randomly across the clusters associated with that tag. For more information, see Cluster Load Balancing in the Genie Reference Guide.

Registering resources with Genie

Up to this point, all the configuration artifacts mentioned in the previous sections have already been prepared for you.

The following sections show how to register resources with Genie. In this section you will be connecting to the bastion via SSH to run configuration commands.

Registering application resources

To register the application resources you prepared in the previous section, SSH into the bastion host and run the following command:

python /tmp/genie_assets/scripts/genie_register_application_resources.py Replace-With-Your-Bucket-Name Your-AWS-Region http://replace-with-your-genie-server-url:8080

To see the resource information, navigate to the Genie Web UI and choose the Applications tab. See the following screenshot. The screenshot shows two application resources, one for Apache Spark (version 2.4.3) and the other for Apache Hadoop (version 2.8.5).

Registering commands and associating them with applications

The next step is to register the Genie command resources with specific applications. For this post, because spark-submit depends on Apache Hadoop and Apache Spark, associate the spark-submit command with both applications.

The order you define for the applications in file genie_register_command_resources_and_associate_applications.py is important. Because Apache Spark depends on Apache Hadoop, the file first references Apache Hadoop and then Apache Spark. See the following code:

commands = [{'command_name' : 'spark-2.4.3_spark-submit', 'applications' : ['hadoop-2.8.5', 'spark-2.4.3']}]

To register the command resources and associate them with the application resources registered in the previous step, SSH into the bastion host and run the following command:

python /tmp/genie_assets/scripts/genie_register_command_resources_and_associate_applications.py Replace-With-Your-Bucket-Name Your-AWS-Region http://replace-with-your-genie-server-url:8080

To see the command you registered plus the applications it is linked to, navigate to the Genie Web UI and choose the Commands tab.

The following screenshot shows the command details and the applications it is linked to.

Registering Amazon EMR clusters

As previously mentioned, the Amazon EMR cluster deployed in this solution registers the cluster when the cluster starts via an Amazon EMR step. You can access the script that Amazon EMR clusters use at s3://Your_Bucket_Name/genie/scripts/genie_register_cluster.sh. The script also automates deregistering the cluster from Genie when the cluster terminates.

In the Genie Web UI, choose the Clusters tab. This page shows you the current cluster resources. You can also find the location of the configuration files that were uploaded to the cluster’s S3 location during the registration step.

The following screenshot shows the cluster details and the location of configuration files (yarn-site.xml, core-site.xml, mapred-site.xml).

Linking commands to clusters

You have now registered all applications, commands, and clusters, and associated commands with the applications on which they depend. The final step is to link a command to a specific Amazon EMR cluster that is configured to run it.

Complete the following steps:

  1. SSH into the bastion host.
  2. Open /tmp/genie_assets/scripts/genie_link_commands_to_clusters.py with your preferred text editor.
  3. Look for the following lines in the code:
    # Change cluster_name below
    clusters = [{'cluster_name' : 'j-xxxxxxxx', 'commands' :
    ['spark-2.4.3_spark-submit']}]
  4. Replace j-xxxxxxxx in the file with the name of your cluster.
    To see the name of the cluster, navigate to the Genie Web UI and choose Clusters.
  5. To link the command to a specific Amazon EMR cluster, run the following command:
    python /tmp/genie_assets/scripts/genie_link_commands_to_clusters.py Replace-With-Your-Bucket-Name Your-AWS-Region http://replace-with-your-genie-server-url:8080

The command is now linked to a cluster.

In the Genie Web UI, choose the Commands tab. This page shows you the current command resources. Select spark-2.4.3_spark-submit to see the clusters associated with the command.

The following screenshot shows the command details and the clusters it is linked to.

You have configured Genie with all resources; it can now receive job requests.

Running an Apache Airflow workflow

A detailed description of the workflow code and dataset is out of scope for this post. This section provides details of how Apache Airflow submits jobs to Genie via a GenieOperator that this post provides.

The GenieOperator for Apache Airflow

The GenieOperator allows the data engineer to define the combination of tags to identify the commands and the clusters in which the tasks should run.

In the following code example, the cluster tag is 'emr.cluster.role:batch' and the command tags are 'type:spark-submit' and 'version:2.4.3'.

spark_transform_to_parquet_movies = GenieOperator(
    task_id='transform_to_parquet_movies',
    cluster_tags=['emr.cluster.role:batch'],
    command_tags=['type:spark-submit', 'version:2.4.3'],
    command_arguments="transform_to_parquet.py s3://{}/airflow/demo/input/csv/{} s3://{}/demo/output/parquet/{}/".format(bucket, 'movies.csv', bucket, 'movies'),
    dependencies="s3://{}/airflow/dag_artifacts/transforms/transform_to_parquet.py".format(bucket),
    description='Demo Spark Submit Job',
    job_name="Genie Demo Spark Submit Job",
    tags=['transform_to_parquet_movies'],
    xcom_vars=dict(),
    retries=3,
    dag=extraction_dag)

The property command_arguments defines the arguments to the spark-submit command, and dependencies defines the location of the code for the Apache Spark Application (PySpark).

You can find the code for the GenieOperator in the following location: s3://Your_Bucket_Name/airflow/plugins/genie_plugin.py.

One of the arguments to the DAG is the Genie connection ID (genie_conn_id). This connection was created during the automated setup of the Apache Airflow Instance. To see this and other existing connections, complete the following steps:

  1. In the Apache Airflow Web UI, choose the Admin menu.
  2. Choose Connections.

The following screenshot shows the connection details.

The Airflow variable s3_location_genie_demo referenced in the DAG was set during the installation process. To see all configured Apache Airflow variables, complete the following steps:

  1. In the Apache Airflow Web UI, choose the Admin menu.
  2. Choose Variables.

The following screenshot shows the Variables page.

Triggering the workflow

You can now trigger the execution of the movie_lens_transfomer_to_parquet DAG. Complete the following steps:

  1. In the Apache Airflow Web UI, choose the DAGs tab.
  2. Next to your DAG, change Off to On.

The following screenshot shows the DAGs page.

For this example DAG, this post uses a small subset of the movielens dataset. This is a popular open source dataset that you can use to explore data science algorithms. Each dataset file is a comma-separated values (CSV) file with a single header row. All files are available in your solution S3 bucket under s3://Your_Bucket_Name/airflow/demo/input/csv.

movie_lens_transfomer_to_parquet is a simple workflow that triggers a Spark job that converts input files from CSV to Parquet.

The following screenshot shows a graphical representation of the DAG.

In this example DAG, after transform_to_parquet_movies concludes, you can potentially execute four tasks in parallel. Because the DAG concurrency is set to 3, as seen in the following code example, only three tasks can run at the same time.

# Initialize the DAG
# Concurrency --> Number of tasks allowed to run concurrently
extraction_dag = DAG(dag_name,
          default_args=dag_default_args,
          start_date=start_date,
          schedule_interval=schedule_interval,
          concurrency=3,
          max_active_runs=1
          )

Visiting the Genie job UI

The GenieOperator for Apache Airflow submitted the jobs to Genie. To see job details, in the Genie Web UI, choose the Jobs tab. You can see details such as the jobs submitted, their arguments, the cluster each job runs on, and the job status.

The following screenshot shows the Jobs page.

You can now experiment with this architecture by provisioning a new Amazon EMR cluster, registering it with a new value (for example, “production”) for Genie tag “emr.cluster.role”, linking the cluster to a command resource, and updating the tag combination in the GenieOperator used by some of the tasks in the DAG.
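For example, if you register the new cluster with the hypothetical tag value production, only the cluster_tags argument of the task shown earlier needs to change. The following fragment is not a complete DAG; it reuses the names from the earlier example.

# Fragment of the DAG shown earlier: re-point the task at the hypothetical
# "production" cluster by changing only the cluster tag.
spark_transform_to_parquet_movies = GenieOperator(
    task_id='transform_to_parquet_movies',
    cluster_tags=['emr.cluster.role:production'],  # was 'emr.cluster.role:batch'
    command_tags=['type:spark-submit', 'version:2.4.3'],
    # ...remaining arguments unchanged from the earlier example...
    dag=extraction_dag)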

Cleaning up

To avoid incurring future charges, delete the resources and the S3 bucket created for this post.

Conclusion

This post showed how to deploy an AWS CloudFormation template that sets up a demo environment for Genie, Apache Airflow, and Amazon EMR. It also demonstrated how to configure Genie and use the GenieOperator for Apache Airflow.

 


About the Authors

Francisco Oliveira is a senior big data solutions architect with AWS. He focuses on building big data solutions with open source technology and AWS. In his free time, he likes to try new sports, travel and explore national parks.

 

 

 

Jelez Raditchkov is a practice manager with AWS.

 

 

 

 

Prasad Alle is a Senior Big Data Consultant with AWS Professional Services. He spends his time leading and building scalable, reliable big data, machine learning, artificial intelligence, and IoT solutions for AWS Enterprise and Strategic customers. His interests extend to various technologies such as advanced edge computing and machine learning at the edge. In his spare time, he enjoys spending time with his family.

AWS re:Invent 2019 security guide: sessions, workshops, and chalk talks

Post Syndicated from Shllomi Ezra original https://aws.amazon.com/blogs/security/aws-reinvent-2019-security-guide-sessions-workshops-and-chalk-talks/

With re:Invent 2019 just weeks away, the excitement is building and we’re looking forward to seeing you all soon! If you’re attending re:Invent with the goal of improving your organization’s cloud security operations, here are some highlights from the re:Invent 2019 session catalog. Reserved seating is now open, so get your seats in advance for your favorite sessions.

Getting started

These sessions cover the basics, including conceptual overviews and demos for AWS Security services, AWS Identity, and more.

  • The fundamentals of AWS cloud security (SEC205-R)

    By the end of this session led by Becky Weiss, you will know the fundamental patterns that you can apply to secure any workload you run in AWS with confidence. It covers the basics of network security, the process of reading and writing access management policies, and data encryption.

  • Threat management in the cloud: Amazon GuardDuty and AWS Security Hub (SEC206-R)

    Amazon GuardDuty and AWS Security Hub in tandem provide continuous visibility, compliance, and detection of threats for AWS accounts and workloads.

  • Getting started with AWS Identity (SEC209-R)

    The number, range, and breadth of AWS services are large, but the set of techniques that you, as a builder in the cloud, will use to secure them is not. Your cloud journey starts with this breakout session, in which we get you up to speed quickly on the practical fundamentals to do identity and authorization right in AWS.

Inspiration

  • Leadership session: AWS Security (SEC201-L)

    Stephen Schmidt, Chief Information Security Officer for AWS, addresses the current state of security in the cloud, with a focus on feature updates, the AWS internal “secret sauce,” and what’s to come in terms of security, identity, and compliance tooling.

  • Provable access control: Know who can access your AWS resources (SEC343-R)

    In this session, we discuss the evolution of automated reasoning technology at AWS and how it works in the services in which it is embedded, including Amazon Simple Storage Service (Amazon S3), AWS Config, and Amazon Macie.

  • Amazon’s approach to failing successfully (DOP208-R)

    In this session, we cover Amazon’s favorite techniques for defining and reviewing metrics — watching the systems before they fail — as well as how to do an effective postmortem that drives both learning and meaningful improvement.

  • Speculation & leakage: Timing side channels & multi-tenant computing (SEC355)

In January 2018, the world learned about Spectre and Meltdown, a new class of issues that affects virtually all modern CPUs via nearly imperceptible changes to their micro-architectural states and can result in full access to physical RAM or leaking of state between threads, processes, or guests. In this session, Eric Brandwine examines one of these side-channel attacks in detail and explores the implications for multi-tenant computing. He discusses AWS design decisions and what AWS does to protect your instances, containers, and function invocations.

  • Security benefits of the Nitro architecture (SEC408-R)

Hear Mark Ryland speak about how the Nitro computers carefully control access to the workload computers, providing a layer of protection. Learn about the security properties of this powerful architecture, which significantly increases cloud reliability and performance.

Threat detection and response

  • Continuous security monitoring and threat detection with AWS (SEC321-R)

    In this session, we talk about a number of AWS services involved in threat detection and remediation and we walk through some real-world threat scenarios. You get answers to your questions about threat detection on AWS and learn about the threat-detection capabilities of Amazon GuardDuty, Amazon Macie, AWS Config, and the available remediation options.

  • Threat detection with Amazon GuardDuty (SEC353-R)

    Amazon GuardDuty is a threat detection system that is purpose-built for the cloud. Once enabled, GuardDuty immediately starts analyzing continuous streams of account and network activity in near real time and at scale. You don’t have to deploy or manage any additional security software, sensors, or network appliances. Threat intelligence is pre-integrated into the service and is continuously updated and maintained. In this session, we introduce you to GuardDuty, walk you through the detection of an event, and discuss the various ways you can react and remediate.

  • Mitigate risks using cloud-native security (SEC216-R)

    Whether you are migrating existing workloads or creating something new on AWS, it can be tempting to bring your current security solutions with you. In this hands-on builders session, we help you identify which cloud-native solutions can mitigate your existing risks while providing scalability, reliability, and cost optimization at a low operational burden. During this session, learn how to use cloud-native controls such as those found in AWS CloudTrail, Amazon Virtual Private Cloud (Amazon VPC) security groups, and Amazon GuardDuty to secure your cloud architecture.

  • Monitoring anomalous application behavior (NFX205)

    In this talk, Travis McPeak of Netflix and Will Bengtson introduce a system built strictly with off-the-shelf AWS components that tracks AWS CloudTrail activity across multi-account environments and sends alerts when applications perform anomalous actions.

  • Workshop

    • Automating threat detection and response in AWS (SEC301-R)

      This workshop provides the opportunity for you to get familiar with AWS security services and learn how to use them to identify and remediate threats in your environment. Learn how to use Amazon GuardDuty, Amazon Macie, Amazon Inspector, and AWS Security Hub to investigate threats during and after an attack, set up a notification and response pipeline, and add additional protections to improve your environment’s security posture.

Advanced topics in threat detection and response

  • Actionable threat hunting in AWS (SEC339)

    Learn how WarnerMedia leveraged Amazon GuardDuty, AWS CloudTrail, and its own serverless inventory tool (Antiope) to root out cloud vulnerabilities, insecure behavior, and potential account compromise activities across a large number of accounts.

  • How to prepare for & respond to security incidents in your AWS environment (SEC356)

    In this session, Paul Hawkins and Nathan Case walk through what you need to do to be prepared to respond to security incidents in your AWS environments.

  • DIY guide to runbooks, incident reports, and incident response (SEC318-R)

    In this session, we explore the cost of incidents and consider creative ways to look at future threats.

  • A defense-in-depth approach to building web applications (SEC407-R)

    In this session, learn about common security issues, including those described in the Open Web Application Security Project (OWASP) Top 10. Also learn how to build a layered defense using multi-layered perimeter security and development best practices.

Identity

  • Failing successfully: The AWS approach to resilient design (ARC303-R)

    AWS global infrastructure provides the tools customers need to design resilient and reliable services. In this session, we explore how to get the most out of these tools.

  • Access control confidence: Grant the right access to the right things (SEC316-R)

    Hear Brigid Johnson explain that, as your organization builds on AWS, granting developers and applications the right access to the right resources at the right time for the right actions is critical to security.

Advanced topics in AWS Identity

  • Access management in 4D (SEC405-R)

    Listen to Quint Van Deman demonstrate patterns that allow you to implement advanced access-management workflows such as two-person rule, just-in-time privilege elevation, real-time adaptive permissions, and more using advanced combinations of AWS Identity services.

Data protection

  • Using AWS KMS for data protection, access control, and audit (SEC340-R)

    This session focuses on how customers are using AWS Key Management Service (AWS KMS) to raise the bar for security and compliance with their workloads.

Compliance

  • Use AWS Security Hub to act on your compliance and security posture (SEC342)

    Join us for this chalk talk where we discuss how to continuously assess and act on your AWS security and compliance issues using AWS Security Hub.

  • Workshop

    • Compliance automation: Set it up fast, then code it your way (SEC304-R)

      In this workshop, learn how to detect common resource misconfigurations using AWS Security Hub.

Best practices

  • AWS Well-Architected: Best practices for securing workloads (SEC202-R1)

    Security best practices help you secure your workloads in the cloud to meet organizational, legal, and compliance requirements. In this chalk talk, Ben Potter will guide you through core security best practices aligned with the AWS Well-Architected Framework.

  • Architecting security & governance across your landing zone (SEC325-R)

    In this session, Sam Elmalak discusses updates to multi-account strategy best practices for establishing your landing zone.

  • Best practices for your full-stack security practice (GPSTEC307)

At AWS, security is our top priority. In this chalk talk, discover proven techniques and key learnings to elevate your ability to identify, protect against, detect, respond to, and recover from security events. We’ll leverage industry frameworks, reference architectures, and the latest AWS services and features.

  • Artificial Intelligence & Machine Learning (AIM337-R)

    Join us for this chalk talk as we dive into the many features of Amazon SageMaker that enable customers to build highly secure data science environments and support stringent security requirements.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

About the Author

Shllomi Ezra

Shllomi Ezra is on the AWS Business Development team for AWS Security services. He cares about making customers’ journeys to the cloud more enjoyable. In his spare time, Shllomi loves to run, travel, and have fun with his family and friends.

Deploying a highly available WordPress site on Amazon Lightsail, Part 2: Using Amazon S3 with WordPress to securely deliver media files

Post Syndicated from Betsy Chernoff original https://aws.amazon.com/blogs/compute/deploying-a-highly-available-wordpress-site-on-amazon-lightsail-part-2-using-amazon-s3-with-wordpress-to-securely-deliver-media-files/

This post is contributed by Mike Coleman | Developer Advocate for Lightsail | Twitter: @mikegcoleman

This is the second post in a series focusing on how to build a highly available WordPress site on Amazon Lightsail. The first post, Implementing a highly available Lightsail database, explained how to deploy a WordPress instance with a standalone MySQL database on Lightsail. The post also discussed how both the database and web server’s file system store WordPress data.

This post shows you how to store WordPress shared media files (such as pictures and videos) on Amazon S3. S3 is a managed storage service that provides an affordable, performant, and secure method for storing a variety of data. This post uses the WP Offload Media Lite plugin from Delicious Brains to interface between WordPress and S3. The plugin takes any file uploaded to WordPress and copies it over to S3, where your other WordPress instances can access it.

Prerequisites

This post requires that you have deployed your WordPress instance and configured it to work with a standalone MySQL database in Lightsail. For more information, see the first post in this series.

Because this post uses additional AWS services beyond Lightsail, you need an AWS account with sufficient privileges to use those services.

Solution overview

This solution includes the following steps:

  1. Create an AWS IAM user.
  2. Update the WordPress configuration file with those user credentials.
  3. Install and configure the WP Offload Media Lite plugin.
  4. Test the solution by uploading an image to WordPress.

Creating an IAM User

To relocate your media files to S3, WordPress needs credentials proving it has the correct permissions. You create those credentials by adding a new IAM user and attaching a policy with the necessary S3 permissions to that user. After creating the credentials, add them to your WordPress configuration file. If you prefer to script these steps, see the sketch at the end of this section.

  1. Open the AWS Management Console.
  2. Navigate to the IAM Console.
  3. From the Dashboard, choose Users.

The following screenshot shows the Dashboard pane.

Users selection in the IAM console menu

 

 

4. At the top of the page, choose Add user.

The following screenshot shows the Add user button.

Add User button highlighted

 

 

 

 

5. In Set user details, for User name, enter a name.

This post uses the name wp-s3-user.

6. For Access type, select Programmatic access.

The following screenshot shows the Set user details section.

Set user details

7. Choose Next: Permissions. 

8. Choose Attach existing policy directly.

9. In the Filter policies field, enter S3.

10. From the search results, select AmazonS3FullAccess.

The following screenshot shows the Set permissions section.

  11. Choose Next: Tags.
  12. Choose Next: Review.
  13. Choose Create user.

The page displays the credentials you need to configure the WP Offload Media Lite plugin.

  14. Under Secret access key, select Show.
  15. Save the Access Key ID and Secret Access Key for reference.

To download a CSV file with the key information, choose the Download .csv file button.

If you navigate away from this screen, you can’t obtain the credentials again and need to create a new user. Be sure to either download the CSV or record both values in another document. Treat these credentials the same way you’d treat any sensitive username/password pair.
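The console steps above can also be scripted. The following is a hedged boto3 sketch that mirrors the walkthrough (same user name and managed policy); in production you would typically scope the policy down to a single bucket.

import boto3

iam = boto3.client("iam")

# Create the user and attach the same managed policy used in the console walkthrough.
iam.create_user(UserName="wp-s3-user")
iam.attach_user_policy(
    UserName="wp-s3-user",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3FullAccess",
)

# Generate the programmatic credentials for the plugin; store them securely.
access_key = iam.create_access_key(UserName="wp-s3-user")["AccessKey"]
print(access_key["AccessKeyId"], access_key["SecretAccessKey"])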

Connecting to your WordPress instance

Connect to your WordPress instance by using your SSH client or the web-based SSH client in the Lightsail console.

  1. In the terminal prompt for your WordPress instance, set two environment variables (ACCESS_KEY and SECRET_KEY) that contain the credentials for your IAM user.

To set the environment variables, substitute the values for your IAM user’s access key and secret key into the following lines and enter each command one at a time at the terminal prompt:

ACCESS_KEY=AKIAIOSFODNN7EXAMPLE

SECRET_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

The next step is to update your WordPress configuration file (wp-config) with the credentials.

  2. Enter the following command:

cat <<EOT >> credfile.txt
define( 'AS3CF_SETTINGS', serialize( array (
  'provider' => 'aws',
  'access-key-id' => '$ACCESS_KEY',
  'secret-access-key' => '$SECRET_KEY',
) ) );
EOT

This command creates a file with the credentials to insert into the configuration file.

  3. Enter the following sed command:

sed -i "/define( 'WP_DEBUG', false );/r credfile.txt" \
/home/bitnami/apps/wordpress/htdocs/wp-config.php

This command inserts the temporary file you created into the WordPress configuration file.

Because the sed command rewrites the file, it also changes the ownership and permissions on the WordPress configuration file.

  4. To reset the permissions, enter the following code:

sudo chown bitnami:daemon /home/bitnami/apps/wordpress/htdocs/wp-config.php

  5. Restart the services on the instance by entering the following command:

sudo /opt/bitnami/ctlscript.sh restart

After the services restart, your WordPress instance is configured to use the IAM credentials and you are ready to configure the WP Offload Media Lite plugin.

Installing and configuring the plugin

Now that the configuration file holds your credentials, you can move on to installing and configuring the plugin.

The next step requires you to log in to the WordPress dashboard, and to do that you need the Bitnami application password for your WordPress site. It’s stored at /home/bitnami/bitnami_application_password.

  1. Enter the following cat command in the terminal to display the password value:

cat /home/bitnami/bitnami_application_password

  2. Log in to the administrator control panel of your WordPress site.

You can access the WordPress login screen at http://SiteIpAddress/wp-admin, where SiteIpAddress is the IP address of your WordPress instance, which you can find on the card for your instance in the Lightsail console.

The following screenshot shows the location of the IP address.

Site IP Address inside the Lightsail Console

For example, the instance in the preceding screenshot has the IP address 192.0.2.0; you can access the login screen using http://192.0.2.0/wp-admin.

  3. For login credentials, the default user name is user.
  4. For the password, use the Bitnami application password you recorded previously.

After signing in, install and configure the WP Offload Media Lite plugin by following these steps:

  1. Under Plugins, choose Add New.

The following screenshot shows the navigation pane.

Navigation pane in WordPress

  2. On the Add New pane, in the Keyword field, enter WP media offload.

The following screenshot shows the Keyword search bar.

Keyword Searchbar in WordPress

  3. The results display the WP Offload Media Lite plugin.
  4. Choose Install Now.

The following screenshot shows the selected plugin.

WP media offload plugin

After a few seconds, the Install Now button changes to Activate.

  5. Choose Activate.

The main WordPress plugins screen opens.

  6. Scroll down to WP Offload Media Lite.
  7. Choose Settings.

The following screenshot shows the selected plugin.

  8. From the Settings page, choose Create new bucket.

An S3 bucket is a unique container that stores all your WordPress files.

The following screenshot shows the Media Library options for your plugin.

Media Library options for your plugin

  9. For Region, select the Region that matches the Region of your WordPress instance.

This post uses the Region US West (Oregon). The following screenshot shows the selected Region.

AWS Region in S3

The following screenshot from the Lightsail console shows your displayed Region.

Region in Lightsail console

  10. Enter a name for your S3 bucket.
  11. Choose Create New Bucket.

Your WordPress website is now configured to upload files to the S3 bucket managed by the WP Offload Media Lite plugin.

Troubleshooting

If you received an error indicating that access to your S3 bucket was denied, check the S3 public access settings by following these steps:

  1. Open the S3 console.
  2. From the menu, choose Block public access (account settings).
  3. Choose Edit.
  4. Clear Block all public access.
  5. Choose Save.
  6. In the text box, enter confirm.
  7. Choose Confirm.

This setting ensures that none of your buckets are publicly accessible. If you clear the setting, make sure to verify that permissions are set correctly on your buckets.
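
If you manage this setting with code rather than the console, the following boto3 sketch shows one way to inspect and, if you choose, relax the account-level Block Public Access configuration. The account ID is a placeholder, and the put_public_access_block call is commented out because you should review your security requirements before loosening the setting.

import boto3

ACCOUNT_ID = "111122223333"  # placeholder - your AWS account ID
s3control = boto3.client("s3control", region_name="us-west-2")

try:
    config = s3control.get_public_access_block(AccountId=ACCOUNT_ID)
    print(config["PublicAccessBlockConfiguration"])
except s3control.exceptions.NoSuchPublicAccessBlockConfiguration:
    print("No account-level Block Public Access configuration is set.")

# Equivalent to clearing the check box in the console; review your security
# requirements before applying it.
# s3control.put_public_access_block(
#     AccountId=ACCOUNT_ID,
#     PublicAccessBlockConfiguration={
#         "BlockPublicAcls": False,
#         "IgnorePublicAcls": False,
#         "BlockPublicPolicy": False,
#         "RestrictPublicBuckets": False,
#     },
# )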

  8. Return to the WordPress console, and refresh the Offload Media Lite page.

You should no longer see the error.

  9. Choose Save Changes.

Testing the plugin

To confirm that the plugin is working correctly, upload a new media file and verify it’s being served from the S3 bucket.

  1. From the WordPress console, choose Media, Add new.

The following screenshot shows the WordPress console menu.

  2. Upload a file by either dragging and dropping a file into the field, or choosing Select and selecting a file from your local machine.
  3. After the file uploads, choose Edit.

The following screenshot shows the Edit option on the plugin thumbnail.

In the Edit Media pane, the File URL should point to your S3 bucket. See the following screenshot.

WordPress console menu

  4. Copy that URL into a browser and confirm that your file loads correctly (or use the command-line check that follows).
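
If you prefer to verify from the command line, a quick Python sketch like the following sends a request to the file and prints the HTTP status and Server header, which should indicate Amazon S3. The URL shown is a placeholder; use the File URL you copied from the Edit Media pane.

import urllib.request

# Placeholder - replace with the File URL from the Edit Media pane.
url = "https://DOC-EXAMPLE-BUCKET.s3.us-west-2.amazonaws.com/wp-content/uploads/sample.jpg"

with urllib.request.urlopen(url) as response:
    print(response.status)                 # expect 200
    print(response.headers.get("Server"))  # typically "AmazonS3"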

Conclusion

In this post, you learned how to install and configure the WP Offload Media Lite plugin. Your media files are now served centrally from S3. You are ready to finalize the look and feel of your site and scale out the front end. A subsequent post in this series explores that process.

Optimizing deep learning on P3 and P3dn with EFA

Post Syndicated from whiteemm original https://aws.amazon.com/blogs/compute/optimizing-deep-learning-on-p3-and-p3dn-with-efa/

This post is written by Rashika Kheria, Software Engineer, Purna Sanyal, Senior Solutions Architect, Strategic Account and James Jeun, Sr. Product Manager

The Amazon EC2 P3dn.24xlarge instance is the latest addition to the Amazon EC2 P3 instance family, with upgrades to several components. This high-end size of the P3 family allows users to scale out to multiple nodes for distributed workloads more efficiently.  With these improvements to the instance, you can complete training jobs in a shorter amount of time and iterate on your Machine Learning (ML) models faster.

 

This blog reviews the significant upgrades with p3dn.24xlarge, walks you through deployment, and shows an example ML use case for these upgrades.

 

Overview of P3dn instance upgrades

The most notable upgrades to the p3dn.24xlarge instance are the 100-Gbps network bandwidth and the new EFA network interface that allows for highly scalable internode communication. This means you can scale applications to use thousands of GPUs, which reduces time to results. EFA's operating-system-bypass networking mechanism and the underlying Scalable Reliable Datagram protocol are built into the Nitro controllers, which enable a low-latency, low-jitter channel for inter-instance communication. EFA support has been adopted in the mainline Linux kernel and integrated with libfabric and various distributions. AWS worked with NVIDIA for EFA to support the NVIDIA Collective Communications Library (NCCL), which optimizes multi-GPU and multi-node communication primitives and helps achieve high throughput over NVLink interconnects.

 

The following diagram shows the PCIe/NVLink communication topology used by the p3.16xlarge and p3dn.24xlarge instance types.

the PCIe/NVLink communication topology used by the p3.16xlarge and p3dn.24xlarge instance types.

 

The following table summarizes the full set of differences between p3.16xlarge and p3dn.24xlarge.

Feature          | p3.16xl                    | p3dn.24xl
Processor        | Intel Xeon E5-2686 v4      | Intel Skylake 8175 (w/ AVX 512)
vCPUs            | 64                         | 96
GPU              | 8x 16 GB NVIDIA Tesla V100 | 8x 32 GB NVIDIA Tesla V100
RAM              | 488 GB                     | 768 GB
Network          | 25 Gbps ENA                | 100 Gbps ENA + EFA
GPU Interconnect | NVLink – 300 GB/s          | NVLink – 300 GB/s

 

P3dn.24xl offers more networking bandwidth than p3.16xl. Paired with EFA’s communication library, this feature increases scaling efficiencies drastically for large-scale, distributed training jobs. Other improvements include double the GPU memory for large datasets and batch sizes, increased system memory, and more vCPUs. This upgraded instance is the most performant GPU compute option on AWS.

 

The upgrades also improve your workload around distributed deep learning. The GPU memory improvement enables higher intranode batch sizes. The newer Layer-wise Adaptive Rate Scaling (LARS) has been tested with ResNet50 and other deep neural networks (DNNs) to allow for larger batch sizes. The increased batch sizes reduce wall-clock time per epoch with minimal loss of accuracy. Additionally, using 100-Gbps networking with EFA heightens performance with scale. Greater networking performance is beneficial when updating weights for a large number of parameters. You can see high scaling efficiency when running distributed training on GPUs for ResNet50 type models that primarily use images for object recognition. For more information, see Scalable multi-node deep learning training using GPUs in the AWS Cloud.

 

Natural language processing (NLP) also presents large compute requirements for model training. This large compute requirement is especially present with the arrival of large Transformer-based models like BERT and GPT-2, which have up to a billion parameters. The following describes how to set up distributed model trainings with scalability for both image and language-based models, and also notes how the AWS P3 and P3dn instances perform.

 

Optimizing your P3 family

First, optimize your P3 instances with an important environment update. This update applies to traditional TCP-based networking and is available in the latest release of NCCL (2.4.8 as of this writing).

 

Two new environment variables are available that allow you to take advantage of multiple TCP sockets per thread: NCCL_SOCKET_NTHREADS and NCCL_NSOCKS_PERTHREAD.

 

These environment variables allow the NCCL backend to exceed the 10-Gbps single-stream TCP bandwidth limitation in EC2.

 

Enter the following command:

/opt/openmpi/bin/mpirun -n 16 -N 8 --hostfile hosts -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x NCCL_SOCKET_IFNAME=eth0 -x NCCL_NSOCKS_PERTHREAD=4 -x NCCL_SOCKET_NTHREADS=4 --mca btl_tcp_if_exclude lo,docker0 /opt/nccl-tests/build/all_reduce_perf -b 16 -e 8192M -f 2 -g 1 -c 1 -n 100

 

The following graph shows the synthetic NCCL tests and their increased performance with the additional directives.

synthetic NCCL tests and their increased performance with the additional directives

You can achieve a two-fold increase in throughput once the synthetic payload size exceeds a threshold of around 1 MB.

 

 

Deploying P3dn

 

The following steps walk you through spinning up a cluster of p3dn.24xlarge instances in a cluster placement group. This allows you to take advantage of all the new performance features within the P3 instance family. For more information, see Cluster Placement Groups in the Amazon EC2 User Guide.
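
If you script your infrastructure rather than using the console, creating the placement group is a single API call. The following boto3 sketch is one way to do it; the group name is arbitrary and the Region is assumed to be us-west-2 to match the rest of this walkthrough.

import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

# A "cluster" placement group packs instances close together for the
# low-latency, high-throughput networking that multi-node training needs.
ec2.create_placement_group(GroupName="p3dn-cluster-pg", Strategy="cluster")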

This post deploys the following stack:

 

  1. On the Amazon EC2 console, create a security group.

 Make sure that both inbound and outbound traffic are open on all ports and protocols within the security group.

 

  2. Modify the user variables in the Packer build script so that they are compatible with your environment.

The following is the modification code for your variables:

 

{

  "variables": {

    "Region": "us-west-2",

    "flag": "compute",

    "subnet_id": "<subnet-id>",

    "sg_id": "<security_group>",

    "build_ami": "ami-0e434a58221275ed4",

    "iam_role": "<iam_role>",

    "ssh_key_name": "<keyname>",

    "key_path": "/path/to/key.pem"

},

  3. Build and launch the AMI by running the following Packer command:

packer build nvidia-efa-fsx-al2.yml

This workflow takes care of setting up EFA, compiling NCCL, and installing the toolchain. After the build completes, you have an AMI ID that you can launch from the EC2 console. Make sure to enable EFA when you launch the instance.
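
If you launch from code instead of the EC2 console, the following boto3 sketch shows the parameters that matter for this setup; the AMI ID, key pair, subnet, and security group are placeholders for your own values, and EFA is enabled by setting InterfaceType to "efa" on the network interface.

import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",        # placeholder - the AMI built by Packer
    InstanceType="p3dn.24xlarge",
    MinCount=1,
    MaxCount=1,
    KeyName="my-key-pair",                  # placeholder
    Placement={"GroupName": "p3dn-cluster-pg"},
    NetworkInterfaces=[{
        "DeviceIndex": 0,
        "SubnetId": "subnet-0123456789abcdef0",  # placeholder
        "Groups": ["sg-0123456789abcdef0"],      # placeholder
        "InterfaceType": "efa",                  # attaches an EFA instead of a standard ENI
    }],
)
print(response["Instances"][0]["InstanceId"])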

  4. Launch a second instance in the cluster placement group so you can run two-node tests.
  5. Enter the following code to make sure that all components are built correctly:

/opt/nccl-tests/build/all_reduce_perf 

  6. The following output of the command confirms that the build is using EFA:

INFO: Function: ofi_init Line: 686: NET/OFI Selected Provider is efa

INFO: Function: main Line: 49: NET/OFI Process rank 8 started. NCCLNet device used on ip-172-0-1-161 is AWS Libfabric.

INFO: Function: main Line: 53: NET/OFI Received 1 network devices

INFO: Function: main Line: 57: NET/OFI Server: Listening on dev 0

INFO: Function: ofi_init Line: 686: NET/OFI Selected Provider is efa

 

Synthetic two-node performance

This blog includes the NCCL-tests GitHub repository as part of the deployment stack, which provides synthetic benchmarking of the communication layer over NCCL and the EFA network.

When launching the two-node cluster, complete the following steps:

  1. Place the instances in the cluster placement group.
  2. SSH into one of the nodes.
  3. Fill out the hosts file.
  4. Run the two-node test with the following code:

/opt/openmpi/bin/mpirun -n 16 -N 8 --hostfile hosts -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x FI_PROVIDER="efa" -x FI_EFA_TX_MIN_CREDITS=64 -x NCCL_SOCKET_IFNAME=eth0 --mca btl_tcp_if_exclude lo,docker0 /opt/nccl-tests/build/all_reduce_perf -b 16 -e 8192M -f 2 -g 1 -c 1 -n 100

This test makes sure that the node performance works the way it is supposed to.

The following graph compares the NCCL bandwidth performance using -x FI_PROVIDER="efa" vs. -x FI_PROVIDER="tcp". There is a three-fold increase in bus bandwidth when using EFA.

 

Graph of NCCL bus bandwidth with FI_PROVIDER="efa" vs. FI_PROVIDER="tcp"

Now that you have run the two node tests, you can move on to a deep learning use case.

FAIRSEQ ML training on a P3dn cluster

Fairseq(-py) is a sequence modeling toolkit that allows you to train custom models for translation, summarization, language modeling, and other text-generation tasks. Distributed training of FAIRSEQ machine translation models requires a fast network to support the all-reduce algorithm. Fairseq provides reference implementations of various sequence-to-sequence models, including convolutional neural networks (CNN), long short-term memory (LSTM) networks, and transformer (self-attention) networks.

 

After you see consistent 10 GB/s bus bandwidth on the new P3dn instance, you are ready for FAIRSEQ distributed training.

To install fairseq from source and develop locally, complete the following steps:

  1. Copy the FAIRSEQ source code to one of the P3dn instances.
  2. Copy the FAIRSEQ training data into the data folder.
  3. Copy the FAIRSEQ test data into the data folder.

 

git clone https://github.com/pytorch/fairseq

cd fairseq

pip install --editable .

Now that you have FAIRSEQ installed, you can run the training model. Complete the following steps:

  1. Run FAIRSEQ training on a single node with 8 GPUs (one p3dn instance) to check the performance and accuracy of FAIRSEQ operations.
  2. Create a custom AMI.
  3. Build the other 31 instances from the custom AMI.

 

Use the following script for distributed all-reduce FAIRSEQ training:

 

export RANK=$1 # the rank of this process, from 0 to 127 in case of 128 GPUs
export LOCAL_RANK=$2 # the local rank of this process, from 0 to 7 in case of 8 GPUs per mac
export NCCL_DEBUG=INFO
export NCCL_TREE_THRESHOLD=0;
export FI_PROVIDER="efa";

export FI_EFA_TX_MIN_CREDITS=64;
export LD_LIBRARY_PATH=/opt/amazon/efa/lib64/:/home/ec2-user/aws-ofi-nccl/install/lib/:/home/ec2-user/nccl/build/lib:$LD_LIBRARY_PATH;
echo $FI_PROVIDER
echo $LD_LIBRARY_PATH
python train.py data-bin/wmt18_en_de_bpej32k \
   --clip-norm 0.0 -a transformer_vaswani_wmt_en_de_big \
   --lr 0.0005 --source-lang en --target-lang de \
   --label-smoothing 0.1 --upsample-primary 16 \
   --attention-dropout 0.1 --dropout 0.3 --max-tokens 3584 \
   --log-interval 100  --weight-decay 0.0 \
   --criterion label_smoothed_cross_entropy --fp16 \
   --max-update 500000 --seed 3 --save-interval-updates 16000 \
   --share-all-embeddings --optimizer adam --adam-betas '(0.9, 0.98)' \
   --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 \
   --warmup-updates 4000 --min-lr 1e-09 \
   --distributed-port 12597 --distributed-world-size 32 \
   --distributed-init-method 'tcp://172.31.43.34:9218' --distributed-rank $RANK \
   --device-id $LOCAL_RANK \
   --max-epoch 3 \
   --no-progress-bar  --no-save

Now that you have completed and validated your base infrastructure layer, you can add additional components to the stack for various workflows. The following charts show time-to-train improvement factors when scaling out to multiple GPUs for FAIRSEQ model training.

time-to-train improvement factors when scaling out to multiple GPUs for FAIRSEQ model training

 

Conclusion

EFA on p3dn.24xlarge allows you to take advantage of additional performance at scale with no change in code. With this updated infrastructure, you can decrease cost and time to results by using more GPUs to scale out and get more done on complex workloads like natural language processing. This blog provides much of the undifferentiated heavy lifting with the DLAMI integrated with EFA. Go power up your ML workloads with EFA!

 

Managing domain membership of dynamic fleet of EC2 instances

Post Syndicated from whiteemm original https://aws.amazon.com/blogs/compute/managing-domain-membership-of-dynamic-fleet-of-ec2-instances/

This post is written by Alex Zarenin, Senior AWS Solution Architect, Microsoft Tech.

1.   Introduction

For most companies, a move of Microsoft workloads to AWS starts with “lift and shift” where existing workloads are moved from the on-premises data centers to the cloud. These workloads may include WEB and API farms, and a fleet of processing nodes, which typically depend on AD Domain membership for access to shared resources, such as file shares and SQL Server databases.

When the farms and set of processing nodes are static, which is typical for on-premises deployments, managing domain membership is simple – new instances join the AD Domain and stay there. When some machines are periodically recycled, respective AD computer accounts are disabled or deleted, and new accounts are added when new machines are added to the domain. However, these changes are slow and can be easily managed.

When these workloads are moved to the cloud, it is natural to set up WEB and API farms as scalability groups to allow for scaling up and scaling down membership to optimize cost while meeting the performance requirements. Similarly, processing nodes could be combined into scalability groups or created on-demand as a set of Spot Instances.

In either case, the fleet becomes very dynamic, and can expand and shrink multiple times to match the load or in response to some events, which makes manual management of AD Domain membership impractical. This scenario requires an automated solution for managing domain membership.

2.   Challenges

An automated solution to manage domain membership of a dynamic fleet of Amazon EC2 instances should provide for:

  • Seamless AD Domain joining when new instances join the fleet, working for both managed and native ADs;
  • Automatic unjoining from the AD Domain and removal of the respective computer account from AD when the instance is stopped or terminated;
  • Following best practices for protecting sensitive information, namely the credentials of the account used to join the domain or remove computer accounts from it;
  • Extensive logging to facilitate troubleshooting if something does not work as expected.

3.   Solution overview

Joining an AD domain, whether native or managed, could be achieved by placing a PowerShell script that performs domain joining into the User Data section of the EC2 instance launch configuration.

It is much more difficult to implement domain unjoining and deletion of the computer account from AD upon instance termination, because Windows does not support an On-Shutdown trigger in the Task Scheduler. However, it is possible to define an On-Shutdown script using the local Group Policy.

If defined, the On-Shutdown script runs on every shutdown. However, joining a domain requires a reboot of the machine, so the On-Shutdown policy cannot be enabled on the first invocation of the User Data script; otherwise, the machine would be removed from the domain by the On-Shutdown script right after it joins it. Thus, the User Data script must contain logic to determine whether it is the first invocation upon instance launch or a subsequent one following the domain-join reboot, and the On-Shutdown policy should be enabled only on the second start-up. This also necessitates defining the User Data script as "persistent" by specifying <persist>true</persist> in the User Data section of the launch configuration.

Both the domain-join and domain-unjoin scripts require a security context that allows them to perform these operations on the domain, which is usually achieved by providing credentials for a user account with the corresponding rights. In the proposed implementation, both scripts obtain account credentials from AWS Secrets Manager under the protection of security policies and roles; no credentials are stored in the scripts.

Both scripts generate a detailed log of their operation, stored locally on the machine.

In this post, I demonstrate a solution based on a PowerShell script that performs Active Directory domain joining on instance start-up through the EC2 launch User Data script. I also show removal from the domain, with deletion of the respective computer account, upon instance shutdown using a script installed in the On-Shutdown policy.

User Data script overall logic:

  1. Initialize Logging
  2. Initialize On-Shutdown Policy
  3. Read fields UserID, Password, and Domain from prod/AD secret
  4. Verify machine’s domain membership
  5. If machine is already a member of the domain, then
    1. Enable On-Shutdown Policy
    2. Install RSAT for AD PowerShell
  6. Otherwise
    1. Create credentials from the secret
    2. Initiate domain join
    3. Request machine restart

On-Shutdown script overall logic:

  1. Initialize Logging
  2. Check cntrl variable; If cntrl variable is not set to value “run”, exit script
  3. Check whether machine is a member of the domain; if not, exit script
  4. Check if the RSAT for AD PowerShell installed; if not installed, exit the script
  5. Read fields UserID, Password, and Domain from prod/AD secret
  6. Create credentials from the secret
  7. Identify domain controller
  8. Remove machine from the domain
  9. Delete machine account from domain controller

Simplified Flow chart of User Data and On-Shutdown Scripts

Now that we have reviewed the overall logic of the scripts, we can examine the components of each script in more detail.

4.   Routines common to both User Data and On-Shutdown scripts

4.1. Processing configuration variables and parameters

The User Data script does not accept parameters and is executed exactly as provided in the User Data section of the launch configuration. However, at the beginning of the script, two variables are specified that can be easily changed:

[string]$SecretAD  = "prod/AD"

[string]$LogPath   = "C:\ProgramData\Amazon\CustomLaunch"

The first one defines the name of the secret in the Secrets Manager that contains UserID, Password, and Domain. The second variable defines the name of the folder where the log file will be stored.

The On-Shutdown Group Policy invokes the corresponding script with a parameter, which is stored in the registry as part of the policy setup. Thus, the first line of the On-Shutdown script defines the variable for this parameter:

param([string]$cntrl = "NotSet")

The following two lines in the On-Shutdown script are the same as in the User Data script; actually, they are generated from the corresponding variables in the User Data script.

4.2. The Logger class

Both scripts use the same Logger class to manage logging:

class Logger

    {

    [string] hidden  $logFileName

    Logger([string] $Path, [string] $LogName)

        {

        if(!(Test-Path -Path $Path -pathType container))

            { $dummy = New-Item -ItemType directory -Path $Path }

        $this.logFileName = [System.IO.Path]::Combine($Path, ($LogName + ".log"))

        }

    [void] WriteLine([string] $msg)

        {

        [string] $logLine = (Get-Date -Format 'hh:mm:ss')

         if ($msg) { Out-File -LiteralPath $this.logFileName -Append -InputObject ($logLine + " " + $msg) }

        # Msg is null - creating new file

        else { Out-File -LiteralPath $this.logFileName -InputObject ($logLine + " Log created")     }

        }

    }

The constructor of the class verifies or creates the required folder structure and generates the log file name to be used by subsequent calls to the WriteLine([string] $msg) method. This method outputs the provided message as a separate line in the log, prefixed with a timestamp. The following code example creates a Logger instance and writes a message to the log:

[Logger]$log = [Logger]::new($LogPath, "ADMgmt")

$log.WriteLine("Log Started")

5.   The User Data script

The following are the major components of the User Data script.

5.1. Initializing the On-Shutdown policy

The SDManager class wraps the functionality necessary to create the On-Shutdown policy. The policy requires certain registry entries as well as a script that is executed when the policy is invoked. This script must be placed in a well-defined folder on the file system.

The SDManager constructor performs the following tasks:

  • Verifies that the folder C:\Windows\System32\GroupPolicy\Machine\Scripts\Shutdown exists and, if necessary, creates it;
  • Updates the On-Shutdown script, stored as an array in SDManager, with the parameters provided to the constructor, and then saves the adjusted script in the proper location;
  • Creates all registry entries required for the On-Shutdown policy;
  • Sets the parameter that the policy passes to the On-Shutdown script to a value that precludes the script from removing the machine from the domain.

SDManager exposes two member functions, EnableUnJoin() and DisableUnJoin(), which update the parameter passed to the On-Shutdown script to enable or disable removing the machine from the domain, respectively.

5.2. Reading the “secret”

Using the value of the configuration variable $SecretAD, the following code example retrieves the secret value from AWS Secrets Manager and creates a PowerShell credential to be used for the operations on the domain. The Domain value from the secret is also used to verify that the machine is a member of the required domain.

Import-Module AWSPowerShell

try { $SecretObj = (Get-SECSecretValue -SecretId $SecretAD) }

catch

    {

    $log.WriteLine("Could not load secret <" + $SecretAD + "> - terminating execution")

    return

    }

[PSCustomObject]$Secret = ($SecretObj.SecretString  | ConvertFrom-Json)

$log.WriteLine("Domain (from Secret): <" + $Secret.Domain + ">")

To get the secret from AWS Secrets Manager, you must use an AWS-specific cmdlet. To make it available, you must import the AWSPowerShell module.

5.3. Checking for domain membership and enabling On-Shutdown policy

To check for domain membership, we use the WMI Win32_ComputerSystem class. While checking domain membership, we also validate that, if the machine is a member of a domain, it is the domain specified in the secret.

If the machine is already a member of the correct domain, the script proceeds with installing RSAT for AD PowerShell, which is required by the On-Shutdown script, and enables the On-Shutdown task. The following code example achieves this:

$compSys = Get-WmiObject -Class Win32_ComputerSystem

if ( ($compSys.PartOfDomain) -and ($compSys.Domain -eq $Secret.Domain))

    {

    $log.WriteLine("Already member of: <" + $compSys.Domain + "> - Verifying RSAT Status")

    $RSAT = (Get-WindowsFeature RSAT-AD-PowerShell)

    if ($RSAT -eq $null)

        {

        $log.WriteLine("<RSAT-AD-PowerShell> feature not found - terminating script")

        return

        }

    $log.WriteLine("Enable OnShutdown task to un-join Domain")

    $sdm.EnableUnJoin()

    if ( (-Not $RSAT.Installed) -and ($RSAT.InstallState -eq "Available") )

        {

        $log.WriteLine("Installing <RSAT-AD-PowerShell> feature")

        Install-WindowsFeature RSAT-AD-PowerShell

        }

    $log.WriteLine("Terminating script - ")

    return

    }

5.4. Joining Domain

If the machine is not a member of the domain, or is a member of the wrong domain, the script creates credentials from the secret and requests domain joining with a subsequent restart of the machine. The following code example performs these tasks:

$log.WriteLine("Domain Join required")

$log.WriteLine("Disable OnShutdown task to avoid reboot loop")

$sdm.DisableUnJoin()

$password   = $Secret.Password | ConvertTo-SecureString -asPlainText -Force

$username   = $Secret.UserID + "@" + $Secret.Domain

$credential = New-Object System.Management.Automation.PSCredential($username,$password)

$log.WriteLine("Attempting to join domain <" + $Secret.Domain + ">")

Add-Computer -DomainName $Secret.Domain -Credential $credential -Restart -Force

$log.WriteLine("Requesting restart...")

 

6.   The On-Shutdown script

Many components of the On-Shutdown script, such as logging, working with the AWS Secrets Manager, and validating domain membership are either the same or very similar to respective components of the User Data script.

One interesting difference is that the On-Shutdown script accepts a parameter from the respective policy; the value of this parameter is set by the EnableUnJoin() and DisableUnJoin() functions in the User Data script to control whether the domain un-join happens on a particular reboot, as discussed earlier. Thus, the following code example appears at the beginning of the On-Shutdown script:

if ($cntrl -ne "run")

      {

      $log.WriteLine("Script param <" + $cntrl + "> not set to <run> - script terminated")

      return

      }

So, by setting the On-Shutdown policy parameter (a value in the registry) to something other than "run", we can stop the On-Shutdown script from executing; this is exactly what the DisableUnJoin() function does. Similarly, the EnableUnJoin() function sets the value of this parameter to "run", thus allowing the On-Shutdown script to continue execution when invoked.

Another interesting problem with this script is how to implement removing the machine from the domain and deleting the respective computer account from Active Directory. If the script first removes the machine from the domain, it can no longer find a domain controller to delete the computer account.

Alternatively, if the script first deletes the computer account and then tries to remove the machine from the domain by switching it to a workgroup, that change would fail. The following code example shows how this issue was resolved in the script:

import-module ActiveDirectory

$DCHostName = (Get-ADDomainController -Discover).HostName

$log.WriteLine("Using Account <" + $username + ">")

$log.WriteLine("Using Domain Controller <" + $DCHostName + ">")

Remove-Computer -WorkgroupName "WORKGROUP" -UnjoinDomainCredential $credential -Force -Confirm:$false

Remove-ADComputer -Identity $MachineName -Credential $credential -Server "$DCHostName" -Confirm:$false

Before removing the machine from the domain, the script obtains the name of one of the domain controllers and stores it in a local variable. Then the computer is switched from the domain to a workgroup. As the last step, the respective computer account is deleted from AD using the host name of the domain controller obtained earlier.

7.   Managing Script Credentials

Both the User Data and On-Shutdown scripts obtain the domain name and the user credentials used to add or remove computers from the domain from an AWS Secrets Manager secret with the predefined name prod/AD. This predefined name can be changed in the script.

Details on how to create a secret are available in AWS documentation. This secret should be defined as Other type of secrets and contain at least the following fields:

  • UserID
  • Password
  • Domain

Fill in the fields and choose Next, as illustrated in the following screenshot:

Screenshot Store a new secret. UserID, Password, Domain

Give the new secret the name prod/AD (this name is referred to in the script) and capture the secret’s ARN. The latter is required for creating a policy that allows access to this secret.

Screenshot of Secret Details and Secret Name
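
If you prefer to create the secret programmatically instead of through the console, a minimal boto3 sketch might look like the following. The user ID, password, and domain values are placeholders; the printed ARN is the value you need for the IAM policy in the next section.

import json
import boto3

secrets = boto3.client("secretsmanager", region_name="us-east-1")

response = secrets.create_secret(
    Name="prod/AD",
    Description="Credentials used to join and unjoin EC2 instances to the AD domain",
    SecretString=json.dumps({
        "UserID": "domain-join-svc",          # placeholder service account
        "Password": "REPLACE_WITH_PASSWORD",  # placeholder
        "Domain": "corp.example.com",         # placeholder domain name
    }),
)
print(response["ARN"])  # capture this ARN for the IAM policy in the next section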

8.   Creating AWS Policy and Role to access the Credential Secret

8.1. Creating IAM Policy

The next step is to use IAM to create a policy that allows access to the secret; the policy statement appears as follows:

{

    "Version": "2012-10-17",

    "Statement": [

        {

            "Sid": "VisualEditor0",

            "Effect": "Allow",

            "Action": "secretsmanager:GetSecretValue",

            "Resource": "arn:aws:secretsmanager:us-east-1:NNNNNNNN7225:secret:prod/AD-??????"

        }

    ]

}

Below is the AWS console screenshot with the policy statement filled in:

Screenshot with policy statement filled in

The resource in the policy is identified with wildcard characters in place of the 6 random characters at the end of the ARN, which may change when the secret is updated. Configuring the policy with the wildcard extends rights to all versions of the secret, which allows you to change the credential information without changing the respective policy.

Screenshot of review policy. Name AdminReadDomainCreds

Let’s name this policy AdminReadDomainCreds so that we may refer to it when creating an IAM Role.

 

8.2. Creating IAM Role

Now that we have defined the AdminReadDomainCreds policy, we can create a role, AdminDomainJoiner, that refers to this policy. On the Permissions tab of the role creation dialog, attach the standard SSM policy for EC2, AmazonEC2RoleforSSM, together with the newly created custom policy AdminReadDomainCreds.

The Permissions tab of the role creation dialog with the respective policies attached is shown in the following screenshot:

Screenshot of permission tab with roles

This role should include our new policy, AdminReadDomainCreds, in addition to the standard SSM policy for EC2.
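
As a rough scripted equivalent of these console steps, the following boto3 sketch creates the role with an EC2 trust policy, attaches both policies, and wraps the role in an instance profile. The account ID in the custom policy ARN is a placeholder for your own.

import json
import boto3

iam = boto3.client("iam", region_name="us-east-1")

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(RoleName="AdminDomainJoiner",
                AssumeRolePolicyDocument=json.dumps(trust_policy))

# Standard SSM policy for EC2 plus the custom secret-access policy.
iam.attach_role_policy(
    RoleName="AdminDomainJoiner",
    PolicyArn="arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM")
iam.attach_role_policy(
    RoleName="AdminDomainJoiner",
    PolicyArn="arn:aws:iam::111122223333:policy/AdminReadDomainCreds")  # placeholder account ID

# The instance profile is what actually gets associated with the EC2 instance.
iam.create_instance_profile(InstanceProfileName="AdminDomainJoiner")
iam.add_role_to_instance_profile(InstanceProfileName="AdminDomainJoiner",
                                 RoleName="AdminDomainJoiner")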

9.   Launching the Instance

Now, you’re ready to launch the instance or create the launch configuration. When configuring the instance for launch, it’s important to assign the instance to the role AdminDomainJoiner, which you just created:

Screenshot of Configure Instance Details. IAM role as AdminDomainJoiner

In the Advanced Details section of the configuration screen, paste the script into the User Data field:

If you named your secret something other than prod/AD (the name I used), modify the script parameter $SecretAD to use the name of your secret.

10.   Conclusion

That’s it! When you launch this instance, it will automatically join the domain. Upon Stop or Termination, the instance will remove itself from the domain.

 

For your convenience we provide the full text of the User Data script:

<powershell>
#------------------------------------------------------------------------------
# Script parameters
#------------------------------------------------------------------------------
[string]$SecretAD = "prod/AD"
[string]$LogPath = "C:\ProgramData\Amazon\CustomLaunch"
#------------------------------------------------------------------------------
#region Logger class
#----------------------------------------------
class Logger
{
[string] hidden $logFileName
#----------------------------------------
Logger([string] $Path, [string] $LogName)
{
if(!(Test-Path -Path $Path -pathType container))
{ $dummy = New-Item -ItemType directory -Path $Path }
$this.logFileName = [System.IO.Path]::Combine($Path, ($LogName + ".log"))
}
#----------------------------------------
[void] WriteLine([string] $msg)
{
[string] $logLine = (Get-Date -Format 'hh:mm:ss')
if ($msg) { Out-File -LiteralPath $this.logFileName -Append -InputObject ($logLine + " " + $msg) }
# Msg is null - creating new file
else { Out-File -LiteralPath $this.logFileName -InputObject ($logLine + " Log created") }
}
}
#----------------------------------------------
#endregion
#------------------------------------------------------------------------------
# Create Logger
#------------------------------------------------------------------------------
[Logger]$log = [Logger]::new($LogPath, "ADMgmt")
$log.WriteLine("--------------------------------------------------------------------")
$log.WriteLine("Log Started")
#------------------------------------------------------------------------------

#------------------------------------------------------------------------------
#region SDManager class
#----------------------------------------------
class SDManager
{
#-------------------------------------------------------------------
[string] hidden $ScriptName = "Shutdown-UnJoin.ps1"
[string] hidden $GPScrShd_0_0 = "HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Group Policy\Scripts\Shutdown\0\0"
[string] hidden $GPMScrShd_0_0 = "HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Group Policy\State\Machine\Scripts\Shutdown\0\0"
#-------------------------------------------------------------------
SDManager([string]$LogDirPath, [string]$SecretName)
{
[string] $GPRootPath = "C:\Windows\System32\GroupPolicy"
[string] $GPMcnPath = "C:\Windows\System32\GroupPolicy\Machine"
[string] $GPScrPath = "C:\Windows\System32\GroupPolicy\Machine\Scripts"
[string] $GPSShdPath = "C:\Windows\System32\GroupPolicy\Machine\Scripts\Shutdown"
[string] $ScriptFile = [System.IO.Path]::Combine($GPSShdPath, $this.ScriptName)
[string] $SecretLine = '[string]$SecretAD = "' + $SecretName + '"'
[string] $PathLine = '[string]$LogPath = "' + $LogDirPath + '"'
#----------------------------------------------------------------
#region Shutdown script (scheduled through Local Policy)
$ScriptBody =
@(
'param([string]$cntrl = "NotSet")',
$SecretLine,
$PathLine,
'[string]$MachineName = $env:COMPUTERNAME',
'class Logger ',
' {',
' [string] hidden $logFileName',
' Logger([string] $Path, [string] $LogName)',
' {',
' if(!(Test-Path -Path $Path -pathType container))',
' { $dummy = New-Item -ItemType directory -Path $Path }',
' $this.logFileName = [System.IO.Path]::Combine($Path, ($LogName + ".log"))',
' }',
' [void] WriteLine([string] $msg)',
' {',
' [string] $logLine = (Get-Date -Format "hh:mm:ss")',
' if ($msg) { Out-File -LiteralPath $this.logFileName -Append -InputObject ($logLine + " " + $msg) }',
' else { Out-File -LiteralPath $this.logFileName -InputObject ($logLine + " Log created") }',
' }',
' }',
'[Logger]$log = [Logger]::new($LogPath, "ADMgmt")',
'$log.WriteLine("-----------------------------------------")',
'$log.WriteLine("Log Started")',
'if ($cntrl -ne "run") ',
' { ',
' $log.WriteLine("Script param <" + $cntrl + "> not set to <run> - script terminated") ',
' return',
' }',
'$compSys = Get-WmiObject -Class Win32_ComputerSystem',
'if ( -Not ($compSys.PartOfDomain))',
' {',
' $log.WriteLine("Not member of a domain - terminating script")',
' return',
' }',
'$RSAT = (Get-WindowsFeature RSAT-AD-PowerShell)',
'if ( $RSAT -eq $null -or (-Not $RSAT.Installed) )',
' {',
' $log.WriteLine("<RSAT-AD-PowerShell> feature not found - terminating script")',
' return',
' }',
'$log.WriteLine("Removing machine <" +$MachineName + "> from Domain <" + $compSys.Domain + ">")',
'$log.WriteLine("Reading Secret <" + $SecretAD + ">")',
'Import-Module AWSPowerShell',
'try { $SecretObj = (Get-SECSecretValue -SecretId $SecretAD) }',
'catch ',
' { ',
' $log.WriteLine("Could not load secret <" + $SecretAD + "> - terminating execution")',
' return ',
' }',
'[PSCustomObject]$Secret = ($SecretObj.SecretString | ConvertFrom-Json)',
'$password = $Secret.Password | ConvertTo-SecureString -asPlainText -Force',
'$username = $Secret.UserID + "@" + $Secret.Domain',
'$credential = New-Object System.Management.Automation.PSCredential($username,$password)',
'import-module ActiveDirectory',
'$DCHostName = (Get-ADDomainController -Discover).HostName',
'$log.WriteLine("Using Account <" + $username + ">")',
'$log.WriteLine("Using Domain Controller <" + $DCHostName + ">")',
'Remove-Computer -WorkgroupName "WORKGROUP" -UnjoinDomainCredential $credential -Force -Confirm:$false ',
'Remove-ADComputer -Identity $MachineName -Credential $credential -Server "$DCHostName" -Confirm:$false ',
'$log.WriteLine("Machine <" +$MachineName + "> removed from Domain <" + $compSys.Domain + ">")'
)
#endregion
#----------------------------------------------------------------
if(!(Test-Path -Path $GPRootPath -pathType container))
{ New-Item -ItemType directory -Path $GPRootPath }
if(!(Test-Path -Path $GPMcnPath -pathType container))
{ New-Item -ItemType directory -Path $GPMcnPath }
if(!(Test-Path -Path $GPScrPath -pathType container))
{ New-Item -ItemType directory -Path $GPScrPath }
if(!(Test-Path -Path $GPSShdPath -pathType container))
{ New-Item -ItemType directory -Path $GPSShdPath }
#----------------------------------------
Set-Content $ScriptFile -Value $ScriptBody
#----------------------------------------
$RegistryScript =
@(
'Windows Registry Editor Version 5.00',
'[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Group Policy\Scripts]',
'[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Group Policy\Scripts\Shutdown]',
'[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Group Policy\Scripts\Shutdown\0]',
'"GPO-ID"="LocalGPO"',
'"SOM-ID"="Local"',
'"FileSysPath"="C:\\Windows\\System32\\GroupPolicy\\Machine"',
'"DisplayName"="Local Group Policy"',
'"GPOName"="Local Group Policy"',
'"PSScriptOrder"=dword:00000001',
'[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Group Policy\Scripts\Shutdown\0\0]',
'"Script"="Shutdown-UnJoin.ps1"',
'"Parameters"=""',
'"IsPowershell"=dword:00000001',
'"ExecTime"=hex(b):00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00',
'[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Group Policy\Scripts\Startup]',
'[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Group Policy\State\Machine\Scripts]',
'[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Group Policy\State\Machine\Scripts\Shutdown]',
'[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Group Policy\State\Machine\Scripts\Shutdown\0]',
'"GPO-ID"="LocalGPO"',
'"SOM-ID"="Local"',
'"FileSysPath"="C:\\Windows\\System32\\GroupPolicy\\Machine"',
'"DisplayName"="Local Group Policy"',
'"GPOName"="Local Group Policy"',
'"PSScriptOrder"=dword:00000001',
'[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Group Policy\State\Machine\Scripts\Shutdown\0\0]',
'"Script"="Shutdown-UnJoin.ps1"',
'"Parameters"=""',
'"ExecTime"=hex(b):00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00',
'[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Group Policy\State\Machine\Scripts\Startup]'
)
[string] $RegistryFile = [System.IO.Path]::Combine($LogDirPath, "OnShutdown.reg")
Set-Content $RegistryFile -Value $RegistryScript
&regedit.exe /S "$RegistryFile"
}
#----------------------------------------
[void] DisableUnJoin()
{
Set-ItemProperty -Path $this.GPScrShd_0_0 -Name "Parameters" -Value "ignore"
Set-ItemProperty -Path $this.GPMScrShd_0_0 -Name "Parameters" -Value "ignore"
&gpupdate /Target:computer /Wait:0
}
#----------------------------------------
[void] EnableUnJoin()
{
Set-ItemProperty -Path $this.GPScrShd_0_0 -Name "Parameters" -Value "run"
Set-ItemProperty -Path $this.GPMScrShd_0_0 -Name "Parameters" -Value "run"
&gpupdate /Target:computer /Wait:0
}
}
#----------------------------------------------
#endregion
#------------------------------------------------------------------------------
# Create ShutDown Manager
#------------------------------------------------------------------------------
[SDManager]$sdm = [SDManager]::new($LogPath, $SecretAD)
#------------------------------------------------------------------------------
$log.WriteLine("Loading Secret <" + $SecretAD + ">")
#------------------------------------------------------------------------------
Import-Module AWSPowerShell
try { $SecretObj = (Get-SECSecretValue -SecretId $SecretAD) }
catch
{
$log.WriteLine("Could not load secret <" + $SecretAD + "> - terminating execution")
return
}
[PSCustomObject]$Secret = ($SecretObj.SecretString | ConvertFrom-Json)
$log.WriteLine("Domain (from Secret): <" + $Secret.Domain + ">")
#------------------------------------------------------------------------------
# Verify domain membership
#------------------------------------------------------------------------------
$compSys = Get-WmiObject -Class Win32_ComputerSystem
#------------------------------------------------------------------------------
if ( ($compSys.PartOfDomain) -and ($compSys.Domain -eq $Secret.Domain))
{
$log.WriteLine("Already member of: <" + $compSys.Domain + "> - Verifying RSAT Status")
#------------------------------------------------------------------------------
$RSAT = (Get-WindowsFeature RSAT-AD-PowerShell)
if ($RSAT -eq $null)
{
$log.WriteLine("<RSAT-AD-PowerShell> feature not found - terminating script")
return
}
#--------------------------------------------------------------------------
$log.WriteLine("Enable OnShutdown task to un-join Domain")
$sdm.EnableUnJoin()
#------------------------------------------------------------------------------
if ( (-Not $RSAT.Installed) -and ($RSAT.InstallState -eq "Available") )
{
$log.WriteLine("Installing <RSAT-AD-PowerShell> feature")
Install-WindowsFeature RSAT-AD-PowerShell
}
#--------------------------------------------------------------------------
$log.WriteLine("Terminating script - ")
return
}
#------------------------------------------------------------------------------
# Performing Domain Join
#------------------------------------------------------------------------------
$log.WriteLine("Domain Join required")
#------------------------------------------------------------------------------
$log.WriteLine("Disable OnShutdown task to avoid reboot loop")
$sdm.DisableUnJoin()
$password = $Secret.Password | ConvertTo-SecureString -asPlainText -Force
$username = $Secret.UserID + "@" + $Secret.Domain
$credential = New-Object System.Management.Automation.PSCredential($username,$password)
#------------------------------------------------------------------------------
$log.WriteLine("Attempting to join domain <" + $Secret.Domain + ">")
#------------------------------------------------------------------------------
Add-Computer -DomainName $Secret.Domain -Credential $credential -Restart -Force
#------------------------------------------------------------------------------
$log.WriteLine("Requesting restart...")
#------------------------------------------------------------------------------
</powershell>
<persist>true</persist>

 

Centralizing Windows Logs with Amazon Elasticsearch Services

Post Syndicated from whiteemm original https://aws.amazon.com/blogs/compute/centralizing-windows-logs-with-amazon-elasticsearch-services/

Cloud administrators often rely on centralized logging systems to better understand their environments, learn usage patterns, and identify future problems so that they can pre-emptively prevent them from occurring, or troubleshoot them more effectively. Of course, these are just some of the functions a centralized logging strategy provides, all of which help IT organizations become more productive. However, delivering these insightful services in-house requires significant effort, including daily management and updates, backups, high availability, hardening, encryption in transit and at rest, and storage management.

 

Cloud admins can reduce this management effort by creating a near real-time event management solution running on AWS, using Amazon Elasticsearch Service and Winlogbeat from Elastic as a Windows agent. This blog post provides step-by-step instructions on how to create such a solution to analyze Windows logs from your Amazon EC2 instances without the need for expensive third-party products.

Step by Step

Steps for this procedure include:

  1. Launch an Amazon Elasticsearch Service domain.
  2. Install and configure the Winlogbeat agent offered by Elastic.co in an Amazon EC2 running Windows instances.
  3. Customize Logs visualization on Kibana.

1. Launch an Amazon Elasticsearch Service domain.

  1. Open the AWS Console Elasticsearch Service Dashboard
  2. Click “Create a new domain”.
  3. Enter the Domain Name and select the version Elasticsearch 6.0 or earlier.
  4. Click Next.
  5. On Configure Cluster Page, choose the options that better fit your needs. In my case, it will be the following configuration:

Node Configuration

 – Instance Count: 1
– Instance Type: m4.large.elasticsearch

Storage

 – Storage Type: EBS
 – EBS Volume Type: General Purpose (SSD)
 – EBS Volume Size: 100 GB

 Snapshot Configuration

  – Automated snapshot start hour: 00:00 UTC (Default)

 

6. Click Next.

7. On Network Configuration page.

8. Select VPC Access (Recommended)

– Select the VPC, Subnet and Security Groups in which the Elasticsearch cluster will be launched. (The Amazon Elasticsearch Service uses TCP port 443, which must be allowed in the selected Security Group.)

– Kibana Authentication will not be used in this demo, leave it blank.

– On Access Policy: Do not require signing request with IAM credential (Allows associated security groups within your VPC full access to the domain).

9. Click Next. 

10. Click Confirm and the Amazon Elasticsearch domain will be launched.

     While the Amazon Elasticsearch Service domain is launching, configure the agent on the Amazon EC2 instance.
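
If you would rather script the domain creation than use the console wizard, a boto3 sketch along the following lines mirrors the settings chosen above. The domain name, subnet ID, security group ID, and account ID are placeholders, and the open access policy reflects the "do not require IAM signing" choice, so the domain is protected only by its VPC security group.

import json
import boto3

es = boto3.client("es", region_name="us-west-2")

ACCOUNT_ID = "111122223333"   # placeholder
DOMAIN_NAME = "windows-logs"  # placeholder

response = es.create_elasticsearch_domain(
    DomainName=DOMAIN_NAME,
    ElasticsearchVersion="6.0",
    ElasticsearchClusterConfig={
        "InstanceType": "m4.large.elasticsearch",
        "InstanceCount": 1,
    },
    EBSOptions={"EBSEnabled": True, "VolumeType": "gp2", "VolumeSize": 100},
    VPCOptions={
        "SubnetIds": ["subnet-0123456789abcdef0"],     # placeholder
        "SecurityGroupIds": ["sg-0123456789abcdef0"],  # placeholder - must allow TCP 443
    },
    # Open resource policy; access is still restricted by the VPC security group.
    AccessPolicies=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": "*"},
            "Action": "es:*",
            "Resource": f"arn:aws:es:us-west-2:{ACCOUNT_ID}:domain/{DOMAIN_NAME}/*",
        }],
    }),
)
print(response["DomainStatus"]["ARN"])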

 

2. Install and configure the Winlogbeat agent offered by Elastic.co on Amazon EC2 Instances.

1.     Access Elasticsearch Winlogbeat and download the x64 installer.

Download Winlogbeat web page

2.     Extract the contents in the “C:\Program Files” directory and rename the extracted directory to Winlogbeat.

PowerShell console in the Winlogbeat directory

3.     Within the Winlogbeat directory (renamed earlier), there is a file called winlogbeat.yml, open it for editing.

4.     The winlogbeat.event_logs section should contain the name of the logs that will be sent to the Amazon Elasticsearch service.

File winlogbeat.event_logs describing the log name

*Note: Use the following command in PowerShell to access the name of all event logs available in the operating system:

       Get-WinEvent -Listlog * | Format-List -Property LogName > C:\Logs.txt

The output.elasticsearch section must contain the URL of the Amazon Elasticsearch Service domain. Port 443 must be specified as in the image below, otherwise the agent will attempt to use the default port of Elasticsearch (9300/tcp).

File winlogbeat.event_logs describing the Amazon Elasticsearch Service configuration

In the setup.kibana section, you must add the URL of the domain. Port 443 must be specified (before /_plugin/kibana), and the URL must be entered exactly as in the image below (in quotation marks).

File winlogbeat.event_logs describing the Kibana configuration

* Note: The Amazon Elasticsearch Service and Kibana URLs are available in the administration console of the Elasticsearch domain just created in step 1.

Amazon Elasticsearch Service Console Overview tab

5.     After the winlogbeat.yml file is properly configured, install the agent.

6.     Open Powershell as administrator and navigate to the “C:\Program Files\ Winlogbeat” directory

Run the installation script: .\install-service-winlogbeat.ps1

Powershell Console running the command .\install-service-winlogbeat.ps1 inside the Winlogbeat directory

*Note: If script execution fails due to operating system policy restriction, run Setup with the following command:

PowerShell.exe -ExecutionPolicy UnRestricted -File .\install-service-winlogbeat.ps1

7.     If successfully installed, the Service Status will appear as Stopped. Verify that the configuration file has no syntax error with the following command:

.\winlogbeat.exe test config -c .\winlogbeat.yml

PowerShell console running the command .\winlogbeat.exe test config -c .\winlogbeat.yml inside the Winlogbeat directory

8.     Run the command to import the Kibana dashboard. The output should read ‘Loaded dashboards’.

.\winlogbeat.exe setup --dashboards

PowerShell console running the command .\winlogbeat.exe setup --dashboards inside the Winlogbeat directory

9.      With no errors in the configuration file and the dashboards imported, start the service with the following command:

Start-Service winlogbeat

 Powershell Console running the command Start-Service winlogbeat inside the Winlogbeat directory

10.  Verify that the agent has successfully connected to the Amazon Elasticsearch Service by examining the agent log in C:\ProgramData\winlogbeat\Logs and searching for a "Connection to backoff" entry with an established result.

Checking the log in C:\ProgramData\Winlogbeat\Logs to confirm the successful connection between the agent and the Amazon Elasticsearch Service

3. Customize Logs visualization on Kibana

Logs will now start being sent to the Amazon Elasticsearch Service and you can create insights about the environment.

1.     Access Kibana through a browser on an Amazon EC2 Windows instance, using the Kibana address shown in the administration console of the Amazon Elasticsearch Service domain.

Amazon Elasticsearch Service Console Overview tab

2.     Navigate to the Dashboard tab and you will see the dashboard named Winlogbeat Dashboard. Click it, and a screen similar to the one below is displayed.

On Kibana Dashboard accessing Winlogbeat Dashboard

You now have an AWS-managed log centralization system. Just set up your Amazon EC2 instances to send the logs to Amazon Elasticsearch Service and create the filters as needed.

Here is an example of how to create a filter for logon failures (Event ID 4625). With the Security log configured as a shipped log in winlogbeat.yml, any unsuccessful logon attempt generates Event ID 4625, and these events can be filtered.

Click Add a filter and create a filter as in the image below and click Save.

On Kibana Dashboard accessing Winlogbeat Dashboard, adding a filter for event ID 4625

After the filter is created, click PIN FILTER as shown in the image below and then click Discover.

On Kibana Dashboard accessing Winlogbeat Dashboard, pinning a filter

It will now be possible to better understand each log and take the appropriate actions.

On Kibana Dashboard accessing Winlogbeat Dashboard with the results
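
If you want to query the domain directly instead of going through Kibana, a rough sketch like the following searches for the same Event ID 4625 entries. It must run from inside the VPC; the endpoint is a placeholder, and the index pattern and field name are assumptions that depend on your Winlogbeat version.

import json
import urllib.request

# Placeholder - the VPC endpoint shown for your domain in the console.
ES_ENDPOINT = "https://vpc-windows-logs-abc123.us-west-2.es.amazonaws.com"

# Winlogbeat 6.x stores the Windows event ID in the "event_id" field;
# newer agent versions use "winlog.event_id" instead.
query = {"size": 5, "query": {"term": {"event_id": 4625}}}

request = urllib.request.Request(
    ES_ENDPOINT + "/winlogbeat-*/_search",
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    hits = json.load(response)["hits"]
    print("Matching logon failures:", hits["total"])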

 

To learn more about Microsoft Workloads on AWS, go to:

https://aws.amazon.com/pt/solutionspace/microsoft-workloads/

 

Orchestrate big data workflows with Apache Airflow, Genie, and Amazon EMR: Part 1

Post Syndicated from Francisco Oliveira original https://aws.amazon.com/blogs/big-data/orchestrate-big-data-workflows-with-apache-airflow-genie-and-amazon-emr-part-1/

Large enterprises running big data ETL workflows on AWS operate at a scale that services many internal end-users and runs thousands of concurrent pipelines. This, together with a continuous need to update and extend the big data platform to keep up with new frameworks and the latest releases of big data processing frameworks, requires an efficient architecture and organizational structure that both simplifies management of the big data platform and promotes easy access to big data applications.

This post introduces an architecture that helps centralized platform teams maintain a big data platform to service thousands of concurrent ETL workflows, and simplifies the operational tasks required to accomplish that.

Architecture components

At high level, the architecture uses two open source technologies with Amazon EMR to provide a big data platform for ETL workflow authoring, orchestration, and execution. Genie provides a centralized REST API for concurrent big data job submission, dynamic job routing, central configuration management, and abstraction of the Amazon EMR clusters. Apache Airflow provides a platform for job orchestration that allows you to programmatically author, schedule, and monitor complex data pipelines. Amazon EMR provides a managed cluster platform that can run and scale Apache Hadoop, Apache Spark, and other big data frameworks.

The following diagram illustrates the architecture.

Apache Airflow

Apache Airflow is an open source tool for authoring and orchestrating big data workflows.

With Apache Airflow, data engineers define directed acyclic graphs (DAGs). DAGs describe how to run a workflow and are written in Python. Workflows are designed as a DAG that groups tasks that are executed independently. The DAG keeps track of the relationships and dependencies between tasks.

Operators define a template to define a single task in the workflow. Airflow provides operators for common tasks, and you can also define custom operators. This post discusses the custom operator (GenieOperator) to submit tasks to Genie.

A task is a parameterized instance of an operator. After an operator is instantiated, it’s referred to as a task. A task instance represents a specific run of a task. A task instance has an associated DAG, task, and point in time.

You can run DAGs and tasks on demand or schedule them to run at a specific time defined as a cron expression in the DAG.

For additional details on Apache Airflow, see Concepts in the Apache Airflow documentation.

Genie

Genie is an open source tool by Netflix that provides configuration-management capabilities and dynamic routing of jobs by abstracting access to the underlying Amazon EMR clusters.

Genie provides a REST API to submit jobs from big data applications such as Apache Hadoop MapReduce or Apache Spark. Genie manages the metadata of the underlying clusters and the commands and applications that run in the clusters.

Genie abstracts access to the processing clusters by associating one or more tags with the clusters. You can also associate tags with the metadata details for the applications and commands that the big data platform supports. As Genie receives job submissions for specific tags, it uses a combination of the cluster/command tag to route each job to the correct EMR cluster dynamically.
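
To make the tag-based routing concrete, the following is a rough sketch of what a direct job submission to Genie's REST API might look like. The host name, command arguments, and tag values are placeholders, and the request fields follow the Genie 3 job-request schema; check the Genie REST API reference for the exact schema of your Genie version.

import json
import urllib.request

GENIE_URL = "http://genie.example.internal:8080/api/v3/jobs"  # placeholder

job_request = {
    "name": "sample-spark-job",
    "user": "data-engineer",
    "version": "1.0",
    "commandArgs": "--class org.example.App s3://my-bucket/app.jar",  # placeholder
    # Genie routes the job to a cluster whose tags match one of these criteria...
    "clusterCriterias": [{"tags": ["emr.cluster.environment:production"]}],
    # ...using a command registered with all of these tags.
    "commandCriteria": ["type:spark-submit", "ver:2.4.3"],
}

request = urllib.request.Request(
    GENIE_URL,
    data=json.dumps(job_request).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(request) as response:
    # Genie returns the new job's URL in the Location header.
    print(response.status, response.headers.get("Location"))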

Genie’s data model

Genie provides a data model to capture the metadata associated with resources in your big data environment.

An application resource is a reusable set of binaries, configuration files, and setup files to install and configure applications supported by the big data platform on the Genie node that submits the jobs to the clusters. When Genie receives a job, the Genie node downloads all dependencies, configuration files, and setup files associated with the applications and stores it in a job working directory. Applications are linked to commands because they represent the binaries and configurations needed before a command runs.

Command resources represent the parameters when using the command line to submit work to a cluster and which applications need to be available on the PATH to run the command. Command resources glue metadata components together. For example, a command resource representing a Hive command would include a hive-site.xml and be associated with a set of application resources that provide the Hive and Hadoop binaries needed to run the command. Moreover, a command resource is linked to the clusters it can run on.

A cluster resource identifies the details of an execution cluster, including connection details, cluster status, tags, and additional properties. A cluster resource can register with Genie during startup and deregister during termination automatically. Clusters are linked to one or more commands that can run in it. After a command is linked to a cluster, Genie can start submitting jobs to the cluster.

Lastly, there are three job resource types: job request, job, and job execution. A job request resource represents the request submission with details to run a job. Based on the parameters submitted in the request, a job resource is created. The job resource captures details such as the command, cluster, and applications associated with the job. Additionally, information on status, start time, and end time is also available on the job resource. A job execution resource provides administrative details so you can understand where the job ran.

For more information, see Data Model on the Genie Reference Guide.
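
To make the data model more concrete, the following is a minimal Python sketch of how a platform admin engineer might register a cluster and link an existing command to it through Genie's REST API. The endpoint paths follow Genie 3 conventions, but the Genie host name, the field values, and the assumption that the new resource ID is returned in the Location header are illustrative; check the Genie REST API reference before relying on them.

import requests

GENIE_URL = "http://genie.example.com:8080/api/v3"  # assumption: your Genie endpoint

# Register a cluster resource with a tag that jobs can later be routed against.
cluster = {
    "name": "emr-prod-cluster",
    "user": "platform-admin",
    "version": "1.0",
    "status": "UP",
    "tags": ["emr.cluster.environment:production"],
}
resp = requests.post(f"{GENIE_URL}/clusters", json=cluster)
cluster_id = resp.headers["Location"].rsplit("/", 1)[-1]  # assumption: new resource URI in Location header

# Link an existing command resource (for example, spark-submit) to the new cluster.
requests.post(f"{GENIE_URL}/clusters/{cluster_id}/commands", json=["spark-submit-command-id"])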

Amazon EMR and Amazon S3

Amazon EMR is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data. For more information, see Overview of Amazon EMR Architecture and Overview of Amazon EMR.

Data is stored in Amazon S3, an object storage service with scalable performance, ease-of-use features, and native encryption and access control capabilities. For more details on S3, see Amazon S3 as the Data Lake Storage Platform.

Architecture deep dive

Two main actors interact with this architecture: platform admin engineers and data engineers.

Platform admin engineers have administrator access to all components. They can add or remove clusters, and configure the applications and the commands that the platform supports.

Data engineers focus on writing big data applications with their preferred frameworks (Apache Spark, Apache Hadoop MR, Apache Sqoop, Apache Hive, Apache Pig, and Presto) and authoring Python scripts to represent DAGs.

At a high level, the team of platform admin engineers prepares the supported big data applications and their dependencies and registers them with Genie. The team of platform admin engineers launches Amazon EMR clusters that register with Genie during startup.

The team of platform admin engineers associates each Genie metadata resource (applications, commands, and clusters) with Genie tags. For example, you can associate a cluster resource with a tag named environment and the value can be “Production Environment”, “Test Environment”, or “Development Environment”.

Data engineers author workflows as Airflow DAGs and use a custom Airflow Operator—GenieOperator—to submit tasks to Genie. They can use a combination of tags to identify the type of tasks they are running plus where the tasks should run. For example, you might need to run Apache Spark 2.4.3 tasks in the environment identified by the “Production Environment” tag. To do this, set the cluster and command tags in the Airflow GenieOperator as the following code:

(cluster_tags=['emr.cluster.environment:production'],command_tags=['type:spark-submit','ver:2.4.3'])
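
For illustration, a fuller sketch of a DAG that uses this operator might look like the following. The GenieOperator is the custom operator discussed in this post; its import path and its command_arguments parameter are assumptions for the sake of the example, so adjust them to match your own implementation.

from datetime import datetime

from airflow import DAG
# Assumption: the custom GenieOperator lives in your own plugins package.
from genie_plugin.operators.genie_operator import GenieOperator

with DAG(
    dag_id="daily_spark_aggregation",
    start_date=datetime(2019, 11, 1),
    schedule_interval="0 2 * * *",  # cron expression: run every day at 02:00
) as dag:

    aggregate_events = GenieOperator(
        task_id="aggregate_events",
        # Assumption: the operator forwards these arguments to the spark-submit command.
        command_arguments="s3://my-bucket/jobs/aggregate_events.py",
        cluster_tags=["emr.cluster.environment:production"],
        command_tags=["type:spark-submit", "ver:2.4.3"],
    )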

The following diagram illustrates this architecture.

The workflow, as it corresponds to the numbers in this diagram, is as follows:

  1. A platform admin engineer prepares the binaries and dependencies of the supported applications (Spark-2.4.5, Spark-2.1.0, Hive-2.3.5, etc.). The platform admin engineer also prepares commands (spark-submit, hive). The platform admin engineer registers applications and commands with Genie. Moreover, the platform admin engineer associates commands with applications and links commands to a set of clusters after step 2 (below) is concluded.
  2. Amazon EMR cluster(s) register with Genie during startup.
  3. A data engineer authors Airflow DAGs and uses the Genie tag to reference the environment, application, command or any combination of the above. In the workflow code, the data engineer uses the GenieOperator. The GenieOperator submits jobs to Genie.
  4. A schedule triggers workflow execution or a data engineer manually triggers the workflow execution. The jobs that compose the workflow are submitted to Genie for execution with a set of Genie tags that specify where the job should be run.
  5. The Genie node, working as the client gateway, sets up a working directory with all binaries and dependencies. Genie dynamically routes the jobs to the cluster(s) associated with the provided Genie tags. The Amazon EMR clusters run the jobs.

For details on the authorization and authentication mechanisms supported by Apache Airflow and Genie see Security in the Apache Airflow documentation and Security in the Genie documentation.  This architecture pattern does not expose SSH access to the Amazon EMR clusters. For details on providing different levels of access to data in Amazon S3 through EMR File System (EMRFS), see Configure IAM Roles for EMRFS Requests to Amazon S3.

Use cases enabled by this architecture

The following use cases demonstrate the capabilities this architecture provides.

Managing upgrades and deployments with no downtime and adopting the latest open source release

In a large organization, teams that use the data platform use heterogeneous frameworks and different versions. You can use this architecture to support upgrades with no downtime and offer the latest version of open source frameworks in a short amount of time.

Genie and Amazon EMR are the key components to enable this use case. As the Amazon EMR service team strives to add the latest version of the open source frameworks running on Amazon EMR in a short release cycle, you can keep up with your internal teams’ needs for the latest features of their preferred open source framework.

When a new version of the open source framework is available, you need to test it, add the new supported version and its dependencies to Genie, and move the tags from the old cluster to the new one. The new cluster takes new job submissions, and the old cluster concludes the jobs it is still running.

Moreover, because Genie centralizes the location of application binaries and its dependencies, upgrading binaries and dependencies in Genie also upgrades any upstream client automatically. Using Genie removes the need for upgrading all upstream clients.

Managing a centralized configuration, job and cluster status, and logging

In a universe of thousands of jobs and multiple clusters, you need to identify where a specific job is running and access logging details quickly. This architecture gives you visibility into jobs running on the data platform, logging of jobs, clusters, and their configurations.

Having programmatic access to the big data platform

This architecture enables a single point of job submissions by using Genie’s REST API. Access to the underlying cluster is abstracted through a set of APIs that enable administration tasks plus submitting jobs to the clusters. A REST API call submits jobs into Genie asynchronously. If accepted, a job-id is returned that you can use to get job status and outputs programmatically via API or web UI. A Genie node sets up the working directory and runs the job on a separate process.
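
As a rough sketch of that interaction, the snippet below submits a job request and then polls its status. The payload follows Genie 3's job request model, where clusterCriterias and commandCriteria carry the routing tags, but the host name, file locations, and exact field details are assumptions; consult the Genie REST API documentation for the authoritative schema.

import time

import requests

GENIE_URL = "http://genie.example.com:8080/api/v3"  # assumption: your Genie endpoint

job_request = {
    "name": "aggregate_events",
    "user": "data-engineer",
    "version": "1.0",
    "commandArgs": "s3://my-bucket/jobs/aggregate_events.py",
    # Tags that Genie uses to route the job to a cluster and a command.
    "clusterCriterias": [{"tags": ["emr.cluster.environment:production"]}],
    "commandCriteria": ["type:spark-submit", "ver:2.4.3"],
}

resp = requests.post(f"{GENIE_URL}/jobs", json=job_request)
job_id = resp.headers["Location"].rsplit("/", 1)[-1]  # assumption: job-id returned in Location header

# Poll the job status until it reaches a terminal state.
while True:
    status = requests.get(f"{GENIE_URL}/jobs/{job_id}/status").json()["status"]
    if status in ("SUCCEEDED", "FAILED", "KILLED"):
        break
    time.sleep(30)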

You can also integrate this architecture with continuous integration and continuous delivery (CI/CD) pipelines for big data application and Apache Airflow DAGs.

Enabling scalable client gateways and concurrent job submissions

The Genie node acts as a client gateway (edge node) and can scale horizontally to make sure the client gateway resources used to submit jobs to the data platform meet demand. Moreover, Genie allows the submission of concurrent jobs.

When to use this architecture

This architecture is recommended for organizations that use multiple large, multi-tenant processing clusters instead of transient clusters. It is out of the scope of this post to address when organizations should consider always-on clusters versus transient clusters (you can use an EMR Airflow Operator to spin up Amazon EMR clusters that register with Genie, run a job, and tear them down). You should use Reserved Instances with this architecture. For more information, see Using Reserved Instances.

This architecture is especially recommended for organizations that choose to have a central platform team to administer and maintain a big data platform that supports many internal teams that require thousands of jobs to run concurrently.

This architecture might not make sense for organizations that are not as large or don’t expect to grow to that scale. The benefits of cluster abstraction and centralized configuration management are ideal in bringing structured access to a potentially chaotic environment of thousands of concurrent workflows and hundreds of teams.

This architecture is also recommended for organizations that support a high percentage of multi-hour or overlapping workflows and heterogeneous frameworks (Apache Spark, Apache Hive, Apache Pig, Apache Hadoop MapReduce, Apache Sqoop, or Presto).

If your organization relies solely on Apache Spark and is aligned with the recommendations discussed previously, this architecture might still apply. For organizations that don’t have the scale to justify the need for centralized REST API for job submission, cluster abstraction, dynamic job routing, or centralized configuration management, Apache Livy plus Amazon EMR might be the appropriate option. Genie has its own scalable infrastructure that acts as the edge client. This means that Genie does not compete with Amazon EMR master instance resources, whereas Apache Livy does.

If the majority of your organization’s workflows are a few short-lived jobs, opting for a serverless processing layer, serverless ad hoc querying layer, or using dedicated transient Amazon EMR clusters per workflow might be more appropriate. If the majority of your organization’s workflows are composed of thousands of short-lived jobs, the architecture still applies because it removes the need to spin up and down clusters.

This architecture is recommended for organizations that require full control of the processing platform to optimize component performance. Moreover, this architecture is recommended for organizations that need to enforce centralized governance on their workflows via CI/CD pipelines.

It is out of the scope of this post to evaluate different orchestration options or the benefits of adopting Airflow as the orchestration layer. When considering adopting an architecture, also consider the existing skillset and time to adopt tooling. The open source nature of Genie may allow you to integrate other orchestration tools. Evaluating that route might be an option if you wish to adopt this architecture with another orchestration tool.

Conclusion

This post presented how to use Apache Airflow, Genie, and Amazon EMR to manage big data workflows. The post described the architecture components, the use cases the architecture supports, and when to use it. The second part of this post deploys a demo environment and walks you through the steps to configure Genie and use the GenieOperator for Apache Airflow.

 


About the Author

Francisco Oliveira is a senior big data solutions architect with AWS. He focuses on building big data solutions with open source technology and AWS. In his free time, he likes to try new sports, travel and explore national parks.

 

 

 

Jelez Raditchkov is a practice manager with AWS.

Welcome Pavritha — Sales Engineer

Post Syndicated from Nicole Perry original https://www.backblaze.com/blog/welcome-pavritha-sales-engineer/

In the last couple of years, we’ve released products that businesses of all sizes love: Backblaze B2 Cloud Storage and Backblaze for Business Computer Backup. Those businesses want to integrate Backblaze deeply into their infrastructure, so in comes Pavritha, Sales Engineer! Let’s learn a little more about her, shall we?

What is your Backblaze Title?
Sales Engineer.

Where are you originally from?
India.

What attracted you to Backblaze?
The exciting opportunity and the work culture.

What do you expect to learn while being at Backblaze?
I will try to learn everything possible and focus a lot on my role.

Where else have you worked?
Yotascale Inc., where I worked for a year.

Where did you go to school?
California State University, East Bay.

What’s your dream job?
I would love to do travel journalism at some point in time.

What achievements are you most proud of?
Flying all the way from India to the US, achieving my master’s degree, and now landing a job as a Solutions Engineer.

Favorite place you’ve traveled?
Florida.

Favorite hobby?
Traveling and trying out different food.

Favorite food?
I am basically a foodie, so I enjoy all kinds of food. If I have to be specific, it would be Asian and Italian cuisine.

Anything else you’d like to tell us?
I am a person who likes to experiment a lot with food. Whenever I get a chance, I will try out a new place or a new cuisine, and I also love to recreate those dishes at home. One more thing I am sort of good at is recommending good food choices based on the occasion.

You’re in luck, Pavritha! We just started a blog post series about how to become a digital nomad. Welcome to the team!

 

The post Welcome Pavritha — Sales Engineer appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

30 Years Later: PEW on Attitudes Toward the Changes of 1989

Post Syndicated from nellyo original https://nellyo.wordpress.com/2019/10/22/30-2/

How do the countries of the former Eastern Bloc assess the political and economic changes after 1989? The study is by PEW and covers Russia, Ukraine, and the countries of Central and Eastern Europe, including Bulgaria.

 


Sending Push Notifications to iOS 13 Devices with Amazon SNS

Post Syndicated from Brent Meyer original https://aws.amazon.com/blogs/messaging-and-targeting/sending-push-notifications-to-ios-13-devices-with-amazon-sns/

Note: This post was written by Alan Chuang, a Senior Product Manager on the AWS Messaging team.


On September 19, 2019, Apple released iOS 13. This update introduced changes to the Apple Push Notification service (APNs) that can impact your existing workloads. Amazon SNS has made some changes that ensure uninterrupted delivery of push notifications to iOS 13 devices.

iOS 13 introduced a new and required header called apns-push-type. The value of this header informs APNs of the contents of your notification’s payload so that APNs can respond to the message appropriately. The possible values for this header are:

  • alert
  • background
  • voip
  • complication
  • fileprovider
  • mdm

Apple’s documentation indicates that the value of this header “must accurately reflect the contents of your notification’s payload. If there is a mismatch, or if the header is missing on required systems, APNs may return an error, delay the delivery of the notification, or drop it altogether.”

We’ve made some changes to the Amazon SNS API that make it easier for developers to handle this breaking change. When you send push notifications of the alert or background type, Amazon SNS automatically sets the apns-push-type header to the appropriate value for your message. For more information about creating an alert type and a background type notification, see Generating a Remote Notification and Pushing Background Updates to Your App on the Apple Developer website.

Because you might not have enough time to react to this breaking change, Amazon SNS provides two options:

  • If you want to set the apns-push-type header value yourself, or the contents of your notification’s payload require the header value to be set to voip, complication, fileprovider, or mdm, Amazon SNS lets you set the header value as a message attribute using the Amazon SNS Publish API action (see the sketch after this list). For more information, see Specifying Custom APNs Header Values and Reserved Message Attributes for Mobile Push Notifications in the Amazon SNS Developer Guide.
  • If you send push notifications as alert or background type, and if the contents of your notification’s payload follow the format described in the Apple Developer documentation, then Amazon SNS automatically sets the correct header value. To send a background notification, the format requires the aps dictionary to have only the content-available field set to 1. For more information about creating an alert type and a background type notification, see Generating a Remote Notification and Pushing Background Updates to Your App on the Apple Developer website.
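
As a minimal boto3 sketch of the first option, the snippet below publishes a background notification to an APNs platform endpoint and sets the header explicitly through a reserved message attribute. The attribute name is the one described in the reserved message attributes documentation linked above, and the endpoint ARN and Region are placeholders; verify both against your own setup before using this.

import json

import boto3

sns = boto3.client("sns", region_name="us-east-1")  # assumption: your Region

# APNs payload for a background notification: only content-available is set.
apns_payload = json.dumps({"aps": {"content-available": 1}})

sns.publish(
    # Assumption: the ARN of the platform endpoint for the target device.
    TargetArn="arn:aws:sns:us-east-1:123456789012:endpoint/APNS/my-app/example-endpoint-id",
    MessageStructure="json",
    Message=json.dumps({"default": "fallback message", "APNS": apns_payload}),
    MessageAttributes={
        # Reserved attribute that maps to the apns-push-type header; other valid
        # values include alert, voip, complication, fileprovider, and mdm.
        "AWS.SNS.MOBILE.APNS.PUSH_TYPE": {
            "DataType": "String",
            "StringValue": "background",
        }
    },
)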

We hope these changes make it easier for you to send messages to your customers who use iOS 13. If you have questions about these changes, please open a ticket with our Customer Support team through the AWS Management Console.

Plague at Pi Towers

Post Syndicated from Liz Upton original https://www.raspberrypi.org/blog/plague-at-pi-towers/

Alex, Helen and I are all in our respective beds today with the plague. So your usual blog fodder won’t get served up today because none of us can look at a monitor for more than thirty seconds at a trot: instead I’m going to ask you to come up with some content for us. Let us know in the comments what you think we should be blogging about next, and also if you have any top sinus remedies.

The post Plague at Pi Towers appeared first on Raspberry Pi.

Predictive User Engagement using Amazon Pinpoint and Amazon Personalize

Post Syndicated from Brent Meyer original https://aws.amazon.com/blogs/messaging-and-targeting/predictive-user-engagement-using-amazon-pinpoint-and-amazon-personalize/

Note: This post was written by John Burry, a Solution Architect on the AWS Customer Engagement team.


Predictive User Engagement (PUE) refers to the integration of machine learning (ML) and customer engagement services. By implementing a PUE solution, you can combine ML-based predictions and recommendations with real-time notifications and analytics, all based on your customers’ behaviors.

This blog post shows you how to set up a PUE solution by using Amazon Pinpoint and Amazon Personalize. Best of all, you can implement this solution even if you don’t have any prior machine learning experience. By completing the steps in this post, you’ll be able to build your own model in Personalize, integrate it with Pinpoint, and start sending personalized campaigns.

Prerequisites

Before you complete the steps in this post, you need to set up the following:

  • Create an admin user in AWS Identity and Access Management (IAM). For more information, see Creating Your First IAM Admin User and Group in the IAM User Guide. You need to specify the credentials of this user when you set up the AWS Command Line Interface.
  • Install Python 3 and the pip package manager. Python 3 is installed by default on recent versions of Linux and macOS. If it isn’t already installed on your computer, you can download an installer from the Python website.
  • Use pip to install the following modules:
    • awscli
    • boto3
    • jupyter
    • matplotlib
    • sklearn
    • sagemaker

    For more information about installing modules, see Installing Python Modules in the Python 3.X Documentation.

  • Configure the AWS Command Line Interface (AWS CLI). During the configuration process, you have to specify a default AWS Region. This solution uses Amazon SageMaker to build a model, so the Region that you specify has to be one that supports Amazon SageMaker. For a complete list of Regions where SageMaker is supported, see AWS Service Endpoints in the AWS General Reference. For more information about setting up the AWS CLI, see Configuring the AWS CLI in the AWS Command Line Interface User Guide.
  • Install Git. Git is installed by default on most versions of Linux and macOS. If Git isn’t already installed on your computer, you can download an installer from the Git website.

Step 1: Create an Amazon Pinpoint Project

In this section, you create and configure a project in Amazon Pinpoint. This project contains all of the customers that we will target, as well as the recommendation data that’s associated with each one. Later, we’ll use this data to create segments and campaigns.

To set up the Amazon Pinpoint project

  1. Sign in to the Amazon Pinpoint console at http://console.aws.amazon.com/pinpoint/.
  2. On the All projects page, choose Create a project. Enter a name for the project, and then choose Create.
  3. On the Configure features page, under SMS and voice, choose Configure.
  4. Under General settings, select Enable the SMS channel for this project, and then choose Save changes.
  5. In the navigation pane, under Settings, choose General settings. In the Project details section, copy the value under Project ID. You’ll need this value later.

Step 2: Create an Endpoint

In Amazon Pinpoint, an endpoint represents a specific method of contacting a customer, such as their email address (for email messages) or their phone number (for SMS messages). Endpoints can also contain custom attributes, and you can associate multiple endpoints with a single user. In this example, we use these attributes to store the recommendation data that we receive from Amazon Personalize.

In this section, we create a new endpoint and user by using the AWS CLI. We’ll use this endpoint to test the SMS channel, and to test the recommendations that we receive from Personalize.

To create an endpoint by using the AWS CLI

  1. At the command line, enter the following command:
    aws pinpoint update-endpoint --application-id <project-id> \
    --endpoint-id 12456 --endpoint-request "Address='<mobile-number>', \
    ChannelType='SMS',User={UserAttributes={recommended_items=['none']},UserId='12456'}"

    In the preceding example, replace <project-id> with the Amazon Pinpoint project ID that you copied in Step 1. Replace <mobile-number> with your phone number, formatted in E.164 format (for example, +12065550142).

Note that this endpoint contains hard-coded UserId and EndpointId values of 12456. These IDs match an ID that we’ll create later when we generate the Personalize data set.
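
If you want to confirm that the endpoint was created as expected, a quick boto3 sketch like the following reads it back. The Region is a placeholder, and the response structure shown in the final line is an assumption based on the Pinpoint GetEndpoint response; adjust it if your output differs.

import boto3

pinpoint = boto3.client("pinpoint", region_name="us-east-1")  # assumption: your Region

response = pinpoint.get_endpoint(
    ApplicationId="<project-id>",  # the Amazon Pinpoint project ID from Step 1
    EndpointId="12456",
)

endpoint = response["EndpointResponse"]
# Assumption: the user attributes are nested under User.UserAttributes.
print(endpoint["ChannelType"], endpoint["User"]["UserAttributes"]["recommended_items"])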

Step 3: Create a Segment and Campaign in Amazon Pinpoint

Now that we have an endpoint, we need to add it to a segment so that we can use it within a campaign. By sending a campaign, we can verify that our Pinpoint project is configured correctly, and that we created the endpoint correctly.

To create the segment and campaign

  1. Open the Pinpoint console at http://console.aws.amazon.com/pinpoint, and then choose the project that you created in Step 1.
  2. In the navigation pane, choose Segments, and then choose Create a segment.
  3. Name the segment “No recommendations”. Under Segment group 1, on the Add a filter menu, choose Filter by user.
  4. On the Choose a user attribute menu, choose recommended_items. Set the value of the filter to “none”.
  5. Confirm that the Segment estimate section shows that there is one eligible endpoint, and then choose Create segment.
  6. In the navigation pane, choose Campaigns, and then choose Create a campaign.
  7. Name the campaign “SMS to users with no recommendations”. Under Choose a channel for this campaign, choose SMS, and then choose Next.
  8. On the Choose a segment page, choose the “No recommendations” segment that you just created, and then choose Next.
  9. In the message editor, type a test message, and then choose Next.
  10. On the Choose when to send the campaign page, keep all of the default values, and then choose Next.
  11. On the Review and launch page, choose Launch campaign. Within a few seconds, you receive a text message at the phone number that you specified when you created the endpoint.

Step 4: Load sample data into Amazon Personalize

At this point, we’ve finished setting up Amazon Pinpoint. Now we can start loading data into Amazon Personalize.

To load the data into Amazon Personalize

  1. At the command line, enter the following command to clone the sample data and Jupyter Notebooks to your computer:
    git clone https://github.com/markproy/personalize-car-search.git

  2. At the command line, change into the directory that contains the data that you just cloned. Enter the following command:
    jupyter notebook

    A new window opens in your web browser.

  3. In your web browser, open the first notebook (01_generate_data.ipynb). On the Cell menu, choose Run all. Wait for the commands to finish running.
  4. Open the second notebook (02_make_dataset_group.ipynb). In the first step, replace the value of the account_id variable with the ID of your AWS account. Then, on the Cell menu, choose Run all. This step takes several minutes to complete. Make sure that all of the commands have run successfully before you proceed to the next step.
  5. Open the third notebook (03_make_campaigns.ipynb). In the first step, replace the value of the account_id variable with the ID of your AWS account. Then, on the Cell menu, choose Run all. This step takes several minutes to complete. Make sure that all of the commands have run successfully before you proceed to the next step.
  6. Open the fourth notebook (04_use_the_campaign.ipynb). In the first step, replace the value of the account_id variable with the ID of your AWS account. Then, on the Cell menu, choose Run all. This step takes several minutes to complete.
  7. After the fourth notebook is finished running, choose Quit to terminate the Jupyter Notebook. You don’t need to run the fifth notebook for this example.
  8. Open the Amazon Personalize console at http://console.aws.amazon.com/personalize. Verify that Amazon Personalize contains one dataset group named car-dg.
  9. In the navigation pane, choose Campaigns. Verify that it contains all of the following campaigns, and that the status for each campaign is Active:
    • car-popularity-count
    • car-personalized-ranking
    • car-hrnn-metadata
    • car-sims
    • car-hrnn

Step 5: Create the Lambda function

We’ve loaded the data into Amazon Personalize, and now we need to create a Lambda function to update the endpoint attributes in Pinpoint with the recommendations provided by Personalize.

The version of the AWS SDK for Python that’s included with Lambda doesn’t include the libraries for Amazon Personalize. For this reason, you need to download these libraries to your computer, put them in a .zip file, and upload the entire package to Lambda.

To create the Lambda function

  1. In a text editor, create a new file. Paste the following code.
    # Copyright 2010-2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
    #
    # This file is licensed under the Apache License, Version 2.0 (the "License").
    # You may not use this file except in compliance with the License. A copy of the
    # License is located at
    #
    # http://aws.amazon.com/apache2.0/
    #
    # This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS
    # OF ANY KIND, either express or implied. See the License for the specific
    # language governing permissions and limitations under the License.
    
    AWS_REGION = "<region>"
    PROJECT_ID = "<project-id>"
    CAMPAIGN_ARN = "<car-hrnn-campaign-arn>"
    USER_ID = "12456"
    endpoint_id = USER_ID
    
    from datetime import datetime
    import json
    import boto3
    import logging
    from botocore.exceptions import ClientError
    
    DATE = datetime.now()
    
    personalize           = boto3.client('personalize')
    personalize_runtime   = boto3.client('personalize-runtime')
    personalize_events    = boto3.client('personalize-events')
    pinpoint              = boto3.client('pinpoint')
    
    def lambda_handler(event, context):
        itemList = get_recommended_items(USER_ID,CAMPAIGN_ARN)
        response = update_pinpoint_endpoint(PROJECT_ID,endpoint_id,itemList)
    
        return {
            'statusCode': 200,
            'body': json.dumps('Lambda execution completed.')
        }
    
    def get_recommended_items(user_id, campaign_arn):
        response = personalize_runtime.get_recommendations(campaignArn=campaign_arn, 
                                                           userId=str(user_id), 
                                                           numResults=10)
        itemList = response['itemList']
        return itemList
    
    def update_pinpoint_endpoint(project_id,endpoint_id,itemList):
        itemlistStr = []
        
        for item in itemList:
            itemlistStr.append(item['itemId'])
    
        pinpoint.update_endpoint(
        ApplicationId=project_id,
        EndpointId=endpoint_id,
        EndpointRequest={
                            'User': {
                                'UserAttributes': {
                                    'recommended_items': 
                                        itemlistStr
                                }
                            }
                        }
        )    
    
        return
    

    In the preceding code, make the following changes:

    • Replace <region> with the name of the AWS Region that you want to use, such as us-east-1.
    • Replace <project-id> with the ID of the Amazon Pinpoint project that you created earlier.
    • Replace <car-hrnn-campaign-arn> with the Amazon Resource Name (ARN) of the car-hrnn campaign in Amazon Personalize. You can find this value in the Amazon Personalize console.
  2. Save the file as pue-get-recs.py.
  3. Create and activate a virtual environment. In the virtual environment, use pip to download the latest versions of the boto3 and botocore libraries. For complete procedures, see Updating a Function with Additional Dependencies With a Virtual Environment in the AWS Lambda Developer Guide. Also, add the pue-get-recs.py file to the .zip file that contains the libraries.
  4. Open the IAM console at http://console.aws.amazon.com/iam. Create a new role. Attach the following policy to the role:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogStream",
                    "logs:DescribeLogGroups",
                    "logs:CreateLogGroup",
                    "logs:PutLogEvents",
                    "personalize:GetRecommendations",
                    "mobiletargeting:GetUserEndpoints",
                    "mobiletargeting:GetApp",
                    "mobiletargeting:UpdateEndpointsBatch",
                    "mobiletargeting:GetApps",
                    "mobiletargeting:GetEndpoint",
                    "mobiletargeting:GetApplicationSettings",
                    "mobiletargeting:UpdateEndpoint"
                ],
                "Resource": "*"
            }
        ]
    }
    
  5. Open the Lambda console at http://console.aws.amazon.com/lambda, and then choose Create function.
  6. Create a new Lambda function from scratch. Choose the Python 3.7 runtime. Under Permissions, choose Use an existing role, and then choose the IAM role that you just created. When you finish, choose Create function.
  7. Upload the .zip file that contains the Lambda function and the boto3 and botocore libraries.
  8. Under Function code, change the Handler value to pue-get-recs.lambda_handler. Save your changes.

When you finish creating the function, you can test it to make sure it was set up correctly.

To test the Lambda function

  1. On the Select a test event menu, choose Configure test events. On the Configure test events window, specify an Event name, and then choose Create.
  2. Choose the Test button to execute the function.
  3. If the function executes successfully, open the Amazon Pinpoint console at http://console.aws.amazon.com/pinpoint.
  4. In the navigation pane, choose Segments, and then choose the “No recommendations” segment that you created earlier. Verify that the number under total endpoints is 0. This is the expected value; the segment is filtered to only include endpoints with no recommendation attributes, but when you ran the Lambda function, it added recommendations to the test endpoint.

Step 6: Create segments and campaigns based on recommended items

In this section, we’ll create a targeted segment based on the recommendation data provided by our Personalize dataset. We’ll then use that segment to create a campaign.

To create a segment and campaign based on personalized recommendations

  1. Open the Amazon Pinpoint console at http://console.aws.amazon.com/pinpoint. On the All projects page, choose the project that you created earlier.
  2. In the navigation pane, choose Segments, and then choose Create a segment. Name the new segment “Recommendations for product 26304”.
  3. Under Segment group 1, on the Add a filter menu, choose Filter by user. On the Choose a user attribute menu, choose recommended_items. Set the value of the filter to “26304”. Confirm that the Segment estimate section shows that there is one eligible endpoint, and then choose Create segment.
  4. In the navigation pane, choose Campaigns, and then choose Create a campaign.
  5. Name the campaign “SMS to users with recommendations for product 26304”. Under Choose a channel for this campaign, choose SMS, and then choose Next.
  6. On the Choose a segment page, choose the “Recommendations for product 26304” segment that you just created, and then choose Next.
  7. In the message editor, type a test message, and then choose Next.
  8. On the Choose when to send the campaign page, keep all of the default values, and then choose Next.
  9. On the Review and launch page, choose Launch campaign. Within a few seconds, you receive a text message at the phone number that you specified when you created the endpoint.

Next steps

Your PUE solution is now ready to use. From here, there are several ways that you can make the solution your own:

  • Expand your usage: If you plan to continue sending SMS messages, you should request a spending limit increase.
  • Extend to additional channels: This post showed the process of setting up an SMS campaign. You can add more endpoints—for the email or push notification channels, for example—and associate them with your users. You can then create new segments and new campaigns in those channels.
  • Build your own model: This post used a sample data set, but Amazon Personalize makes it easy to provide your own data. To start building a model with Personalize, you have to provide a data set that contains information about your users, items, and interactions. To learn more, see Getting Started in the Amazon Personalize Developer Guide.
  • Optimize your model: You can enrich your model by sending your mobile, web, and campaign engagement data to Amazon Personalize. In Pinpoint, you can use event streaming to move data directly to S3, and then use that data to retrain your Personalize model. To learn more about streaming events, see Streaming App and Campaign Events in the Amazon Pinpoint User Guide.
  • Update your recommendations on a regular basis: Use the create-campaign API to create a new recurring campaign. Rather than sending messages, include the hook property with a reference to the ARN of the pue-get-recs function. By completing this step, you can configure Pinpoint to retrieve the most up-to-date recommendation data each time the campaign recurs (a minimal sketch follows this list). For more information about using Lambda to modify segments, see Customizing Segments with AWS Lambda in the Amazon Pinpoint Developer Guide.
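
As a rough illustration of that last step, the following boto3 sketch creates a weekly recurring campaign whose hook points at the pue-get-recs function. The segment ID, start time, message body, and account details are placeholders, and the exact shape of the Hook, Schedule, and MessageConfiguration fields should be verified against the current Amazon Pinpoint API reference; treat this as a starting point rather than a finished implementation.

import boto3

pinpoint = boto3.client("pinpoint", region_name="us-east-1")  # assumption: your Region

pinpoint.create_campaign(
    ApplicationId="<project-id>",  # the Amazon Pinpoint project ID from Step 1
    WriteCampaignRequest={
        "Name": "Weekly recommendations refresh",
        "SegmentId": "<segment-id>",  # the segment whose endpoints the hook should refresh
        "Schedule": {
            "StartTime": "2019-11-11T00:00:00Z",
            "Frequency": "WEEKLY",
            "Timezone": "UTC",
        },
        "Hook": {
            # ARN of the pue-get-recs Lambda function created earlier.
            "LambdaFunctionName": "arn:aws:lambda:us-east-1:<account-id>:function:pue-get-recs",
            "Mode": "FILTER",
        },
        # Depending on your use case, the campaign may also need a message or template.
        "MessageConfiguration": {
            "SMSMessage": {"Body": "We have new picks for you!", "MessageType": "PROMOTIONAL"},
        },
    },
)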

VC4 and V3D OpenGL drivers for Raspberry Pi: an update

Post Syndicated from Liz Upton original https://www.raspberrypi.org/blog/vc4-and-v3d-opengl-drivers-for-raspberry-pi-an-update/

Here’s an update from Iago Toral of Igalia on development of the open source VC4 and V3D OpenGL drivers used by Raspberry Pi.

Some of you may already know that Eric Anholt, the original developer of the open source VC4 and V3D OpenGL drivers used by Raspberry Pi, is no longer actively developing these drivers and a team from Igalia has stepped in to continue his work. My name is Iago Toral (itoral), and together with my colleagues Alejandro Piñeiro (apinheiro) and José Casanova (chema), we have been hard at work learning about the V3D GPU hardware and Eric’s driver design over the past few months.

Learning a new GPU is a lot of work, but I think we have been making good progress and in this post we would like to share with the community some of our recent contributions to the driver and some of the plans we have for the future.

But before we go into the technical details of what we have been up to, I would like to give some context about the GPU hardware and current driver status for Raspberry Pi 4, which is where we have been focusing our efforts.

The GPU bundled with Raspberry Pi 4 is a VideoCore VI capable of OpenGL ES 3.2, a significant step above the VideoCore IV present in Raspberry Pi 3 which could only do OpenGL ES 2.0. Despite the fact that both GPU models belong in Broadcom’s VideoCore family, they have quite significant architectural differences, so we also have two separate OpenGL driver implementations. Unfortunately, as you may have guessed, this also means that driver work on one GPU won’t be directly useful for the other, and that any new feature development that we do for the Raspberry Pi 4 driver stack won’t naturally transport to Raspberry Pi 3.

The driver code for both GPU models is available in the Mesa upstream repository. The codename for the VideoCore IV driver is VC4, and the codename for the VideoCore VI driver is V3D. There are no downstream repositories – all development happens directly upstream, which has a number of benefits for end users:

  1. It is relatively easy for the more adventurous users to experiment with development builds of the driver.
  2. It is fairly simple to follow development activities by tracking merge requests with the V3D and VC4 labels.

At present, the V3D driver exposes OpenGL ES 3.0 and OpenGL 2.1. As I mentioned above, the VideoCore VI GPU can do OpenGL ES 3.2, but it can’t do OpenGL 3.0, so future feature work will focus on OpenGL ES.

Okay, so with that introduction out of the way, let’s now go into the nitty-gritty of what we have been working on as we ramped up over the last few months:

Disclaimer: I won’t detail here everything we have been doing because then this would become a long and boring changelog list; instead I will try to summarize the areas where we put more effort and the benefits that the work should bring. For those interested in the full list of changes, you can always go to the upstream Mesa repository and scan it for commits with Igalia authorship and the v3d tag.

First we have the shader compiler, where we implemented a bunch of optimizations that should be producing better (faster) code for many shader workloads. This involved work at the NIR level, the lower-level IR specific to V3D, and the assembly instruction scheduler. The shader-db graph below shows how the shader compiler has evolved over the last few months. It should be noted here that one of the benefits of working within the Mesa ecosystem is that we get a lot of shader optimization work done by other Mesa contributors, since some parts of the compiler stack are shared across multiple drivers.

Bar chart with y-axis range from -12.00% to +2.00%. It is annotated, "Lower is better except for Threads". There are four bars: Instructions (about -4.75%); Threads (about 0.25%); Uniforms (about -11.00%); and Splits (about 0.50%).

Evolution of the shader compiler (June vs present)

Another area where we have done significant work is transform feedback. Here, we fixed some relevant flushing bugs that could cause transform feedback results to not be visible to applications after rendering. We also optimized the transform feedback process to better use the hardware for in-pipeline synchronization of transform feedback workloads without having to always resort to external job flushing, which should be better for performance. Finally, we also provided a better implementation for transform feedback primitive count queries that makes better use of the GPU (the previous implementation handled all this on the CPU side), which is also correct at handling overflow of the transform feedback buffers (there was no overflow handling previously).

We also implemented support for OpenGL Logic Operations, an OpenGL 2.0 feature that was somehow missing in the V3D driver. This was responsible for this bug, since, as it turns out, the default LibreOffice theme in Raspbian was triggering a path in Glamor that relied on this feature to render the cursor. Although Raspbian has since been updated to use a different theme, we made sure to implement this feature and verify that the bug is now fixed for the original theme as well.

Fixing Piglit and CTS test failures has been another focus of our work in these initial months, trying to get us closer to driver conformance. You can check the graph below showcasing Piglit test results to get a quick view of how things have evolved over the last few months. This work includes a relevant bug fix for a rather annoying bug in the way the kernel driver was handling L2 cache invalidation that could lead to GPU hangs. If you have observed any messages from the kernel warning about write violations (maybe accompanied by GPU hangs), those should now be fixed in the kernel driver. This fix goes along with a user-space fix that should be merged soon in the upstream V3D driver.

A bar chart with y-axis ranging from 0 to 16000. There are three groups of bars: "June (master)"; "Present (master)"; Present (GLES 3.1)". Each group has three bars: Pass; Fail; Skip. Passes are higher in the "Present (master)" and "Present (GLES 3.1)" groups of bars than in the "June (master)" group, and skips and fails are lower.

Evolution of Piglit test results (June vs present)

As a curiosity, here is a picture of our own little continuous integration system that we use to run regression tests both regularly and before submitting code for review.

Ten Raspberry Pis with small black fans, most of them in colourful Pimoroni Pibow open cases, in a nest of cables and labels

Our continuous integration system

The other big piece of work we have been tackling, and that we are very excited about, is OpenGL ES 3.1, which will bring Compute Shaders to Raspberry Pi 4! Credit for this goes to Eric Anholt, who did all the implementation work before leaving – he just never got to the point where it was ready to be merged, so we have picked up Eric’s original work, rebased it, and worked on bug fixes to have a fully conformant implementation. We are currently hard at work squashing the last few bugs exposed by the Khronos Conformance Test Suite and we hope to be ready to merge this functionality in the next major Mesa release, so look forward to it!

Compute Shaders is a really cool feature but it won’t be the last. I’d like to end this post with a small note on another important large feature that is currently in early stages of development: Geometry Shaders, which will bring the V3D driver one step closer to exposing a full programmable 3D pipeline – so look forward to that as well!

The post VC4 and V3D OpenGL drivers for Raspberry Pi: an update appeared first on Raspberry Pi.

Running Java applications on Amazon EC2 A1 instances with Amazon Corretto

Post Syndicated from Neelay Thaker original https://aws.amazon.com/blogs/compute/running-java-applications-on-amazon-ec2-a1-instances-with-amazon-corretto/

This post is contributed by Jeff Underhill | EC2 Principal Business Development Manager and Arthur Petitpierre | Arm Specialist Solutions Architect

 

Amazon EC2 A1 instances deliver up to 45% cost savings for scale-out applications and are powered by AWS Graviton Processors that feature 64-bit Arm Neoverse cores and custom silicon designed by AWS. Amazon Corretto is a no-cost, multiplatform, production-ready distribution of the Open Java Development Kit (OpenJDK).

Production-ready Arm 64-bit Linux builds of Amazon Corretto for JDK 8 and JDK 11 were released on Sep 17, 2019. This provides an additional Java runtime option when deploying your scale-out Java applications on Amazon EC2 A1 instances. We’re fortunate to have James Gosling, the designer of Java, as a member of the Amazon team, and he recently took to Twitter to announce the General Availability (GA) of Amazon Corretto for the Arm architecture:

For those of you that like playing with Linux on ARM, the Corretto build for ARM64 is now GA.  Fully production ready. Both JDK8 and JDK11

If you’re interested in experimenting with Amazon Corretto on Amazon EC2 A1 instances, read on for step-by-step instructions that will have you up and running in no time.

Launching an A1 EC2 instance

The first step is to create a running Amazon EC2 A1 instance. In this example, we demonstrate how to boot your instance using Amazon Linux 2. Starting from the AWS Console, you need to log in to your AWS account or create a new account if you don’t already have one. Once you have logged in to the AWS console, navigate to Amazon Elastic Compute Cloud (Amazon EC2) as follows:


Next, select the operating system and compute architecture of the EC2 instance you want to launch. In this case, choose Amazon Linux 2 and select the 64-bit (Arm) architecture, because we want an AWS Graviton-based A1 instance:

On the next page, select an A1 instance type. In this example, select an a1.xlarge, which offers 4 vCPUs and 8 GB of memory (refer to the Amazon EC2 A1 page for more information). Then, select the “Review and Launch” button:


Next, you can review a summary of your instance details. This summary is shown in the following pictures. Note: the only network port exposed is SSH via TCP port 22. This allows you to remotely connect to the instance via an SSH terminal:


Before proceeding, be aware that you are about to start spending money (and don’t forget to terminate the instance at the end to avoid ongoing charges). As the warning in the screenshot above states, the A1 instance selected is not eligible for the free tier. So, you are charged based on the pricing of the instance selected (refer to the Amazon EC2 on-demand pricing page for details; the a1.xlarge instance selected is $0.102 per hour as of this writing).

Once you’re ready to proceed, select “Launch” to continue. At this point you need to create or supply an existing key-pair for use when connecting to the new instance via SSH. Details to establish this connection can be found in the EC2 documentation.

In this example, I connect from a Windows laptop using PuTTY. The details of converting EC2 keys into the right format can be found here. You can connect the same way. In the following screenshot, I use an existing key pair that I generated. You can create or choose a key pair that best suits your workload and do the following:

While your instance launches, you can click on “View Instances” to see the status of instances within your AWS account:


Once you click on “View Instances,” you can see that your newly launched instance is now in the Running state:


Now, you can connect to your new instance. Right click on the instance from within the console, then select “Connect” from the pop-up menu to get details and instructions on connecting to the instance. This is shown in the following screenshot:

The following screenshot provides you with instructions and specific details needed to connect to your running A1 instance:

Connect to your instance using an SSH Client
You can now connect to the running a1.xlarge instance through instructions to map your preferred SSH client.

Then, the Amazon Linux 2 command prompt pops up as follows:

Note: run the ‘uname -a’ command to confirm that you are running on an ‘aarch64’ architecture, which is the Linux architecture name for 64-bit Arm.


Once you complete this step, your A1 instance is up and running. From here, you can install Amazon Corretto 8.

 

Installing corretto8

You can now install Amazon Corretto 8 on Amazon Linux 2 following the instructions from the documentation. Use option 1 to install the application from the Amazon Linux 2 repository:

$ sudo amazon-linux-extras enable corretto8

$ sudo yum clean metadata

$ sudo yum install -y java-1.8.0-amazon-corretto

This code initiates the installation. Once it completes, you can use the java -version command to verify that you have the newest version of Amazon Corretto. The command and its output are as follows (your version may be more recent):

$ java -version
openjdk version "1.8.0_232"
OpenJDK Runtime Environment Corretto-8.232.09.1 (build 1.8.0_232-b09)
OpenJDK 64-Bit Server VM Corretto-8.232.09.1 (build 25.232-b09, mixed mode)

This command confirms that you have Amazon Corretto 8 version 8.232.09.1 installed and ready to go. If you see a version string that doesn’t mention Corretto, this means you have another version of Java already running. In this case, run the following command to change the default Java provider:

$ sudo alternatives --config java

Installing tomcat8.5 and a simple JSP application

Once the latest Amazon Corretto is installed, confirm that the Java installation works. You can do this by installing and running a simple Java application.

To run this test, you need to install Apache Tomcat, which is a Java-based application web server. Then, open up a public port to make it accessible and connect to it from a browser to confirm it’s running as expected.

Then, install tomcat8.5 from amazon-linux-extras using the following code:

$ sudo amazon-linux-extras enable tomcat8.5
$ sudo yum clean metadata
$ sudo yum install -y tomcat 

Now configure Tomcat to use /dev/urandom as an entropy source. This is important because otherwise Tomcat might hang on a freshly booted instance if there isn’t enough entropy available. Note: there’s a kernel patch in flight to provide an alternate entropy mechanism:

$ sudo bash -c 'echo JAVA_OPTS=\"-Djava.security.egd=file:/dev/urandom\" >> /etc/tomcat/tomcat.conf' 

Next, add a simple JavaServer Pages (JSP) application that will display details about your system.

First, create the default web application directory:

$ sudo install -d -o root -g tomcat /var/lib/tomcat/webapps/ROOT 

Then, add the small JSP application:

$ sudo bash -c 'cat <<EOF > /var/lib/tomcat/webapps/ROOT/index.jsp
<html>
<head>
<title>Corretto8 - Tomcat8.5 - Hello world</title>
</head>
<body>
  <table>
    <tr>
      <td>Operating System</td>
      <td><%= System.getProperty("os.name") %></td>
    </tr>
    <tr>
      <td>CPU Architecture</td>
      <td><%= System.getProperty("os.arch") %></td>
    </tr>
    <tr>
      <td>Java Vendor</td>
      <td><%= System.getProperty("java.vendor") %></td>
    </tr>
    <tr>
      <td>Java URL</td>
      <td><%= System.getProperty("java.vendor.url") %></td>
    </tr>
    <tr>
      <td>Java Version</td>
      <td><%= System.getProperty("java.version") %></td>
    </tr>
    <tr>
      <td>JVM Version</td>
      <td><%= System.getProperty("java.vm.version") %></td>
    </tr>
    <tr>
      <td>Tomcat Version</td>
      <td><%= application.getServerInfo() %></td>
    </tr>
</table>

</body>
</html>
EOF
'

Finally, start the Tomcat service:

$ sudo systemctl start tomcat 

Now that the Tomcat service is running, you need to configure your EC2 instance to open TCP port 8080 (the default port that Tomcat listens on). This configuration allows you to access the instance from a browser and confirm Tomcat is running and serving content.

To do this, return to the AWS console and select your EC2 a1.xlarge instance. Then, in the information panel below, select the associated security group so you can modify the inbound rules to allow TCP access on port 8080 as follows:


With these modifications you should now be able to connect to the Apache Tomcat default page by directing a browser to http://<your instance IPv4 Public IP>:8080 as follows:

 Don’t forget to terminate your EC2 instance(s) when you’re done to avoid ongoing charges!

 

To summarize, we spun up an Amazon EC2 A1 instance, installed and enabled Amazon Corretto and the Apache Tomcat server, configured the security group for the EC2 instance to accept connections on TCP port 8080, and then created and connected to a simple default JSP web page. Being able to display the JSP page confirms that you’re serving content and can see the underlying Java Virtual Machine and platform architecture specifications. These steps demonstrate setting up the Amazon Corretto + Apache Tomcat environment and running a demo JSP web application on AWS Graviton-based Amazon EC2 A1 instances using readily available open source software.

You can learn more at the Amazon Corretto website, and the downloads are all available here for Amazon Corretto 8 and Amazon Corretto 11; if you’re using containers, here’s the Docker Official image. If you have any questions about your own workloads running on Amazon EC2 A1 instances, contact us at [email protected].

 

Forward Incoming Email to an External Destination

Post Syndicated from Brent Meyer original https://aws.amazon.com/blogs/messaging-and-targeting/forward-incoming-email-to-an-external-destination/

Note: This post was written by Vesselin Tzvetkov, an AWS Senior Security Architect, and by Rostislav Markov, an AWS Senior Engagement Manager.


Amazon SES has included support for incoming email for several years now. However, some customers have told us that they need a solution for forwarding inbound emails to domains that aren’t managed by Amazon SES. In this post, you’ll learn how to use Amazon SES, Amazon S3, and AWS Lambda to create a solution that forwards incoming email to an email address that’s managed outside of Amazon SES.

Architecture

This solution uses several AWS services to forward incoming emails to a single, external email address. The following diagram shows the flow of information in this solution.

The following actions occur in this solution:

  1. A new email is sent from an external sender to your domain. Amazon SES handles the incoming email for your domain.
  2. An Amazon SES receipt rule saves the incoming message in an S3 bucket.
  3. An Amazon SES receipt rule triggers the execution of a Lambda function.
  4. The Lambda function retrieves the message content from S3, and then creates a new message and sends it to Amazon SES.
  5. Amazon SES sends the message to the destination server.

Limitations

This solution works in all AWS Regions where Amazon SES is currently available. For a complete list of supported Regions, see AWS Service Endpoints in the AWS General Reference.

Prerequisites

In order to complete this procedure, you need to have a domain that receives incoming email. If you don’t already have a domain, you can purchase one through Amazon Route 53. For more information, see Registering a New Domain in the Amazon Route 53 Developer Guide. You can also purchase a domain through one of several third-party vendors.

Procedures

Step 1: Set up Your Domain

  1. In Amazon SES, verify the domain that you want to use to receive incoming email. For more information, see Verifying Domains in the Amazon SES Developer Guide.
  2. Add the following MX record to the DNS configuration for your domain:
    10 inbound-smtp.<regionInboundUrl>.amazonaws.com

    Replace <regionInboundUrl> with the URL of the email receiving endpoint for the AWS Region that you use Amazon SES in (a concrete example follows this list). For a complete list of URLs, see AWS Service Endpoints – Amazon SES in the AWS General Reference.

  3. If your account is still in the Amazon SES sandbox, submit a request to have it removed. For more information, see Moving Out of the Sandbox in the Amazon SES Developer Guide.
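
For example, if your domain receives email through Amazon SES in the US East (N. Virginia) Region, the MX record would typically look like the following:

10 inbound-smtp.us-east-1.amazonaws.com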

Step 2: Configure Your S3 Bucket

  1. In Amazon S3, create a new bucket. For more information, see Create a Bucket in the Amazon S3 Getting Started Guide.
  2. Apply the following policy to the bucket:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowSESPuts",
                "Effect": "Allow",
                "Principal": {
                    "Service": "ses.amazonaws.com"
                },
                "Action": "s3:PutObject",
                "Resource": "arn:aws:s3:::<bucketName>/*",
                "Condition": {
                    "StringEquals": {
                        "aws:Referer": "<awsAccountId>"
                    }
                }
            }
        ]
    }

  3. In the policy, make the following changes:
    • Replace <bucketName> with the name of your S3 bucket.
    • Replace <awsAccountId> with your AWS account ID.

    For more information, see Using Bucket Policies and User Policies in the Amazon S3 Developer Guide. A boto3 sketch of applying the policy follows this list.
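
Here is one way to apply that policy from code, as a minimal boto3 sketch; the bucket name and account ID are placeholders.

import json
import boto3

s3 = boto3.client("s3")

bucket_name = "example-incoming-mail"   # placeholder bucket name
aws_account_id = "111122223333"         # placeholder account ID

# The same policy shown above, built as a Python dictionary.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowSESPuts",
        "Effect": "Allow",
        "Principal": {"Service": "ses.amazonaws.com"},
        "Action": "s3:PutObject",
        "Resource": f"arn:aws:s3:::{bucket_name}/*",
        "Condition": {"StringEquals": {"aws:Referer": aws_account_id}},
    }],
}

s3.put_bucket_policy(Bucket=bucket_name, Policy=json.dumps(policy))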

Step 3: Create an IAM Policy and Role

  1. Create a new IAM Policy with the following permissions:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogStream",
                    "logs:CreateLogGroup",
                    "logs:PutLogEvents"
                ],
                "Resource": "*"
            },
            {
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "ses:SendRawEmail"
                ],
                "Resource": [
                    "arn:aws:s3:::<bucketName>/*",
                    "arn:aws:ses:<region>:<awsAccountId>:identity/*"
                ]
            }
        ]
    }

    In the preceding policy, make the following changes:

    • Replace <bucketName> with the name of the S3 bucket that you created earlier.
    • Replace <region> with the name of the AWS Region that you created the bucket in.
    • Replace <awsAccountId> with your AWS account ID. For more information, see Create a Customer Managed Policy in the IAM User Guide.
  2. Create a new IAM role. Attach the policy that you just created to the new role. For more information, see Creating Roles in the IAM User Guide. A boto3 sketch of both steps follows this list.
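
The same two steps can be scripted. The following boto3 sketch assumes the permissions policy shown above has been saved to a local file, and it uses hypothetical policy and role names.

import json
import boto3

iam = boto3.client("iam")

# The permissions policy shown above, with your bucket, Region, and account ID
# filled in, saved to a local file (the path is a placeholder).
with open("ses-forwarder-policy.json") as f:
    permissions_policy = f.read()

policy = iam.create_policy(
    PolicyName="SesForwarderPolicy",   # hypothetical name
    PolicyDocument=permissions_policy,
)

# Trust policy that lets AWS Lambda assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

role = iam.create_role(
    RoleName="SesForwarderRole",       # hypothetical name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

iam.attach_role_policy(
    RoleName=role["Role"]["RoleName"],
    PolicyArn=policy["Policy"]["Arn"],
)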

Step 4: Create the Lambda Function

  1. In the Lambda console, create a new Python 3.7 function from scratch. For the execution role, choose the IAM role that you created earlier.
  2. In the code editor, paste the following code.
    # Copyright 2010-2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
    #
    # This file is licensed under the Apache License, Version 2.0 (the "License").
    # You may not use this file except in compliance with the License. A copy of the
    # License is located at
    #
    # http://aws.amazon.com/apache2.0/
    #
    # This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS
    # OF ANY KIND, either express or implied. See the License for the specific
    # language governing permissions and limitations under the License.
    
    import os
    import boto3
    import email
    import re
    from botocore.exceptions import ClientError
    from email.mime.multipart import MIMEMultipart
    from email.mime.text import MIMEText
    from email.mime.application import MIMEApplication
    
    region = os.environ['Region']
    
    def get_message_from_s3(message_id):
    
        incoming_email_bucket = os.environ['MailS3Bucket']
        incoming_email_prefix = os.environ['MailS3Prefix']
    
        if incoming_email_prefix:
            object_path = (incoming_email_prefix + "/" + message_id)
        else:
            object_path = message_id
    
        object_http_path = (f"http://s3.console.aws.amazon.com/s3/object/{incoming_email_bucket}/{object_path}?region={region}")
    
        # Create a new S3 client.
        client_s3 = boto3.client("s3")
    
        # Get the email object from the S3 bucket.
        object_s3 = client_s3.get_object(Bucket=incoming_email_bucket,
            Key=object_path)
        # Read the content of the message.
        file = object_s3['Body'].read()
    
        file_dict = {
            "file": file,
            "path": object_http_path
        }
    
        return file_dict
    
    def create_message(file_dict):
    
        sender = os.environ['MailSender']
        recipient = os.environ['MailRecipient']
    
        separator = ";"
    
        # Parse the email body.
        mailobject = email.message_from_string(file_dict['file'].decode('utf-8'))
    
        # Create a new subject line.
        subject_original = mailobject['Subject']
        subject = "FW: " + subject_original
    
        # The body text of the email.
        body_text = ("The attached message was received from "
                  + separator.join(mailobject.get_all('From'))
                  + ". This message is archived at " + file_dict['path'])
    
        # The file name to use for the attached message. Uses regex to remove all
        # non-alphanumeric characters, and appends a file extension.
        filename = re.sub('[^0-9a-zA-Z]+', '_', subject_original) + ".eml"
    
        # Create a MIME container.
        msg = MIMEMultipart()
        # Create a MIME text part.
        text_part = MIMEText(body_text, _subtype="html")
        # Attach the text part to the MIME message.
        msg.attach(text_part)
    
        # Add subject, from and to lines.
        msg['Subject'] = subject
        msg['From'] = sender
        msg['To'] = recipient
    
        # Create a new MIME object.
        att = MIMEApplication(file_dict["file"], filename)
        att.add_header("Content-Disposition", 'attachment', filename=filename)
    
        # Attach the file object to the message.
        msg.attach(att)
    
        message = {
            "Source": sender,
            "Destinations": recipient,
            "Data": msg.as_string()
        }
    
        return message
    
    def send_email(message):
        aws_region = os.environ['Region']

        # Create a new SES client in the Region specified by the environment variable.
        client_ses = boto3.client('ses', region_name=aws_region)
    
        # Send the email.
        try:
            #Provide the contents of the email.
            response = client_ses.send_raw_email(
                Source=message['Source'],
                Destinations=[
                    message['Destinations']
                ],
                RawMessage={
                    'Data':message['Data']
                }
            )
    
        # Display an error if something goes wrong.
        except ClientError as e:
            output = e.response['Error']['Message']
        else:
            output = "Email sent! Message ID: " + response['MessageId']
    
        return output
    
    def lambda_handler(event, context):
        # Get the unique ID of the message. This corresponds to the name of the file
        # in S3.
        message_id = event['Records'][0]['ses']['mail']['messageId']
        print(f"Received message ID {message_id}")
    
        # Retrieve the file from the S3 bucket.
        file_dict = get_message_from_s3(message_id)
    
        # Create the message.
        message = create_message(file_dict)
    
        # Send the email and print the result.
        result = send_email(message)
        print(result)
  3. Create the following environment variables for the Lambda function:

    Key              Value
    MailS3Bucket     The name of the S3 bucket that you created earlier.
    MailS3Prefix     The path of the folder in the S3 bucket where you will store incoming email.
    MailSender       The email address that the forwarded message will be sent from. This address has to be verified.
    MailRecipient    The address that you want to forward the message to.
    Region           The name of the AWS Region that you want to use to send the email.
  4. Under Basic settings, set the Timeout value to 30 seconds. (A boto3 sketch of setting the environment variables and timeout follows this list.)
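
If you prefer to script the configuration from steps 3 and 4, a boto3 sketch might look like the following; the function name, Region, and all variable values are placeholders.

import boto3

lambda_client = boto3.client("lambda", region_name="us-east-1")

# Function name and variable values are placeholders; use the names and
# addresses from your own setup.
lambda_client.update_function_configuration(
    FunctionName="SesForwarder",
    Timeout=30,
    Environment={
        "Variables": {
            "MailS3Bucket": "example-incoming-mail",
            "MailS3Prefix": "incoming",
            "MailSender": "forwarder@example.com",
            "MailRecipient": "destination@example.net",
            "Region": "us-east-1",
        }
    },
)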

(Optional) Step 5: Create an Amazon SNS Topic

You can optionally create an Amazon SNS topic. This step is helpful for troubleshooting purposes, or if you just want to receive additional notifications when you receive a message. A short boto3 sketch of both steps follows the list below.

  1. Create a new Amazon SNS topic. For more information, see Creating a Topic in the Amazon SNS Developer Guide.
  2. Subscribe an endpoint, such as an email address, to the topic. For more information, see Subscribing an Endpoint to a Topic in the Amazon SNS Developer Guide.
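
As a sketch of both steps with boto3, using a hypothetical topic name and email address:

import boto3

sns = boto3.client("sns", region_name="us-east-1")

# Topic name and email address are placeholders.
topic = sns.create_topic(Name="ses-forwarder-notifications")

# The email endpoint must confirm the subscription before it receives notifications.
sns.subscribe(
    TopicArn=topic["TopicArn"],
    Protocol="email",
    Endpoint="you@example.com",
)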

Step 6: Create a Receipt Rule Set

  1. In the Amazon SES console, create a new Receipt Rule Set. For more information, see Creating a Receipt Rule Set in the Amazon SES Developer Guide.
  2. In the Receipt Rule Set that you just created, add a Receipt Rule. In the Receipt Rule, add an S3 Action. Set up the S3 Action to send your email to the S3 bucket that you created earlier.
  3. Add a Lambda action to the Receipt Rule. Configure the Receipt Rule to invoke the Lambda function that you created earlier.

For more information, see Setting Up a Receipt Rule in the Amazon SES Developer Guide.
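
The console walks you through these steps, but they can also be scripted. The following boto3 sketch uses hypothetical names and ARNs; note that if you create the rule outside the console, you may also need to grant Amazon SES permission to invoke your Lambda function.

import boto3

ses = boto3.client("ses", region_name="us-east-1")

rule_set_name = "ses-forwarder-rule-set"   # hypothetical name

ses.create_receipt_rule_set(RuleSetName=rule_set_name)

ses.create_receipt_rule(
    RuleSetName=rule_set_name,
    Rule={
        "Name": "forward-incoming",
        "Enabled": True,
        "TlsPolicy": "Optional",
        "ScanEnabled": True,
        # Placeholder recipient; use an address or domain covered by your verified domain.
        "Recipients": ["inbox@example.com"],
        "Actions": [
            {
                "S3Action": {
                    "BucketName": "example-incoming-mail",   # placeholder bucket
                    "ObjectKeyPrefix": "incoming",
                }
            },
            {
                "LambdaAction": {
                    # Placeholder ARN for the function created in Step 4.
                    "FunctionArn": "arn:aws:lambda:us-east-1:111122223333:function:SesForwarder",
                    "InvocationType": "Event",
                }
            },
        ],
    },
)

# Only one receipt rule set can be active at a time.
ses.set_active_receipt_rule_set(RuleSetName=rule_set_name)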

Step 7: Test the Function

  • Send an email to an address that corresponds with an address in the Receipt Rule you created earlier. Make sure that the email arrives in the correct S3 bucket. In a minute or two, the email arrives in the inbox that you specified in the MailRecipient variable of the Lambda function.

Troubleshooting

If you send a test message, but it is never forwarded to your destination email address, do the following:

  • Make sure that the Amazon SES Receipt Rule is active.
  • Make sure that the email address that you specified in the MailRecipient variable of the Lambda function is correct.
  • Subscribe an email address or phone number to the SNS topic. Send another test email to your domain. Make sure that SNS sends a Received notification to your subscribed email address or phone number.
  • Check the CloudWatch Log for your Lambda function to see if any errors occurred.

If you send a test email to your receiving domain, but you receive a bounce notification, do the following:

  • Make sure that the verification process for your domain completed successfully.
  • Make sure that the MX record for your domain specifies the correct Amazon SES receiving endpoint.
  • Make sure that you’re sending to an address that is handled by the receipt rule.

Costs of using this solution

The cost of implementing this solution is minimal. If you receive 10,000 emails per month, and each email is 2KB in size, you pay $1.00 for your use of Amazon SES. For more information, see Amazon SES Pricing.

You also pay a small charge to store incoming emails in Amazon S3. The charge for storing 1,000 emails that are each 2KB in size is less than one cent. Your use of Amazon S3 might qualify for the AWS Free Usage Tier. For more information, see Amazon S3 Pricing.

Finally, you pay for your use of AWS Lambda. With Lambda, you pay for the number of requests you make, for the amount of compute time that you use, and for the amount of memory that you use. If you use Lambda to forward 1,000 emails that are each 2KB in size, you pay no more than a few cents. Your use of AWS Lambda might qualify for the AWS Free Usage Tier. For more information, see AWS Lambda Pricing.

Note: These cost estimates don’t include the costs associated with purchasing a domain, since many users already have their own domains. The cost of obtaining a domain is the most expensive part of implementing this solution.

Conclusion

This solution makes it possible to forward incoming email from one of your Amazon SES verified domains to an email address that isn’t necessarily verified. It’s also useful if you have multiple AWS accounts, and you want incoming messages to be sent from each of those accounts to a single destination. We hope you’ve found this tutorial to be helpful!

Learn about AWS Services & Solutions – October AWS Online Tech Talks

Post Syndicated from Jenny Hang original https://aws.amazon.com/blogs/aws/learn-about-aws-services-solutions-october-aws-online-tech-talks/

Learn about AWS Services & Solutions – October AWS Online Tech Talks

AWS Tech Talks

Join us this October to learn about AWS services and solutions. The AWS Online Tech Talks are live, online presentations that cover a broad range of topics at varying technical levels. These tech talks, led by AWS solutions architects and engineers, feature technical deep dives, live demonstrations, customer examples, and Q&A with AWS experts. Register Now!

Note – All sessions are free and in Pacific Time.

Tech talks this month:

AR/VR: 

October 30, 2019 | 9:00 AM – 10:00 AM PT – Using Physics in Your 3D Applications with Amazon Sumerian – Learn how to simulate real-world environments in 3D using Amazon Sumerian’s new robust physics system.

Compute: 

October 24, 2019 | 9:00 AM – 10:00 AM PT – Computational Fluid Dynamics on AWS – Learn best practices to run Computational Fluid Dynamics (CFD) workloads on AWS.

October 28, 2019 | 1:00 PM – 2:00 PM PT – Monitoring Your .NET and SQL Server Applications on Amazon EC2 – Learn how to manage your application logs through AWS services to improve performance and resolve issues for your .Net and SQL Server applications.

October 31, 2019 | 9:00 AM – 10:00 AM PT – Optimize Your Costs with AWS Compute Pricing Options – Learn which pricing models work best for your workloads and how to combine different purchase options to optimize cost, scale, and performance.

Data Lakes & Analytics: 

October 23, 2019 | 9:00 AM – 10:00 AM PT – Practical Tips for Migrating Your IBM Netezza Data Warehouse to the Cloud – Learn how to migrate your IBM Netezza Data Warehouse to the cloud to save costs and improve performance.

October 31, 2019 | 11:00 AM – 12:00 PM PT – Alert on Your Log Data with Amazon Elasticsearch Service – Learn how to receive alerts on your data to monitor your application and infrastructure using Amazon Elasticsearch Service.

Databases:

October 22, 2019 | 1:00 PM – 2:00 PM PT – How to Build Highly Scalable Serverless Applications with Amazon Aurora Serverless – Get an overview of Amazon Aurora Serverless, an on-demand, auto-scaling configuration for Amazon Aurora, and learn how you can use it to build serverless applications.

DevOps:

October 21, 2019 | 11:00 AM – 12:00 PM PT – Migrate Your Ruby on Rails App to AWS Fargate in One Step Using AWS Rails Provisioner – Learn how to define and deploy containerized Ruby on Rails Applications on AWS with a few commands.

End-User Computing: 

October 24, 2019 | 11:00 AM – 12:00 PM PT – Why Software Vendors Are Choosing Application Streaming Instead of Rewriting Their Desktop Apps – Walk through common customer use cases of how Amazon AppStream 2.0 lets software vendors deliver instant demos, trials, and training of desktop applications.

October 29, 2019 | 11:00 AM – 12:00 PM PT – Move Your Desktops and Apps to AWS End-User Computing – Get an overview of AWS End-User Computing services and then dive deep into best practices for implementation.

Enterprise & Hybrid: 

October 29, 2019 | 1:00 PM – 2:00 PM PT – Leverage Compute Pricing Models and Rightsizing to Maximize Savings on AWS – Get tips on building a cost-management strategy, incorporating pricing models and resource rightsizing.

IoT:

October 30, 2019 | 1:00 PM – 2:00 PM PT – Connected Devices at Scale: A Deep Dive into the AWS Smart Product Solution – Learn how to jump-start the development of innovative connected products with the new AWS Smart Product Solution.

Machine Learning:

October 23, 2019 | 1:00 PM – 2:00 PM PT – Analyzing Text with Amazon Elasticsearch Service and Amazon Comprehend – Learn how to deploy a cost-effective, end-to-end solution for extracting meaningful insights from unstructured text data like customer calls, support tickets, or online customer feedback.

October 28, 2019 | 11:00 AM – 12:00 PM PT – AI-Powered Health Data Masking – Learn how to use the AI-Powered Health Data Masking solution for use cases like clinical decision support, revenue cycle management, and clinical trial management.

Migration:

October 22, 2019 | 11:00 AM – 12:00 PM PT – Deep Dive: How to Rapidly Migrate Your Data Online with AWS DataSync – Learn how AWS DataSync makes it easy to rapidly move large datasets into Amazon S3 and Amazon EFS for your applications.

Mobile:

October 21, 2019 | 1:00 PM – 2:00 PM PT – Mocking and Testing Serverless APIs with AWS Amplify – Learn how to mock and test GraphQL APIs in your local environment with AWS Amplify.

Robotics:

October 22, 2019 | 9:00 AM – 10:00 AM PT – The Future of Smart Robots Has Arrived – Learn how and why you should build smarter robots with AWS.

Security, Identity and Compliance: 

October 29, 2019 | 9:00 AM – 10:00 AM PT – Using AWS Firewall Manager to Simplify Firewall Management Across Your Organization – Learn how AWS Firewall Manager simplifies rule management across your organization.

Serverless:

October 21, 2019 | 9:00 AM – 10:00 AM PT – Advanced Serverless Orchestration with AWS Step Functions – Go beyond the basics and explore the best practices of Step Functions, including development and deployment of workflows and how you can track the work being done.

October 30, 2019 | 11:00 AM – 12:00 PM PT – Managing Serverless Applications with SAM Templates – Learn how to reduce code and increase efficiency by managing your serverless apps with AWS Serverless Application Model (SAM) templates.

Storage:

October 23, 2019 | 11:00 AM – 12:00 PM PT – Reduce File Storage TCO with Amazon EFS and Amazon FSx for Windows File Server – Learn how to optimize file storage costs with AWS storage solutions.

Integrating an Inferencing Pipeline with NVIDIA DeepStream and the G4 Instance Family

Post Syndicated from Geoff Murase original https://aws.amazon.com/blogs/compute/integrating-an-inferencing-pipeline-with-nvidia-deepstream-and-the-g4-instance-family/

Contributed by: Amr Ragab, Business Development Manager, Accelerated Computing, AWS and Kong Zhao, Solution Architect, NVIDIA Corporation

AWS continually evolves GPU offerings, striving to showcase how new technical improvements created by AWS partners improve the platform’s performance.

One result from AWS’s collaboration with NVIDIA is the recent release of the G4 instance type, a technology update from the G2 and G3. The G4 features an NVIDIA Turing T4 GPU with 16 GB of GPU memory, offered under the Nitro hypervisor with one to four GPUs per node. A bare metal option will be released in the coming months. It also includes up to 1.8 TB of local non-volatile memory express (NVMe) storage and up to 100 Gbps of network bandwidth.

The Turing T4 is the latest offering from NVIDIA, accelerating machine learning (ML) training and inferencing, video transcoding, and other compute-intensive workloads. With such a diverse array of optimized directives, you can now perform diverse accelerated compute workloads on a single instance family.

NVIDIA has also taken the lead in providing a robust and performant software layer in the form of SDKs and container solutions through the NVIDIA GPU Cloud (NGC) container registry. These accelerated components, combined with AWS elasticity and scale, provide a powerful combination for performant pipelines on AWS.

NVIDIA DeepStream SDK

This post focuses on one such NVIDIA SDK: DeepStream.

The DeepStream SDK is built to provide an end-to-end video processing and ML inferencing analytics solution. It uses the Video Codec API and TensorRT as key components.

DeepStream also supports an edge-cloud strategy to stream perception on the edge and other sensor metadata into AWS for further processing. An example includes wide-area consumption of multiple camera streams and metadata through the Amazon Kinesis platform.

Another classic workload that can take advantage of DeepStream is compiling the model artifacts resulting from distributed training in AWS with Amazon SageMaker Neo. Use this model on the edge or on an Amazon S3 video data lake.

If you are interested in exploring these solutions, contact your AWS account team.

Deployment

Set up programmatic access to AWS to instantiate a g4dn.2xlarge instance type with Ubuntu 18.04 in a subnet that routes SSH access. If you are interested in the full stack details, the following are required to set up the instance to execute DeepStream SDK workflows.

  • An Ubuntu 18.04 Instance with:
    • NVIDIA Turing T4 Driver (418.67 or latest)
    • CUDA 10.1
    • nvidia-docker2

Alternatively, you can launch the NVIDIA Deep Learning AMI available in AWS Marketplace, which includes the latest drivers and SDKs.

aws ec2 run-instances --region us-east-1 --image-id ami-026c8acd92718196b --instance-type g4dn.2xlarge --key-name <key-name> --subnet-id <subnet> --security-group-ids {<security-groupids>} --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=75}'

When the instance is up, connect with SSH and pull the latest DeepStream SDK Docker image from the NGC container registry.

docker pull nvcr.io/nvidia/deepstream:4.0-19.07

nvidia-docker run -it --rm -v /usr/lib/x86_64-linux-gnu/libnvidia-encode.so:/usr/lib/x86_64-linux-gnu/libnvidia-encode.so -v /tmp/.X11-unix:/tmp/.X11-unix -p 8554:8554 -e DISPLAY=$DISPLAY nvcr.io/nvidia/deepstream:4.0-19.07

If your instance is running a full X environment, you can pass the authentication and display to the container to view the results in real time. However, for the purposes of this post, just execute the workload on the shell.

Go to the /root/deepstream_sdk_v4.0_x86_64/samples/configs/deepstream-app/ folder.

The following configuration files are included in the package:

  • source30_1080p_resnet_dec_infer_tiled_display_int8.txt: This configuration file demonstrates 30 stream decodes with primary inferencing.
  • source4_1080p_resnet_dec_infer_tiled_display_int8.txt: This configuration file demonstrates four stream decodes with primary inferencing, object tracking, and three different secondary classifiers.
  • source4_1080p_resnet_dec_infer_tracker_sgie_tiled_display_int8_gpu1.txt: This configuration file demonstrates four stream decodes with primary inferencing, object tracking, and three different secondary classifiers on GPU 1.
  • config_infer_primary.txt: This configuration file configures an nvinfer element as the primary detector.
  • config_infer_secondary_carcolor.txt, config_infer_secondary_carmake.txt, config_infer_secondary_vehicletypes.txt: These configuration files configure an nvinfer element as the secondary classifier.
  • iou_config.txt: This configuration file configures a low-level Intersection over Union (IOU) tracker.
  • source1_usb_dec_infer_resnet_int8.txt: This configuration file demonstrates one USB camera as input.

The following sample models are provided with the SDK.

Model                               Model type   Number of classes   Resolution
Primary Detector                    Resnet10     4                   640 x 368
Secondary Car Color Classifier      Resnet18     12                  224 x 224
Secondary Car Make Classifier       Resnet18     6                   224 x 224
Secondary Vehicle Type Classifier   Resnet18     20                  224 x 224

Edit the configuration file source30_1080p_dec_infer-resnet_tiled_display_int8.txt to disable [sink0] and enable [sink1] for file output. Save the file, then run the DeepStream sample code.

[sink0]
enable=0
#Type - 1=FakeSink 2=EglSink 3=File
type=2
 
[sink1]
enable=1
type=3
#1=mp4 2=mkv
container=1
#1=h264 2=h265
codec=1
sync=0
#iframeinterval=10
bitrate=2000000
output-file=out.mp4
source-id=0
 
deepstream-app -c source30_1080p_dec_infer-resnet_tiled_display_int8.txt

You get performance data on the inferencing workflow.

 (deepstream-app:1059): GLib-GObject-WARNING **: 20:38:25.991: g_object_set_is_valid_property: object class 'nvv4l2h264enc' has no property named 'bufapi-version'
Creating LL OSD context new
 
Runtime commands:
        h: Print this help
        q: Quit
 
        p: Pause
        r: Resume
 
NOTE: To expand a source in the 2D tiled display and view object details, left-click on the source.
      To go back to the tiled display, right-click anywhere on the window.
 
** INFO: <bus_callback:163>: Pipeline ready
 
Creating LL OSD context new
** INFO: <bus_callback:149>: Pipeline running
 
 
**PERF: FPS 0 (Avg)     FPS 1 (Avg)     FPS 2 (Avg)     FPS 3 (Avg)     FPS 4 (Avg)     FPS 5 (Avg)     FPS 6 (Avg)     FPS 7 (Avg)     FPS 8 (Avg)     FPS 9 (Avg)     FPS 10 (Avg)   FPS 11 (Avg)     FPS 12 (Avg)    FPS 13 (Avg)    FPS 14 (Avg)    FPS 15 (Avg)    FPS 16 (Avg)    FPS 17 (Avg)    FPS 18 (Avg)    FPS 19 (Avg)    FPS 20 (Avg)    FPS 21 (Avg)    FPS 22 (Avg)    FPS 23 (Avg)    FPS 24 (Avg)    FPS 25 (Avg)    FPS 26 (Avg)    FPS 27 (Avg)    FPS 28 (Avg)    FPS 29 (Avg)
**PERF: 35.02 (35.02)   37.92 (37.92)   37.93 (37.93)   35.85 (35.85)   36.39 (36.39)   38.40 (38.40)   35.85 (35.85)   35.18 (35.18)   35.60 (35.60)   35.02 (35.02)   38.77 (38.77)  37.71 (37.71)    35.18 (35.18)   38.60 (38.60)   38.40 (38.40)   38.60 (38.60)   34.77 (34.77)   37.70 (37.70)   35.97 (35.97)   37.00 (37.00)   35.51 (35.51)   38.40 (38.40)   38.60 (38.60)   38.40 (38.40)   38.13 (38.13)   37.70 (37.70)   35.85 (35.85)   35.97 (35.97)   37.92 (37.92)   37.92 (37.92)
**PERF: 39.10 (37.76)   38.90 (38.60)   38.70 (38.47)   38.70 (37.78)   38.90 (38.10)   38.90 (38.75)   39.10 (38.05)   38.70 (37.55)   39.10 (37.96)   39.10 (37.76)   39.10 (39.00)  39.10 (38.68)    39.10 (37.83)   39.10 (38.95)   39.10 (38.89)   39.10 (38.95)   38.90 (37.55)   38.70 (38.39)   38.90 (37.96)   38.50 (38.03)   39.10 (37.98)   38.90 (38.75)   38.30 (38.39)   38.70 (38.61)   38.90 (38.67)   39.10 (38.66)   39.10 (38.05)   39.10 (38.10)   39.10 (38.74)   38.90 (38.60)
**PERF: 38.91 (38.22)   38.71 (38.65)   39.31 (38.82)   39.31 (38.40)   39.11 (38.51)   38.91 (38.82)   39.31 (38.56)   39.31 (38.26)   39.11 (38.42)   38.51 (38.06)   38.51 (38.80)  39.31 (38.94)    39.31 (38.42)   39.11 (39.02)   37.71 (38.41)   39.31 (39.10)   39.31 (38.26)   39.31 (38.77)   39.31 (38.51)   39.31 (38.55)   39.11 (38.44)   39.31 (38.98)   39.11 (38.69)   39.31 (38.90)   39.11 (38.85)   39.31 (38.93)   39.31 (38.56)   39.31 (38.59)   39.31 (38.97)   39.31 (38.89)
**PERF: 37.56 (38.03)   38.15 (38.50)   38.35 (38.68)   38.35 (38.38)   37.76 (38.29)   38.15 (38.62)   38.35 (38.50)   37.56 (38.06)   38.15 (38.35)   37.76 (37.97)   37.96 (38.55)  38.35 (38.77)    38.35 (38.40)   37.56 (38.59)   38.35 (38.39)   37.96 (38.77)   36.96 (37.88)   38.35 (38.65)   38.15 (38.41)   38.35 (38.49)   38.35 (38.41)   38.35 (38.80)   37.96 (38.47)   37.96 (38.62)   37.56 (38.47)   37.56 (38.53)   38.15 (38.44)   38.35 (38.52)   38.35 (38.79)   38.35 (38.73)
**PERF: 40.71 (38.63)   40.31 (38.91)   40.51 (39.09)   40.90 (38.95)   39.91 (38.65)   40.90 (39.14)   40.90 (39.04)   40.51 (38.60)   40.71 (38.87)   40.51 (38.54)   40.71 (39.04)  40.90 (39.25)    40.71 (38.92)   40.90 (39.11)   40.90 (38.96)   40.90 (39.25)   40.90 (38.56)   40.90 (39.15)   40.11 (38.79)   40.90 (39.03)   40.90 (38.97)   40.90 (39.27)   40.90 (39.02)   40.90 (39.14)   39.51 (38.71)   40.90 (39.06)   40.51 (38.90)   40.71 (39.01)   40.90 (39.27)   40.90 (39.22)
**PERF: 39.46 (38.78)   39.26 (38.97)   39.46 (39.16)   39.26 (39.00)   39.26 (38.76)   39.26 (39.16)   39.06 (39.04)   39.46 (38.76)   39.46 (38.98)   39.26 (38.67)   39.46 (39.12)  39.46 (39.29)    39.26 (38.98)   39.26 (39.14)   39.26 (39.01)   38.65 (39.14)   38.45 (38.54)   39.46 (39.21)   39.46 (38.91)   39.46 (39.11)   39.26 (39.03)   39.26 (39.27)   39.46 (39.10)   39.26 (39.16)   39.26 (38.81)   39.26 (39.10)   39.06 (38.93)   39.46 (39.09)   39.06 (39.23)   39.26 (39.23)
**PERF: 39.04 (38.82)   38.84 (38.95)   38.84 (39.11)   38.84 (38.98)   38.84 (38.77)   38.64 (39.08)   39.04 (39.04)   39.04 (38.80)   39.04 (38.99)   39.04 (38.73)   38.64 (39.04)  39.04 (39.25)    38.44 (38.90)   39.04 (39.13)   38.84 (38.99)   38.44 (39.03)   39.04 (38.62)   39.04 (39.18)   38.84 (38.90)   38.84 (39.07)   37.84 (38.84)   39.04 (39.24)   39.04 (39.09)   39.04 (39.14)   38.64 (38.78)   38.64 (39.03)   39.04 (38.95)   38.84 (39.05)   38.64 (39.14)   38.24 (39.08)
** INFO: <bus_callback:186>: Received EOS. Exiting ...
 
Quitting
App run successful

The output video file, out.mp4, is under the current folder and can be played after download.

Extending the architecture further, you can make use of AWS Batch to execute an event-driven pipeline.

Here, the input file from S3 triggers an Amazon CloudWatch event, standing up a G4 instance with a DeepStream Docker image, sourced in Amazon ECR, to process the pipeline. The video and ML analytics results can be pushed back to S3 for further processing.
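
As a rough sketch of that trigger, the event target could be a small Lambda function that submits an AWS Batch job pointing at the DeepStream image. The job queue, job definition, command, and event parsing below are assumptions; the exact event shape depends on how the CloudWatch Events rule is configured.

import boto3

batch = boto3.client("batch")

def lambda_handler(event, context):
    # Assumes a CloudTrail-based CloudWatch event for an S3 PutObject call;
    # adjust the parsing if you wire the trigger differently.
    bucket = event["detail"]["requestParameters"]["bucketName"]
    key = event["detail"]["requestParameters"]["key"]

    # Job queue and job definition names are placeholders; the job definition
    # would reference the DeepStream image stored in Amazon ECR and request a GPU.
    batch.submit_job(
        jobName="deepstream-inference",
        jobQueue="g4-gpu-queue",
        jobDefinition="deepstream-pipeline",
        containerOverrides={
            "command": ["deepstream-app", "-c",
                        "source30_1080p_dec_infer-resnet_tiled_display_int8.txt"],
            "environment": [{"name": "INPUT_URI", "value": f"s3://{bucket}/{key}"}],
        },
    )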

Conclusion

With this basic architecture in place, you can execute a video analytics and ML inferencing pipeline. Future work can also include integration with Kinesis and cataloging DeepStream results. Let us know how it goes working with DeepStream and the NVIDIA container stack on AWS.

 

Sending Push Notifications to iOS 13 and watchOS 6 Devices

Post Syndicated from Brent Meyer original https://aws.amazon.com/blogs/messaging-and-targeting/sending-push-notifications-to-ios-13-and-watchos-6-devices/

Last week, we made some changes to the way that Amazon Pinpoint sends Apple Push Notification service (APNs) push notifications.

In June, Apple announced that push notifications sent to iOS 13 and watchOS 6 devices would require the new apns-push-type header. APNs uses this header to determine whether a notification should be shown on the display of the recipient’s device, or if it should be sent to the background.

When you use Amazon Pinpoint to send an APNs message, you can choose whether you want to send the message as a standard message, or as a silent notification. Amazon Pinpoint uses your selection to determine which value to apply to the apns-push-type header: if you send the message as a standard message, Amazon Pinpoint automatically sets the value of the apns-push-type header to alert; if you send the message as a silent notification, it sets the apns-push-type header to silent. Amazon Pinpoint applies these settings automatically—you don’t have to do any additional work in order to send messages to recipients with iOS 13 and watchOS 6 devices.

One last thing to keep in mind: if you specify the raw content of an APNs push notification, the message payload has to include the content-available key. The value of the content-available key has to be an integer, and can only be 0 or 1. If you’re sending an alert, set the value of content-available to 0. If you’re sending a background notification, set the value of content-available to 1. Additionally, background notification payloads can’t include the alert, badge, or sound keys.
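
To illustrate those rules, here is a minimal boto3 sketch that sends a raw background notification through Amazon Pinpoint; the application ID and device token are placeholders, and the payload follows the constraints above (content-available set to 1, with no alert, badge, or sound keys).

import json
import boto3

pinpoint = boto3.client("pinpoint", region_name="us-east-1")

application_id = "1a2b3c4d5e6f7890abcdef1234567890"   # placeholder Pinpoint project ID
device_token = "EXAMPLE_DEVICE_TOKEN"                 # placeholder APNs device token

# A background notification: content-available is 1 and the payload contains
# no alert, badge, or sound keys.
raw_payload = {
    "aps": {"content-available": 1},
    "data": {"refresh": "inbox"},
}

pinpoint.send_messages(
    ApplicationId=application_id,
    MessageRequest={
        "Addresses": {device_token: {"ChannelType": "APNS"}},
        "MessageConfiguration": {
            "APNSMessage": {
                "RawContent": json.dumps(raw_payload),
                # Sending as a silent notification lets Amazon Pinpoint apply the
                # silent apns-push-type mapping described above.
                "SilentPush": True,
            }
        },
    },
)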

To learn more about sending APNs notifications, see Generating a Remote Notification and Pushing Background Updates to Your App on the Apple Developer website.