Tag Archives: mapping

EC2 Instance Update – M5 Instances with Local NVMe Storage (M5d)

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/ec2-instance-update-m5-instances-with-local-nvme-storage-m5d/

Earlier this month we launched the C5 Instances with Local NVMe Storage and I told you that we would be doing the same for additional instance types in the near future!

Today we are introducing M5 instances equipped with local NVMe storage. Available for immediate use in 5 regions, these instances are a great fit for workloads that require a balance of compute and memory resources. Here are the specs:

Instance Name vCPUs RAM Local Storage EBS-Optimized Bandwidth Network Bandwidth
m5d.large 2 8 GiB 1 x 75 GB NVMe SSD Up to 2.120 Gbps Up to 10 Gbps
m5d.xlarge 4 16 GiB 1 x 150 GB NVMe SSD Up to 2.120 Gbps Up to 10 Gbps
m5d.2xlarge 8 32 GiB 1 x 300 GB NVMe SSD Up to 2.120 Gbps Up to 10 Gbps
m5d.4xlarge 16 64 GiB 1 x 600 GB NVMe SSD 2.210 Gbps Up to 10 Gbps
m5d.12xlarge 48 192 GiB 2 x 900 GB NVMe SSD 5.0 Gbps 10 Gbps
m5d.24xlarge 96 384 GiB 4 x 900 GB NVMe SSD 10.0 Gbps 25 Gbps

The M5d instances are powered by Custom Intel® Xeon® Platinum 8175M series processors running at 2.5 GHz, including support for AVX-512.

You can use any AMI that includes drivers for the Elastic Network Adapter (ENA) and NVMe; this includes the latest Amazon Linux, Microsoft Windows (Server 2008 R2, Server 2012, Server 2012 R2 and Server 2016), Ubuntu, RHEL, SUSE, and CentOS AMIs.

Here are a couple of things to keep in mind about the local NVMe storage on the M5d instances:

Naming – You don’t have to specify a block device mapping in your AMI or during the instance launch; the local storage will show up as one or more devices (/dev/nvme*1 on Linux) after the guest operating system has booted.

Encryption – Each local NVMe device is hardware encrypted using the XTS-AES-256 block cipher and a unique key. Each key is destroyed when the instance is stopped or terminated.

Lifetime – Local NVMe devices have the same lifetime as the instance they are attached to, and do not stick around after the instance has been stopped or terminated.

Available Now
M5d instances are available in On-Demand, Reserved Instance, and Spot form in the US East (N. Virginia), US West (Oregon), EU (Ireland), US East (Ohio), and Canada (Central) Regions. Prices vary by Region, and are just a bit higher than for the equivalent M5 instances.

Jeff;

 

AWS Resources Addressing Argentina’s Personal Data Protection Law and Disposition No. 11/2006

Post Syndicated from Leandro Bennaton original https://aws.amazon.com/blogs/security/aws-and-resources-addressing-argentinas-personal-data-protection-law-and-disposition-no-112006/

We have two new resources to help customers address their data protection requirements in Argentina. These resources specifically address the needs outlined under the Personal Data Protection Law No. 25.326, as supplemented by Regulatory Decree No. 1558/2001 (“PDPL”), including Disposition No. 11/2006. For context, the PDPL is an Argentine federal law that applies to the protection of personal data, including during transfer and processing.

A new webpage focused on data privacy in Argentina features FAQs, helpful links, and whitepapers that provide an overview of PDPL considerations, as well as our security assurance frameworks and international certifications, including ISO 27001, ISO 27017, and ISO 27018. You’ll also find details about our Information Request Report and the high bar of security at AWS data centers.

Additionally, we’ve released a new workbook that offers a detailed mapping as to how customers can operate securely under the Shared Responsibility Model while also aligning with Disposition No. 11/2006. The AWS Disposition 11/2006 Workbook can be downloaded from the Argentina Data Privacy page or directly from this link. Both resources are also available in Spanish from the Privacidad de los datos en Argentina page.

Want more AWS Security news? Follow us on Twitter.

 

EC2 Instance Update – C5 Instances with Local NVMe Storage (C5d)

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/ec2-instance-update-c5-instances-with-local-nvme-storage-c5d/

As you can see from my EC2 Instance History post, we add new instance types on a regular and frequent basis. Driven by increasingly powerful processors and designed to address an ever-widening set of use cases, the size and diversity of this list reflects the equally diverse group of EC2 customers!

Near the bottom of that list you will find the new compute-intensive C5 instances. With a 25% to 50% improvement in price-performance over the C4 instances, the C5 instances are designed for applications like batch and log processing, distributed and or real-time analytics, high-performance computing (HPC), ad serving, highly scalable multiplayer gaming, and video encoding. Some of these applications can benefit from access to high-speed, ultra-low latency local storage. For example, video encoding, image manipulation, and other forms of media processing often necessitates large amounts of I/O to temporary storage. While the input and output files are valuable assets and are typically stored as Amazon Simple Storage Service (S3) objects, the intermediate files are expendable. Similarly, batch and log processing runs in a race-to-idle model, flushing volatile data to disk as fast as possible in order to make full use of compute resources.

New C5d Instances with Local Storage
In order to meet this need, we are introducing C5 instances equipped with local NVMe storage. Available for immediate use in 5 regions, these instances are a great fit for the applications that I described above, as well as others that you will undoubtedly dream up! Here are the specs:

Instance Name vCPUs RAM Local Storage EBS Bandwidth Network Bandwidth
c5d.large 2 4 GiB 1 x 50 GB NVMe SSD Up to 2.25 Gbps Up to 10 Gbps
c5d.xlarge 4 8 GiB 1 x 100 GB NVMe SSD Up to 2.25 Gbps Up to 10 Gbps
c5d.2xlarge 8 16 GiB 1 x 225 GB NVMe SSD Up to 2.25 Gbps Up to 10 Gbps
c5d.4xlarge 16 32 GiB 1 x 450 GB NVMe SSD 2.25 Gbps Up to 10 Gbps
c5d.9xlarge 36 72 GiB 1 x 900 GB NVMe SSD 4.5 Gbps 10 Gbps
c5d.18xlarge 72 144 GiB 2 x 900 GB NVMe SSD 9 Gbps 25 Gbps

Other than the addition of local storage, the C5 and C5d share the same specs. Both are powered by 3.0 GHz Intel Xeon Platinum 8000-series processors, optimized for EC2 and with full control over C-states on the two largest sizes, giving you the ability to run two cores at up to 3.5 GHz using Intel Turbo Boost Technology.

You can use any AMI that includes drivers for the Elastic Network Adapter (ENA) and NVMe; this includes the latest Amazon Linux, Microsoft Windows (Server 2008 R2, Server 2012, Server 2012 R2 and Server 2016), Ubuntu, RHEL, SUSE, and CentOS AMIs.

Here are a couple of things to keep in mind about the local NVMe storage:

Naming – You don’t have to specify a block device mapping in your AMI or during the instance launch; the local storage will show up as one or more devices (/dev/nvme*1 on Linux) after the guest operating system has booted.

Encryption – Each local NVMe device is hardware encrypted using the XTS-AES-256 block cipher and a unique key. Each key is destroyed when the instance is stopped or terminated.

Lifetime – Local NVMe devices have the same lifetime as the instance they are attached to, and do not stick around after the instance has been stopped or terminated.

Available Now
C5d instances are available in On-Demand, Reserved Instance, and Spot form in the US East (N. Virginia), US West (Oregon), EU (Ireland), US East (Ohio), and Canada (Central) Regions. Prices vary by Region, and are just a bit higher than for the equivalent C5 instances.

Jeff;

PS – We will be adding local NVMe storage to other EC2 instance types in the months to come, so stay tuned!

[$] A mapping layer for filesystems

Post Syndicated from jake original https://lwn.net/Articles/753650/rss

In a plenary session on the second day of the Linux Storage, Filesystem,
and Memory-Management Summit (LSFMM), Dave Chinner described his ideas for
a virtual block address-space layer. It would allow “space accounting to be
shared and managed at various layers in the storage stack”. One of the
targets for this work is for filesystems on thin-provisioned devices, where
the filesystem
is larger than the storage devices holding it (and administrators are
expected to add storage as needed); in current systems, running out of
space causes huge problems for filesystems and users because the filesystem
cannot communicate that error in a usable fashion.

[$] Shared memory mappings for devices

Post Syndicated from jake original https://lwn.net/Articles/753481/rss

In a short filesystem-only discussion at the 2018 Linux Storage,
Filesystem, and Memory-Management Summit (LSFMM), Jérôme Glisse wanted to
talk about some (more) changes to support GPUs, FPGAs, and RDMA devices.
In other talks at LSFMM, he discussed
changes to struct page
in support of these kinds of devices, but here he was looking to discuss
other changes
to support mapping a device’s memory into multiple processes. It should be
noted that I had a hard time following the discussion in this session, so
there may be significant gaps in the article.

[$] Repurposing page->mapping

Post Syndicated from corbet original https://lwn.net/Articles/752564/rss

The page structure is one of the
most complex in the kernel due to the need to cram the maximum amount of
information into as little space as possible. Each field is so heavily
overloaded that developers prefer to avoid making changes to struct
page
if they can avoid
it. That didn’t deter Jérôme Glisse from proposing a significant change
during two plenary sessions at
the 2018 Linux Storage, Filesystem, and Memory-Management Summit, though.
There are some interesting benefits on offer, but getting there will not be
a simple task.

Grafana v5.1 Released

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2018/04/26/grafana-v5.1-released/

v5.1 Stable Release

The recent 5.0 major release contained a lot of new features so the Grafana 5.1 release is focused on smoothing out the rough edges and iterating over some of the new features.

Download Grafana 5.1 Now

Release Highlights

There are two new features included, Heatmap Support for Prometheus and a new core data source for Microsoft SQL Server.

Another highlight is the revamp of the Grafana docker container that makes it easier to run and control but be aware there is a breaking change to file permissions that will affect existing containers with data volumes.

We got tons of useful improvement suggestions, bug reports and Pull Requests from our amazing community. Thank you all! See the full changelog for more details.

Improved Scrolling Experience

In Grafana v5.0 we introduced a new scrollbar component. Unfortunately this introduced a lot of issues and in some scenarios removed
the native scrolling functionality. Grafana v5.1 ships with a native scrollbar for all pages together with a scrollbar component for
the dashboard grid and panels that does not override the native scrolling functionality. We hope that these changes and improvements should
make the Grafana user experience much better!

Improved Docker Image

Grafana v5.1 brings an improved official docker image which should make it easier to run and use the Grafana docker image and at the same time give more control to the user how to use/run it.

We have switched the id of the grafana user running Grafana inside a docker container. Unfortunately this means that files created prior to 5.1 will not have the correct permissions for later versions and thereby introduces a breaking change. We made this change so that it would be easier for you to control what user Grafana is executed as.

Please read the updated documentation which includes migration instructions and more information.

Heatmap Support for Prometheus

The Prometheus datasource now supports transforming Prometheus histograms to the heatmap panel. The Prometheus histogram is a powerful feature, and we’re
really happy to finally allow our users to render those as heatmaps. The Heatmap panel documentation
contains more information on how to use it.

Another improvement is that the Prometheus query editor now supports autocomplete for template variables. More information in the Prometheus data source documentation.

Microsoft SQL Server

Grafana v5.1 now ships with a built-in Microsoft SQL Server (MSSQL) data source plugin that allows you to query and visualize data from any
Microsoft SQL Server 2005 or newer, including Microsoft Azure SQL Database. Do you have metric or log data in MSSQL? You can now visualize
that data and define alert rules on it as with any of Grafana’s other core datasources.

The using Microsoft SQL Server in Grafana documentation has more detailed information on how to get started.

Adding New Panels to Dashboards

The control for adding new panels to dashboards now includes panel search and it is also now possible to copy and paste panels between dashboards.

By copying a panel in a dashboard it will be displayed in the Paste tab. When you switch to a new dashboard you can paste the
copied panel.

Align Zero-Line for Right and Left Y-axes

The feature request to align the zero-line for right and left Y-axes on the Graph panel is more than 3 years old. It has finally been implemented – more information in the Graph panel documentation.

Other Highlights

  • Table Panel: New enhancements includes support for mapping a numeric value/range to text and additional units. More information in the Table panel documentation.
  • New variable interpolation syntax: We now support a new option for rendering variables that gives the user full control of how the value(s) should be rendered. More details in the in the Variables documentation.
  • Improved workflow for provisioned dashboards. More details here.

Changelog

Checkout the CHANGELOG.md file for a complete list
of new features, changes, and bug fixes.

Now You Can Create Encrypted Amazon EBS Volumes by Using Your Custom Encryption Keys When You Launch an Amazon EC2 Instance

Post Syndicated from Nishit Nagar original https://aws.amazon.com/blogs/security/create-encrypted-amazon-ebs-volumes-custom-encryption-keys-launch-amazon-ec2-instance-2/

Amazon Elastic Block Store (EBS) offers an encryption solution for your Amazon EBS volumes so you don’t have to build, maintain, and secure your own infrastructure for managing encryption keys for block storage. Amazon EBS encryption uses AWS Key Management Service (AWS KMS) customer master keys (CMKs) when creating encrypted Amazon EBS volumes, providing you all the benefits associated with using AWS KMS. You can specify either an AWS managed CMK or a customer-managed CMK to encrypt your Amazon EBS volume. If you use a customer-managed CMK, you retain granular control over your encryption keys, such as having AWS KMS rotate your CMK every year. To learn more about creating CMKs, see Creating Keys.

In this post, we demonstrate how to create an encrypted Amazon EBS volume using a customer-managed CMK when you launch an EC2 instance from the EC2 console, AWS CLI, and AWS SDK.

Creating an encrypted Amazon EBS volume from the EC2 console

Follow these steps to launch an EC2 instance from the EC2 console with Amazon EBS volumes that are encrypted by customer-managed CMKs:

  1. Sign in to the AWS Management Console and open the EC2 console.
  2. Select Launch instance, and then, in Step 1 of the wizard, select an Amazon Machine Image (AMI).
  3. In Step 2 of the wizard, select an instance type, and then provide additional configuration details in Step 3. For details about configuring your instances, see Launching an Instance.
  4. In Step 4 of the wizard, specify additional EBS volumes that you want to attach to your instances.
  5. To create an encrypted Amazon EBS volume, first add a new volume by selecting Add new volume. Leave the Snapshot column blank.
  6. In the Encrypted column, select your CMK from the drop-down menu. You can also paste the full Amazon Resource Name (ARN) of your custom CMK key ID in this box. To learn more about finding the ARN of a CMK, see Working with Keys.
  7. Select Review and Launch. Your instance will launch with an additional Amazon EBS volume with the key that you selected. To learn more about the launch wizard, see Launching an Instance with Launch Wizard.

Creating Amazon EBS encrypted volumes from the AWS CLI or SDK

You also can use RunInstances to launch an instance with additional encrypted Amazon EBS volumes by setting Encrypted to true and adding kmsKeyID along with the actual key ID in the BlockDeviceMapping object, as shown in the following command:

$> aws ec2 run-instances –image-id ami-b42209de –count 1 –instance-type m4.large –region us-east-1 –block-device-mappings file://mapping.json

In this example, mapping.json describes the properties of the EBS volume that you want to create:


{
"DeviceName": "/dev/sda1",
"Ebs": {
"DeleteOnTermination": true,
"VolumeSize": 100,
"VolumeType": "gp2",
"Encrypted": true,
"kmsKeyID": "arn:aws:kms:us-east-1:012345678910:key/abcd1234-a123-456a-a12b-a123b4cd56ef"
}
}

You can also launch instances with additional encrypted EBS data volumes via an Auto Scaling or Spot Fleet by creating a launch template with the above BlockDeviceMapping. For example:

$> aws ec2 create-launch-template –MyLTName –image-id ami-b42209de –count 1 –instance-type m4.large –region us-east-1 –block-device-mappings file://mapping.json

To learn more about launching an instance with the AWS CLI or SDK, see the AWS CLI Command Reference.

In this blog post, we’ve demonstrated a single-step, streamlined process for creating Amazon EBS volumes that are encrypted under your CMK when you launch your EC2 instance, thereby streamlining your instance launch workflow. To start using this functionality, navigate to the EC2 console.

If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the Amazon EC2 forum or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

AWS AppSync – Production-Ready with Six New Features

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-appsync-production-ready-with-six-new-features/

If you build (or want to build) data-driven web and mobile apps and need real-time updates and the ability to work offline, you should take a look at AWS AppSync. Announced in preview form at AWS re:Invent 2017 and described in depth here, AWS AppSync is designed for use in iOS, Android, JavaScript, and React Native apps. AWS AppSync is built around GraphQL, an open, standardized query language that makes it easy for your applications to request the precise data that they need from the cloud.

I’m happy to announce that the preview period is over and that AWS AppSync is now generally available and production-ready, with six new features that will simplify and streamline your application development process:

Console Log Access – You can now see the CloudWatch Logs entries that are created when you test your GraphQL queries, mutations, and subscriptions from within the AWS AppSync Console.

Console Testing with Mock Data – You can now create and use mock context objects in the console for testing purposes.

Subscription Resolvers – You can now create resolvers for AWS AppSync subscription requests, just as you can already do for query and mutate requests.

Batch GraphQL Operations for DynamoDB – You can now make use of DynamoDB’s batch operations (BatchGetItem and BatchWriteItem) across one or more tables. in your resolver functions.

CloudWatch Support – You can now use Amazon CloudWatch Metrics and CloudWatch Logs to monitor calls to the AWS AppSync APIs.

CloudFormation Support – You can now define your schemas, data sources, and resolvers using AWS CloudFormation templates.

A Brief AppSync Review
Before diving in to the new features, let’s review the process of creating an AWS AppSync API, starting from the console. I click Create API to begin:

I enter a name for my API and (for demo purposes) choose to use the Sample schema:

The schema defines a collection of GraphQL object types. Each object type has a set of fields, with optional arguments:

If I was creating an API of my own I would enter my schema at this point. Since I am using the sample, I don’t need to do this. Either way, I click on Create to proceed:

The GraphQL schema type defines the entry points for the operations on the data. All of the data stored on behalf of a particular schema must be accessible using a path that begins at one of these entry points. The console provides me with an endpoint and key for my API:

It also provides me with guidance and a set of fully functional sample apps that I can clone:

When I clicked Create, AWS AppSync created a pair of Amazon DynamoDB tables for me. I can click Data Sources to see them:

I can also see and modify my schema, issue queries, and modify an assortment of settings for my API.

Let’s take a quick look at each new feature…

Console Log Access
The AWS AppSync Console already allows me to issue queries and to see the results, and now provides access to relevant log entries.In order to see the entries, I must enable logs (as detailed below), open up the LOGS, and check the checkbox. Here’s a simple mutation query that adds a new event. I enter the query and click the arrow to test it:

I can click VIEW IN CLOUDWATCH for a more detailed view:

To learn more, read Test and Debug Resolvers.

Console Testing with Mock Data
You can now create a context object in the console where it will be passed to one of your resolvers for testing purposes. I’ll add a testResolver item to my schema:

Then I locate it on the right-hand side of the Schema page and click Attach:

I choose a data source (this is for testing and the actual source will not be accessed), and use the Put item mapping template:

Then I click Select test context, choose Create New Context, assign a name to my test content, and click Save (as you can see, the test context contains the arguments from the query along with values to be returned for each field of the result):

After I save the new Resolver, I click Test to see the request and the response:

Subscription Resolvers
Your AWS AppSync application can monitor changes to any data source using the @aws_subscribe GraphQL schema directive and defining a Subscription type. The AWS AppSync client SDK connects to AWS AppSync using MQTT over Websockets and the application is notified after each mutation. You can now attach resolvers (which convert GraphQL payloads into the protocol needed by the underlying storage system) to your subscription fields and perform authorization checks when clients attempt to connect. This allows you to perform the same fine grained authorization routines across queries, mutations, and subscriptions.

To learn more about this feature, read Real-Time Data.

Batch GraphQL Operations
Your resolvers can now make use of DynamoDB batch operations that span one or more tables in a region. This allows you to use a list of keys in a single query, read records multiple tables, write records in bulk to multiple tables, and conditionally write or delete related records across multiple tables.

In order to use this feature the IAM role that you use to access your tables must grant access to DynamoDB’s BatchGetItem and BatchPutItem functions.

To learn more, read the DynamoDB Batch Resolvers tutorial.

CloudWatch Logs Support
You can now tell AWS AppSync to log API requests to CloudWatch Logs. Click on Settings and Enable logs, then choose the IAM role and the log level:

CloudFormation Support
You can use the following CloudFormation resource types in your templates to define AWS AppSync resources:

AWS::AppSync::GraphQLApi – Defines an AppSync API in terms of a data source (an Amazon Elasticsearch Service domain or a DynamoDB table).

AWS::AppSync::ApiKey – Defines the access key needed to access the data source.

AWS::AppSync::GraphQLSchema – Defines a GraphQL schema.

AWS::AppSync::DataSource – Defines a data source.

AWS::AppSync::Resolver – Defines a resolver by referencing a schema and a data source, and includes a mapping template for requests.

Here’s a simple schema definition in YAML form:

  AppSyncSchema:
    Type: "AWS::AppSync::GraphQLSchema"
    DependsOn:
      - AppSyncGraphQLApi
    Properties:
      ApiId: !GetAtt AppSyncGraphQLApi.ApiId
      Definition: |
        schema {
          query: Query
          mutation: Mutation
        }
        type Query {
          singlePost(id: ID!): Post
          allPosts: [Post]
        }
        type Mutation {
          putPost(id: ID!, title: String!): Post
        }
        type Post {
          id: ID!
          title: String!
        }

Available Now
These new features are available now and you can start using them today! Here are a couple of blog posts and other resources that you might find to be of interest:

Jeff;

 

 

Newly released guide provides Australian public sector the ability to evaluate AWS at PROTECTED level

Post Syndicated from Oliver Bell original https://aws.amazon.com/blogs/security/newly-released-guide-provides-australian-public-sector-the-ability-to-evaluate-aws-at-protected-level/

Australian public sector customers now have a clear roadmap to use our secure services for sensitive workloads at the PROTECTED level. For the first time, we’ve released our Information Security Registered Assessors Program (IRAP) PROTECTED documentation via AWS Artifact. This information provides the ability to plan, architect, and self-assess systems built in AWS under the Digital Transformation Agency’s Secure Cloud Guidelines.

In short, this documentation gives public sector customers everything needed to evaluate AWS at the PROTECTED level. And we’re making this resource available to download on-demand through AWS Artifact. When you download the guide, you’ll find a mapping of how AWS meets each requirement to securely and compliantly process PROTECTED data.

With the AWS IRAP PROTECTED documentation, the process of adopting our secure services has never been easier. The information enables individual agencies to complete their own assessments and adopt AWS, but we also continue to work with the Australian Signals Directorate to include our services at the PROTECTED level on the Certified Cloud Services List.

Meanwhile, we’re also excited to announce that there are now 46 services in scope, which mean more options to build secure and innovative solutions, while also saving money and gaining the productivity of the cloud.

If you have questions about this announcement or would like to inquire about how to use AWS for your regulated workloads, contact your account team.

Real-Time Hotspot Detection in Amazon Kinesis Analytics

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/real-time-hotspot-detection-in-amazon-kinesis-analytics/

Today we’re releasing a new machine learning feature in Amazon Kinesis Data Analytics for detecting “hotspots” in your streaming data. We launched Kinesis Data Analytics in August of 2016 and we’ve continued to add features since. As you may already know, Kinesis Data Analytics is a fully managed real-time processing engine for streaming data that lets you write SQL queries to derive meaning from your data and output the results to Kinesis Data Firehose, Kinesis Data Streams, or even an AWS Lambda function. The new HOTSPOT function adds to the existing machine learning capabilities in Kinesis that allow customers to leverage unsupervised streaming based machine learning algorithms. Customers don’t need to be experts in data science or machine learning to take advantage of these capabilities.

Hotspots

The HOTSPOTS function is a new Kinesis Data Analytics SQL function you can use to idenitfy relatively dense regions in your data without having to explicity build and train complicated machine learning models. You can identify subsections of your data that need immediate attention and take action programatically by streaming the hotspots out to a Kinesis Data stream, to a Firehose delivery stream, or by invoking a AWS Lambda function.

There are a ton of really cool scenarios where this could make your operations easier. Imagine a ride-share program or autonomous vehicle fleet communicating spatiotemporal data about traffic jams and congestion, or a datacenter where a number of servers start to overheat indicating an HVAC issue. HOTSPOTS is not limited to spatiotemporal data and you could apply it across many problem domains.

The function follows some simple syntax and accepts the DOUBLE, INTEGER, FLOAT, TINYINT, SMALLINT, REAL, and BIGINT data types.

The HOTSPOT function takes a cursor as input and returns a JSON string describing the hotspot. This will be easier to understand with an example.

Using Kinesis Data Analytics to Detect Hotspots

Let’s take a simple data set from NY Taxi and Limousine Commission that tracks yellow cab pickup and dropoff locations. Most of this data is already on S3 and publicly accessible at s3://nyc-tlc/. We will create a small python script to load our Kinesis Data Stream with Taxi records which will feed our Kinesis Data Analytics. Finally we’ll output all of this to a Kinesis Data Firehose connected to an Amazon Elasticsearch Service cluster for visualization with Kibana. I know from living in New York for 5 years that we’ll probably find a hotspot or two in this data.

First, we’ll create an input Kinesis stream and start sending our NYC Taxi Ride data into it. I just wrote a quick python script to read from one of the CSV files and used boto3 to push the records into Kinesis. You can put the record in whatever way works for you.

 

import csv
import json
import boto3
def chunkit(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

kinesis = boto3.client("kinesis")
with open("taxidata2.csv") as f:
    reader = csv.DictReader(f)
    records = chunkit([{"PartitionKey": "taxis", "Data": json.dumps(row)} for row in reader], 500)
    for chunk in records:
        kinesis.put_records(StreamName="TaxiData", Records=chunk)

Next, we’ll create the Kinesis Data Analytics application and add our input stream with our taxi data as the source.

Next we’ll automatically detect the schema.

Now we’ll create a quick SQL Script to detect our hotspots and add that to the Real Time Analytics section of our application.

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "pickup_longitude" DOUBLE,
    "pickup_latitude" DOUBLE,
    HOTSPOTS_RESULT VARCHAR(10000)
); 
CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM" 
    SELECT "pickup_longitude", "pickup_latitude", "HOTSPOTS_RESULT" FROM
        TABLE(HOTSPOTS(
            CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001"),
            1000,
            0.013,
            20
        )
    );


Our HOTSPOTS function takes an input stream, a window size, scan radius, and a minimum number of points to count as a hotspot. The values for these are application dependent but you can tinker with them in the console easily until you get the results you want. There are more details about the parameters themselves in the documentation. The HOTSPOTS_RESULT returns some useful JSON that would let us plot bounding boxes around our hotspots:

{
  "hotspots": [
    {
      "density": "elided",
      "minValues": [40.7915039, -74.0077401],
      "maxValues": [40.7915041, -74.0078001]
    }
  ]
}

 

When we have our desired results we can save the script and connect our application to our Amazon Elastic Search Service Firehose Delivery Stream. We can run an intermediate lambda function in the firehose to transform our record into a format more suitable for geographic work. Then we can update our mapping in Elasticsearch to index the hotspot objects as Geo-Shapes.

Finally, we can connect to Kibana and visualize the results.

Looks like Manhattan is pretty busy!

Available Now
This feature is available now in all existing regions with Kinesis Data Analytics. I think this is a really interesting new feature of Kinesis Data Analytics that can bring immediate value to many applications. Let us know what you build with it on Twitter or in the comments!

Randall

LED cubes and how to map them

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/led-panel-cubes-and-how-to-build-them/

Taking inspiration from a cube he had filmed at the 34th Chaos Communication Congress in Leipzig, Germany, polyfloyd gathered friends Sebastius and Boekenwuurm together to create their own.

The build

As polyfloyd’s blog post for the project notes, Sebastius led the way with the hardware portion of the build. The cube is made from six LED panels driven by a Raspberry Pi, and uses a breakout board to support the panels, which are connected in pairs:

The displays are connected in 3 chains, the maximum number of parallel chains the board supports, of 2 panels each. Having a higher degree of parallelization increases the refresh rate which in turn improves the overall image quality.

The first two chains make up the 4 sides. The remaining chain makes up the top and bottom of the cube.

Sebastius removed the plastic frames that come as standard on the panels, in order to allow them to fit together snugly as a cube. He designed and laser-cut a custom frame from plywood to support the panels instead.

Raspberry Pi LED Cube

Software

The team used hzeller’s software to drive the panels, and polyfloyd wrote their own program to “shove the pixels around”. polyfloyd used Ledcat, software they had made to drive previous LED projects, and adapted this interface so programs written for Ledcat would also work with hzeller’s library.

The full code for the project can be found on polyfloyd’s GitHub profile. It includes the ability to render animations to gzipped files, and to stream animations in real time via SSH.

Mapping 2D and spherical images with shaders

“One of the programs that could work with my LED-panels through [Unix] pipes was Shady,” observes polyfloyd, explaining the use of shaders with the cube. “The program works by rendering OpenGL fragment shaders to an RGB24 format which could then be piped to wherever needed. These shaders are small programs that can render an image by calculating the color for each pixel on the screen individually.”

The team programmed a shader to map the two-dimensional position of pixels in an image to the three-dimensional space of the cube. This then allowed the team to apply the mapping to spherical images, such as the globe in the video below:

The team has interesting plans for the cube moving forward, including the addition of an accelerometer and batteries. Follow their progress on the polyfloyd blog.

Fun with LED panels

The internet is full of amazing Raspberry Pi projects that use LED panels. This recent project available on Instructables shows how to assemble and set up a particle generator, while this one, featured on this blog last year, tracks emojis used on the Chelsea Handler: Gotta Go! app.

The post LED cubes and how to map them appeared first on Raspberry Pi.

Migrating Your Amazon ECS Containers to AWS Fargate

Post Syndicated from Tiffany Jernigan original https://aws.amazon.com/blogs/compute/migrating-your-amazon-ecs-containers-to-aws-fargate/

AWS Fargate is a new technology that works with Amazon Elastic Container Service (ECS) to run containers without having to manage servers or clusters. What does this mean? With Fargate, you no longer need to provision or manage a single virtual machine; you can just create tasks and run them directly!

Fargate uses the same API actions as ECS, so you can use the ECS console, the AWS CLI, or the ECS CLI. I recommend running through the first-run experience for Fargate even if you’re familiar with ECS. It creates all of the one-time setup requirements, such as the necessary IAM roles. If you’re using a CLI, make sure to upgrade to the latest version

In this blog, you will see how to migrate ECS containers from running on Amazon EC2 to Fargate.

Getting started

Note: Anything with code blocks is a change in the task definition file. Screen captures are from the console. Additionally, Fargate is currently available in the us-east-1 (N. Virginia) region.

Launch type

When you create tasks (grouping of containers) and clusters (grouping of tasks), you now have two launch type options: EC2 and Fargate. The default launch type, EC2, is ECS as you knew it before the announcement of Fargate. You need to specify Fargate as the launch type when running a Fargate task.

Even though Fargate abstracts away virtual machines, tasks still must be launched into a cluster. With Fargate, clusters are a logical infrastructure and permissions boundary that allow you to isolate and manage groups of tasks. ECS also supports heterogeneous clusters that are made up of tasks running on both EC2 and Fargate launch types.

The optional, new requiresCompatibilities parameter with FARGATE in the field ensures that your task definition only passes validation if you include Fargate-compatible parameters. Tasks can be flagged as compatible with EC2, Fargate, or both.

"requiresCompatibilities": [
    "FARGATE"
]

Networking

"networkMode": "awsvpc"

In November, we announced the addition of task networking with the network mode awsvpc. By default, ECS uses the bridge network mode. Fargate requires using the awsvpc network mode.

In bridge mode, all of your tasks running on the same instance share the instance’s elastic network interface, which is a virtual network interface, IP address, and security groups.

The awsvpc mode provides this networking support to your tasks natively. You now get the same VPC networking and security controls at the task level that were previously only available with EC2 instances. Each task gets its own elastic networking interface and IP address so that multiple applications or copies of a single application can run on the same port number without any conflicts.

The awsvpc mode also provides a separation of responsibility for tasks. You can get complete control of task placement within your own VPCs, subnets, and the security policies associated with them, even though the underlying infrastructure is managed by Fargate. Also, you can assign different security groups to each task, which gives you more fine-grained security. You can give an application only the permissions it needs.

"portMappings": [
    {
        "containerPort": "3000"
    }
 ]

What else has to change? First, you only specify a containerPort value, not a hostPort value, as there is no host to manage. Your container port is the port that you access on your elastic network interface IP address. Therefore, your container ports in a single task definition file need to be unique.

"environment": [
    {
        "name": "WORDPRESS_DB_HOST",
        "value": "127.0.0.1:3306"
    }
 ]

Additionally, links are not allowed as they are a property of the “bridge” network mode (and are now a legacy feature of Docker). Instead, containers share a network namespace and communicate with each other over the localhost interface. They can be referenced using the following:

localhost/127.0.0.1:<some_port_number>

CPU and memory

"memory": "1024",
 "cpu": "256"

"memory": "1gb",
 "cpu": ".25vcpu"

When launching a task with the EC2 launch type, task performance is influenced by the instance types that you select for your cluster combined with your task definition. If you pick larger instances, your applications make use of the extra resources if there is no contention.

In Fargate, you needed a way to get additional resource information so we created task-level resources. Task-level resources define the maximum amount of memory and cpu that your task can consume.

  • memory can be defined in MB with just the number, or in GB, for example, “1024” or “1gb”.
  • cpu can be defined as the number or in vCPUs, for example, “256” or “.25vcpu”.
    • vCPUs are virtual CPUs. You can look at the memory and vCPUs for instance types to get an idea of what you may have used before.

The memory and CPU options available with Fargate are:

CPU Memory
256 (.25 vCPU) 0.5GB, 1GB, 2GB
512 (.5 vCPU) 1GB, 2GB, 3GB, 4GB
1024 (1 vCPU) 2GB, 3GB, 4GB, 5GB, 6GB, 7GB, 8GB
2048 (2 vCPU) Between 4GB and 16GB in 1GB increments
4096 (4 vCPU) Between 8GB and 30GB in 1GB increments

IAM roles

Because Fargate uses awsvpc mode, you need an Amazon ECS service-linked IAM role named AWSServiceRoleForECS. It provides Fargate with the needed permissions, such as the permission to attach an elastic network interface to your task. After you create your service-linked IAM role, you can delete the remaining roles in your services.

"executionRoleArn": "arn:aws:iam::<your_account_id>:role/ecsTaskExecutionRole"

With the EC2 launch type, an instance role gives the agent the ability to pull, publish, talk to ECS, and so on. With Fargate, the task execution IAM role is only needed if you’re pulling from Amazon ECR or publishing data to Amazon CloudWatch Logs.

The Fargate first-run experience tutorial in the console automatically creates these roles for you.

Volumes

Fargate currently supports non-persistent, empty data volumes for containers. When you define your container, you no longer use the host field and only specify a name.

Load balancers

For awsvpc mode, and therefore for Fargate, use the IP target type instead of the instance target type. You define this in the Amazon EC2 service when creating a load balancer.

If you’re using a Classic Load Balancer, change it to an Application Load Balancer or a Network Load Balancer.

Tip: If you are using an Application Load Balancer, make sure that your tasks are launched in the same VPC and Availability Zones as your load balancer.

Let’s migrate a task definition!

Here is an example NGINX task definition. This type of task definition is what you’re used to if you created one before Fargate was announced. It’s what you would run now with the EC2 launch type.

{
    "containerDefinitions": [
        {
            "name": "nginx",
            "image": "nginx",
            "memory": "512",
            "cpu": "100",
            "essential": true,
            "portMappings": [
                {
                    "hostPort": "80",
                    "containerPort": "80",
                    "protocol": "tcp"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "ecs"
                }
            }
        }
    ],
    "family": "nginx-ec2"
}

OK, so now what do you need to do to change it to run with the Fargate launch type?

  • Add FARGATE for requiredCompatibilities (not required, but a good safety check for your task definition).
  • Use awsvpc as the network mode.
  • Just specify the containerPort (the hostPortvalue is the same).
  • Add a task executionRoleARN value to allow logging to CloudWatch.
  • Provide cpu and memory limits for the task.
{
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "containerDefinitions": [
        {
            "name": "nginx",
            "image": "nginx",
            "memory": "512",
            "cpu": "100",
            "essential": true,
            "portMappings": [
                {
                    "containerPort": "80",
                    "protocol": "tcp"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "ecs"
                }
            }
        }
    ],
    "networkMode": "awsvpc",
    "executionRoleArn": "arn:aws:iam::<your_account_id>:role/ecsTaskExecutionRole",
    "family": "nginx-fargate",
    "memory": "512",
    "cpu": "256"
}

Are there more examples?

Yep! Head to the AWS Samples GitHub repo. We have several sample task definitions you can try for both the EC2 and Fargate launch types. Contributions are very welcome too :).

 

tiffany jernigan
@tiffanyfayj

Build a Multi-Tenant Amazon EMR Cluster with Kerberos, Microsoft Active Directory Integration and EMRFS Authorization

Post Syndicated from Songzhi Liu original https://aws.amazon.com/blogs/big-data/build-a-multi-tenant-amazon-emr-cluster-with-kerberos-microsoft-active-directory-integration-and-emrfs-authorization/

One of the challenges faced by our customers—especially those in highly regulated industries—is balancing the need for security with flexibility. In this post, we cover how to enable multi-tenancy and increase security by using EMRFS (EMR File System) authorization, the Amazon S3 storage-level authorization on Amazon EMR.

Amazon EMR is an easy, fast, and scalable analytics platform enabling large-scale data processing. EMRFS authorization provides Amazon S3 storage-level authorization by configuring EMRFS with multiple IAM roles. With this functionality enabled, different users and groups can share the same cluster and assume their own IAM roles respectively.

Simply put, on Amazon EMR, we can now have an Amazon EC2 role per user assumed at run time instead of one general EC2 role at the cluster level. When the user is trying to access Amazon S3 resources, Amazon EMR evaluates against a predefined mappings list in EMRFS configurations and picks up the right role for the user.

In this post, we will discuss what EMRFS authorization is (Amazon S3 storage-level access control) and show how to configure the role mappings with detailed examples. You will then have the desired permissions in a multi-tenant environment. We also demo Amazon S3 access from HDFS command line, Apache Hive on Hue, and Apache Spark.

EMRFS authorization for Amazon S3

There are two prerequisites for using this feature:

  1. Users must be authenticated, because EMRFS needs to map the current user/group/prefix to a predefined user/group/prefix. There are several authentication options. In this post, we launch a Kerberos-enabled cluster that manages the Key Distribution Center (KDC) on the master node, and enable a one-way trust from the KDC to a Microsoft Active Directory domain.
  2. The application must support accessing Amazon S3 via Applications that have their own S3FileSystem APIs (for example, Presto) are not supported at this time.

EMRFS supports three types of mapping entries: user, group, and Amazon S3 prefix. Let’s use an example to show how this works.

Assume that you have the following three identities in your organization, and they are defined in the Active Directory:

To enable all these groups and users to share the EMR cluster, you need to define the following IAM roles:

In this case, you create a separate Amazon EC2 role that doesn’t give any permission to Amazon S3. Let’s call the role the base role (the EC2 role attached to the EMR cluster), which in this example is named EMR_EC2_RestrictedRole. Then, you define all the Amazon S3 permissions for each specific user or group in their own roles. The restricted role serves as the fallback role when the user doesn’t belong to any user/group, nor does the user try to access any listed Amazon S3 prefixes defined on the list.

Important: For all other roles, like emrfs_auth_group_role_data_eng, you need to add the base role (EMR_EC2_RestrictedRole) as the trusted entity so that it can assume other roles. See the following example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::511586466501:role/EMR_EC2_RestrictedRole"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

The following is an example policy for the admin user role (emrfs_auth_user_role_admin_user):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": "*"
        }
    ]
}

We are assuming the admin user has access to all buckets in this example.

The following is an example policy for the data science group role (emrfs_auth_group_role_data_sci):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::emrfs-auth-data-science-bucket-demo/*",
                "arn:aws:s3:::emrfs-auth-data-science-bucket-demo"
            ],
            "Action": [
                "s3:*"
            ]
        }
    ]
}

This role grants all Amazon S3 permissions to the emrfs-auth-data-science-bucket-demo bucket and all the objects in it. Similarly, the policy for the role emrfs_auth_group_role_data_eng is shown below:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::emrfs-auth-data-engineering-bucket-demo/*",
                "arn:aws:s3:::emrfs-auth-data-engineering-bucket-demo"
            ],
            "Action": [
                "s3:*"
            ]
        }
    ]
}

Example role mappings configuration

To configure EMRFS authorization, you use EMR security configuration. Here is the configuration we use in this post

Consider the following scenario.

First, the admin user admin1 tries to log in and run a command to access Amazon S3 data through EMRFS. The first role emrfs_auth_user_role_admin_user on the mapping list, which is a user role, is mapped and picked up. Then admin1 has access to the Amazon S3 locations that are defined in this role.

Then a user from the data engineer group (grp_data_engineering) tries to access a data bucket to run some jobs. When EMRFS sees that the user is a member of the grp_data_engineering group, the group role emrfs_auth_group_role_data_eng is assumed, and the user has proper access to Amazon S3 that is defined in the emrfs_auth_group_role_data_eng role.

Next, the third user comes, who is not an admin and doesn’t belong to any of the groups. After failing evaluation of the top three entries, EMRFS evaluates whether the user is trying to access a certain Amazon S3 prefix defined in the last mapping entry. This type of mapping entry is called the prefix type. If the user is trying to access s3://emrfs-auth-default-bucket-demo/, then the prefix mapping is in effect, and the prefix role emrfs_auth_prefix_role_default_s3_prefix is assumed.

If the user is not trying to access any of the Amazon S3 paths that are defined on the list—which means it failed the evaluation of all the entries—it only has the permissions defined in the EMR_EC2RestrictedRole. This role is assumed by the EC2 instances in the cluster.

In this process, all the mappings defined are evaluated in the defined order, and the first role that is mapped is assumed, and the rest of the list is skipped.

Setting up an EMR cluster and mapping Active Directory users and groups

Now that we know how EMRFS authorization role mapping works, the next thing we need to think about is how we can use this feature in an easy and manageable way.

Active Directory setup

Many customers manage their users and groups using Microsoft Active Directory or other tools like OpenLDAP. In this post, we create the Active Directory on an Amazon EC2 instance running Windows Server and create the users and groups we will be using in the example below. After setting up Active Directory, we use the Amazon EMR Kerberos auto-join capability to establish a one-way trust from the KDC running on the EMR master node to the Active Directory domain on the EC2 instance. You can use your own directory services as long as it talks to the LDAP (Lightweight Directory Access Protocol).

To create and join Active Directory to Amazon EMR, follow the steps in the blog post Use Kerberos Authentication to Integrate Amazon EMR with Microsoft Active Directory.

After configuring Active Directory, you can create all the users and groups using the Active Directory tools and add users to appropriate groups. In this example, we created users like admin1, dataeng1, datascientist1, grp_data_engineering, and grp_data_science, and then add the users to the right groups.

Join the EMR cluster to an Active Directory domain

For clusters with Kerberos, Amazon EMR now supports automated Active Directory domain joins. You can use the security configuration to configure the one-way trust from the KDC to the Active Directory domain. You also configure the EMRFS role mappings in the same security configuration.

The following is an example of the EMR security configuration with a trusted Active Directory domain EMRKRB.TEST.COM and the EMRFS role mappings as we discussed earlier:

The EMRFS role mapping configuration is shown in this example:

We will also provide an example AWS CLI command that you can run.

Launching the EMR cluster and running the tests

Now you have configured Kerberos and EMRFS authorization for Amazon S3.

Additionally, you need to configure Hue with Active Directory using the Amazon EMR configuration API in order to log in using the AD users created before. The following is an example of Hue AD configuration.

[
  {
    "Classification":"hue-ini",
    "Properties":{

    },
    "Configurations":[
      {
        "Classification":"desktop",
        "Properties":{

        },
        "Configurations":[
          {
            "Classification":"ldap",
            "Properties":{

            },
            "Configurations":[
              {
                "Classification":"ldap_servers",
                "Properties":{

                },
                "Configurations":[
                  {
                    "Classification":"AWS",
                    "Properties":{
                      "base_dn":"DC=emrkrb,DC=test,DC=com",
                      "ldap_url":"ldap://emrkrb.test.com",
                      "search_bind_authentication":"false",
                      "bind_dn":"CN=adjoiner,CN=users,DC=emrkrb,DC=test,DC=com",
                      "bind_password":"Abc123456",
                      "create_users_on_login":"true",
                      "nt_domain":"emrkrb.test.com"
                    },
                    "Configurations":[

                    ]
                  }
                ]
              }
            ]
          },
          {
            "Classification":"auth",
            "Properties":{
              "backend":"desktop.auth.backend.LdapBackend"
            },
            "Configurations":[

            ]
          }
        ]
      }
    ]
  }

Note: In the preceding configuration JSON file, change the values as required before pasting it into the software setting section in the Amazon EMR console.

Now let’s use this configuration and the security configuration you created before to launch the cluster.

In the Amazon EMR console, choose Create cluster. Then choose Go to advanced options. On the Step1: Software and Steps page, under Edit software settings (optional), paste the configuration in the box.

The rest of the setup is the same as an ordinary cluster setup, except in the Security Options section. In Step 4: Security, under Permissions, choose Custom, and then choose the RestrictedRole that you created before.

Choose the appropriate subnets (these should meet the base requirement in order for a successful Active Directory join—see the Amazon EMR Management Guide for more details), and choose the appropriate security groups to make sure it talks to the Active Directory. Choose a key so that you can log in and configure the cluster.

Most importantly, choose the security configuration that you created earlier to enable Kerberos and EMRFS authorization for Amazon S3.

You can use the following AWS CLI command to create a cluster.

aws emr create-cluster --name "TestEMRFSAuthorization" \ 
--release-label emr-5.10.0 \ --instance-type m3.xlarge \ 
--instance-count 3 \ 
--ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,KeyName=MyEC2KeyPair \ --service-role EMR_DefaultRole \ 
--security-configuration MyKerberosConfig \ 
--configurations file://hue-config.json \
--applications Name=Hadoop Name=Hive Name=Hue Name=Spark \ 
--kerberos-attributes Realm=EC2.INTERNAL, \ KdcAdminPassword=<YourClusterKDCAdminPassword>, \ ADDomainJoinUser=<YourADUserLogonName>,ADDomainJoinPassword=<YourADUserPassword>, \ 
CrossRealmTrustPrincipalPassword=<MatchADTrustPwd>

Note: If you create the cluster using CLI, you need to save the JSON configuration for Hue into a file named hue-config.json and place it on the server where you run the CLI command.

After the cluster gets into the Waiting state, try to connect by using SSH into the cluster using the Active Directory user name and password.

ssh -l [email protected] <EMR IP or DNS name>

Quickly run two commands to show that the Active Directory join is successful:

  1. id [user name] shows the mapped AD users and groups in Linux.
  2. hdfs groups [user name] shows the mapped group in Hadoop.

Both should return the current Active Directory user and group information if the setup is correct.

Now, you can test the user mapping first. Log in with the admin1 user, and run a Hadoop list directory command:

hadoop fs -ls s3://emrfs-auth-data-science-bucket-demo/

Now switch to a user from the data engineer group.

Retry the previous command to access the admin’s bucket. It should throw an Amazon S3 Access Denied exception.

When you try listing the Amazon S3 bucket that a data engineer group member has accessed, it triggers the group mapping.

hadoop fs -ls s3://emrfs-auth-data-engineering-bucket-demo/

It successfully returns the listing results. Next we will test Apache Hive and then Apache Spark.

 

To run jobs successfully, you need to create a home directory for every user in HDFS for staging data under /user/<username>. Users can configure a step to create a home directory at cluster launch time for every user who has access to the cluster. In this example, you use Hue since Hue will create the home directory in HDFS for the user at the first login. Here Hue also needs to be integrated with the same Active Directory as explained in the example configuration described earlier.

First, log in to Hue as a data engineer user, and open a Hive Notebook in Hue. Then run a query to create a new table pointing to the data engineer bucket, s3://emrfs-auth-data-engineering-bucket-demo/table1_data_eng/.

You can see that the table was created successfully. Now try to create another table pointing to the data science group’s bucket, where the data engineer group doesn’t have access.

It failed and threw an Amazon S3 Access Denied error.

Now insert one line of data into the successfully create table.

Next, log out, switch to a data science group user, and create another table, test2_datasci_tb.

The creation is successful.

The last task is to test Spark (it requires the user directory, but Hue created one in the previous step).

Now let’s come back to the command line and run some Spark commands.

Login to the master node using the datascientist1 user:

Start the SparkSQL interactive shell by typing spark-sql, and run the show tables command. It should list the tables that you created using Hive.

As a data science group user, try select on both tables. You will find that you can only select the table defined in the location that your group has access to.

Conclusion

EMRFS authorization for Amazon S3 enables you to have multiple roles on the same cluster, providing flexibility to configure a shared cluster for different teams to achieve better efficiency. The Active Directory integration and group mapping make it much easier for you to manage your users and groups, and provides better auditability in a multi-tenant environment.


Additional Reading

If you found this post useful, be sure to check out Use Kerberos Authentication to Integrate Amazon EMR with Microsoft Active Directory and Launching and Running an Amazon EMR Cluster inside a VPC.


About the Authors

Songzhi Liu is a Big Data Consultant with AWS Professional Services. He works closely with AWS customers to provide them Big Data & Machine Learning solutions and best practices on the Amazon cloud.

 

 

 

 

Task Networking in AWS Fargate

Post Syndicated from Nathan Peck original https://aws.amazon.com/blogs/compute/task-networking-in-aws-fargate/

AWS Fargate is a technology that allows you to focus on running your application without needing to provision, monitor, or manage the underlying compute infrastructure. You package your application into a Docker container that you can then launch using your container orchestration tool of choice.

Fargate allows you to use containers without being responsible for Amazon EC2 instances, similar to how EC2 allows you to run VMs without managing physical infrastructure. Currently, Fargate provides support for Amazon Elastic Container Service (Amazon ECS). Support for Amazon Elastic Container Service for Kubernetes (Amazon EKS) will be made available in the near future.

Despite offloading the responsibility for the underlying instances, Fargate still gives you deep control over configuration of network placement and policies. This includes the ability to use many networking fundamentals such as Amazon VPC and security groups.

This post covers how to take advantage of the different ways of networking your containers in Fargate when using ECS as your orchestration platform, with a focus on how to do networking securely.

The first step to running any application in Fargate is defining an ECS task for Fargate to launch. A task is a logical group of one or more Docker containers that are deployed with specified settings. When running a task in Fargate, there are two different forms of networking to consider:

  • Container (local) networking
  • External networking

Container Networking

Container networking is often used for tightly coupled application components. Perhaps your application has a web tier that is responsible for serving static content as well as generating some dynamic HTML pages. To generate these dynamic pages, it has to fetch information from another application component that has an HTTP API.

One potential architecture for such an application is to deploy the web tier and the API tier together as a pair and use local networking so the web tier can fetch information from the API tier.

If you are running these two components as two processes on a single EC2 instance, the web tier application process could communicate with the API process on the same machine by using the local loopback interface. The local loopback interface has a special IP address of 127.0.0.1 and hostname of localhost.

By making a networking request to this local interface, it bypasses the network interface hardware and instead the operating system just routes network calls from one process to the other directly. This gives the web tier a fast and efficient way to fetch information from the API tier with almost no networking latency.

In Fargate, when you launch multiple containers as part of a single task, they can also communicate with each other over the local loopback interface. Fargate uses a special container networking mode called awsvpc, which gives all the containers in a task a shared elastic network interface to use for communication.

If you specify a port mapping for each container in the task, then the containers can communicate with each other on that port. For example the following task definition could be used to deploy the web tier and the API tier:

{
  "family": "myapp"
  "containerDefinitions": [
    {
      "name": "web",
      "image": "my web image url",
      "portMappings": [
        {
          "containerPort": 80
        }
      ],
      "memory": 500,
      "cpu": 10,
      "esssential": true
    },
    {
      "name": "api",
      "image": "my api image url",
      "portMappings": [
        {
          "containerPort": 8080
        }
      ],
      "cpu": 10,
      "memory": 500,
      "essential": true
    }
  ]
}

ECS, with Fargate, is able to take this definition and launch two containers, each of which is bound to a specific static port on the elastic network interface for the task.

Because each Fargate task has its own isolated networking stack, there is no need for dynamic ports to avoid port conflicts between different tasks as in other networking modes. The static ports make it easy for containers to communicate with each other. For example, the web container makes a request to the API container using its well-known static port:

curl 127.0.0.1:8080/my-endpoint

This sends a local network request, which goes directly from one container to the other over the local loopback interface without traversing the network. This deployment strategy allows for fast and efficient communication between two tightly coupled containers. But most application architectures require more than just internal local networking.

External Networking

External networking is used for network communications that go outside the task to other servers that are not part of the task, or network communications that originate from other hosts on the internet and are directed to the task.

Configuring external networking for a task is done by modifying the settings of the VPC in which you launch your tasks. A VPC is a fundamental tool in AWS for controlling the networking capabilities of resources that you launch on your account.

When setting up a VPC, you create one or more subnets, which are logical groups that your resources can be placed into. Each subnet has an Availability Zone and its own route table, which defines rules about how network traffic operates for that subnet. There are two main types of subnets: public and private.

Public subnets

A public subnet is a subnet that has an associated internet gateway. Fargate tasks in that subnet are assigned both private and public IP addresses:


A browser or other client on the internet can send network traffic to the task via the internet gateway using its public IP address. The tasks can also send network traffic to other servers on the internet because the route table can route traffic out via the internet gateway.

If tasks want to communicate directly with each other, they can use each other’s private IP address to send traffic directly from one to the other so that it stays inside the subnet without going out to the internet gateway and back in.

Private subnets

A private subnet does not have direct internet access. The Fargate tasks inside the subnet don’t have public IP addresses, only private IP addresses. Instead of an internet gateway, a network address translation (NAT) gateway is attached to the subnet:

 

There is no way for another server or client on the internet to reach your tasks directly, because they don’t even have an address or a direct route to reach them. This is a great way to add another layer of protection for internal tasks that handle sensitive data. Those tasks are protected and can’t receive any inbound traffic at all.

In this configuration, the tasks can still communicate to other servers on the internet via the NAT gateway. They would appear to have the IP address of the NAT gateway to the recipient of the communication. If you run a Fargate task in a private subnet, you must add this NAT gateway. Otherwise, Fargate can’t make a network request to Amazon ECR to download the container image, or communicate with Amazon CloudWatch to store container metrics.

Load balancers

If you are running a container that is hosting internet content in a private subnet, you need a way for traffic from the public to reach the container. This is generally accomplished by using a load balancer such as an Application Load Balancer or a Network Load Balancer.

ECS integrates tightly with AWS load balancers by automatically configuring a service-linked load balancer to send network traffic to containers that are part of the service. When each task starts, the IP address of its elastic network interface is added to the load balancer’s configuration. When the task is being shut down, network traffic is safely drained from the task before removal from the load balancer.

To get internet traffic to containers using a load balancer, the load balancer is placed into a public subnet. ECS configures the load balancer to forward traffic to the container tasks in the private subnet:

This configuration allows your tasks in Fargate to be safely isolated from the rest of the internet. They can still initiate network communication with external resources via the NAT gateway, and still receive traffic from the public via the Application Load Balancer that is in the public subnet.

Another potential use case for a load balancer is for internal communication from one service to another service within the private subnet. This is typically used for a microservice deployment, in which one service such as an internet user account service needs to communicate with an internal service such as a password service. Obviously, it is undesirable for the password service to be directly accessible on the internet, so using an internet load balancer would be a major security vulnerability. Instead, this can be accomplished by hosting an internal load balancer within the private subnet:

With this approach, one container can distribute requests across an Auto Scaling group of other private containers via the internal load balancer, ensuring that the network traffic stays safely protected within the private subnet.

Best Practices for Fargate Networking

Determine whether you should use local task networking

Local task networking is ideal for communicating between containers that are tightly coupled and require maximum networking performance between them. However, when you deploy one or more containers as part of the same task they are always deployed together so it removes the ability to independently scale different types of workload up and down.

In the example of the application with a web tier and an API tier, it may be the case that powering the application requires only two web tier containers but 10 API tier containers. If local container networking is used between these two container types, then an extra eight unnecessary web tier containers would end up being run instead of allowing the two different services to scale independently.

A better approach would be to deploy the two containers as two different services, each with its own load balancer. This allows clients to communicate with the two web containers via the web service’s load balancer. The web service could distribute requests across the eight backend API containers via the API service’s load balancer.

Run internet tasks that require internet access in a public subnet

If you have tasks that require internet access and a lot of bandwidth for communication with other services, it is best to run them in a public subnet. Give them public IP addresses so that each task can communicate with other services directly.

If you run these tasks in a private subnet, then all their outbound traffic has to go through an NAT gateway. AWS NAT gateways support up to 10 Gbps of burst bandwidth. If your bandwidth requirements go over this, then all task networking starts to get throttled. To avoid this, you could distribute the tasks across multiple private subnets, each with their own NAT gateway. It can be easier to just place the tasks into a public subnet, if possible.

Avoid using a public subnet or public IP addresses for private, internal tasks

If you are running a service that handles private, internal information, you should not put it into a public subnet or use a public IP address. For example, imagine that you have one task, which is an API gateway for authentication and access control. You have another background worker task that handles sensitive information.

The intended access pattern is that requests from the public go to the API gateway, which then proxies request to the background task only if the request is from an authenticated user. If the background task is in a public subnet and has a public IP address, then it could be possible for an attacker to bypass the API gateway entirely. They could communicate directly to the background task using its public IP address, without being authenticated.

Conclusion

Fargate gives you a way to run containerized tasks directly without managing any EC2 instances, but you still have full control over how you want networking to work. You can set up containers to talk to each other over the local network interface for maximum speed and efficiency. For running workloads that require privacy and security, use a private subnet with public internet access locked down. Or, for simplicity with an internet workload, you can just use a public subnet and give your containers a public IP address.

To deploy one of these Fargate task networking approaches, check out some sample CloudFormation templates showing how to configure the VPC, subnets, and load balancers.

If you have questions or suggestions, please comment below.