Tag Archives: Amazon Simple Storage Service

New – SES Dedicated IP Pools

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/new-ses-dedicated-ip-pools/

Today we released Dedicated IP Pools for Amazon Simple Email Service (SES). With dedicated IP pools, you can specify which dedicated IP addresses to use for sending different types of email. Dedicated IP pools let you devote different sets of IPs to different sending tasks. For instance, you can send transactional emails from one set of IPs and marketing emails from another set of IPs.

If you’re not familiar with Amazon SES, these concepts may not make much sense. We haven’t had the chance to cover SES on this blog since 2016, which is a shame, so I want to take a few steps back and talk about the service as a whole and some of the enhancements the team has made over the past year. If you just want the details on this new feature, I strongly recommend reading the Amazon Simple Email Service Blog.

What is SES?

So, what is SES? If you’re a customer of Amazon.com you know that we send a lot of emails. Bought something? You get an email. Order shipped? You get an email. Over time, as both email volume and the variety of email types increased, Amazon.com needed to build an email platform that was flexible, scalable, reliable, and cost-effective. SES is the result of years of Amazon’s own work in dealing with email and maximizing deliverability.

In short: SES gives you the ability to send and receive many types of email with the monitoring and tools to ensure high deliverability.

Sending an email is easy; one simple API call:

import boto3
ses = boto3.client('ses')
ses.send_email(
    Source='sender@example.com',  # placeholder; the original address was redacted
    Destination={'ToAddresses': ['recipient@example.com']},
    Message={
        'Subject': {'Data': 'Hello, World!'},
        'Body': {'Text': {'Data': 'Hello, World!'}}
    }
)

Receiving and reacting to emails is easy too. You can set up rulesets that forward received emails to Amazon Simple Storage Service (S3), Amazon Simple Notification Service (SNS), or AWS Lambda – you could even trigger an Amazon Lex bot through Lambda to communicate with your customers over email. SES is a powerful tool for building applications. The image below shows just a fraction of the capabilities:
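To make that concrete, here’s a sketch of creating a receipt rule with boto3 that archives inbound mail to S3 and then invokes a Lambda function. The rule set, bucket, and function names are hypothetical, and the rule set and bucket would need to exist with the right permissions:

import boto3

ses = boto3.client('ses')
ses.create_receipt_rule(
    RuleSetName='inbound-rules',  # hypothetical, pre-existing rule set
    Rule={
        'Name': 'archive-and-process',
        'Enabled': True,
        'Recipients': ['support@example.com'],
        'Actions': [
            # Store the raw message in S3...
            {'S3Action': {'BucketName': 'my-inbound-mail', 'ObjectKeyPrefix': 'support/'}},
            # ...then invoke a Lambda function asynchronously
            {'LambdaAction': {'FunctionArn': 'arn:aws:lambda:us-east-1:123456789012:function:HandleEmail',
                              'InvocationType': 'Event'}}
        ]
    }
)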

Deliverability 101

Deliverability is the percentage of your emails that arrive in your recipients’ inboxes. Maintaining deliverability is a shared responsibility between AWS and the customer. AWS takes the fight against spam very seriously and works hard to make sure services aren’t abused. To learn more about deliverability I recommend the deliverability docs. For now, understand that deliverability is an important aspect of email campaigns and SES has many tools that enable a customer to manage their deliverability.

Dedicated IPs and Dedicated IP pools

When you’re starting out with SES your emails are sent through a shared IP. That IP is responsible for sending mail on behalf of many customers, and AWS works to maintain appropriate volume and deliverability on each of those IPs. However, when you reach a sufficient volume, shared IPs may not be the right solution.

By moving to dedicated IPs you’re able to fully control the reputation of those IPs. This makes it vastly easier to troubleshoot any deliverability or reputation issues. It’s also useful for many email certification programs which require a dedicated IP as a commitment to maintaining your email reputation. Using the shared IPs of the Amazon SES service is still the right move for many customers, but if you have a sustained sending volume in the hundreds of thousands of emails per day, you might want to consider a dedicated IP. One caveat to be aware of: if you’re not sending a sufficient volume of email with a consistent pattern, a dedicated IP can actually hurt your reputation. Dedicated IPs are $24.95 per address per month at the time of this writing – but you can find out more at the pricing page.

Before you can use a Dedicated IP you need to “warm” it. You do this by gradually increasing the volume of emails you send through a new address. Each IP needs time to build a positive reputation. In March of this year SES released the ability to automatically warm your IPs over the course of 45 days. This feature is on by default for all new dedicated IPs.

Customers who send high volumes of email will typically have multiple dedicated IPs. Today the SES team released dedicated IP pools to make managing those IPs easier. Now when you send email you can specify a configuration set which will route your email to an IP in a pool based on the pool’s association with that configuration set.
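In practice, the only change to your sending code is naming a configuration set that’s associated with one of your pools. A sketch, where the 'marketing' configuration set is hypothetical:

import boto3

ses = boto3.client('ses')
ses.send_email(
    Source='sender@example.com',
    Destination={'ToAddresses': ['recipient@example.com']},
    Message={
        'Subject': {'Data': 'Our fall sale starts today'},
        'Body': {'Text': {'Data': 'Everything is 20% off!'}}
    },
    ConfigurationSetName='marketing'  # hypothetical set tied to a dedicated IP pool
)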

One of the other major benefits of this feature is that it allows customers who previously split their email sending across several AWS accounts (to manage their reputation for different types of email) to consolidate into a single account.

You can read the documentation and blog for more info.

Launch – AWS Glue Now Generally Available

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/launch-aws-glue-now-generally-available/

Today we’re excited to announce the general availability of AWS Glue. Glue is a fully managed, serverless, and cloud-optimized extract, transform and load (ETL) service. Glue is different from other ETL services and platforms in a few very important ways.

First, Glue is “serverless” – you don’t need to provision or manage any resources and you only pay for resources when Glue is actively running. Second, Glue provides crawlers that can automatically detect and infer schemas from many data sources, data types, and across various types of partitions. It stores these generated schemas in a centralized Data Catalog for editing, versioning, querying, and analysis. Third, Glue can automatically generate ETL scripts (in Python!) to translate your data from your source formats to your target formats. Finally, Glue allows you to create development endpoints that allow your developers to use their favorite toolchains to construct their ETL scripts. Ok, let’s dive deep with an example.

In my job as a Developer Evangelist I spend a lot of time traveling and I thought it would be cool to play with some flight data. The Bureau of Transportation Statistics is kind enough to share all of this data for anyone to use here. We can easily download this data and put it in an Amazon Simple Storage Service (S3) bucket. This data will be the basis of our work today.

Crawlers

First, we need to create a Crawler for our flights data from S3. We’ll select Crawlers in the Glue console and follow the on-screen prompts from there. I’ll specify s3://crawler-public-us-east-1/flight/2016/csv/ as my first data source (we can add more later if needed). Next, we’ll create a database called flights and give our tables a prefix of flights as well.
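If you prefer code to the console, roughly the same crawler could be created with boto3. A sketch, with an assumed crawler name and IAM role (any role with Glue and S3 read permissions should do):

import boto3

glue = boto3.client('glue')
glue.create_crawler(
    Name='flights-crawler',             # hypothetical name
    Role='AWSGlueServiceRole-flights',  # assumed IAM role
    DatabaseName='flights',
    TablePrefix='flights',
    Targets={'S3Targets': [{'Path': 's3://crawler-public-us-east-1/flight/2016/csv/'}]}
)
glue.start_crawler(Name='flights-crawler')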

The Crawler will go over our dataset, detect partitions through the various folders – in this case months of the year – detect the schema, and build a table. We could add additional data sources and jobs into our crawler, or create separate crawlers that push data into the same database, but for now let’s look at the autogenerated schema.

I’m going to make a quick schema change to year, moving it from BIGINT to INT. Then I can compare the two versions of the schema if needed.

Now that we know how to correctly parse this data let’s go ahead and do some transforms.

ETL Jobs

Now we’ll navigate to the Jobs subconsole and click Add Job. We’ll follow the prompts from there, giving our job a name and selecting a data source and an S3 location for temporary files. Next we add our target by specifying “Create tables in your data target” and we’ll specify an S3 location in Parquet format as our target.

After clicking next, we’re at a screen showing our various mappings proposed by Glue. Now we can make manual column adjustments as needed – in this case we’re just going to use the X button to remove a few columns that we don’t need.

This brings us to my favorite part. This is what I absolutely love about Glue.

Glue generated a PySpark script to transform our data based on the information we’ve given it so far. On the left-hand side we can see a diagram documenting the flow of the ETL job. On the top right we see a series of buttons that we can use to add annotated data sources and targets, transforms, spigots, and other features. This is the interface I get if I click on transform.

If we add any of these transforms or additional data sources, Glue will update the diagram on the left giving us a useful visualization of the flow of our data. We can also just write our own code into the console and have it run. We can add triggers to this job that fire on completion of another job, a schedule, or on demand. That way if we add more flight data we can reload this same data back into S3 in the format we need.
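To give you a feel for what Glue generates, here’s a trimmed-down sketch of a PySpark job along the lines of the one described above. The catalog table name, column mappings, and target bucket are assumptions, not the exact generated output:

import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ['JOB_NAME'])
glueContext = GlueContext(SparkContext.getOrCreate())
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Read the crawled table from the Data Catalog
datasource = glueContext.create_dynamic_frame.from_catalog(
    database='flights', table_name='flightscsv')  # assumed table name

# Keep only the columns we care about, retyping year along the way
mapped = ApplyMapping.apply(
    frame=datasource,
    mappings=[('year', 'bigint', 'year', 'int'),
              ('fl_date', 'string', 'fl_date', 'string'),
              ('origin', 'string', 'origin', 'string'),
              ('dest', 'string', 'dest', 'string')])

# Write the result back to S3 as Parquet
glueContext.write_dynamic_frame.from_options(
    frame=mapped, connection_type='s3',
    connection_options={'path': 's3://my-target-bucket/flights-parquet/'},  # hypothetical bucket
    format='parquet')

job.commit()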

I could spend all day writing about the power and versatility of the jobs console but Glue still has more features I want to cover. So, while I might love the script editing console, I know many people prefer their own development environments, tools, and IDEs. Let’s figure out how we can use those with Glue.

Development Endpoints and Notebooks

A Development Endpoint is an environment used to develop and test our Glue scripts. If we navigate to “Dev endpoints” in the Glue console, we can click “Add endpoint” in the top right to get started. Next we’ll select a VPC and a security group that references itself, and then we wait for it to provision.


Once it’s provisioned we can create an Apache Zeppelin notebook server by going to Actions and clicking Create notebook server. We give our instance an IAM role and make sure it has permissions to talk to our data sources. Then we can either SSH into the server or connect to the notebook to interactively develop our script.

Pricing and Documentation

You can see detailed pricing information here. Glue crawlers, ETL jobs, and development endpoints are all billed in Data Processing Unit (DPU) hours, metered by the minute. Each DPU-hour costs $0.44 in us-east-1. A single DPU provides 4 vCPUs and 16 GB of memory.

We’ve only covered about half of the features that Glue has, so I want to encourage everyone who made it this far into the post to go read the documentation and service FAQs. Glue also has a rich and powerful API that allows you to do anything the console can do and more.

We’re also releasing two new projects today. The aws-glue-libs repo provides a set of utilities for connecting to and working with Glue. The aws-glue-samples repo contains a set of example jobs.

I hope you find that using Glue reduces the time it takes to start doing things with your data. Look for another post from me on AWS Glue soon because I can’t stop playing with this new service.
Randall

AWS CloudHSM Update – Cost Effective Hardware Key Management at Cloud Scale for Sensitive & Regulated Workloads

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-cloudhsm-update-cost-effective-hardware-key-management/

Our customers run an incredible variety of mission-critical workloads on AWS, many of which process and store sensitive data. As detailed in our Overview of Security Processes document, AWS customers have access to an ever-growing set of options for encrypting and protecting this data. For example, Amazon Relational Database Service (RDS) supports encryption of data at rest and in transit, with options tailored for each supported database engine (MySQL, SQL Server, Oracle, MariaDB, PostgreSQL, and Aurora).

Many customers use AWS Key Management Service (KMS) to centralize their key management, with others taking advantage of the hardware-based key management, encryption, and decryption provided by AWS CloudHSM to meet stringent security and compliance requirements for their most sensitive data and regulated workloads (you can read my post, AWS CloudHSM – Secure Key Storage and Cryptographic Operations, to learn more about Hardware Security Modules, also known as HSMs).

Major CloudHSM Update
Today, building on what we have learned from our first-generation product, we are making a major update to CloudHSM, with a set of improvements designed to make the benefits of hardware-based key management available to a much wider audience while reducing the need for specialized operating expertise. Here’s a summary of the improvements:

Pay As You Go – CloudHSM is now offered under a pay-as-you-go model that is simpler and more cost-effective, with no up-front fees.

Fully Managed – CloudHSM is now a scalable managed service; provisioning, patching, high availability, and backups are all built-in and taken care of for you. Scheduled backups extract an encrypted image of your HSM from the hardware (using keys that only the HSM hardware itself knows) that can be restored only to identical HSM hardware owned by AWS. For durability, those backups are stored in Amazon Simple Storage Service (S3), and for an additional layer of security, encrypted again with server-side S3 encryption using an AWS KMS master key.

Open & Compatible  – CloudHSM is open and standards-compliant, with support for multiple APIs, programming languages, and cryptography extensions such as PKCS #11, Java Cryptography Extension (JCE), and Microsoft CryptoNG (CNG). The open nature of CloudHSM gives you more control and simplifies the process of moving keys (in encrypted form) from one CloudHSM to another, and also allows migration to and from other commercially available HSMs.

More Secure – CloudHSM Classic (the original model) supports the generation and use of keys that comply with FIPS 140-2 Level 2. We’re stepping that up a notch today with support for FIPS 140-2 Level 3, with security mechanisms that are designed to detect and respond to physical attempts to access or modify the HSM. Your keys are protected with exclusive, single-tenant access to tamper-resistant HSMs that appear within your Virtual Private Clouds (VPCs). CloudHSM supports quorum authentication for critical administrative and key management functions. This feature allows you to define a list of N possible identities that can access the functions, and then require at least M of them to authorize the action. It also supports multi-factor authentication using tokens that you provide.

AWS-Native – The updated CloudHSM is an integral part of AWS and plays well with other tools and services. You can create and manage a cluster of HSMs using the AWS Management Console, AWS Command Line Interface (CLI), or API calls.
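For example, creating a cluster from code is a single call. A sketch using the cloudhsmv2 API in boto3, where the subnet ID is a placeholder:

import boto3

hsm = boto3.client('cloudhsmv2')
response = hsm.create_cluster(
    HsmType='hsm1.medium',
    SubnetIds=['subnet-0123456789abcdef0']  # placeholder; one subnet per desired AZ
)
print(response['Cluster']['ClusterId'])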

Diving In
You can create CloudHSM clusters that contain 1 to 32 HSMs, each in a separate Availability Zone in a particular AWS Region. Spreading HSMs across AZs gives you high availability (including built-in load balancing); adding more HSMs gives you additional throughput. The HSMs within a cluster are kept in sync: performing a task or operation on one HSM in a cluster automatically updates the others. Each HSM in a cluster has its own Elastic Network Interface (ENI).

All interaction with an HSM takes place via the AWS CloudHSM client. It runs on an EC2 instance and uses certificate-based mutual authentication to create secure (TLS) connections to the HSMs.

At the hardware level, each HSM includes hardware-enforced isolation of crypto operations and key storage. Each customer HSM runs on dedicated processor cores.

Setting Up a Cluster
Let’s set up a cluster using the CloudHSM Console:

I click on Create cluster to get started, select my desired VPC and the subnets within it (I can also create a new VPC and/or subnets if needed):

Then I review my settings and click on Create:

After a few minutes, my cluster exists, but is uninitialized:

Initialization simply means retrieving a certificate signing request (the Cluster CSR):

And then creating a private key and using it to sign the request (these commands were copied from the Initialize Cluster docs and I have omitted the output. Note that ID identifies the cluster):

$ openssl genrsa -out CustomerRoot.key 2048
$ openssl req -new -x509 -days 365 -key CustomerRoot.key -out CustomerRoot.crt
$ openssl x509 -req -days 365 -in ID_ClusterCsr.csr   \
                              -CA CustomerRoot.crt    \
                              -CAkey CustomerRoot.key \
                              -CAcreateserial         \
                              -out ID_CustomerHsmCertificate.crt

The next step is to apply the signed certificate to the cluster using the console or the CLI. After this has been done, the cluster can be activated by changing the password for the HSM’s administrative user, otherwise known as the Crypto Officer (CO).
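If you’d rather script this step, applying the signed certificate might look like the following sketch (again, ID stands in for your cluster ID):

import boto3

hsm = boto3.client('cloudhsmv2')
with open('ID_CustomerHsmCertificate.crt') as cert, open('CustomerRoot.crt') as anchor:
    hsm.initialize_cluster(
        ClusterId='cluster-ID',  # placeholder cluster ID
        SignedCert=cert.read(),
        TrustAnchor=anchor.read()
    )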

Once the cluster has been created, initialized and activated, it can be used to protect data. Applications can use the APIs in AWS CloudHSM SDKs to manage keys, encrypt & decrypt objects, and more. The SDKs provide access to the CloudHSM client (running on the same instance as the application). The client, in turn, connects to the cluster across an encrypted connection.

Available Today
The new HSM is available today in the US East (Northern Virginia), US West (Oregon), US East (Ohio), and EU (Ireland) Regions, with more in the works. Pricing starts at $1.45 per HSM per hour.

Jeff;

New – AWS SAM Local (Beta) – Build and Test Serverless Applications Locally

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/new-aws-sam-local-beta-build-and-test-serverless-applications-locally/

Today we’re releasing a beta of a new tool, SAM Local, that makes it easy to build and test your serverless applications locally. In this post we’ll use SAM Local to build, debug, and deploy a quick application that allows us to vote on tabs or spaces by curling an endpoint. AWS introduced the Serverless Application Model (SAM) last year to make it easier for developers to deploy serverless applications. If you’re not already familiar with SAM, my colleague Orr wrote a great post on how to use SAM that you can read in about 5 minutes. At its core, SAM is a powerful open source specification built on AWS CloudFormation that makes it easy to keep your serverless infrastructure as code – and they have the cutest mascot.

SAM Local takes all the good parts of SAM and brings them to your local machine.

There are a couple of ways to install SAM Local but the easiest is through NPM. A quick npm install -g aws-sam-local should get us going but if you want the latest version you can always install straight from the source: go get github.com/awslabs/aws-sam-local (this will create a binary named aws-sam-local, not sam).

I like to vote on things so let’s write a quick SAM application to vote on Spaces versus Tabs. We’ll use a very simple, but powerful, architecture of API Gateway fronting a Lambda function, and we’ll store our results in DynamoDB. In the end a user should be able to curl our API (curl https://SOMEURL/ -d '{"vote": "spaces"}') and get back the number of votes.

Let’s start by writing a simple SAM template.yaml:

AWSTemplateFormatVersion : '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  VotesTable:
    Type: "AWS::Serverless::SimpleTable"
  VoteSpacesTabs:
    Type: "AWS::Serverless::Function"
    Properties:
      Runtime: python3.6
      Handler: lambda_function.lambda_handler
      Policies: AmazonDynamoDBFullAccess
      Environment:
        Variables:
          TABLE_NAME: !Ref VotesTable
      Events:
        Vote:
          Type: Api
          Properties:
            Path: /
            Method: post

So we create a DynamoDB table that we expose to our Lambda function through an environment variable called TABLE_NAME.

To test that this template is valid I’ll go ahead and call sam validate to make sure I haven’t fat-fingered anything. It returns Valid! so let’s go ahead and get to work on our Lambda function.

import os
import json
import boto3
votes_table = boto3.resource('dynamodb').Table(os.getenv('TABLE_NAME'))

def lambda_handler(event, context):
    print(event)
    if event['httpMethod'] == 'GET':
        resp = votes_table.scan()
        return {'body': json.dumps({item['id']: int(item['votes']) for item in resp['Items']})}
    elif event['httpMethod'] == 'POST':
        try:
            body = json.loads(event['body'])
        except ValueError:
            return {'statusCode': 400, 'body': 'malformed json input'}
        if 'vote' not in body:
            return {'statusCode': 400, 'body': 'missing vote in request body'}
        if body['vote'] not in ['spaces', 'tabs']:
            return {'statusCode': 400, 'body': 'vote value must be "spaces" or "tabs"'}

        resp = votes_table.update_item(
            Key={'id': body['vote']},
            UpdateExpression='ADD votes :incr',
            ExpressionAttributeValues={':incr': 1},
            ReturnValues='ALL_NEW'
        )
        return {'body': "{} now has {} votes".format(body['vote'], resp['Attributes']['votes'])}

So let’s test this locally. I’ll need to create a real DynamoDB database to talk to and I’ll need to provide the name of that database through the environment variable TABLE_NAME. I could do that with an env.json file or I can just pass it on the command line. First, I can call:
$ echo '{"httpMethod": "POST", "body": "{\"vote\": \"spaces\"}"}' |\
TABLE_NAME="vote-spaces-tabs" sam local invoke "VoteSpacesTabs"

to test the Lambda – it returns the number of votes for spaces so theoretically everything is working. Typing all of that out is a pain so I could generate a sample event with sam local generate-event api and pass that in to the local invocation. Far easier than all of that is just running our API locally. Let’s do that: sam local start-api. Now I can curl my local endpoints to test everything out.
I’ll run the command: $ curl -d '{"vote": "tabs"}' http://127.0.0.1:3000/ and it returns: “tabs now has 12 votes”. Now, of course I did not write this function perfectly on my first try. I edited and saved several times. One of the benefits of hot-reloading is that as I change the function I don’t have to do any additional work to test the new function. This makes iterative development vastly easier.
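For reference, the env.json approach mentioned above is just a map of function names to environment variables, passed with the --env-vars flag. A sketch:

{
  "VoteSpacesTabs": {
    "TABLE_NAME": "vote-spaces-tabs"
  }
}

Then: sam local invoke --env-vars env.json "VoteSpacesTabs".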

Let’s say we don’t want to deal with accessing a real DynamoDB database over the network though. What are our options? Well we can download DynamoDB Local and launch it with java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb. Then we can have our Lambda function use the AWS_SAM_LOCAL environment variable to make some decisions about how to behave. Let’s modify our function a bit:

import os
import json
import boto3
if os.getenv("AWS_SAM_LOCAL"):
    votes_table = boto3.resource(
        'dynamodb',
        endpoint_url="http://docker.for.mac.localhost:8000/"
    ).Table("spaces-tabs-votes")
else:
    votes_table = boto3.resource('dynamodb').Table(os.getenv('TABLE_NAME'))

Now we’re using a local endpoint to connect to our local database which makes working without wifi a little easier.
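One caveat: DynamoDB Local starts out empty, so we need to create the table ourselves before voting. A minimal sketch, with the table name matching the code above (DynamoDB Local accepts any credentials):

import boto3

ddb = boto3.resource('dynamodb', endpoint_url='http://localhost:8000')
ddb.create_table(
    TableName='spaces-tabs-votes',
    KeySchema=[{'AttributeName': 'id', 'KeyType': 'HASH'}],
    AttributeDefinitions=[{'AttributeName': 'id', 'AttributeType': 'S'}],
    ProvisionedThroughput={'ReadCapacityUnits': 1, 'WriteCapacityUnits': 1}
)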

SAM local even supports interactive debugging! In Java and Node.js I can just pass the -d flag and a port to immediately enable the debugger. For Python I could use a library like import epdb; epdb.serve() and connect that way. Then we can call sam local invoke -d 8080 "VoteSpacesTabs" and our function will pause execution waiting for you to step through with the debugger.

Alright, I think we’ve got everything working so let’s deploy this!

First I’ll call the sam package command, which is just an alias for aws cloudformation package, and then I’ll use the result of that command with sam deploy.

$ sam package --template-file template.yaml --s3-bucket MYAWESOMEBUCKET --output-template-file package.yaml
Uploading to 144e47a4a08f8338faae894afe7563c3  90570 / 90570.0  (100.00%)
Successfully packaged artifacts and wrote output template to file package.yaml.
Execute the following command to deploy the packaged template
aws cloudformation deploy --template-file package.yaml --stack-name 
$ sam deploy --template-file package.yaml --stack-name VoteForSpaces --capabilities CAPABILITY_IAM
Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - VoteForSpaces

Which brings us to our API.

I’m going to hop over into the production stage and add some rate limiting in case you guys start voting a lot – but otherwise we’ve taken our local work and deployed it to the cloud without much effort at all. I always enjoy it when things work on the first deploy!

You can vote now and watch the results live! http://spaces-or-tabs.s3-website-us-east-1.amazonaws.com/

We hope that SAM Local makes it easier for you to test, debug, and deploy your serverless apps. We have a CONTRIBUTING.md guide and we welcome pull requests. Please tweet at us to let us know what cool things you build. You can see our What’s New post here and the documentation is live here.

Randall

New – GPU-Powered Streaming Instances for Amazon AppStream 2.0

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-gpu-powered-streaming-instances-for-amazon-appstream-2-0/

We launched Amazon AppStream 2.0 at re:Invent 2016. This application streaming service allows you to deliver Windows applications to a desktop browser.

AppStream 2.0 is fully managed and provides consistent, scalable performance by running applications on general purpose, compute optimized, and memory optimized streaming instances, with delivery via NICE DCV – a secure, high-fidelity streaming protocol. Our enterprise and public sector customers have started using AppStream 2.0 in place of legacy application streaming environments that are installed on-premises. They use AppStream 2.0 to deliver both commercial and line of business applications to a desktop browser. Our ISV customers are using AppStream 2.0 to move their applications to the cloud as-is, with no changes to their code. These customers focus on demos, workshops, and commercial SaaS subscriptions.

We are getting great feedback on AppStream 2.0 and have been adding new features very quickly (even by AWS standards). So far this year we have added an image builder, federated access via SAML 2.0, CloudWatch monitoring, Fleet Auto Scaling, Simple Network Setup, persistent storage for user files (backed by Amazon S3), support for VPC security groups, and built-in user management including web portals for users.

New GPU-Powered Streaming Instances
Many of our customers have told us that they want to use AppStream 2.0 to deliver specialized design, engineering, HPC, and media applications to their users. These applications are generally graphically intensive and are designed to run on expensive, high-end PCs in conjunction with a GPU (Graphics Processing Unit). Due to the hardware requirements of these applications, cost considerations have traditionally kept them out of situations where part-time or occasional access would otherwise make sense. Recently, another requirement has come to the forefront. These applications almost always need shared, read-write access to large amounts of sensitive data that is best stored, processed, and secured in the cloud. In order to meet the needs of these users and applications, we are launching two new types of streaming instances today:

Graphics Desktop – Based on the G2 instance type, Graphics Desktop instances are designed for desktop applications that use CUDA, DirectX, or OpenGL for rendering. These instances are equipped with 15 GiB of memory and 8 vCPUs. You can select this instance family when you build an AppStream image or configure an AppStream fleet:

Graphics Pro – Based on the brand-new G3 instance type, Graphics Pro instances are designed for high-end, high-performance applications that can use the NVIDIA APIs and/or need access to large amounts of memory. These instances are available in three sizes, with 122 to 488 GiB of memory and 16 to 64 vCPUs. Again, you can select this instance family when you configure an AppStream fleet:

To learn more about how to launch, run, and scale a streaming application environment, read Scaling Your Desktop Application Streams with Amazon AppStream 2.0.

As I noted earlier, you can use either of these two instance types to build an AppStream image. This will allow you to test and fine-tune your applications and to see the instances in action.

Streaming Instances in Action
We’ve been working with several customers during a private beta program for the new instance types. Here are a few stories (and some cool screen shots) to show you some of the applications that they are streaming via AppStream 2.0:

AVEVA is a world-leading provider of engineering design and information management software solutions for the marine, power, plant, offshore and oil & gas industries. As part of their work on massive capital projects, their customers need to bring many groups of specialist engineers together to collaborate on the creation of digital assets. In order to support this requirement, AVEVA is building SaaS solutions that combine the streamed delivery of engineering applications with access to a scalable project data environment that is shared between engineers across the globe. The new instances will allow AVEVA to deliver their engineering design software in SaaS form while maximizing quality and performance. Here’s a screen shot of their Everything 3D app being streamed from AppStream:

Nissan, a Japanese multinational automobile manufacturer, trains its automotive specialists using 3D simulation software running on expensive graphics workstations. The training software, developed by The DiSti Corporation, allows its specialists to simulate maintenance processes by interacting with realistic 3D models of the vehicles they work on. AppStream 2.0’s new graphics capability now allows Nissan to deliver these training tools in real time, with up to date content, to a desktop browser running on low-cost commodity PCs. Their specialists can now interact with highly realistic renderings of a vehicle that allows them to train for and plan maintenance operations with higher efficiency.

Cornell University is an American private Ivy League and land-grant doctoral university located in Ithaca, New York. They deliver advanced 3D tools such as Autodesk AutoCAD and Inventor to students and faculty to support their course work, teaching, and research. Until now, these tools could only be used on GPU-powered workstations in a lab or classroom. AppStream 2.0 allows them to deliver the applications to a web browser running on any desktop, where they run as if they were on a local workstation. Their users are no longer limited by available workstations in labs and classrooms, and can bring their own devices and have access to their course software. This increased flexibility also means that faculty members no longer need to take lab availability into account when they build course schedules. Here’s a copy of Autodesk Inventor Professional running on AppStream at Cornell:

Now Available
Both of the graphics streaming instance families are available in the US East (Northern Virginia), US West (Oregon), EU (Ireland), and Asia Pacific (Tokyo) Regions and you can start streaming from them today. Your applications must run in a Windows 2012 R2 environment, and can make use of DirectX, OpenGL, CUDA, OpenCL, and Vulkan.

With prices in the US East (Northern Virginia) Region starting at $0.50 per hour for Graphics Desktop instances and $2.05 per hour for Graphics Pro instances, you can now run your simulation, visualization, and HPC workloads in the AWS Cloud on an economical, pay-by-the-hour basis. You can also take advantage of fast, low-latency access to Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3), AWS Lambda, Amazon Redshift, and other AWS services to build processing workflows that handle pre- and post-processing of your data.

Jeff;

 

Running an elastic HiveMQ cluster with auto discovery on AWS

Post Syndicated from The HiveMQ Team original http://www.hivemq.com/blog/running-hivemq-cluster-aws-auto-discovery


HiveMQ is a cloud-first MQTT broker with elastic clustering capabilities and a resilient software design, which makes it a perfect fit for common cloud infrastructures. An earlier blog post discussed the benefits an MQTT broker cluster offers. Today’s post aims to be more practical and talks about how to set up HiveMQ on one of the most popular cloud computing platforms: Amazon Web Services.

Running HiveMQ on cloud infrastructure

Running a HiveMQ cluster on cloud infrastructure like AWS not only offers the possibility of elastically scaling the infrastructure, it also ensures that state-of-the-art security standards are in place on the infrastructure side. These platforms are typically highly available, and new virtual machines can be spawned in a snap if they are needed. HiveMQ’s unique ability to add (and remove) cluster nodes at runtime, without any manual reconfiguration of the cluster, allows it to scale linearly on IaaS providers. New cluster nodes can be started (manually or automatically) and the cluster size adapts automatically. For more detailed information about HiveMQ clustering and how to achieve true high availability and linear scalability with HiveMQ, we recommend reading the HiveMQ Clustering Paper.

As Amazon Web Services is among the best-known and most-used cloud platforms, we want to illustrate the setup of a HiveMQ cluster on AWS in this post. Note that similar concepts to those shown in this step-by-step guide for running an elastic HiveMQ cluster on AWS apply to other cloud platforms such as Microsoft Azure or Google Cloud Platform.

Setup and Configuration

Amazon Web Services prohibits the use of UDP multicast, which is the default HiveMQ cluster discovery mode. Using Amazon Simple Storage Service (S3) buckets for auto-discovery is a perfect alternative if the brokers are running on AWS EC2 instances anyway. HiveMQ has a free off-the-shelf plugin available for AWS S3 cluster discovery.

The following is a step-by-step guide on how to set up the brokers on AWS EC2 with automatic cluster member discovery via S3.

Setup Security Group

The first step is creating a security group that allows inbound traffic to the listeners we are going to configure for MQTT communication. It is also vital to have SSH access to the instances. After you create the security group, you need to edit it and add an additional rule for internal communication between the cluster nodes (meaning the source is the security group itself) on all TCP ports.

To create and edit security groups go to the EC2 console – NETWORK & SECURITY – Security Groups

Inbound traffic

Outbound traffic

The next step is to create an S3 bucket in the S3 console. Make sure to choose a region close to the region you want to run your HiveMQ instances in.

Option A: Create IAM role and assign to EC2 instance

Our recommendation is to configure your EC2 instances in a way that allows them to access the S3 bucket directly. This way you don’t need to create a specific user and don’t need to put the user’s credentials in the s3discovery.properties file.

Create IAM Role

EC2 Instance Role Type

Select S3 Full Access

Assign new Role to Instance

Option B: Create user and assign IAM policy

The next step is creating a user in the IAM console.

Choose name and set programmatic access

Assign s3 full access role

Review and create

Download credentials

It is important that you store these credentials, as they will be needed later for configuring the S3 Cluster Discovery Plugin.

Start EC2 instances with HiveMQ

The next step is spawning two or more EC2 instances with HiveMQ. Follow the steps in the HiveMQ User Guide.

Install s3 discovery plugin

The final step is downloading, installing and configuring the S3 Cluster Discovery Plugin.
After you download the plugin, you need to configure S3 access in the s3discovery.properties file according to which access option you chose.

Option A:

############################################################
# AWS Credentials                                          #
############################################################

#
# Use environment variables to specify your AWS credentials
# the following variables need to be set:
# AWS_ACCESS_KEY_ID
# AWS_SECRET_ACCESS_KEY
#
#credentials-type:environment_variables

#
# Use Java system properties to specify your AWS credentials
# the following variables need to be set:
# aws.accessKeyId
# aws.secretKey
#
#credentials-type:java_system_properties

#
# Uses the credentials file which
# can be created by calling 'aws configure' (AWS CLI)
# usually this file is located at ~/.aws/credentials (platform dependent)
# The location of the file can be configured by setting the environment variable
# AWS_CREDENTIAL_PROFILE_FILE to the location of your file
#
#credentials-type:user_credentials_file

#
# Uses the IAM Profile assigned to the EC2 instance running HiveMQ to access S3
# Notice: This only works if HiveMQ is running on an EC2 instance !
#
credentials-type:instance_profile_credentials

#
# Tries to access S3 via the default mechanisms in the following order
# 1) Environment variables
# 2) Java system properties
# 3) User credentials file
# 4) IAM profiles assigned to EC2 instance
#
#credentials-type:default

#
# Uses the credentials specified in this file.
# The variables you must provide are:
# credentials-access-key-id
# credentials-secret-access-key
#
#credentials-type:access_key
#credentials-access-key-id:
#credentials-secret-access-key:

#
# Uses the credentials specified in this file to authenticate with a temporary session
# The variables you must provide are:
# credentials-access-key-id
# credentials-secret-access-key
# credentials-session-token
#
#credentials-type:temporary_session
#credentials-access-key-id:{access_key_id}
#credentials-secret-access-key:{secret_access_key}
#credentials-session-token:{session_token}


############################################################
# S3 Bucket                                                #
############################################################

#
# Region for the S3 bucket used by hivemq
# see http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region for a list of regions for S3
# example: us-west-2
#
s3-bucket-region:<your region here>

#
# Name of the bucket used by HiveMQ
#
s3-bucket-name:<your s3 bucket name here>

#
# Prefix for the filename of every node's file (optional)
#
file-prefix:hivemq/cluster/nodes/

#
# Expiration timeout (in minutes).
# Files with a timestamp older than (timestamp + expiration) will be automatically deleted
# Set to 0 if you do not want the plugin to handle expiration.
#
file-expiration:360

#
# Interval (in minutes) in which the own information in S3 is updated.
# Set to 0 if you do not want the plugin to update its own information.
# If you disable this you also might want to disable expiration.
#
update-interval:180

Option B:

############################################################
# AWS Credentials                                          #
############################################################

#
# Use environment variables to specify your AWS credentials
# the following variables need to be set:
# AWS_ACCESS_KEY_ID
# AWS_SECRET_ACCESS_KEY
#
#credentials-type:environment_variables

#
# Use Java system properties to specify your AWS credentials
# the following variables need to be set:
# aws.accessKeyId
# aws.secretKey
#
#credentials-type:java_system_properties

#
# Uses the credentials file which
# can be created by calling 'aws configure' (AWS CLI)
# usually this file is located at ~/.aws/credentials (platform dependent)
# The location of the file can be configured by setting the environment variable
# AWS_CREDENTIAL_PROFILE_FILE to the location of your file
#
#credentials-type:user_credentials_file

#
# Uses the IAM Profile assigned to the EC2 instance running HiveMQ to access S3
# Notice: This only works if HiveMQ is running on an EC2 instance !
#
#credentials-type:instance_profile_credentials

#
# Tries to access S3 via the default mechanisms in the following order
# 1) Environment variables
# 2) Java system properties
# 3) User credentials file
# 4) IAM profiles assigned to EC2 instance
#
#credentials-type:default

#
# Uses the credentials specified in this file.
# The variables you must provide are:
# credentials-access-key-id
# credentials-secret-access-key
#
credentials-type:access_key
credentials-access-key-id:<your access key id here>
credentials-secret-access-key:<your secret access key here>

#
# Uses the credentials specified in this file to authenticate with a temporary session
# The variables you must provide are:
# credentials-access-key-id
# credentials-secret-access-key
# credentials-session-token
#
#credentials-type:temporary_session
#credentials-access-key-id:{access_key_id}
#credentials-secret-access-key:{secret_access_key}
#credentials-session-token:{session_token}


############################################################
# S3 Bucket                                                #
############################################################

#
# Region for the S3 bucket used by hivemq
# see http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region for a list of regions for S3
# example: us-west-2
#
s3-bucket-region:<your region here>

#
# Name of the bucket used by HiveMQ
#
s3-bucket-name:<your s3 bucket name here>

#
# Prefix for the filename of every node's file (optional)
#
file-prefix:hivemq/cluster/nodes/

#
# Expiration timeout (in minutes).
# Files with a timestamp older than (timestamp + expiration) will be automatically deleted
# Set to 0 if you do not want the plugin to handle expiration.
#
file-expiration:360

#
# Interval (in minutes) in which the own information in S3 is updated.
# Set to 0 if you do not want the plugin to update its own information.
# If you disable this you also might want to disable expiration.
#
update-interval:180

This file has to be identical on all your cluster nodes.

That’s it. Starting HiveMQ on multiple EC2 instances will now result in them forming a cluster, taking advantage of the S3 bucket for discovery.
You know that your setup was successful when HiveMQ logs something similar to this:

Cluster size = 2, members : [0QMpE, jw8wu].

Enjoy an elastic MQTT broker cluster

We are now able to take advantage of rapid elasticity. Scaling the HiveMQ cluster up or down by adding or removing EC2 instances, without the need for administrative intervention, is now possible.

For production environments it’s recommended to use automatic provisioning of the EC2 instances (e.g. by using Chef, Puppet, Ansible or similar tools) so you don’t need to configure each EC2 instance manually. Of course HiveMQ can also be used with Docker, which can also ease the provisioning of HiveMQ nodes.

Synchronizing Amazon S3 Buckets Using AWS Step Functions

Post Syndicated from Andy Katz original https://aws.amazon.com/blogs/compute/synchronizing-amazon-s3-buckets-using-aws-step-functions/

Constantin Gonzalez is a Principal Solutions Architect at AWS

In my free time, I run a small blog that uses Amazon S3 to host static content and Amazon CloudFront to distribute it world-wide. I use a home-grown, static website generator to create and upload my blog content onto S3.

My blog uses two S3 buckets: one for staging and testing, and one for production. As a website owner, I want to update the production bucket with all changes from the staging bucket in a reliable and efficient way, without having to create and populate a new bucket from scratch. Therefore, to synchronize files between these two buckets, I use AWS Lambda and AWS Step Functions.

In this post, I show how you can use Step Functions to build a scalable synchronization engine for S3 buckets and learn some common patterns for designing Step Functions state machines while you do so.

Step Functions overview

Step Functions makes it easy to coordinate the components of distributed applications and microservices using visual workflows. Building applications from individual components that each perform a discrete function lets you scale and change applications quickly.

While this particular example focuses on synchronizing objects between two S3 buckets, it can be generalized to any other use case that involves coordinated processing of any number of objects in S3 buckets, or other, similar data processing patterns.

Bucket replication options

Before I dive into the details on how this particular example works, take a look at some alternatives for copying or replicating data between two Amazon S3 buckets:

  • The AWS CLI provides customers with a powerful aws s3 sync command that can synchronize the contents of one bucket with another.
  • S3DistCp is a powerful tool for users of Amazon EMR that can efficiently load, save, or copy large amounts of data between S3 buckets and HDFS.
  • The S3 cross-region replication functionality enables automatic, asynchronous copying of objects across buckets in different AWS regions.

In this use case, you are looking for a slightly different bucket synchronization solution that:

  • Works within the same region
  • Is more scalable than a CLI approach running on a single machine
  • Doesn’t require managing any servers
  • Uses a finer-grained cost model than the hourly-billed Amazon EMR approach

You need a scalable, serverless, and customizable bucket synchronization utility.

Solution architecture

Your solution needs to do three things:

  1. Copy all objects from a source bucket into a destination bucket, but leave out objects that are already present, for efficiency.
  2. Delete all "orphaned" objects from the destination bucket that aren’t present on the source bucket, because you don’t want obsolete objects lying around.
  3. Keep track of all objects for #1 and #2, regardless of how many objects there are.

In the beginning, you read in the source and destination buckets as parameters and perform basic parameter validation. Then, you operate two separate, independent loops, one for copying missing objects and one for deleting obsolete objects. Each loop is a sequence of Step Functions states that read in chunks of S3 object lists and use the continuation token to decide in a choice state whether to continue the loop or not.

This solution is based on the following architecture that uses Step Functions, Lambda, and two S3 buckets:

As you can see, this setup involves no servers, just two main building blocks:

  • Step Functions manages the overall flow of synchronizing the objects from the source bucket with the destination bucket.
  • A set of Lambda functions carry out the individual steps necessary to perform the work, such as validating input, getting lists of objects from source and destination buckets, copying or deleting objects in batches, and so on.

To understand the synchronization flow in more detail, look at the Step Functions state machine diagram for this example.

Walkthrough

Here’s a detailed discussion of how this works.

To follow along, use the code in the sync-buckets-state-machine GitHub repo. The code comes with a ready-to-run deployment script in Python that takes care of all the IAM roles, policies, Lambda functions, and of course the Step Functions state machine deployment using AWS CloudFormation, as well as instructions on how to use it.

Fine print: Use at your own risk

Before I start, here are some disclaimers:

  • Educational purposes only.

    The following example and code are intended for educational purposes only. Make sure that you customize, test, and review it on your own before using any of this in production.

  • S3 object deletion.

    In particular, using the code included below may delete objects on S3 in order to perform synchronization. Make sure that you have backups of your data. In particular, consider using the Amazon S3 Versioning feature to protect yourself against unintended data modification or deletion.

Step Functions execution starts with an initial set of parameters that contain the source and destination bucket names in JSON:

{
    "source":       "my-source-bucket-name",
    "destination":  "my-destination-bucket-name"
}

Armed with this data, Step Functions execution proceeds as follows.

Step 1: Detect the bucket region

First, you need to know the regions where your buckets reside. In this case, take advantage of the Step Functions Parallel state. This allows you to use a Lambda function get_bucket_location.py inside two different, parallel branches of task states:

  • FindRegionForSourceBucket
  • FindRegionForDestinationBucket

Each task state receives one bucket name as an input parameter, then detects the region corresponding to "their" bucket. The output of these functions is collected in a result array containing one element per parallel function.

Step 2: Combine the parallel states

The output of a parallel state is a list with all the individual branches’ outputs. To combine them into a single structure, use a Lambda function called combine_dicts.py in its own CombineRegionOutputs task state. The function combines the two outputs from step 1 into a single JSON dict that provides you with the necessary region information for each bucket.

Step 3: Validate the input

In this walkthrough, you only support buckets that reside in the same region, so you need to decide if the input is valid or if the user has given you two buckets in different regions. To find out, use a Lambda function called validate_input.py in the ValidateInput task state that tests if the two regions from the previous step are equal. The output is a Boolean.

Step 4: Branch the workflow

Use another type of Step Functions state, a Choice state, which branches into a Failure state if the comparison in step 3 yields false, or proceeds with the remaining steps if the comparison was successful.

Step 5: Execute in parallel

The actual work is happening in another Parallel state. Both branches of this state are very similar to each other and they re-use some of the Lambda function code.

Each parallel branch implements a looping pattern across the following steps:

  1. Use a Pass state to inject either the string value "source" (InjectSourceBucket) or "destination" (InjectDestinationBucket) into the listBucket attribute of the state document.

    The next step uses either the source or the destination bucket, depending on the branch, while executing the same, generic Lambda function. You don’t need two Lambda functions that differ only slightly. This step illustrates how to use Pass states as a way of injecting constant parameters into your state machine and as a way of controlling step behavior while re-using common step execution code.

  2. The next step UpdateSourceKeyList/UpdateDestinationKeyList lists objects in the given bucket.

    Remember that the previous step injected either "source" or "destination" into the state document’s listBucket attribute. This step uses the same list_bucket.py Lambda function to list objects in an S3 bucket. The listBucket attribute of its input decides which bucket to list. In the left branch of the main parallel state, use the list of source objects to work through copying missing objects. The right branch uses the list of destination objects, to check if they have a corresponding object in the source bucket and eliminate any orphaned objects. Orphans don’t have a source object of the same S3 key.

  3. This step performs the actual work. In the left branch, the CopySourceKeys step uses the copy_keys.py Lambda function to go through the list of source objects provided by the previous step, then copies any missing object into the destination bucket. Its sister step in the other branch, DeleteOrphanedKeys, uses its destination bucket key list to test whether each object from the destination bucket has a corresponding source object, then deletes any orphaned objects.

  4. The S3 ListObjects API action is designed to be scalable across many objects in a bucket. Therefore, it returns object lists in chunks of configurable size, along with a continuation token. If the API result has a continuation token, it means that there are more objects in this list. You can work from token to token to continue getting object list chunks, until you get no more continuation tokens.
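To make the looping concrete, here’s a sketch of how a list-chunking Lambda function might page through a bucket with boto3. The event shape here is my assumption, not the exact code from the repo:

import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Resume from the previous chunk's token, if the state carries one
    kwargs = {'Bucket': event['bucket'], 'MaxKeys': 1000}
    if event.get('token'):
        kwargs['ContinuationToken'] = event['token']
    response = s3.list_objects_v2(**kwargs)
    return {
        'keys': [obj['Key'] for obj in response.get('Contents', [])],
        # A missing token tells the Choice state that the loop is finished
        'token': response.get('NextContinuationToken')
    }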

By breaking down large amounts of work into chunks, you can make sure each chunk is completed within the timeframe allocated for the Lambda function, and within the maximum input/output data size for a Step Functions state.

This approach comes with a slight tradeoff: the more objects you process at one time in a given chunk, the faster you are done. There’s less overhead for managing individual chunks. On the other hand, if you process too many objects within the same chunk, you risk going over time and space limits of the processing Lambda function or the Step Functions state so the work cannot be completed.

In this particular case, use a Lambda function that maximizes the number of objects listed from the S3 bucket that can be stored in the input/output state data. This is currently up to 32,768 bytes, assuming (based on some experimentation) that the execution of the COPY/DELETE requests in the processing states can always complete in time.

A more sophisticated approach would use the Step Functions retry/catch state attributes to account for any time limits encountered and adjust the list size accordingly.

Step 6: Test for completion

Because the presence of a continuation token in the S3 ListObjects output signals that you are not done processing all objects yet, use a Choice state to test for its presence. If a continuation token exists, it branches into the UpdateSourceKeyList step, which uses the token to get to the next chunk of objects. If there is no token, you’re done. The state machine then branches into the FinishCopyBranch/FinishDeleteBranch state.

By using Choice states like this, you can create loops exactly like the old times, when you didn’t have for statements and used branches in assembly code instead!

Step 7: Success!

Finally, you’re done, and can step into your final Success state.

Lessons learned

When implementing this use case with Step Functions and Lambda, I learned the following things:

  • Sometimes, it is necessary to manipulate the JSON state of a Step Functions state machine with just a few lines of code that hardly seem to warrant their own Lambda function. This is ok, and the cost is actually pretty low given Lambda’s 100 millisecond billing granularity. The upside is that functions like these can be helpful to make the data more palatable for the following steps or for facilitating Choice states. An example here would be the combine_dicts.py function.
  • Pass states can be useful beyond debugging and tracing, they can be used to inject arbitrary values into your state JSON and guide generic Lambda functions into doing specific things.
  • Choice states are your friend because you can build while-loops with them. This allows you to reliably grind through large amounts of data with the patience of an engine that currently supports execution times of up to 1 year.

    Currently, there is an execution history limit of 25,000 events. Each Lambda task state execution takes up 5 events, while each choice state takes 2 events for a total of 7 events per loop. This means you can loop about 3500 times with this state machine. For even more scalability, you can split up work across multiple Step Functions executions through object key sharding or similar approaches.

  • It’s not necessary to spend a lot of time coding exception handling within your Lambda functions. You can delegate all exception handling to Step Functions and instead simplify your functions as much as possible.

  • Step Functions are great replacements for shell scripts. This could have been a shell script, but then I would have had to worry about where to execute it reliably, how to scale it if it went beyond a few thousand objects, etc. Think of Step Functions and Lambda as tools for scripting at a cloud level, beyond the boundaries of servers or containers. "Serverless" here also means "boundary-less".

Summary

This approach gives you scalability by breaking down any number of S3 objects into chunks, then using Step Functions to control logic to work through these objects in a scalable, serverless, and fully managed way.

To take a look at the code or tweak it for your own needs, use the code in the sync-buckets-state-machine GitHub repo.

To see more examples, please visit the Step Functions Getting Started page.

Enjoy!

AWS Greengrass – Run AWS Lambda Functions on Connected Devices

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-greengrass-run-aws-lambda-functions-on-connected-devices/

I first told you about AWS Greengrass in the post that I published during re:Invent (AWS Greengrass – Ubiquitous Real-World Computing). We launched a limited preview of Greengrass at that time and invited you to sign up if you were interested.

As I noted at the time, many AWS customers want to collect and process data out in the field, where connectivity is often slow and sometimes either intermittent or unreliable. Greengrass allows them to extend the AWS programming model to small, simple, field-based devices. It builds on AWS IoT and AWS Lambda, and supports access to the ever-increasing variety of services that are available in the AWS Cloud.

Greengrass gives you access to compute, messaging, data caching, and syncing services that run in the field, and that do not depend on constant, high-bandwidth connectivity to an AWS Region. You can write Lambda functions in Python 2.7 and deploy them to your Greengrass devices from the cloud while using device shadows to maintain state. Your devices and peripherals can talk to each other using local messaging that does not pass through the cloud.

Now Generally Available
Today we are making Greengrass generally available in the US East (Northern Virginia) and US West (Oregon) Regions. During the preview, AWS customers were able to get hands-on experience with Greengrass and to start building applications and businesses around it. I’ll share a few of these early successes later in this post.

The Greengrass Core code runs on each device. It allows you to deploy and run Lambda applications on the device, supports local MQTT messaging across a secure network, and also ensures that conversations between devices and the cloud are made across secure connections. The Greengrass Core also supports secure, over-the-air software updates, including Lambda functions. It includes a message broker, a Lambda runtime, a Thing Shadows implementation, and a deployment agent. Greengrass Core and (optionally) other devices make up a Greengrass Group. The group includes configuration data, the list of devices and the identity of the Greengrass Core, a list of Lambda functions, and a set of subscriptions that define where the messages should go. All of this information is copied to the Greengrass core devices during the deployment process.

Your Lambda functions can use APIs in three distinct SDKs:

AWS SDK for Python – This SDK allows your code to interact with Amazon Simple Storage Service (S3), Amazon DynamoDB, Amazon Simple Queue Service (SQS), and other AWS services.

AWS IoT Device SDK – This SDK (available for Node.js, Python, Java, and C++) helps you to connect your hardware devices to AWS IoT. The C++ SDK has a few extra features including access to the Greengrass Discovery Service and support for root CA downloads.

AWS Greengrass Core SDK – This SDK provides APIs that allow your functions to invoke other Lambda functions locally, publish messages, and work with thing shadows.
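
As a taste of the Greengrass Core SDK, here is a minimal, hedged Python sketch of a function that publishes a local MQTT message to the hello/world topic used later in this post; the payload contents are an assumption:

import json
import greengrasssdk

# The Greengrass Core SDK routes this publish through the local message broker
client = greengrasssdk.client('iot-data')

def handler(event, context):
    client.publish(
        topic='hello/world',
        payload=json.dumps({'message': 'Hello from Greengrass!'})  # assumed payload
    )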

You can run the Greengrass Core on x86 and ARM devices that have version 4.4.11 (or newer) of the Linux kernel, with the OverlayFS and user namespace features enabled. While most deployments of Greengrass will be targeted at specialized, industrial-grade hardware, you can also run the Greengrass Core on a Raspberry Pi or an EC2 instance for development and test purposes.

For this post, I used a Raspberry Pi attached to a BrickPi, connected to my home network via WiFi:

The Raspberry Pi, the BrickPi, the case, and all of the other parts are available in the BrickPi 3 Starter Kit. You will need some Linux command-line expertise and a decent amount of manual dexterity to put all of this together, but if I did it then you surely can.

Greengrass in Action
I can access Greengrass from the Console, API, or CLI. I’ll use the Console. The intro page of the Greengrass Console lets me define groups, add Greengrass Cores, and add devices to my groups:

I click on Get Started and then on Use easy creation:

Then I name my group:

And name my first Greengrass Core:

I’m ready to go, so I click on Create Group and Core:

This runs for a few seconds and then offers up my security resources (two keys and a certificate) for downloading, along with the Greengrass Core:

I download the security resources and put them in a safe place, and select and download the desired version of the Greengrass Core software (ARMv7l for my Raspberry Pi), and click on Finish.

Now I power up my Pi, and copy the security resources and the software to it (I put them in an S3 bucket and pulled them down with wget). Here’s my shell history at that point:

Following the directions in the user guide, I create a new user and group, run the rpi-update script, and install several packages including sqlite3 and openssl. After a couple of reboots, I am ready to proceed!

Next, still following the directions, I untar the Greengrass Core software and move the security resources to their final destination (/greengrass/configuration/certs), giving them generic names along the way. Here’s what the directory looks like:

The next step is to associate the core with an AWS IoT thing. I return to the Console, click through the group and the Greengrass Core, and find the Thing ARN:

I insert the names of the certificates and the Thing ARN into the config.json file, and also fill in the missing sections of the iotHost and ggHost:

I start the Greengrass daemon (this was my second attempt; I had a typo in one of my path names the first time around):

After all of this pleasant time at the command line (taking me back to my Unix v7 and BSD 4.2 days), it is time to go visual once again! I visit my AWS IoT dashboard and see that my Greengrass Core is making connections to IoT:

I go to the Lambda Console and create a Lambda function using the Python 2.7 runtime (the IAM role does not matter here):

I publish the function in the usual way and, hop over to the Greengrass Console, click on my group, and choose to add a Lambda function:

Then I choose the version to deploy:

I also configure the function to be long-lived instead of on-demand:

My code will publish messages to AWS IoT, so I create a subscription by specifying the source and destination:

I set up a topic filter (hello/world) on the subscription as well:

I confirm my settings and save my subscription and I am just about ready to deploy my code. I revisit my group, click on Deployments, and choose Deploy from the Actions menu:

I choose Automatic detection to move forward:

Since this is my first deployment, I need to create a service-level role that gives Greengrass permission to access other AWS services. I simply click on Grant permission:

I can see the status of each deployment:

The code is now running on my Pi! It publishes messages to topic hello/world; I can see them by going to the IoT Console, clicking on Test, and subscribing to the topic:

And here are the messages:

With all of the setup work taken care of, I can do iterative development by uploading, publishing, and deploying new versions of my code. I plan to use the BrickPi to control some LEGO Technic motors and to publish data collected from some sensors. Stay tuned for that post!

Greengrass Pricing
You can run the Greengrass Core on three devices free for one year as part of the AWS Free Tier. At the next level (3 to 10,000 devices) two options are available:

  • Pay as You Go – $0.16 per month per device.
  • Annual Commitment – $1.49 per year per device, a 17.5% savings.

If you want to run the Greengrass Core on more than 10,000 devices or make a longer commitment, please get in touch with us; details on all pricing models are on the Greengrass Pricing page.

Jeff;

AWS Lambda Support for AWS X-Ray

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/aws-lambda-support-for-aws-x-ray/

Today we’re announcing general availability of AWS Lambda support for AWS X-Ray. As you may already know from Jeff’s GA post, X-Ray is an AWS service for analyzing the execution and performance behavior of distributed applications. Traditional debugging methods don’t work so well for microservice-based applications, in which there are multiple, independent components running on different services. X-Ray allows you to rapidly diagnose errors, slowdowns, and timeouts by breaking down the latency in your applications. I’ll demonstrate how you can use X-Ray in your own applications in just a moment by walking through building and analyzing a simple Lambda-based application.

If you just want to get started right away you can easily turn on X-Ray for your existing Lambda functions by navigating to your function’s configuration page and enabling tracing:

Or in the AWS Command Line Interface (CLI) by updating the function’s tracing-config (be sure to pass in a --function-name as well):

$ aws lambda update-function-configuration --tracing-config '{"Mode": "Active"}'
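
If you prefer the SDK, the equivalent call in Python with boto3 looks roughly like this (the function name is a placeholder):

import boto3

lambda_client = boto3.client('lambda')
lambda_client.update_function_configuration(
    FunctionName='my-function',        # placeholder name
    TracingConfig={'Mode': 'Active'}   # 'PassThrough' is the default
)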

When tracing mode is active, Lambda will attempt to trace your function (unless explicitly told not to trace by an upstream service). Otherwise, your function will only be traced if it is explicitly told to do so by an upstream service. Once tracing is enabled, you’ll start generating traces and you’ll get a visual representation of the resources in your application and the connections (edges) between them. One thing to note is that the X-Ray daemon does consume some of your Lambda function’s resources. If you’re getting close to your memory limit, Lambda will try to kill the X-Ray daemon to avoid throwing an out-of-memory error.

Let’s test this new integration out by building a quick application that uses a few different services.


As a twenty-something with a smartphone, I have a lot of selfies (10,000+!) and I thought it would be great to analyze all of them. We’ll write a simple Lambda function with the Java 8 runtime that responds to new images uploaded into an Amazon Simple Storage Service (S3) bucket. We’ll run Amazon Rekognition on the photos and store the detected labels in Amazon DynamoDB.

service map

First, let’s define a few quick X-Ray vocabulary words: subsegments, segments, and traces. Got that? X-Ray is easy to understand if you remember that subsegments and segments make up traces, which X-Ray processes to generate service graphs. Service graphs provide the nice visual representation you can see above (with different colors indicating various request responses). The compute resources that run your applications send data about the work they’re doing in the form of segments. You can add additional annotations about that data and more granular timing of your code by creating subsegments. The path of a request through your application is tracked with a trace. A trace collects all the segments generated by a single request. That means you can easily trace Lambda events coming in from S3 all the way to DynamoDB and understand where errors and latencies are cropping up.

So, we’ll create an S3 bucket called selfies-bucket, a DynamoDB table called selfies-table, and a Lambda function. We’ll add a trigger to our Lambda function for the S3 bucket on ObjectCreated:All events. Our Lambda function code will be super simple and you can look at it in its entirety here. With no code changes we can enable X-Ray in our Java function by including the aws-xray-recorder-sdk and aws-xray-recorder-sdk-aws-sdk-instrumentor packages in our JAR.
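
Although the post’s function uses the Java 8 runtime, the core logic is easy to see in a short Python sketch; the table’s key schema and the label limit below are assumptions:

import boto3

rekognition = boto3.client('rekognition')
table = boto3.resource('dynamodb').Table('selfies-table')

def handler(event, context):
    # Invoked by S3 ObjectCreated events
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        labels = rekognition.detect_labels(
            Image={'S3Object': {'Bucket': bucket, 'Name': key}},
            MaxLabels=10  # assumed limit
        )['Labels']
        table.put_item(Item={
            'photo': key,  # assumed partition key
            'labels': [label['Name'] for label in labels]
        })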

Let’s trigger some photo uploads and get a look at the traces in X-Ray.

We’ve got some data! We can click on one of these individual traces for a lot of detailed information on our invocation.

In the first AWS::Lambda segment we see the dwell time of the function, how long it spent waiting to execute, followed by the number of execution attempts.

In the second AWS::Lambda::Function segment there are a few possible subsegments:

  • The initialization subsegment includes all of the time spent before your function handler starts executing
  • The outbound service calls
  • Any of your custom subsegments (these are really easy to add; see the sketch below)
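
For example, with the Python X-Ray SDK a custom subsegment is just a context manager; the subsegment name, the annotation, and the do_work function below are all hypothetical:

from aws_xray_sdk.core import xray_recorder

def do_work(event):
    pass  # stand-in for your real application logic

def handler(event, context):
    # Everything inside this block is timed as its own subsegment
    with xray_recorder.in_subsegment('process-image') as subsegment:
        # Annotations are indexed and searchable in the X-Ray console
        subsegment.put_annotation('source', 'selfies-bucket')
        do_work(event)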

Hmm, it seems like there’s a bit of an issue on the DynamoDB side. We can even dive deeper and get the full exception stacktrace by clicking on the error icon. You can see we’ve been throttled by DynamoDB because we’re out of write capacity units. Luckily we can add more with just a few clicks or a quick API call. As we do that we’ll see more and more green on our service map!
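
The “quick API call” could be as simple as this hedged boto3 sketch; the capacity values are assumptions, not a recommendation:

import boto3

dynamodb = boto3.client('dynamodb')
dynamodb.update_table(
    TableName='selfies-table',
    ProvisionedThroughput={
        'ReadCapacityUnits': 5,    # assumed values; size these
        'WriteCapacityUnits': 50   # for your actual workload
    }
)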

The X-Ray SDKs make it super easy to emit data to X-Ray, but you don’t have to use them to talk to the X-Ray daemon. For Python, you can check out this library from Rackspace called fleece. The X-Ray service is full of interesting stuff and the best place to learn more is by hopping over to the documentation. I’ve been using it for my @awscloudninja bot and it’s working great! Just keep in mind that this isn’t an official library and isn’t supported by AWS.

Personally, I’m really excited to use X-Ray in all of my upcoming projects because it really will save me some time and effort debugging and operating. I look forward to seeing what our customers can build with it as well. If you come up with any cool tricks or hacks please let me know!

– Randall

Amazon Redshift Spectrum – Exabyte-Scale In-Place Queries of S3 Data

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-redshift-spectrum-exabyte-scale-in-place-queries-of-s3-data/

Now that we can launch cloud-based compute and storage resources with a couple of clicks, the challenge is to use these resources to go from raw data to actionable results as quickly and efficiently as possible.

Amazon Redshift allows AWS customers to build petabyte-scale data warehouses that unify data from a variety of internal and external sources. Because Redshift is optimized for complex queries (often involving multiple joins) across large tables, it can handle large volumes of retail, inventory, and financial data without breaking a sweat. Once the data is loaded, our customers can make use of a plethora of enterprise reporting and business intelligence tools provided by the Redshift Partners.

One of the most challenging aspects of running a data warehouse involves loading data that is continuously changing and/or arriving at a rapid pace. In order to provide great query performance, loading data into a data warehouse includes compression, normalization, and optimization steps. While these steps can be automated and scaled, the loading process introduces overhead and complexity, and also gets in the way of those all-important actionable results.

Data formats present another interesting challenge. Some applications will process the data in its original form, outside of the data warehouse. Others will issue queries to the data warehouse. This model leads to storage inefficiencies because the data must be stored twice, and can also mean that results from one form of processing may not align with those from another due to delays introduced by the loading process.

Amazon Redshift Spectrum
In order to allow you to process your data as-is, where-is, while taking advantage of the power and flexibility of Amazon Redshift, we are launching Amazon Redshift Spectrum. You can use Spectrum to run complex queries on data stored in Amazon Simple Storage Service (S3), with no need for loading or other data prep.

You simply create a data source and issue your queries to your Redshift cluster as usual. Behind the scenes, Spectrum scales to thousands of instances on a per-query basis, ensuring that you get fast, consistent performance even as your data set grows to and beyond an exabyte! Being able to query data stored in S3 means that you can scale your compute and your storage independently, with the full power of the Redshift query model and all of the reporting and business intelligence tools at your disposal. Your queries can reference any combination of data stored in Redshift tables and in S3.

When you issue a query, Redshift rips it apart and generates a query plan that minimizes the amount of S3 data that will be read, taking advantage of both column-oriented formats and data that is partitioned by date or another key. Then Redshift requests Spectrum workers from a large, shared pool and directs them to project, filter, and aggregate the S3 data. The final processing is performed within the Redshift cluster and the results are returned to you.

Because Spectrum operates on data that is stored in S3, you can process the data using other AWS services such as Amazon EMR and Amazon Athena. You can also implement hybrid models where frequently queried data is kept in Redshift local storage and the rest in S3, or where dimension tables are in Redshift along with the recent portions of the fact tables, with older data in S3. In order to drive even higher levels of concurrency, you can point multiple Redshift clusters at the same stored data.

Spectrum supports open, common data formats including CSV/TSV, Parquet, SequenceFile, and RCFile. Files can be compressed using GZip or Snappy, with other data formats and compression methods in the works.

Spectrum in Action
In order to get some first-hand experience with Spectrum I loaded up a sample data set and ran some queries!

I started by creating an external schema and database:

Then I created an external table within the database:

I ran a simple query to get a feel for the size of the data set (6.1 billion rows):

And then I ran a query that processed all of the rows:

As you can see, Spectrum was able to churn through all 6 billion rows in a little over four minutes. I checked my cluster’s performance metrics and it looked like I had enough CPU power to run many such queries simultaneously:

Available Now
Amazon Redshift Spectrum is available now and you can start using it today!

Spectrum pricing is based on the amount of data pulled from S3 during query processing and is charged at the rate of $5 per terabyte (you can save money by compressing your data and/or storing it in column-oriented form). You pay the usual charges to run your Redshift cluster and to store your data in S3, but there are no Spectrum charges when you are not running queries.

Jeff;


EC2 F1 Instances with FPGAs – Now Generally Available

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/ec2-f1-instances-with-fpgas-now-generally-available/

We launched the Developer Preview of the FPGA-equipped F1 instances at AWS re:Invent. The response to the announcement was quick and overwhelming! We received over 2000 requests for entry, and were able to provide over 200 developers with access to the Hardware Development Kit (HDK) and the actual F1 instances.

In the post that I wrote for re:Invent, I told you that:

This highly parallelized model is ideal for building custom accelerators to process compute-intensive problems. Properly programmed, an FPGA has the potential to provide a 30x speedup to many types of genomics, seismic analysis, financial risk analysis, big data search, and encryption algorithms and applications.

During the preview, partners and developers have been working on all sorts of exciting tools, services, and applications. I’ll tell you more about them in just a moment.

Now Generally Available
Today we are making the F1 instances generally available in the US East (Northern Virginia) Region, with plans to bring them to other regions before too long.

We continued to add features and functions during the preview, while also making the development tools more efficient and easier to use. Here’s a summary:

Developer Community – We launched the AWS FPGA Development Forum to provide a place for FPGA developers to hang out and to communicate with us and with each other.

HDK and SDK – We published the EC2 FPGA Hardware (HDK) and Software Development Kit to GitHub, and made many improvements in response to feedback that we received during the preview.

The improvements include support for VHDL (in addition to Verilog), an improved virtual lab environment (Virtual JTAG, Virtual LED, and Virtual DipSwitch), AWS libraries for FPGA management and the FPGA runtime, and support for OpenCL including the AWS OpenCL runtime library.

FPGA Developer AMI – This Marketplace AMI contains a full set of FPGA development tools including an RTL compiler and simulator, along with Xilinx SDAccel for OpenCL development, all tuned for use on C4, M4, and R4 instances.

FPGAs At Work
Here’s a sampling of the impressive work that our partners have been doing with F1 instances:

Edico Genome is deploying their DRAGEN Bio-IT Platform on F1 instances, with the expectation that it will provide whole-genome sequencing that runs in real time.

Ryft offers the Ryft Cloud, an accelerator for data analytics and machine learning that extends Elastic Stack. It sources data from Amazon Kinesis, Amazon Simple Storage Service (S3), Amazon Elastic Block Store (EBS), and local instance storage and uses massive bitwise parallelism to drive performance. The product supports high-level JDBC, ODBC, and REST interfaces along with low-level C, C++, Java, and Python APIs (see the Ryft API page for more information).

Reconfigure.io launched a cloud-based service that allows you to program FPGAs using the Go programming language. You can build, test, and deploy your code from within their cloud-based environment while taking advantage of concurrency-oriented language features such as goroutines (lightweight threads), channels, and selects.

NGCodec ported their RealityCodec video encoder to the F1 and used it to produce broadcast-quality video at 80 frames per second. Their solution can encode up to 32 independent video streams on a single F1 instance (read their new post, You Deserve Better than Grainy Giraffes, to learn more).

FPGAs In School & Research
Research groups and graduate classes at top-tier universities contacted us via AWS Educate and were eager to gain access to F1 instances.

UCLA‘s CS133 class (Parallel and Distributed Computing) is setting up an F1-based FPGA lab that will be operational within 3 or 4 weeks. According to UCLA Chancellor’s Professor Jason Cong, they are expanding multiple research projects to cover F1 including FPGA performance debugging, machine learning acceleration, Spark to FPGA compilation, and systolic array compilation.

Last month we announced that we are collaborating with the National Science Foundation (NSF) to foster innovation in big data research (read AWS Collaborates With the National Science Foundation to Foster Innovation to learn more and to find out how to apply for a grant).

FPGAs in the AWS Marketplace
As I shared in my original post, we have built a complete beginning to end solution that lets developers build FPGA-powered applications and services and list them in the AWS Marketplace. I can’t wait to see what kinds of cool things show up there!

Jeff;

Launch: Amazon Athena adds support for Querying Encrypted Data

Post Syndicated from Tara Walker original https://aws.amazon.com/blogs/aws/launch-amazon-athena-adds-support-for-querying-encrypted-data/

In November of last year, we brought a service to market that we hoped would be a major step toward helping those who need to securely access and examine massive amounts of data on a daily basis. This service is none other than Amazon Athena, which I think of as a managed service that is attempting “to leap tall queries in a single bound” with querying of object storage: a service that gives AWS customers the power to easily analyze and query large amounts of data stored in Amazon S3.

Amazon Athena is a serverless interactive query service that enables users to easily analyze data in Amazon S3 using standard SQL. At Athena’s core is Presto, a distributed SQL engine that runs queries with ANSI SQL support, and Apache Hive, which allows Athena to work with popular data formats like CSV, JSON, ORC, Avro, and Parquet and adds common Data Definition Language (DDL) operations like create, drop, and alter tables. Athena enables performant query access to datasets stored in Amazon Simple Storage Service (S3) in both structured and unstructured data formats.

You can write Hive-compliant DDL statements and ANSI SQL statements in the Athena Query Editor from the AWS Management Console, or from SQL clients such as SQL Workbench by downloading and taking advantage of the Athena JDBC driver. Additionally, by using the JDBC driver you can run queries programmatically from your desired BI tools. You can read more about the Amazon Athena service in Jeff’s blog post from the service release in November.

After releasing the initial features of the Amazon Athena service, the Athena team kept with the Amazon tradition of focusing on the customer by working diligently to make your experience with the service better. Therefore, the team has added a feature that I am excited to announce: Amazon Athena now supports querying encrypted data in Amazon S3. This new feature also enables you to encrypt Athena’s query results. Businesses and customers who have requirements and/or regulations to encrypt sensitive data stored in Amazon S3 can now take advantage of the serverless dynamic queries Athena offers with their encrypted data.


Supporting Encryption

Before we dive into using the new Athena feature, let’s take some time to review the encryption options that S3 and Athena support for customers needing to secure and encrypt data. Currently, S3 supports encrypting data with AWS Key Management Service (KMS), a managed service for the creation and management of encryption keys used to encrypt data. In addition, S3 supports customers using their own encryption keys to encrypt data. Since it is important to understand the encryption options that Athena supports for datasets stored on S3, the chart below provides a breakdown of the encryption options supported with S3 and Athena, and notes when the new Athena table property, has_encrypted_data, is required for encrypted data access.

Encryption option                                             Supported by Athena   has_encrypted_data required
Server-Side Encryption with Amazon S3–Managed Keys (SSE-S3)   Yes                   No
Server-Side Encryption with AWS KMS–Managed Keys (SSE-KMS)    Yes                   No
Client-Side Encryption with AWS KMS–Managed Keys (CSE-KMS)    Yes                   Yes

For more information on Amazon S3 encryption with AWS KMS or Amazon S3 Encryption options, review the information in the AWS KMS Developer Guide on How Amazon Simple Storage Service (Amazon S3) Uses AWS KMS and Amazon S3 Developer Guide on Protecting Data Using Encryption respectively.


Creating & Accessing Encrypted Databases and Tables

As I noted before, there are a couple of ways to access Athena. Of course, you can access Athena through the AWS Management Console, but you also have the option to use the JDBC driver with SQL clients like SQL Workbench and other Business Intelligence tools. In addition, the JDBC driver allows for programmatic query access.

Enough discussion, it is time to dig into this new Athena service feature by creating a database and some tables, running queries from the table and encryption of the query results. We’ll accomplish all this by using encrypted data stored in Amazon S3.

If this is your first time logging into the service, you will see the Amazon Athena Getting Started screen as shown below. You need to click the Get Started button to be taken to the Athena Query Editor.

Now that we are in the Athena Query Editor, let’s create a database. If the sample database is shown when you open your Query Editor, simply start typing your query statement in the Query Editor window to clear the sample query and create the new database.

I will now issue the Hive DDL Command, CREATE DATABASE <dbname> within the Query Editor window to create my database, tara_customer_db.

Once I receive the confirmation that my query execution was successful in the Results tab of Query Editor, my database should be created and available for selection in the dropdown.

I now will change my selected database in the dropdown to my newly created database, tara_customer_db.


With my database created, I am able to create tables from my data stored in S3. Since I did not have data encrypted with the various encryption types, the product group was kind enough to give me some sample data files to place in my S3 buckets. The first batch of sample data that I received was encrypted with SSE-KMS, which, if you recall from the encryption matrix we discussed above, is the encryption type Server-Side Encryption with AWS KMS–Managed Keys. I stored this set of encrypted data in my S3 bucket aptly named: aws-blog-tew-posts/SSE_KMS_EncryptionData. The second batch of sample data was encrypted with CSE-KMS, which is the encryption type Client-Side Encryption with AWS KMS–Managed Keys, and is stored in my aws-blog-tew-posts/CSE_KMS_EncryptionData S3 bucket. The last batch of data I received is just good old-fashioned plain text, and I have stored this data in the S3 bucket, aws-blog-tew-posts/PlainText_Table.

Remember, to access my data in the S3 buckets from the Athena service, I must ensure that my data buckets have the correct permissions to allow Athena to access each bucket and the data contained therein. In addition, working with AWS KMS encrypted data requires users to have roles that include the appropriate KMS key policies. It is important to note that to successfully read KMS encrypted data, users must have the correct permissions for access to S3, Athena, and KMS collectively.

There are several ways that I can provide the appropriate access permissions between S3 and the Athena service:

  1. Allow access via user policy
  2. Allow access via bucket policy
  3. Allow access with both a bucket policy and user policy.

You can learn more about Amazon Athena access permissions and/or Amazon S3 permissions by reviewing the Athena documentation on Setting User and Amazon S3 Bucket Permissions.

Since my data is ready and set up in my S3 buckets, I just need to head over to the Athena Query Editor and create my first new table from the SSE-KMS encrypted data. The DDL command that I will use to create my new table, sse_customerinfo, is as follows:

CREATE EXTERNAL TABLE sse_customerinfo( 
  c_custkey INT, 
  c_name STRING, 
  c_address STRING, 
  c_nationkey INT, 
  c_phone STRING, 
  c_acctbal DOUBLE, 
  c_mktsegment STRING, 
  c_comment STRING
  ) 
ROW FORMAT SERDE  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
OUTPUTFORMAT  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
LOCATION  's3://aws-blog-tew-posts/SSE_KMS_EncryptionData';

I will enter my DDL command statement for the sse_customerinfo table creation into my Athena Query Editor and click the Run Query button. The Results tab will note that query was run successfully and you will see my new table show up under the tables available for the tara_customer_db database.

I will repeat this process to create my cse_customerinfo table from the CSE-KMS encrypted batch of data and then the plain_customerinfo table from the unencrypted data source stored in my S3 bucket. The DDL statements used to create my cse_customerinfo table are as follows:

CREATE EXTERNAL TABLE cse_customerinfo (
  c_custkey INT, 
  c_name STRING, 
  c_address STRING, 
  c_nationkey INT, 
  c_phone STRING, 
  c_acctbal DOUBLE, 
  c_mktsegment STRING, 
  c_comment STRING
)
ROW FORMAT SERDE   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
OUTPUTFORMAT  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION   's3://aws-blog-tew-posts/CSE_KMS_EncryptionData'
TBLPROPERTIES ('has_encrypted_data'='true');

Again, I will enter my DDL statements above into the Athena Query Editor and click the Run Query button. If you review the DDL statements used to create the cse_customerinfo table carefully, you will notice a new table property (TBLPROPERTIES) flag, has_encrypted_data, which was introduced with the new Athena encryption capability. This flag is used to tell Athena that the data in S3 to be used with queries for the specified table is encrypted data. If you take a moment and refer back to the encryption matrix table I reviewed earlier for the Athena and S3 encryption options, you will see that this flag is only required when you are using the Client-Side Encryption with AWS KMS–Managed Keys option. Once the cse_customerinfo table has been successfully created, a key symbol will appear next to the table, identifying it as an encrypted data table.

Finally, I will create the last table, plain_customerinfo, from our sample data. Same steps as we performed for the previous tables. The DDL commands for this table are:

CREATE EXTERNAL TABLE plain_customerinfo(
  c_custkey INT, 
  c_name STRING, 
  c_address STRING, 
  c_nationkey INT, 
  c_phone STRING, 
  c_acctbal DOUBLE, 
  c_mktsegment STRING, 
  c_comment STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 's3://aws-blog-tew-posts/PlainText_Table';


Great! We have successfully read encrypted data from S3 with Athena, and created tables based on the encrypted data. I can now run queries against my newly created encrypted data tables.


Running Queries

Running queries against our new database tables is very simple. Again, standard SQL statements and commands can be used to create queries against your data stored in Amazon S3. For our query review, I am going to use Athena’s preview data feature. In the list of tables, you will see two icons beside each table. One is a table property icon; selecting it brings up the selected table’s properties. The other, displayed as an eye symbol, is the preview data feature, which generates a simple SELECT query statement for the table.


To demonstrate running queries with Athena, I have chosen to preview data for my plain_customerinfo table by selecting the eye symbol/icon next to the table. The preview data feature creates the following SQL statement:

SELECT * FROM plain_customerinfo limit 10;

The query results from using the preview data feature with my plain_customerinfo table are displayed in the Results tab of the Athena Query Editor and provides the option to download the query results by clicking the file icon.

The new Athena encrypted data feature also supports encrypting query results and storing these results in Amazon S3. To take advantage of this feature with my query results, I will now encrypt and save my query data in a bucket of my choice. You should note that the data table that I have selected is currently unencrypted.
First, I’ll select the Athena Settings menu and review the current storage settings for my query results. Since I do not have a KMS key to use for encryption, I will select the Create KMS key hyperlink and create a KMS key for use in encrypting my query results with Athena and S3. For details on how to create a KMS key and configure the appropriate user permissions, please see http://docs.aws.amazon.com/kms/latest/developerguide/create-keys.html.
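
If you would rather script the key creation, a hedged boto3 sketch looks like this; the alias and description are assumptions:

import boto3

kms = boto3.client('kms')
key = kms.create_key(Description='Encrypts Athena query results')  # assumed description
kms.create_alias(
    AliasName='alias/s3encryptathena',
    TargetKeyId=key['KeyMetadata']['KeyId']
)
print(key['KeyMetadata']['Arn'])  # the ARN to use in the Athena settings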

After successfully creating my s3encryptathena KMS key and copying the key ARN for use in my Athena settings, I return to the Athena console Settings dialog and select the Encrypt query results checkbox. I then update the Query result location textbox to point to my S3 bucket, aws-athena-encrypted, which will be the location for storing my encrypted query results.

The only thing left is to select the Encryption type and enter my KMS key. I can do this either by selecting the s3encryptathena key from the Encryption key dropdown or by entering its ARN in the KMS key ARN textbox. In this example, I have chosen to use SSE-KMS for the encryption type. You can see both examples of selecting the KMS key below. Clicking the Save button completes the process.
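
These console settings map directly onto the Athena API. A hedged boto3 sketch of the same configuration might look like the following; the KMS key ARN is a placeholder:

import boto3

athena = boto3.client('athena')
athena.start_query_execution(
    QueryString='SELECT * FROM plain_customerinfo LIMIT 10;',
    QueryExecutionContext={'Database': 'tara_customer_db'},
    ResultConfiguration={
        'OutputLocation': 's3://aws-athena-encrypted/',
        'EncryptionConfiguration': {
            'EncryptionOption': 'SSE_KMS',
            'KmsKey': 'arn:aws:kms:us-east-1:123456789012:key/EXAMPLE'  # placeholder ARN
        }
    }
)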

Now I will rerun my current query for my plain_customerinfo table. Remember, this table is not encrypted, but with the settings change made to add encryption for query results, the results of queries run against this table will be stored with SSE-KMS encryption using my KMS key.

After my query rerun, I can see the fruits of my labor by going to the Amazon S3 console and viewing the CSV data files saved in my designated bucket, aws-athena-encrypted, and the SSE-KMS encryption of the bucket and files.


Summary

Needless to say, this Athena launch has several benefits for those needing to secure data via encryption while still retaining the ability to perform queries and analytics on data stored in varying data formats. Additionally, this release includes improvements that I did not dive into in this blog post:

  • A new version of the JDBC driver that supports the new encryption feature and key updates.
  • Added the ability to add, replace, and change columns using ALTER TABLE.
  • Added support for querying LZO-compressed data.

See the release documentation in the Athena user guide for more details, and start leveraging Athena to query your encrypted data stored in Amazon S3 now by reviewing the Configuring Encryption Options section in the Athena documentation.

Learn more about Athena and serverless queries on Amazon S3 by visiting the Athena product page or reviewing the Athena User Guide. In addition, you can dig deeper on the functionality of Athena and data encryption with S3 by reviewing the AWS Big Data Blog post: Analyzing Data in S3 using Amazon Athena and the AWS KMS Developer Guide.

Happy Encrypting!

Tara

In Case You Missed These: AWS Security Blog Posts from January, February, and March

Post Syndicated from Craig Liebendorfer original https://aws.amazon.com/blogs/security/in-case-you-missed-these-aws-security-blog-posts-from-january-february-and-march/

Image of lock and key

In case you missed any AWS Security Blog posts published so far in 2017, they are summarized and linked to below. The posts are shown in reverse chronological order (most recent first), and the subject matter ranges from protecting dynamic web applications against DDoS attacks to monitoring AWS account configuration changes and API calls to Amazon EC2 security groups.

March

March 22: How to Help Protect Dynamic Web Applications Against DDoS Attacks by Using Amazon CloudFront and Amazon Route 53
Using a content delivery network (CDN) such as Amazon CloudFront to cache and serve static text and images or downloadable objects such as media files and documents is a common strategy to improve webpage load times, reduce network bandwidth costs, lessen the load on web servers, and mitigate distributed denial of service (DDoS) attacks. AWS WAF is a web application firewall that can be deployed on CloudFront to help protect your application against DDoS attacks by giving you control over which traffic to allow or block by defining security rules. When users access your application, the Domain Name System (DNS) translates human-readable domain names (for example, www.example.com) to machine-readable IP addresses (for example, 192.0.2.44). A DNS service, such as Amazon Route 53, can effectively connect users’ requests to a CloudFront distribution that proxies requests for dynamic content to the infrastructure hosting your application’s endpoints. In this blog post, I show you how to deploy CloudFront with AWS WAF and Route 53 to help protect dynamic web applications (with dynamic content such as a response to user input) against DDoS attacks. The steps shown in this post are key to implementing the overall approach described in AWS Best Practices for DDoS Resiliency and enable the built-in, managed DDoS protection service, AWS Shield.

March 21: New AWS Encryption SDK for Python Simplifies Multiple Master Key Encryption
The AWS Cryptography team is happy to announce a Python implementation of the AWS Encryption SDK. This new SDK helps manage data keys for you, and it simplifies the process of encrypting data under multiple master keys. As a result, this new SDK allows you to focus on the code that drives your business forward. It also provides a framework you can easily extend to ensure that you have a cryptographic library that is configured to match and enforce your standards. The SDK also includes ready-to-use examples. If you are a Java developer, you can refer to this blog post to see specific Java examples for the SDK. In this blog post, I show you how you can use the AWS Encryption SDK to simplify the process of encrypting data and how to protect your encryption keys in ways that help improve application availability by not tying you to a single region or key management solution.

March 21: Updated CJIS Workbook Now Available by Request
The need for guidance when implementing Criminal Justice Information Services (CJIS)–compliant solutions has become of paramount importance as more law enforcement customers and technology partners move to store and process criminal justice data in the cloud. AWS services allow these customers to easily and securely architect a CJIS-compliant solution when handling criminal justice data, creating a durable, cost-effective, and secure IT infrastructure that better supports local, state, and federal law enforcement in carrying out their public safety missions. AWS has created several documents (collectively referred to as the CJIS Workbook) to assist you in aligning with the FBI’s CJIS Security Policy. You can use the workbook as a framework for developing CJIS-compliant architecture in the AWS Cloud. The workbook helps you define and test the controls you operate, and document the dependence on the controls that AWS operates (compute, storage, database, networking, regions, Availability Zones, and edge locations).

March 9: New Cloud Directory API Makes It Easier to Query Data Along Multiple Dimensions
Today, we made available a new Cloud Directory API, ListObjectParentPaths, that enables you to retrieve all available parent paths for any directory object across multiple hierarchies. Use this API when you want to fetch all parent objects for a specific child object. The order of the paths and objects returned is consistent across iterative calls to the API, unless objects are moved or deleted. In case an object has multiple parents, the API allows you to control the number of paths returned by using a paginated call pattern. In this blog post, I use an example directory to demonstrate how this new API enables you to retrieve data across multiple dimensions to implement powerful applications quickly.

March 8: How to Access the AWS Management Console Using AWS Microsoft AD and Your On-Premises Credentials
AWS Directory Service for Microsoft Active Directory, also known as AWS Microsoft AD, is a managed Microsoft Active Directory (AD) hosted in the AWS Cloud. Now, AWS Microsoft AD makes it easy for you to give your users permission to manage AWS resources by using on-premises AD administrative tools. With AWS Microsoft AD, you can grant your on-premises users permissions to resources such as the AWS Management Console instead of adding AWS Identity and Access Management (IAM) user accounts or configuring AD Federation Services (AD FS) with Security Assertion Markup Language (SAML). In this blog post, I show how to use AWS Microsoft AD to enable your on-premises AD users to sign in to the AWS Management Console with their on-premises AD user credentials to access and manage AWS resources through IAM roles.

March 7: How to Protect Your Web Application Against DDoS Attacks by Using Amazon Route 53 and an External Content Delivery Network
Distributed Denial of Service (DDoS) attacks are attempts by a malicious actor to flood a network, system, or application with more traffic, connections, or requests than it is able to handle. To protect your web application against DDoS attacks, you can use AWS Shield, a DDoS protection service that AWS provides automatically to all AWS customers at no additional charge. You can use AWS Shield in conjunction with DDoS-resilient web services such as Amazon CloudFront and Amazon Route 53 to improve your ability to defend against DDoS attacks. Learn more about architecting for DDoS resiliency by reading the AWS Best Practices for DDoS Resiliency whitepaper. You also have the option of using Route 53 with an externally hosted content delivery network (CDN). In this blog post, I show how you can help protect the zone apex (also known as the root domain) of your web application by using Route 53 to perform a secure redirect to prevent discovery of your application origin.

Image of lock and key

February

February 27: Now Generally Available – AWS Organizations: Policy-Based Management for Multiple AWS Accounts
Today, AWS Organizations moves from Preview to General Availability. You can use Organizations to centrally manage multiple AWS accounts, with the ability to create a hierarchy of organizational units (OUs). You can assign each account to an OU, define policies, and then apply those policies to an entire hierarchy, specific OUs, or specific accounts. You can invite existing AWS accounts to join your organization, and you can also create new accounts. All of these functions are available from the AWS Management Console, the AWS Command Line Interface (CLI), and through the AWS Organizations API. To read the full AWS Blog post about today’s launch, see AWS Organizations – Policy-Based Management for Multiple AWS Accounts.

February 23: s2n Is Now Handling 100 Percent of SSL Traffic for Amazon S3
Today, we’ve achieved another important milestone for securing customer data: we have replaced OpenSSL with s2n for all internal and external SSL traffic in Amazon Simple Storage Service (Amazon S3) commercial regions. This was implemented with minimal impact to customers, and multiple means of error checking were used to ensure a smooth transition, including client integration tests, catching potential interoperability conflicts, and identifying memory leaks through fuzz testing.

February 22: Easily Replace or Attach an IAM Role to an Existing EC2 Instance by Using the EC2 Console
AWS Identity and Access Management (IAM) roles enable your applications running on Amazon EC2 to use temporary security credentials. IAM roles for EC2 make it easier for your applications to make API requests securely from an instance because they do not require you to manage AWS security credentials that the applications use. Recently, we enabled you to use temporary security credentials for your applications by attaching an IAM role to an existing EC2 instance by using the AWS CLI and SDK. To learn more, see New! Attach an AWS IAM Role to an Existing Amazon EC2 Instance by Using the AWS CLI. Starting today, you can attach an IAM role to an existing EC2 instance from the EC2 console. You can also use the EC2 console to replace an IAM role attached to an existing instance. In this blog post, I will show how to attach an IAM role to an existing EC2 instance from the EC2 console.

February 22: How to Audit Your AWS Resources for Security Compliance by Using Custom AWS Config Rules
AWS Config Rules enables you to implement security policies as code for your organization and evaluate configuration changes to AWS resources against these policies. You can use Config rules to audit your use of AWS resources for compliance with external compliance frameworks such as CIS AWS Foundations Benchmark and with your internal security policies related to the US Health Insurance Portability and Accountability Act (HIPAA), the Federal Risk and Authorization Management Program (FedRAMP), and other regimes. AWS provides some predefined, managed Config rules. You also can create custom Config rules based on criteria you define within an AWS Lambda function. In this post, I show how to create a custom rule that audits AWS resources for security compliance by enabling VPC Flow Logs for an Amazon Virtual Private Cloud (VPC). The custom rule meets requirement 4.3 of the CIS AWS Foundations Benchmark: “Ensure VPC flow logging is enabled in all VPCs.”

February 13: AWS Announces CISPE Membership and Compliance with First-Ever Code of Conduct for Data Protection in the Cloud
I have two exciting announcements today, both showing AWS’s continued commitment to ensuring that customers can comply with EU Data Protection requirements when using our services.

February 13: How to Enable Multi-Factor Authentication for AWS Services by Using AWS Microsoft AD and On-Premises Credentials
You can now enable multi-factor authentication (MFA) for users of AWS services such as Amazon WorkSpaces and Amazon QuickSight and their on-premises credentials by using your AWS Directory Service for Microsoft Active Directory (Enterprise Edition) directory, also known as AWS Microsoft AD. MFA adds an extra layer of protection to a user name and password (the first “factor”) by requiring users to enter an authentication code (the second factor), which has been provided by your virtual or hardware MFA solution. These factors together provide additional security by preventing access to AWS services, unless users supply a valid MFA code.

February 13: How to Create an Organizational Chart with Separate Hierarchies by Using Amazon Cloud Directory
Amazon Cloud Directory enables you to create directories for a variety of use cases, such as organizational charts, course catalogs, and device registries. Cloud Directory offers you the flexibility to create directories with hierarchies that span multiple dimensions. For example, you can create an organizational chart that you can navigate through separate hierarchies for reporting structure, location, and cost center. In this blog post, I show how to use Cloud Directory APIs to create an organizational chart with two separate hierarchies in a single directory. I also show how to navigate the hierarchies and retrieve data. I use the Java SDK for all the sample code in this post, but you can use other language SDKs or the AWS CLI.

February 10: How to Easily Log On to AWS Services by Using Your On-Premises Active Directory
AWS Directory Service for Microsoft Active Directory (Enterprise Edition), also known as Microsoft AD, now enables your users to log on with just their on-premises Active Directory (AD) user name—no domain name is required. This new domainless logon feature makes it easier to set up connections to your on-premises AD for use with applications such as Amazon WorkSpaces and Amazon QuickSight, and it keeps the user logon experience free from network naming. This new interforest trusts capability is now available when using Microsoft AD with Amazon WorkSpaces and Amazon QuickSight Enterprise Edition. In this blog post, I explain how Microsoft AD domainless logon works with AD interforest trusts, and I show an example of setting up Amazon WorkSpaces to use this capability.

February 9: New! Attach an AWS IAM Role to an Existing Amazon EC2 Instance by Using the AWS CLI
AWS Identity and Access Management (IAM) roles enable your applications running on Amazon EC2 to use temporary security credentials that AWS creates, distributes, and rotates automatically. Using temporary credentials is an IAM best practice because you do not need to maintain long-term keys on your instance. Using IAM roles for EC2 also eliminates the need to use long-term AWS access keys that you have to manage manually or programmatically. Starting today, you can enable your applications to use temporary security credentials provided by AWS by attaching an IAM role to an existing EC2 instance. You can also replace the IAM role attached to an existing EC2 instance. In this blog post, I show how you can attach an IAM role to an existing EC2 instance by using the AWS CLI.

February 8: How to Remediate Amazon Inspector Security Findings Automatically
The Amazon Inspector security assessment service can evaluate the operating environments and applications you have deployed on AWS for common and emerging security vulnerabilities automatically. As an AWS-built service, Amazon Inspector is designed to exchange data and interact with other core AWS services not only to identify potential security findings but also to automate addressing those findings. Previous related blog posts showed how you can deliver Amazon Inspector security findings automatically to third-party ticketing systems and automate the installation of the Amazon Inspector agent on new Amazon EC2 instances. In this post, I show how you can automatically remediate findings generated by Amazon Inspector. To get started, you must first run an assessment and publish any security findings to an Amazon Simple Notification Service (SNS) topic. Then, you create an AWS Lambda function that is triggered by those notifications. Finally, the Lambda function examines the findings and then implements the appropriate remediation based on the type of issue.

February 6: How to Simplify Security Assessment Setup Using Amazon EC2 Systems Manager and Amazon Inspector
In a July 2016 AWS Blog post, I discussed how to integrate Amazon Inspector with third-party ticketing systems by using Amazon Simple Notification Service (SNS) and AWS Lambda. This AWS Security Blog post continues in the same vein, describing how to use Amazon Inspector to automate various aspects of security management. In this post, I show you how to install the Amazon Inspector agent automatically through the Amazon EC2 Systems Manager when a new Amazon EC2 instance is launched. In a subsequent post, I will show you how to update EC2 instances automatically that run Linux when Amazon Inspector discovers a missing security patch.

Image of lock and key

January

January 30: How to Protect Data at Rest with Amazon EC2 Instance Store Encryption
Encrypting data at rest is vital for regulatory compliance to ensure that sensitive data saved on disks is not readable by any user or application without a valid key. Some compliance regulations such as PCI DSS and HIPAA require that data at rest be encrypted throughout the data lifecycle. To this end, AWS provides data-at-rest options and key management to support the encryption process. For example, you can encrypt Amazon EBS volumes and configure Amazon S3 buckets for server-side encryption (SSE) using AES-256 encryption. Additionally, Amazon RDS supports Transparent Data Encryption (TDE). Instance storage provides temporary block-level storage for Amazon EC2 instances. This storage is located on disks attached physically to a host computer. Instance storage is ideal for temporary storage of information that frequently changes, such as buffers, caches, and scratch data. By default, files stored on these disks are not encrypted. In this blog post, I show a method for encrypting data on Linux EC2 instance stores by using Linux built-in libraries. This method encrypts files transparently, which protects confidential data. As a result, applications that process the data are unaware of the disk-level encryption.

January 27: How to Detect and Automatically Remediate Unintended Permissions in Amazon S3 Object ACLs with CloudWatch Events
Amazon S3 Access Control Lists (ACLs) enable you to specify permissions that grant access to S3 buckets and objects. When S3 receives a request for an object, it verifies whether the requester has the necessary access permissions in the associated ACL. For example, you could set up an ACL for an object so that only the users in your account can access it, or you could make an object public so that it can be accessed by anyone. If the number of objects and users in your AWS account is large, ensuring that you have attached correctly configured ACLs to your objects can be a challenge. For example, what if a user were to call the PutObjectAcl API call on an object that is supposed to be private and make it public? Or, what if a user were to call the PutObject with the optional Acl parameter set to public-read, therefore uploading a confidential file as publicly readable? In this blog post, I show a solution that uses Amazon CloudWatch Events to detect PutObject and PutObjectAcl API calls in near-real time and helps ensure that the objects remain private by making automatic PutObjectAcl calls, when necessary.

January 26: Now Available: Amazon Cloud Directory—A Cloud-Native Directory for Hierarchical Data
Today we are launching Amazon Cloud Directory. This service is purpose-built for storing large amounts of strongly typed hierarchical data. With the ability to scale to hundreds of millions of objects while remaining cost-effective, Cloud Directory is a great fit for all sorts of cloud and mobile applications.

January 24: New SOC 2 Report Available: Confidentiality
As with everything at Amazon, the success of our security and compliance program is primarily measured by one thing: our customers’ success. Our customers drive our portfolio of compliance reports, attestations, and certifications that support their efforts in running a secure and compliant cloud environment. As a result of our engagement with key customers across the globe, we are happy to announce the publication of our new SOC 2 Confidentiality report. This report is available now through AWS Artifact in the AWS Management Console.

January 18: Compliance in the Cloud for New Financial Services Cybersecurity Regulations
Financial regulatory agencies are focused more than ever on ensuring responsible innovation. Consequently, if you want to achieve compliance with financial services regulations, you must be increasingly agile and employ dynamic security capabilities. AWS enables you to achieve this by providing you with the tools you need to scale your security and compliance capabilities on AWS. The following breakdown of one of the most recent cybersecurity regulations, NY DFS Rule 23 NYCRR 500, demonstrates how AWS continues to focus on your regulatory needs in the financial services sector.

January 9: New Amazon GameDev Blog Post: Protect Multiplayer Game Servers from DDoS Attacks by Using Amazon GameLift
In online gaming, distributed denial of service (DDoS) attacks target a game’s network layer, flooding servers with requests until performance degrades considerably. These attacks can limit a game’s availability to players and degrade the experience for those who can connect. Today’s new Amazon GameDev Blog post uses a typical game server architecture to highlight DDoS attack vulnerabilities and discusses how to stay protected by using built-in AWS Cloud security, AWS security best practices, and the security features of Amazon GameLift. Read the post to learn more.

January 6: The Top 10 Most Downloaded AWS Security and Compliance Documents in 2016
The following list includes the 10 most downloaded AWS security and compliance documents in 2016. Use this list to learn what other people found most interesting about security and compliance last year.

January 6: FedRAMP Compliance Update: AWS GovCloud (US) Region Receives a JAB-Issued FedRAMP High Baseline P-ATO for Three New Services
Three new services in the AWS GovCloud (US) Region have received a Provisional Authority to Operate (P-ATO) from the Joint Authorization Board (JAB) under the Federal Risk and Authorization Management Program (FedRAMP). The JAB issued the authorization at the High baseline, which enables US government agencies and their service providers to use these services to process the government’s most sensitive unclassified data, including Personally Identifiable Information (PII), Protected Health Information (PHI), Controlled Unclassified Information (CUI), criminal justice information (CJI), and financial data.

January 4: The Top 20 Most Viewed AWS IAM Documentation Pages in 2016
The following 20 pages were the most viewed AWS Identity and Access Management (IAM) documentation pages in 2016. I have included a brief description with each link to give you a clearer idea of what each page covers. Use this list to see what other people have been viewing and perhaps to pique your own interest about a topic you’ve been meaning to research.

January 3: The Most Viewed AWS Security Blog Posts in 2016
The following 10 posts were the most viewed AWS Security Blog posts that we published during 2016. You can use this list as a guide to catch up on your blog reading or even read a post again that you found particularly useful.

January 3: How to Monitor AWS Account Configuration Changes and API Calls to Amazon EC2 Security Groups
You can use AWS security controls to detect and mitigate risks to your AWS resources. The purpose of each security control is defined by its control objective. For example, the control objective of an Amazon VPC security group is to permit only designated traffic to enter or leave a network interface. Let’s say you have an Internet-facing e-commerce website, and your security administrator has determined that only HTTP (TCP port 80) and HTTPS (TCP port 443) traffic should be allowed access to the public subnet. As a result, your administrator configures a security group to meet this control objective. What if, though, someone were to inadvertently change this security group’s rules and enable FTP or other protocols to access the public subnet from any location on the Internet? That expanded access could weaken the security posture of your assets. Consequently, your administrator might need to monitor the integrity of your company’s security controls so that the controls maintain their desired effectiveness. In this blog post, I explore two methods for detecting unintended changes to VPC security groups. The two methods address not only control objectives but also control failures.
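
One event-driven variant of this idea can be sketched as a Lambda function that revokes out-of-policy ingress rules (the field names follow the CloudTrail record format; the allowed-port check is illustrative, not the post’s exact code):

import boto3

ec2 = boto3.client('ec2')
ALLOWED_PORTS = {80, 443}

def handler(event, context):
    # Triggered by a CloudWatch Events rule on AuthorizeSecurityGroupIngress.
    group_id = event['detail']['requestParameters']['groupId']
    group = ec2.describe_security_groups(
        GroupIds=[group_id])['SecurityGroups'][0]
    for permission in group['IpPermissions']:
        if permission.get('FromPort') not in ALLOWED_PORTS:
            # Revoke any ingress rule outside the control objective.
            ec2.revoke_security_group_ingress(
                GroupId=group_id, IpPermissions=[permission])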

If you have questions about or issues with implementing the solutions in any of these posts, please start a new thread on the forum identified near the end of each post.

– Craig

s2n Is Now Handling 100 Percent of SSL Traffic for Amazon S3

Post Syndicated from Stephen Schmidt original https://aws.amazon.com/blogs/security/s2n-is-now-handling-100-percent-of-of-ssl-traffic-for-amazon-s3/

In June 2015, we introduced s2n, an open-source implementation of the TLS encryption protocol, making the source code publicly available under the terms of the Apache Software License 2.0 from the s2n GitHub repository. One of s2n’s key benefits is its much smaller code surface: approximately 6,000 lines of code, compared to OpenSSL’s approximately 500,000. In less than two years, we’ve seen significant enhancements to s2n, with more than 1,000 code commits, plus the addition of fuzz testing and a static analysis tool, tis-interpreter.

Today, we’ve achieved another important milestone for securing customer data: we have replaced OpenSSL with s2n for all internal and external SSL traffic in Amazon Simple Storage Service (Amazon S3) commercial regions. This was implemented with minimal impact to customers, and we used multiple means of error checking to ensure a smooth transition, including client integration tests, checks for potential interoperability conflicts, and fuzz testing to identify memory leaks.

It was only last week that AWS CEO Andy Jassy reiterated something that’s been a continual theme for us here at AWS: “There’s so much security built into cloud computing platforms today, for us, it’s our No. 1 priority—it’s not even close, relative to anything else.” Yes, security remains our top priority, and our commitment to making formal verification and automated reasoning more efficient exemplifies the way we think about our tools and services. Making encryption more developer friendly is critical in what can be a complicated architectural landscape. To help make security more robust and precise, we put mechanisms in place to verify every change, including negative test cases that “verify the verifier” by deliberately introducing an error into a test-only build and confirming that the tools reject it.

If you are interested in using or contributing to s2n, the source code, documentation, commits, and enhancements are all publicly available under the terms of the Apache Software License 2.0 from the s2n GitHub repository.

– Steve

AWS Announces CISPE Membership and Compliance with First-Ever Code of Conduct for Data Protection in the Cloud

Post Syndicated from Stephen Schmidt original https://aws.amazon.com/blogs/security/aws-announces-cispe-membership-and-compliance-with-first-ever-code-of-conduct-for-data-protection-in-the-cloud/

I have two exciting announcements today, both showing AWS’s continued commitment to ensuring that customers can comply with EU Data Protection requirements when using our services.

AWS and CISPE

First, I’m pleased to announce AWS’s membership in the Association of Cloud Infrastructure Services Providers in Europe (CISPE).

CISPE is a coalition of about twenty cloud infrastructure (also known as Infrastructure as a Service) providers who offer cloud services to customers in Europe. CISPE was created to promote data security and compliance within the context of cloud infrastructure services. This is a vital undertaking: both customers and providers now understand that cloud infrastructure services are very different from traditional IT services (and even from other cloud services such as Software as a Service). Many entities were treating all cloud services the same in the context of data protection, which led to confusion for both customers and providers about their individual obligations.

One of CISPE’s key priorities is to ensure customers get what they need from their cloud infrastructure service providers in order to comply with the new EU General Data Protection Regulation (GDPR). With the publication of its Data Protection Code of Conduct for Cloud Infrastructure Services Providers, CISPE has already made significant progress in this space.

AWS and the Code of Conduct

My second announcement is in regard to the CISPE Code of Conduct itself. I’m excited to inform you that today, AWS has declared that Amazon EC2, Amazon Simple Storage Service (Amazon S3), Amazon Relational Database Service (Amazon RDS), AWS Identity and Access Management (IAM), AWS CloudTrail, and Amazon Elastic Block Store (Amazon EBS) are now fully compliant with the aforementioned CISPE Code of Conduct. This provides our customers with additional assurances that they fully control their data in a safe, secure, and compliant environment when they use AWS. Our compliance with the Code of Conduct adds to the long list of internationally recognized certifications and accreditations AWS already has, including ISO 27001, ISO 27018, ISO 9001, SOC 1, SOC 2, SOC 3, PCI DSS Level 1, and many more.

Additionally, the Code of Conduct is a powerful tool to help our customers who must comply with the EU GDPR.

A few key benefits of the Code of Conduct include:

  • It clarifies who is responsible for what when it comes to data protection: The Code of Conduct explains the roles of both the provider and the customer under the GDPR, specifically within the context of cloud infrastructure services.
  • It sets out the principles providers should adhere to: The Code of Conduct develops key GDPR principles into clear actions and commitments that providers should undertake to help customers comply. Customers can rely on these concrete benefits in their own compliance and data protection strategies.
  • It gives customers the security information they need to make decisions about compliance: The Code of Conduct requires providers to be transparent about the steps they are taking to deliver on their security commitments, including notification of data breaches, data deletion, third-party sub-processing, and law enforcement and governmental requests. Customers can use this information to fully understand the high levels of security provided.

I’m proud that AWS is now a member of CISPE and that we’ve played a part in the development of the Code of Conduct. Due to the very specific considerations that apply to cloud infrastructure services, and given the general lack of understanding of how cloud infrastructure services actually work, there is a clear need for an association such as CISPE. It’s important for AWS to play an active role in CISPE in order to represent the best interests of our customers, particularly when it comes to the EU Data Protection requirements.

AWS has always been committed to enabling our customers to meet their data protection needs. Whether it’s allowing our customers to choose where in the world they wish to store their content, obtaining approval from the EU Data Protection authorities (known as the Article 29 Working Party) of the AWS Data Processing Addendum and Model Clauses to enable transfers of personal data outside Europe, or simply being transparent about the way our services operate, we work hard to be market leaders in the area of security, compliance, and data protection.

Our decision to participate in CISPE and its Code of Conduct sends a clear message to our customers that we continue to take data protection very seriously.

– Steve

The Top 10 Most Downloaded AWS Security and Compliance Documents in 2016

Post Syndicated from Sara Duffer original https://aws.amazon.com/blogs/security/the-top-10-most-downloaded-aws-security-and-compliance-documents-in-2016/

The following list includes the ten most downloaded AWS security and compliance documents in 2016. Use this list to learn what other people found most interesting about security and compliance last year.

  1. Service Organization Controls (SOC) 3 Report – This publicly available report describes internal controls for security, availability, processing integrity, confidentiality, or privacy.
  2. AWS Best Practices for DDoS Resiliency – This whitepaper covers techniques to mitigate distributed denial of service (DDoS) attacks.
  3. Architecting for HIPAA Security and Compliance on AWS – This whitepaper describes how to leverage AWS to develop applications that meet HIPAA and HITECH compliance requirements.
  4. ISO 27001 Certification – The ISO 27001 certification of our Information Security Management System (ISMS) covers our infrastructure, data centers, and services including Amazon Elastic Compute Cloud (Amazon EC2), Amazon Simple Storage Service (Amazon S3), and Amazon Virtual Private Cloud (Amazon VPC).
  5. AWS: Overview of Security Processes – This whitepaper describes the physical and operational security processes for the AWS managed network and infrastructure, and helps answer questions such as, “How does AWS help me protect my data?”
  6. AWS: Risk and Compliance – This whitepaper provides information to help customers integrate AWS into their existing control framework, including a basic approach for evaluating AWS controls and a description of AWS certifications, programs, reports, and third-party attestations.
  7. ISO 27017 Certification – The ISO 27017 certification provides guidance about the information security aspects of cloud computing, recommending the implementation of cloud-specific information security controls that supplement the guidance of the ISO 27002 and ISO 27001 standards.
  8. AWS Whitepaper on EU Data Protection – This whitepaper provides information about how to meet EU compliance requirements when using AWS services.
  9. PCI Compliance in the AWS Cloud: Technical Workbook – This workbook provides guidance about building an environment in AWS that is compliant with the Payment Card Industry Data Security Standard (PCI DSS).
  10. Auditing Security Checklist – This whitepaper provides information, tools, and approaches for auditors to use when auditing the security of the AWS managed network and infrastructure.

– Sara

Now Open – AWS London Region

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/now-open-aws-london-region/

Last week we launched our 15th AWS Region and today we are launching our 16th. We have expanded the AWS footprint into the United Kingdom with a new Region in London, our third in Europe. AWS customers can use the new London Region to better serve end-users in the United Kingdom and can also use it to store data in the UK.

The Details
The new London Region provides a broad suite of AWS services including Amazon CloudWatch, Amazon DynamoDB, Amazon ECS, Amazon ElastiCache, Amazon Elastic Block Store (EBS), Amazon Elastic Compute Cloud (EC2), EC2 Container Registry, Amazon EMR, Amazon Glacier, Amazon Kinesis Streams, Amazon Redshift, Amazon Relational Database Service (RDS), Amazon Simple Notification Service (SNS), Amazon Simple Queue Service (SQS), Amazon Simple Storage Service (S3), Amazon Simple Workflow Service (SWF), Amazon Virtual Private Cloud, Auto Scaling, AWS Certificate Manager (ACM), AWS CloudFormation, AWS CloudTrail, AWS CodeDeploy, AWS Config, AWS Database Migration Service, AWS Elastic Beanstalk, AWS Snowball, AWS Snowmobile, AWS Key Management Service (KMS), AWS Marketplace, AWS OpsWorks, AWS Personal Health Dashboard, AWS Shield Standard, AWS Storage Gateway, AWS Support API, Elastic Load Balancing, VM Import/Export, Amazon CloudFront, Amazon Route 53, AWS WAF, AWS Trusted Advisor, and AWS Direct Connect (follow the links for pricing and other information).

The London Region supports all sizes of C4, D2, M4, T2, and X1 instances.

Check out the AWS Global Infrastructure page to learn more about current and future AWS Regions.

From Our Customers
Many AWS customers are getting ready to use this new Region. Here’s a very small sample:

Trainline is Europe’s number one independent rail ticket retailer. Every day more than 100,000 people travel using tickets bought from Trainline. Here’s what Mark Holt (CTO of Trainline) shared with us:

We recently completed the migration of 100 percent of our eCommerce infrastructure to AWS and have seen awesome results: improved security, 60 percent less downtime, significant cost savings and incredible improvements in agility. From extensive testing, we know that 0.3s of latency is worth more than 8 million pounds and so, while AWS connectivity is already blazingly fast, we expect that serving our UK customers from UK datacenters should lead to significant top-line benefits.

Kainos Evolve Electronic Medical Records (EMR) automates the creation, capture, and handling of medical case notes, operational documents, and records, helping healthcare providers deliver better patient safety and quality of care. It is used by several leading NHS Foundation Trusts and market-leading healthcare technology companies.

Travis Perkins, the largest supplier of building materials in the UK, is implementing the biggest systems and business change in its history including the migration of its datacenters to AWS.

Just Eat is the world’s leading marketplace for online food delivery. Using AWS, Just Eat has been able to experiment faster and reduce the time to roll out new feature updates.

OakNorth, a new bank focused on lending between £1m and £20m to entrepreneurs and growth businesses, became the UK’s first cloud-based bank in May after several months of working with AWS and the regulator to drive the development forward.

Partners
I’m happy to report that we are already working with a wide variety of consulting, technology, managed service, and Direct Connect partners in the United Kingdom. Here’s a partial list:

  • AWS Premier Consulting Partners – Accenture, Claranet, Cloudreach, CSC, Datapipe, KCOM, Rackspace, and Slalom.
  • AWS Consulting Partners – Attenda, Contino, Deloitte, KPMG, LayerV, Lemongrass, Perfect Image, and Version 1.
  • AWS Technology Partners – Splunk, Sage, Sophos, Trend Micro, and Zerolight.
  • AWS Managed Service Partners – Claranet, Cloudreach, KCOM, and Rackspace.
  • AWS Direct Connect Partners – AT&T, BT, Hutchison Global Communications, Level 3, Redcentric, and Vodafone.

Here are a few examples of what our partners are working on:

KCOM is a professional services provider offering consultancy, architecture, project delivery, and managed service capabilities to large UK-based enterprise businesses. The scalability and flexibility of AWS give them a significant competitive advantage with their enterprise and public sector customers. The new Region will allow KCOM to build innovative solutions for their public sector clients while meeting local regulatory requirements.

Splunk is a member of the AWS Partner Network and a market leader in analyzing machine data to deliver operational intelligence for security, IT, and the business. They use cloud computing and big data analytics to help their customers embrace digital transformation and continuous innovation. The new Region will provide even more companies with real-time visibility into the operation of their systems and infrastructure.

Redcentric is an NHS Digital-approved N3 Commercial Aggregator. Their work allows health and care providers such as NHS acute, emergency, and mental health trusts, clinical commissioning groups (CCGs), and the ISV community to connect securely to AWS. The London Region will allow health and care providers to deliver new digital services and to improve outcomes for citizens and patients.

Visit the AWS Partner Network page to read some case studies and to learn how to join.

Compliance & Connectivity
Every AWS Region is designed and built to meet rigorous compliance standards including ISO 27001, ISO 9001, ISO 27017, ISO 27018, SOC 1, SOC 2, SOC 3, PCI DSS Level 1, and many more. Our Cloud Compliance page includes information about these standards, along with those that are specific to the UK, including Cyber Essentials Plus.

The UK Government recognizes that local datacenters from hyperscale public cloud providers can deliver secure solutions for OFFICIAL workloads. In order to meet the special security needs of public sector organizations in the UK with respect to OFFICIAL workloads, we have worked with our Direct Connect Partners to make sure that obligations for connectivity to the Public Services Network (PSN) and N3 can be met.

Use it Today
The London Region is open for business now and you can start using it today! If you need additional information about this Region, please feel free to contact our UK team at [email protected].
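
If you want to try the new Region right away, a minimal boto3 example that targets eu-west-2 might look like this (the bucket name is a placeholder):

import boto3

# Target the new London Region (eu-west-2) explicitly.
s3 = boto3.client('s3', region_name='eu-west-2')
s3.create_bucket(
    Bucket='my-example-uk-bucket',  # placeholder name
    CreateBucketConfiguration={'LocationConstraint': 'eu-west-2'})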

Jeff;