Tag Archives: Amazon Glacier

New – AWS Resource Tagging API

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-aws-resource-tagging-api/

AWS customers frequently use tags to organize their Amazon EC2 instances, Amazon EBS volumes, Amazon S3 buckets, and other resources. Over the past couple of years we have been working to make tagging more useful and more powerful. For example, we have added support for tagging during Auto Scaling, the ability to use up to 50 tags per resource, console-based support for the creation of resources that share a common tag (also known as resource groups), and the option to use Config Rules to enforce the use of tags.

As customers grow to the point where they are managing thousands of resources, each with up to 50 tags, they have been looking to us for additional tooling and options to simplify their work. Today I am happy to announce that our new Resource Tagging API is now available. You can use these APIs from the AWS SDKs or via the AWS Command Line Interface (CLI). You now have programmatic access to the same resource group operations that had been accessible only from the AWS Management Console.

Recap: Console-Based Resource Group Operations
Before I get into the specifics of the new API functions, I thought you would appreciate a fresh look at the console-based grouping and tagging model. I already have the ability to find and then tag AWS resources using a search that spans one or more regions. For example, I can select a long list of regions and then search them for my EC2 instances like this:

After I locate and select all of the desired resources, I can add a new tag key by clicking Create a new tag key and entering the desired tag key:

Then I enter a value for each instance (the new ProjectCode column):

Then I can create a resource group that contains all of the resources that are tagged with P100:

After I have created the resource group, I can locate all of the resources by clicking on the Resource Groups menu:

To learn more about this feature, read Resource Groups and Tagging for AWS.

New API for Resource Tagging
The API that we are announcing today gives you the power to tag, untag, and locate resources using tags, all from your own code. With these new API functions, you are now able to operate on multiple resource types with a single set of functions.

Here are the new functions:

TagResources – Add tags to up to 20 resources at a time.

UntagResources – Remove tags from up to 20 resources at a time.

GetResources – Get a list of resources, with optional filtering by tags and/or resource types.

GetTagKeys – Get a list of all of the unique tag keys used in your account.

GetTagValues – Get all tag values for a specified tag key.
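
The corresponding AWS CLI commands live in the resourcegroupstaggingapi namespace. Here is a minimal sketch of the calls; the instance ARN and the ProjectCode tag are placeholders that you would replace with your own values:

# Tag an EC2 instance (placeholder ARN) with ProjectCode=P100
aws resourcegroupstaggingapi tag-resources \
    --resource-arn-list arn:aws:ec2:us-east-1:123456789012:instance/i-0abcd1234example \
    --tags ProjectCode=P100

# Find everything that carries that tag
aws resourcegroupstaggingapi get-resources \
    --tag-filters Key=ProjectCode,Values=P100

# Enumerate tag keys and the values used for one key
aws resourcegroupstaggingapi get-tag-keys
aws resourcegroupstaggingapi get-tag-values --key ProjectCode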

These functions support the following AWS services and resource types:

Amazon CloudFront – Distribution
Amazon EC2 – AMI, Customer Gateway, DHCP Option, EBS Volume, Instance, Internet Gateway, Network ACL, Network Interface, Reserved Instance, Reserved Instance Listing, Route Table, Security Group (EC2 Classic), Security Group (VPC), Snapshot, Spot Batch, Spot Instance Request, Spot Instance, Subnet, Virtual Private Gateway, VPC, VPN Connection
Amazon ElastiCache – Cluster, Snapshot
Amazon Elastic File System – File System
Amazon Elasticsearch Service – Domain
Amazon EMR – Cluster
Amazon Glacier – Vault
Amazon Inspector – Assessment
Amazon Kinesis – Stream
Amazon Machine Learning – Batch Prediction, Data Source, Evaluation, ML Model
Amazon Redshift – Cluster
Amazon Relational Database Service – DB Instance, DB Option Group, DB Parameter Group, DB Security Group, DB Snapshot, DB Subnet Group, Event Subscription, Read Replica, Reserved DB Instance
Amazon Route 53 – Domain, Health Check, Hosted Zone
Amazon S3 – Bucket
Amazon WorkSpaces – WorkSpace
AWS Certificate Manager – Certificate
AWS CloudHSM – HSM
AWS Directory Service – Directory
AWS Storage Gateway – Gateway, Virtual Tape, Volume
Elastic Load Balancing – Load Balancer, Target Group

Things to Know
Here are a couple of things to keep in mind when you build code or write scripts that use the new API functions or the CLI equivalents:

Compatibility – The older, service-specific functions remain available and you can continue to use them.

Write Permission – The new tagging API adds another layer of permission on top of existing policies that are specific to a single AWS service. For example, you will need to have access to tag:TagResources and ec2:CreateTags in order to add a tag to an EC2 instance.

Read Permission – You will need to have access to tag:GetResources, tag:GetTagKeys, and tag:GetTagValues in order to call functions that access tags and tag values.

Pricing – There is no charge for the use of these functions or for tags.
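
To put the two permission layers described above together, here is a minimal sketch of an IAM policy that allows both the tag-level actions and the EC2-specific tagging actions. The user name, the policy name, and the inclusion of the untag/delete actions are illustrative assumptions, not requirements:

cat > tag-permissions.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["tag:TagResources", "tag:UntagResources", "tag:GetResources", "tag:GetTagKeys", "tag:GetTagValues"],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": ["ec2:CreateTags", "ec2:DeleteTags"],
      "Resource": "*"
    }
  ]
}
EOF

# Attach the policy to a hypothetical user named tagging-admin
aws iam put-user-policy \
    --user-name tagging-admin \
    --policy-name AllowResourceTagging \
    --policy-document file://tag-permissions.json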

Available Now
The new functions are supported by the latest versions of the AWS SDKs. You can use them to tag and access resources in all commercial AWS regions.

Jeff;

 

AWS Achieves FedRAMP Authorization for New Services in the AWS GovCloud (US) Region

Post Syndicated from Chad Woolf original https://aws.amazon.com/blogs/security/aws-achieves-fedramp-authorization-for-a-wide-array-of-services/

Today, we’re pleased to announce an array of AWS services that are available in the AWS GovCloud (US) Region and have achieved Federal Risk and Authorization Management Program (FedRAMP) High authorizations. The FedRAMP Joint Authorization Board (JAB) has issued Provisional Authority to Operate (P-ATO) approvals, which are effective immediately. If you are a federal or commercial customer, you can use these services to process and store your critical workloads in the AWS GovCloud (US) Region’s authorization boundary with data up to the high impact level.

The services newly available in the AWS GovCloud (US) Region include database, storage, data warehouse, security, and configuration automation solutions that will help you increase your ability to manage data in the cloud. For example, with AWS CloudFormation, you can deploy AWS resources by automating configuration processes. AWS Key Management Service (KMS) enables you to create and control the encryption keys used to secure your data. Amazon Redshift enables you to analyze all your data cost effectively by using your existing business intelligence tools, and it automates common administrative tasks for managing, monitoring, and scaling your data warehouse.

Our federal and commercial customers can now leverage our FedRAMP P-ATO to access the following services:

  • CloudFormation – CloudFormation gives developers and systems administrators an easy way to create and manage a collection of related AWS resources, provisioning and updating them in an orderly and predictable fashion. You can use sample templates in CloudFormation, or create your own templates to describe the AWS resources and any associated dependencies or run-time parameters required to run your application.
  • Amazon DynamoDB – Amazon DynamoDB is a fast and flexible NoSQL database service for all applications that need consistent, single-digit-millisecond latency at any scale. It is a fully managed cloud database and supports both document and key-value store models.
  • Amazon EMR – Amazon EMR provides a managed Hadoop framework that makes it efficient and cost effective to process vast amounts of data across dynamically scalable Amazon EC2 instances. You can also run other popular distributed frameworks such as Apache Spark, HBase, Presto, and Flink in EMR, and interact with data in other AWS data stores such as Amazon S3 and DynamoDB.
  • Amazon Glacier – Amazon Glacier is a secure, durable, and low-cost cloud storage service for data archiving and long-term backup. Customers can reliably store large or small amounts of data for as little as $0.004 per gigabyte per month, a significant savings compared to on-premises solutions.
  • KMS – KMS is a managed service that makes it easier for you to create and control the encryption keys used to encrypt your data, and uses Hardware Security Modules (HSMs) to protect the security of your keys. KMS is integrated with other AWS services to help you protect the data you store with these services. For example, KMS is integrated with CloudTrail to provide you with logs of all key usage and help you meet your regulatory and compliance needs.
  • Redshift – Redshift is a fast, fully managed, petabyte-scale data warehouse that makes it simple and cost effective to analyze all your data by using your existing business intelligence tools.
  • Amazon Simple Notification Service (SNS) – Amazon SNS is a fast, flexible, fully managed push notification service that lets you send individual messages or “fan out” messages to large numbers of recipients. SNS makes it simple and cost effective to send push notifications to mobile device users and email recipients or even send messages to other distributed services.
  • Amazon Simple Queue Service (SQS) – Amazon SQS is a fully managed message queuing service for reliably communicating among distributed software components and microservices—at any scale. Using SQS, you can send, store, and receive messages between software components at any volume, without losing messages or requiring other services to be always available.
  • Amazon Simple Workflow Service (SWF) – Amazon SWF helps developers build, run, and scale background jobs that have parallel or sequential steps. SWF is a fully managed state tracker and task coordinator in the cloud.

AWS works closely with the FedRAMP Program Management Office (PMO), National Institute of Standards and Technology (NIST), and other federal regulatory and compliance bodies to ensure that we provide you with the cutting-edge technology you need in a secure and compliant fashion. We are working with our authorizing officials to continue to expand the scope of our authorized services, and we are fully committed to ensuring that AWS GovCloud (US) continues to offer government customers the most comprehensive mix of functionality and security.

– Chad

AWS Monthly Online Tech Talks – March, 2017

Post Syndicated from Tara Walker original https://aws.amazon.com/blogs/aws/aws-monthly-online-tech-talks-march-2017/

Unbelievably, it is March already. As you enter the madness of March, don’t forget to take some time to learn more about the latest service innovations from AWS. Each month, we have a series of webinars targeting best practices and new service features in the AWS Cloud.

I have shared below the schedule for the live, online technical sessions scheduled for the month of March. Remember, these talks are free, but they fill up quickly, so register ahead of time. The scheduled times for the online tech talks are shown in the Pacific Time (PT) time zone.

Webinars featured this month are as follows:

Tuesday, March 21

Big Data

9:00 AM – 10:00 AM: Deploying a Data Lake in AWS

Databases

10:30 AM – 11:30 AM: Optimizing the Data Tier for Serverless Web Applications

IoT

12:00 Noon – 1:00 PM: One Click Enterprise IoT Services

 

Wednesday, March 22

Databases

10:30 – 11:30 AM: ElastiCache Deep Dive: Best Practices and Usage Patterns

Mobile

12:00 Noon – 1:00 PM: A Deeper Dive into Apache MXNet on AWS

 

Thursday, March 23

IoT

9:00 – 10:00 AM: Developing Applications with the IoT Button

Compute

10:30 – 11:30 AM: Automating Management of Amazon EC2 Instances with Auto Scaling

 

Friday, March 24

Compute

10:30 – 11:30 AM: An Overview of Designing Microservices Based Applications on AWS

 

Monday, March 27

AI

9:00 – 10:00 AM: How to get the most out of Amazon Polly, a text-to-speech service

 

Tuesday, March 28

Compute

10:30 AM – 11:30 AM: Getting the Most Out of the New Amazon EC2 Reserved Instances Enhancements

Getting Started

12:00 Noon – 1:30 PM: Getting Started with AWS

 

Wednesday, March 29

Security

9:00 – 10:00 AM: Best Practices for Managing Security Operations in AWS

Storage

10:30 – 11:30 AM: Deep Dive on Amazon S3

Big Data

12:00 Noon – 1:00 PM: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis

 

Thursday, March 30

Storage

9:00 – 10:00 AM: Active Archiving with Amazon S3 and Tiering to Amazon Glacier

Mobile

10:30 AM – 11:30 AM: Deep Dive on Amazon Cognito

Compute

12:00 Noon – 1:00 PM: Building a Development Workflow for Serverless Applications

 

The AWS Online Tech Talks series covers a broad range of topics at varying technical levels. These technical sessions are led by AWS solutions architects and engineers and feature live demonstrations & customer examples. You can also check out the AWS on-demand webinar series on the AWS YouTube channel.

Tara

 

What’s the Diff: Hot and Cold Data Storage

Post Syndicated from Peter Cohen original https://www.backblaze.com/blog/whats-the-diff-hot-and-cold-data-storage/

Hot And Cold Storage

Differentiating cloud data storage by “temperature” is common practice when it comes to describing the tiered storage setups offered by various cloud storage providers. “Hot” and “cold” describe how often that data is accessed. What’s the actual difference, and how does each temperature fit your cloud storage strategy? Let’s take a look.

First of all, let’s get this out of the way: There’s no set industry definition of what hot and cold actually mean. So some of this may need to be adapted to your specific circumstances. You’re bound to see some variance or disagreement if you research the topic.

Hot Storage

“Hot” storage is data you need to access right away, where performance is at a premium. Hot storage often goes hand in hand with cloud computing. If you’re depending on cloud services not only to store your data but also to process it, you’re looking at hot storage.

Business-critical information that needs to be accessed frequently and quickly is hot storage. If performance is of the essence – if you need the data stored on SSDs instead of hard drives, because speed is that much of a factor – then that’s hot storage.

High-performance primary storage comes at a price, though. Cloud data storage providers charge a premium for hot data storage, because it’s resource-intensive. Microsoft’s Azure Hot Blobs and Amazon AWS services don’t come cheap.

Read on for how our B2 Cloud Storage fits the hot storage model. But first, let’s talk about cold storage.

Cold Storage

“Cold” storage is information that you don’t need to access very often. Inactive data that doesn’t need to be accessed for months, years, decades, potentially ever. That’s the sort of content that cold storage is ideal for. Practical examples of data suitable for cold storage include old projects, records you might need for auditing or bookkeeping purposes at some point in the future, or other content you only need to access infrequently.

Data retrieval and response time for cold cloud storage systems are typically slower than services designed for active data manipulation. Practical examples of cold cloud storage include services like Amazon Glacier and Google Coldline.

Storage prices for cold cloud storage systems are typically lower than for warm or hot storage. But cold storage often incurs higher per-operation costs than other kinds of cloud storage. Access to the data typically requires patience and planning.

Historically, “cold” storage meant just that: data physically stored away from the hot machines running the media. Today, cold storage is still sometimes used to describe purely offline storage – that is, data that’s not stored in the cloud at all. Sometimes this is data that you might want to quarantine from the Internet altogether – for example, cryptocurrency like Bitcoin. Sometimes this is that old definition of cold storage: data that is archived on some sort of durable medium and stored in a secure offsite facility.

How B2 Cloud Storage Fits the Cold and Hot Model

We’ve designed B2 Cloud Storage to be instantly available. With B2, you won’t have delays accessing your information like you might have with offline or some nearline systems. Your data is available when you need it.

B2 is built on the physical architecture and advanced software framework we’ve been developing for the past decade to power our signature backup services. B2 Cloud Storage sports multiple layers of redundancy to make sure that your data is stored safely and is available when you need it.

We’ve taken the concept of hot storage a step further by offering reliable, affordable, and scalable storage in the cloud for a mere fraction of what others charge. We’re one-quarter the price of Amazon.

B2 Cloud Storage changes the pricing model for cloud storage, so much so that our customers have found it economical to migrate away entirely from slow, inconvenient, and frustrating cold storage and offline archival systems. Our media and entertainment customers are using B2 instead of LTO tape systems, for example.

What Temperature Is Your Cloud Storage?

Different organizations have different needs, so there’s no right answer about what temperature your cloud data should be. It’s imperative to your bottom line that you don’t pay for more than what you need. That’s why we’ve designed B2 to be an affordable and reliable cloud storage solution. Get started today and you’ll get the first 10GB for free!

Have a different idea of what hot and cold storage are? Have questions that aren’t answered here? Join the discussion!

The post What’s the Diff: Hot and Cold Data Storage appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Create Tables in Amazon Athena from Nested JSON and Mappings Using JSONSerDe

Post Syndicated from Rick Wiggins original https://aws.amazon.com/blogs/big-data/create-tables-in-amazon-athena-from-nested-json-and-mappings-using-jsonserde/

Most systems use JavaScript Object Notation (JSON) to log event information. Although it’s efficient and flexible, deriving information from JSON is difficult.

In this post, you will use the tightly coupled integration of Amazon Kinesis Firehose for log delivery, Amazon S3 for log storage, and Amazon Athena with JSONSerDe to run SQL queries against these logs without the need for data transformation or insertion into a database. It’s done in a completely serverless way. There’s no need to provision any compute.

Amazon SES provides highly detailed logs for every message that travels through the service and, with SES event publishing, makes them available through Firehose. However, parsing detailed logs for trends or compliance data would require a significant investment in infrastructure and development time. Athena is a boon to these data seekers because it can query this dataset at rest, in its native format, with zero code or architecture. On top of that, it uses largely native SQL queries and syntax.

Walkthrough: Establishing a dataset

We start with a dataset of an SES send event that looks like this:

{
	"eventType": "Send",
	"mail": {
		"timestamp": "2017-01-18T18:08:44.830Z",
		"source": "[email protected]",
		"sourceArn": "arn:aws:ses:us-west-2:111222333:identity/[email protected]",
		"sendingAccountId": "111222333",
		"messageId": "01010159b2c4471e-fc6e26e2-af14-4f28-b814-69e488740023-000000",
		"destination": ["[email protected]"],
		"headersTruncated": false,
		"headers": [{
				"name": "From",
				"value": "[email protected]"
			}, {
				"name": "To",
				"value": "[email protected]"
			}, {
				"name": "Subject",
				"value": "Bounced Like a Bad Check"
			}, {
				"name": "MIME-Version",
				"value": "1.0"
			}, {
				"name": "Content-Type",
				"value": "text/plain; charset=UTF-8"
			}, {
				"name": "Content-Transfer-Encoding",
				"value": "7bit"
			}
		],
		"commonHeaders": {
			"from": ["[email protected]"],
			"to": ["[email protected]"],
			"messageId": "01010159b2c4471e-fc6e26e2-af14-4f28-b814-69e488740023-000000",
			"subject": "Test"
		},
		"tags": {
			"ses:configuration-set": ["Firehose"],
			"ses:source-ip": ["54.55.55.55"],
			"ses:from-domain": ["amazon.com"],
			"ses:caller-identity": ["root"]
		}
	},
	"send": {}
}

This dataset contains a lot of valuable information about this SES interaction. There are thousands of datasets in the same format to parse for insights. Getting this data is straightforward.

1. Create a configuration set in the SES console or CLI that uses a Firehose delivery stream to send and store logs in S3 in near real-time.

2. Use SES to send a few test emails. Be sure to define your new configuration set during the send.

To do this, when you create your message in the SES console, choose More options. This will display more fields, including one for Configuration Set.
You can also use your SES verified identity and the AWS CLI to send messages to the mailbox simulator addresses.

$ aws ses send-email --to [email protected] --from [email protected] --subject "Bounced Like a Bad Check" --text "This should bounce" --configuration-set-name Firehose

3. Select your S3 bucket to see that logs are being created.

Walkthrough: Querying with Athena

Amazon Athena is an interactive query service that makes it easy to use standard SQL to analyze data resting in Amazon S3. Athena requires no servers, so there is no infrastructure to manage. You pay only for the queries you run. This makes it perfect for a variety of standard data formats, including CSV, JSON, ORC, and Parquet.

You now need to supply Athena with information about your data and define the schema for your logs with a Hive-compliant DDL statement. Athena uses Presto, a distributed SQL engine, to run queries. It also uses Apache Hive DDL syntax to create, drop, and alter tables and partitions. Athena uses an approach known as schema-on-read, which allows you to use this schema at the time you execute the query. Essentially, you are going to be creating a mapping for each field in the log to a corresponding column in your results.

If you are familiar with Apache Hive, you might find creating tables on Athena to be pretty similar. You can create tables by writing the DDL statement in the query editor or by using the wizard or JDBC driver. An important part of this table creation is the SerDe, a short name for “Serializer and Deserializer.” Because your data is in JSON format, you will be using org.openx.data.jsonserde.JsonSerDe, natively supported by Athena, to help you parse the data. Along the way, you will address two common problems with Hive/Presto and JSON datasets:

  • Nested or multi-level JSON.
  • Forbidden characters (handled with mappings).

In the Athena Query Editor, use the following DDL statement to create your first Athena table. For LOCATION, use the path to the S3 bucket for your logs:

CREATE EXTERNAL TABLE sesblog (
  eventType string,
  mail struct<`timestamp`:string,
              source:string,
              sourceArn:string,
              sendingAccountId:string,
              messageId:string,
              destination:string,
              headersTruncated:boolean,
              headers:array<struct<name:string,value:string>>,
              commonHeaders:struct<`from`:array<string>,to:array<string>,messageId:string,subject:string>
              > 
  )           
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://<YOUR BUCKET HERE>/FH2017/' 

In this DDL statement, you are declaring each of the fields in the JSON dataset along with its Presto data type. You are using Hive collection data types like Array and Struct to set up groups of objects.

Walkthrough: Nested JSON

Defining the mail key is interesting because the JSON inside is nested three levels deep. In the example, you are creating a top-level struct called mail which has several other keys nested inside. This includes fields like messageId and destination at the second level. You can also see that the field timestamp is surrounded by the backtick (`) character. timestamp is also a reserved Presto data type so you should use backticks here to allow the creation of a column of the same name without confusing the table creation command. On the third level is the data for headers. It contains a group of entries in name:value pairs. You define this as an array with the structure of <name:string,value:string> defining your schema expectations here. You must enclose `from` in the commonHeaders struct with backticks to allow this reserved word column creation.

Now that you have created your table, you can fire off some queries!

SELECT * FROM sesblog limit 10;

This output shows your two top-level columns (eventType and mail) but this isn’t useful except to tell you there is data being queried. You can use some nested notation to build more relevant queries to target data you care about.

“Which messages did I bounce from Monday’s campaign?”

SELECT eventtype as Event,
       mail.destination as Destination, 
       mail.messageId as MessageID,
       mail.timestamp as Timestamp
FROM sesblog
WHERE eventType = 'Bounce' and mail.timestamp like '2017-01-09%'

“How many messages have I bounced to a specific domain?”

SELECT COUNT(*) as Bounces 
FROM sesblog
WHERE eventType = 'Bounce' and mail.destination like '%amazonses.com%'

“Which messages did I bounce to the domain amazonses.com?”

SELECT eventtype as Event,
       mail.destination as Destination, 
       mail.messageId as MessageID 
FROM sesblog
WHERE eventType = 'Bounce' and mail.destination like '%amazonses.com%'

There are much deeper queries that can be written from this dataset to find the data relevant to your use case. You might have noticed that your table creation did not specify a schema for the tags section of the JSON event. You’ll do that next.

Walkthrough: Handling forbidden characters with mappings

Here is a major roadblock you might encounter during the initial creation of the DDL to handle this dataset: you have little control over the data format provided in the logs and Hive uses the colon (:) character for the very important job of defining data types. You need to give the JSONSerDe a way to parse these key fields in the tags section of your event. This is some of the most crucial data in an auditing and security use case because it can help you determine who was responsible for a message creation.

In the Athena query editor, use the following DDL statement to create your second Athena table. For LOCATION, use the path to the S3 bucket for your logs:

CREATE EXTERNAL TABLE sesblog2 (
  eventType string,
  mail struct<`timestamp`:string,
              source:string,
              sourceArn:string,
              sendingAccountId:string,
              messageId:string,
              destination:string,
              headersTruncated:boolean,
              headers:array<struct<name:string,value:string>>,
              commonHeaders:struct<`from`:array<string>,to:array<string>,messageId:string,subject:string>,
              tags:struct<ses_configurationset:string,ses_source_ip:string,ses_from_domain:string,ses_caller_identity:string>
              > 
  )           
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  "mapping.ses_configurationset"="ses:configuration-set",
  "mapping.ses_source_ip"="ses:source-ip", 
  "mapping.ses_from_domain"="ses:from-domain", 
  "mapping.ses_caller_identity"="ses:caller-identity"
  )
LOCATION 's3://<YOUR BUCKET HERE>/FH2017/' 

In your new table creation, you have added a section for SERDEPROPERTIES. This allows you to give the SerDe some additional information about your dataset. For your dataset, you are using the mapping property to work around your data containing a column name with a colon smack in the middle of it. ses:configuration-set would be interpreted as a column named ses with the datatype of configuration-set. Unlike your earlier implementation, you can’t surround an operator like that with backticks. The JSON SERDEPROPERTIES mapping section allows you to account for any illegal characters in your data by remapping the fields during the table’s creation.

For example, you have simply defined that the column in the ses data known as ses:configuration-set will now be known to Athena and your queries as ses_configurationset. This mapping doesn’t do anything to the source data in S3. This is a Hive concept only. It won’t alter your existing data. You have set up mappings in the Properties section for the four fields in your dataset (changing all instances of colon to the better-supported underscore) and in your table creation you have used those new mapping names in the creation of the tags struct.

Now that you have access to these additional authentication and auditing fields, your queries can answer some more questions.

“Who is creating all of these bounced messages?”

SELECT eventtype as Event,
         mail.timestamp as Timestamp,
         mail.tags.ses_source_ip as SourceIP,
         mail.tags.ses_caller_identity as AuthenticatedBy,
         mail.commonHeaders."from" as FromAddress,
         mail.commonHeaders.to as ToAddress
FROM sesblog2
WHERE eventtype = 'Bounce'

Of special note here is the handling of the column mail.commonHeaders.”from”. Because from is a reserved operational word in Presto, surround it in quotation marks (“) to keep it from being interpreted as an action.

Walkthrough: Querying using SES custom tagging

What makes this mail.tags section so special is that SES will let you add your own custom tags to your outbound messages. Now you can label messages with tags that are important to you, and use Athena to report on those tags. For example, if you wanted to add a Campaign tag to track a marketing campaign, you could use the --tags flag to send a message from the SES CLI:

$ aws ses send-email --to [email protected] --from [email protected] --subject "Perfume Campaign Test" --text "Buy our Smells" --configuration-set-name Firehose --tags Name=Campaign,Value=Perfume

This results in a new entry in your dataset that includes your custom tag.

…
		"tags": {
			"ses:configuration-set": ["Firehose"],
			"Campaign": ["Perfume"],
			"ses:source-ip": ["54.55.55.55"],
			"ses:from-domain": ["amazon.com"],
			"ses:caller-identity": ["root"],
			"ses:outgoing-ip": ["54.240.27.11"]
		}
…

You can then create a third table to account for the Campaign tagging.

CREATE EXTERNAL TABLE sesblog3 (
  eventType string,
  mail struct<`timestamp`:string,
              source:string,
              sourceArn:string,
              sendingAccountId:string,
              messageId:string,
              destination:string,
              headersTruncated:string,
              headers:array<struct<name:string,value:string>>,
              commonHeaders:struct<`from`:array<string>,to:array<string>,messageId:string,subject:string>,
              tags:struct<ses_configurationset:string,Campaign:string,ses_source_ip:string,ses_from_domain:string,ses_caller_identity:string>
              > 
  )           
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  "mapping.ses_configurationset"="ses:configuration-set",
  "mapping.ses_source_ip"="ses:source-ip", 
  "mapping.ses_from_domain"="ses:from-domain", 
  "mapping.ses_caller_identity"="ses:caller-identity"
  )
LOCATION 's3://<YOUR BUCKET HERE>/FH2017/' 

Then you can query on this custom tag value, which you can define on each outbound email.

SELECT eventtype as Event,
       mail.destination as Destination, 
       mail.messageId as MessageID,
       mail.tags.Campaign as Campaign
FROM sesblog3
where mail.tags.Campaign like '%Perfume%'


Walkthrough: Building your own DDL programmatically with hive-json-schema

In all of these examples, your table creation statements were based on a single SES interaction type, send. SES has other interaction types, like delivery, complaint, and bounce, all of which have some additional fields. I’ll leave you with a master DDL that can parse all the different SES eventTypes and create one table where you can begin querying your data.

Building a properly working JSONSerDe DDL by hand is tedious and a bit error-prone, so this time around you’ll be using an open source tool commonly used by AWS Support. All you have to do manually is set up your mappings for the unsupported SES columns that contain colons.

This sample JSON file contains all possible fields from across the SES eventTypes. It has been run through hive-json-schema, which is a great starting point to build nested JSON DDLs.

Here is the resulting “master” DDL to query all types of SES logs:

CREATE EXTERNAL TABLE sesmaster (
  eventType string,
  complaint struct<arrivaldate:string, 
                   complainedrecipients:array<struct<emailaddress:string>>,
                   complaintfeedbacktype:string, 
                   feedbackid:string, 
                   `timestamp`:string, 
                   useragent:string>,
  bounce struct<bouncedrecipients:array<struct<action:string, diagnosticcode:string, emailaddress:string, status:string>>,
                bouncesubtype:string, 
                bouncetype:string, 
                feedbackid:string,
                reportingmta:string, 
                `timestamp`:string>,
  mail struct<`timestamp`:string,
              source:string,
              sourceArn:string,
              sendingAccountId:string,
              messageId:string,
              destination:string,
              headersTruncated:boolean,
              headers:array<struct<name:string,value:string>>,
              commonHeaders:struct<`from`:array<string>,to:array<string>,messageId:string,subject:string>,
              tags:struct<ses_configurationset:string,ses_source_ip:string,ses_outgoing_ip:string,ses_from_domain:string,ses_caller_identity:string>
              >,
  send string,
  delivery struct<processingtimemillis:int,
                  recipients:array<string>, 
                  reportingmta:string, 
                  smtpresponse:string, 
                  `timestamp`:string>
  )           
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  "mapping.ses_configurationset"="ses:configuration-set",
  "mapping.ses_source_ip"="ses:source-ip", 
  "mapping.ses_from_domain"="ses:from-domain", 
  "mapping.ses_caller_identity"="ses:caller-identity",
  "mapping.ses_outgoing_ip"="ses:outgoing-ip"
  )
LOCATION 's3://<YOUR BUCKET HERE>/FH2017/'

Conclusion

In this post, you’ve seen how to use Amazon Athena in real-world use cases to query the JSON used in AWS service logs. Some of these use cases can be operational like bounce and complaint handling. Others report on trends and marketing data like querying deliveries from a campaign. Still others provide audit and security like answering the question, which machine or user is sending all of these messages? You’ve also seen how to handle both nested JSON and SerDe mappings so that you can use your dataset in its native format without making changes to the data to get your queries running.

With the new Amazon QuickSight suite of tools, you now also have a data source that can be used to build dashboards. This makes reporting on this data even easier. For information about using Athena as a QuickSight data source, see this blog post.

There are also optimizations you can make to these tables to increase query performance or to set up partitions to query only the data you need and restrict the amount of data scanned. If you only need to report on data for a finite amount of time, you could optionally set up S3 lifecycle configuration to transition old data to Amazon Glacier or to delete it altogether.
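
As a rough sketch of that lifecycle idea, the configuration below transitions objects under the FH2017/ prefix to Amazon Glacier after 90 days and deletes them after a year. The rule name, the day counts, and the bucket placeholder are assumptions you would adjust to your own retention requirements:

cat > glacier-lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "ArchiveSesLogs",
      "Filter": { "Prefix": "FH2017/" },
      "Status": "Enabled",
      "Transitions": [{ "Days": 90, "StorageClass": "GLACIER" }],
      "Expiration": { "Days": 365 }
    }
  ]
}
EOF

# Apply the lifecycle rules to the log bucket (placeholder bucket name)
aws s3api put-bucket-lifecycle-configuration \
    --bucket <YOUR BUCKET HERE> \
    --lifecycle-configuration file://glacier-lifecycle.json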

Feel free to leave questions or suggestions in the comments.

About the Author

Rick Wiggins is a Cloud Support Engineer for AWS Premium Support. He works with our customers to build solutions for Email, Storage and Content Delivery, helping them spend more time on their business and less time on infrastructure. In his spare time, he enjoys traveling the world with his family and volunteering at his children’s school teaching lessons in Computer Science and STEM.

Related

Migrate External Table Definitions from a Hive Metastore to Amazon Athena

Create a Healthcare Data Hub with AWS and Mirth Connect

Post Syndicated from Joseph Fontes original https://aws.amazon.com/blogs/big-data/create-a-healthcare-data-hub-with-aws-and-mirth-connect/

As anyone visiting their doctor may have noticed, gone are the days of physicians recording their notes on paper. Physicians are more likely to enter the exam room with a laptop than with paper and pen. This change is the byproduct of efforts to improve patient outcomes, increase efficiency, and drive population health. Pushing for these improvements has created many new data opportunities as well as challenges. Using a combination of AWS services and open source software, we can use these new datasets to work towards these goals and beyond.

When you get a physical examination, your doctor’s office has an electronic chart with information about your demographics (name, date of birth, address, etc.), healthcare history, and current visit. When you go to the hospital for an emergency, a whole new record is created that may contain duplicate or conflicting information. A simple example would be that my primary care doctor lists me as Joe whereas the hospital lists me as Joseph.

Providers record patient information across different software platforms. Each of these platforms can have varying implementations of complex healthcare data standards. Also, each system needs to communicate with a central repository called a health information exchange (HIE) to build a central, complete clinical record for each patient.

In this post, I demonstrate the capability to consume different data types as messages, transform the information within the messages, and then use AWS services to take action depending on the message type.

Overview of Mirth Connect

Using open source technologies on AWS, you can build a system that transforms, stores, and processes this data as needed. The system can scale to meet the ever-increasing demands of modern medicine. The project that ingests and processes this data is called Mirth Connect.

Mirth Connect is an open source, cross-platform, bidirectional, healthcare integration engine. This project is a standalone server that functions as a central point for routing and processing healthcare information.

Running Mirth Connect on AWS provides the necessary scalability and elasticity to meet the current and future needs of healthcare organizations.

Healthcare data hub walkthrough

Healthcare information comes from various sources and can generate large amounts of data:

  • Health information exchange (HIE)
  • Electronic health records system (EHR)
  • Practice management system (PMS)
  • Insurance company systems
  • Pharmacy systems
  • Other source systems that can make data accessible

Messages typically require some form of modification (transformation) to accommodate ingestion and processing in other systems. Using another project, Blue Button, you can dissect large healthcare messages and locate the sections/items of interest. You can also convert those messages into other formats for storage and analysis.

Data types

The examples in this post focus on the following data types representing information made available from a typical healthcare organization:

HL7 version 2 messages define both a message format and communication protocol for health information. They are broken into different message types depending on the information that they transmit.

There are many message types available, such as ordering labs, prescription dispensing, billing, and more. During a routine doctor visit, numerous messages are created for each patient. This provides a lot of information but also a challenge in storage and processing. For a full list of message types, see Data Definition Tables, section A6. The two types used for this post are:

  • ADT A01 (patient admission and visit notification)

View a sample HL7 ADT A01 message

  • SIU S12 (new appointment booking)

View a sample SIU S12 message

As you can see, this text is formatted as delimited data, where the delimiters are defined in the top line of the message, called the MSH segment. Mirth Connect can parse these messages and communicate using the standard HL7 network protocol.


CCD documents, on the other hand, were defined to give the patient a complete snapshot of their medical record. These documents are formatted in XML and use coding systems for procedures, diagnosis, prescriptions, and lab orders. In addition, they contain patient demographics, information about patient vital signs, and an ever-expanding list of other available patient information.

Health insurance companies can make data available to healthcare organizations through various formats. Many times, the term CSV is used to identify not just comma-delimited data types but also data types with other delimiters in use. This data type is commonly used to send datasets from many different industries, including healthcare.

Finally, DICOM is a standard used for medical imaging. This format is used for many different medical purposes such as x-rays, ultrasound, CT scans, MRI results, and more. These files can contain both image files as well as text results and details about the test. For many medical imaging tests, the image files can be quite large as they must contain very detailed information. The storage of these large files—in addition to the processing of the reporting data contained in the files—requires the ability to scale both storage and compute capacity. You must also use a storage option providing extensive data durability.

Infrastructure design

Before designing and deploying a healthcare application on AWS, make sure that you read through the AWS HIPAA Compliance whitepaper. This covers the information necessary for processing and storing protected health information (PHI).

To meet the needs of scalability and elasticity with the Mirth Connect application, use Amazon EC2 instances with Amazon EBS storage. This allows you to take advantage of AWS features such as Auto Scaling for instances. EBS allows you to deploy encrypted volumes, readily provision multiple block devices with varying sizes and throughput, and create snapshots for backups.

Mirth Connect also requires a database backend that must be secure, highly available, and scalable. To meet these needs with a HIPAA-eligible AWS service, use Amazon RDS with MySQL.

Next, implement JSON as the standard format for the message data storage to facilitate easier document query and retrieval. With the core functionality of Mirth Connect as well as other open source software offerings, you can convert each message type to JSON. The storage and retrieval of these converted messages requires the use of a database. If you were to use a relational database, you would either need to store blobs of text for items/fields not yet defined or create each column that may be needed for a message.

With the ever-updating standards and increases in data volume and complexity, it can be advantageous to use a database engine that can automatically expand the item schema. For this post, use Amazon DynamoDB. DynamoDB is a fully managed, fast, and flexible NoSQL database service providing scalable and reliable low latency data access.

If you use DynamoDB with JSON, you don’t have to define each data element field of a database message before insert. Because DynamoDB is a managed service, it eliminates the administrative overhead of deploying and maintaining a distributed, scalable NoSQL database cluster. For more information about the addition of the JSON data type to DynamoDB, see the Amazon DynamoDB Update – JSON, Expanded Free Tier, Flexible Scaling, Larger Items AWS Blog post.

The file storage requirements for running a healthcare integration engine can include the ability to store large numbers of small files, large numbers of large files, and everything in between. To ensure that the requirements for available storage space and data durability are met, use Amazon S3 to store the original files as well as processed files and images.

Finally, use some additional AWS services to further facilitate automation. One example is Amazon CloudWatch. By sending Mirth Connect metrics to CloudWatch, you can enable alarms and actions that help ensure optimal application availability, efficiency, and responsiveness. An example Mirth Connect implementation design is available in the following diagram:

[Diagram: example Mirth Connect implementation on AWS]

AWS service configuration

When you launch the EC2 instance, make sure to encrypt the volumes that contain patient information. It is also a good practice to encrypt the boot partition. For more information, see the New – Encrypted EBS Boot Volumes AWS Blog post. I am storing the protected information on a separate encrypted volume to make for easier snapshots, mounted as /mirth/. For more information, see Amazon EBS Encryption.
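
If you prefer to provision that encrypted data volume from the CLI, a minimal sketch follows; the Availability Zone, volume size, device name, and the placeholder IDs are assumptions to adjust for your own instance:

# Create and attach an encrypted EBS volume for /mirth (placeholder IDs and AZ)
aws ec2 create-volume --availability-zone us-east-1a --size 200 --volume-type gp2 --encrypted
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 --instance-id i-0123456789abcdef0 --device /dev/xvdf

# On the instance: create a filesystem and mount it at /mirth
sudo mkfs -t ext4 /dev/xvdf
sudo mkdir -p /mirth
sudo mount /dev/xvdf /mirth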

Also, launch an Amazon RDS instance to store the Mirth Connect configuration information, message information, and logs. For more information, see Encrypting Amazon RDS Resources.

During the configuration and deployment of your RDS instance, update the max_allowed_packet value to accommodate the larger size of DICOM images. Change this setting by creating a parameter group for Amazon RDS MySQL/MariaDB/Aurora. For more information, see Working with DB Parameter Groups. I have set the value to 1073741824.
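
Here is a hedged CLI sketch of that parameter change; the parameter group name and the MySQL 5.7 family are assumptions, so match the family to the engine version you actually deploy:

aws rds create-db-parameter-group \
    --db-parameter-group-name mirth-mysql \
    --db-parameter-group-family mysql5.7 \
    --description "Mirth Connect MySQL parameters"

# Raise max_allowed_packet to 1 GiB to accommodate large DICOM payloads
aws rds modify-db-parameter-group \
    --db-parameter-group-name mirth-mysql \
    --parameters "ParameterName=max_allowed_packet,ParameterValue=1073741824,ApplyMethod=immediate"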

After the change is complete, you can compare database parameter groups to ensure that the variable has been set properly. In the Amazon RDS console, choose Parameter groups and check the box next to the default parameter group for your database and the parameter group that you created. Choose Compare Parameters.


Use an Amazon DynamoDB table to store JSON-formatted message data. For more information, see Creating Tables and Loading Sample Data.

For your table name, use mirthdata. In the following examples, I used mirthid as the primary key as well as a sort key named mirthdate. The sort key contains the date and time that each message is processed.
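
If you would rather create that table from the CLI than from the console, a sketch follows; the string key types and the read/write capacity values are assumptions you can tune later:

# Create the mirthdata table with mirthid as the partition key and mirthdate as the sort key
aws dynamodb create-table \
    --table-name mirthdata \
    --attribute-definitions AttributeName=mirthid,AttributeType=S AttributeName=mirthdate,AttributeType=S \
    --key-schema AttributeName=mirthid,KeyType=HASH AttributeName=mirthdate,KeyType=RANGE \
    --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5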


Set up an Amazon S3 bucket to store your messages, images, and other information. Make your bucket name globally unique. For this post, I used the bucket named mirth-blog-data; you can create a name for your bucket and update the Mirth Connect example configuration to reflect that value. For more information, see Protecting Data Using Encryption.
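
A quick CLI sketch for the bucket setup is shown below. The suffixed bucket name is a placeholder (bucket names must be globally unique), and the default KMS encryption step is an optional assumption rather than a requirement of the example channels:

# Create the bucket (us-east-1 shown; other regions need --create-bucket-configuration)
aws s3api create-bucket --bucket mirth-blog-data-yourname

# Optionally enable default server-side encryption with KMS
aws s3api put-bucket-encryption \
    --bucket mirth-blog-data-yourname \
    --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"aws:kms"}}]}'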

Mirth Connect deployment

You can find installation packaging options for Mirth Connect at Downloads. Along with the Mirth Connect installation, also download and install the Mirth Connect command line interface. For information about installation and initial configuration steps, see the Mirth Connect Getting Started Guide.

After it’s installed, ensure that the Mirth Connect service starts up automatically on reboot, and then start the service.

 

chkconfig --add mcservice
/etc/init.d/mcservice start


The Mirth Connect Getting Started Guide covers the initial configuration steps and installation. On first login, you are asked to configure the admin user and to set a password.


You are now presented with the Mirth Connect administrative console. You can view an example console with a few deployed channels in the following screenshot.

[Screenshot: Mirth Connect administrative console with deployed channels]

By default, Mirth Connect uses the Derby database on the local instance for configuration and message storage. Your next agenda item is to switch to a MySQL-compatible database with Amazon RDS. For information about changing the Mirth Connect database type, see Moving Mirth Connect configuration from one database to another on the Mirth website.

Using the Mirth Connect command line interface (CLI), you can perform various functions for CloudWatch integration. Run the channel list command from an SSH console on your Mirth Connect EC2 instance.

In the example below, replace USER with the user name to use for connection (typically, admin if you have not created other users). Replace PASSWORD with the password that you set for the associated user. After you are logged in, run the command channel list to ensure that the connection is working properly:

 

mccommand -a https://127.0.0.1:8443 -u USER -p PASSWORD
channel list
quit


Using the command line interface, you can create bash scripts using the CloudWatch put-metric-data function. This allows you to push Mirth Connect metrics directly into CloudWatch. From there, you could build alerts and actions based on this information.

 

echo "dump stats \"/tmp/stats.txt\"" > m-cli-dump.txt
mccommand -a https://127.0.0.1:8443 -u admin -p $MIRTH_PASS -s m-cli-dump.txt
cat /tmp/stats.txt
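
From there, a small script can turn the dumped statistics into CloudWatch custom metrics. The stats.txt parsing below is only an assumption about the dump format, and the MirthConnect namespace and MessagesReceived metric name are made up for this example, so adjust both to match what your dump actually contains:

# Parse a message count out of the dump (column layout is an assumption)
RECEIVED=$(awk '/RECEIVED/ {sum += $NF} END {print sum+0}' /tmp/stats.txt)

# Publish the value as a custom CloudWatch metric, tagged with this instance ID
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
aws cloudwatch put-metric-data \
    --namespace "MirthConnect" \
    --metric-name "MessagesReceived" \
    --dimensions Name=InstanceId,Value="${INSTANCE_ID}" \
    --value "${RECEIVED}" \
    --unit Count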


More information for the administration and configuration of your new Mirth Connect implementation can be found on both the Mirth Connect Wiki as well as the PDF user guide.

Example configuration

Mirth Connect is designed around the concept of channels. Each channel has a source and one or more destinations. In addition, messages accepted from the source can be modified (transformed) when they are ingested at the source, or while they are being sent to their destination. The guides highlighted above as well as the Mirth Connect Wiki discuss these concepts and their configuration options.

The example configuration has been made available as independent channels available for download. You are encouraged to import and review the channel details to understand the design of the message flow and transformation. For a full list of example channels, see mirth-channels in the Big Data Blog GitHub repository.

Channels:

  • CCD-Pull – Processes the files in the directory
  • CCD-to-BB – Uses Blue Button to convert CCD into JSON
  • CCD-to-S3 – Sends files to S3
  • BB-to-DynamoDB – Inserts Blue Button JSON into DynamoDB tables
  • DICOM-Pull – Processes the DICOM files from the directory
  • DICOM-Image – Process DICOM embedded images and stores data in Amazon S3 and Amazon RDS
  • DICOM-to-JSON – Uses DCM4CHE to convert XML into JSON. Also inserts JSON into DynamoDB tables and saves the DICOM image attachment and original file in S3
  • HL7-to-DynamoDB – Inserts JSON data into Amazon DynamoDB
  • HL7-Inbound – Pulls HL7 messages into the engine and uses a Node.js web service to convert data into JSON

For these examples, you are using file readers and file writers as opposed to the standard HL7 and DICOM network protocols. This allows you to test message flow without setting up HL7 and DICOM network brokers. To use the example configurations, create the necessary local file directories:

mkdir -p /mirth/app/{s3,ccd,bb,dicom,hl7}/{input,output,processed,raw}

Blue Button library

Blue Button is an initiative to make patient health records available and readable to patients. This initiative has driven the creation of a JavaScript library that you use to parse the CCD (XML) documents into JSON. This project can be used to both parse and create CCD documents.

For this post, run this library as a service using Node.js. This library is available for installation as blue-button via the Node.js package manager. For details about the package, see the npmjs website. Instructions for the installation of Node.js via a package manager can be found on nodejs.org.

For a simple example of running a Blue Button web server on the local Amazon EC2 instance with the test data, see the bb-conversion-listener.js script in the Big Data Blog repository.
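
As a rough sketch of getting that listener running and testing it by hand, you might do something like the following. The express and body-parser dependencies, the port, and the sample file path are assumptions that depend on how the listener script is written:

# Install the Blue Button parser (plus assumed web framework dependencies)
npm install blue-button express body-parser

# Start the sample listener in the background
node bb-conversion-listener.js &

# Post a sample CCD document and inspect the JSON that comes back (port and path are assumptions)
curl -s -X POST -H "Content-Type: application/xml" \
     --data-binary @/mirth/app/ccd/input/sample-ccd.xml \
     http://localhost:8080/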

Processing clinical messages

The flow diagram below illustrates the flow of CCDA/CCD messages as they are processed through Mirth Connect.

[Diagram: CCD message processing flow through Mirth Connect]

Individual message files are placed in the “/mirth/app/ccd/input/” directory. From there, the channel, “CCD-Pull” reads the file and then transfers to another directory for processing. The message data is then sent to the Mirth Connect channel “CCD-to-S3”, which then saves a copy of the message to S3 while also creating an entry for each item in Amazon RDS. This allows you to create a more efficient file-naming pattern for S3 while still maintaining a link between the original file name and the new S3 key.

Next, the Mirth Connect channel “CCD-to-BB” passes the message data via an HTTP POST to a web service running on your Mirth Connect instance using the Blue Button open source healthcare project. The HTTP response contains the JSON-formatted translation of the document you sent. This JSON data can now be inserted into a DynamoDB table. When you are building a production environment, this web service could be separated onto its own instance and set up to run behind an Application Load Balancer with Auto Scaling.

You can find sample CCD and DICOM files for processing from a variety of online sources.

AWS service and Mirth integration

To integrate with various AWS services, Mirth Connect can use the AWS SDK for Java by invoking custom Java code. For more information, see How to create and invoke custom Java code in Mirth Connect on the Mirth website.

For more information about the deployment and use of the AWS SDK for Java, see Getting Started with the AWS SDK for Java.

To facilitate the use of the example channels, a sample Java class implementation for use with Mirth Connect can be found at https://github.com/awslabs/aws-big-data-blog/tree/master/aws-blog-mirth-healthcare-hub.

For more information about building the code example using Maven, see Using the SDK with Apache Maven.

You can view the addition of these libraries to each channel in the following screenshots.

[Screenshots: custom Java library configuration on the Mirth Connect channels]

After test messages are processed, you can view the contents in DynamoDB.

[Screenshot: processed message data in the DynamoDB table]

DICOM messages

The dcm4che project is a collection of open source tools supporting the DICOM standard. When a DICOM message is passed through Mirth Connect, the text content is transformed to XML with the image included as an attachment. You convert the DICOM XML into JSON for storage in DynamoDB and store both the original DICOM message as well as the separate attachment in S3, using the AWS SDK for Java and the dcm4che project. The source code for dcm4che can be found in the dcm4che GitHub repository.

Readers can find the example dcm4che functions used in this example in the mirth-aws-dicom-app in the AWS Big Data GitHub repository.

The following screenshots are examples of the DICOM message data converted to XML in Mirth Connect, along with the view of the attached medical image. Also, you can view an example of the data stored in DynamoDB.

[Screenshots: DICOM message converted to XML in Mirth Connect, the attached image, and the resulting DynamoDB item]

 

The DICOM process flow can be seen in the following diagram.

[Diagram: DICOM message process flow]

Conclusion

With the combination of channels, you now have your original messages saved and available for archive via Amazon Glacier. You also have each message stored in DynamoDB as discrete data elements, with references to each message stored in Amazon RDS for use with message correlation. Finally, you have your transformed messages, processed messages, and image attachments saved in S3, making them available for direct use or archiving.

Later, you can implement analysis of the data with Amazon EMR. This would allow the processing of information directly from S3 as well as the data in Amazon RDS and DynamoDB.

With an open source health information hub, you have a centralized repository for information traveling throughout healthcare organizations, with community support and development constantly improving functionality and capabilities. Using AWS, health organizations are able to focus on building better healthcare solutions rather than building data centers.

If you have questions or suggestions, please comment below.


About the Author

Joseph Fontes is a Solutions Architect at AWS with the Emerging Partner program. He works with AWS Partners of all sizes, industries, and technologies to facilitate technical adoption of the AWS platform with emphasis on efficiency, performance, reliability, and scalability. Joe has a passion for technology and open source which is shared with his loving wife and two sons.

Related

How Eliza Corporation Moved Healthcare Data to the Cloud


AWS Webinars – January 2017 (Bonus: December Recap)

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-webinars-january-2017-bonus-december-recap/

Have you had time to digest all of the announcements that we made at AWS re:Invent? Are you ready to debug with AWS X-Ray, analyze with Amazon QuickSight, or build conversational interfaces using Amazon Lex? Do you want to learn more about AWS Lambda, set up CI/CD with AWS CodeBuild, or use Polly to give your applications a voice?

January Webinars
In our continued quest to provide you with training and education resources, I am pleased to share the webinars that we have set up for January. These are free, but they do fill up and you should definitely register ahead of time. All times are PT and each webinar runs for one hour:

January 16:

January 17:

January 18:

January 19:

January 20:

December Webinar Recap
The December webinar series is already complete; here’s a quick recap with links to the recordings:

December 12:

December 13:

December 14:

December 15:

Jeff;

PS – If you want to get a jump start on your 2017 learning objectives, the re:Invent 2016 Presentations and re:Invent 2016 Videos are just a click or two away.

Now Open – AWS London Region

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/now-open-aws-london-region/

Last week we launched our 15th AWS Region and today we are launching our 16th. We have expanded the AWS footprint into the United Kingdom with a new Region in London, our third in Europe. AWS customers can use the new London Region to better serve end-users in the United Kingdom and can also use it to store data in the UK.

The Details
The new London Region provides a broad suite of AWS services including Amazon CloudWatch, Amazon DynamoDB, Amazon ECS, Amazon ElastiCache, Amazon Elastic Block Store (EBS), Amazon Elastic Compute Cloud (EC2), EC2 Container Registry, Amazon EMR, Amazon Glacier, Amazon Kinesis Streams, Amazon Redshift, Amazon Relational Database Service (RDS), Amazon Simple Notification Service (SNS), Amazon Simple Queue Service (SQS), Amazon Simple Storage Service (S3), Amazon Simple Workflow Service (SWF), Amazon Virtual Private Cloud, Auto Scaling, AWS Certificate Manager (ACM), AWS CloudFormation, AWS CloudTrail, AWS CodeDeploy, AWS Config, AWS Database Migration Service, AWS Elastic Beanstalk, AWS Snowball, AWS Snowmobile, AWS Key Management Service (KMS), AWS Marketplace, AWS OpsWorks, AWS Personal Health Dashboard, AWS Shield Standard, AWS Storage Gateway, AWS Support API, Elastic Load Balancing, VM Import/Export, Amazon CloudFront, Amazon Route 53, AWS WAF, AWS Trusted Advisor, and AWS Direct Connect (follow the links for pricing and other information).

The London Region supports all sizes of C4, D2, M4, T2, and X1 instances.

Check out the AWS Global Infrastructure page to learn more about current and future AWS Regions.

From Our Customers
Many AWS customers are getting ready to use this new Region. Here’s a very small sample:

Trainline is Europe’s number one independent rail ticket retailer. Every day more than 100,000 people travel using tickets bought from Trainline. Here’s what Mark Holt (CTO of Trainline) shared with us:

We recently completed the migration of 100 percent of our eCommerce infrastructure to AWS and have seen awesome results: improved security, 60 percent less downtime, significant cost savings and incredible improvements in agility. From extensive testing, we know that 0.3s of latency is worth more than 8 million pounds and so, while AWS connectivity is already blazingly fast, we expect that serving our UK customers from UK datacenters should lead to significant top-line benefits.

Kainos Evolve Electronic Medical Records (EMR) automates the creation, capture and handling of medical case notes and operational documents and records, allowing healthcare providers to deliver better patient safety and quality of care for several leading NHS Foundation Trusts and market leading healthcare technology companies.

Travis Perkins, the largest supplier of building materials in the UK, is implementing the biggest systems and business change in its history including the migration of its datacenters to AWS.

Just Eat is the world’s leading marketplace for online food delivery. Using AWS, JustEat has been able to experiment faster and reduce the time to roll out new feature updates.

OakNorth, a new bank focused on lending between £1m-£20m to entrepreneurs and growth businesses, became the UK’s first cloud-based bank in May after several months of working with AWS to drive the development forward with the regulator.

Partners
I’m happy to report that we are already working with a wide variety of consulting, technology, managed service, and Direct Connect partners in the United Kingdom. Here’s a partial list:

  • AWS Premier Consulting Partners – Accenture, Claranet, Cloudreach, CSC, Datapipe, KCOM, Rackspace, and Slalom.
  • AWS Consulting Partners – Attenda, Contino, Deloitte, KPMG, LayerV, Lemongrass, Perfect Image, and Version 1.
  • AWS Technology Partners – Splunk, Sage, Sophos, Trend Micro, and Zerolight.
  • AWS Managed Service Partners – Claranet, Cloudreach, KCOM, and Rackspace.
  • AWS Direct Connect Partners – AT&T, BT, Hutchison Global Communications, Level 3, Redcentric, and Vodafone.

Here are a few examples of what our partners are working on:

KCOM is a professional services provider offering consultancy, architecture, project delivery and managed service capabilities to large UK-based enterprise businesses. The scalability and flexibility of AWS gives them a significant competitive advantage with their enterprise and public sector customers. The new Region will allow KCOM to build innovative solutions for their public sector clients while meeting local regulatory requirements.

Splunk is a member of the AWS Partner Network and a market leader in analyzing machine data to deliver operational intelligence for security, IT, and the business. They use cloud computing and big data analytics to help their customers to embrace digital transformation and continuous innovation. The new Region will provide even more companies with real-time visibility into the operation of their systems and infrastructure.

Redcentric is a NHS Digital-approved N3 Commercial Aggregator. Their work allows health and care providers such as NHS acute, emergency and mental trusts, clinical commissioning groups (CCGs), and the ISV community to connect securely to AWS. The London Region will allow health and care providers to deliver new digital services and to improve outcomes for citizens and patients.

Visit the AWS Partner Network page to read some case studies and to learn how to join.

Compliance & Connectivity
Every AWS Region is designed and built to meet rigorous compliance standards including ISO 27001, ISO 9001, ISO 27017, ISO 27018, SOC 1, SOC 2, SOC 3, PCI DSS Level 1, and many more. Our Cloud Compliance page includes information about these standards, along with those that are specific to the UK, including Cyber Essentials Plus.

The UK Government recognizes that local datacenters from hyper scale public cloud providers can deliver secure solutions for OFFICIAL workloads. In order to meet the special security needs of public sector organizations in the UK with respect to OFFICIAL workloads, we have worked with our Direct Connect Partners to make sure that obligations for connectivity to the Public Services Network (PSN) and N3 can be met.

Use it Today
The London Region is open for business now and you can start using it today! If you need additional information about this Region, please feel free to contact our UK team at [email protected].
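
If you already use the AWS CLI or SDKs, a minimal way to confirm access is to point a call at the new Region; the Region code for London is eu-west-2. For example, the following lists the Availability Zones visible to your account there:

aws ec2 describe-availability-zones --region eu-west-2

You can also set region = eu-west-2 in a profile in ~/.aws/config to make the new Region the default for that profile.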

Jeff;

Now Open – AWS Canada (Central) Region

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/now-open-aws-canada-central-region/

We are growing the AWS footprint once again. Our new Canada (Central) Region is now available and you can start using it today. AWS customers in Canada and the northern parts of the United States have fast, low-latency access to the suite of AWS infrastructure services.

The Details
The new Canada (Central) Region supports Amazon Elastic Compute Cloud (EC2) and related services including Amazon Elastic Block Store (EBS), Amazon Virtual Private Cloud, Auto Scaling, Elastic Load Balancing, NAT Gateway, Spot Instances, and Dedicated Hosts.

It also supports Amazon Aurora, AWS Certificate Manager (ACM), AWS CloudFormation, Amazon CloudFront, AWS CloudHSM, AWS CloudTrail, Amazon CloudWatch, AWS CodeDeploy, AWS Config, AWS Database Migration Service, AWS Direct Connect, Amazon DynamoDB, Amazon ECS, EC2 Container Registry, AWS Elastic Beanstalk, Amazon EMR, Amazon ElastiCache, Amazon Glacier, AWS Identity and Access Management (IAM), AWS Snowball, AWS Key Management Service (KMS), Amazon Kinesis, AWS Marketplace, Amazon Redshift, Amazon Relational Database Service (RDS), Amazon Route 53, AWS Shield Standard, Amazon Simple Storage Service (S3), Amazon Simple Notification Service (SNS), Amazon Simple Queue Service (SQS), Amazon Simple Workflow Service (SWF), AWS Storage Gateway, AWS Trusted Advisor, VM Import/Export, and AWS WAF.

The Region supports all sizes of C4, D2, M4, T2, and X1 instances.

As part of our on-going focus on making cloud computing available to you in an environmentally friendly fashion, AWS data centers in Canada draw power from a grid that generates 99% of its electricity using hydropower (read about AWS Sustainability to learn more).

Well Connected
After receiving a lot of positive feedback on the network latency metrics that I shared when we launched the AWS Region in Ohio, I am happy to have a new set to share as part of today’s launch (these times represent a lower bound on latency and may change over time).

The first set of metrics are to other Canadian cities:

  • 9 ms to Toronto.
  • 14 ms to Ottawa.
  • 47 ms to Calgary.
  • 49 ms to Edmonton.
  • 60 ms to Vancouver.

The second set are to locations in the US:

  • 9 ms to New York.
  • 19 ms to Chicago.
  • 16 ms to US East (Northern Virginia).
  • 27 ms to US East (Ohio).
  • 75 ms to US West (Oregon).

Canada is also home to CloudFront edge locations in Toronto, Ontario, and Montreal, Quebec.

And Canada Makes 15
Today’s launch brings our global footprint to 15 Regions and 40 Availability Zones, with seven more Availability Zones and three more Regions coming online through the next year. As a reminder, each Region is a physical location where we have two or more Availability Zones or AZs. Each Availability Zone, in turn, consists of one or more data centers, each with redundant power, networking, and connectivity, all housed in separate facilities. Having two or more AZs in each Region gives you the ability to run applications that are more highly available, fault tolerant, and durable than would be the case if you were limited to a single AZ.

For more information about current and future AWS Regions, take a look at the AWS Global Infrastructure page.

Jeff;


AWS Canada (Central) Region Now Open

We are expanding the AWS footprint once again. Our new Canada (Central) Region is now available and you can start using it today. AWS customers in Canada and the northern parts of the United States have fast, low-latency access to the suite of AWS infrastructure services.

The Details
The new Canada (Central) Region supports Amazon Elastic Compute Cloud (EC2) and related services including Amazon Elastic Block Store (EBS), Amazon Virtual Private Cloud, Auto Scaling, Elastic Load Balancing, NAT Gateway, Spot Instances, and Dedicated Hosts.

Also supported are Amazon Aurora, AWS Certificate Manager (ACM), AWS CloudFormation, Amazon CloudFront, AWS CloudHSM, AWS CloudTrail, Amazon CloudWatch, AWS CodeDeploy, AWS Config, AWS Database Migration Service, AWS Direct Connect, Amazon DynamoDB, Amazon ECS, EC2 Container Registry, AWS Elastic Beanstalk, Amazon EMR, Amazon ElastiCache, Amazon Glacier, AWS Identity and Access Management (IAM), AWS Snowball, AWS Key Management Service (KMS), Amazon Kinesis, AWS Marketplace, Amazon Redshift, Amazon Relational Database Service (RDS), Amazon Route 53, AWS Shield Standard, Amazon Simple Storage Service (S3), Amazon Simple Notification Service (SNS), Amazon Simple Queue Service (SQS), Amazon Simple Workflow Service (SWF), AWS Storage Gateway, AWS Trusted Advisor, VM Import/Export, and AWS WAF.

The Region supports all sizes of C4, D2, M4, T2, and X1 instances.

As part of our ongoing mission to offer you cloud services in an environmentally friendly manner, the AWS data centers in Canada are powered by a grid that generates 99 percent of its electricity using hydropower (see AWS Sustainability to learn more).

Well Connected
After receiving a lot of positive feedback on the network latency metrics I shared when we launched the AWS Region in Ohio, I am happy to share a new set of metrics as part of today’s launch (these figures represent a lower bound on latency and may change over time).

The first set of metrics covers other Canadian cities:

  • 9 ms to Toronto.
  • 14 ms to Ottawa.
  • 47 ms to Calgary.
  • 49 ms to Edmonton.
  • 60 ms to Vancouver.

The second set covers locations in the US:

  • 9 ms to New York.
  • 19 ms to Chicago.
  • 16 ms to US East (Northern Virginia).
  • 27 ms to US East (Ohio).
  • 75 ms to US West (Oregon).

Canada is also home to CloudFront edge locations in Toronto, Ontario, and Montreal, Quebec.

And Canada Makes 15
Today’s launch brings our global presence to 15 Regions and 40 Availability Zones, with seven more Availability Zones and three more Regions coming online over the next year. As a reminder, each Region is a physical location where we have two or more Availability Zones. Each Availability Zone, in turn, consists of one or more data centers, each with redundant power, networking, and connectivity, housed in separate facilities. Having two or more Availability Zones in each Region gives you the ability to run applications that are more highly available, fault tolerant, and durable than they would be if you were limited to a single Availability Zone.

For more information about current and future AWS Regions, see the AWS Global Infrastructure page.

Jeff;

AWS Snowmobile – Move Exabytes of Data to the Cloud in Weeks

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-snowmobile-move-exabytes-of-data-to-the-cloud-in-weeks/

Moving large amounts of on-premises data to the cloud as part of a migration effort is still more challenging than it should be! Even with high-end connections, moving petabytes or exabytes of film vaults, financial records, satellite imagery, or scientific data across the Internet can take years or decades.  On the business side, adding new networking or better connectivity to data centers that are scheduled to be decommissioned after a migration is expensive and hard to justify.

Last year we announced the AWS Snowball (see AWS Snowball – Transfer 1 Petabyte Per Week Using Amazon-Owned Storage Appliances for more information) as a step toward addressing large-scale data migrations. With 80 TB of storage, these appliances address the needs of many of our customers, and are in widespread use today.

However, customers with exabyte-scale on-premises storage look at the 80 TB, do the math, and realize that an all-out data migration would still require lots of devices and some headache-inducing logistics.

Introducing AWS Snowmobile
In order to meet the needs of these customers, we are launching Snowmobile today. This secure data truck stores up to 100 PB of data and can help you to move exabytes to AWS in a matter of weeks (you can get more than one if necessary). Designed to meet the needs of our customers in the financial services, media & entertainment, scientific, and other industries, Snowmobile attaches to your network and appears as a local, NFS-mounted volume. You can use your existing backup and archiving tools to fill it up with data destined for Amazon Simple Storage Service (S3) or Amazon Glacier.
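
Because the Snowmobile presents itself as an NFS mount point on your network, filling it can be as simple as mounting the export and pointing your existing copy or backup tooling at it. The host name, export path, and directory layout below are purely hypothetical placeholders; AWS Professional Services provides the actual connection details for your installation.

sudo mkdir -p /mnt/snowmobile
sudo mount -t nfs snowmobile.example.internal:/export /mnt/snowmobile
cp -a /data/film-vault/ /mnt/snowmobile/film-vault/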

Physically, Snowmobile is a ruggedized, tamper-resistant shipping container 45 feet long, 9.6 feet high, and 8 feet wide. It is waterproof, climate-controlled, and can be parked in a covered or uncovered area adjacent to your existing data center. Each Snowmobile consumes about 350 kW of AC power; if you don’t have sufficient capacity on site we can arrange for a generator.

On the security side, Snowmobile incorporates multiple layers of logical and physical protection including chain-of-custody tracking and video surveillance. Your data is encrypted with your AWS Key Management Service (KMS) keys before it is written. Each container includes GPS tracking, with cellular or satellite connectivity back to AWS. We will arrange for a security vehicle escort when the Snowmobile is in transit; we can also arrange for dedicated security guards while your Snowmobile is on-premises.

Each Snowmobile includes a network cable connected to a high-speed switch capable of supporting 1 Tb/second of data transfer spread across multiple 40 Gb/second connections. Assuming that your existing network can transfer data at that rate, you can fill a Snowmobile in about 10 days.
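
The 10-day figure follows directly from the arithmetic: 100 PB is 8 × 10^17 bits, and at a sustained 1 Tb/second that works out to a little over nine days. A quick back-of-the-envelope check from the shell:

echo "scale=1; (100 * 8 * 10^15) / (10^12 * 86400)" | bc
# prints 9.2, i.e. roughly nine days of transfer at a sustained 1 Tb/second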

Snowmobile in Action
I don’t happen to have an exabyte-scale data center and I certainly don’t have room next to my house for a 45 foot long container. In order to illustrate the process of arranging for and using a Snowmobile, I sat down at my LEGO table and (in the finest Doc Brown tradition) built a scale model. I hope that you enjoy this brick-based story telling!

Let’s start in your data center. It was built a while ago and is definitely showing its age. The racks are full of disk and tape drives of multiple vintages, each storing precious, mission-critical data. You and your colleagues spend too much time inside of the raised floor, tracking cables and trying to squeeze out just a bit more performance:

Your manager is getting frustrated and does not know what to do next:

Fortunately, one of your colleagues reads this blog every day and she knows just what to do:

A quick phone call to AWS and a meeting is set up:

Everyone gets together at a convenient AWS office to learn more about Snowmobile and to plan the migration:

Everyone gathers around to look at the scale model of the Snowmobile. Even the dog is intrigued, and your manager takes a picture:

A Snowmobile shows up at your data center:

AWS Professional Services helps you to get it connected and you initiate the data transfer:

The Snowmobile heads back to AWS and your data is imported as you specified!

Snowmobile at DigitalGlobe
Our friends at DigitalGlobe are using a Snowmobile to move 100 PB of satellite imagery to AWS. Here’s what Jay Littlepage (former Amazonian and now VP of Infrastructure & Operations at DigitalGlobe) has to say about this effort:

Like many large enterprises, we are in the process of migrating IT operations from our data centers to AWS. Our geospatial big data platform, GBDX, has been based in AWS since inception. But our unmatchable 16-year archive of high-resolution satellite imagery, visualizing 6 billion square kilometers of the Earth’s surface, has been stored within our facilities. We have slowly been migrating our archive to AWS but that process has been slow and inefficient. Our constellation of satellites generate more earth imagery each year (10 PB) than we have been able to migrate by these methods.

We needed a solution that could move our 100 PB archive but could not find one until now with AWS Snowmobile. DigitalGlobe is currently migrating our entire raw imagery archive with one Snowmobile transfer directly into an Amazon Glacier Vault. AWS Snowmobile operators are providing an amazing customized service where they manage the configuration, monitoring, and logistics. Using Snowmobile’s data transfer abilities will get our time-lapse imagery archive to the cloud more quickly, allowing our customers and partners to have access to uniquely massive data sets. By using AWS’ elastic computing platform within GBDX, we will run distributed image analysis, revealing the pace and pattern of world-wide change on an extraordinary scale, with unprecedented speed, in a more cost-effective manner – prioritizing insights over infrastructure. Without Snowmobile, we would not have been able to transfer our extremely large volume of data in such a short time or create new business opportunities for our customers. Snowmobile is truly a game changer!

Things to Know
Here are a couple of final things you should know about Snowmobile:

Data Export – The initial launch is aimed at data import (on-premises to AWS). We do know that some of our customers are interested in data export, with a particular focus on disaster recovery (DR) use cases.

Availability – Snowmobile is available in all AWS Regions. As you can see from reading the previous section, this is not a self-serve product. My AWS Sales colleagues are ready to discuss your data import needs with you.

Pricing – I don’t have pricing info to share. However, we intend to make sure that Snowmobile is both faster and less expensive than using a network-based data transfer model.

Jeff;

PS – Check out my Snowmobile Photo Album for some high-res pictures of my creation. Special thanks to Matt Gutierrez (Symbionix) for the final staging and the photo shoot.

PPS – I will build and personally deliver (in exchange for a photo op and a bloggable story) Snowmobile models to the first 5 customers.

Announcing AWS Organizations: Centrally Manage Multiple AWS Accounts

Post Syndicated from Caitlyn Shim original https://aws.amazon.com/blogs/security/announcing-aws-organizations-centrally-manage-multiple-aws-accounts/

Today, AWS launched AWS Organizations: a new way for you to centrally manage all the AWS accounts your organization owns. Now you can arrange your AWS accounts into groups called organizational units (OUs) and apply policies to OUs or directly to accounts. For example, you can organize your accounts by application, environment, team, or any other grouping that makes sense for your business.

Organizations removes the need to manage security policies through separate AWS accounts. Before Organizations, if you had a set of AWS accounts, you had to ensure that users in those AWS accounts had the right level of access to AWS services. You had to either configure security settings on each account individually or write a custom script to iterate through each account. However, any user with administrative permissions in those AWS accounts could have bypassed the defined permissions. Organizations includes the launch of service control policies (SCPs), which give you the ability to configure one policy and have it apply to your entire organization, an OU, or an individual account. In this blog post, I walk through an example of how to use Organizations.

Create an organization

Let’s say you own software regulated by the healthcare or financial industry, and you want to deploy and run your software on AWS. In order to process, store, or transmit protected health information on AWS, compliance standards such as ISO and SOC may limit your software to using only designated AWS services. With Organizations, you can limit your software to access only the AWS services allowed by such requirements. SCPs can limit access to AWS services across all accounts in your organization such that the enforced policy cannot be overridden by any user in the AWS accounts on which the SCP is enforced.

When creating a new organization, you want to start in the account that will become the master account. The master account owns the organization and all of the administrative actions applied to accounts by the organization. The master account can set up the SCPs, account groupings, and payment method for the other AWS accounts in the organization, which are called member accounts. Like Consolidated Billing, the master account is also the account that pays the bills for the AWS activity of the member accounts, so choose it wisely.

For all my examples in this post, I use the AWS CLI. Organizations is also available through the AWS Management Console and AWS SDK.

In the following case, I use the create-organization API to create a new organization in Full_Control mode:

server:~$ aws organizations create-organization --mode FULL_CONTROL

returns:

{
    "organization": {
        "availablePolicyTypes": [
            {
                "status": "ENABLED",
                "type": "service_control_policy"
            }
        ],
        "masterAccountId": "000000000001",
        "masterAccountArn": "arn:aws:organizations::000000000001:account/o-1234567890/000000000001",
        "mode": "FULL_CONTROL",
        "masterAccountEmail": "[email protected]",
        "id": "o-1234567890",
        "arn": "arn:aws:organizations::000000000001:organization/o-1234567890"
    }
}

In this example, I want to set security policies on the organization, so I create it in Full_Control mode. You can create organizations in two different modes: Full_Control or Billing. Full_Control has all of the benefits of Billing mode and adds the ability to apply service control policies.

Note: If you currently use Consolidated Billing, your Consolidated Billing family is migrated automatically to AWS Organizations in Billing mode, which is equivalent to Consolidated Billing. All account activity charges accrued by member accounts are billed to the master account.

Add members to the organization

To populate the organization you’ve just created, you can either create new AWS accounts in the organization or invite existing accounts.

Create new AWS accounts

Organizations gives you APIs to programmatically create new AWS accounts in your organization. All you have to specify is the email address and account name. The rest of the account information (address, etc.) is inherited from the master account.

To create a new account by using the AWS CLI, you call create-account with the email address to assign to the new account and the account name you want to use:

server:~$ aws organizations create-account --account-name DataIngestion --email [email protected]

Creating an AWS account is an asynchronous process, so create-account returns a createAccountStatus response:

{
    "createAccountStatus": {
        "requestedTimestamp": 1479783743.401,
        "state": "IN_PROGRESS",
        "id": "car-12345555555555555555555555556789",
        "accountName": "DataIngestion"
    }
}

You can use describe-create-account-status with the createAccountStatus id to see when the AWS account is ready to use. Accounts are generally ready within minutes.
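
For example, polling the status with the id returned above looks like the following; once the state changes from IN_PROGRESS to SUCCEEDED, the new account id appears in the response. (The parameter name shown here matches the current CLI and may differ slightly from the preview release.)

server:~$ aws organizations describe-create-account-status --create-account-request-id car-12345555555555555555555555556789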

Invite existing AWS accounts

You can also invite existing AWS accounts to your organization. To invite an AWS account, you need to know either its AWS account ID or the email address associated with it.

The command through the AWS CLI is invite-account-to-organization:

server:~$ aws organizations invite-account-to-organization --target [email protected],type=EMAIL

That command returns a handshake object, which contains all of the information about the request. In this case, the request was sent to the account at [email protected] to join the organization o-1234567890:

{
    "handshake": {
        "id": "h-77888888888888888888888888888777",
        "state": "OPEN",
        "resources": [
            {
               "type": "ORGANIZATION",
               "resources": [
                  {
                     "type": "MASTER_EMAIL",
                     "value": "[email protected]"
                  },
                  {
                     "type": "MASTER_NAME",
                     "value": "Example Corp"
                  },
                  {
                     "type": "ORGANIZATION_MODE",
                     "value": "FULL"
                  }
             ],
             "value": "o-1234567890"
          },
          {
             "type": "EMAIL",
             "value": "[email protected]"
          }
        ],
        "parties": [
            {
              "type": "EMAIL",
              "id": "[email protected]"
            },
            {
              "type": "ORGANIZATION",
              "id": "1234567890"
            }
            ],
            "action": "invite",
            "requestedTimestamp": 1479784006.008,
            "expirationTimestamp": 1481080006.008,
            "arn": "arn:aws:organizations::000000000001:handshake/o-1234567890/invite/h-77888888888888888888888888888777"
   }
}

The preceding command sends an email to the account owner to notify them that you invited their account to join your organization. If the owner of the invited AWS account agrees to the terms of joining the organization, they can accept the request through a link in the email or by calling accept-handshake-request.
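
If the invited account owner prefers the CLI, accepting the invitation looks roughly like the following, run from the invited account and using the handshake id returned above. (In the current CLI the operation is exposed as accept-handshake; the exact command name may differ in the preview.)

server:~$ aws organizations accept-handshake --handshake-id h-77888888888888888888888888888777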

Viewing your organization

At this point, the organization is populated with two member accounts: DataIngestion and DataProcessing. The master account is [email protected]. By default, accounts are placed directly under the root of your organization. The root is the base of your organizational structure—all OUs or accounts branch from it, as shown in the following illustration.

You can view all the accounts in the organization with the CLI command, list-accounts:

server:~$ aws organizations list-accounts

The list-accounts command returns a list of accounts and the details of how they joined the organization. Note that the master account is listed along with the member accounts:

{
    "accounts": [
        {
           "status": "ACTIVE",
           "name": "DataIngestion",
           "joinedMethod": "CREATED",
           "joinedTimestamp": 1479784102.074,
           "id": "200000000001",
           "arn": "arn:aws:organizations::000000000001:account/o-1234567890/200000000001"
        },
        {
           "status": "ACTIVE",
           "name": " DataProcessing ",
           "joinedMethod": "INVITED",
           "joinedTimestamp": 1479784102.074,
           "id": "200000000002",
           "arn": "arn:aws:organizations::000000000001:account/o-1234567890/200000000002"
        },
        {
           "status": "ACTIVE",
           "name": "Example Corp",
           "joinedMethod": "INVITED",
           "joinedTimestamp": 1479783034.579,
           "id": "000000000001",
           "arn": "arn:aws:organizations::000000000001:account/o-1234567890/000000000001"
        }
    ]
}

Service control policies

Service control policies (SCPs) let you apply controls on which AWS services and APIs are accessible to an AWS account. The SCPs apply to all entities within an account: the root user, AWS Identity and Access Management (IAM) users, and IAM roles. For this example, we’re going to set up an SCP to only allow access to seven AWS services.

Note that SCPs work in conjunction with IAM policies. Our SCP allows access to a set of services, but it is possible to set up an IAM policy to further restrict access for a user. AWS will enforce the most restrictive combination of the policies.

SCPs affect the master account differently than other accounts. To ensure you do not lock yourself out of your organization, SCPs applied to your organization do not apply to the master account. Only SCPs applied directly to the master account can modify the master account’s access.

Creating a policy

In this example, I want to make sure that only the given set of services is accessible in our AWS account, so I will create an Allow policy to grant access only to listed services. The following policy allows access to Amazon DynamoDB, Amazon RDS, Amazon EC2, Amazon S3, Amazon EMR, Amazon Glacier, and Elastic Load Balancing.

Here are the contents of Regulation-Policy.json:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                 "dynamodb:*","rds:*","ec2:*","s3:*","elasticmapreduce:*",
                 "glacier:*","elasticloadbalancing:*"
             ],
             "Effect": "Allow",
             "Resource": "*"
        }
    ]
}

SCPs follow the same structure as IAM policies. This example is an Allow policy, but SCPs also offer the ability to write Deny policies, restricting access to specific services and APIs.
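
As an illustration of the Deny form, the following hypothetical SCP blocks Glacier deletions everywhere it is attached while leaving everything else to be decided by other policies. It is a sketch for illustration only and is not part of the Example Corp walkthrough:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": [
                "glacier:DeleteArchive",
                "glacier:DeleteVault"
            ],
            "Resource": "*"
        }
    ]
}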

To create an SCP in organizations through the CLI, you can store the earlier SCP in file Regulation-Policy.json and call create-policy:

server:~$ aws organizations create-policy --name example-prod-policy --type service_control_policy --description "All services allowed for regulation policy." --content file://./Regulation-Policy.json

The create-policy command returns the policy just created. Save the policy id for later when you want to apply it to an element in organizations:

{
    "policy": {
        "content": "{\n\"Version\": \"2012-10-17\",\n\"Statement\": [\n {\n \"Action\": [\n \"dynamodb:*\",\"rds:*\",\"ec2:*\",\"s3:*\",\"elasticmapreduce:*\",\"glacier:*\",\"elasticloadbalancing:*\"\n],\n \"Effect\": \"Allow\",\n\"Resource\": \"*\"\n}\n ]\n}\n",
        "policySummary": {
            "description": "All services allowed for regulation policy.",
            "type": "service_control_policy",
            "id": "p-yyy12346",
            "arn": "arn:aws:organizations::000000000001:policy/o-1234567890/service_control_policy/p-yyy12346",
            "name": "example-prod-policy"
        }
    }
}

Attaching a policy

After you create your policy, you can attach it to entities in your organization: the root, an OU, or an individual account. Accounts inherit all policies attached above them in the organizational tree. To keep this example simple, attach the Example Corp. Prod SCP (defined in the example-prod-policy) to the root of your organization. Because the root is the top element of the organization’s tree, a policy attached to the root applies to every member account in the organization.

Using the CLI, you first need to call list-roots to get the root id. Once you have the root id, you can call attach-policy, passing in the example-prod-policy id (from the create-policy response) and the root id.

server:~$ aws organizations list-roots
{
    "roots": [
        {
            "policyTypes": [
                {
                    "status": "ENABLED",
                    "type": "service_control_policy"
                }
            ],
            "id": "r-4444",
            "arn": "arn:aws:organizations::000000000001:root/o-1234567890/r-4444",
             "name": "Root"
        }
    ]
}
server:~$ aws organizations attach-policy --target-id r-4444 --policy-id p-yyy12346

Let’s now verify the policies on the root. Calling list-policies-for-target on the root will display all policies attached to the root of the organization.

server:~$ aws organizations list-policies-for-target --target-id r-4444 --filter service_control_policy
 
{
    "policies": [
        {
            "awsManaged": false,
            "description": "All services allowed by regulation policy.",
            "type": "service_control_policy",
            "id": " p-yyy12346",
            "arn": "arn:aws:organizations::000000000001:policy/o-1234567890/service_control_policy/p-yyy12346",
            "name": "example-prod-policy"
        },
        {
            "awsManaged": true,
            "description": "Allows access to every operation",
            "type": "service_control_policy",
            "id": "p-FullAWSAccess",
            "arn": "arn:aws:organizations::aws:policy/service_control_policy/p-FullAWSAccess",
            "name": "FullAWSAccess"
        }
    ]
}

Two policies are returned from list-policies-for-target on the root: the example-prod-policy, and the FullAWSAccess policy. FullAWSAccess is an AWS managed policy that is applied to every element in the organization by default. When you create your organization, every root, OU, and account has the FullAWSAccess policy, which allows your organization to have access to everything in AWS (*:*).

If you want to limit your organization’s access to only the services in the Example Corp. Prod SCP, you must remove the FullAWSAccess policy. If both policies remain attached to the root, Organizations grants access to the services allowed by either policy, which in this case is all AWS services. To enforce the restrictions intended by the Example Corp. Prod SCP, detach the FullAWSAccess policy. To do this with the CLI:

aws organizations detach-policy --policy-id p-FullAWSAccess --target-id r-4444

And one final list to verify only the example-prod-policy is attached to the root:

aws organizations list-policies-for-target --target-id r-4444 --filter service_control_policy
 
{
    "policies": [
        {
            "awsManaged": false,
            "description": "All services allowed by regulation policy.",
            "type": "service_control_policy",
            "id": "p-yyy12346",
            "arn": "arn:aws:organizations:: 000000000001:policy/o-1234567890/service_control_policy/p-yyy12346",
            "name": "example-prod-policy"
        }
    ]
}

Now, you have an extra layer of protection, limiting all accounts in your organization to only those AWS services that are compliant with Example Corp’s regulations.


More advanced structure

If you want to break down your organization further, you can add OUs and apply policies. For example, some of your AWS accounts might not be used for protected health information and therefore have no need for such a restrictive SCP.

For example, you may have a development team that needs to use and experiment with different AWS services to do their job. In this case, you want to set the controls on these accounts to be lighter than the controls on the accounts that have access to regulated customer information. With Organizations, you can, for example, create an OU to represent the medical software you run in production, and a different OU to hold the accounts developers use. You can then apply different security restrictions to each OU.

The following illustration shows three developer accounts that are part of a Developer Research OU. The original two accounts that require regulation protections are in a Production OU. We also detached our Example Corp. Prod Policy from the root and instead attached it to the Production OU. Finally, any restrictions not related to regulations are kept at the root level by placing them in a new Company policy.

In the preceding illustration, the accounts in the Developer Research OU have the Company policy applied to them because their OU is still under the root with that policy. The accounts in the Production OU have both SCPs (Company policy and Example Corp. Prod Policy) applied to them. As is true with IAM policies, Organizations applies the more restrictive intersection of the two policies to the account. If the Company policy SCP allowed Glacier, EMR, S3, EC2, RDS, DynamoDB, Amazon Kinesis, AWS Lambda, and Amazon Redshift, the policy application for accounts in the Production OU would be the intersection, as shown in the following chart.

The accounts in the Production OU have both the Company policy SCP and the Example Corp. Prod Policy SCP applied to them. The intersection of the two policies applies, so these accounts have access to DynamoDB, Glacier, EMR, RDS, EC2, and S3. Because Redshift, Lambda, and Kinesis are not in the Example Corp. Prod Policy SCP, the accounts in the Production OU are not allowed to access them.
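
For reference, a hypothetical Company policy SCP along those lines might look like the following; attached at the root, it intersects with the Example Corp. Prod Policy on the Production OU to yield exactly the six services listed above. This is a sketch based on the services named in this post, not a policy from the original walkthrough:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "glacier:*", "elasticmapreduce:*", "s3:*", "ec2:*", "rds:*",
                "dynamodb:*", "kinesis:*", "lambda:*", "redshift:*"
            ],
            "Resource": "*"
        }
    ]
}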

To summarize policy application: all elements (root account, OUs, accounts) have FullAWSAccess (*:*) by default. If you have multiple SCPs on the same element, Organizations takes the union. When policies are inherited across multiple elements in the organizational tree, Organizations takes the intersection of the SCPs. And finally, the master account has unique protections on it; only SCPs attached directly to it will change its access.

Conclusion

AWS Organizations makes it easy to manage multiple AWS accounts from a single master account. You can use Organizations to group accounts into organizational units and manage your accounts by application, environment, team, or any other grouping that makes sense for your business. SCPs give you the ability to apply policies to multiple accounts at once.

To learn more, see the AWS Organizations home page and sign up for the AWS Organizations Preview today. If you are at AWS re:Invent 2016, you can also attend session SAC323 – NEW SERVICE: Centrally Manage Multiple AWS Accounts with AWS Organizations on Wednesday, November 30, at 11:00 A.M.

– Caitlyn

AWS Storage Update – S3 & Glacier Price Reductions + Additional Retrieval Options for Glacier

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-storage-update-s3-glacier-price-reductions/

Back in 2006, we launched S3 with a revolutionary pay-as-you-go pricing model, with an initial price of 15 cents per GB per month. Over the intervening decade, we reduced the price per GB by 80%, launched S3 in every AWS Region, and enhanced the original one-size-fits-all model with user-driven features such as web site hosting, VPC integration, and IPv6 support, while adding new storage options including S3 Infrequent Access.

Because many AWS customers archive important data for legal, compliance, or other reasons and reference it only infrequently, we launched Glacier in 2012, and then gave you the ability to transition data between S3, S3 Infrequent Access, and Glacier by using lifecycle rules.
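
As a concrete (if hypothetical) example of such a lifecycle rule, the following CLI call transitions objects under the logs/ prefix of a bucket to S3 Infrequent Access after 30 days and to Glacier after 90 days. The bucket name, prefix, and timings are placeholders you would adapt to your own data:

aws s3api put-bucket-lifecycle-configuration \
    --bucket example-archive-bucket \
    --lifecycle-configuration '{
        "Rules": [
            {
                "ID": "archive-logs",
                "Status": "Enabled",
                "Filter": { "Prefix": "logs/" },
                "Transitions": [
                    { "Days": 30, "StorageClass": "STANDARD_IA" },
                    { "Days": 90, "StorageClass": "GLACIER" }
                ]
            }
        ]
    }'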

Today I have two big pieces of news for you: we are reducing the prices for S3 Standard Storage and for Glacier storage. We are also introducing additional retrieval options for Glacier.

S3 & Glacier Price Reduction
As long-time AWS customers already know, we work relentlessly to reduce our own costs, and to pass the resulting savings along in the form of a steady stream of AWS Price Reductions.

We are reducing the per-GB price for S3 Standard Storage in most AWS regions, effective December 1, 2016. The bill for your December usage will automatically reflect the new, lower prices. Here are the new prices for Standard Storage:

Prices are shown in $ / GB / month for the 0-50 TB, 51-500 TB, and 500+ TB tiers, respectively:

  • US East (Northern Virginia), US East (Ohio), US West (Oregon), EU (Ireland) – $0.0230 / $0.0220 / $0.0210 (reductions range from 23.33% to 23.64%).
  • US West (Northern California) – $0.0260 / $0.0250 / $0.0240 (reductions range from 20.53% to 21.21%).
  • EU (Frankfurt) – $0.0245 / $0.0235 / $0.0225 (reductions range from 24.24% to 24.38%).
  • Asia Pacific (Singapore), Asia Pacific (Tokyo), Asia Pacific (Sydney), Asia Pacific (Seoul), Asia Pacific (Mumbai) – $0.0250 / $0.0240 / $0.0230 (reductions range from 16.36% to 28.13%).

As you can see from the prices above, we are also simplifying the pricing model by consolidating six pricing tiers into three new tiers.

We are also reducing the price of Glacier storage in most AWS Regions. For example, you can now store 1 GB for 1 month in the US East (Northern Virginia), US West (Oregon), or EU (Ireland) Regions for just $0.004 (less than half a cent) per month, a 43% decrease. For reference purposes, this amount of storage cost $0.010 when we launched Glacier in 2012, and $0.007 after our last Glacier price reduction (a 30% decrease).

The lower pricing is a direct result of the scale that comes about when our customers trust us with trillions of objects, but it is just one of the benefits. Based on the feedback that I get when we add new features, the real value of a cloud storage platform is the rapid, steady evolution. Our customers often tell me that they love the fact that we anticipate their needs and respond with new features accordingly.

New Glacier Retrieval Options
Many AWS customers use Amazon Glacier as the archival component of their tiered storage architecture. Glacier allows them to meet compliance requirements (either organizational or regulatory) while allowing them to use any desired amount of cloud-based compute power to process and extract value from the data.

Today we are enhancing Glacier with two new retrieval options for your Glacier data. You can now pay a little bit more to expedite your data retrieval. Alternatively, you can indicate that speed is not of the essence and pay a lower price for retrieval.

We launched Glacier with a pricing model for data retrieval that was based on the amount of data that you had stored in Glacier and the rate at which you retrieved it. While this was an accurate reflection of our own costs to provide the service, it was somewhat difficult to explain. Today we are replacing the rate-based retrieval fees with simpler per-GB pricing.

Our customers in the Media and Entertainment industry archive their TV footage to Glacier. When an emergent situation calls for them to retrieve a specific piece of footage, minutes count and they want fast, cost-effective access to the footage. Healthcare customers are looking for rapid, “while you wait” access to archived medical imagery and genome data; photo archives and companies selling satellite data turn out to have similar requirements. On the other hand, some customers have the ability to plan their retrievals ahead of time, and are perfectly happy to get their data in 5 to 12 hours.

Taking all of this into account, you can now select one of the following options for retrieving your data from Glacier (the original rate-based retrieval model is no longer applicable):

Standard retrieval is the new name for what Glacier already provides, and is the default for all API-driven retrieval requests. You get your data back in a matter of hours (typically 3 to 5), and pay $0.01 per GB along with $0.05 for every 1,000 requests.

Expedited retrieval addresses the need for “while you wait” access. You can get your data back quickly, with retrieval typically taking 1 to 5 minutes. If you store (or plan to store) more than 100 TB of data in Glacier and need to make infrequent, yet urgent requests for subsets of your data, this is a great model for you (if you have less data, S3’s Infrequent Access storage class can be a better value). Retrievals cost $0.03 per GB and $0.01 per request.

Retrieval generally takes between 1 and 5 minutes, depending on overall demand. If you need to get your data back in this time frame even in rare situations where demand is exceptionally high, you can provision retrieval capacity. Once you have done this, all Expedited retrievals will automatically be served via your Provisioned capacity. Each unit of Provisioned capacity costs $100 per month and ensures that you can perform at least 3 Expedited Retrievals every 5 minutes, with up to 150 MB/second of retrieval throughput.

Bulk retrieval is a great fit for planned or non-urgent use cases, with retrieval typically taking 5 to 12 hours at a cost of $0.0025 per GB (75% less than for Standard Retrieval) along with $0.025 for every 1,000 requests. Bulk retrievals are perfect when you need to retrieve large amounts of data within a day, and are willing to wait a few extra hours in exchange for a very significant discount.

If you do not specify a retrieval option when you call InitiateJob to retrieve an archive, a Standard Retrieval will be initiated. Your existing jobs will continue to work as expected, and will be charged at the new rate.
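
To pick a tier explicitly, you pass it in the job parameters. A hypothetical Bulk retrieval of a single archive via the CLI looks like this; the vault name and archive id are placeholders, and "-" tells the CLI to use the current account:

aws glacier initiate-job \
    --account-id - \
    --vault-name example-vault \
    --job-parameters '{"Type": "archive-retrieval", "ArchiveId": "EXAMPLE_ARCHIVE_ID", "Tier": "Bulk"}'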

To learn more, read about Data Retrieval in the Glacier FAQ.

As always, I am thrilled to be able to share this news with you, and I hope that you are equally excited!

Jeff;