Tag Archives: Case Studies

New free online course about building makerspaces

Post Syndicated from Andrew Collins original https://www.raspberrypi.org/blog/futurelearn-course-makerspace/

Helping people to get into making is at the heart of what we do, and so we’ve created a brand-new, free online course to support educators to start their own makerspaces. If you’re interested in the maker movement, then this course is for you! Sign up now and start learning with Build a Makerspace for Young People on FutureLearn.

Building a makerspace – free online learning

Find out how to create and run a makerspace for young people. Look at the pedagogy and approaches behind digital making.

Dive into the maker movement

From planning to execution, this course will cover everything you need to know to set up and lead your very own makerspace. You’ll learn about different approaches to designing makerspace environments, understand the pedagogy that underpins the maker movement, and create your own makerspace action plan. By the end of the course, you will be well versed in makerspace culture, and you’ll have the skills and knowledge to build a successful and thriving makerspace in your community.

Raspberry Pi Makerspace FutureLearn Online Course

Let makerspace experts lead your journey

This new course features five fantastic case studies about real-life makerspace educators. They’ll share their stories of starting a makerspace: what worked, what didn’t, and what’s next on their journey. Hear from Jessica Simons as she describes her experience starting the MCHS Maker Lab, connect with Patrick Ferrell as he details his teaching at the Jocelyn H. Lee Innovation Lab, and learn from Nick Provenzano as he shares his top tips on how to ensure the legacy of your makerspace. These accomplished educators will give you their practical advice and expert insights, helping you learn the best practices of starting a makerspace environment.

Raspberry Pi Makerspace FutureLearn Online Course

Connect with educators worldwide

By taking this course, you’ll also be connecting with talented and like-minded educators from across the globe. This is your opportunity to develop a community of practice while learning from fellow teachers, librarians, and community leaders who are also engaged in the maker movement.

“I like this course and how it progresses from introducing the concept of makerspaces and how they have come to education, all the way through to creating my own action plan to get started.”— Makerspace Educator in Hayward, California USA

Sign up now

The first run of our Build a Makerspace for Young People course starts on 12 March 2018. You can sign up and access all content for four weeks. After that period, we’ll run the course again multiple times throughout the year. Enjoy, and happy making!

The post New free online course about building makerspaces appeared first on Raspberry Pi.

Amazon Redshift – 2017 Recap

Post Syndicated from Larry Heathcote original https://aws.amazon.com/blogs/big-data/amazon-redshift-2017-recap/

We have been busy adding new features and capabilities to Amazon Redshift, and we wanted to give you a glimpse of what we’ve been doing over the past year. In this article, we recap a few of our enhancements and provide a set of resources that you can use to learn more and get the most out of your Amazon Redshift implementation.

In 2017, we made more than 30 announcements about Amazon Redshift. We listened to you, our customers, and delivered Redshift Spectrum, a feature of Amazon Redshift, that gives you the ability to extend analytics to your data lake—without moving data. We launched new DC2 nodes, doubling performance at the same price. We also announced many new features that provide greater scalability, better performance, more automation, and easier ways to manage your analytics workloads.

To see a full list of our launches, visit our what’s new page—and be sure to subscribe to our RSS feed.

Major launches in 2017

Amazon Redshift Spectrumextend analytics to your data lake, without moving data

We launched Amazon Redshift Spectrum to give you the freedom to store data in Amazon S3, in open file formats, and have it available for analytics without the need to load it into your Amazon Redshift cluster. It enables you to easily join datasets across Redshift clusters and S3 to provide unique insights that you would not be able to obtain by querying independent data silos.

With Redshift Spectrum, you can run SQL queries against data in an Amazon S3 data lake as easily as you analyze data stored in Amazon Redshift. And you can do it without loading data or resizing the Amazon Redshift cluster based on growing data volumes. Redshift Spectrum separates compute and storage to meet workload demands for data size, concurrency, and performance. Redshift Spectrum scales processing across thousands of nodes, so results are fast, even with massive datasets and complex queries. You can query open file formats that you already use—such as Apache Avro, CSV, Grok, ORC, Apache Parquet, RCFile, RegexSerDe, SequenceFile, TextFile, and TSV—directly in Amazon S3, without any data movement.

For complex queries, Redshift Spectrum provided a 67 percent performance gain,” said Rafi Ton, CEO, NUVIAD. “Using the Parquet data format, Redshift Spectrum delivered an 80 percent performance improvement. For us, this was substantial.

To learn more about Redshift Spectrum, watch our AWS Summit session Intro to Amazon Redshift Spectrum: Now Query Exabytes of Data in S3, and read our announcement blog post Amazon Redshift Spectrum – Exabyte-Scale In-Place Queries of S3 Data.

DC2 nodes—twice the performance of DC1 at the same price

We launched second-generation Dense Compute (DC2) nodes to provide low latency and high throughput for demanding data warehousing workloads. DC2 nodes feature powerful Intel E5-2686 v4 (Broadwell) CPUs, fast DDR4 memory, and NVMe-based solid state disks (SSDs). We’ve tuned Amazon Redshift to take advantage of the better CPU, network, and disk on DC2 nodes, providing up to twice the performance of DC1 at the same price. Our DC2.8xlarge instances now provide twice the memory per slice of data and an optimized storage layout with 30 percent better storage utilization.

Redshift allows us to quickly spin up clusters and provide our data scientists with a fast and easy method to access data and generate insights,” said Bradley Todd, technology architect at Liberty Mutual. “We saw a 9x reduction in month-end reporting time with Redshift DC2 nodes as compared to DC1.”

Read our customer testimonials to see the performance gains our customers are experiencing with DC2 nodes. To learn more, read our blog post Amazon Redshift Dense Compute (DC2) Nodes Deliver Twice the Performance as DC1 at the Same Price.

Performance enhancements— 3x-5x faster queries

On average, our customers are seeing 3x to 5x performance gains for most of their critical workloads.

We introduced short query acceleration to speed up execution of queries such as reports, dashboards, and interactive analysis. Short query acceleration uses machine learning to predict the execution time of a query, and to move short running queries to an express short query queue for faster processing.

We launched results caching to deliver sub-second response times for queries that are repeated, such as dashboards, visualizations, and those from BI tools. Results caching has an added benefit of freeing up resources to improve the performance of all other queries.

We also introduced late materialization to reduce the amount of data scanned for queries with predicate filters by batching and factoring in the filtering of predicates before fetching data blocks in the next column. For example, if only 10 percent of the table rows satisfy the predicate filters, Amazon Redshift can potentially save 90 percent of the I/O for the remaining columns to improve query performance.

We launched query monitoring rules and pre-defined rule templates. These features make it easier for you to set metrics-based performance boundaries for workload management (WLM) queries, and specify what action to take when a query goes beyond those boundaries. For example, for a queue that’s dedicated to short-running queries, you might create a rule that aborts queries that run for more than 60 seconds. To track poorly designed queries, you might have another rule that logs queries that contain nested loops.

Customer insights

Amazon Redshift and Redshift Spectrum serve customers across a variety of industries and sizes, from startups to large enterprises. Visit our customer page to see the success that customers are having with our recent enhancements. Learn how companies like Liberty Mutual Insurance saw a 9x reduction in month-end reporting time using DC2 nodes. On this page, you can find case studies, videos, and other content that show how our customers are using Amazon Redshift to drive innovation and business results.

In addition, check out these resources to learn about the success our customers are having building out a data warehouse and data lake integration solution with Amazon Redshift:

Partner solutions

You can enhance your Amazon Redshift data warehouse by working with industry-leading experts. Our AWS Partner Network (APN) Partners have certified their solutions to work with Amazon Redshift. They offer software, tools, integration, and consulting services to help you at every step. Visit our Amazon Redshift Partner page and choose an APN Partner. Or, use AWS Marketplace to find and immediately start using third-party software.

To see what our Partners are saying about Amazon Redshift Spectrum and our DC2 nodes mentioned earlier, read these blog posts:

Resources

Blog posts

Visit the AWS Big Data Blog for a list of all Amazon Redshift articles.

YouTube videos

GitHub

Our community of experts contribute on GitHub to provide tips and hints that can help you get the most out of your deployment. Visit GitHub frequently to get the latest technical guidance, code samples, administrative task automation utilities, the analyze & vacuum schema utility, and more.

Customer support

If you are evaluating or considering a proof of concept with Amazon Redshift, or you need assistance migrating your on-premises or other cloud-based data warehouse to Amazon Redshift, our team of product experts and solutions architects can help you with architecting, sizing, and optimizing your data warehouse. Contact us using this support request form, and let us know how we can assist you.

If you are an Amazon Redshift customer, we offer a no-cost health check program. Our team of database engineers and solutions architects give you recommendations for optimizing Amazon Redshift and Amazon Redshift Spectrum for your specific workloads. To learn more, email us at [email protected].

If you have any questions, email us at [email protected].

 


Additional Reading

If you found this post useful, be sure to check out Amazon Redshift Spectrum – Exabyte-Scale In-Place Queries of S3 Data, Using Amazon Redshift for Fast Analytical Reports and How to Migrate Your Oracle Data Warehouse to Amazon Redshift Using AWS SCT and AWS DMS.


About the Author

Larry Heathcote is a Principle Product Marketing Manager at Amazon Web Services for data warehousing and analytics. Larry is passionate about seeing the results of data-driven insights on business outcomes. He enjoys family time, home projects, grilling out and the taste of classic barbeque.

 

 

 

Estimating the Cost of Internet Insecurity

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/01/estimating_the_.html

It’s really hard to estimate the cost of an insecure Internet. Studies are all over the map. A methodical study by RAND is the best work I’ve seen at trying to put a number on this. The results are, well, all over the map:

Estimating the Global Cost of Cyber Risk: Methodology and Examples“:

Abstract: There is marked variability from study to study in the estimated direct and systemic costs of cyber incidents, which is further complicated by the considerable variation in cyber risk in different countries and industry sectors. This report shares a transparent and adaptable methodology for estimating present and future global costs of cyber risk that acknowledges the considerable uncertainty in the frequencies and costs of cyber incidents. Specifically, this methodology (1) identifies the value at risk by country and industry sector; (2) computes direct costs by considering multiple financial exposures for each industry sector and the fraction of each exposure that is potentially at risk to cyber incidents; and (3) computes the systemic costs of cyber risk between industry sectors using Organisation for Economic Co-operation and Development input, output, and value-added data across sectors in more than 60 countries. The report has a companion Excel-based modeling and simulation platform that allows users to alter assumptions and investigate a wide variety of research questions. The authors used a literature review and data to create multiple sample sets of parameters. They then ran a set of case studies to show the model’s functionality and to compare the results against those in the existing literature. The resulting values are highly sensitive to input parameters; for instance, the global cost of cyber crime has direct gross domestic product (GDP) costs of $275 billion to $6.6 trillion and total GDP costs (direct plus systemic) of $799 billion to $22.5 trillion (1.1 to 32.4 percent of GDP).

Here’s Rand’s risk calculator, if you want to play with the parameters yourself.

Note: I was an advisor to the project.

Separately, Symantec has published a new cybercrime report with their own statistics.

Acoustical Attacks against Hard Drives

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/12/acoustical_atta.html

Interesting destructive attack: “Acoustic Denial of Service Attacks on HDDs“:

Abstract: Among storage components, hard disk drives (HDDs) have become the most commonly-used type of non-volatile storage due to their recent technological advances, including, enhanced energy efficacy and significantly-improved areal density. Such advances in HDDs have made them an inevitable part of numerous computing systems, including, personal computers, closed-circuit television (CCTV) systems, medical bedside monitors, and automated teller machines (ATMs). Despite the widespread use of HDDs and their critical role in real-world systems, there exist only a few research studies on the security of HDDs. In particular, prior research studies have discussed how HDDs can potentially leak critical private information through acoustic or electromagnetic emanations. Borrowing theoretical principles from acoustics and mechanics, we propose a novel denial-of-service (DoS) attack against HDDs that exploits a physical phenomenon, known as acoustic resonance. We perform a comprehensive examination of physical characteristics of several HDDs and create acoustic signals that cause significant vibrations in HDDs internal components. We demonstrate that such vibrations can negatively influence the performance of HDDs embedded in real-world systems. We show the feasibility of the proposed attack in two real-world case studies, namely, personal computers and CCTVs.

Serverless @ re:Invent 2017

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/serverless-reinvent-2017/

At re:Invent 2014, we announced AWS Lambda, what is now the center of the serverless platform at AWS, and helped ignite the trend of companies building serverless applications.

This year, at re:Invent 2017, the topic of serverless was everywhere. We were incredibly excited to see the energy from everyone attending 7 workshops, 15 chalk talks, 20 skills sessions and 27 breakout sessions. Many of these sessions were repeated due to high demand, so we are happy to summarize and provide links to the recordings and slides of these sessions.

Over the course of the week leading up to and then the week of re:Invent, we also had over 15 new features and capabilities across a number of serverless services, including AWS Lambda, Amazon API Gateway, AWS [email protected], AWS SAM, and the newly announced AWS Serverless Application Repository!

AWS Lambda

Amazon API Gateway

  • Amazon API Gateway Supports Endpoint Integrations with Private VPCs – You can now provide access to HTTP(S) resources within your VPC without exposing them directly to the public internet. This includes resources available over a VPN or Direct Connect connection!
  • Amazon API Gateway Supports Canary Release Deployments – You can now use canary release deployments to gradually roll out new APIs. This helps you more safely roll out API changes and limit the blast radius of new deployments.
  • Amazon API Gateway Supports Access Logging – The access logging feature lets you generate access logs in different formats such as CLF (Common Log Format), JSON, XML, and CSV. The access logs can be fed into your existing analytics or log processing tools so you can perform more in-depth analysis or take action in response to the log data.
  • Amazon API Gateway Customize Integration Timeouts – You can now set a custom timeout for your API calls as low as 50ms and as high as 29 seconds (the default is 30 seconds).
  • Amazon API Gateway Supports Generating SDK in Ruby – This is in addition to support for SDKs in Java, JavaScript, Android and iOS (Swift and Objective-C). The SDKs that Amazon API Gateway generates save you development time and come with a number of prebuilt capabilities, such as working with API keys, exponential back, and exception handling.

AWS Serverless Application Repository

Serverless Application Repository is a new service (currently in preview) that aids in the publication, discovery, and deployment of serverless applications. With it you’ll be able to find shared serverless applications that you can launch in your account, while also sharing ones that you’ve created for others to do the same.

AWS [email protected]

[email protected] now supports content-based dynamic origin selection, network calls from viewer events, and advanced response generation. This combination of capabilities greatly increases the use cases for [email protected], such as allowing you to send requests to different origins based on request information, showing selective content based on authentication, and dynamically watermarking images for each viewer.

AWS SAM

Twitch Launchpad live announcements

Other service announcements

Here are some of the other highlights that you might have missed. We think these could help you make great applications:

AWS re:Invent 2017 sessions

Coming up with the right mix of talks for an event like this can be quite a challenge. The Product, Marketing, and Developer Advocacy teams for Serverless at AWS spent weeks reading through dozens of talk ideas to boil it down to the final list.

From feedback at other AWS events and webinars, we knew that customers were looking for talks that focused on concrete examples of solving problems with serverless, how to perform common tasks such as deployment, CI/CD, monitoring, and troubleshooting, and to see customer and partner examples solving real world problems. To that extent we tried to settle on a good mix based on attendee experience and provide a track full of rich content.

Below are the recordings and slides of breakout sessions from re:Invent 2017. We’ve organized them for those getting started, those who are already beginning to build serverless applications, and the experts out there already running them at scale. Some of the videos and slides haven’t been posted yet, and so we will update this list as they become available.

Find the entire Serverless Track playlist on YouTube.

Talks for people new to Serverless

Advanced topics

Expert mode

Talks for specific use cases

Talks from AWS customers & partners

Looking to get hands-on with Serverless?

At re:Invent, we delivered instructor-led skills sessions to help attendees new to serverless applications get started quickly. The content from these sessions is already online and you can do the hands-on labs yourself!
Build a Serverless web application

Still looking for more?

We also recently completely overhauled the main Serverless landing page for AWS. This includes a new Resources page containing case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials. Check it out!

Implementing Dynamic ETL Pipelines Using AWS Step Functions

Post Syndicated from Tara Van Unen original https://aws.amazon.com/blogs/compute/implementing-dynamic-etl-pipelines-using-aws-step-functions/

This post contributed by:
Wangechi Dole, AWS Solutions Architect
Milan Krasnansky, ING, Digital Solutions Developer, SGK
Rian Mookencherry, Director – Product Innovation, SGK

Data processing and transformation is a common use case you see in our customer case studies and success stories. Often, customers deal with complex data from a variety of sources that needs to be transformed and customized through a series of steps to make it useful to different systems and stakeholders. This can be difficult due to the ever-increasing volume, velocity, and variety of data. Today, data management challenges cannot be solved with traditional databases.

Workflow automation helps you build solutions that are repeatable, scalable, and reliable. You can use AWS Step Functions for this. A great example is how SGK used Step Functions to automate the ETL processes for their client. With Step Functions, SGK has been able to automate changes within the data management system, substantially reducing the time required for data processing.

In this post, SGK shares the details of how they used Step Functions to build a robust data processing system based on highly configurable business transformation rules for ETL processes.

SGK: Building dynamic ETL pipelines

SGK is a subsidiary of Matthews International Corporation, a diversified organization focusing on brand solutions and industrial technologies. SGK’s Global Content Creation Studio network creates compelling content and solutions that connect brands and products to consumers through multiple assets including photography, video, and copywriting.

We were recently contracted to build a sophisticated and scalable data management system for one of our clients. We chose to build the solution on AWS to leverage advanced, managed services that help to improve the speed and agility of development.

The data management system served two main functions:

  1. Ingesting a large amount of complex data to facilitate both reporting and product funding decisions for the client’s global marketing and supply chain organizations.
  2. Processing the data through normalization and applying complex algorithms and data transformations. The system goal was to provide information in the relevant context—such as strategic marketing, supply chain, product planning, etc. —to the end consumer through automated data feeds or updates to existing ETL systems.

We were faced with several challenges:

  • Output data that needed to be refreshed at least twice a day to provide fresh datasets to both local and global markets. That constant data refresh posed several challenges, especially around data management and replication across multiple databases.
  • The complexity of reporting business rules that needed to be updated on a constant basis.
  • Data that could not be processed as contiguous blocks of typical time-series data. The measurement of the data was done across seasons (that is, combination of dates), which often resulted with up to three overlapping seasons at any given time.
  • Input data that came from 10+ different data sources. Each data source ranged from 1–20K rows with as many as 85 columns per input source.

These challenges meant that our small Dev team heavily invested time in frequent configuration changes to the system and data integrity verification to make sure that everything was operating properly. Maintaining this system proved to be a daunting task and that’s when we turned to Step Functions—along with other AWS services—to automate our ETL processes.

Solution overview

Our solution included the following AWS services:

  • AWS Step Functions: Before Step Functions was available, we were using multiple Lambda functions for this use case and running into memory limit issues. With Step Functions, we can execute steps in parallel simultaneously, in a cost-efficient manner, without running into memory limitations.
  • AWS Lambda: The Step Functions state machine uses Lambda functions to implement the Task states. Our Lambda functions are implemented in Java 8.
  • Amazon DynamoDB provides us with an easy and flexible way to manage business rules. We specify our rules as Keys. These are key-value pairs stored in a DynamoDB table.
  • Amazon RDS: Our ETL pipelines consume source data from our RDS MySQL database.
  • Amazon Redshift: We use Amazon Redshift for reporting purposes because it integrates with our BI tools. Currently we are using Tableau for reporting which integrates well with Amazon Redshift.
  • Amazon S3: We store our raw input files and intermediate results in S3 buckets.
  • Amazon CloudWatch Events: Our users expect results at a specific time. We use CloudWatch Events to trigger Step Functions on an automated schedule.

Solution architecture

This solution uses a declarative approach to defining business transformation rules that are applied by the underlying Step Functions state machine as data moves from RDS to Amazon Redshift. An S3 bucket is used to store intermediate results. A CloudWatch Event rule triggers the Step Functions state machine on a schedule. The following diagram illustrates our architecture:

Here are more details for the above diagram:

  1. A rule in CloudWatch Events triggers the state machine execution on an automated schedule.
  2. The state machine invokes the first Lambda function.
  3. The Lambda function deletes all existing records in Amazon Redshift. Depending on the dataset, the Lambda function can create a new table in Amazon Redshift to hold the data.
  4. The same Lambda function then retrieves Keys from a DynamoDB table. Keys represent specific marketing campaigns or seasons and map to specific records in RDS.
  5. The state machine executes the second Lambda function using the Keys from DynamoDB.
  6. The second Lambda function retrieves the referenced dataset from RDS. The records retrieved represent the entire dataset needed for a specific marketing campaign.
  7. The second Lambda function executes in parallel for each Key retrieved from DynamoDB and stores the output in CSV format temporarily in S3.
  8. Finally, the Lambda function uploads the data into Amazon Redshift.

To understand the above data processing workflow, take a closer look at the Step Functions state machine for this example.

We walk you through the state machine in more detail in the following sections.

Walkthrough

To get started, you need to:

  • Create a schedule in CloudWatch Events
  • Specify conditions for RDS data extracts
  • Create Amazon Redshift input files
  • Load data into Amazon Redshift

Step 1: Create a schedule in CloudWatch Events
Create rules in CloudWatch Events to trigger the Step Functions state machine on an automated schedule. The following is an example cron expression to automate your schedule:

In this example, the cron expression invokes the Step Functions state machine at 3:00am and 2:00pm (UTC) every day.

Step 2: Specify conditions for RDS data extracts
We use DynamoDB to store Keys that determine which rows of data to extract from our RDS MySQL database. An example Key is MCS2017, which stands for, Marketing Campaign Spring 2017. Each campaign has a specific start and end date and the corresponding dataset is stored in RDS MySQL. A record in RDS contains about 600 columns, and each Key can represent up to 20K records.

A given day can have multiple campaigns with different start and end dates running simultaneously. In the following example DynamoDB item, three campaigns are specified for the given date.

The state machine example shown above uses Keys 31, 32, and 33 in the first ChoiceState and Keys 21 and 22 in the second ChoiceState. These keys represent marketing campaigns for a given day. For example, on Monday, there are only two campaigns requested. The ChoiceState with Keys 21 and 22 is executed. If three campaigns are requested on Tuesday, for example, then ChoiceState with Keys 31, 32, and 33 is executed. MCS2017 can be represented by Key 21 and Key 33 on Monday and Tuesday, respectively. This approach gives us the flexibility to add or remove campaigns dynamically.

Step 3: Create Amazon Redshift input files
When the state machine begins execution, the first Lambda function is invoked as the resource for FirstState, represented in the Step Functions state machine as follows:

"Comment": ” AWS Amazon States Language.", 
  "StartAt": "FirstState",
 
"States": { 
  "FirstState": {
   
"Type": "Task",
   
"Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Start",
    "Next": "ChoiceState" 
  } 

As described in the solution architecture, the purpose of this Lambda function is to delete existing data in Amazon Redshift and retrieve keys from DynamoDB. In our use case, we found that deleting existing records was more efficient and less time-consuming than finding the delta and updating existing records. On average, an Amazon Redshift table can contain about 36 million cells, which translates to roughly 65K records. The following is the code snippet for the first Lambda function in Java 8:

public class LambdaFunctionHandler implements RequestHandler<Map<String,Object>,Map<String,String>> {
    Map<String,String> keys= new HashMap<>();
    public Map<String, String> handleRequest(Map<String, Object> input, Context context){
       Properties config = getConfig(); 
       // 1. Cleaning Redshift Database
       new RedshiftDataService(config).cleaningTable(); 
       // 2. Reading data from Dynamodb
       List<String> keyList = new DynamoDBDataService(config).getCurrentKeys();
       for(int i = 0; i < keyList.size(); i++) {
           keys.put(”key" + (i+1), keyList.get(i)); 
       }
       keys.put(”key" + T,String.valueOf(keyList.size()));
       // 3. Returning the key values and the key count from the “for” loop
       return (keys);
}

The following JSON represents ChoiceState.

"ChoiceState": {
   "Type" : "Choice",
   "Choices": [ 
   {

      "Variable": "$.keyT",
     "StringEquals": "3",
     "Next": "CurrentThreeKeys" 
   }, 
   {

     "Variable": "$.keyT",
    "StringEquals": "2",
    "Next": "CurrentTwooKeys" 
   } 
 ], 
 "Default": "DefaultState"
}

The variable $.keyT represents the number of keys retrieved from DynamoDB. This variable determines which of the parallel branches should be executed. At the time of publication, Step Functions does not support dynamic parallel state. Therefore, choices under ChoiceState are manually created and assigned hardcoded StringEquals values. These values represent the number of parallel executions for the second Lambda function.

For example, if $.keyT equals 3, the second Lambda function is executed three times in parallel with keys, $key1, $key2 and $key3 retrieved from DynamoDB. Similarly, if $.keyT equals two, the second Lambda function is executed twice in parallel.  The following JSON represents this parallel execution:

"CurrentThreeKeys": { 
  "Type": "Parallel",
  "Next": "NextState",
  "Branches": [ 
  {

     "StartAt": “key31",
    "States": { 
       “key31": {

          "Type": "Task",
        "InputPath": "$.key1",
        "Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Execution",
        "End": true 
       } 
    } 
  }, 
  {

     "StartAt": “key32",
    "States": { 
     “key32": {

        "Type": "Task",
       "InputPath": "$.key2",
         "Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Execution",
       "End": true 
      } 
     } 
   }, 
   {

      "StartAt": “key33",
       "States": { 
          “key33": {

                "Type": "Task",
             "InputPath": "$.key3",
             "Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Execution",
           "End": true 
       } 
     } 
    } 
  ] 
} 

Step 4: Load data into Amazon Redshift
The second Lambda function in the state machine extracts records from RDS associated with keys retrieved for DynamoDB. It processes the data then loads into an Amazon Redshift table. The following is code snippet for the second Lambda function in Java 8.

public class LambdaFunctionHandler implements RequestHandler<String, String> {
 public static String key = null;

public String handleRequest(String input, Context context) { 
   key=input; 
   //1. Getting basic configurations for the next classes + s3 client Properties
   config = getConfig();

   AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient(); 
   // 2. Export query results from RDS into S3 bucket 
   new RdsDataService(config).exportDataToS3(s3,key); 
   // 3. Import query results from S3 bucket into Redshift 
    new RedshiftDataService(config).importDataFromS3(s3,key); 
   System.out.println(input); 
   return "SUCCESS"; 
 } 
}

After the data is loaded into Amazon Redshift, end users can visualize it using their preferred business intelligence tools.

Lessons learned

  • At the time of publication, the 1.5–GB memory hard limit for Lambda functions was inadequate for processing our complex workload. Step Functions gave us the flexibility to chunk our large datasets and process them in parallel, saving on costs and time.
  • In our previous implementation, we assigned each key a dedicated Lambda function along with CloudWatch rules for schedule automation. This approach proved to be inefficient and quickly became an operational burden. Previously, we processed each key sequentially, with each key adding about five minutes to the overall processing time. For example, processing three keys meant that the total processing time was three times longer. With Step Functions, the entire state machine executes in about five minutes.
  • Using DynamoDB with Step Functions gave us the flexibility to manage keys efficiently. In our previous implementations, keys were hardcoded in Lambda functions, which became difficult to manage due to frequent updates. DynamoDB is a great way to store dynamic data that changes frequently, and it works perfectly with our serverless architectures.

Conclusion

With Step Functions, we were able to fully automate the frequent configuration updates to our dataset resulting in significant cost savings, reduced risk to data errors due to system downtime, and more time for us to focus on new product development rather than support related issues. We hope that you have found the information useful and that it can serve as a jump-start to building your own ETL processes on AWS with managed AWS services.

For more information about how Step Functions makes it easy to coordinate the components of distributed applications and microservices in any workflow, see the use case examples and then build your first state machine in under five minutes in the Step Functions console.

If you have questions or suggestions, please comment below.

AWS Achieves FedRAMP JAB Moderate Provisional Authorization for 20 Services in the AWS US East/West Region

Post Syndicated from Chris Gile original https://aws.amazon.com/blogs/security/aws-achieves-fedramp-jab-moderate-authorization-for-20-services-in-us-eastwest/

The AWS US East/West Region has received a Provisional Authority to Operate (P-ATO) from the Joint Authorization Board (JAB) at the Federal Risk and Authorization Management Program (FedRAMP) Moderate baseline.

Though AWS has maintained an AWS US East/West Region Agency-ATO since early 2013, this announcement represents AWS’s carefully deliberated move to the JAB for the centralized maintenance of our P-ATO for 10 services already authorized. This also includes the addition of 10 new services to our FedRAMP program (see the complete list of services below). This doubles the number of FedRAMP Moderate services available to our customers to enable increased use of the cloud and support modernized IT missions. Our public sector customers now can leverage this FedRAMP P-ATO as a baseline for their own authorizations and look to the JAB for centralized Continuous Monitoring reporting and updates. In a significant enhancement for our partners that build their solutions on the AWS US East/West Region, they can now achieve FedRAMP JAB P-ATOs of their own for their Platform as a Service (PaaS) and Software as a Service (SaaS) offerings.

In line with FedRAMP security requirements, our independent FedRAMP assessment was completed in partnership with a FedRAMP accredited Third Party Assessment Organization (3PAO) on our technical, management, and operational security controls to validate that they meet or exceed FedRAMP’s Moderate baseline requirements. Effective immediately, you can begin leveraging this P-ATO for the following 20 services in the AWS US East/West Region:

  • Amazon Aurora (MySQL)*
  • Amazon CloudWatch Logs*
  • Amazon DynamoDB
  • Amazon Elastic Block Store
  • Amazon Elastic Compute Cloud
  • Amazon EMR*
  • Amazon Glacier*
  • Amazon Kinesis Streams*
  • Amazon RDS (MySQL, Oracle, Postgres*)
  • Amazon Redshift
  • Amazon Simple Notification Service*
  • Amazon Simple Queue Service*
  • Amazon Simple Storage Service
  • Amazon Simple Workflow Service*
  • Amazon Virtual Private Cloud
  • AWS CloudFormation*
  • AWS CloudTrail*
  • AWS Identity and Access Management
  • AWS Key Management Service
  • Elastic Load Balancing

* Services with first-time FedRAMP Moderate authorizations

We continue to work with the FedRAMP Project Management Office (PMO), other regulatory and compliance bodies, and our customers and partners to ensure that we are raising the bar on our customers’ security and compliance needs.

To learn more about how AWS helps customers meet their security and compliance requirements, see the AWS Compliance website. To learn about what other public sector customers are doing on AWS, see our Government, Education, and Nonprofits Case Studies and Customer Success Stories. To review the public posting of our FedRAMP authorizations, see the FedRAMP Marketplace.

– Chris Gile, Senior Manager, AWS Public Sector Risk and Compliance

Backing Up the Modern Enterprise with Backblaze for Business

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/endpoint-backup-solutions/

Endpoint backup diagram

Organizations of all types and sizes need reliable and secure backup. Whether they have as few as 3 or as many as 300,000 computer users, an organization’s computer data is a valuable business asset that needs to be protected.

Modern organizations are changing how they work and where they work, which brings new challenges to making sure that company’s data assets are not only available, but secure. Larger organizations have IT departments that are prepared to address these needs, but often times in smaller and newer organizations the challenge falls upon office management who might not be as prepared or knowledgeable to face a work environment undergoing dramatic changes.

Whether small or large, local or world-wide, for-profit or non-profit, organizations need a backup strategy and solution that matches the new ways of working in the enterprise.

The Enterprise Has Changed, and So Has Data Use

More and more, organizations are working in the cloud. These days organizations can operate just fine without their own file servers, database servers, mail servers, or other IT infrastructure that used to be standard for all but the smallest organization.

The reality is that for most organizations, though, it’s a hybrid work environment, with a combination of cloud-based and PC and Macintosh-based applications. Legacy apps aren’t going away any time soon. They will be with us for a while, with their accompanying data scattered amongst all the desktops, laptops and other endpoints in corporate headquarters, home offices, hotel rooms, and airport waiting areas.

In addition, the modern workforce likely combines regular full-time employees, remote workers, contractors, and sometimes interns, volunteers, and other temporary workers who also use company IT assets.

The Modern Enterprise Brings New Challenges for IT

These changes in how enterprises work present a problem for anyone tasked with making sure that data — no matter who uses it or where it lives — is adequately backed-up. Cloud-based applications, when properly used and managed, can be adequately backed up, provided that users are connected to the internet and data transfers occur regularly — which is not always the case. But what about the data on the laptops, desktops, and devices used by remote employees, contractors, or just employees whose work keeps them on the road?

The organization’s backup solution must address all the needs of the modern organization or enterprise using both cloud and PC and Mac-based applications, and not be constrained by employee or computer location.

A Ten-Point Checklist for the Modern Enterprise for Backing Up

What should the modern enterprise look for when evaluating a backup solution?

1) Easy to deploy to workers’ computers

Whether installed by the computer user or an IT person locally or remotely, the backup solution must be easy to implement quickly with minimal demands on the user or administrator.

2) Fast and unobtrusive client software

Backups should happen in the background by efficient (native) PC and Macintosh software clients that don’t consume valuable processing power or take memory away from applications the user needs.

3) Easy to configure

The backup solutions must be easy to configure for both the user and the IT professional. Ease-of-use means less time to deploy, configure, and manage.

4) Defaults to backing up all valuable data

By default, the solution backs up commonly used files and folders or directories, including desktops. Some backup solutions are difficult and intimidating because they require that the user chose what needs to be backed up, often missing files and folders/directories that contain valuable data.

5) Works automatically in the background

Backups should happen automatically, no matter where the computer is located. The computer user, especially the remote or mobile one, shouldn’t be required to attach cables or drives, or remember to initiate backups. A working solution backs up automatically without requiring action by the user or IT administrator.

6) Data restores are fast and easy

Whether it’s a single file, directory, or an entire system that must be restored, a user or IT sysadmin needs to be able to restore backed up data as quickly as possible. In cases of large restores to remote locations, the ability to send a restore via physical media is a must.

7) No limitations on data

Throttling, caps, and data limits complicate backups and require guesses about how much storage space will be needed.

8) Safe & Secure

Organizations require that their data is secure during all phases of initial upload, storage, and restore.

9) Easy-to-manage

The backup solution needs to provide a clear and simple web management interface for all functions. Designing for ease-of-use leads to efficiency in management and operation.

10) Affordable and transparent pricing

Backup costs should be predictable, understandable, and without surprises.

Two Scenarios for the Modern Enterprise

Enterprises exist in many forms and types, but wanting to meet the above requirements is common across all of them. Below, we take a look at two common scenarios showing how enterprises face these challenges. Three case studies are available that provide more information about how Backblaze customers have succeeded in these environments.

Enterprise Profile 1

The needs of a smaller enterprise differ from those of larger, established organizations. This organization likely doesn’t have anyone who is devoted full-time to IT. The job of on-boarding new employees and getting them set up with a computer likely falls upon an executive assistant or office manager. This person might give new employees a checklist with the software and account information and lets users handle setting up the computer themselves.

Organizations in this profile need solutions that are easy to install and require little to no configuration. Backblaze, by default, backs up all user data, which lets the organization be secure in knowing all the data will be backed up to the cloud — including files left on the desktop. Combined with Backblaze’s unlimited data policy, organizations have a truly “set it and forget it” platform.

Customizing Groups To Meet Teams’ Needs

The Groups feature of Backblaze for Business allows an organization to decide whether an individual client’s computer will be Unmanaged (backups and restores under the control of the worker), or Managed, in which an administrator can monitor the status and frequency of backups and handle restores should they become necessary. One group for the entire organization might be adequate at this stage, but the organization has the option to add additional groups as it grows and needs more flexibility and control.

The organization, of course, has the choice of managing and monitoring users using Groups. With Backblaze’s Groups, organizations can set user-based access rules, which allows the administrator to create restores for lost files or entire computers on an employee’s behalf, to centralize billing for all client computers in the organization, and to redeploy a recovered computer or new computer with the backed up data.

Restores

In this scenario, the decision has been made to let each user manage her own backups, including restores, if necessary, of individual files or entire systems. If a restore of a file or system is needed, the restore process is easy enough for the user to handle it by herself.

Case Study 1

Read about how PagerDuty uses Backblaze for Business in a mixed enterprise of cloud and desktop/laptop applications.

PagerDuty Case Study

In a common approach, the employee can retrieve an accidentally deleted file or an earlier version of a document on her own. The Backblaze for Business interface is easy to navigate and was designed with feedback from thousands of customers over the course of a decade.

In the event of a lost, damaged, or stolen laptop,  administrators of Managed Groups can  initiate the restore, which could be in the form of a download of a restore ZIP file from the web management console, or the overnight shipment of a USB drive directly to the organization or user.

Enterprise Profile 2

This profile is for an organization with a full-time IT staff. When a new worker joins the team, the IT staff is tasked with configuring the computer and delivering it to the new employee.

Backblaze for Business Groups

Case Study 2

Global charitable organization charity: water uses Backblaze for Business to back up workers’ and volunteers’ laptops as they travel to developing countries in their efforts to provide clean and safe drinking water.

charity: water Case Study

This organization can take advantage of additional capabilities in Groups. A Managed Group makes sense in an organization with a geographically dispersed work force as it lets IT ensure that workers’ data is being regularly backed up no matter where they are. Billing can be company-wide or assigned to individual departments or geographical locations. The organization has the choice of how to divide the organization into Groups (location, function, subsidiary, etc.) and whether the Group should be Managed or Unmanaged. Using Managed Groups might be suitable for most of the organization, but there are exceptions in which sensitive data might dictate using an Unmanaged Group, such as could be the case with HR, the executive team, or finance.

Deployment

By Invitation Email, Link, or Domain

Backblaze for Business allows a number of options for deploying the client software to workers’ computers. Client installation is fast and easy on both Windows and Macintosh, so sending email invitations to users or automatically enrolling users by domain or invitation link, is a common approach.

By Remote Deployment

IT might choose to remotely and silently deploy Backblaze for Business across specific Groups or the entire organization. An administrator can silently deploy the Backblaze backup client via the command-line, or use common RMM (Remote Monitoring and Management) tools such as Jamf and Munki.

Restores

Case Study 3

Read about how Bright Bear Technology Solutions, an IT Managed Service Provider (MSP), uses the Groups feature of Backblaze for Business to manage customer backups and restores, deploy Backblaze licenses to their customers, and centralize billing for all their client-based backup services.

Bright Bear Case Study

Some organizations are better equipped to manage or assist workers when restores become necessary. Individual users will be pleased to discover they can roll-back files to an earlier version if they wish, but IT will likely manage any complete system restore that involves reconfiguring a computer after a repair or requisitioning an entirely new system when needed.

This organization might chose to retain a client’s entire computer backup for archival purposes, using Backblaze B2 as the cloud storage solution. This is another advantage of having a cloud storage provider that combines both endpoint backup and cloud object storage among its services.

The Next Step: Server Backup & Data Archiving with B2 Cloud Storage

As organizations grow, they have increased needs for cloud storage beyond Macintosh and PC data backup. Backblaze’s object cloud storage, Backblaze B2, provides low-cost storage and archiving of records, media, and server data that can grow with the organization’s size and needs.

B2 Cloud Storage is available through the same Backblaze management console as Backblaze Computer Backup. This means that Admins have one console for billing, monitoring, deployment, and role provisioning. B2 is priced at 1/4 the cost of Amazon S3, or $0.005 per month per gigabyte (which equals $5/month per terabyte).

Why Modern Enterprises Chose Backblaze

Backblaze for Business

Businesses and organizations select Backblaze for Business for backup because Backblaze is designed to meet the needs of the modern enterprise. Backblaze customers are part of a a platform that has a 10+ year track record of innovation and over 400 petabytes of customer data already under management.

Backblaze’s backup model is proven through head-to-head comparisons to back up data that other backup solutions overlook in their default configurations — including valuable files that are needed after an accidental deletion, theft, or computer failure.

Backblaze is the only enterprise-level backup company that provides TOTP (Time-based One-time Password) via both SMS and Authentication app to all accounts at no incremental charge. At just $50/year/computer, Backblaze is affordable for any size of enterprise.

Modern Enterprises can Meet The Challenge of The Changing Data Environment

With the right backup solution and strategy, the modern enterprise will be prepared to ensure that its data is protected from accident, disaster, or theft, whether its data is in one office or dispersed among many locations, and remote and mobile employees.

Backblaze for Business is an affordable solution that enables organizations to meet the evolving data demands facing the modern enterprise.

The post Backing Up the Modern Enterprise with Backblaze for Business appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Prime Day 2017 – Powered by AWS

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/prime-day-2017-powered-by-aws/

The third annual Prime Day set another round of records for global orders, topping Black Friday and Cyber Monday, making it the biggest day in Amazon retail history. Over the course of the 30 hour event, tens of millions of Prime members purchased things like Echo Dots, Fire tablets, programmable pressure cookers, espresso machines, rechargeable batteries, and much more! July 11th also set a record for the number of new Prime memberships, as people signed up in order to take advantage of hundreds of thousands of deals. Amazon customers shopped online and made heavy use of the Amazon App, with mobile orders more than doubling from last Prime Day.

Powered by AWS
Last year I told you about How AWS Powered Amazon’s Biggest Day Ever, and shared what the team had learned with regard to preparation, automation, monitoring, and thinking big. All of those lessons still apply and you can read that post to learn more. Preparation for this year’s Prime Day (which started just days after Prime Day 2016 wrapped up) started by collecting and sharing best practices and identifying areas for improvement, proceeding to implementation and stress testing as the big day approached. Two of the best practices involve auditing and GameDay:

Auditing – This is a formal way for us to track preparations, identify risks, and to track progress against our objectives. Each team must respond to a series of detailed technical and operational questions that are designed to help them determine their readiness. On the technical side, questions could revolve around time to recovery after a database failure, including the all-important check of the TTL (time to live) for the CNAME. Operational questions address schedules for on-call personnel, points of contact, and ownership of services & instances.

GameDay – This practice (which I believe originated with former Amazonian Jesse Robbins), is intended to validate all of the capacity planning & preparation and to verify that all of the necessary operational practices are in place and work as expected. It introduces simulated failures and helps to train the team to identify and quickly resolve issues, building muscle memory in the process. It also tests failover and recovery capabilities, and can expose latent defects that are lurking under the covers. GameDays help teams to understand scaling drivers (page views, orders, and so forth) and gives them an opportunity to test their scaling practices. To learn more, read Resilience Engineering: Learning to Embrace Failure or watch the video: GameDay: Creating Resiliency Through Destruction.

Prime Day 2017 Metrics
So, how did we do this year?

The AWS teams checked their dashboards and log files, and were happy to share their metrics with me. Here are a few of the most interesting ones:

Block Storage – Use of Amazon Elastic Block Store (EBS) grew by 40% year-over-year, with aggregate data transfer jumping to 52 petabytes (a 50% increase) for the day and total I/O requests rising to 835 million (a 30% increase). The team told me that they loved the elasticity of EBS, and that they were able to ramp down on capacity after Prime Day concluded instead of being stuck with it.

NoSQL Database – Amazon DynamoDB requests from Alexa, the Amazon.com sites, and the Amazon fulfillment centers totaled 3.34 trillion, peaking at 12.9 million per second. According to the team, the extreme scale, consistent performance, and high availability of DynamoDB let them meet needs of Prime Day without breaking a sweat.

Stack Creation – Nearly 31,000 AWS CloudFormation stacks were created for Prime Day in order to bring additional AWS resources on line.

API Usage – AWS CloudTrail processed over 50 billion events and tracked more than 419 billion calls to various AWS APIs, all in support of Prime Day.

Configuration TrackingAWS Config generated over 14 million Configuration items for AWS resources.

You Can Do It
Running an event that is as large, complex, and mission-critical as Prime Day takes a lot of planning. If you have an event of this type in mind, please take a look at our new Infrastructure Event Readiness white paper. Inside, you will learn how to design and provision your applications to smoothly handle planned scaling events such as product launches or seasonal traffic spikes, with sections on automation, resiliency, cost optimization, event management, and more.

Jeff;

 

AWS HIPAA Eligibility Update (July 2017) – Eight Additional Services

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-hipaa-eligibility-update-july-2017-eight-additional-services/

It is time for an update on our on-going effort to make AWS a great host for healthcare and life sciences applications. As you can see from our Health Customer Stories page, Philips, VergeHealth, and Cambia (to choose a few) trust AWS with Protected Health Information (PHI) and Personally Identifying Information (PII) as part of their efforts to comply with HIPAA and HITECH.

In May we announced that we added Amazon API Gateway, AWS Direct Connect, AWS Database Migration Service, and Amazon Simple Queue Service (SQS) to our list of HIPAA eligible services and discussed our how customers and partners are putting them to use.

Eight More Eligible Services
Today I am happy to share the news that we are adding another eight services to the list:

Amazon CloudFront can now be utilized to enhance the delivery and transfer of Protected Health Information data to applications on the Internet. By providing a completely secure and encryptable pathway, CloudFront can now be used as a part of applications that need to cache PHI. This includes applications for viewing lab results or imaging data, and those that transfer PHI from Healthcare Information Exchanges (HIEs).

AWS WAF can now be used to protect applications running on AWS which operate on PHI such as patient care portals, patient scheduling systems, and HIEs. Requests and responses containing encrypted PHI and PII can now pass through AWS WAF.

AWS Shield can now be used to protect web applications such as patient care portals and scheduling systems that operate on encrypted PHI from DDoS attacks.

Amazon S3 Transfer Acceleration can now be used to accelerate the bulk transfer of large amounts of research, genetics, informatics, insurance, or payer/payment data containing PHI/PII information. Transfers can take place between a pair of AWS Regions or from an on-premises system and an AWS Region.

Amazon WorkSpaces can now be used by researchers, informaticists, hospital administrators and other users to analyze, visualize or process PHI/PII data using on-demand Windows virtual desktops.

AWS Directory Service can now be used to connect the authentication and authorization systems of organizations that use or process PHI/PII to their resources in the AWS Cloud. For example, healthcare providers operating hybrid cloud environments can now use AWS Directory Services to allow their users to easily transition between cloud and on-premises resources.

Amazon Simple Notification Service (SNS) can now be used to send notifications containing encrypted PHI/PII as part of patient care, payment processing, and mobile applications.

Amazon Cognito can now be used to authenticate users into mobile patient portal and payment processing applications that use PHI/PII identifiers for accounts.

Additional HIPAA Resources
Here are some additional resources that will help you to build applications that comply with HIPAA and HITECH:

Keep in Touch
In order to make use of any AWS service in any manner that involves PHI, you must first enter into an AWS Business Associate Addendum (BAA). You can contact us to start the process.

Jeff;

Analyze OpenFDA Data in R with Amazon S3 and Amazon Athena

Post Syndicated from Ryan Hood original https://aws.amazon.com/blogs/big-data/analyze-openfda-data-in-r-with-amazon-s3-and-amazon-athena/

One of the great benefits of Amazon S3 is the ability to host, share, or consume public data sets. This provides transparency into data to which an external data scientist or developer might not normally have access. By exposing the data to the public, you can glean many insights that would have been difficult with a data silo.

The openFDA project creates easy access to the high value, high priority, and public access data of the Food and Drug Administration (FDA). The data has been formatted and documented in consumer-friendly standards. Critical data related to drugs, devices, and food has been harmonized and can easily be called by application developers and researchers via API calls. OpenFDA has published two whitepapers that drill into the technical underpinnings of the API infrastructure as well as how to properly analyze the data in R. In addition, FDA makes openFDA data available on S3 in raw format.

In this post, I show how to use S3, Amazon EMR, and Amazon Athena to analyze the drug adverse events dataset. A drug adverse event is an undesirable experience associated with the use of a drug, including serious drug side effects, product use errors, product quality programs, and therapeutic failures.

Data considerations

Keep in mind that this data does have limitations. In addition, in the United States, these adverse events are submitted to the FDA voluntarily from consumers so there may not be reports for all events that occurred. There is no certainty that the reported event was actually due to the product. The FDA does not require that a causal relationship between a product and event be proven, and reports do not always contain the detail necessary to evaluate an event. Because of this, there is no way to identify the true number of events. The important takeaway to all this is that the information contained in this data has not been verified to produce cause and effect relationships. Despite this disclaimer, many interesting insights and value can be derived from the data to accelerate drug safety research.

Data analysis using SQL

For application developers who want to perform targeted searching and lookups, the API endpoints provided by the openFDA project are “ready to go” for software integration using a standard API powered by Elasticsearch, NodeJS, and Docker. However, for data analysis purposes, it is often easier to work with the data using SQL and statistical packages that expect a SQL table structure. For large-scale analysis, APIs often have query limits, such as 5000 records per query. This can cause extra work for data scientists who want to analyze the full dataset instead of small subsets of data.

To address the concern of requiring all the data in a single dataset, the openFDA project released the full 100 GB of harmonized data files that back the openFDA project onto S3. Athena is an interactive query service that makes it easy to analyze data in S3 using standard SQL. It’s a quick and easy way to answer your questions about adverse events and aspirin that does not require you to spin up databases or servers.

While you could point tools directly at the openFDA S3 files, you can find greatly improved performance and use of the data by following some of the preparation steps later in this post.

Architecture

This post explains how to use the following architecture to take the raw data provided by openFDA, leverage several AWS services, and derive meaning from the underlying data.

Steps:

  1. Load the openFDA /drug/event dataset into Spark and convert it to gzip to allow for streaming.
  2. Transform the data in Spark and save the results as a Parquet file in S3.
  3. Query the S3 Parquet file with Athena.
  4. Perform visualization and analysis of the data in R and Python on Amazon EC2.

Optimizing public data sets: A primer on data preparation

Those who want to jump right into preparing the files for Athena may want to skip ahead to the next section.

Transforming, or pre-processing, files is a common task for using many public data sets. Before you jump into the specific steps for transforming the openFDA data files into a format optimized for Athena, I thought it would be worthwhile to provide a quick exploration on the problem.

Making a dataset in S3 efficiently accessible with minimal transformation for the end user has two key elements:

  1. Partitioning the data into objects that contain a complete part of the data (such as data created within a specific month).
  2. Using file formats that make it easy for applications to locate subsets of data (for example, gzip, Parquet, ORC, etc.).

With these two key elements in mind, you can now apply transformations to the openFDA adverse event data to prepare it for Athena. You might find the data techniques employed in this post to be applicable to many of the questions you might want to ask of the public data sets stored in Amazon S3.

Before you get started, I encourage those who are interested in doing deeper healthcare analysis on AWS to make sure that you first read the AWS HIPAA Compliance whitepaper. This covers the information necessary for processing and storing patient health information (PHI).

Also, the adverse event analysis shown for aspirin is strictly for demonstration purposes and should not be used for any real decision or taken as anything other than a demonstration of AWS capabilities. However, there have been robust case studies published that have explored a causal relationship between aspirin and adverse reactions using OpenFDA data. If you are seeking research on aspirin or its risks, visit organizations such as the Centers for Disease Control and Prevention (CDC) or the Institute of Medicine (IOM).

Preparing data for Athena

For this walkthrough, you will start with the FDA adverse events dataset, which is stored as JSON files within zip archives on S3. You then convert it to Parquet for analysis. Why do you need to convert it? The original data download is stored in objects that are partitioned by quarter.

Here is a small sample of what you find in the adverse events (/drugs/event) section of the openFDA website.

If you were looking for events that happened in a specific quarter, this is not a bad solution. For most other scenarios, such as looking across the full history of aspirin events, it requires you to access a lot of data that you won’t need. The zip file format is not ideal for using data in place because zip readers must have random access to the file, which means the data can’t be streamed. Additionally, the zip files contain large JSON objects.

To read the data in these JSON files, a streaming JSON decoder must be used or a computer with a significant amount of RAM must decode the JSON. Opening up these files for public consumption is a great start. However, you still prepare the data with a few lines of Spark code so that the JSON can be streamed.

Step 1:  Convert the file types

Using Apache Spark on EMR, you can extract all of the zip files and pull out the events from the JSON files. To do this, use the Scala code below to deflate the zip file and create a text file. In addition, compress the JSON files with gzip to improve Spark’s performance and reduce your overall storage footprint. The Scala code can be run in either the Spark Shell or in an Apache Zeppelin notebook on your EMR cluster.

If you are unfamiliar with either Apache Zeppelin or the Spark Shell, the following posts serve as great references:

 

import scala.io.Source
import java.util.zip.ZipInputStream
import org.apache.spark.input.PortableDataStream
import org.apache.hadoop.io.compress.GzipCodec

// Input Directory
val inputFile = "s3://download.open.fda.gov/drug/event/2015q4/*.json.zip";

// Output Directory
val outputDir = "s3://{YOUR OUTPUT BUCKET HERE}/output/2015q4/";

// Extract zip files from 
val zipFiles = sc.binaryFiles(inputFile);

// Process zip file to extract the json as text file and save it
// in the output directory 
val rdd = zipFiles.flatMap((file: (String, PortableDataStream)) => {
    val zipStream = new ZipInputStream(file.2.open)
    val entry = zipStream.getNextEntry
    val iter = Source.fromInputStream(zipStream).getLines
    iter
}).map(.replaceAll("\s+","")).saveAsTextFile(outputDir, classOf[GzipCodec])

Step 2:  Transform JSON into Parquet

With just a few more lines of Scala code, you can use Spark’s abstractions to convert the JSON into a Spark DataFrame and then export the data back to S3 in Parquet format.

Spark requires the JSON to be in JSON Lines format to be parsed correctly into a DataFrame.

// Output Parquet directory
val outputDir = "s3://{YOUR OUTPUT BUCKET NAME}/output/drugevents"
// Input json file
val inputJson = "s3://{YOUR OUTPUT BUCKET NAME}/output/2015q4/*”
// Load dataframe from json file multiline 
val df = spark.read.json(sc.wholeTextFiles(inputJson).values)
// Extract results from dataframe
val results = df.select("results")
// Save it to Parquet
results.write.parquet(outputDir)

Step 3:  Create an Athena table

With the data cleanly prepared and stored in S3 using the Parquet format, you can now place an Athena table on top of it to get a better understanding of the underlying data.

Because the openFDA data structure incorporates several layers of nesting, it can be a complex process to try to manually derive the underlying schema in a Hive-compatible format. To shorten this process, you can load the top row of the DataFrame from the previous step into a Hive table within Zeppelin and then extract the “create  table” statement from SparkSQL.

results.createOrReplaceTempView("data")

val top1 = spark.sql("select * from data tablesample(1 rows)")

top1.write.format("parquet").mode("overwrite").saveAsTable("drugevents")

val show_cmd = spark.sql("show create table drugevents”).show(1, false)

This returns a “create table” statement that you can almost paste directly into the Athena console. Make some small modifications (adding the word “external” and replacing “using with “stored as”), and then execute the code in the Athena query editor. The table is created.

For the openFDA data, the DDL returns all string fields, as the date format used in your dataset does not conform to the yyy-mm-dd hh:mm:ss[.f…] format required by Hive. For your analysis, the string format works appropriately but it would be possible to extend this code to use a Presto function to convert the strings into time stamps.

CREATE EXTERNAL TABLE  drugevents (
   companynumb  string, 
   safetyreportid  string, 
   safetyreportversion  string, 
   receiptdate  string, 
   patientagegroup  string, 
   patientdeathdate  string, 
   patientsex  string, 
   patientweight  string, 
   serious  string, 
   seriousnesscongenitalanomali  string, 
   seriousnessdeath  string, 
   seriousnessdisabling  string, 
   seriousnesshospitalization  string, 
   seriousnesslifethreatening  string, 
   seriousnessother  string, 
   actiondrug  string, 
   activesubstancename  string, 
   drugadditional  string, 
   drugadministrationroute  string, 
   drugcharacterization  string, 
   drugindication  string, 
   drugauthorizationnumb  string, 
   medicinalproduct  string, 
   drugdosageform  string, 
   drugdosagetext  string, 
   reactionoutcome  string, 
   reactionmeddrapt  string, 
   reactionmeddraversionpt  string)
STORED AS parquet
LOCATION
  's3://{YOUR TARGET BUCKET}/output/drugevents'

With the Athena table in place, you can start to explore the data by running ad hoc queries within Athena or doing more advanced statistical analysis in R.

Using SQL and R to analyze adverse events

Using the openFDA data with Athena makes it very easy to translate your questions into SQL code and perform quick analysis on the data. After you have prepared the data for Athena, you can begin to explore the relationship between aspirin and adverse drug events, as an example. One of the most common metrics to measure adverse drug events is the Proportional Reporting Ratio (PRR). It is defined as:

PRR = (m/n)/( (M-m)/(N-n) )
Where
m = #reports with drug and event
n = #reports with drug
M = #reports with event in database
N = #reports in database

Gastrointestinal haemorrhage has the highest PRR of any reaction to aspirin when viewed in aggregate. One question you may want to ask is how the PRR has trended on a yearly basis for gastrointestinal haemorrhage since 2005.

Using the following query in Athena, you can see the PRR trend of “GASTROINTESTINAL HAEMORRHAGE” reactions with “ASPIRIN” since 2005:

with drug_and_event as 
(select rpad(receiptdate, 4, 'NA') as receipt_year
    , reactionmeddrapt
    , count(distinct (concat(safetyreportid,receiptdate,reactionmeddrapt))) as reports_with_drug_and_event 
from fda.drugevents
where rpad(receiptdate,4,'NA') 
     between '2005' and '2015' 
     and medicinalproduct = 'ASPIRIN'
     and reactionmeddrapt= 'GASTROINTESTINAL HAEMORRHAGE'
group by reactionmeddrapt, rpad(receiptdate, 4, 'NA') 
), reports_with_drug as 
(
select rpad(receiptdate, 4, 'NA') as receipt_year
    , count(distinct (concat(safetyreportid,receiptdate,reactionmeddrapt))) as reports_with_drug 
 from fda.drugevents 
 where rpad(receiptdate,4,'NA') 
     between '2005' and '2015' 
     and medicinalproduct = 'ASPIRIN'
group by rpad(receiptdate, 4, 'NA') 
), reports_with_event as 
(
   select rpad(receiptdate, 4, 'NA') as receipt_year
    , count(distinct (concat(safetyreportid,receiptdate,reactionmeddrapt))) as reports_with_event 
   from fda.drugevents
   where rpad(receiptdate,4,'NA') 
     between '2005' and '2015' 
     and reactionmeddrapt= 'GASTROINTESTINAL HAEMORRHAGE'
   group by rpad(receiptdate, 4, 'NA')
), total_reports as 
(
   select rpad(receiptdate, 4, 'NA') as receipt_year
    , count(distinct (concat(safetyreportid,receiptdate,reactionmeddrapt))) as total_reports 
   from fda.drugevents
   where rpad(receiptdate,4,'NA') 
     between '2005' and '2015' 
   group by rpad(receiptdate, 4, 'NA')
)
select  drug_and_event.receipt_year, 
(1.0 * drug_and_event.reports_with_drug_and_event/reports_with_drug.reports_with_drug)/ (1.0 * (reports_with_event.reports_with_event- drug_and_event.reports_with_drug_and_event)/(total_reports.total_reports-reports_with_drug.reports_with_drug)) as prr
, drug_and_event.reports_with_drug_and_event
, reports_with_drug.reports_with_drug
, reports_with_event.reports_with_event
, total_reports.total_reports
from drug_and_event
    inner join reports_with_drug on  drug_and_event.receipt_year = reports_with_drug.receipt_year   
    inner join reports_with_event on  drug_and_event.receipt_year = reports_with_event.receipt_year
    inner join total_reports on  drug_and_event.receipt_year = total_reports.receipt_year
order by  drug_and_event.receipt_year


One nice feature of Athena is that you can quickly connect to it via R or any other tool that can use a JDBC driver to visualize the data and understand it more clearly.

With this quick R script that can be run in R Studio either locally or on an EC2 instance, you can create a visualization of the PRR and Reporting Odds Ratio (RoR) for “GASTROINTESTINAL HAEMORRHAGE” reactions from “ASPIRIN” since 2005 to better understand these trends.

# connect to ATHENA
conn <- dbConnect(drv, '<Your JDBC URL>',s3_staging_dir="<Your S3 Location>",user=Sys.getenv(c("USER_NAME"),password=Sys.getenv(c("USER_PASSWORD"))

# Declare Adverse Event
adverseEvent <- "'GASTROINTESTINAL HAEMORRHAGE'"

# Build SQL Blocks
sqlFirst <- "SELECT rpad(receiptdate, 4, 'NA') as receipt_year, count(DISTINCT safetyreportid) as event_count FROM fda.drugsflat WHERE rpad(receiptdate,4,'NA') between '2005' and '2015'"
sqlEnd <- "GROUP BY rpad(receiptdate, 4, 'NA') ORDER BY receipt_year"

# Extract Aspirin with adverse event counts
sql <- paste(sqlFirst,"AND medicinalproduct ='ASPIRIN' AND reactionmeddrapt=",adverseEvent, sqlEnd,sep=" ")
aspirinAdverseCount = dbGetQuery(conn,sql)

# Extract Aspirin counts
sql <- paste(sqlFirst,"AND medicinalproduct ='ASPIRIN'", sqlEnd,sep=" ")
aspirinCount = dbGetQuery(conn,sql)

# Extract adverse event counts
sql <- paste(sqlFirst,"AND reactionmeddrapt=",adverseEvent, sqlEnd,sep=" ")
adverseCount = dbGetQuery(conn,sql)

# All Drug Adverse event Counts
sql <- paste(sqlFirst, sqlEnd,sep=" ")
allDrugCount = dbGetQuery(conn,sql)

# Select correct rows
selAll =  allDrugCount$receipt_year == aspirinAdverseCount$receipt_year
selAspirin = aspirinCount$receipt_year == aspirinAdverseCount$receipt_year
selAdverse = adverseCount$receipt_year == aspirinAdverseCount$receipt_year

# Calculate Numbers
m <- c(aspirinAdverseCount$event_count)
n <- c(aspirinCount[selAspirin,2])
M <- c(adverseCount[selAdverse,2])
N <- c(allDrugCount[selAll,2])

# Calculate proptional reporting ratio
PRR = (m/n)/((M-m)/(N-n))

# Calculate reporting Odds Ratio
d = n-m
D = N-M
ROR = (m/d)/(M/D)

# Plot the PRR and ROR
g_range <- range(0, PRR,ROR)
g_range[2] <- g_range[2] + 3
yearLen = length(aspirinAdverseCount$receipt_year)
axis(1,1:yearLen,lab=ax)
plot(PRR, type="o", col="blue", ylim=g_range,axes=FALSE, ann=FALSE)
axis(1,1:yearLen,lab=ax)
axis(2, las=1, at=1*0:g_range[2])
box()
lines(ROR, type="o", pch=22, lty=2, col="red")

As you can see, the PRR and RoR have both remained fairly steady over this time range. With the R Script above, all you need to do is change the adverseEvent variable from GASTROINTESTINAL HAEMORRHAGE to another type of reaction to analyze and compare those trends.

Summary

In this walkthrough:

  • You used a Scala script on EMR to convert the openFDA zip files to gzip.
  • You then transformed the JSON blobs into flattened Parquet files using Spark on EMR.
  • You created an Athena DDL so that you could query these Parquet files residing in S3.
  • Finally, you pointed the R package at the Athena table to analyze the data without pulling it into a database or creating your own servers.

If you have questions or suggestions, please comment below.


Next Steps

Take your skills to the next level. Learn how to optimize Amazon S3 for an architecture commonly used to enable genomic data analysis. Also, be sure to read more about running R on Amazon Athena.

 

 

 

 

 


About the Authors

Ryan Hood is a Data Engineer for AWS. He works on big data projects leveraging the newest AWS offerings. In his spare time, he enjoys watching the Cubs win the World Series and attempting to Sous-vide anything he can find in his refrigerator.

 

 

Vikram Anand is a Data Engineer for AWS. He works on big data projects leveraging the newest AWS offerings. In his spare time, he enjoys playing soccer and watching the NFL & European Soccer leagues.

 

 

Dave Rocamora is a Solutions Architect at Amazon Web Services on the Open Data team. Dave is based in Seattle and when he is not opening data, he enjoys biking and drinking coffee outside.

 

 

 

 

Hiring a Content Director

Post Syndicated from Ahin Thomas original https://www.backblaze.com/blog/hiring-content-director/


Backblaze is looking to hire a full time Content Director. This role is an essential piece of our team, reporting directly to our VP of Marketing. As the hiring manager, I’d like to tell you a little bit more about the role, how I’m thinking about the collaboration, and why I believe this to be a great opportunity.

A Little About Backblaze and the Role

Since 2007, Backblaze has earned a strong reputation as a leader in data storage. Our products are astonishingly easy to use and affordable to purchase. We have engaged customers and an involved community that helps drive our brand. Our audience numbers in the millions and our primary interaction point is the Backblaze blog. We publish content for engineers (data infrastructure, topics in the data storage world), consumers (how to’s, merits of backing up), and entrepreneurs (business insights). In all categories, our Content Director drives our earned positioned as leaders.

Backblaze has a culture focused on being fair and good (to each other and our customers). We have created a sustainable business that is profitable and growing. Our team places a premium on open communication, being cleverly unconventional, and helping each other out. The Content Director, specifically, balances our needs as a commercial enterprise (at the end of the day, we want to sell our products) with the custodianship of our blog (and the trust of our audience).

There’s a lot of ground to be covered at Backblaze. We have three discreet business lines:

  • Computer Backup -> a 10 year old business focusing on backing up consumer computers.
  • B2 Cloud Storage -> Competing with Amazon, Google, and Microsoft… just at ¼ of the price (but with the same performance characteristics).
  • Business Backup -> Both Computer Backup and B2 Cloud Storage, but focused on SMBs and enterprise.

The Best Candidate Is…

An excellent writer – possessing a solid academic understanding of writing, the creative process, and delivering against deadlines. You know how to write with multiple voices for multiple audiences. We do not expect our Content Director to be a storage infrastructure expert; we do expect a facility with researching topics, accessing our engineering and infrastructure team for guidance, and generally translating the technical into something easy to understand. The best Content Director must be an active participant in the business/ strategy / and editorial debates and then must execute with ruthless precision.

Our Content Director’s “day job” is making sure the blog is running smoothly and the sales team has compelling collateral (emails, case studies, white papers).

Specifically, the Perfect Content Director Excels at:

  • Creating well researched, elegantly constructed content on deadline. For example, each week, 2 articles should be published on our blog. Blog posts should rotate to address the constituencies for our 3 business lines – not all blog posts will appeal to everyone, but over the course of a month, we want multiple compelling pieces for each segment of our audience. Similarly, case studies (and outbound emails) should be tailored to our sales team’s proposed campaigns / audiences. The Content Director creates ~75% of all content but is responsible for editing 100%.
  • Understanding organic methods for weaving business needs into compelling content. The majority of our content (but not EVERY piece) must tie to some business strategy. We hate fluff and hold our promotional content to a standard of being worth someone’s time to read. To be effective, the Content Director must understand the target customer segments and use cases for our products.
  • Straddling both Consumer & SaaS mechanics. A key part of the job will be working to augment the collateral used by our sales team for both B2 Cloud Storage and Business Backup. This content should be compelling and optimized for converting leads. And our foundational business line, Computer Backup, deserves to be nurtured and grown.
  • Product marketing. The Content Director “owns” the blog. But also assists in writing cases studies / white papers, creating collateral (email, trade show). Each of these things has a variety of call to action(s) and audiences. Direct experience is a plus, experience that will plausibly translate to these areas is a requirement.
  • Articulating views on storage, backup, and cloud infrastructure. Not everyone has experience with this. That’s fine, but if you do, it’s strongly beneficial.

A Thursday In The Life:

  • Coordinate Collaborators – We are deliverables driven culture, not a meeting driven one. We expect you to collaborate with internal blog authors and the occasional guest poster.
  • Collaborate with Design – Ensure imagery for upcoming posts / collateral are on track.
  • Augment Sales team – Lock content for next week’s outbound campaign.
  • Self directed blog agenda – Feedback for next Tuesday’s post is addressed, next Thursday’s post is circulated to marketing team for feedback & SEO polish.
  • Review Editorial calendar, make any changes.

Oh! And We Have Great Perks:

  • Competitive healthcare plans
  • Competitive compensation and 401k
  • All employees receive Option grants
  • Unlimited vacation days
  • Strong coffee & fully stocked Micro kitchen
  • Catered breakfast and lunches
  • Awesome people who work on awesome projects
  • Childcare bonus
  • Normal work hours
  • Get to bring your pets into the office
  • San Mateo Office – located near Caltrain and Highways 101 & 280.

Interested in Joining Our Team?

Send us an email to [email protected] with the subject “Content Director”. Please include your resume and 3 brief abstracts for content pieces.
Some hints for each of your three abstracts:

  • Create a compelling headline
  • Write clearly and concisely
  • Be brief, each abstract should be 100 words or less – no longer
  • Target each abstract to a different specific audience that is relevant to our business lines

Thank you for taking the time to read and consider all this. I hope it sounds like a great opportunity for you or someone you know. Principles only need apply.

The post Hiring a Content Director appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Reading Analytics and Privacy

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/04/reading_analyti.html

Interesting paper: “The rise of reading analytics and the emerging calculus of reading privacy in the digital world,” by Clifford Lynch:

Abstract: This paper studies emerging technologies for tracking reading behaviors (“reading analytics”) and their implications for reader privacy, attempting to place them in a historical context. It discusses what data is being collected, to whom it is available, and how it might be used by various interested parties (including authors). I explore means of tracking what’s being read, who is doing the reading, and how readers discover what they read. The paper includes two case studies: mass-market e-books (both directly acquired by readers and mediated by libraries) and scholarly journals (usually mediated by academic libraries); in the latter case I also provide examples of the implications of various authentication, authorization and access management practices on reader privacy. While legal issues are touched upon, the focus is generally pragmatic, emphasizing technology and marketplace practices. The article illustrates the way reader privacy concerns are shifting from government to commercial surveillance, and the interactions between government and the private sector in this area. The paper emphasizes U.S.-based developments.

Classifying Elections as "Critical Infrastructure"

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/01/should_election.html

I am co-author on a paper discussing whether elections be classified as “critical infrastructure” in the US, based on experiences in other countries:

Abstract: With the Russian government hack of the Democratic National Convention email servers, and further leaks expected over the coming months that could influence an election, the drama of the 2016 U.S. presidential race highlights an important point: Nefarious hackers do not just pose a risk to vulnerable companies, cyber attacks can potentially impact the trajectory of democracies. Yet, to date, a consensus has not been reached as to the desirability and feasibility of reclassifying elections, in particular voting machines, as critical infrastructure due in part to the long history of local and state control of voting procedures. This Article takes on the debate in the U.S. using the 2016 elections as a case study but puts the issue in a global context with in-depth case studies from South Africa, Estonia, Brazil, Germany, and India. Governance best practices are analyzed by reviewing these differing approaches to securing elections, including the extent to which trend lines are converging or diverging. This investigation will, in turn, help inform ongoing minilateral efforts at cybersecurity norm building in the critical infrastructure context, which are considered here for the first time in the literature through the lens of polycentric governance.

The paper was speculative, but now it’s official. The U.S. election has been classified as critical infrastructure. I am tentatively in favor of this, but what really matter is what happens now. What does this mean? What sorts of increased security will election systems get? Will we finally get rid of computerized touch-screen voting?

EDITED TO ADD (1/16): This is a good article.

Now Open – AWS London Region

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/now-open-aws-london-region/

Last week we launched our 15th AWS Region and today we are launching our 16th. We have expanded the AWS footprint into the United Kingdom with a new Region in London, our third in Europe. AWS customers can use the new London Region to better serve end-users in the United Kingdom and can also use it to store data in the UK.

The Details
The new London Region provides a broad suite of AWS services including Amazon CloudWatch, Amazon DynamoDB, Amazon ECS, Amazon ElastiCache, Amazon Elastic Block Store (EBS), Amazon Elastic Compute Cloud (EC2), EC2 Container Registry, Amazon EMR, Amazon Glacier, Amazon Kinesis Streams, Amazon Redshift, Amazon Relational Database Service (RDS), Amazon Simple Notification Service (SNS), Amazon Simple Queue Service (SQS), Amazon Simple Storage Service (S3), Amazon Simple Workflow Service (SWF), Amazon Virtual Private Cloud, Auto Scaling, AWS Certificate Manager (ACM), AWS CloudFormation, AWS CloudTrail, AWS CodeDeploy, AWS Config, AWS Database Migration Service, AWS Elastic Beanstalk, AWS Snowball, AWS Snowmobile, AWS Key Management Service (KMS), AWS Marketplace, AWS OpsWorks, AWS Personal Health Dashboard, AWS Shield Standard, AWS Storage Gateway, AWS Support API, Elastic Load Balancing, VM Import/Export, Amazon CloudFront, Amazon Route 53, AWS WAF, AWS Trusted Advisor, and AWS Direct Connect (follow the links for pricing and other information).

The London Region supports all sizes of C4, D2, M4, T2, and X1 instances.

Check out the AWS Global Infrastructure page to learn more about current and future AWS Regions.

From Our Customers
Many AWS customers are getting ready to use this new Region. Here’s a very small sample:

Trainline is Europe’s number one independent rail ticket retailer. Every day more than 100,000 people travel using tickets bought from Trainline. Here’s what Mark Holt (CTO of Trainline) shared with us:

We recently completed the migration of 100 percent of our eCommerce infrastructure to AWS and have seen awesome results: improved security, 60 percent less downtime, significant cost savings and incredible improvements in agility. From extensive testing, we know that 0.3s of latency is worth more than 8 million pounds and so, while AWS connectivity is already blazingly fast, we expect that serving our UK customers from UK datacenters should lead to significant top-line benefits.

Kainos Evolve Electronic Medical Records (EMR) automates the creation, capture and handling of medical case notes and operational documents and records, allowing healthcare providers to deliver better patient safety and quality of care for several leading NHS Foundation Trusts and market leading healthcare technology companies.

Travis Perkins, the largest supplier of building materials in the UK, is implementing the biggest systems and business change in its history including the migration of its datacenters to AWS.

Just Eat is the world’s leading marketplace for online food delivery. Using AWS, JustEat has been able to experiment faster and reduce the time to roll out new feature updates.

OakNorth, a new bank focused on lending between £1m-£20m to entrepreneurs and growth businesses, became the UK’s first cloud-based bank in May after several months of working with AWS to drive the development forward with the regulator.

Partners
I’m happy to report that we are already working with a wide variety of consulting, technology, managed service, and Direct Connect partners in the United Kingdom. Here’s a partial list:

  • AWS Premier Consulting Partners – Accenture, Claranet, Cloudreach, CSC, Datapipe, KCOM, Rackspace, and Slalom.
  • AWS Consulting Partners – Attenda, Contino, Deloitte, KPMG, LayerV, Lemongrass, Perfect Image, and Version 1.
  • AWS Technology Partners – Splunk, Sage, Sophos, Trend Micro, and Zerolight.
  • AWS Managed Service Partners – Claranet, Cloudreach, KCOM, and Rackspace.
  • AWS Direct Connect Partners – AT&T, BT, Hutchison Global Communications, Level 3, Redcentric, and Vodafone.

Here are a few examples of what our partners are working on:

KCOM is a professional services provider offering consultancy, architecture, project delivery and managed service capabilities to large UK-based enterprise businesses. The scalability and flexibility of AWS gives them a significant competitive advantage with their enterprise and public sector customers. The new Region will allow KCOM to build innovative solutions for their public sector clients while meeting local regulatory requirements.

Splunk is a member of the AWS Partner Network and a market leader in analyzing machine data to deliver operational intelligence for security, IT, and the business. They use cloud computing and big data analytics to help their customers to embrace digital transformation and continuous innovation. The new Region will provide even more companies with real-time visibility into the operation of their systems and infrastructure.

Redcentric is a NHS Digital-approved N3 Commercial Aggregator. Their work allows health and care providers such as NHS acute, emergency and mental trusts, clinical commissioning groups (CCGs), and the ISV community to connect securely to AWS. The London Region will allow health and care providers to deliver new digital services and to improve outcomes for citizens and patients.

Visit the AWS Partner Network page to read some case studies and to learn how to join.

Compliance & Connectivity
Every AWS Region is designed and built to meet rigorous compliance standards including ISO 27001, ISO 9001, ISO 27017, ISO 27018, SOC 1, SOC 2, SOC3, PCI DSS Level 1, and many more. Our Cloud Compliance page includes information about these standards, along with those that are specific to the UK, including Cyber Essentials Plus.

The UK Government recognizes that local datacenters from hyper scale public cloud providers can deliver secure solutions for OFFICIAL workloads. In order to meet the special security needs of public sector organizations in the UK with respect to OFFICIAL workloads, we have worked with our Direct Connect Partners to make sure that obligations for connectivity to the Public Services Network (PSN) and N3 can be met.

Use it Today
The London Region is open for business now and you can start using it today! If you need additional information about this Region, please feel free to contact our UK team at [email protected]on.com.

Jeff;

Generate Your Own API Gateway Developer Portal

Post Syndicated from Bryan Liston original https://aws.amazon.com/blogs/compute/generate-your-own-api-gateway-developer-portal/


Shiva Krishnamurthy, Sr. Product Manager

Amazon API Gateway helps you quickly build highly scalable, secure, and robust APIs. Developers who want to consume your API to build web, mobile, or other types of apps need a site where they can learn about the API, acquire access, and manage their consumption. You can do this by hosting a developer portal: a web application that lists your APIs in catalog form, displays API documentation to help developers understand your API, and allows them to sign up for access. The application also needs to let developers test your APIs, and provide feedback to grow a successful developer ecosystem. Finally, you may also want to monetize your APIs and grow API product revenue.

Today, we published aws-api-gateway-developer-portal, an open source serverless web application that you can use to get started building your own developer portal. In just a few easy steps, you can generate a serverless web application that lists your APIs on API Gateway in catalog form, and allows for developer signups. The application is built using aws-serverless-express, an open source library that we published recently, that makes it incredibly easy to build serverless web applications using API Gateway and AWS Lambda. The application uses a SAM (Serverless Application Model) template to deploy its serverless resources, and is very easy to deploy and operate.

Walkthrough

In this post, I walk through the steps for creating a sample API and developer portal for third-party developer consumption:

  • Create an API or import a Swagger definition
  • Deploy the API for developer access
  • Document the API for easier developer consumption
  • Structure it into a usage plan to set access control policies, and set throttling limits.
  • Use aws-api-gateway-developer-portal to generate a developer portal, and then set up and configure the portal
  • Log in as a developer and verify the process of consuming an API.

You can also see how to create a AWS Marketplace listing and monetize your API.

Create an API

To get started, log in to the Amazon API Gateway console and import the example API (PetStore). You can also create your own APIs using Swagger, the console, or APIs.

Deploy the API

After creation, deploy the API to a stage to make it accessible to developers, and ready for testing. For this example, use “Marketplace” as the name for the API stage.

Document the API

API Gateway recently announced support for API documentation. You can now document various parts of your API by either importing Swagger or using the API Gateway console to edit API documentation directly.

Navigate to the Documentation tab to add missing documentation and make it easier for developers to understand your API. After you are satisfied with the new documentation, choose Publish Documentation and choose the Marketplace stage created earlier to update the previously deployed PetStore API with the modified documentation.

Package the API into usage plans

Usage plans in API Gateway help you structure your API for developer access. You can associate API keys to usage plans, and set throttling limits and quotas on a per-API key basis. Choose Usage Plans in the console, create a new usage plan, and set throttling limits and quotas as shown below.

When your customers subscribe to this usage plan, their requests are throttled at 200 RPS, and they can each make only 200,000 requests per month. You can change these limits at any time.

Choose Next to create the usage plan. (Skip the API Key screen, and add the Marketplace stage we created earlier)

Generate and configure a developer portal

Now, generate a developer portal in order to list the usage plan that you created earlier. To do this, clone the aws-api-gateway-developer-portal into a local folder. The README includes instructions on how to get set up, but let’s run through it step-by-step. First, ensure that you have the latest version of Node.js installed. For this walkthrough, you use the latest version of the AWS CLI, so make sure that you have your AWS credentials handy. and that you have configured the AWS CLI with the access-key and secret-key using the `aws –configure’ command.

Set up the developer portal

Then, open up a terminal window and run `npm run setup’. This step modifies several project files in-line with the information that you supply—choose a name for the S3 bucket that store webpages for the developer portal, as well as a name for the Lambda function that wraps the Express.js web application. The S3 bucket names that you specify in this step must be region-unique, so use a prefix (eg. ‘my-org-dev-portal-artifacts’).

After entering those details, it then creates the artifacts bucket on S3 (if the S3 bucket name you provided in the previous step doesn’t already exist). It also runs “package and deploy” on the SAM template using the new

command. It takes several minutes for CloudFormation to create the stack.

This step creates all the resources that you need for the developer portal, after setting up the IAM roles that are needed for the operation of this application. Navigate to the AWS CloudFormation console, and choose the Resources tab to see the full list of resources that have been created.

The core components that were created for your portal are listed below:

  • API Gateway API: Proxies HTTP requests to the back-end Lambda function (Express.js web server that serves website content)
  • Developer portal S3 website URL: Runs the developer portal site (publically accessible bucket)
  • Lambda function: Acts as the primary back end or API server
  • Cognito user and identity pools: Creates Cognito user pool and identity pool, and links them.
  • DynamoDB table – Stores a map of API key IDs to customer IDs, which lets you track the API key and customer association.
  • Lambda function Acts as a listener that subscribes to a particular SNS topic, which is useful when customers cancel or subscribe to an API.

Configure the developer portal post-setup

Additional files are then modified with values from your new resources, such as Cognito pool IDs, etc., and your web app is then uploaded to Amazon S3. After this step completes, it opens up your developer portal website in your default browser.

At this point, you have a complete serverless website ready. The application has been integrated with Cognito User Pools, which makes it easy for you to sign up developers and manage their access. For more information, see the Amazon Cognito Your User Pools – Now Generally Available post.

Edit the dev-portal/src/catalog.json file to list your APIs that appear in the developer portal. This can be a subset of all the APIs you have in API Gateway. Replace the contents of this file with your API definitions. For accuracy, export your APIs from API Gateway as Swagger, and copy them into the catalog.json file. Then, run `npm run upload-site’ to make these APIs available in the developer portal.

Verify the process

Test the end-user flow by signing up for access.

After you sign in as a developer, you can subscribe to an API, to receive an API key that you as a developer would use in your API requests.

You can also browse the API documentation that the API owner published (you, as of 5 minutes ago…), and learn more about the API.

You can even test an API with your API Key. Click the “Show API Key” button on the top right corner of the page, and copy your API Key. Next click the red alert icon, enter your API Key, and click Authorize. Finally, click the “Try it out!” button on any of your resources to make a request to your live API.

Your static content for the site is packaged into the folder ‘/dev-portal’. Modify the case studies, and getting started content in this folder; when you’re satisfied, run ‘npm run upload-site’ to push the changes to your portal.

Monetize your APIs

Amazon API Gateway now integrates with the AWS Marketplace to help API owners monetize their API and meter usage for their API products, without writing any code. Now, any API built using API Gateway can easily be published on the AWS Marketplace using the API Gateway console, AWS CLI, or AWS SDK.

API Gateway automatically and accurately meters API consumption for any such API published to the AWS Marketplace and send it to the AWS Marketplace Metering Service, enabling sellers to focus on adding value to their API products rather than building integrations with AWS Marketplace. For more information, see the Monetize your APIs in AWS Marketplace using API Gateway post.

To leverage this feature, you must offer a developer portal that accepts signups from AWS customers. You can use this developer portal implementation to either build your own from scratch, or use it to add functionality to your existing site.

Conclusion

To summarize, you started with an API on API Gateway and structured it for developer consumption. Then in just a few easy steps, you used aws-api-gateway-developer-portal, to spin up a developer portal on serverless architecture, ready to accept developer signups. This application needs no infrastructure management, scales out-of-the-box, and directly integrates with AWS Marketplace to help you monetize your APIs.

If you have any questions or suggestions, please comment below.

MLB Statcast – Featuring David Ortiz, Joe Torre, and Dave O’Brien

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/mlb-statcast-featuring-david-ortiz-joe-torre-and-dave-obrien/

One of my colleagues has been rooting for the Red Sox since she was in her teens. She recently had the opportunity to work with Red Sox legend and Hank Aaron award winner David Ortiz (aka “Big Papi”), Hall of Famer Joe Torre, and announcer Dave O’Brien (ESPN and NESN) to produce a fun video to commemorate Big Papi’s retirement from Major League Baseball. As you can see, he takes the idea of the quantified self very seriously, and is now measuring just about every aspect of his post-baseball life using AWS-powered Statcast in order to keep his competitive edge:

MLB Statcast uses high-resolution cameras and radar equipment to track the position of the ball (20,000 metrics per second) and the players (30 metrics per second) on the field, generating 7 terabytes of data per game, all stored in and processed by the AWS Cloud. The application uses multiple AWS services including Amazon CloudFront, Amazon DynamoDB, Amazon Elastic Compute Cloud (EC2), Amazon ElastiCache, Amazon Simple Storage Service (S3), AWS Direct Connect, and AWS Lambda; here’s a big-picture look at the architecture:

To learn more, read our case study: Major League Baseball Fields Big Data, and Excitement, with AWS.

If you are coming to AWS re:Invent, get your shoulders in to shape and come visit our Statcast-equipped batting cage! You will be able to measure your metrics and see how you’d do in the big leagues.


Jeff;

 

Amazon WorkSpaces Update – Hourly Usage and Expanded Root Volume

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-workspaces-update-hourly-usage-and-expanded-root-volume/

In my recent post, I Love My Amazon WorkSpace, I shared the story of how I became a full-time user and big fan of Amazon WorkSpaces. Since writing the post I have heard similar sentiments from several other AWS customers.

Today I would like to tell you about some new and recent developments that will make WorkSpaces more economical, more flexible, and more useful:

  • Hourly WorkSpaces – You can now pay for your WorkSpace by the hour.
  • Expanded Root Volume – Newly launched WorkSpaces now have an 80 GB root volume.

Let’s take a closer look at these new features.

Hourly WorkSpaces
If you only need part-time access to your WorkSpace, you (or your organization, to be more precise) will benefit from this feature. In addition to the existing monthly billing, you can now use and pay for a WorkSpace on an hourly basis, allowing you to save money on your AWS bill. If you are a part-time employee, a road warrior, share your job with another part-timer, or work on multiple short-term projects, this feature is for you. It is also a great fit for corporate training, education, and remote administration.

There are now two running modes – AlwaysOn and AutoStop:

  • AlwaysOn – This is the existing mode. You have instant access to a WorkSpace that is always running, billed by the month.
  • AutoStop – This is new. Your WorkSpace starts running and billing when you log in, and stops automatically when you remain disconnected for a specified period of time.

A WorkSpace that is running in AutoStop mode will automatically stop a predetermined amount of time after you disconnect (1 to 48 hours). Your WorkSpaces Adminstrator can also force a running WorkSpace to stop. When you next connect, the WorkSpace will resume, with all open documents and running programs intact. Resuming a stopped WorkSpace generally takes less than 90 seconds.

Your WorkSpaces Administrator has the ability to choose your running mode when launching your WorkSpace:

WorkSpaces Configuration

The Administrator can change the AutoStop time and the running mode at any point during the month. They can also track the number of working hours that your WorkSpace accumulates during the month using the new UserConnected CloudWatch metric, and switch from AutoStop to AlwaysOn when this becomes more economical. Switching from hourly to monthly billing takes place upon request; however, switching the other way takes place at the the start of the following month.

All new Amazon WorkSpaces can take advantage of hourly billing today. If you’re using a custom image for your WorkSpaces, you’ll need to refresh your custom images from the latest Amazon WorkSpaces bundles. The ability for existing WorkSpaces to switch to hourly billing will be added in the future.

To learn more about pricing for hourly WorkSpaces, visit the WorkSpaces Pricing page.

Expanded Root Volume
By popular demand we have expanded the size of the root volume for newly launched WorkSpaces to 80 GB, allowing you to run more applications and store more data at no additional cost. Your WorkSpaces Administrator can rebuild existing WorkSpaces in order to upgrade them to the larger root volumes (read Rebuild a WorkSpace to learn more). Rebuilding a WorkSpace will restore the root volume (C:) to the most recent image of the bundle that was used to create the WorkSpace. It will also restore the data volume (D:) from the last automatic snapshot.

Some WorkSpaces Resources
While I have your attention, I would like to let you know about a couple of other important WorkSpaces resources:

Available Now
The features that I described above are available now and you can start using them today!


Jeff;

How AWS Powered Amazon’s Biggest Day Ever

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/how-aws-powered-amazons-biggest-day-ever/

The second annual Prime Day was another record-breaking success for Amazon, surpassing global orders compared to Black Friday, Cyber Monday and Prime Day 2015.

According to a report published by Slice Intelligence, Amazon accounted for 74% of all US consumer e-commerce on Prime Day 2016. This one-day only global shopping event, exclusively for Amazon Prime members saw record-high levels of traffic including double the number of orders on the Amazon Mobile App compared to Prime Day 2015. Members around the world purchased more than 2 million toys, more than 1 million pairs of shoes and more than 90,000 TVs in one day (see Amazon’s Prime Day is the Biggest Day Ever for more stats). An event of this scale requires infrastructure that can easily scale up to match the surge in traffic.

Scaling AWS-Style
The Amazon retail site uses a fleet of EC2 instances to handle web traffic. To serve the massive increase in customer traffic for Prime Day, the Amazon retail team increased the size of their EC2 fleet, adding capacity that was equal to all of AWS and Amazon.com back in 2009. Resources were drawn from multiple AWS regions around the world.

The morning of July 11th was cool and a few morning clouds blanketed Amazon’s Seattle headquarters. As 8 AM approached, the Amazon retail team was ready for the first of 10 global Prime Day launches. Across the Pacific, it was almost midnight. In Japan, mobile phones, tablets, and laptops glowed in anticipation of Prime Day deals. As traffic began to surge in Japan, CloudWatch metrics reflected the rising fleet utilization as CloudFront endpoints and ElastiCache nodes lit up with high-velocity mobile and web requests. This wave of traffic then circled the globe, arriving in Europe and the US over the course of 40 hours and generating 85 billion clickstream log entries. Orders surpassed Prime Day 2015 by more than 60% worldwide and more than 50% in the US alone. On the mobile side, more than one million customers downloaded and used the Amazon Mobile App for the first time.

As part of Prime Day, Amazon.com saw a significant uptick in their use of 38 different AWS services including:

To further illustrate the scale of Prime Day and the opportunity for other AWS customers to host similar large-scale, single-day events, let’s look at Prime Day through the lens of several AWS services:

  • Amazon Mobile Analytics events increased 1,661% compared to the same day the previous week.
  • Amazon’s use of CloudWatch metrics increased 400% worldwide on Prime Day, compared to the same day the previous week.
  • DynamoDB served over 56 billion extra requests worldwide on Prime Day compared to the same day the previous week.

Running on AWS
The AWS team treats Amazon.com just like any of our other big customers. The two organizations are business partners and communicate through defined support plans and channels. Sticking to this somewhat formal discipline helps the AWS team to improve the support plans and the communication processes for all AWS customers.

Running the Amazon website and mobile app on AWS makes short-term, large scale global events like Prime Day technically feasible and economically viable. When I joined Amazon.com back in 2002 (before the site moved to AWS), preparation for the holiday shopping season involved a lot of planning, budgeting, and expensive hardware acquisition. This hardware helped to accommodate the increased traffic, but the acquisition process meant that Amazon.com sat on unused and lightly utilized hardware after the traffic subsided. AWS enables customers to add the capacity required to power big events like Prime Day, and enables this capacity to be acquired in a much more elastic, cost-effective manner. All of the undifferentiated heavy lifting required to create an online event at this scale is now handled by AWS so the Amazon retail team can focus on delivering the best possible experience for its customers.

Lessons Learned
The Amazon retail team was happy that Prime Day was over, and ready for some rest, but they shared some of what they learned with me:

  • Prepare – Planning and testing are essential. Use historical metrics to help forecast and model future traffic, and to estimate your resource needs accordingly. Prepare for failures with GameDay exercises – intentionally breaking various parts of the infrastructure and the site in order to simulate several failure scenarios (read Resilience Engineering – Learning to Embrace Failure to learn more about GameDay exercises at Amazon).
  • Automate – Reduce manual efforts and automate everything. Take advantage of services that can scale automatically in response to demand – Route53 to automatically scale your DNS, Auto Scaling to scale your EC2 capacity according to demand, and Elastic Load Balancing for automatic failover and to balance traffic across multiple regions and availability zones (AZs).
  • Monitor – Use Amazon CloudWatch metrics and alarms liberally. CloudWatch monitoring helps you stay on top of your usage to ensure the best experience for your customers.
  • Think Big – Using AWS gave the team the resources to create another holiday season. Confidence in your infrastructure is what enables you to scale your big events.

As I mentioned before, nothing is stopping you from envisioning and implementing an event of this scale and scope!

I would encourage you to think big, and to make good use of our support plans and services. Our Solutions Architects and Technical Account Managers are ready to help, as are our APN Consulting Partners. If you are planning for a large-scale one-time event, give us a heads-up and we’ll work with you before and during the event.


Jeff;

PS – What did you buy on Prime Day?