Tag Archives: Amazon CloudFront

AWS Hot Startups – July 2017

Post Syndicated from Tina Barr original https://aws.amazon.com/blogs/aws/aws-hot-startups-july-2017/

Welcome back to another month of Hot Startups! Every day, startups are creating innovative and exciting businesses, applications, and products around the world. Each month we feature a handful of startups doing cool things using AWS.

July is all about learning! These companies are focused on providing access to tools and resources to expand knowledge and skills in different ways.

This month’s startups:

  • CodeHS – provides a fun and accessible computer science curriculum for middle and high schools.
  • Insight – offers intensive fellowships to grow technical talent in Data Science.
  • iTranslate – enables people to read, write, and speak in over 90 languages, anywhere in the world.

CodeHS (San Francisco, CA)

In 2012, Stanford students Zach Galant and Jeremy Keeshin were computer science majors and TAs for introductory classes when they noticed a trend among their peers. Many wished that they had been exposed to computer science earlier in life. In their senior year, Zach and Jeremy launched CodeHS to give middle and high schools the opportunity to provide a fun, accessible computer science education to students everywhere. CodeHS is a web-based curriculum pathway complete with teacher resources, lesson plans, and professional development opportunities. The curriculum is supplemented with time-saving teacher tools to help with lesson planning, grading and reviewing student code, and managing their classroom.

CodeHS aspires to empower all students to meaningfully impact the future, and believes that coding is becoming a new foundational skill, along with reading and writing, that allows students to further explore any interest or area of study. When CodeHS was founded in 2012, only 10% of high schools in America offered a computer science course. Zach and Jeremy set out to change that by providing a solution that made it easy for schools and districts to get started. With CodeHS, thousands of teachers have been trained and are teaching hundreds of thousands of students all over the world. To use CodeHS, all that’s needed is the internet and a web browser. Students can write and run their code online, and teachers can immediately see what the students are working on and how they are doing.

Amazon EC2, Amazon RDS, Amazon ElastiCache, Amazon CloudFront, and Amazon S3 make it possible for CodeHS to scale their site to meet the needs of schools all over the world. CodeHS also relies on AWS to compile and run student code in the browser, which is extremely important when teaching server-side languages such as Java, which powers the AP course. Since usage rises and falls based on school schedules, Amazon CloudWatch and Elastic Load Balancing are used to easily scale up when students are running code, so they have a seamless experience.

Be sure to visit the CodeHS website to learn more about bringing computer science to your school!

Insight (Palo Alto, CA)

Insight was founded in 2012 to create a new educational model, optimize hiring for data teams, and facilitate successful career transitions among data professionals. Over the last 5 years, Insight has kept ahead of market trends and launched a series of professional training fellowships including Data Science, Health Data Science, Data Engineering, and Artificial Intelligence. Finding individuals with the right skill set, background, and culture fit is a challenge for big companies and startups alike, and Insight is focused on developing top talent through intensive 7-week fellowships. To date, Insight has over 1,000 alumni at over 350 companies including Amazon, Google, Netflix, Twitter, and The New York Times.

The Data Engineering team at Insight is well-versed in the current ecosystem of open source tools and technologies and provides mentorship on best practices in this space. The technical teams continually work with external groups in a variety of data advisory and mentorship capacities, but the majority of Insight partners participate in professional sessions. Companies visit the Insight office to speak with fellows in an informal setting and provide details on the type of work they are doing and how their teams are growing. These sessions have proved invaluable: fellows experience a significantly better interview process, and companies gain engaged and enthusiastic new team members.

An important aspect of Insight’s fellowships is the opportunity for hands-on work, focusing on everything from building big-data pipelines to contributing novel features to industry-standard open source efforts. Insight provides free AWS resources for all fellows to use, in addition to mentorship from the Data Engineering team. Fellows regularly use Amazon S3, Amazon EC2, Amazon Kinesis, Amazon EMR, AWS Lambda, Amazon Redshift, and Amazon RDS, among other services. The experience with AWS gives fellows a solid skill set as they transition into the industry. Fellowships are currently offered in Boston, New York, Seattle, and the Bay Area.

Check out the Insight blog for more information on trends in data infrastructure, artificial intelligence, and cutting-edge data products.


iTranslate (Austria)

When the App Store was introduced in 2008, the founders of iTranslate saw an opportunity to be part of something big. The group of four fully believed that the iPhone and apps were going to change the world, and together they brainstormed ideas for their own app. The combination of translation and mobile devices seemed a natural fit, and by 2009 iTranslate was born. iTranslate’s mission is to enable travelers, students, business professionals, employers, and medical staff to read, write, and speak in all languages, anywhere in the world. The app allows users to translate text, voice, websites and more into nearly 100 languages on various platforms. Today, iTranslate is the leading player for conversational translation and dictionary apps, with more than 60 million downloads and 6 million monthly active users.

iTranslate is breaking language barriers through disruptive technology and innovation, enabling people to translate in real time. The app has a variety of features designed to optimize productivity, including offline translation, website and voice translation, and automatic language detection. iTranslate also recently launched the world’s first ear translation device in collaboration with Bragi, a company focused on smart earphones. The Dash Pro allows people to communicate freely while having a personal translator right in their ear.

iTranslate started using Amazon Polly soon after it was announced. CEO Alexander Marktl said, “As the leading translation and dictionary app, it is our mission at iTranslate to provide our users with the best possible tools to read, write, and speak in all languages across the globe. Amazon Polly provides us with the ability to efficiently produce and use high quality, natural sounding synthesized speech.” The stable and simple-to-use API, low latency, and free caching allow iTranslate to scale as they continue adding features to their app. Customers can also adjust the speech rate and switch between male and female voices. To assure the quality, speed, and reliability of their products, iTranslate also uses Amazon EC2, Amazon S3, and Amazon Route 53.
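For readers curious what such a request looks like, here is a minimal boto3 sketch of a Polly text-to-speech call along the lines described above. The voice ID, SSML rate, and output file are illustrative assumptions, not details of iTranslate’s implementation.

import boto3

polly = boto3.client("polly")

# SSML lets the caller adjust the speech rate; VoiceId switches between
# male and female voices (Marlene is one of Polly's German voices).
response = polly.synthesize_speech(
    Text='<speak><prosody rate="slow">Guten Tag!</prosody></speak>',
    TextType="ssml",
    VoiceId="Marlene",
    OutputFormat="mp3",
)

with open("greeting.mp3", "wb") as f:
    f.write(response["AudioStream"].read())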

To get started with iTranslate, visit their website.

—–

Thanks for reading!

-Tina

AWS HIPAA Eligibility Update (July 2017) – Eight Additional Services

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-hipaa-eligibility-update-july-2017-eight-additional-services/

It is time for an update on our ongoing effort to make AWS a great host for healthcare and life sciences applications. As you can see from our Health Customer Stories page, Philips, VergeHealth, and Cambia (to choose a few) trust AWS with Protected Health Information (PHI) and Personally Identifiable Information (PII) as part of their efforts to comply with HIPAA and HITECH.

In May we announced that we added Amazon API Gateway, AWS Direct Connect, AWS Database Migration Service, and Amazon Simple Queue Service (SQS) to our list of HIPAA eligible services and discussed how our customers and partners are putting them to use.

Eight More Eligible Services
Today I am happy to share the news that we are adding another eight services to the list:

Amazon CloudFront can now be used to enhance the delivery and transfer of Protected Health Information to applications on the Internet. By providing a secure, encrypted pathway, CloudFront can now be used as part of applications that need to cache PHI. This includes applications for viewing lab results or imaging data, and those that transfer PHI from Healthcare Information Exchanges (HIEs).

AWS WAF can now be used to protect applications running on AWS which operate on PHI such as patient care portals, patient scheduling systems, and HIEs. Requests and responses containing encrypted PHI and PII can now pass through AWS WAF.

AWS Shield can now be used to protect web applications such as patient care portals and scheduling systems that operate on encrypted PHI from DDoS attacks.

Amazon S3 Transfer Acceleration can now be used to accelerate the bulk transfer of large amounts of research, genetics, informatics, insurance, or payer/payment data containing PHI/PII information. Transfers can take place between a pair of AWS Regions or from an on-premises system and an AWS Region.

Amazon WorkSpaces can now be used by researchers, informaticists, hospital administrators and other users to analyze, visualize or process PHI/PII data using on-demand Windows virtual desktops.

AWS Directory Service can now be used to connect the authentication and authorization systems of organizations that use or process PHI/PII to their resources in the AWS Cloud. For example, healthcare providers operating hybrid cloud environments can now use AWS Directory Services to allow their users to easily transition between cloud and on-premises resources.

Amazon Simple Notification Service (SNS) can now be used to send notifications containing encrypted PHI/PII as part of patient care, payment processing, and mobile applications.

Amazon Cognito can now be used to authenticate users into mobile patient portal and payment processing applications that use PHI/PII identifiers for accounts.

Additional HIPAA Resources
Here are some additional resources that will help you to build applications that comply with HIPAA and HITECH:

Keep in Touch
In order to make use of any AWS service in any manner that involves PHI, you must first enter into an AWS Business Associate Addendum (BAA). You can contact us to start the process.

Jeff;

Lambda@Edge – Intelligent Processing of HTTP Requests at the Edge

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/lambdaedge-intelligent-processing-of-http-requests-at-the-edge/

Late last year I announced a preview of Lambda@Edge and talked about how you could use it to intelligently process HTTP requests at locations that are close (latency-wise) to your customers. Developers who applied and gained access to the preview have been making good use of it, and have provided us with plenty of very helpful feedback. During the preview we added the ability to generate HTTP responses and support for CloudWatch Logs, and also updated our roadmap based on the feedback.

Now Generally Available
Today I am happy to announce that Lambda@Edge is now generally available! You can use it to:

  • Inspect cookies and rewrite URLs to perform A/B testing.
  • Send specific objects to your users based on the User-Agent header.
  • Implement access control by looking for specific headers before passing requests to the origin.
  • Add, drop, or modify headers to direct users to different cached objects.
  • Generate new HTTP responses.
  • Cleanly support legacy URLs.
  • Modify or condense headers or URLs to improve cache utilization.
  • Make HTTP requests to other Internet resources and use the results to customize responses.

Lambda@Edge allows you to create web-based user experiences that are rich and personal. As is rapidly becoming the norm in today’s world, you don’t need to provision or manage any servers. You simply upload your code (Lambda functions written in Node.js) and pick one of the CloudFront behaviors that you have created for the distribution, along with the desired CloudFront event:

In this case, my function (the imaginatively named EdgeFunc1) would run in response to origin requests for image/* within the indicated distribution. As you can see, you can run code in response to four different CloudFront events:

Viewer Request – This event is triggered when a request arrives from a viewer (an HTTP client, generally a web browser or a mobile app), and has access to the incoming HTTP request. As you know, each CloudFront edge location maintains a large cache of objects so that it can efficiently respond to repeated requests. This particular event is triggered regardless of whether the requested object is already cached.

Origin Request – This event is triggered when the edge location is about to make a request back to the origin, due to the fact that the requested object is not cached at the edge location. It has access to the request that will be made to the origin (often an S3 bucket or code running on an EC2 instance).

Origin Response – This event is triggered after the origin returns a response to a request. It has access to the response from the origin.

Viewer Response – This event is triggered before the edge location returns a response to the viewer. It has access to the response.

Functions are globally replicated and requests are automatically routed to the optimal location for execution. You can write your code once and, with no overt action on your part, have it be available at low latency to users all over the world.

Your code has full access to requests and responses, including headers, cookies, the HTTP method (GET, HEAD, and so forth), and the URI. Subject to a few restrictions, it can modify existing headers and insert new ones.

Lambda@Edge in Action
Let’s create a simple function that runs in response to the Viewer Request event. I open up the Lambda Console and create a new function. I choose the Node.js 6.10 runtime and search for cloudfront blueprints:

I choose cloudfront-response-generation and configure a trigger to invoke the function:

The Lambda Console provides me with some information about the operating environment for my function:

I enter a name and a description for my function, as usual:

The blueprint includes a fully operational function. It generates a “200” HTTP response and a very simple body:

I used this as the starting point for my own code, which pulls some interesting values from the request and displays them in a table:

'use strict';
exports.handler = (event, context, callback) => {

    /* Set table row style */
    const rs = '"border-bottom:1px solid black;vertical-align:top;"';
    /* Get request */
    const request = event.Records[0].cf.request;
   
    /* Get values from request */ 
    const httpVersion = request.httpVersion;
    const clientIp    = request.clientIp;
    const method      = request.method;
    const uri         = request.uri;
    const headers     = request.headers;
    const host        = headers['host'][0].value;
    const agent       = headers['user-agent'][0].value;
    
    var sreq = JSON.stringify(event.Records[0].cf.request, null, ' ');
    sreq = sreq.replace(/\n/g, '<br/>');

    /* Generate body for response */
    const body = 
     '<html>\n'
     + '<head><title>Hello From Lambda@Edge</title></head>\n'
     + '<body>\n'
     + '<table style="border:1px solid black;background-color:#e0e0e0;border-collapse:collapse;" cellpadding=4 cellspacing=4>\n'
     + '<tr style=' + rs + '><td>Host</td><td>'        + host     + '</td></tr>\n'
     + '<tr style=' + rs + '><td>Agent</td><td>'       + agent    + '</td></tr>\n'
     + '<tr style=' + rs + '><td>Client IP</td><td>'   + clientIp + '</td></tr>\n'
     + '<tr style=' + rs + '><td>Method</td><td>'      + method   + '</td></tr>\n'
     + '<tr style=' + rs + '><td>URI</td><td>'         + uri      + '</td></tr>\n'
     + '<tr style=' + rs + '><td>Raw Request</td><td>' + sreq     + '</td></tr>\n'
     + '</table>\n'
     + '</body>\n'
     + '</html>';

    /* Generate HTTP response */
    const response = {
        status: '200',
        statusDescription: 'HTTP OK',
        httpVersion: httpVersion,
        body: body,
        headers: {
            'vary':          [{key: 'Vary',          value: '*'}],
            'last-modified': [{key: 'Last-Modified', value:'2017-01-13'}]
        },
    };

    callback(null, response);
};

I configure my handler, and request the creation of a new IAM Role with Basic Edge Lambda permissions:

On the next page I confirm my settings (as I would do for a regular Lambda function), and click on Create function:

This creates the function, attaches the trigger to the distribution, and also initiates global replication of the function. The status of my distribution changes to In Progress for the duration of the replication (typically 5 to 8 minutes):

The status changes back to Deployed as soon as the replication completes:

Then I access the root of my distribution (https://dogy9dy9kvj6w.cloudfront.net/), the function runs, and this is what I see:

Feel free to click on the image (it is linked to the root of my distribution) to run my code!

As usual, this is a very simple example and I am sure that you can do a lot better. Here are a few ideas to get you started:

Site Management – You can take an entire dynamic website offline and replace critical pages with Lambda@Edge functions for maintenance or during a disaster recovery operation.

High Volume Content – You can create scoreboards, weather reports, or public safety pages and make them available at the edge, both quickly and cost-effectively.

Create something cool and share it in the comments or in a blog post, and I’ll take a look.

Things to Know
Here are a couple of things to keep in mind as you start to think about how to put Lambda@Edge to use in your application:

Timeouts – Functions that handle Origin Request and Origin Response events must complete within 3 seconds. Functions that handle Viewer Request and Viewer Response events must complete within 1 second.

Versioning – After you update your code in the Lambda Console, you must publish a new version and set up a fresh set of triggers for it, and then wait for the replication to complete. You must always refer to your code using a version number; $LATEST and aliases do not apply.
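As a concrete sketch of that publishing step, here is how it might look in boto3; the function name and description are placeholders (Lambda@Edge functions are created in the US East (N. Virginia) Region).

import boto3

lambda_client = boto3.client("lambda", region_name="us-east-1")

# Publish an immutable version; CloudFront triggers must reference the
# versioned ARN that this call returns, never $LATEST or an alias.
resp = lambda_client.publish_version(
    FunctionName="EdgeFunc1",
    Description="Version for the viewer-request trigger",
)
print(resp["FunctionArn"])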

Headers – As you can see from my code, the HTTP request headers are accessible as an array. The headers fall into four categories:

  • Accessible – Can be read, written, deleted, or modified.
  • Restricted – Must be passed on to the origin.
  • Read-only – Can be read, but not modified in any way.
  • Blacklisted – Not seen by code, and cannot be added.

Runtime Environment – The runtime environment provides each function with 128 MB of memory, but no built-in libraries or access to /tmp.

Web Service Access – Functions that handle Origin Request and Origin Response events can access the AWS APIs and fetch content via HTTP, subject to the 3 second limit noted above. These requests are always made synchronously with respect to the original request or response.

Function Replication – As I mentioned earlier, your functions will be globally replicated. The replicas are visible in the “other” regions from the Lambda Console:

CloudFront – Everything that you already know about CloudFront and CloudFront behaviors is relevant to Lambda@Edge. You can use multiple behaviors (each with up to four Lambda@Edge functions, one per event type), customize header & cookie forwarding, and so forth. You can also make the association between events and functions (via ARNs that include function versions) while you are editing a behavior:

Available Now
Lambda@Edge is available now and you can start using it today. Pricing is based on the number of times that your functions are invoked and the amount of time that they run (see the Lambda@Edge Pricing page for more info).

Jeff;


Prepare for the OWASP Top 10 Web Application Vulnerabilities Using AWS WAF and Our New White Paper

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/prepare-for-the-owasp-top-10-web-application-vulnerabilities-using-aws-waf-and-our-new-white-paper/

Are you aware of the Open Web Application Security Project (OWASP) and the work that they do to improve the security of web applications? Among many other things, they publish a list of the 10 most critical application security flaws, known as the OWASP Top 10. The release candidate for the 2017 version contains a consensus view of common vulnerabilities often found in web sites and web applications.

AWS WAF, as I described in my blog post, New – AWS WAF, helps to protect your application from application-layer attacks such as SQL injection and cross-site scripting. You can create custom rules to define the types of traffic that are accepted or rejected.

Our new white paper, Use AWS WAF to Mitigate OWASP’s Top 10 Web Application Vulnerabilities, shows you how to put AWS WAF to use. Going far beyond a simple recommendation to “use WAF,” it includes detailed, concrete mitigation strategies and implementation details for the most important items in the OWASP Top 10 (formally known as A1 through A10).

Download Today
The white paper provides background and context for each vulnerability, and then shows you how to create WAF rules to identify and block them. It also provides some defense-in-depth recommendations, including a very cool suggestion to use Lambda@Edge to prevalidate the parameters supplied to HTTP requests.

The white paper links to a companion AWS CloudFormation template that creates a Web ACL, along with the recommended condition types and rules. You can use this template as a starting point for your own work, adding more condition types and rules as desired.

AWSTemplateFormatVersion: '2010-09-09'
Description: AWS WAF Basic OWASP Example Rule Set

## ::PARAMETERS::
## Template parameters to be configured by user
Parameters:
  stackPrefix:
    Type: String
    Description: The prefix to use when naming resources in this stack. Normally we would use the stack name, but since this template can be used as a resource in other stacks we want to keep the naming consistent. No symbols allowed.
    ConstraintDescription: Alphanumeric characters only, maximum 10 characters
    AllowedPattern: ^[a-zA-Z0-9]+$
    MaxLength: 10
    Default: generic
  stackScope:
    Type: String
    Description: You can deploy this stack at a regional level, for regional WAF targets like Application Load Balancers, or for global targets, such as Amazon CloudFront distributions.
    AllowedValues:
      - Global
      - Regional
    Default: Regional
...
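If you would rather script the launch than click through the console, a minimal boto3 sketch might look like the following; the stack name and the local file name of the downloaded template are assumptions.

import boto3

# WAF resources for CloudFront are global, so create the stack in us-east-1.
cfn = boto3.client("cloudformation", region_name="us-east-1")

with open("owasp-waf-rules.yaml") as f:  # assumed local copy of the template
    template_body = f.read()

cfn.create_stack(
    StackName="owasp-top10-waf",
    TemplateBody=template_body,
    Parameters=[
        {"ParameterKey": "stackPrefix", "ParameterValue": "demo"},
        {"ParameterKey": "stackScope", "ParameterValue": "Global"},
    ],
)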

Attend our Webinar
If you would like to learn more about the topics discussed in this new white paper, please plan to attend our upcoming webinar, Secure Your Applications with AWS Web Application Firewall (WAF) and AWS Shield. On July 12, 2017, my colleagues Jeffrey Lyon and Sundar Jayashekar will show you how to secure your web applications and how to defend against the most common Layer 7 attacks.

Jeff;


Synchronizing Amazon S3 Buckets Using AWS Step Functions

Post Syndicated from Andy Katz original https://aws.amazon.com/blogs/compute/synchronizing-amazon-s3-buckets-using-aws-step-functions/

Constantin Gonzalez is a Principal Solutions Architect at AWS

In my free time, I run a small blog that uses Amazon S3 to host static content and Amazon CloudFront to distribute it world-wide. I use a home-grown, static website generator to create and upload my blog content onto S3.

My blog uses two S3 buckets: one for staging and testing, and one for production. As a website owner, I want to update the production bucket with all changes from the staging bucket in a reliable and efficient way, without having to create and populate a new bucket from scratch. Therefore, to synchronize files between these two buckets, I use AWS Lambda and AWS Step Functions.

In this post, I show how you can use Step Functions to build a scalable synchronization engine for S3 buckets and learn some common patterns for designing Step Functions state machines while you do so.

Step Functions overview

Step Functions makes it easy to coordinate the components of distributed applications and microservices using visual workflows. Building applications from individual components that each perform a discrete function lets you scale and change applications quickly.

While this particular example focuses on synchronizing objects between two S3 buckets, it can be generalized to any other use case that involves coordinated processing of any number of objects in S3 buckets, or other, similar data processing patterns.

Bucket replication options

Before I dive into the details on how this particular example works, take a look at some alternatives for copying or replicating data between two Amazon S3 buckets:

  • The AWS CLI provides customers with a powerful aws s3 sync command that can synchronize the contents of one bucket with another.
  • S3DistCP is a powerful tool for users of Amazon EMR that can efficiently load, save, or copy large amounts of data between S3 buckets and HDFS.
  • The S3 cross-region replication functionality enables automatic, asynchronous copying of objects across buckets in different AWS regions.

In this use case, you are looking for a slightly different bucket synchronization solution that:

  • Works within the same region
  • Is more scalable than a CLI approach running on a single machine
  • Doesn’t require managing any servers
  • Uses a more finely grained cost model than the hourly based Amazon EMR approach

You need a scalable, serverless, and customizable bucket synchronization utility.

Solution architecture

Your solution needs to do three things:

  1. Copy all objects from a source bucket into a destination bucket, but leave out objects that are already present, for efficiency.
  2. Delete all "orphaned" objects from the destination bucket that aren’t present on the source bucket, because you don’t want obsolete objects lying around.
  3. Keep track of all objects for #1 and #2, regardless of how many objects there are.

In the beginning, you read in the source and destination buckets as parameters and perform basic parameter validation. Then, you operate two separate, independent loops, one for copying missing objects and one for deleting obsolete objects. Each loop is a sequence of Step Functions states that read in chunks of S3 object lists and use the continuation token to decide in a choice state whether to continue the loop or not.

This solution is based on the following architecture that uses Step Functions, Lambda, and two S3 buckets:

As you can see, this setup involves no servers, just two main building blocks:

  • Step Functions manages the overall flow of synchronizing the objects from the source bucket with the destination bucket.
  • A set of Lambda functions carry out the individual steps necessary to perform the work, such as validating input, getting lists of objects from source and destination buckets, copying or deleting objects in batches, and so on.

To understand the synchronization flow in more detail, look at the Step Functions state machine diagram for this example.

Walkthrough

Here’s a detailed discussion of how this works.

To follow along, use the code in the sync-buckets-state-machine GitHub repo. The code comes with a ready-to-run deployment script in Python that takes care of all the IAM roles, policies, Lambda functions, and of course the Step Functions state machine deployment using AWS CloudFormation, as well as instructions on how to use it.

Fine print: Use at your own risk

Before I start, here are some disclaimers:

  • Educational purposes only.

    The following example and code are intended for educational purposes only. Make sure that you customize, test, and review it on your own before using any of this in production.

  • S3 object deletion.

    In particular, using the code included below may delete objects on S3 in order to perform synchronization. Make sure that you have backups of your data. In particular, consider using the Amazon S3 Versioning feature to protect yourself against unintended data modification or deletion.

Step Functions execution starts with an initial set of parameters that contain the source and destination bucket names in JSON:

{
    "source":       "my-source-bucket-name",
    "destination":  "my-destination-bucket-name"
}

Armed with this data, Step Functions execution proceeds as follows.

Step 1: Detect the bucket region

First, you need to know the regions where your buckets reside. In this case, take advantage of the Step Functions Parallel state. This allows you to use a Lambda function get_bucket_location.py inside two different, parallel branches of task states:

  • FindRegionForSourceBucket
  • FindRegionForDestinationBucket

Each task state receives one bucket name as an input parameter, then detects the region corresponding to "their" bucket. The output of these functions is collected in a result array containing one element per parallel function.
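The post doesn’t reproduce the function body, but a minimal sketch of what get_bucket_location.py might contain (an assumption, not the repo’s actual code) is:

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    bucket = event["bucket"]  # assumed input shape: {"bucket": "<name>"}
    # get_bucket_location returns None as the LocationConstraint for us-east-1
    location = s3.get_bucket_location(Bucket=bucket)["LocationConstraint"]
    return {"bucket": bucket, "region": location or "us-east-1"}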

Step 2: Combine the parallel states

The output of a parallel state is a list with all the individual branches’ outputs. To combine them into a single structure, use a Lambda function called combine_dicts.py in its own CombineRegionOutputs task state. The function combines the two outputs from step 1 into a single JSON dict that provides you with the necessary region information for each bucket.
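Assuming the per-branch outputs sketched above, combine_dicts.py might reduce to something like this:

def handler(event, context):
    # The Parallel state hands us a list with one output dict per branch;
    # fold them into a single bucket-to-region mapping.
    combined = {}
    for branch_output in event:
        combined[branch_output["bucket"]] = branch_output["region"]
    return combined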

Step 3: Validate the input

In this walkthrough, you only support buckets that reside in the same region, so you need to decide if the input is valid or if the user has given you two buckets in different regions. To find out, use a Lambda function called validate_input.py in the ValidateInput task state that tests if the two regions from the previous step are equal. The output is a Boolean.

Step 4: Branch the workflow

Use another type of Step Functions state, a Choice state, which branches into a Failure state if the comparison in step 3 yields false, or proceeds with the remaining steps if the comparison was successful.

Step 5: Execute in parallel

The actual work is happening in another Parallel state. Both branches of this state are very similar to each other and they re-use some of the Lambda function code.

Each parallel branch implements a looping pattern across the following steps:

  1. Use a Pass state to inject either the string value "source" (InjectSourceBucket) or "destination" (InjectDestinationBucket) into the listBucket attribute of the state document.

    The next step uses either the source or the destination bucket, depending on the branch, while executing the same, generic Lambda function. You don’t need two Lambda functions that differ only slightly. This step illustrates how to use Pass states as a way of injecting constant parameters into your state machine and as a way of controlling step behavior while re-using common step execution code.

  2. The next step UpdateSourceKeyList/UpdateDestinationKeyList lists objects in the given bucket.

    Remember that the previous step injected either "source" or "destination" into the state document’s listBucket attribute. This step uses the same list_bucket.py Lambda function to list objects in an S3 bucket. The listBucket attribute of its input decides which bucket to list. In the left branch of the main parallel state, use the list of source objects to work through copying missing objects. The right branch uses the list of destination objects, to check if they have a corresponding object in the source bucket and eliminate any orphaned objects. Orphans don’t have a source object of the same S3 key.

  3. This step performs the actual work. In the left branch, the CopySourceKeys step uses the copy_keys.py Lambda function to go through the list of source objects provided by the previous step, then copies any missing object into the destination bucket. Its sister step in the other branch, DeleteOrphanedKeys, uses its destination bucket key list to test whether each object from the destination bucket has a corresponding source object, then deletes any orphaned objects.

  4. The S3 ListObjects API action is designed to be scalable across many objects in a bucket. Therefore, it returns object lists in chunks of configurable size, along with a continuation token. If the API result has a continuation token, it means that there are more objects in this list. You can work from token to token to continue getting object list chunks, until you get no more continuation tokens.

By breaking down large amounts of work into chunks, you can make sure each chunk is completed within the timeframe allocated for the Lambda function, and within the maximum input/output data size for a Step Functions state.
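A sketch of that chunked listing pattern (assumed to resemble list_bucket.py, not copied from the repo) could look like this:

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    bucket = event[event["listBucket"]]        # "source" or "destination"
    kwargs = {"Bucket": bucket, "MaxKeys": 1000}
    if event.get("continuationToken"):
        kwargs["ContinuationToken"] = event["continuationToken"]

    resp = s3.list_objects_v2(**kwargs)
    event["keys"] = [obj["Key"] for obj in resp.get("Contents", [])]
    # Absent when the listing is complete; the Choice state tests this field.
    event["continuationToken"] = resp.get("NextContinuationToken")
    return event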

This approach comes with a slight tradeoff: the more objects you process at one time in a given chunk, the faster you are done, because there’s less overhead for managing individual chunks. On the other hand, if you process too many objects within the same chunk, you risk going over the time and space limits of the processing Lambda function or the Step Functions state, and the work cannot be completed.

In this particular case, use a Lambda function that maximizes the number of objects listed from the S3 bucket that can be stored in the input/output state data. This is currently up to 32,768 bytes, assuming (based on some experimentation) that the execution of the COPY/DELETE requests in the processing states can always complete in time.

A more sophisticated approach would use the Step Functions retry/catch state attributes to account for any time limits encountered and adjust the list size accordingly.

Step 6: Test for completion

Because the presence of a continuation token in the S3 ListObjects output signals that you are not done processing all objects yet, use a Choice state to test for its presence. If a continuation token exists, it branches into the UpdateSourceKeyList step, which uses the token to get to the next chunk of objects. If there is no token, you’re done. The state machine then branches into the FinishCopyBranch/FinishDeleteBranch state.

By using Choice states like this, you can create loops exactly like the old times, when you didn’t have for statements and used branches in assembly code instead!

Step 7: Success!

Finally, you’re done, and can step into your final Success state.

Lessons learned

When implementing this use case with Step Functions and Lambda, I learned the following things:

  • Sometimes, it is necessary to manipulate the JSON state of a Step Functions state machine with just a few lines of code that hardly seem to warrant their own Lambda function. This is ok, and the cost is actually pretty low given Lambda’s 100 millisecond billing granularity. The upside is that functions like these can be helpful to make the data more palatable for the following steps or for facilitating Choice states. An example here would be the combine_dicts.py function.
  • Pass states can be useful beyond debugging and tracing: they can be used to inject arbitrary values into your state JSON and guide generic Lambda functions into doing specific things.
  • Choice states are your friend because you can build while-loops with them. This allows you to reliably grind through large amounts of data with the patience of an engine that currently supports execution times of up to 1 year.

    Currently, there is an execution history limit of 25,000 events. Each Lambda task state execution takes up 5 events, while each choice state takes 2 events for a total of 7 events per loop. This means you can loop about 3500 times with this state machine. For even more scalability, you can split up work across multiple Step Functions executions through object key sharding or similar approaches.

  • It’s not necessary to spend a lot of time coding exception handling within your Lambda functions. You can delegate all exception handling to Step Functions and instead simplify your functions as much as possible.

  • Step Functions are great replacements for shell scripts. This could have been a shell script, but then I would have had to worry about where to execute it reliably, how to scale it if it went beyond a few thousand objects, etc. Think of Step Functions and Lambda as tools for scripting at a cloud level, beyond the boundaries of servers or containers. "Serverless" here also means "boundary-less".

Summary

This approach gives you scalability by breaking down any number of S3 objects into chunks, then using Step Functions to control logic to work through these objects in a scalable, serverless, and fully managed way.

To take a look at the code or tweak it for your own needs, use the code in the sync-buckets-state-machine GitHub repo.

To see more examples, please visit the Step Functions Getting Started page.

Enjoy!

Visualize and Monitor Amazon EC2 Events with Amazon CloudWatch Events and Amazon Kinesis Firehose

Post Syndicated from Karan Desai original https://aws.amazon.com/blogs/big-data/visualize-and-monitor-amazon-ec2-events-with-amazon-cloudwatch-events-and-amazon-kinesis-firehose/

Monitoring your AWS environment is important for security, performance, and cost control purposes. For example, by monitoring and analyzing API calls made to your Amazon EC2 instances, you can trace security incidents and gain insights into administrative behaviors and access patterns. The kinds of events you might monitor include console logins, Amazon EBS snapshot creation/deletion/modification, VPC creation/deletion/modification, and instance reboots.

In this post, I show you how to build a near real-time API monitoring solution for EC2 events using Amazon CloudWatch Events and Amazon Kinesis Firehose. Please be sure to have Amazon CloudTrail enabled in your account.

  • CloudWatch Events offers a near real-time stream of system events that describe changes in AWS resources. CloudWatch Events now supports Kinesis Firehose as a target.
  • Kinesis Firehose is a fully managed service for continuously capturing, transforming, and delivering data in minutes to storage and analytics destinations such as Amazon S3, Amazon Kinesis Analytics, Amazon Redshift, and Amazon Elasticsearch Service.

Walkthrough

For this walkthrough, you create a CloudWatch event rule that matches specific EC2 events such as:

  • Starting, stopping, and terminating an instance
  • Creating and deleting VPC route tables
  • Creating and deleting a security group
  • Creating, deleting, and modifying instance volumes and snapshots

Your CloudWatch event target is a Kinesis Firehose delivery stream that delivers this data to an Elasticsearch cluster, where you set up Kibana for visualization. Using this solution, you can easily load and visualize EC2 events in minutes without setting up complicated data pipelines.

Set up the Elasticsearch cluster

Create the Amazon ES domain in the Amazon ES console, or by using the create-elasticsearch-domain command in the AWS CLI.

This example uses the following configuration:

  • Domain Name: esLogSearch
  • Elasticsearch Version: 1
  • Instance Count: 2
  • Instance type: elasticsearch
  • Enable dedicated master: true
  • Enable zone awareness: true
  • Restrict Amazon ES to an IP-based access policy

Other settings are left as the defaults.

Create a Kinesis Firehose delivery stream

In the Kinesis Firehose console, create a new delivery stream with Amazon ES as the destination. For detailed steps, see Create a Kinesis Firehose Delivery Stream to Amazon Elasticsearch Service.
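If you prefer an SDK over the console, a boto3 sketch of the delivery stream creation might look like this; all ARNs, names, and buffering values are placeholders to adapt to your account.

import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="ec2-events-to-es",
    ElasticsearchDestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-es-role",
        "DomainARN": "arn:aws:es:us-east-1:123456789012:domain/eslogsearch",
        "IndexName": "log",
        "TypeName": "event",
        "IndexRotationPeriod": "OneDay",
        "BufferingHints": {"IntervalInSeconds": 60, "SizeInMBs": 5},
        # Back up documents that fail delivery to S3 for later inspection.
        "S3BackupMode": "FailedDocumentsOnly",
        "S3Configuration": {
            "RoleARN": "arn:aws:iam::123456789012:role/firehose-es-role",
            "BucketARN": "arn:aws:s3:::my-firehose-backup-bucket",
        },
    },
)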

Set up CloudWatch Events

Create a rule, and configure the event source and target. You can choose to configure multiple event sources with several AWS resources, along with options to specify specific or multiple event types.

In the CloudWatch console, choose Events.

For Service Name, choose EC2.

In Event Pattern Preview, choose Edit and copy the pattern below. For this walkthrough, I selected events that are specific to the EC2 API, but you can modify it to include events for any of your AWS resources.


{
    "source": [
        "aws.ec2"
    ],
    "detail-type": [
        "AWS API Call via CloudTrail"
    ],
    "detail": {
        "eventSource": [
            "ec2.amazonaws.com"
        ],
        "eventName": [
            "RunInstances",
            "StopInstances",
            "StartInstances",
            "CreateFlowLogs",
            "CreateImage",
            "CreateNatGateway",
            "CreateVpc",
            "DeleteKeyPair",
            "DeleteNatGateway",
            "DeleteRoute",
            "DeleteRouteTable",
            "CreateSnapshot",
            "DeleteSnapshot",
            "DeleteVpc",
            "DeleteVpcEndpoints",
            "DeleteSecurityGroup",
            "ModifyVolume",
            "ModifyVpcEndpoint",
            "TerminateInstances"
        ]
    }
}

The following screenshot shows what your event looks like in the console.

Next, choose Add target and select the delivery stream that you just created.
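The console does all of this for you, but for completeness, here is a boto3 sketch of creating the same rule and attaching the Firehose target; the rule name and both ARNs are placeholders.

import json
import boto3

events = boto3.client("events")

pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["ec2.amazonaws.com"],
        # Trimmed here for brevity; use the full eventName list from above.
        "eventName": ["RunInstances", "StopInstances", "TerminateInstances"],
    },
}

events.put_rule(Name="ec2-api-events", EventPattern=json.dumps(pattern))
events.put_targets(
    Rule="ec2-api-events",
    Targets=[{
        "Id": "firehose-delivery-stream",
        "Arn": "arn:aws:firehose:us-east-1:123456789012:deliverystream/ec2-events-to-es",
        "RoleArn": "arn:aws:iam::123456789012:role/events-to-firehose-role",
    }],
)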

Set up Kibana on the Elasticsearch cluster

Amazon ES provides a default installation of Kibana with every Amazon ES domain. You can find the Kibana endpoint on your domain dashboard in the Amazon ES console. You can restrict Amazon ES access to an IP-based access policy.

In the Kibana console, for Index name or pattern, type log. This is the name of the Elasticsearch index.

For Time-field name, choose @time.

To view the events, choose Discover.

The following chart demonstrates the API operations and the number of times that they have been triggered in the past 12 hours.

Summary

In this post, you created a continuous, near real-time solution to monitor various EC2 events such as starting and shutting down instances, creating VPCs, etc. Likewise, you can build a continuous monitoring solution for all the API operations that are relevant to your daily AWS operations and resources.

With Kinesis Firehose as a new target for CloudWatch Events, you can retrieve, transform, and load system events to the storage and analytics destination of your choice in minutes, without setting up complicated data pipelines.

If you have any questions or suggestions, please comment below.


Additional Reading

Learn how to build a serverless architecture to analyze Amazon CloudFront access logs using AWS Lambda, Amazon Athena, and Amazon Kinesis Analytics.


AWS Hot Startups – May 2017

Post Syndicated from Tina Barr original https://aws.amazon.com/blogs/aws/aws-hot-startups-may-2017/

April showers bring May startups! This month we have three hot startups for you to check out. Keep reading to find out what they’re up to, and how they’re using AWS to do it.

Today’s post features the following startups:

  • Lobster – an AI-powered platform connecting creative social media users to professionals.
  • Visii – helping consumers find the perfect product using visual search.
  • Tiqets – a curated marketplace for culture and entertainment.

Lobster (London, England)

Every day, social media users generate billions of authentic images and videos to rival typical stock photography. Powered by Artificial Intelligence, Lobster enables brands, agencies, and the press to license visual content directly from social media users so they can find that piece of content that perfectly fits their brand or story. Lobster does the work of sorting through major social networks (Instagram, Flickr, Facebook, Vk, YouTube, and Vimeo) and cloud storage providers (Dropbox, Google Photos, and Verizon) to find media, saving brands and agencies time and energy. Using filters like gender, color, age, and geolocation can help customers find the unique content they’re looking for, while Lobster’s AI and visual recognition finds images instantly. Lobster also runs photo challenges to help customers discover the perfect image to fit their needs.

Lobster is an excellent platform for creative people to get their work discovered while also protecting their content. Users are treated as copyright holders and earn 75% of the final price of every sale. The platform is easy to use: new users simply sign in with an existing social media or cloud account and can start showcasing their artistic talent right away. Lobster allows users to connect to any number of photo storage sources so they’re able to choose which items to share and which to keep private. Once users have selected their favorite photos and videos to share, they can sit back and watch as their work is picked to become the signature for a new campaign or featured on a cool website – and start earning money for their work.

Lobster is using a variety of AWS services to keep everything running smoothly. The company uses Amazon S3 to store photography that was previously ordered by customers. When a customer purchases content, the respective piece of content must be available at any given moment, independent from the original source. Lobster is also using Amazon EC2 for its application servers and Elastic Load Balancing to monitor the state of each server.

To learn more about Lobster, check them out here!

Visii (London, England)

In today’s vast web, a growing number of products are being sold online and searching for something specific can be difficult. Visii was created to cater to businesses and help them extract value from an asset they already have – their images. Their SaaS platform allows clients to leverage an intelligent visual search on their websites and apps to help consumers find the perfect product for them. With Visii, consumers can choose an image and immediately discover more based on their tastes and preferences. Whether it’s clothing, artwork, or home decor, Visii will make recommendations to get consumers to search visually and subsequently help businesses increase their conversion rates.

There are multiple ways for businesses to integrate Visii on their website or app. Many of Visii’s clients choose to build against their API, but Visii also works closely with many clients to figure out the most effective way to do this for each unique case. This has led Visii to help build innovative user interfaces and figure out the best integration points to get consumers to search visually. Businesses can also integrate Visii on their website with a widget – they just need to provide a list of links to their products and Visii does the rest.

Visii runs their entire infrastructure on AWS. Their APIs and pipeline all sit in Auto Scaling groups, with ELBs in front of them, feeding data into Amazon Simple Queue Service and Amazon Aurora. Recently, Visii moved from Amazon RDS to Aurora and noted that the process was incredibly quick and easy. Because they make heavy use of machine learning, it is crucial that their pipeline only runs when required and that they maximize the efficiency of their uptime.

To see how companies are using Visii, check out Style Picker and Saatchi Art.

Tiqets (Amsterdam, Netherlands)

Tiqets is making the ticket-buying experience faster and easier for travelers around the world. Founded in 2013, Tiqets is one of the leading curated marketplaces for admission tickets to museums, zoos, and attractions. Their mission is to help travelers get the most out of their trips by helping them find and experience a city’s culture and entertainment. Tiqets partners directly with vendors to adapt to a customer’s specific needs, and is now active in over 30 cities in the US, Europe, and the Middle East.

With Tiqets, travelers can book tickets either ahead of time or at their destination for a wide range of attractions. The Tiqets app provides real-time availability and delivers tickets straight to customers’ phones via email, direct download, or in the app. Customers save time by skipping long lines (a perk of the app!), save trees (no need to physically print tickets), and most importantly, they can make the most of their leisure time. For each attraction featured on Tiqets, there is a lot of helpful information, including the best modes of transportation, hours, commonly asked questions, and reviews from other customers.

The Tiqets platform consists of the consumer-facing website, the internal and external-facing APIs, and the partner self-service portals. For app hosting and infrastructure, Tiqets uses AWS services such as Elastic Load Balancing, Amazon EC2, Amazon RDS, Amazon CloudFront, Amazon Route 53, and Amazon ElastiCache. Through infrastructure orchestration of their AWS configuration, they can easily set up separate development and test environments that stay close to the production environment.

Tiqets is hiring! Be sure to check out their jobs page if you are interested in joining the Tiqets team.

Thanks for reading and don’t forget to check out April’s Hot Startups if you missed it.

-Tina Barr


Build a Serverless Architecture to Analyze Amazon CloudFront Access Logs Using AWS Lambda, Amazon Athena, and Amazon Kinesis Analytics

Post Syndicated from Rajeev Srinivasan original https://aws.amazon.com/blogs/big-data/build-a-serverless-architecture-to-analyze-amazon-cloudfront-access-logs-using-aws-lambda-amazon-athena-and-amazon-kinesis-analytics/

Nowadays, it’s common for a web server to be fronted by a global content delivery service, like Amazon CloudFront. This type of front end accelerates delivery of websites, APIs, media content, and other web assets to provide a better experience to users across the globe.

The insights gained by analyzing Amazon CloudFront access logs help improve website availability through bot detection and mitigation, optimize web content based on the devices and browsers used to view your webpages, reduce perceived latency by caching popular objects closer to their viewers, and so on. This results in a significant improvement in the overall perceived experience for the user.

This blog post provides a way to build a serverless architecture to generate some of these insights. To do so, we analyze Amazon CloudFront access logs both at rest and in transit through the stream. This serverless architecture uses Amazon Athena to analyze large volumes of CloudFront access logs (on the scale of terabytes per day), and Amazon Kinesis Analytics for streaming analysis.

The analytic queries in this blog post focus on three common use cases:

  1. Detection of common bots using the user agent string
  2. Calculation of current bandwidth usage per Amazon CloudFront distribution per edge location
  3. Determination of the current top 50 viewers

However, you can easily extend the described architecture to power dashboards for monitoring and reporting, and to trigger alarms based on deeper insights gained by processing and analyzing the logs. Some examples are dashboards for cache performance, usage and viewer patterns, and so on.

Following we show a diagram of this architecture.

Prerequisites

Before you set up this architecture, install the AWS Command Line Interface (AWS CLI) tool on your local machine, if you don’t have it already.

Setup summary

The following steps are involved in setting up the serverless architecture on the AWS platform:

  1. Create an Amazon S3 bucket for your Amazon CloudFront access logs to be delivered to and stored in.
  2. Create a second Amazon S3 bucket to receive processed logs and store the partitioned data for interactive analysis.
  3. Create an Amazon Kinesis Firehose delivery stream to batch, compress, and deliver the preprocessed logs for analysis.
  4. Create an AWS Lambda function to preprocess the logs for analysis.
  5. Configure Amazon S3 event notification on the CloudFront access logs bucket, which contains the raw logs, to trigger the Lambda preprocessing function.
  6. Create an Amazon DynamoDB table to look up partition details, such as partition specification and partition location.
  7. Create an Amazon Athena table for interactive analysis.
  8. Create a second AWS Lambda function to add new partitions to the Athena table based on the log delivered to the processed logs bucket.
  9. Configure Amazon S3 event notification on the processed logs bucket to trigger the Lambda partitioning function.
  10. Configure Amazon Kinesis Analytics application for analysis of the logs directly from the stream.

ETL and preprocessing

In this section, we parse the CloudFront access logs as they are delivered, which occurs multiple times in an hour. We filter out commented records and use the user agent string to decipher the browser name, the name of the operating system, and whether the request has been made by a bot. For more details on how to decipher this information from the user agent string, see the user-agents 1.1.0 package on the Python Package Index (PyPI).

We use the Lambda preprocessing function to perform these tasks on individual rows of the access log. On successful completion, the rows are pushed to an Amazon Kinesis Firehose delivery stream to be persistently stored in an Amazon S3 bucket, the processed logs bucket.
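To make the shape of that work concrete, here is an illustrative fragment (not the post’s actual function) that enriches one parsed log row with user-agent details and pushes it to the delivery stream; the row layout is assumed to match the Athena table defined later in this post.

import boto3
from user_agents import parse  # the user-agents package mentioned above

firehose = boto3.client("firehose")

# Column order must match the Athena table created later in the post;
# trimmed here for brevity.
COLUMNS = ["logdate", "logtime", "location", "bytes", "requestip",
           "useragent", "browserfamily", "osfamily", "isbot"]

def enrich_and_ship(row):
    ua = parse(row["useragent"])
    row["browserfamily"] = ua.browser.family
    row["osfamily"] = ua.os.family
    row["isbot"] = str(ua.is_bot)
    # Tab-delimited to match the table's field terminator.
    firehose.put_record(
        DeliveryStreamName="CloudFrontLogsToS3",
        Record={"Data": "\t".join(str(row[c]) for c in COLUMNS) + "\n"},
    )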

To create a Firehose delivery stream with a new or existing S3 bucket as the destination, follow the steps described in Create a Firehose Delivery Stream to Amazon S3 in the Amazon Kinesis Firehose documentation. Keep most of the default settings, but select an AWS Identity and Access Management (IAM) role that has write access to your S3 bucket and specify GZIP compression. Name the delivery stream CloudFrontLogsToS3.

Another prerequisite for this setup is to create an IAM role that gives our AWS Lambda function the permissions it needs to get the data from S3, process it, and deliver it to the CloudFrontLogsToS3 delivery stream.

Let’s use the AWS CLI to create the IAM role using the following steps:

  1. Create the IAM policy (lambda-exec-policy) for the Lambda execution role to use.
  2. Create the Lambda execution role (lambda-cflogs-exec-role) and assign the service to use this role.
  3. Attach the policy created in step 1 to the Lambda execution role.

To download the policy document to your local machine, type the following command.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/preprocessiong-lambda/lambda-exec-policy.json  <path_on_your_local_machine>

To download the assume policy document to your local machine, type the following command.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/preprocessiong-lambda/assume-lambda-policy.json  <path_on_your_local_machine>

Following is the lambda-exec-policy.json file, which is the IAM policy used by the Lambda execution role.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CloudWatchAccess",
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Sid": "S3Access",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::*"
            ]
        },
        {
            "Sid": "FirehoseAccess",
            "Effect": "Allow",
            "Action": [
                "firehose:ListDeliveryStreams",
                "firehose:PutRecord",
                "firehose:PutRecordBatch"
            ],
            "Resource": [
                "arn:aws:firehose:*:*:deliverystream/CloudFrontLogsToS3"
            ]
        }
    ]
}

To create the IAM policy used by Lambda execution role, type the following command.

aws iam create-policy --policy-name lambda-exec-policy --policy-document file://<path>/lambda-exec-policy.json

To create the AWS Lambda execution role and assign the service to use this role, type the following command.

aws iam create-role --role-name lambda-cflogs-exec-role --assume-role-policy-document file://<path>/assume-lambda-policy.json

Following is the assume-lambda-policy.json file, to grant Lambda permission to assume a role.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

To attach the policy (lambda-exec-policy) created to the AWS Lambda execution role (lambda-cflogs-exec-role), type the following command.

aws iam attach-role-policy --role-name lambda-cflogs-exec-role --policy-arn arn:aws:iam::<your-account-id>:policy/lambda-exec-policy

Now that we have created the CloudFrontLogsToS3 Firehose delivery stream and the lambda-cflogs-exec-role IAM role for Lambda, the next step is to create a Lambda preprocessing function.

This Lambda preprocessing function parses the CloudFront access logs delivered into the S3 bucket and performs a few transformation and mapping operations on the data. The function adds descriptive information, such as the browser and the operating system used to make the request, derived from the user agent string found in the logs. It also adds information about the web distribution to support scenarios where CloudFront access logs are delivered to a centralized S3 bucket from multiple distributions. With the solution in this blog post, you can get insights across distributions and their edge locations.
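
The deployment package for this function is provided in the following steps. As a rough, hedged sketch of what the enrichment involves, and not the actual code in the package, the following Python fragment derives browser, operating system, and bot information from a user agent string using simplified placeholder heuristics, and batches enriched rows to the Firehose delivery stream.

import os
import boto3

firehose = boto3.client('firehose')
STREAM_NAME = os.environ.get('KINESIS_FIREHOSE_STREAM', 'CloudFrontLogsToS3')

# Simplified placeholder heuristics; the provided package uses fuller parsing.
BROWSERS = ['Chrome', 'Firefox', 'Safari', 'Edge', 'MSIE']
OS_FAMILIES = ['Windows', 'Mac OS X', 'Linux', 'Android', 'iOS']
BOT_MARKERS = ['bot', 'crawler', 'spider']

def enrich(row, useragent, distribution):
    # Append browser family, OS family, bot flag, and distribution name
    # as extra tab-separated fields on the log row.
    browser = next((b for b in BROWSERS if b in useragent), 'Other')
    os_family = next((o for o in OS_FAMILIES if o in useragent), 'Other')
    is_bot = str(any(m in useragent.lower() for m in BOT_MARKERS))
    return '\t'.join([row, browser, os_family, is_bot, distribution])

def deliver(rows):
    # PutRecordBatch accepts at most 500 records per call.
    records = [{'Data': r + '\n'} for r in rows]
    firehose.put_record_batch(DeliveryStreamName=STREAM_NAME, Records=records)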

Use the Lambda Management Console to create a new Lambda function with a Python 2.7 runtime and the s3-get-object-python blueprint. Open the console, and on the Configure triggers page, choose the name of the S3 bucket where the CloudFront access logs are delivered. Choose Put for Event type. For Prefix, type the name of the prefix, if any, for the folder where CloudFront access logs are delivered, for example cloudfront-logs/. To invoke Lambda to retrieve the logs from the S3 bucket as they are delivered, select Enable trigger.

Choose Next and provide a function name to identify this Lambda preprocessing function.

For Code entry type, choose Upload a file from Amazon S3. For S3 link URL, type https://s3.amazonaws.com/aws-bigdata-blog/artifacts/Serverless-CF-Analysis/preprocessing-lambda/pre-data.zip. In the Environment variables section, also create an environment variable with the key KINESIS_FIREHOSE_STREAM and the name of the Firehose delivery stream, CloudFrontLogsToS3, as the value.

Choose lambda-cflogs-exec-role as the IAM role for the Lambda function, and type prep-data.lambda_handler for Handler.

Choose Next, and then choose Create Lambda.

Table creation in Amazon Athena

In this step, we build the Athena table. Use the Athena console in the same region as the processed logs bucket, and create the table using the query editor.

CREATE EXTERNAL TABLE IF NOT EXISTS cf_logs (
  logdate date,
  logtime string,
  location string,
  bytes bigint,
  requestip string,
  method string,
  host string,
  uri string,
  status bigint,
  referrer string,
  useragent string,
  uriquery string,
  cookie string,
  resulttype string,
  requestid string,
  header string,
  csprotocol string,
  csbytes string,
  timetaken bigint,
  forwardedfor string,
  sslprotocol string,
  sslcipher string,
  responseresulttype string,
  protocolversion string,
  browserfamily string,
  osfamily string,
  isbot string,
  filename string,
  distribution string
)
PARTITIONED BY(year string, month string, day string, hour string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://<pre-processing-log-bucket>/prefix/';
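
If you would rather submit the DDL programmatically than paste it into the query editor, a minimal Python (boto3) sketch follows. It assumes you pass in the CREATE EXTERNAL TABLE statement above as a string and supply your own staging location.

import boto3

def run_ddl(ddl, staging_dir):
    # Submit a DDL statement (such as the CREATE EXTERNAL TABLE above)
    # to Athena; query metadata is written to the staging location.
    athena = boto3.client('athena')
    response = athena.start_query_execution(
        QueryString=ddl,
        ResultConfiguration={'OutputLocation': staging_dir})
    return response['QueryExecutionId']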

Creation of the Athena partition

A popular website with millions of requests each day routed using Amazon CloudFront can generate a large volume of logs, on the order of a few terabytes a day. We strongly recommend that you partition your data to effectively restrict the amount of data scanned by each query. Partitioning significantly improves query performance and substantially reduces cost. The Lambda partitioning function adds the partition information to the Athena table for the data delivered to the preprocessed logs bucket.

Before delivering the preprocessed Amazon CloudFront logs file into the preprocessed logs bucket, Amazon Kinesis Firehose adds a UTC time prefix in the format YYYY/MM/DD/HH. This approach supports multilevel partitioning of the data by year, month, date, and hour. You can invoke the Lambda partitioning function every time a new processed Amazon CloudFront log is delivered to the preprocessed logs bucket. To do so, configure the Lambda partitioning function to be triggered by an S3 Put event.
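
Because Firehose writes objects under that YYYY/MM/DD/HH prefix, the partition values can be read straight off the delivered object's key. Here is a small Python sketch, assuming the key layout just described; the function name is illustrative.

def partition_from_key(key):
    # For a key such as 'out/2017/05/15/10/<file>.gz', the four path
    # components before the file name are year/month/day/hour.
    parts = key.split('/')
    return tuple(parts[-5:-1])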

For a website with millions of requests, a large number of preprocessed log files can be delivered in a single hour—for example, one each second. To avoid querying the Athena table for partition information every time a preprocessed log file is delivered, you can create an Amazon DynamoDB table for fast lookup.

Based on the year, month, date, and hour in the prefix of the delivered log, the Lambda partitioning function checks whether the partition specification exists in the Amazon DynamoDB table. If it doesn’t, it’s added to the table using an atomic operation, and then the Athena table is updated.
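
The partitioning function provided with this post is implemented in Java (you download its .jar file shortly). As a hedged illustration of the same check-then-alter flow, here is a minimal Python sketch that uses a conditional PutItem as the atomic guard and the Athena API (rather than the JDBC driver the Java function uses). The table and column names are the ones defined in this post; the function name is illustrative.

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client('dynamodb')
athena = boto3.client('athena')

def add_partition_once(year, month, day, hour, s3_path, staging_dir):
    spec = 'year=%s;month=%s;day=%s;hour=%s' % (year, month, day, hour)
    try:
        # The condition makes the write atomic: only one caller can insert
        # a given PartitionSpec, so the ALTER TABLE below runs exactly once.
        dynamodb.put_item(
            TableName='athenapartitiondetails',
            Item={'PartitionSpec': {'S': spec}},
            ConditionExpression='attribute_not_exists(PartitionSpec)')
    except ClientError as e:
        if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
            return  # Partition is already registered; nothing to do.
        raise
    athena.start_query_execution(
        QueryString="ALTER TABLE cf_logs ADD PARTITION "
                    "(year='%s', month='%s', day='%s', hour='%s') "
                    "LOCATION '%s'" % (year, month, day, hour, s3_path),
        ResultConfiguration={'OutputLocation': staging_dir})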

Type the following command to create the Amazon DynamoDB table.

aws dynamodb create-table --table-name athenapartitiondetails \
--attribute-definitions AttributeName=PartitionSpec,AttributeType=S \
--key-schema AttributeName=PartitionSpec,KeyType=HASH \
--provisioned-throughput ReadCapacityUnits=100,WriteCapacityUnits=100

Here the following is true:

  • PartitionSpec is the hash key and is a representation of the partition signature—for example, year="2017"; month="05"; day="15"; hour="10".
  • Depending on the rate at which the processed log files are delivered to the processed log bucket, you might have to increase the ReadCapacityUnits and WriteCapacityUnits values, if these are throttled.

The other attributes besides PartitionSpec are the following:

  • PartitionPath – The S3 path associated with the partition.
  • PartitionType – The type of partition used (Hour, Month, Date, Year, or ALL). In this case, ALL is used.

The next step is to create the IAM role that provides permissions for the Lambda partitioning function. The function requires permissions to do the following:

  1. Look up and write partition information to DynamoDB.
  2. Alter the Athena table with new partition information.
  3. Perform Amazon CloudWatch logs operations.
  4. Perform Amazon S3 operations.

To download the policy document to your local machine, type the following command.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/partitioning-lambda/lambda-partition-function-execution-policy.json  <path_on_your_local_machine>

To download the assume policy document to your local machine, type the following command.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/partitioning-lambda/assume-lambda-policy.json <path_on_your_local_machine>

Let’s use the AWS CLI to create the IAM role using the following three steps:

  1. Create the IAM policy (lambda-partition-exec-policy) used by the Lambda execution role.
  2. Create the Lambda execution role (lambda-partition-execution-role) and assign the service to use this role.
  3. Attach the policy created in step 1 to the Lambda execution role.

To create the IAM policy used by the Lambda execution role, type the following command.

aws iam create-policy --policy-name lambda-partition-exec-policy --policy-document file://<path>/lambda-partition-function-execution-policy.json

To create the Lambda execution role and assign the service to use this role, type the following command.

aws iam create-role --role-name lambda-partition-execution-role --assume-role-policy-document file://<path>/assume-lambda-policy.json

To attach the policy (lambda-partition-exec-policy) that you created to the AWS Lambda execution role (lambda-partition-execution-role), type the following command.

aws iam attach-role-policy --role-name lambda-partition-execution-role --policy-arn arn:aws:iam::<your-account-id>:policy/lambda-partition-exec-policy

Following is the lambda-partition-function-execution-policy.json file, which is the IAM policy used by the Lambda execution role.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DDBTableAccess",
            "Effect": "Allow",
            "Action": "dynamodb:PutItem",
            "Resource": "arn:aws:dynamodb:*:*:table/athenapartitiondetails"
        },
        {
            "Sid": "S3Access",
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:ListMultipartUploadParts",
                "s3:AbortMultipartUpload",
                "s3:PutObject"
            ],
            "Resource": "arn:aws:s3:::*"
        },
        {
            "Sid": "AthenaAccess",
            "Effect": "Allow",
            "Action": [ "athena:*" ],
            "Resource": [ "*" ]
        },
        {
            "Sid": "CloudWatchLogsAccess",
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        }
    ]
}

Download the .jar file containing the Java deployment package to your local machine.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/partitioning-lambda/aws-lambda-athena-1.0.0.jar <path_on_your_local_machine>

From the AWS Management Console, create a new Lambda function with Java 8 as the runtime. Select the Blank Function blueprint.

On the Configure triggers page, choose the name of the S3 bucket where the preprocessed logs are delivered. Choose Put for the Event Type. For Prefix, type the name of the prefix folder, if any, where preprocessed logs are delivered by Firehose—for example, out/. For Suffix, type the extension of the compression format in which the Firehose stream (CloudFrontLogsToS3) delivers the preprocessed logs—for example, gz. To invoke Lambda to retrieve the logs from the S3 bucket as they are delivered, select Enable Trigger.

Choose Next and provide a function name to identify this Lambda partitioning function.

Choose Java 8 for Runtime for the AWS Lambda function. Choose Upload a .ZIP or .JAR file for the Code entry type, and choose Upload to upload the downloaded aws-lambda-athena-1.0.0.jar file.

Next, create the following environment variables for the Lambda function:

  • TABLE_NAME – The name of the Athena table (for example, cf_logs).
  • PARTITION_TYPE – The partition to create for the logs delivered to the subfolders of the S3 bucket: Year, Month, Date, or Hour. Set this to ALL to partition by year, month, date, and hour together.
  • DDB_TABLE_NAME – The name of the DynamoDB table holding partition information (for example, athenapartitiondetails).
  • ATHENA_REGION – The current AWS Region for the Athena table to construct the JDBC connection string.
  • S3_STAGING_DIR – The Amazon S3 location where your query output is written. The JDBC driver asks Athena to read the results and provide rows of data back to the user (for example, s3://<bucketname>/<folder>/).

To configure the function handler and IAM role, for Handler, copy and paste the name of the handler: com.amazonaws.services.lambda.CreateAthenaPartitionsBasedOnS3EventWithDDB::handleRequest. For Role, choose the existing IAM role, lambda-partition-execution-role.

Choose Next and then Create Lambda.

Interactive analysis using Amazon Athena

In this section, we use Amazon Athena to analyze the historical data that has accumulated in the preprocessed logs bucket, taking advantage of the partitions added to the Athena table.

Scenario 1 is robot traffic by edge location.

SELECT COUNT(*) AS ct, requestip, location FROM cf_logs
WHERE isbot='True'
GROUP BY requestip, location
ORDER BY ct DESC;

Scenario 2 is total bytes transferred per distribution for each edge location for your website.

SELECT distribution, location, SUM(bytes) as totalBytes
FROM cf_logs
GROUP BY location, distribution;

Scenario 3 is the top 50 viewers of your website.

SELECT requestip, COUNT(*) AS ct  FROM cf_logs
GROUP BY requestip
ORDER BY ct DESC
LIMIT 50;
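
You can run each of these queries directly in the Athena query editor. If you want to script them instead, a hedged Python (boto3) sketch follows; run_query is an illustrative helper, not part of the solution's artifacts.

import time
import boto3

athena = boto3.client('athena')

def run_query(sql, staging_dir):
    # Submit the query, poll until it finishes, and return the result rows.
    qid = athena.start_query_execution(
        QueryString=sql,
        ResultConfiguration={'OutputLocation': staging_dir})['QueryExecutionId']
    state = 'RUNNING'
    while state in ('QUEUED', 'RUNNING'):
        time.sleep(1)
        state = athena.get_query_execution(
            QueryExecutionId=qid)['QueryExecution']['Status']['State']
    if state != 'SUCCEEDED':
        raise RuntimeError('Query %s finished in state %s' % (qid, state))
    return athena.get_query_results(QueryExecutionId=qid)['ResultSet']['Rows']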

Streaming analysis using Amazon Kinesis Analytics

In this section, you deploy a stream processing application using Amazon Kinesis Analytics to analyze the preprocessed Amazon CloudFront log streams. This application analyzes the log records directly from the Amazon Kinesis Firehose delivery stream as they are delivered to the preprocessed logs bucket. The streaming queries in this section focus on gaining the following insights:

  • The IP addresses of the bots that are currently sending requests to your website, identified by their Amazon CloudFront edge location. The query also includes the total bytes transferred as part of the response.
  • The total bytes served per distribution per edge location for your website.
  • The top 50 viewers of your website.

To download the firehose-access-policy.json file, type the following.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/kinesisanalytics/firehose-access-policy.json  <path_on_your_local_machine>

To download the assume-kinesisanalytics-policy.json file, type the following.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/kinesisanalytics/assume-kinesisanalytics-policy.json <path_on_your_local_machine>

Before we create the Amazon Kinesis Analytics application, we need to create the IAM role that provides permission for the analytics application to access the Amazon Kinesis Firehose delivery stream.

Let’s use the AWS CLI to create the IAM role using the following three steps:

  1. Create the IAM policy (firehose-access-policy) for the Analytics execution role to use.
  2. Create the Analytics execution role (ka-execution-role) and assign the service to use this role.
  3. Attach the policy created in step 1 to the Analytics execution role.

Following is the firehose-access-policy.json file, which is the IAM policy used by Amazon Kinesis Analytics to read the Firehose delivery stream.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AmazonFirehoseAccess",
            "Effect": "Allow",
            "Action": [
                "firehose:DescribeDeliveryStream",
                "firehose:Get*"
            ],
            "Resource": [
                "arn:aws:firehose:*:*:deliverystream/CloudFrontLogsToS3"
            ]
        }
    ]
}

Following is the assume-kinesisanalytics-policy.json file, which grants Amazon Kinesis Analytics permission to assume the role.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "kinesisanalytics.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

To create the IAM policy used by the Analytics execution role, type the following command.

aws iam create-policy --policy-name firehose-access-policy --policy-document file://<path>/firehose-access-policy.json

To create the Analytics execution role and assign the service to use this role, type the following command.

aws iam create-role --role-name ka-execution-role --assume-role-policy-document file://<path>/assume-kinesisanalytics-policy.json

To attach the policy (firehose-access-policy) that you created to the Analytics execution role (ka-execution-role), type the following command.

aws iam attach-role-policy --role-name ka-execution-role --policy-arn arn:aws:iam::<your-account-id>:policy/firehose-access-policy

To deploy the Analytics application, first download the configuration file and then modify ResourceARN and RoleARN for the Amazon Kinesis Firehose input configuration.

"KinesisFirehoseInput": { 
    "ResourceARN": "arn:aws:firehose:<region>:<account-id>:deliverystream/CloudFrontLogsToS3", 
    "RoleARN": "arn:aws:iam:<account-id>:role/ka-execution-role"
}

To download the Analytics application configuration file, type the following command.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis//kinesisanalytics/kinesis-analytics-app-configuration.json <path_on_your_local_machine>

To deploy the application, type the following command.

aws kinesisanalytics create-application --application-name "cf-log-analysis" --cli-input-json file://<path>/kinesis-analytics-app-configuration.json

To start the application, type the following command.

aws kinesisanalytics start-application --application-name "cf-log-analysis" --input-configuration Id="1.1",InputStartingPositionConfiguration={InputStartingPosition="NOW"}
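
For completeness, the same start call scripted with Python (boto3); a minimal sketch equivalent to the CLI command above.

import boto3

ka = boto3.client('kinesisanalytics')

# Start the deployed application, reading from the current end of the stream.
ka.start_application(
    ApplicationName='cf-log-analysis',
    InputConfigurations=[{
        'Id': '1.1',
        'InputStartingPositionConfiguration': {'InputStartingPosition': 'NOW'},
    }])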

SQL queries using Amazon Kinesis Analytics

Scenario 1 is a query for detecting the bots that are sending requests to your website.

-- Create output stream, which can be used to send to a destination
CREATE OR REPLACE STREAM "BOT_DETECTION" (requesttime TIME, destribution VARCHAR(16), requestip VARCHAR(64), edgelocation VARCHAR(64), totalBytes BIGINT);
-- Create pump to insert into output 
CREATE OR REPLACE PUMP "BOT_DETECTION_PUMP" AS INSERT INTO "BOT_DETECTION"
--
SELECT STREAM 
    STEP("CF_LOG_STREAM_001"."request_time" BY INTERVAL '1' SECOND) as requesttime,
    "distribution_name" as distribution,
    "request_ip" as requestip, 
    "edge_location" as edgelocation, 
    SUM("bytes") as totalBytes
FROM "CF_LOG_STREAM_001"
WHERE "is_bot" = true
GROUP BY "request_ip", "edge_location", "distribution_name",
STEP("CF_LOG_STREAM_001"."request_time" BY INTERVAL '1' SECOND),
STEP("CF_LOG_STREAM_001".ROWTIME BY INTERVAL '1' SECOND);

Scenario 2 is a query for total bytes transferred per distribution for each edge location for your website.

-- Create output stream, which can be used to send to a destination
CREATE OR REPLACE STREAM "BYTES_TRANSFFERED" (requesttime TIME, destribution VARCHAR(16), edgelocation VARCHAR(64), totalBytes BIGINT);
-- Create pump to insert into output 
CREATE OR REPLACE PUMP "BYTES_TRANSFFERED_PUMP" AS INSERT INTO "BYTES_TRANSFFERED"
-- Bytes Transffered per second per web destribution by edge location
SELECT STREAM 
    STEP("CF_LOG_STREAM_001"."request_time" BY INTERVAL '1' SECOND) as requesttime,
    "distribution_name" as distribution,
    "edge_location" as edgelocation, 
    SUM("bytes") as totalBytes
FROM "CF_LOG_STREAM_001"
GROUP BY "distribution_name", "edge_location", "request_date",
STEP("CF_LOG_STREAM_001"."request_time" BY INTERVAL '1' SECOND),
STEP("CF_LOG_STREAM_001".ROWTIME BY INTERVAL '1' SECOND);

Scenario 3 is a query for the top 50 viewers for your website.

-- Create output stream, which can be used to send to a destination
CREATE OR REPLACE STREAM "TOP_TALKERS" (requestip VARCHAR(64), requestcount DOUBLE);
-- Create pump to insert into output 
CREATE OR REPLACE PUMP "TOP_TALKERS_PUMP" AS INSERT INTO "TOP_TALKERS"
-- Top 50 talkers
SELECT STREAM ITEM as requestip, ITEM_COUNT as requestcount FROM TABLE(TOP_K_ITEMS_TUMBLING(
  CURSOR(SELECT STREAM * FROM "CF_LOG_STREAM_001"),
  'request_ip', -- name of column in single quotes
  50, -- number of top items
  60 -- tumbling window size in seconds
  )
);

Conclusion

Following the steps in this blog post, you just built an end-to-end serverless architecture to analyze Amazon CloudFront access logs. You analyzed these logs in both interactive and streaming modes, using Amazon Athena and Amazon Kinesis Analytics, respectively.

By partitioning the Athena table for the logs delivered to a centralized bucket, this architecture is optimized for performance and cost when analyzing large volumes of logs for popular websites that receive millions of requests. Here, we have focused on just three common use cases for analysis, sharing the analytic queries as part of the post. However, you can extend this architecture to gain deeper insights and generate usage reports to reduce latency and increase availability. This way, you can provide a better experience on your websites fronted with Amazon CloudFront.

In this blog post, we focused on building serverless architecture to analyze Amazon CloudFront access logs. Our plan is to extend the solution to provide rich visualization as part of our next blog post.


About the Authors

Rajeev Srinivasan is a Senior Solutions Architect for AWS. He works closely with our customers to provide big data and NoSQL solutions leveraging the AWS platform, and he enjoys coding. In his spare time, he enjoys riding his motorcycle and reading books.

Sai Sriparasa is a consultant with AWS Professional Services. He works with our customers to provide strategic and tactical big data solutions with an emphasis on automation, operations & security on AWS. In his spare time, he follows sports and current affairs.



AWS Online Tech Talks – April 2017

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-april-2017/

I’m a life-long learner! I try to set aside some time every day to read about or to try something new. I grew up in the 1960’s and 1970’s, back before the Internet existed. As a teenager, access to technical information of any sort was difficult, expensive, and time-consuming. It often involved begging my parents to drive me to the library or riding my bicycle down a narrow road to get to the nearest magazine rack. Today, with so much information at our fingertips, the most interesting challenge is deciding what to study.

As we do every month, we have assembled a set of online tech talks that can fulfill this need for you. Our team has created talks that will provide you with information about the latest AWS services along with the best practices for putting them to use in your environment.

The talks are free, but they do fill up, so you should definitely register ahead of time if you would like to attend. Most of the talks are one hour in length; all times are in the Pacific (PT) time zone. Here’s what we have on tap for the rest of this month:

Monday, April 24
10:30 AM – Modernize Meetings with Amazon Chime.

1:00 PM – Essential Capabilities of an IoT Cloud Platform.

Tuesday, April 25
8:30 AM – Hands On Lab: Windows Workloads on AWS.

Noon – AWS Services Overview & Quarterly Update.

1:30 PM – Scaling to Millions of Users in the Blink of an Eye with Amazon CloudFront: Are you Ready?

Wednesday, April 26
9:00 AM – IDC and AWS Joint Webinar: Getting the Most Bang for your Buck with #EC2 #Winning.

10:30 AM – From Monolith to Microservices – Containerized Microservices on AWS.

Noon – Accelerating Mobile Application Development and Delivery with AWS Mobile Hub.

Thursday, April 27
8:30 AM – Hands On Lab: Introduction to Microsoft SQL Server in AWS.

10:30 AM – Applying AWS Organizations to Complex Account Structures.

Noon – Deep Dive on Amazon Cloud Directory.

I will be presenting Tuesday’s service overview – hope to “see” you there.

Jeff;

AWS Hot Startups – March 2017

Post Syndicated from Ana Visneski original https://aws.amazon.com/blogs/aws/aws-hot-startups-march-2017/

As the madness of March rounds up, take a break from all the basketball and check out the cool startups Tina Barr brings you for this month!

-Ana


The arrival of spring brings five new startups this month:

  • Amino Apps – providing social networks for hundreds of thousands of communities.
  • Appboy – empowering brands to strengthen customer relationships.
  • Arterys – revolutionizing the medical imaging industry.
  • Protenus – protecting patient data for healthcare organizations.
  • Syapse – improving targeted cancer care with shared data from across the country.

In case you missed them, check out February’s hot startups here.

Amino Apps (New York, NY)
Amino Apps was founded on the belief that interest-based communities were underdeveloped and outdated, particularly when it came to mobile. CEO Ben Anderson and CTO Yin Wang created the app to give users access to hundreds of thousands of communities, each of them a complete social network dedicated to a single topic. Some of the largest communities have over 1 million members and are built around topics like popular TV shows, video games, sports, and an endless number of hobbies and other interests. Amino hosts communities from around the world and is currently available in six languages with many more on the way.

Navigating the Amino app is easy. Simply download the app (iOS or Android), sign up with a valid email address, choose a profile picture, and start exploring. Users can search for communities and join any that fit their interests. Each community has chatrooms, multimedia content, quizzes, and a seamless commenting system. If a community doesn’t exist yet, users can create it in minutes using the Amino Creator and Manager app (ACM). The largest user-generated communities are turned into their own apps, which gives communities their own piece of real estate on members’ phones, as well as in app stores.

Amino’s vast global network of hundreds of thousands of communities is run on AWS services. Every day users generate, share, and engage with an enormous amount of content across hundreds of mobile applications. By leveraging AWS services including Amazon EC2, Amazon RDS, Amazon S3, Amazon SQS, and Amazon CloudFront, Amino can continue to provide new features to their users while scaling their service capacity to keep up with user growth.

Interested in joining Amino? Check out their jobs page here.

Appboy (New York, NY)
In 2011, Bill Magnuson, Jon Hyman, and Mark Ghermezian saw a unique opportunity to strengthen and humanize relationships between brands and their customers through technology. The trio created Appboy to empower brands to build long-term relationships with their customers and today they are the leading lifecycle engagement platform for marketing, growth, and engagement teams. The team recognized that as rapid mobile growth became undeniable, many brands were becoming frustrated with the lack of compelling and seamless cross-channel experiences offered by existing marketing clouds. Many of today’s top mobile apps and enterprise companies trust Appboy to take their marketing to the next level. Appboy manages user profiles for nearly 700 million monthly active users, and is used to power more than 10 billion personalized messages monthly across a multitude of channels and devices.

Appboy creates a holistic user profile that offers a single view of each customer. That user profile in turn powers contextual cross-channel messaging, lifecycle engagement automation, and robust campaign insights and optimization opportunities. Appboy offers solutions that allow brands to create push notifications, targeted emails, in-app and in-browser messages, news feed cards, and webhooks to enhance the user experience and increase customer engagement. The company prides itself on its interoperability, connecting to a variety of complementary marketing tools and technologies so brands can build the perfect stack to enable their strategies and experiments in real time.

AWS makes it easy for Appboy to dynamically size all of their service components and automatically scale up and down as needed. They use an array of services including Elastic Load Balancing, AWS Lambda, Amazon CloudWatch, Auto Scaling groups, and Amazon S3 to help scale capacity and better deal with unpredictable customer loads.

To keep up with the latest marketing trends and tactics, visit the Appboy digital magazine, Relate. Appboy was also recently featured in the #StartupsOnAir video series where they gave insight into their AWS usage.

Arterys (San Francisco, CA)
Getting test results back from a physician can often be a time-consuming and tedious process. Clinicians typically employ a variety of techniques to manually measure medical images and then make their assessments. Arterys founders Fabien Beckers, John Axerio-Cilies, Albert Hsiao, and Shreyas Vasanawala realized that much more computation and advanced analytics were needed to harness all of the valuable information in medical images, especially those generated by MRI and CT scanners. Clinicians were often skipping measurements and making assessments based mostly on qualitative data. Their solution was to start a cloud/AI software company focused on accelerating data-driven medicine with advanced software products for post-processing of medical images.

Arterys’ products provide timely, accurate, and consistent quantification of images, improve speed to results, and improve the quality of the information offered to the treating physician. This allows for much better tracking of a patient’s condition, and thus better decisions about their care. Advanced analytics, such as deep learning and distributed cloud computing, are used to process images. The first Arterys product can contour cardiac anatomy as accurately as experts, but takes only 15-20 seconds instead of the 45-60 minutes required to do it manually. Their computing cloud platform is also fully HIPAA compliant.

Arterys relies on a variety of AWS services to process their medical images. Using deep learning and other advanced analytic tools, Arterys is able to render images without latency over a web browser using AWS G2 instances. They use Amazon EC2 extensively for all of their compute needs, including inference and rendering, and Amazon S3 is used to archive images that aren’t needed immediately, as well as manage costs. Arterys also employs Amazon Route 53, AWS CloudTrail, and Amazon EC2 Container Service.

Check out this quick video about the technology that Arterys is creating. They were also recently featured in the #StartupsOnAir video series and offered a quick demo of their product.

Protenus (Baltimore, MD)
Protenus founders Nick Culbertson and Robert Lord were medical students at Johns Hopkins Medical School when they saw first-hand how Electronic Health Record (EHR) systems could be used to improve patient care and share clinical data more efficiently. With increased efficiency came a huge issue – an onslaught of serious security and privacy concerns. Over the past two years, 140 million medical records have been breached, meaning that approximately 1 in 3 Americans have had their health data compromised. Health records contain a repository of sensitive information and a breach of that data can cause major havoc in a patient’s life – namely identity theft, prescription fraud, Medicare/Medicaid fraud, and improper performance of medical procedures. Using their experience and knowledge from former careers in the intelligence community and involvement in a leading hedge fund, Nick and Robert developed the prototype and algorithms that launched Protenus.

Today, Protenus offers a number of solutions that detect breaches and misuse of patient data for healthcare organizations nationwide. Using advanced analytics and AI, Protenus’ health data insights platform understands appropriate vs. inappropriate use of patient data in the EHR. It also protects privacy, aids compliance with HIPAA regulations, and ensures trust for patients and providers alike.

Protenus built and operates its SaaS offering atop Amazon EC2, where Dedicated Hosts and encrypted Amazon EBS volumes are used to ensure compliance with HIPAA regulations for the storage of Protected Health Information. They use Elastic Load Balancing and Amazon Route 53 for DNS, enabling unique, secure client-specific access points to their Protenus instance.

To learn more about threats to patient data, read Hospitals’ Biggest Threat to Patient Data is Hiding in Plain Sight on the Protenus blog. Also be sure to check out their recent video in the #StartupsOnAir series for more insight into their product.

Syapse (Palo Alto, CA)
Syapse provides a comprehensive software solution that enables clinicians to treat patients with precision medicine for targeted cancer therapies — treatments that are designed and chosen using genetic or molecular profiling. Existing hospital IT doesn’t support the robust infrastructure and clinical workflows required to treat patients with precision medicine at scale, but Syapse centralizes and organizes patient data for clinicians at the point of care. Syapse offers a variety of solutions for oncologists that allow them to access the full scope of patient data longitudinally, view recommended treatments or clinical trials for similar patients, and track outcomes over time. These solutions are helping health systems across the country to improve patient outcomes by offering the most innovative care to cancer patients.

Leading health systems such as Stanford Health Care, Providence St. Joseph Health, and Intermountain Healthcare are using Syapse to improve patient outcomes, streamline clinical workflows, and scale their precision medicine programs. A group of experts known as the Molecular Tumor Board (MTB) reviews complex cases and evaluates patient data, documents notes, and disseminates treatment recommendations to the treating physician. Syapse also provides reports that give health system staff insight into their institution’s oncology care, which can be used toward quality improvement, business goals, and understanding variables in the oncology service line.

Syapse uses Amazon Virtual Private Cloud, Amazon EC2 Dedicated Instances, and Amazon Elastic Block Store to build a high-performance, scalable, and HIPAA-compliant data platform that enables health systems to make precision medicine part of routine cancer care for patients throughout the country.

Be sure to check out the Syapse blog to learn more and also their recent video on the #StartupsOnAir video series where they discuss their product, HIPAA compliance, and more about how they are using AWS.

Thank you for checking out another month of awesome hot startups!

-Tina Barr


New – AWS Resource Tagging API

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-aws-resource-tagging-api/

AWS customers frequently use tags to organize their Amazon EC2 instances, Amazon EBS volumes, Amazon S3 buckets, and other resources. Over the past couple of years we have been working to make tagging more useful and more powerful. For example, we have added support for tagging during Auto Scaling, the ability to use up to 50 tags per resource, console-based support for the creation of resources that share a common tag (also known as resource groups), and the option to use Config Rules to enforce the use of tags.

As customers grow to the point where they are managing thousands of resources, each with up to 50 tags, they have been looking to us for additional tooling and options to simplify their work. Today I am happy to announce that our new Resource Tagging API is now available. You can use these APIs from the AWS SDKs or via the AWS Command Line Interface (CLI). You now have programmatic access to the same resource group operations that had been accessible only from the AWS Management Console.

Recap: Console-Based Resource Group Operations
Before I get into the specifics of the new API functions, I thought you would appreciate a fresh look at the console-based grouping and tagging model. I already have the ability to find and then tag AWS resources using a search that spans one or more regions. For example, I can select a long list of regions and then search them for my EC2 instances like this:

After I locate and select all of the desired resources, I can add a new tag key by clicking Create a new tag key and entering the desired tag key:

Then I enter a value for each instance (the new ProjectCode column):

Then I can create a resource group that contains all of the resources that are tagged with P100:

After I have created the resource group, I can locate all of the resources by clicking on the Resource Groups menu:

To learn more about this feature, read Resource Groups and Tagging for AWS.

New API for Resource Tagging
The API that we are announcing today gives you the power to tag, untag, and locate resources using tags, all from your own code. With these new API functions, you are now able to operate on multiple resource types with a single set of functions.

Here are the new functions:

TagResources – Add tags to up to 20 resources at a time.

UntagResources – Remove tags from up to 20 resources at a time.

GetResources – Get a list of resources, with optional filtering by tags and/or resource types.

GetTagKeys – Get a list of all of the unique tag keys used in your account.

GetTagValues – Get all tag values for a specified tag key.
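
To give you a feel for the API, here is a short Python (boto3) sketch that exercises two of these functions; the ARNs are placeholders for resources in your own account.

import boto3

tagging = boto3.client('resourcegroupstaggingapi')

# TagResources: apply a shared ProjectCode tag to two resources at once.
tagging.tag_resources(
    ResourceARNList=[
        'arn:aws:ec2:us-east-1:<account-id>:instance/<instance-id>',
        'arn:aws:s3:::<bucket-name>',
    ],
    Tags={'ProjectCode': 'P100'})

# GetResources: find every resource carrying that tag, across services.
response = tagging.get_resources(
    TagFilters=[{'Key': 'ProjectCode', 'Values': ['P100']}])
for mapping in response['ResourceTagMappingList']:
    print(mapping['ResourceARN'])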

These functions support the following AWS services and resource types:

  • Amazon CloudFront – Distribution.
  • Amazon EC2 – AMI, Customer Gateway, DHCP Option, EBS Volume, Instance, Internet Gateway, Network ACL, Network Interface, Reserved Instance, Reserved Instance Listing, Route Table, Security Group – EC2 Classic, Security Group – VPC, Snapshot, Spot Batch, Spot Instance Request, Spot Instance, Subnet, Virtual Private Gateway, VPC, VPN Connection.
  • Amazon ElastiCache – Cluster, Snapshot.
  • Amazon Elastic File System – Filesystem.
  • Amazon Elasticsearch Service – Domain.
  • Amazon EMR – Cluster.
  • Amazon Glacier – Vault.
  • Amazon Inspector – Assessment.
  • Amazon Kinesis – Stream.
  • Amazon Machine Learning – Batch Prediction, Data Source, Evaluation, ML Model.
  • Amazon Redshift – Cluster.
  • Amazon Relational Database Service – DB Instance, DB Option Group, DB Parameter Group, DB Security Group, DB Snapshot, DB Subnet Group, Event Subscription, Read Replica, Reserved DB Instance.
  • Amazon Route 53 – Domain, Health Check, Hosted Zone.
  • Amazon S3 – Bucket.
  • Amazon WorkSpaces – WorkSpace.
  • AWS Certificate Manager – Certificate.
  • AWS CloudHSM – HSM.
  • AWS Directory Service – Directory.
  • AWS Storage Gateway – Gateway, Virtual Tape, Volume.
  • Elastic Load Balancing – Load Balancer, Target Group.

Things to Know
Here are a couple of things to keep in mind when you build code or write scripts that use the new API functions or the CLI equivalents:

Compatibility – The older, service-specific functions remain available and you can continue to use them.

Write Permission – The new tagging API adds another layer of permission on top of existing policies that are specific to a single AWS service. For example, you will need to have access to tag:TagResources and ec2:CreateTags in order to add a tag to an EC2 instance.

Read Permission – You will need to have access to tag:GetResources, tag:GetTagKeys, and tag:GetTagValues in order to call functions that access tags and tag values.

Pricing – There is no charge for the use of these functions or for tags.

Available Now
The new functions are supported by the latest versions of the AWS SDKs. You can use them to tag and access resources in all commercial AWS regions.

Jeff;


In Case You Missed These: AWS Security Blog Posts from January, February, and March

Post Syndicated from Craig Liebendorfer original https://aws.amazon.com/blogs/security/in-case-you-missed-these-aws-security-blog-posts-from-january-february-and-march/


In case you missed any AWS Security Blog posts published so far in 2017, they are summarized and linked to below. The posts are shown in reverse chronological order (most recent first), and the subject matter ranges from protecting dynamic web applications against DDoS attacks to monitoring AWS account configuration changes and API calls to Amazon EC2 security groups.

March

March 22: How to Help Protect Dynamic Web Applications Against DDoS Attacks by Using Amazon CloudFront and Amazon Route 53
Using a content delivery network (CDN) such as Amazon CloudFront to cache and serve static text and images or downloadable objects such as media files and documents is a common strategy to improve webpage load times, reduce network bandwidth costs, lessen the load on web servers, and mitigate distributed denial of service (DDoS) attacks. AWS WAF is a web application firewall that can be deployed on CloudFront to help protect your application against DDoS attacks by giving you control over which traffic to allow or block by defining security rules. When users access your application, the Domain Name System (DNS) translates human-readable domain names (for example, www.example.com) to machine-readable IP addresses (for example, 192.0.2.44). A DNS service, such as Amazon Route 53, can effectively connect users’ requests to a CloudFront distribution that proxies requests for dynamic content to the infrastructure hosting your application’s endpoints. In this blog post, I show you how to deploy CloudFront with AWS WAF and Route 53 to help protect dynamic web applications (with dynamic content such as a response to user input) against DDoS attacks. The steps shown in this post are key to implementing the overall approach described in AWS Best Practices for DDoS Resiliency and enable the built-in, managed DDoS protection service, AWS Shield.

March 21: New AWS Encryption SDK for Python Simplifies Multiple Master Key Encryption
The AWS Cryptography team is happy to announce a Python implementation of the AWS Encryption SDK. This new SDK helps manage data keys for you, and it simplifies the process of encrypting data under multiple master keys. As a result, this new SDK allows you to focus on the code that drives your business forward. It also provides a framework you can easily extend to ensure that you have a cryptographic library that is configured to match and enforce your standards. The SDK also includes ready-to-use examples. If you are a Java developer, you can refer to this blog post to see specific Java examples for the SDK. In this blog post, I show you how you can use the AWS Encryption SDK to simplify the process of encrypting data and how to protect your encryption keys in ways that help improve application availability by not tying you to a single region or key management solution.

March 21: Updated CJIS Workbook Now Available by Request
The need for guidance when implementing Criminal Justice Information Services (CJIS)–compliant solutions has become of paramount importance as more law enforcement customers and technology partners move to store and process criminal justice data in the cloud. AWS services allow these customers to easily and securely architect a CJIS-compliant solution when handling criminal justice data, creating a durable, cost-effective, and secure IT infrastructure that better supports local, state, and federal law enforcement in carrying out their public safety missions. AWS has created several documents (collectively referred to as the CJIS Workbook) to assist you in aligning with the FBI’s CJIS Security Policy. You can use the workbook as a framework for developing CJIS-compliant architecture in the AWS Cloud. The workbook helps you define and test the controls you operate, and document the dependence on the controls that AWS operates (compute, storage, database, networking, regions, Availability Zones, and edge locations).

March 9: New Cloud Directory API Makes It Easier to Query Data Along Multiple Dimensions
Today, we made available a new Cloud Directory API, ListObjectParentPaths, that enables you to retrieve all available parent paths for any directory object across multiple hierarchies. Use this API when you want to fetch all parent objects for a specific child object. The order of the paths and objects returned is consistent across iterative calls to the API, unless objects are moved or deleted. In case an object has multiple parents, the API allows you to control the number of paths returned by using a paginated call pattern. In this blog post, I use an example directory to demonstrate how this new API enables you to retrieve data across multiple dimensions to implement powerful applications quickly.

March 8: How to Access the AWS Management Console Using AWS Microsoft AD and Your On-Premises Credentials
AWS Directory Service for Microsoft Active Directory, also known as AWS Microsoft AD, is a managed Microsoft Active Directory (AD) hosted in the AWS Cloud. Now, AWS Microsoft AD makes it easy for you to give your users permission to manage AWS resources by using on-premises AD administrative tools. With AWS Microsoft AD, you can grant your on-premises users permissions to resources such as the AWS Management Console instead of adding AWS Identity and Access Management (IAM) user accounts or configuring AD Federation Services (AD FS) with Security Assertion Markup Language (SAML). In this blog post, I show how to use AWS Microsoft AD to enable your on-premises AD users to sign in to the AWS Management Console with their on-premises AD user credentials to access and manage AWS resources through IAM roles.

March 7: How to Protect Your Web Application Against DDoS Attacks by Using Amazon Route 53 and an External Content Delivery Network
Distributed Denial of Service (DDoS) attacks are attempts by a malicious actor to flood a network, system, or application with more traffic, connections, or requests than it is able to handle. To protect your web application against DDoS attacks, you can use AWS Shield, a DDoS protection service that AWS provides automatically to all AWS customers at no additional charge. You can use AWS Shield in conjunction with DDoS-resilient web services such as Amazon CloudFront and Amazon Route 53 to improve your ability to defend against DDoS attacks. Learn more about architecting for DDoS resiliency by reading the AWS Best Practices for DDoS Resiliency whitepaper. You also have the option of using Route 53 with an externally hosted content delivery network (CDN). In this blog post, I show how you can help protect the zone apex (also known as the root domain) of your web application by using Route 53 to perform a secure redirect to prevent discovery of your application origin.


February

February 27: Now Generally Available – AWS Organizations: Policy-Based Management for Multiple AWS Accounts
Today, AWS Organizations moves from Preview to General Availability. You can use Organizations to centrally manage multiple AWS accounts, with the ability to create a hierarchy of organizational units (OUs). You can assign each account to an OU, define policies, and then apply those policies to an entire hierarchy, specific OUs, or specific accounts. You can invite existing AWS accounts to join your organization, and you can also create new accounts. All of these functions are available from the AWS Management Console, the AWS Command Line Interface (CLI), and through the AWS Organizations API.To read the full AWS Blog post about today’s launch, see AWS Organizations – Policy-Based Management for Multiple AWS Accounts.

February 23: s2n Is Now Handling 100 Percent of SSL Traffic for Amazon S3
Today, we’ve achieved another important milestone for securing customer data: we have replaced OpenSSL with s2n for all internal and external SSL traffic in Amazon Simple Storage Service (Amazon S3) commercial regions. This was implemented with minimal impact to customers, and multiple means of error checking were used to ensure a smooth transition, including client integration tests, catching potential interoperability conflicts, and identifying memory leaks through fuzz testing.

February 22: Easily Replace or Attach an IAM Role to an Existing EC2 Instance by Using the EC2 Console
AWS Identity and Access Management (IAM) roles enable your applications running on Amazon EC2 to use temporary security credentials. IAM roles for EC2 make it easier for your applications to make API requests securely from an instance because they do not require you to manage AWS security credentials that the applications use. Recently, we enabled you to use temporary security credentials for your applications by attaching an IAM role to an existing EC2 instance by using the AWS CLI and SDK. To learn more, see New! Attach an AWS IAM Role to an Existing Amazon EC2 Instance by Using the AWS CLI. Starting today, you can attach an IAM role to an existing EC2 instance from the EC2 console. You can also use the EC2 console to replace an IAM role attached to an existing instance. In this blog post, I will show how to attach an IAM role to an existing EC2 instance from the EC2 console.

February 22: How to Audit Your AWS Resources for Security Compliance by Using Custom AWS Config Rules
AWS Config Rules enables you to implement security policies as code for your organization and evaluate configuration changes to AWS resources against these policies. You can use Config rules to audit your use of AWS resources for compliance with external compliance frameworks such as CIS AWS Foundations Benchmark and with your internal security policies related to the US Health Insurance Portability and Accountability Act (HIPAA), the Federal Risk and Authorization Management Program (FedRAMP), and other regimes. AWS provides some predefined, managed Config rules. You also can create custom Config rules based on criteria you define within an AWS Lambda function. In this post, I show how to create a custom rule that audits AWS resources for security compliance by enabling VPC Flow Logs for an Amazon Virtual Private Cloud (VPC). The custom rule meets requirement 4.3 of the CIS AWS Foundations Benchmark: “Ensure VPC flow logging is enabled in all VPCs.”

February 13: AWS Announces CISPE Membership and Compliance with First-Ever Code of Conduct for Data Protection in the Cloud
I have two exciting announcements today, both showing AWS’s continued commitment to ensuring that customers can comply with EU Data Protection requirements when using our services.

February 13: How to Enable Multi-Factor Authentication for AWS Services by Using AWS Microsoft AD and On-Premises Credentials
You can now enable multi-factor authentication (MFA) for users of AWS services such as Amazon WorkSpaces and Amazon QuickSight and their on-premises credentials by using your AWS Directory Service for Microsoft Active Directory (Enterprise Edition) directory, also known as AWS Microsoft AD. MFA adds an extra layer of protection to a user name and password (the first “factor”) by requiring users to enter an authentication code (the second factor), which has been provided by your virtual or hardware MFA solution. These factors together provide additional security by preventing access to AWS services, unless users supply a valid MFA code.

February 13: How to Create an Organizational Chart with Separate Hierarchies by Using Amazon Cloud Directory
Amazon Cloud Directory enables you to create directories for a variety of use cases, such as organizational charts, course catalogs, and device registries. Cloud Directory offers you the flexibility to create directories with hierarchies that span multiple dimensions. For example, you can create an organizational chart that you can navigate through separate hierarchies for reporting structure, location, and cost center. In this blog post, I show how to use Cloud Directory APIs to create an organizational chart with two separate hierarchies in a single directory. I also show how to navigate the hierarchies and retrieve data. I use the Java SDK for all the sample code in this post, but you can use other language SDKs or the AWS CLI.

February 10: How to Easily Log On to AWS Services by Using Your On-Premises Active Directory
AWS Directory Service for Microsoft Active Directory (Enterprise Edition), also known as Microsoft AD, now enables your users to log on with just their on-premises Active Directory (AD) user name—no domain name is required. This new domainless logon feature makes it easier to set up connections to your on-premises AD for use with applications such as Amazon WorkSpaces and Amazon QuickSight, and it keeps the user logon experience free from network naming. This new interforest trusts capability is now available when using Microsoft AD with Amazon WorkSpaces and Amazon QuickSight Enterprise Edition. In this blog post, I explain how Microsoft AD domainless logon works with AD interforest trusts, and I show an example of setting up Amazon WorkSpaces to use this capability.

February 9: New! Attach an AWS IAM Role to an Existing Amazon EC2 Instance by Using the AWS CLI
AWS Identity and Access Management (IAM) roles enable your applications running on Amazon EC2 to use temporary security credentials that AWS creates, distributes, and rotates automatically. Using temporary credentials is an IAM best practice because you do not need to maintain long-term keys on your instance. Using IAM roles for EC2 also eliminates the need to use long-term AWS access keys that you have to manage manually or programmatically. Starting today, you can enable your applications to use temporary security credentials provided by AWS by attaching an IAM role to an existing EC2 instance. You can also replace the IAM role attached to an existing EC2 instance. In this blog post, I show how you can attach an IAM role to an existing EC2 instance by using the AWS CLI.

February 8: How to Remediate Amazon Inspector Security Findings Automatically
The Amazon Inspector security assessment service can evaluate the operating environments and applications you have deployed on AWS for common and emerging security vulnerabilities automatically. As an AWS-built service, Amazon Inspector is designed to exchange data and interact with other core AWS services not only to identify potential security findings but also to automate addressing those findings. Previous related blog posts showed how you can deliver Amazon Inspector security findings automatically to third-party ticketing systems and automate the installation of the Amazon Inspector agent on new Amazon EC2 instances. In this post, I show how you can automatically remediate findings generated by Amazon Inspector. To get started, you must first run an assessment and publish any security findings to an Amazon Simple Notification Service (SNS) topic. Then, you create an AWS Lambda function that is triggered by those notifications. Finally, the Lambda function examines the findings and then implements the appropriate remediation based on the type of issue.

February 6: How to Simplify Security Assessment Setup Using Amazon EC2 Systems Manager and Amazon Inspector
In a July 2016 AWS Blog post, I discussed how to integrate Amazon Inspector with third-party ticketing systems by using Amazon Simple Notification Service (SNS) and AWS Lambda. This AWS Security Blog post continues in the same vein, describing how to use Amazon Inspector to automate various aspects of security management. In this post, I show you how to install the Amazon Inspector agent automatically through the Amazon EC2 Systems Manager when a new Amazon EC2 instance is launched. In a subsequent post, I will show you how to update EC2 instances automatically that run Linux when Amazon Inspector discovers a missing security patch.


January

January 30: How to Protect Data at Rest with Amazon EC2 Instance Store Encryption
Encrypting data at rest is vital for regulatory compliance to ensure that sensitive data saved on disks is not readable by any user or application without a valid key. Some compliance regulations such as PCI DSS and HIPAA require that data at rest be encrypted throughout the data lifecycle. To this end, AWS provides data-at-rest options and key management to support the encryption process. For example, you can encrypt Amazon EBS volumes and configure Amazon S3 buckets for server-side encryption (SSE) using AES-256 encryption. Additionally, Amazon RDS supports Transparent Data Encryption (TDE). Instance storage provides temporary block-level storage for Amazon EC2 instances. This storage is located on disks attached physically to a host computer. Instance storage is ideal for temporary storage of information that frequently changes, such as buffers, caches, and scratch data. By default, files stored on these disks are not encrypted. In this blog post, I show a method for encrypting data on Linux EC2 instance stores by using Linux built-in libraries. This method encrypts files transparently, which protects confidential data. As a result, applications that process the data are unaware of the disk-level encryption.

January 27: How to Detect and Automatically Remediate Unintended Permissions in Amazon S3 Object ACLs with CloudWatch Events
Amazon S3 Access Control Lists (ACLs) enable you to specify permissions that grant access to S3 buckets and objects. When S3 receives a request for an object, it verifies whether the requester has the necessary access permissions in the associated ACL. For example, you could set up an ACL for an object so that only the users in your account can access it, or you could make an object public so that it can be accessed by anyone. If the number of objects and users in your AWS account is large, ensuring that you have attached correctly configured ACLs to your objects can be a challenge. For example, what if a user were to call the PutObjectAcl API call on an object that is supposed to be private and make it public? Or, what if a user were to call the PutObject with the optional Acl parameter set to public-read, therefore uploading a confidential file as publicly readable? In this blog post, I show a solution that uses Amazon CloudWatch Events to detect PutObject and PutObjectAcl API calls in near-real time and helps ensure that the objects remain private by making automatic PutObjectAcl calls, when necessary.

January 26: Now Available: Amazon Cloud Directory—A Cloud-Native Directory for Hierarchical Data
Today we are launching Amazon Cloud Directory. This service is purpose-built for storing large amounts of strongly typed hierarchical data. With the ability to scale to hundreds of millions of objects while remaining cost-effective, Cloud Directory is a great fit for all sorts of cloud and mobile applications.

January 24: New SOC 2 Report Available: Confidentiality
As with everything at Amazon, the success of our security and compliance program is primarily measured by one thing: our customers’ success. Our customers drive our portfolio of compliance reports, attestations, and certifications that support their efforts in running a secure and compliant cloud environment. As a result of our engagement with key customers across the globe, we are happy to announce the publication of our new SOC 2 Confidentiality report. This report is available now through AWS Artifact in the AWS Management Console.

January 18: Compliance in the Cloud for New Financial Services Cybersecurity Regulations
Financial regulatory agencies are focused more than ever on ensuring responsible innovation. Consequently, if you want to achieve compliance with financial services regulations, you must be increasingly agile and employ dynamic security capabilities. AWS enables you to achieve this by providing you with the tools you need to scale your security and compliance capabilities on AWS. The following breakdown of one of the most recent cybersecurity regulations, NY DFS Rule 23 NYCRR 500, demonstrates how AWS continues to focus on your regulatory needs in the financial services sector.

January 9: New Amazon GameDev Blog Post: Protect Multiplayer Game Servers from DDoS Attacks by Using Amazon GameLift
In online gaming, distributed denial of service (DDoS) attacks target a game’s network layer, flooding servers with requests until performance degrades considerably. These attacks can limit a game’s availability to players and limit the player experience for those who can connect. Today’s new Amazon GameDev Blog post uses a typical game server architecture to highlight DDoS attack vulnerabilities and discusses how to stay protected by using built-in AWS Cloud security, AWS security best practices, and the security features of Amazon GameLift. Read the post to learn more.

January 6: The Top 10 Most Downloaded AWS Security and Compliance Documents in 2016
The following list includes the 10 most downloaded AWS security and compliance documents in 2016. Using this list, you can learn about what other people found most interesting about security and compliance last year.

January 6: FedRAMP Compliance Update: AWS GovCloud (US) Region Receives a JAB-Issued FedRAMP High Baseline P-ATO for Three New Services
Three new services in the AWS GovCloud (US) region have received a Provisional Authority to Operate (P-ATO) from the Joint Authorization Board (JAB) under the Federal Risk and Authorization Management Program (FedRAMP). JAB issued the authorization at the High baseline, which gives US government agencies and their service providers the ability to use these services to process the government’s most sensitive unclassified data, including Personally Identifiable Information (PII), Protected Health Information (PHI), Controlled Unclassified Information (CUI), criminal justice information (CJI), and financial data.

January 4: The Top 20 Most Viewed AWS IAM Documentation Pages in 2016
The following 20 pages were the most viewed AWS Identity and Access Management (IAM) documentation pages in 2016. I have included a brief description with each link to give you a clearer idea of what each page covers. Use this list to see what other people have been viewing and perhaps to pique your own interest about a topic you’ve been meaning to research.

January 3: The Most Viewed AWS Security Blog Posts in 2016
The following 10 posts were the most viewed AWS Security Blog posts that we published during 2016. You can use this list as a guide to catch up on your blog reading or even read a post again that you found particularly useful.

January 3: How to Monitor AWS Account Configuration Changes and API Calls to Amazon EC2 Security Groups
You can use AWS security controls to detect and mitigate risks to your AWS resources. The purpose of each security control is defined by its control objective. For example, the control objective of an Amazon VPC security group is to permit only designated traffic to enter or leave a network interface. Let’s say you have an Internet-facing e-commerce website, and your security administrator has determined that only HTTP (TCP port 80) and HTTPS (TCP 443) traffic should be allowed access to the public subnet. As a result, your administrator configures a security group to meet this control objective. What if, though, someone were to inadvertently change this security group’s rules and enable FTP or other protocols to access the public subnet from any location on the Internet? That expanded access could weaken the security posture of your assets. Consequently, your administrator might need to monitor the integrity of your company’s security controls so that the controls maintain their desired effectiveness. In this blog post, I explore two methods for detecting unintended changes to VPC security groups. The two methods address not only control objectives but also control failures.
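
One of the two detection methods pairs CloudWatch Events with a Lambda function. The following hedged sketch shows what an automatic response might look like, assuming a CloudWatch Events rule that matches the AuthorizeSecurityGroupIngress CloudTrail record (the allowed-port set mirrors the control objective described above):

```python
import boto3

ec2 = boto3.client('ec2')

ALLOWED_PORTS = {80, 443}  # the control objective for the public subnet

def lambda_handler(event, context):
    # Assumed event shape: the CloudTrail record for the
    # AuthorizeSecurityGroupIngress call, delivered by CloudWatch Events.
    group_id = event['detail']['requestParameters']['groupId']
    group = ec2.describe_security_groups(
        GroupIds=[group_id])['SecurityGroups'][0]

    # Revoke any ingress rule that falls outside the control objective.
    offending = [
        perm for perm in group['IpPermissions']
        if perm.get('FromPort') not in ALLOWED_PORTS
    ]
    if offending:
        ec2.revoke_security_group_ingress(
            GroupId=group_id, IpPermissions=offending)
```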

If you have questions about or issues with implementing the solutions in any of these posts, please start a new thread on the forum identified near the end of each post.

– Craig

How to Help Protect Dynamic Web Applications Against DDoS Attacks by Using Amazon CloudFront and Amazon Route 53

Post Syndicated from Holly Willey original https://aws.amazon.com/blogs/security/how-to-protect-dynamic-web-applications-against-ddos-attacks-by-using-amazon-cloudfront-and-amazon-route-53/

Using a content delivery network (CDN) such as Amazon CloudFront to cache and serve static text and images or downloadable objects such as media files and documents is a common strategy to improve webpage load times, reduce network bandwidth costs, lessen the load on web servers, and mitigate distributed denial of service (DDoS) attacks. AWS WAF is a web application firewall that can be deployed on CloudFront to help protect your application against DDoS attacks by giving you control over which traffic to allow or block by defining security rules. When users access your application, the Domain Name System (DNS) translates human-readable domain names (for example, www.example.com) to machine-readable IP addresses (for example, 192.0.2.44). A DNS service, such as Amazon Route 53, can effectively connect users’ requests to a CloudFront distribution that proxies requests for dynamic content to the infrastructure hosting your application’s endpoints.

In this blog post, I show you how to deploy CloudFront with AWS WAF and Route 53 to help protect dynamic web applications (with dynamic content such as a response to user input) against DDoS attacks. The steps shown in this post are key to implementing the overall approach described in AWS Best Practices for DDoS Resiliency and enable the built-in, managed DDoS protection service, AWS Shield.

Background

AWS hosts CloudFront and Route 53 services on a distributed network of proxy servers in data centers throughout the world called edge locations. Using the global Amazon network of edge locations for application delivery and DNS service plays an important part in building a comprehensive defense against DDoS attacks for your dynamic web applications. These web applications can benefit from the increased security and availability provided by CloudFront and Route 53, as well as from an improved end-user experience through reduced latency.

The following screenshot of an Amazon.com webpage shows how static and dynamic content can compose a dynamic web application that is delivered via HTTPS protocol for the encryption of user page requests as well as the pages that are returned by a web server.

Screenshot of an Amazon.com webpage with static and dynamic content

The following map shows the global Amazon network of edge locations available to serve static content and proxy requests for dynamic content back to the origin as of the writing of this blog post. For the latest list of edge locations, see AWS Global Infrastructure.

Map showing Amazon edge locations

How AWS Shield, CloudFront, and Route 53 work to help protect against DDoS attacks

To help keep your dynamic web applications available when they are under DDoS attack, the steps in this post enable AWS Shield Standard by configuring your applications behind CloudFront and Route 53. AWS Shield Standard protects your resources from common, frequently occurring network and transport layer DDoS attacks. Attack traffic can be geographically isolated and absorbed using the capacity in edge locations close to the source. Additionally, you can configure geographical restrictions to help block attacks originating from specific countries.

The request-routing technology in CloudFront connects each client to the nearest edge location, as determined by continuously updated latency measurements. HTTP and HTTPS requests sent to CloudFront can be monitored, and access to your application resources can be controlled at edge locations using AWS WAF. Based on conditions that you specify in AWS WAF, such as the IP addresses that requests originate from or the values of query strings, traffic can be allowed, blocked, or allowed and counted for further investigation or remediation. The following diagram shows how static and dynamic web application content can originate from endpoint resources within AWS or your corporate data center. For more details, see How CloudFront Delivers Content and How CloudFront Works with Regional Edge Caches.

Route 53 DNS requests and subsequent application traffic routed through CloudFront are inspected inline. Always-on monitoring, anomaly detection, and mitigation against common infrastructure DDoS attacks such as SYN/ACK floods, UDP floods, and reflection attacks are built into both Route 53 and CloudFront. For a review of common DDoS attack vectors, see How to Help Prepare for DDoS Attacks by Reducing Your Attack Surface. When the SYN flood attack threshold is exceeded, SYN cookies are activated to avoid dropping connections from legitimate clients. Deterministic packet filtering drops malformed TCP packets and invalid DNS requests, only allowing traffic to pass that is valid for the service. Heuristics-based anomaly detection evaluates attributes such as type, source, and composition of traffic. Traffic is scored across many dimensions, and only the most suspicious traffic is dropped. This method allows you to avoid false positives while protecting application availability.

Route 53 is also designed to withstand DNS query floods, which are real DNS requests that can continue for hours and attempt to exhaust DNS server resources. Route 53 uses shuffle sharding and anycast striping to spread DNS traffic across edge locations and help protect the availability of the service.

The next four sections provide guidance about how to deploy CloudFront, Route 53, AWS WAF, and, optionally, AWS Shield Advanced.

Deploy CloudFront

To take advantage of application delivery with DDoS mitigations at the edge, start by creating a CloudFront distribution and configuring origins (a scripted sketch follows these steps):

  1. Sign in to the AWS Management Console and open the CloudFront console.
  2. Choose Create Distribution.
  3. On the first page of the Create Distribution Wizard, in the Web section, choose Get Started.
  4. Specify origin settings for the distribution. The following screenshot of the CloudFront console shows an example CloudFront distribution configured with an Elastic Load Balancing load balancer origin, as shown in the previous diagram. I have configured this example to set the Origin SSL Protocols to use TLSv1.2 and the Origin Protocol Policy to HTTPS Only. For more information about creating an HTTPS listener for your ELB load balancer and requesting a certificate from AWS Certificate Manager (ACM), see Getting Started with Elastic Load Balancing, Supported Regions, and Requiring HTTPS for Communication Between CloudFront and Your Custom Origin.
  5. Specify cache behavior settings for the distribution, as shown in the following screenshot. You can configure each URL path pattern with a set of associated cache behaviors. For dynamic web applications, set the Minimum TTL to 0 so that CloudFront will make a GET request with an If-Modified-Since header back to the origin. When CloudFront proxies traffic to the origin from edge locations and back, multiple concurrent requests for the same object are collapsed into a single request. The request is sent over a persistent connection from the edge location to the region over networks monitored by AWS. The use of a large initial TCP window size in CloudFront maximizes the available bandwidth, and TCP Fast Open (TFO) reduces latency.
  6. To ensure that all traffic to CloudFront is encrypted and to enable SSL termination from clients at global edge locations, specify Redirect HTTP to HTTPS for Viewer Protocol Policy. Moving SSL termination to CloudFront offloads computationally expensive SSL negotiation, helps mitigate SSL abuse, and reduces latency with the use of OCSP stapling and session tickets. For more information about options for serving HTTPS requests, see Choosing How CloudFront Serves HTTPS Requests. For dynamic web applications, set Allowed HTTP Methods to include all methods, set Forward Headers to All, and for Query String Forwarding and Caching, choose Forward all, cache based on all.
  7. Specify distribution settings for the distribution, as shown in the following screenshot. Enter your domain names in the Alternate Domain Names box and choose Custom SSL Certificate.
  8. Choose Create Distribution. Note the x.cloudfront.net Domain Name of the distribution. In the next section, you will configure Route 53 to route traffic to this CloudFront distribution domain name.
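
If you prefer to script the distribution, here is an abridged boto3 sketch of the same configuration (the origin domain name is a placeholder, and I've trimmed the config to the settings discussed above; the console steps additionally attach a custom SSL certificate):

```python
import time
import boto3

cloudfront = boto3.client('cloudfront')

response = cloudfront.create_distribution(DistributionConfig={
    'CallerReference': str(time.time()),  # any unique string
    'Comment': 'Dynamic web application behind CloudFront',
    'Enabled': True,
    'Origins': {
        'Quantity': 1,
        'Items': [{
            'Id': 'my-alb-origin',
            # Placeholder ALB DNS name; substitute your load balancer's.
            'DomainName': 'my-alb-1234567890.us-east-1.elb.amazonaws.com',
            'CustomOriginConfig': {
                'HTTPPort': 80,
                'HTTPSPort': 443,
                'OriginProtocolPolicy': 'https-only',
                'OriginSslProtocols': {'Quantity': 1, 'Items': ['TLSv1.2']},
            },
        }],
    },
    'DefaultCacheBehavior': {
        'TargetOriginId': 'my-alb-origin',
        'ViewerProtocolPolicy': 'redirect-to-https',
        'MinTTL': 0,  # dynamic content: revalidate with the origin
        'AllowedMethods': {
            'Quantity': 7,
            'Items': ['GET', 'HEAD', 'OPTIONS', 'PUT', 'POST',
                      'PATCH', 'DELETE'],
            'CachedMethods': {'Quantity': 2, 'Items': ['GET', 'HEAD']},
        },
        'ForwardedValues': {
            'QueryString': True,
            'Cookies': {'Forward': 'all'},
            'Headers': {'Quantity': 1, 'Items': ['*']},  # forward all
        },
        'TrustedSigners': {'Enabled': False, 'Quantity': 0},
    },
})
print(response['Distribution']['DomainName'])  # x.cloudfront.net name
```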

Configure Route 53

When you created a web distribution in the previous section, CloudFront assigned a domain name to the distribution, such as d111111abcdef8.cloudfront.net. You can use this domain name in the URLs for your content, such as: http://d111111abcdef8.cloudfront.net/logo.jpg.

Alternatively, you might prefer to use your own domain name in URLs, such as: http://example.com/logo.jpg. You can accomplish this by creating a Route 53 alias resource record set that routes dynamic web application traffic to your CloudFront distribution by using your domain name. Alias resource record sets are virtual records, specific to Route 53, that map your domain name to your CloudFront distribution. Alias resource record sets are similar to CNAME records except there is no charge for DNS queries to Route 53 alias resource record sets mapped to AWS services. Alias resource record sets are also not visible to resolvers, and they can be created for the root domain (zone apex) as well as subdomains.

A hosted zone, similar to a DNS zone file, is a collection of records that belongs to a single parent domain name. Each hosted zone has four nonoverlapping name servers in a delegation set. If a DNS query is dropped, the client automatically retries the next name server. If you have not already registered a domain name and configured a public hosted zone for your domain, complete those two prerequisite steps before proceeding.

After you have registered your domain name and configured your public hosted zone, follow these steps to create an alias resource record set (a scripted boto3 equivalent follows the steps):

  1. Sign in to the AWS Management Console and open the Route 53 console.
  2. In the navigation pane, choose Hosted Zones.
  3. Choose the name of the hosted zone for the domain that you want to use to route traffic to your CloudFront distribution.
  4. Choose Create Record Set.
  5. Specify the following values:
    • Name – Type the domain name that you want to use to route traffic to your CloudFront distribution. The default value is the name of the hosted zone. For example, if the name of the hosted zone is example.com and you want to use acme.example.com to route traffic to your distribution, type acme.
    • Type – Choose A – IPv4 address. If IPv6 is enabled for the distribution and you are creating a second resource record set, choose AAAA – IPv6 address.
    • Alias – Choose Yes.
    • Alias Target – In the CloudFront distributions section, choose the name that CloudFront assigned to the distribution when you created it.
    • Routing Policy – Accept the default value of Simple.
    • Evaluate Target Health – Accept the default value of No.
  6. Choose Create.
  7. If IPv6 is enabled for the distribution, repeat Steps 4 through 6. Specify the same settings except for the Type field, as explained in Step 5.
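
For scripted setups, the following boto3 sketch creates the same alias record. The hosted zone ID, record name, and distribution domain are placeholders; Z2FDTNDATAQYW2 is the fixed hosted zone ID that the Route 53 documentation specifies for all CloudFront alias targets:

```python
import boto3

route53 = boto3.client('route53')

route53.change_resource_record_sets(
    HostedZoneId='Z1D633PJN98FT9',  # placeholder: your hosted zone's ID
    ChangeBatch={
        'Changes': [{
            'Action': 'CREATE',
            'ResourceRecordSet': {
                'Name': 'acme.example.com',  # from Step 5's example
                'Type': 'A',                 # repeat with 'AAAA' for IPv6
                'AliasTarget': {
                    'HostedZoneId': 'Z2FDTNDATAQYW2',  # CloudFront targets
                    'DNSName': 'd111111abcdef8.cloudfront.net',
                    'EvaluateTargetHealth': False,
                },
            },
        }],
    })
```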

The following screenshot of the Route 53 console shows a Route 53 alias resource record set that is configured to map a domain name to a CloudFront distribution.

If your dynamic web application requires geo redundancy, you can use latency-based routing in Route 53 to run origin servers in different AWS regions. Route 53 is integrated with CloudFront to collect latency measurements from each edge location. With Route 53 latency-based routing, each CloudFront edge location goes to the region with the lowest latency for the origin fetch.

Enable AWS WAF

AWS WAF is a web application firewall that helps detect and mitigate web application layer DDoS attacks by inspecting traffic inline. Application layer DDoS attacks use well-formed but malicious requests to evade mitigation and consume application resources. You can define custom security rules (also called web ACLs) that contain a set of conditions, rules, and actions to block attacking traffic. After you define web ACLs, you can apply them to CloudFront distributions; the rules in a web ACL are evaluated in the priority order you specified when you configured them. Real-time metrics and sampled web requests are provided for each web ACL.

You can configure AWS WAF whitelisting or blacklisting in conjunction with CloudFront geo restriction to prevent users in specific geographic locations from accessing your application. The AWS WAF API supports security automation such as blacklisting IP addresses that exceed request limits, which can be useful for mitigating HTTP flood attacks. Use the AWS WAF Security Automations Implementation Guide to implement rate-based blacklisting.
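
As a small illustration of the kind of automation the AWS WAF API enables, this hedged sketch inserts an offending address into an existing IP match set once your rate threshold is exceeded (the IPSetId and the address are placeholders):

```python
import boto3

waf = boto3.client('waf')

def blacklist_ip(ip_set_id, cidr):
    # Every AWS WAF (classic) mutation requires a fresh change token.
    token = waf.get_change_token()['ChangeToken']
    waf.update_ip_set(
        IPSetId=ip_set_id,
        ChangeToken=token,
        Updates=[{
            'Action': 'INSERT',
            'IPSetDescriptor': {'Type': 'IPV4', 'Value': cidr},
        }])

# Hypothetical usage once a request-rate threshold is exceeded:
blacklist_ip('example-ipset-id', '192.0.2.44/32')
```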

The following diagram shows how the (a) flow of CloudFront access log files to an Amazon S3 bucket (b) provides the source data for the Lambda log parser function (c) to identify HTTP flood traffic and update AWS WAF web ACLs. As CloudFront receives requests on behalf of your dynamic web application, it sends access logs to an S3 bucket, triggering the Lambda log parser. The Lambda function parses CloudFront access logs to identify suspicious behavior, such as an unusual number of requests or errors, and it automatically updates your AWS WAF rules to block subsequent requests from the IP addresses in question for a predefined amount of time that you specify.

Diagram of the process

In addition to automated rate-based blacklisting to help protect against HTTP flood attacks, prebuilt AWS CloudFormation templates are available to simplify the configuration of AWS WAF for a proactive application-layer security defense. The following diagram provides an overview of CloudFormation template input into the creation of the CommonAttackProtection stack that includes AWS WAF web ACLs used to block, allow, or count requests that meet the criteria defined in each rule.

Diagram of CloudFormation template input into the creation of the CommonAttackProtection stack

To implement these application layer protections, follow the steps in Tutorial: Quickly Setting Up AWS WAF Protection Against Common Attacks. After you have created your AWS WAF web ACLs, you can assign them to your CloudFront distribution by updating the settings.

  1. Sign in to the AWS Management Console and open the CloudFront console.
  2. Choose the link under the ID column for your CloudFront distribution.
  3. Choose Edit under the General tab.
  4. Choose your AWS WAF Web ACL from the drop-down menu.
  5. Choose Yes, Edit.

Activate AWS Shield Advanced (optional)

Deploying CloudFront, Route 53, and AWS WAF as described in this post enables the built-in DDoS protections for your dynamic web applications that are included with AWS Shield Standard. (There is no upfront cost or charge for AWS Shield Standard beyond the normal pricing for CloudFront, Route 53, and AWS WAF.) AWS Shield Standard is designed to meet the needs of many dynamic web applications.

For dynamic web applications that have a high risk or history of frequent, complex, or high volume DDoS attacks, AWS Shield Advanced provides additional DDoS mitigation capacity, attack visibility, cost protection, and access to the AWS DDoS Response Team (DRT). For more information about AWS Shield Advanced pricing, see AWS Shield Advanced pricing. To activate advanced protection services, follow these steps:

  1. Sign in to the AWS Management Console and open the AWS WAF console.
  2. If this is your first time signing in to the AWS WAF console, choose Get started with AWS Shield Advanced. Otherwise, choose Protected resources.
  3. Choose Activate AWS Shield Advanced.
  4. Choose the resource type and resource to protect.
  5. For Name, enter a friendly name that will help you identify the AWS resources that are protected. For example, My CloudFront AWS Shield Advanced distributions.
  6. (Optional) For Web DDoS attack, select Enable. You will be prompted to associate an existing web ACL with these resources, or create a new ACL if you don’t have any yet.
  7. Choose Add DDoS protection.

Summary

In this blog post, I outline the steps to deploy CloudFront and configure Route 53 in front of your dynamic web application to leverage the global Amazon network of edge locations for DDoS resiliency. The post also provides guidance about enabling AWS WAF for application layer traffic monitoring and automated rules creation to block malicious traffic. I also cover the optional steps to activate AWS Shield Advanced, which helps build a more comprehensive defense against DDoS attacks for your dynamic web applications.

If you have comments about this post, submit them in the “Comments” section below. If you have questions about or issues implementing this solution, please open a new thread on the AWS WAF forum.

– Holly

How to Protect Your Web Application Against DDoS Attacks by Using Amazon Route 53 and an External Content Delivery Network

Post Syndicated from Shawn Marck original https://aws.amazon.com/blogs/security/how-to-protect-your-web-application-against-ddos-attacks-by-using-amazon-route-53-and-a-content-delivery-network/

Distributed Denial of Service (DDoS) attacks are attempts by a malicious actor to flood a network, system, or application with more traffic, connections, or requests than it is able to handle. To protect your web application against DDoS attacks, you can use AWS Shield, a DDoS protection service that AWS provides automatically to all AWS customers at no additional charge. You can use AWS Shield in conjunction with DDoS-resilient web services such as Amazon CloudFront and Amazon Route 53 to improve your ability to defend against DDoS attacks. Learn more about architecting for DDoS resiliency by reading the AWS Best Practices for DDoS Resiliency whitepaper.

In this blog post, I show how you can help protect the zone apex (also known as the root domain) of your web application by using Route 53 to perform a secure redirect to your externally hosted content delivery network (CDN) distribution.

Background

When browsing the Internet, a user might type example.com instead of www.example.com. To make sure these requests are routed properly, it is necessary to create a Route 53 alias resource record set for the zone apex. For example.com, this would be an alias resource record set without any subdomain (www) defined. With Route 53, you can use an alias resource record set to point www or your zone apex directly at a CloudFront distribution. As a result, anyone resolving example.com or www.example.com will see only the CloudFront distribution. This makes it difficult for a malicious actor to find and attack your application origin.

You can also use Route 53 to route end users to a CDN outside AWS. The CDN provider will ask you to create a CNAME record that points www.example.com to your CDN distribution’s hostname. Unfortunately, you cannot do the same for your zone apex, because DNS does not allow a CNAME record at the zone apex. As a result, users who type example.com without the www will not be routed to your web application unless you point the zone apex directly at your application origin.

The benefit of a secure redirect from the zone apex to www is that it helps protect your origin from being exposed to direct attacks.

Solution overview

The following solution diagram shows the AWS services this solution uses and how the solution uses them.

Diagram showing how AWS services are used in this post's solution

Here is how the process works:

  1. A user’s browser makes a DNS request to Route 53.
  2. Route 53 has a hosted zone for the example.com domain.
  3. The hosted zone serves the record:
    1. If the request is for the zone apex, the alias resource record set for the CloudFront distribution is served.
    2. If the request is for the www subdomain, the CNAME for the externally hosted CDN is served.
  4. CloudFront forwards the request to Amazon S3.
  5. S3 performs a secure redirect from example.com to www.example.com.

Note: All of the steps in this blog post’s solution use example.com as a domain name. You must replace this domain name with your own domain name.

AWS services used in this solution

You will use three AWS services in this walkthrough to build your zone apex–to–external CDN distribution redirect:

  • Route 53 – This post assumes that you are already using Route 53 to route users to your web application, which provides you with protection against common DDoS attacks, including DNS query floods. To learn more about migrating to Route 53, see Getting Started with Amazon Route 53.
  • S3 – S3 is object storage with a simple web service interface to store and retrieve any amount of data from anywhere on the web. S3 also allows you to configure a bucket for website hosting. In this walkthrough, you will use the S3 website hosting feature to redirect users from example.com to www.example.com, which points to your externally hosted CDN.
  • CloudFront – When architecting your application for DDoS resiliency, it is important to protect origin resources, such as S3 buckets, from discovery by a malicious actor. This is known as obfuscation. In this walkthrough, you will use a CloudFront distribution to obfuscate your S3 bucket.

Prerequisites

The solution in this blog post assumes that you already have the following components as part of your architecture:

  1. A Route 53 hosted zone for your domain.
  2. A CNAME record pointing to your CDN.

Deploy the solution

In this solution, you:

  1. Create an S3 bucket with HTTP redirection. This allows requests made to your zone apex to be redirected to your www subdomain.
  2. Create and configure a CloudFront web distribution. I use a CloudFront distribution in front of my S3 web redirect so that I can leverage the advanced DDoS protection and scale that is native to CloudFront.
  3. Configure an alias resource record set in your hosted zone. Alias resource record sets are similar to CNAME records, but you can set them at the zone apex.
  4. Validate that the redirect is working.

Step 1: Create an S3 bucket with HTTP redirection

The following steps show how to configure your S3 bucket as a static website that will perform HTTP redirects to your www URL (a boto3 sketch of the same configuration follows these steps):

  1. Open the AWS Management Console. Navigate to the S3 console and create an S3 bucket in the region of your choice.
  2. Configure static website hosting to redirect all requests to another host name:
    1. Choose the S3 bucket you just created and then choose Properties.
      Screenshot showing choosing the S3 bucket and the Properties button
    2. Choose Static Website Hosting.
      Screenshot of choosing Static Website Hosting
    3. Choose Redirect all requests to another host name, and type the www hostname for your domain (for this walkthrough, www.example.com) in the Redirect all requests to box, as shown in the following screenshot.
      Screenshot of Static Website Hosting settings to choose

Note: At the top of this tab, you will see an endpoint. Copy the endpoint because you will need it in Step 2 when you configure the CloudFront distribution. In this example, the endpoint is example-com.s3-website-us-east-1.amazonaws.com.
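
If you script your infrastructure, a minimal boto3 sketch of the same bucket configuration might look like this (the bucket name is a placeholder, and the redirect target matches the walkthrough):

```python
import boto3

s3 = boto3.client('s3')

bucket = 'example-com'  # placeholder bucket name
# In us-east-1 no LocationConstraint is needed; other regions require one.
s3.create_bucket(Bucket=bucket)
s3.put_bucket_website(
    Bucket=bucket,
    WebsiteConfiguration={
        # Redirect every request to the www hostname served by the CDN.
        'RedirectAllRequestsTo': {
            'HostName': 'www.example.com',
            'Protocol': 'http',
        }
    })
```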

Step 2: Create and configure a CloudFront web distribution

The following steps show how to create a CloudFront web distribution that protects the S3 bucket:

  1. From the AWS Management Console, choose CloudFront.
  2. On the first page of the Create Distribution Wizard, in the Web section, choose Get Started.
  3. The Create Distribution page has many values you can specify. For this walkthrough, you need to specify only two settings:
    1. Origin Settings:
      • Origin Domain Name – When you click in this box, a menu appears with AWS resources you can choose. Choose the S3 bucket you created in Step 1, or paste the endpoint URL you copied in Step 1. In this example, the endpoint is example-com.s3-website-us-east-1.amazonaws.com.
        Screenshot of Origin Domain Name
    2. Distribution Settings:
      • Alternate Domain Names (CNAMEs) – Type your zone apex (root domain; for this walkthrough, it is example.com).
        Screenshot of Alternate Domain Names
  4. Click Create Distribution.
  5. Wait for the CloudFront distribution to deploy completely before proceeding to Step 3. After CloudFront creates your distribution, the value of the Status column for your distribution will change from InProgress to Deployed. The distribution is then ready to process requests.

Step 3: Configure an alias resource record set in your hosted zone

In this step, you use Route 53 to configure an alias resource record set for your zone apex that resolves to the CloudFront distribution you made in Step 2:

  1. From the AWS Management Console, choose Route 53 and choose Hosted zones.
  2. On the Hosted zones page, choose your domain. This takes you to the Record sets page.
    Screenshot of choosing the domain on the Hosted zones page
  3. Click Create Record Set.
  4. Leave the Name box blank and choose Alias: Yes.
  5. Click the Alias Target box, and choose the CloudFront distribution you created in Step 2. If the distribution does not appear in the list automatically, you can copy and paste the name exactly as it appears in the CloudFront console.
  6. Click Create.
    Screenshot of creating the record set

Step 4: Validate that the redirect is working

To confirm that you have correctly configured all components of this solution and your zone apex is redirecting to the www domain as expected, open a browser and navigate to your zone apex. In this walkthrough, the zone apex is http://example.com and it should redirect automatically to http://www.example.com.
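
You can also check the redirect from the command line. This small Python sketch requests the zone apex without following redirects so you can see the 301 that the S3 website endpoint returns through CloudFront:

```python
import http.client

# Request the zone apex and inspect the redirect instead of following it.
conn = http.client.HTTPConnection('example.com')
conn.request('HEAD', '/')
response = conn.getresponse()
print(response.status, response.getheader('Location'))
# Expected (once DNS has propagated): 301 http://www.example.com/
```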

Summary

In this post, I showed how you can help protect your web application against DDoS attacks by using Route 53 to perform a secure redirect to your externally hosted CDN distribution. This helps protect your origin from being exposed to direct DDoS attacks.

If you have comments about this blog post, submit them in the “Comments” section below. If you have questions about implementing the solution in this blog post, start a new thread in the Route 53 forum.

– Shawn

AWS Hot Startups – February 2017

Post Syndicated from Ana Visneski original https://aws.amazon.com/blogs/aws/aws-hot-startups-february-2017-2/

As we finish up the month of February, Tina Barr is back with some awesome startups.

-Ana


This month we are bringing you five innovative hot startups:

  • GumGum – Creating and popularizing the field of in-image advertising.
  • Jiobit – Smart tags to help parents keep track of kids.
  • Parsec – Offers flexibility in hardware and location for PC gamers.
  • Peloton – Revolutionizing indoor cycling and fitness classes at home.
  • Tendril – Reducing energy consumption for homeowners.

If you missed any of our January startups, make sure to check them out here.

GumGum (Santa Monica, CA)
GumGum is best known for inventing and popularizing the field of in-image advertising. Founded in 2008 by Ophir Tanz, the company is on a mission to unlock the value held within the vast content produced daily via social media, editorials, and broadcasts in a variety of industries. GumGum powers campaigns across more than 2,000 premium publishers, which are seen by over 400 million users.

In-image advertising was pioneered by GumGum and has given companies a platform to deliver highly visible ads to a place where the consumer’s attention is already focused. Using image recognition technology, GumGum delivers targeted placements as contextual overlays on related pictures, as banners that fit on all screen sizes, or as In-Feed placements that blend seamlessly into the surrounding content. Using Visual Intelligence, GumGum can scour social media and broadcast TV for all images and videos related to a brand, allowing companies to gain a stronger understanding of their audience and how they are relating to that brand on social media.

GumGum relies on AWS for its Image Processing and Ad Serving operations. Using AWS infrastructure, GumGum currently processes 13 million requests per minute across the globe and generates 30 TB of new data every day. The company uses a suite of services including but not limited to Amazon EC2, Amazon S3, Amazon Kinesis, Amazon EMR, AWS Data Pipeline, and Amazon SNS. AWS edge locations allow GumGum to serve its customers in the US, Europe, Australia, and Japan, and the company plans to expand its infrastructure across the APAC region in the future.

For a look inside GumGum’s startup culture, check out their first Hackathon!

Jiobit (Chicago, IL)
Jiobit was inspired by a real event that took place in a crowded Chicago park. A couple of summers ago, John Renaldi experienced every parent’s worst nightmare – he lost track of his then 6-year-old son in a public park for almost 30 minutes. John knew he wasn’t the only parent with this problem. After months of research, he determined that over 50% of parents have had a similar experience and an even greater percentage are actively looking for a way to prevent it.

Jiobit is the world’s smallest and longest lasting smart tag that helps parents keep track of their kids in every location – indoors and outdoors. The small device is kid-proof: lightweight, durable, and waterproof. It acts as a virtual “safety harness” as it uses a combination of Bluetooth, Wi-Fi, Multiple Cellular Networks, GPS, and sensors to provide accurate locations in real-time. Jiobit can automatically learn routes and locations, and will send parents an alert if their child does not arrive at their destination on time. The talented team of experienced engineers, designers, marketers, and parents has over 150 patents and has shipped dozens of hardware and software products worldwide.

The Jiobit team is utilizing a number of AWS services in the development of their product. Security is critical to the overall product experience, and they are over-engineering security on both the hardware and software side with the help of AWS. Jiobit is also working towards being the first child monitoring device that will have implemented an Alexa Skill via the Amazon Echo device (see here for a demo!). The devices use AWS IoT to send and receive data from the Jio Cloud over the MQTT protocol. Once data is received, they use AWS Lambda to parse the received data and take appropriate actions, including storing relevant data using Amazon DynamoDB, and sending location data to Amazon Machine Learning processing jobs.

Visit the Jiobit blog for more information.

Parsec (New York, NY)
Parsec operates under the notion that everyone should have access to the best computing in the world because access to technology creates endless opportunities. Founded in 2016 by Benjy Boxer and Chris Dickson, Parsec aims to eliminate the burden of hardware upgrades that users frequently experience by building the technology to make a computer in the cloud available anywhere, at any time. Today, they are using their technology to enable greater flexibility in the hardware and location that PC gamers choose to play their favorite games on. Check out this interview with Benjy and our Startups team for a look at how Parsec works.

Parsec built their first product to improve the gaming experience; gamers no longer have to purchase consoles or expensive PCs to access the entertainment they love. Their low latency video streaming and networking technologies allow gamers to remotely access their gaming rig and play on any Windows, Mac, Android, or Raspberry Pi device. With the global reach of AWS, Parsec is able to deliver cloud gaming to the median user in the US and Europe with less than 30 milliseconds of network latency.

Parsec users currently have two options available to start gaming with cloud resources. They can either set up their own machines with the Parsec AMI in their region or rely on Parsec to manage everything for a seamless experience. In either case, Parsec uses the g2.2xlarge EC2 instance type. Parsec is using Amazon Elastic Block Store to store games, Amazon DynamoDB for scalability, and Amazon EC2 for its web servers and various APIs. They also deal with a high volume of logs and take advantage of the Amazon Elasticsearch Service to analyze the data.

Be sure to check out Parsec’s blog to keep up with the latest news.

Peloton (New York, NY)
The idea for Peloton was born in 2012 when John Foley, Founder and CEO, and his wife Jill started realizing the challenge of balancing work, raising young children, and keeping up with personal fitness. This is a common challenge people face – they want to work out, but there are a lot of obstacles that stand in their way. Peloton offers a solution that enables people to join indoor cycling and fitness classes anywhere, anytime.

Peloton has created a cutting-edge indoor bike that streams up to 14 hours of live classes daily and has over 4,000 on-demand classes. Users can access live classes from world-class instructors from the convenience of their home or gym. The bike tracks progress with in-depth ride metrics and allows people to compete in real-time with other users who have taken a specific ride. The live classes even feature top DJs that play current playlists to keep users motivated.

With an aggressive marketing campaign, which has included high-visibility TV advertising, Peloton made the decision to run its entire platform in the cloud. Most recently, they ran an ad during an NFL playoff game and their rate of requests per minute to their site increased from ~2k/min to ~32.2k/min within 60 seconds. As they continue to grow and diversify, they are utilizing services such as Amazon S3 for thousands of hours of archived on-demand video content, Amazon Redshift for data warehousing, and Application Load Balancer for intelligent request routing.

Learn more about Peloton’s engineering team here.

Tendril (Denver, CO)
Tendril was founded in 2004 with the goal of helping homeowners better manage and reduce their energy consumption. Today, electric and gas utilities use Tendril’s data analytics platform on more than 140 million homes to deliver a personalized energy experience for consumers around the world. Using the latest technology in decision science and analytics, Tendril can gain access to real-time, ever-evolving data about energy consumers and their homes so they can improve customer acquisition, increase engagement, and orchestrate home energy experiences. In turn, Tendril helps its customers unlock the true value of energy interactions.

AWS helps Tendril run its services globally, while scaling capacity up and down as needed, and in real-time. This has been especially important in support of Tendril’s newest solution, Orchestrated Energy, a continuous demand management platform that calculates a home’s thermal mass, predicts consumer behavior, and integrates with smart thermostats and other connected home devices. This solution allows millions of consumers to create a personalized energy plan for their home based on their individual needs.

Tendril builds and maintains most of its infrastructure services with open source tools running on Amazon EC2 instances, while also making use of AWS services such as Elastic Load Balancing, Amazon API Gateway, Amazon CloudFront, Amazon Route 53, Amazon Simple Queue Service, and Amazon RDS for PostgreSQL.

Visit the Tendril Blog for more information!

— Tina Barr

AWS Week in Review – February 20, 2017

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-week-in-review-february-20-2017/

By popular demand, I am producing this “micro” version of the AWS Week in Review. I have included all of our announcements, content from all of our blogs, and as much community-generated AWS content as I had time for. Going forward I hope to bring back the other sections, as soon as I get my tooling and automation into better shape.

Monday

February 20

Tuesday

February 21

Wednesday

February 22

Thursday

February 23

Friday

February 24

Saturday

February 25

Jeff;


AWS Hot Startups- January 2017

Post Syndicated from Ana Visneski original https://aws.amazon.com/blogs/aws/aws-hot-startups-january-2017-2/

It is the start of a new year and Tina Barr is back with many more great new startups to check out.
-Ana


Welcome back to another year of hot AWS-powered startups! We have three exciting new startups today:

  • ClassDojo – Connecting teachers, students, and parents to the classroom.
  • Nubank – A financial services startup reimagining the banking experience.
  • Ravelin – A fraud detection company built on machine learning models.

If you missed any of last year’s featured startups, be sure to check out our Year in Review.

ClassDojo (San Francisco)
Founded in 2011 by Liam Don and Sam Chaudhary, ClassDojo is a communication platform for the classroom. Teachers, parents, and students can use it throughout the day as a place to share important moments through photos, videos and messaging. With many classrooms today operating as a one-size-fits-all model, the ClassDojo founders wanted to improve the education system and connect the 700 million primary age kids in the world to the very best content and services. Sam and Liam started out by asking teachers what they would find most helpful for their classrooms, and many expressed that they wanted a more caring and inclusive community – one where they could be connected to everyone who was part of their classroom. With ClassDojo, teachers are able to create their own classroom culture in partnership with students and their parents.

In five years, ClassDojo has expanded to 90% of K-8 schools in the US and 180 other countries, and their content has been translated into over 35 languages. Recently, they have expanded further into classrooms with video series on Empathy and Growth Mindset that were co-created with Harvard and Stanford. These videos have now been seen by 1 in 3 kids under the age of 14 in the U.S. One of their products called Stories allows for instantly updated streams of pictures and videos from the school day, all of which are shared at home with parents. Students can even create their own stories – a timeline or portfolio of what they’ve learned.

Because ClassDojo sees heavy usage during the school day and across many global time zones, their traffic patterns are highly variable. Amazon EC2 autoscaling allows them to meet demand while controlling costs during quieter periods. Their data pipeline is built entirely on AWS – Amazon Kinesis allows them to stream high volumes of data into Amazon Redshift for analysis and into Amazon S3 for archival. They also utilize Amazon Aurora and Amazon RDS to store sensitive relational data, which makes at-rest encryption easy to manage, while scaling to meet very high query volumes with incredibly low latency. All of ClassDojo’s web frontends are hosted on Amazon S3 and served through Amazon CloudFront, and they use AWS WAF rules to secure their frontends against attacks and unauthorized access. To detect fraudulent accounts they have used Amazon Machine Learning, and are also exploring the new Amazon Lex service to provide voice control so that teachers can use their products hands-free in the classroom.

Check out their blog to see how teachers across the world are using ClassDojo in their classrooms!

Nubank (Brazil)
Nubank is a technology-driven financial services startup that is working to redefine the banking standard in Brazil. Founder David Vélez and a team of over 350 engineers, scientists, designers, and analysts have created a banking alternative in one of the world’s fastest growing mobile markets. Not only is Brazil the world’s 5th largest country in both area and population, but it also has one of the highest credit card interest rates in the world. Nubank has reimagined the credit card experience for a world where everyone has access to smartphones and offers a product customers haven’t seen before.

The Brazilian banking industry is both heavily regulated and extremely concentrated. Nubank saw an opportunity for companies that are truly customer-centric and have better data and technology to compete in an industry that has seen little innovation in decades. With Nubank’s mobile app customers are able to block and unblock their credit cards, change their credit limits, pay their bills, and have access to all of their purchases in real time. They also offer 24/7 customer support through digital channels and clear and simple communication. This was previously unheard of in Brazil’s banking industry, and Nubank’s services have been extremely well-received by customers.

From the start, Nubank’s leaders planned for growth. They wanted to build a system that could meet the ever changing regulatory and business rules, have full auditing capability and scale in both size and complexity. They use many AWS services including Amazon DynamoDB, Amazon EC2, Amazon S3, and AWS CloudFormation. By using AWS, Nubank developed its credit card processing platform in only seven months and are able to add features with ease.

Go to Nubank’s blog for more information!

Ravelin (London)
Launched in 2015, Ravelin is a fraud detection company that works with many leading e-commerce and on-demand companies in a range of sectors including travel, retail, food delivery, ticketing, and transport. The company’s founders (Martin Sweeney, Leonard Austin, Mairtain O’Riada, and Nicky Lally) began their work while trying to solve fraud issues in an on-demand taxi business, which required accurate fraud predictions about a customer with limited information and then making that fraud decision almost instantly. They soon found that there was nothing on the market that was able to do this, and so the founders left to start Ravelin.

Ravelin allows its clients to spend less time on manual reviews and instead focus on servicing their customers. Their machine learning models are built to predict good and bad behavior based on the relevant customer behavioral and payment data sent via API. Spotting bad behavior helps Ravelin to prevent fraud, and equally importantly, spotting good patterns means fewer good customers are being blocked. Ravelin chose machine learning as their core technology due to its incredible accuracy at a speed and scale that aligns with how their clients’ businesses operate.

Ravelin uses a suite of AWS services to help their machine learning algorithms detect fraud. Their clients are spread all over the world and their peak traffic times can be unpredictable so they scale their Amazon EC2 infrastructure multiple times a day, which helps with handling increased traffic while minimizing server costs. Ravelin also uses services such as Amazon RDS, Amazon DynamoDB, Amazon ElastiCache, and Amazon Elasticsearch Service. Utilizing these services has allowed the Ravelin team more time to concentrate on building fraud detection software.

For the latest in fraud prevention, be sure to check out Ravelin’s blog!

-Tina Barr

AWS IPv6 Update – Global Support Spanning 15 Regions & Multiple AWS Services

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-ipv6-update-global-support-spanning-15-regions-multiple-aws-services/

We’ve been working to add IPv6 support to many different parts of AWS over the last couple of years, starting with Elastic Load Balancing, AWS IoT, Amazon Route 53, Amazon CloudFront, AWS WAF, and S3 Transfer Acceleration, all building up to last month’s announcement of IPv6 support for EC2 instances in Virtual Private Clouds (initially available for use in the US East (Ohio) Region).

Today I am happy to share the news that IPv6 support for EC2 instances in VPCs is now available in a total of fifteen regions, along with Application Load Balancer support for IPv6 in nine of those regions.

You can now build and deploy applications that can use IPv6 addresses to communicate with servers, object storage, load balancers, and content distribution services. In accord with the latest guidelines for IPv6 support from Apple and other vendors, your mobile applications can now make use of IPv6 addresses when they communicate with AWS.

IPv6 Now in 15 Regions
IPv6 support for EC2 instances in new and existing VPCs is now available in the US East (Northern Virginia), US East (Ohio), US West (Northern California), US West (Oregon), South America (São Paulo), Canada (Central), EU (Ireland), EU (Frankfurt), EU (London), Asia Pacific (Tokyo), Asia Pacific (Singapore), Asia Pacific (Seoul), Asia Pacific (Sydney), Asia Pacific (Mumbai), and AWS GovCloud (US) Regions and you can start using it today!

You can enable IPv6 from the AWS Management Console when you create a new VPC:

Application Load Balancer
Application Load Balancers in the US East (Northern Virginia), US West (Northern California), US West (Oregon), South America (São Paulo), EU (Ireland), Asia Pacific (Tokyo), Asia Pacific (Singapore), Asia Pacific (Sydney), and AWS GovCloud (US) Regions now support IPv6 in dual-stack mode, making them accessible via IPv4 or IPv6 (we expect to add support for the remaining regions within a few weeks).

Simply enable the dualstack option when you configure the ALB and then make sure that your security groups allow or deny IPv6 traffic in accord with your requirements. Here’s how you select the dualstack option:

You can also enable this option by running the set-ip-address-type command or by making a call to the SetIpAddressType function. To learn more about this new feature, read the Load Balancer Address Type documentation.
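
For reference, a one-call boto3 sketch of the same change (the load balancer ARN is a placeholder):

```python
import boto3

elbv2 = boto3.client('elbv2')

# Switch an existing Application Load Balancer to dual-stack mode.
elbv2.set_ip_address_type(
    LoadBalancerArn='arn:aws:elasticloadbalancing:us-east-1:123456789012:'
                    'loadbalancer/app/my-alb/50dc6c495c0c9188',
    IpAddressType='dualstack')
```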

IPv6 Recap
Here are the IPv6 launches that we made in the run-up to the launch of IPv6 support for EC2 instances in VPCs:

CloudFront, WAF, and S3 Transfer Acceleration – This launch let you enable IPv6 support for individual CloudFront distributions. Newly created distributions supported IPv6 by default and existing distributions could be upgraded with a couple of clicks (if you are using Route 53 alias records, you also need to add an AAAA record to the domain). With IPv6 support enabled, the new addresses will show up in the CloudFront Access Logs. The launch also let you use AWS WAF to inspect requests that arrive via IPv4 or IPv6 addresses and to use a new, dual-stack endpoint for S3 Transfer Acceleration.

Route 53 – This launch added support for DNS queries over IPv6 (support for the requisite AAAA records was already in place). A subsequent launch added support for Health Checks of IPv6 Endpoints, allowing you to monitor the health of the endpoints and to arrange for DNS failover.

IoT – This product launch included IPv6 support for message exchange between devices and AWS IoT.

S3 – This launch added support for access to S3 buckets via dual-stack endpoints.

Elastic Load Balancing – This launch added publicly routable IPv6 addresses for Elastic Load Balancers.

Jeff;


AWS Web Application Firewall (WAF) for Application Load Balancers

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-web-application-firewall-waf-for-application-load-balancers/

I’m still catching up on a couple of launches that we made late last year!

Today’s post covers two services that I’ve written about in the past — AWS Web Application Firewall (WAF) and AWS Application Load Balancer:

AWS Web Application Firewall (WAF) – Helps to protect your web applications from common application-layer exploits that can affect availability or consume excessive resources. As you can see in my post (New – AWS WAF), WAF allows you to use access control lists (ACLs), rules, and conditions that define acceptable or unacceptable requests or IP addresses. You can selectively allow or deny access to specific parts of your web application and you can also guard against various SQL injection attacks. We launched WAF with support for Amazon CloudFront.

AWS Application Load Balancer (ALB) – This load balancing option for the Elastic Load Balancing service runs at the application layer. It allows you to define routing rules that are based on content that can span multiple containers or EC2 instances. Application Load Balancers support HTTP/2 and WebSocket, and give you additional visibility into the health of the target containers and instances (to learn more, read New – AWS Application Load Balancer).

Better Together
Late last year (I told you I am still catching up), we announced that WAF can now help to protect applications that are running behind an Application Load Balancer. You can set this up pretty quickly and you can protect both internal and external applications and web services.

I already have three EC2 instances behind an ALB:

I simply create a Web ACL in the same region and associate it with the ALB. I begin by naming the Web ACL. I also instruct WAF to publish to a designated CloudWatch metric:

Then I add any desired conditions to my Web ACL:

For example, I can easily set up several SQL injection filters for the query string:

After I create the filter I use it to create a rule:

And then I use the rule to block requests that match the condition:

To pull it all together I review my settings and then create the Web ACL:

Seconds after I click on Confirm and create, the new rule is active and WAF is protecting the application behind my ALB:

And that’s all it takes to use WAF to protect the EC2 instances and containers that are running behind an Application Load Balancer!
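
If you'd rather script the final association, here is a minimal boto3 sketch using the regional WAF API (both identifiers are placeholders; the web ACL must already exist in the same region as the ALB):

```python
import boto3

waf_regional = boto3.client('waf-regional')

# Associate an existing regional web ACL with the Application Load Balancer.
waf_regional.associate_web_acl(
    WebACLId='a1b2c3d4-example-web-acl-id',
    ResourceArn='arn:aws:elasticloadbalancing:us-east-1:123456789012:'
                'loadbalancer/app/my-alb/50dc6c495c0c9188')
```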

Learn More
To learn more about how to use WAF and ALB together, plan to attend the Secure Your Web Applications Using AWS WAF and Application Load Balancer webinar at 10 AM PT on January 26th.

You may also find the Secure Your Web Application With AWS WAF and Amazon CloudFront presentation from re:Invent to be of interest.

Jeff;