Tag Archives: bucket access

Implementing Default Directory Indexes in Amazon S3-backed Amazon CloudFront Origins Using [email protected]

Post Syndicated from Ronnie Eichler original https://aws.amazon.com/blogs/compute/implementing-default-directory-indexes-in-amazon-s3-backed-amazon-cloudfront-origins-using-lambdaedge/

With the recent launch of [email protected], it’s now possible for you to provide even more robust functionality to your static websites. Amazon CloudFront is a content distribution network service. In this post, I show how you can use [email protected] along with the CloudFront origin access identity (OAI) for Amazon S3 and still provide simple URLs (such as www.example.com/about/ instead of www.example.com/about/index.html).

Background

Amazon S3 is a great platform for hosting a static website. You don’t need to worry about managing servers or underlying infrastructure—you just publish your static to content to an S3 bucket. S3 provides a DNS name such as <bucket-name>.s3-website-<AWS-region>.amazonaws.com. Use this name for your website by creating a CNAME record in your domain’s DNS environment (or Amazon Route 53) as follows:

www.example.com -> <bucket-name>.s3-website-<AWS-region>.amazonaws.com

You can also put CloudFront in front of S3 to further scale the performance of your site and cache the content closer to your users. CloudFront can enable HTTPS-hosted sites, by either using a custom Secure Sockets Layer (SSL) certificate or a managed certificate from AWS Certificate Manager. In addition, CloudFront also offers integration with AWS WAF, a web application firewall. As you can see, it’s possible to achieve some robust functionality by using S3, CloudFront, and other managed services and not have to worry about maintaining underlying infrastructure.

One of the key concerns that you might have when implementing any type of WAF or CDN is that you want to force your users to go through the CDN. If you implement CloudFront in front of S3, you can achieve this by using an OAI. However, in order to do this, you cannot use the HTTP endpoint that is exposed by S3’s static website hosting feature. Instead, CloudFront must use the S3 REST endpoint to fetch content from your origin so that the request can be authenticated using the OAI. This presents some challenges in that the REST endpoint does not support redirection to a default index page.

CloudFront does allow you to specify a default root object (index.html), but it only works on the root of the website (such as http://www.example.com > http://www.example.com/index.html). It does not work on any subdirectory (such as http://www.example.com/about/). If you were to attempt to request this URL through CloudFront, CloudFront would do a S3 GetObject API call against a key that does not exist.

Of course, it is a bad user experience to expect users to always type index.html at the end of every URL (or even know that it should be there). Until now, there has not been an easy way to provide these simpler URLs (equivalent to the DirectoryIndex Directive in an Apache Web Server configuration) to users through CloudFront. Not if you still want to be able to restrict access to the S3 origin using an OAI. However, with the release of [email protected], you can use a JavaScript function running on the CloudFront edge nodes to look for these patterns and request the appropriate object key from the S3 origin.

Solution

In this example, you use the compute power at the CloudFront edge to inspect the request as it’s coming in from the client. Then re-write the request so that CloudFront requests a default index object (index.html in this case) for any request URI that ends in ‘/’.

When a request is made against a web server, the client specifies the object to obtain in the request. You can use this URI and apply a regular expression to it so that these URIs get resolved to a default index object before CloudFront requests the object from the origin. Use the following code:

'use strict';
exports.handler = (event, context, callback) => {
    
    // Extract the request from the CloudFront event that is sent to [email protected] 
    var request = event.Records[0].cf.request;

    // Extract the URI from the request
    var olduri = request.uri;

    // Match any '/' that occurs at the end of a URI. Replace it with a default index
    var newuri = olduri.replace(/\/$/, '\/index.html');
    
    // Log the URI as received by CloudFront and the new URI to be used to fetch from origin
    console.log("Old URI: " + olduri);
    console.log("New URI: " + newuri);
    
    // Replace the received URI with the URI that includes the index page
    request.uri = newuri;
    
    // Return to CloudFront
    return callback(null, request);

};

To get started, create an S3 bucket to be the origin for CloudFront:

Create bucket

On the other screens, you can just accept the defaults for the purposes of this walkthrough. If this were a production implementation, I would recommend enabling bucket logging and specifying an existing S3 bucket as the destination for access logs. These logs can be useful if you need to troubleshoot issues with your S3 access.

Now, put some content into your S3 bucket. For this walkthrough, create two simple webpages to demonstrate the functionality:  A page that resides at the website root, and another that is in a subdirectory.

<s3bucketname>/index.html

<!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title>Root home page</title>
    </head>
    <body>
        <p>Hello, this page resides in the root directory.</p>
    </body>
</html>

<s3bucketname>/subdirectory/index.html

<!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title>Subdirectory home page</title>
    </head>
    <body>
        <p>Hello, this page resides in the /subdirectory/ directory.</p>
    </body>
</html>

When uploading the files into S3, you can accept the defaults. You add a bucket policy as part of the CloudFront distribution creation that allows CloudFront to access the S3 origin. You should now have an S3 bucket that looks like the following:

Root of bucket

Subdirectory in bucket

Next, create a CloudFront distribution that your users will use to access the content. Open the CloudFront console, and choose Create Distribution. For Select a delivery method for your content, under Web, choose Get Started.

On the next screen, you set up the distribution. Below are the options to configure:

  • Origin Domain Name:  Select the S3 bucket that you created earlier.
  • Restrict Bucket Access: Choose Yes.
  • Origin Access Identity: Create a new identity.
  • Grant Read Permissions on Bucket: Choose Yes, Update Bucket Policy.
  • Object Caching: Choose Customize (I am changing the behavior to avoid having CloudFront cache objects, as this could affect your ability to troubleshoot while implementing the Lambda code).
    • Minimum TTL: 0
    • Maximum TTL: 0
    • Default TTL: 0

You can accept all of the other defaults. Again, this is a proof-of-concept exercise. After you are comfortable that the CloudFront distribution is working properly with the origin and Lambda code, you can re-visit the preceding values and make changes before implementing it in production.

CloudFront distributions can take several minutes to deploy (because the changes have to propagate out to all of the edge locations). After that’s done, test the functionality of the S3-backed static website. Looking at the distribution, you can see that CloudFront assigns a domain name:

CloudFront Distribution Settings

Try to access the website using a combination of various URLs:

http://<domainname>/:  Works

› curl -v http://d3gt20ea1hllb.cloudfront.net/
*   Trying 54.192.192.214...
* TCP_NODELAY set
* Connected to d3gt20ea1hllb.cloudfront.net (54.192.192.214) port 80 (#0)
> GET / HTTP/1.1
> Host: d3gt20ea1hllb.cloudfront.net
> User-Agent: curl/7.51.0
> Accept: */*
>
< HTTP/1.1 200 OK
< ETag: "cb7e2634fe66c1fd395cf868087dd3b9"
< Accept-Ranges: bytes
< Server: AmazonS3
< X-Cache: Miss from cloudfront
< X-Amz-Cf-Id: -D2FSRwzfcwyKZKFZr6DqYFkIf4t7HdGw2MkUF5sE6YFDxRJgi0R1g==
< Content-Length: 209
< Content-Type: text/html
< Last-Modified: Wed, 19 Jul 2017 19:21:16 GMT
< Via: 1.1 6419ba8f3bd94b651d416054d9416f1e.cloudfront.net (CloudFront), 1.1 iad6-proxy-3.amazon.com:80 (Cisco-WSA/9.1.2-010)
< Connection: keep-alive
<
<!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title>Root home page</title>
    </head>
    <body>
        <p>Hello, this page resides in the root directory.</p>
    </body>
</html>
* Curl_http_done: called premature == 0
* Connection #0 to host d3gt20ea1hllb.cloudfront.net left intact

This is because CloudFront is configured to request a default root object (index.html) from the origin.

http://<domainname>/subdirectory/:  Doesn’t work

› curl -v http://d3gt20ea1hllb.cloudfront.net/subdirectory/
*   Trying 54.192.192.214...
* TCP_NODELAY set
* Connected to d3gt20ea1hllb.cloudfront.net (54.192.192.214) port 80 (#0)
> GET /subdirectory/ HTTP/1.1
> Host: d3gt20ea1hllb.cloudfront.net
> User-Agent: curl/7.51.0
> Accept: */*
>
< HTTP/1.1 200 OK
< ETag: "d41d8cd98f00b204e9800998ecf8427e"
< x-amz-server-side-encryption: AES256
< Accept-Ranges: bytes
< Server: AmazonS3
< X-Cache: Miss from cloudfront
< X-Amz-Cf-Id: Iqf0Gy8hJLiW-9tOAdSFPkL7vCWBrgm3-1ly5tBeY_izU82ftipodA==
< Content-Length: 0
< Content-Type: application/x-directory
< Last-Modified: Wed, 19 Jul 2017 19:21:24 GMT
< Via: 1.1 6419ba8f3bd94b651d416054d9416f1e.cloudfront.net (CloudFront), 1.1 iad6-proxy-3.amazon.com:80 (Cisco-WSA/9.1.2-010)
< Connection: keep-alive
<
* Curl_http_done: called premature == 0
* Connection #0 to host d3gt20ea1hllb.cloudfront.net left intact

If you use a tool such like cURL to test this, you notice that CloudFront and S3 are returning a blank response. The reason for this is that the subdirectory does exist, but it does not resolve to an S3 object. Keep in mind that S3 is an object store, so there are no real directories. User interfaces such as the S3 console present a hierarchical view of a bucket with folders based on the presence of forward slashes, but behind the scenes the bucket is just a collection of keys that represent stored objects.

http://<domainname>/subdirectory/index.html:  Works

› curl -v http://d3gt20ea1hllb.cloudfront.net/subdirectory/index.html
*   Trying 54.192.192.130...
* TCP_NODELAY set
* Connected to d3gt20ea1hllb.cloudfront.net (54.192.192.130) port 80 (#0)
> GET /subdirectory/index.html HTTP/1.1
> Host: d3gt20ea1hllb.cloudfront.net
> User-Agent: curl/7.51.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 20 Jul 2017 20:35:15 GMT
< ETag: "ddf87c487acf7cef9d50418f0f8f8dae"
< Accept-Ranges: bytes
< Server: AmazonS3
< X-Cache: RefreshHit from cloudfront
< X-Amz-Cf-Id: bkh6opXdpw8pUomqG3Qr3UcjnZL8axxOH82Lh0OOcx48uJKc_Dc3Cg==
< Content-Length: 227
< Content-Type: text/html
< Last-Modified: Wed, 19 Jul 2017 19:21:45 GMT
< Via: 1.1 3f2788d309d30f41de96da6f931d4ede.cloudfront.net (CloudFront), 1.1 iad6-proxy-3.amazon.com:80 (Cisco-WSA/9.1.2-010)
< Connection: keep-alive
<
<!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title>Subdirectory home page</title>
    </head>
    <body>
        <p>Hello, this page resides in the /subdirectory/ directory.</p>
    </body>
</html>
* Curl_http_done: called premature == 0
* Connection #0 to host d3gt20ea1hllb.cloudfront.net left intact

This request works as expected because you are referencing the object directly. Now, you implement the [email protected] function to return the default index.html page for any subdirectory. Looking at the example JavaScript code, here’s where the magic happens:

var newuri = olduri.replace(/\/$/, '\/index.html');

You are going to use a JavaScript regular expression to match any ‘/’ that occurs at the end of the URI and replace it with ‘/index.html’. This is the equivalent to what S3 does on its own with static website hosting. However, as I mentioned earlier, you can’t rely on this if you want to use a policy on the bucket to restrict it so that users must access the bucket through CloudFront. That way, all requests to the S3 bucket must be authenticated using the S3 REST API. Because of this, you implement a [email protected] function that takes any client request ending in ‘/’ and append a default ‘index.html’ to the request before requesting the object from the origin.

In the Lambda console, choose Create function. On the next screen, skip the blueprint selection and choose Author from scratch, as you’ll use the sample code provided.

Next, configure the trigger. Choosing the empty box shows a list of available triggers. Choose CloudFront and select your CloudFront distribution ID (created earlier). For this example, leave Cache Behavior as * and CloudFront Event as Origin Request. Select the Enable trigger and replicate box and choose Next.

Lambda Trigger

Next, give the function a name and a description. Then, copy and paste the following code:

'use strict';
exports.handler = (event, context, callback) => {
    
    // Extract the request from the CloudFront event that is sent to [email protected] 
    var request = event.Records[0].cf.request;

    // Extract the URI from the request
    var olduri = request.uri;

    // Match any '/' that occurs at the end of a URI. Replace it with a default index
    var newuri = olduri.replace(/\/$/, '\/index.html');
    
    // Log the URI as received by CloudFront and the new URI to be used to fetch from origin
    console.log("Old URI: " + olduri);
    console.log("New URI: " + newuri);
    
    // Replace the received URI with the URI that includes the index page
    request.uri = newuri;
    
    // Return to CloudFront
    return callback(null, request);

};

Next, define a role that grants permissions to the Lambda function. For this example, choose Create new role from template, Basic Edge Lambda permissions. This creates a new IAM role for the Lambda function and grants the following permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:*:*:*"
            ]
        }
    ]
}

In a nutshell, these are the permissions that the function needs to create the necessary CloudWatch log group and log stream, and to put the log events so that the function is able to write logs when it executes.

After the function has been created, you can go back to the browser (or cURL) and re-run the test for the subdirectory request that failed previously:

› curl -v http://d3gt20ea1hllb.cloudfront.net/subdirectory/
*   Trying 54.192.192.202...
* TCP_NODELAY set
* Connected to d3gt20ea1hllb.cloudfront.net (54.192.192.202) port 80 (#0)
> GET /subdirectory/ HTTP/1.1
> Host: d3gt20ea1hllb.cloudfront.net
> User-Agent: curl/7.51.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 20 Jul 2017 21:18:44 GMT
< ETag: "ddf87c487acf7cef9d50418f0f8f8dae"
< Accept-Ranges: bytes
< Server: AmazonS3
< X-Cache: Miss from cloudfront
< X-Amz-Cf-Id: rwFN7yHE70bT9xckBpceTsAPcmaadqWB9omPBv2P6WkIfQqdjTk_4w==
< Content-Length: 227
< Content-Type: text/html
< Last-Modified: Wed, 19 Jul 2017 19:21:45 GMT
< Via: 1.1 3572de112011f1b625bb77410b0c5cca.cloudfront.net (CloudFront), 1.1 iad6-proxy-3.amazon.com:80 (Cisco-WSA/9.1.2-010)
< Connection: keep-alive
<
<!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title>Subdirectory home page</title>
    </head>
    <body>
        <p>Hello, this page resides in the /subdirectory/ directory.</p>
    </body>
</html>
* Curl_http_done: called premature == 0
* Connection #0 to host d3gt20ea1hllb.cloudfront.net left intact

You have now configured a way for CloudFront to return a default index page for subdirectories in S3!

Summary

In this post, you used [email protected] to be able to use CloudFront with an S3 origin access identity and serve a default root object on subdirectory URLs. To find out some more about this use-case, see [email protected] integration with CloudFront in our documentation.

If you have questions or suggestions, feel free to comment below. For troubleshooting or implementation help, check out the Lambda forum.

AWS Enables Consortium Science to Accelerate Discovery

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-enables-consortium-science-to-accelerate-discovery/

My colleague Mia Champion is a scientist (check out her publications), an AWS Certified Solutions Architect, and an AWS Certified Developer. The time that she spent doing research on large-data datasets gave her an appreciation for the value of cloud computing in the bioinformatics space, which she summarizes and explains in the guest post below!

Jeff;


Technological advances in scientific research continue to enable the collection of exponentially growing datasets that are also increasing in the complexity of their content. The global pace of innovation is now also fueled by the recent cloud-computing revolution, which provides researchers with a seemingly boundless scalable and agile infrastructure. Now, researchers can remove the hindrances of having to own and maintain their own sequencers, microscopes, compute clusters, and more. Using the cloud, scientists can easily store, manage, process and share datasets for millions of patient samples with gigabytes and more of data for each individual. As American physicist, John Bardeen once said: “Science is a collaborative effort. The combined results of several people working together is much more effective than could be that of an individual scientist working alone”.

Prioritizing Reproducible Innovation, Democratization, and Data Protection
Today, we have many individual researchers and organizations leveraging secure cloud enabled data sharing on an unprecedented scale and producing innovative, customized analytical solutions using the AWS cloud.  But, can secure data sharing and analytics be done on such a collaborative scale as to revolutionize the way science is done across a domain of interest or even across discipline/s of science? Can building a cloud-enabled consortium of resources remove the analytical variability that leads to diminished reproducibility, which has long plagued the interpretability and impact of research discoveries? The answers to these questions are ‘yes’ and initiatives such as the Neuro Cloud Consortium, The Global Alliance for Genomics and Health (GA4GH), and The Sage Bionetworks Synapse platform, which powers many research consortiums including the DREAM challenges, are starting to put into practice model cloud-initiatives that will not only provide impactful discoveries in the areas of neuroscience, infectious disease, and cancer, but are also revolutionizing the way in which scientific research is done.

Bringing Crowd Developed Models, Algorithms, and Functions to the Data
Collaborative projects have traditionally allowed investigators to download datasets such as those used for comparative sequence analysis or for training a deep learning algorithm on medical imaging data. Investigators were then able to develop and execute their analysis using institutional clusters, local workstations, or even laptops:

This method of collaboration is problematic for many reasons. The first concern is data security, since dataset download essentially permits “chain-data-sharing” with any number of recipients. Second, analytics done using compute environments that are not templated at some level introduces the risk of variable analytics that itself is not reproducible by a different investigator, or even the same investigator using a different compute environment. Third, the required data dump, processing, and then re-upload or distribution to the collaborative group is highly inefficient and dependent upon each individual’s networking and compute capabilities. Overall, traditional methods of scientific collaboration have introduced methods in which security is compromised and time to discovery is hampered.

Using the AWS cloud, collaborative researchers can share datasets easily and securely by taking advantage of Identity and Access Management (IAM) policy restrictions for user bucket access as well as S3 bucket policies or Access Control Lists (ACLs). To streamline analysis and ensure data security, many researchers are eliminating the necessity to download datasets entirely by leveraging resources that facilitate moving the analytics to the data source and/or taking advantage of remote API requests to access a shared database or data lake. One way our customers are accomplishing this is to leverage container based Docker technology to provide collaborators with a way to submit algorithms or models for execution on the system hosting the shared datasets:

Docker container images have all of the application’s dependencies bundled together, and therefore provide a high degree of versatility and portability, which is a significant advantage over using other executable-based approaches. In the case of collaborative machine learning projects, each docker container will contain applications, language runtime, packages and libraries, as well as any of the more popular deep learning frameworks commonly used by researchers including: MXNet, Caffe, TensorFlow, and Theano.

A common feature in these frameworks is the ability to leverage a host machine’s Graphical Processing Units (GPUs) for significant acceleration of the matrix and vector operations involved in the machine learning computations. As such, researchers with these objectives can leverage EC2’s new P2 instance types in order to power execution of submitted machine learning models. In addition, GPUs can be mounted directly to containers using the NVIDIA Docker tool and appear at the system level as additional devices. By leveraging Amazon EC2 Container Service and the EC2 Container Registry, collaborators are able to execute analytical solutions submitted to the project repository by their colleagues in a reproducible fashion as well as continue to build on their existing environment.  Researchers can also architect a continuous deployment pipeline to run their docker-enabled workflows.

In conclusion, emerging cloud-enabled consortium initiatives serve as models for the broader research community for how cloud-enabled community science can expedite discoveries in Precision Medicine while also providing a platform where data security and discovery reproducibility is inherent to the project execution.

Mia D. Champion, Ph.D.

 

The Most Viewed AWS Security Blog Posts in 2016

Post Syndicated from Craig Liebendorfer original https://aws.amazon.com/blogs/security/the-most-viewed-aws-security-blog-posts-in-2016/

The following 10 posts were the most viewed AWS Security Blog posts that we published during 2016. You can use this list as a guide to catch up on your blog reading or even read a post again that you found particularly useful.

  1. How to Set Up DNS Resolution Between On-Premises Networks and AWS Using AWS Directory Service and Amazon Route 53
  2. How to Control Access to Your Amazon Elasticsearch Service Domain
  3. How to Restrict Amazon S3 Bucket Access to a Specific IAM Role
  4. Announcing AWS Organizations: Centrally Manage Multiple AWS Accounts
  5. How to Configure Rate-Based Blacklisting with AWS WAF and AWS Lambda
  6. How to Use AWS WAF to Block IP Addresses That Generate Bad Requests
  7. How to Record SSH Sessions Established Through a Bastion Host
  8. How to Manage Secrets for Amazon EC2 Container Service–Based Applications by Using Amazon S3 and Docker
  9. Announcing Industry Best Practices for Securing AWS Resources
  10. How to Set Up DNS Resolution Between On-Premises Networks and AWS Using AWS Directory Service and Microsoft Active Directory

The following 10 posts published since the blog’s inception in April 2013 were the most viewed AWS Security Blog posts in 2016.

  1. Writing IAM Policies: How to Grant Access to an Amazon S3 Bucket
  2. Securely Connect to Linux Instances Running in a Private Amazon VPC
  3. A New and Standardized Way to Manage Credentials in the AWS SDKs
  4. Where’s My Secret Access Key?
  5. Enabling Federation to AWS Using Windows Active Directory, ADFS, and SAML 2.0
  6. IAM Policies and Bucket Policies and ACLs! Oh, My! (Controlling Access to S3 Resources)
  7. How to Connect Your On-Premises Active Directory to AWS Using AD Connector
  8. Writing IAM Policies: Grant Access to User-Specific Folders in an Amazon S3 Bucket
  9. How to Help Prepare for DDoS Attacks by Reducing Your Attack Surface
  10. How to Set Up DNS Resolution Between On-Premises Networks and AWS Using AWS Directory Service and Amazon Route 53

Let us know in the comments section below if there is a specific security or compliance topic you would like us to cover on the Security Blog in 2017.

– Craig