This post courtesy of Thiago Morais, AWS Solutions Architect
When you build web applications or expose any data externally, you probably look for a platform where you can build highly scalable, secure, and robust REST APIs. As APIs are publicly exposed, there are a number of best practices for providing a secure mechanism to consumers using your API.
Amazon API Gateway handles all the tasks involved in accepting and processing up to hundreds of thousands of concurrent API calls, including traffic management, authorization and access control, monitoring, and API version management.
In this post, I show you how to take advantage of the regional API endpoint feature in API Gateway, so that you can create your own Amazon CloudFront distribution and secure your API using AWS WAF.
AWS WAF is a web application firewall that helps protect your web applications from common web exploits that could affect application availability, compromise security, or consume excessive resources.
As you make your APIs publicly available, you are exposed to attackers trying to exploit your services in several ways. The AWS security team published a whitepaper, Use AWS WAF to Mitigate OWASP's Top 10 Web Application Vulnerabilities, that describes how to address these threats with AWS WAF.
Regional API endpoints
Edge-optimized APIs are endpoints that are accessed through a CloudFront distribution created and managed by API Gateway. Before the launch of regional API endpoints, this was the default option when creating APIs using API Gateway. It primarily helped to reduce latency for API consumers that were located in different geographical locations than your API.
When API requests predominantly originate from an Amazon EC2 instance or other services within the same AWS Region as the API is deployed, a regional API endpoint typically lowers the latency of connections. It is recommended for such scenarios.
For better control around caching strategies, customers can use their own CloudFront distribution for regional APIs. They also have the ability to use AWS WAF protection, as I describe in this post.
Edge-optimized API endpoint
The following diagram is an illustrated example of the edge-optimized API endpoint where your API clients access your API through a CloudFront distribution created and managed by API Gateway.
Regional API endpoint
For the regional API endpoint, your customers access your API from the same Region in which your REST API is deployed. This helps you to reduce request latency and particularly allows you to add your own content delivery network, as needed.
Walkthrough
In this section, you implement the following steps:
Create a regional API endpoint.
Create a CloudFront distribution for the API.
Test the CloudFront distribution.
Deploy the AWS WAF Security Automations solution.
Attach the web ACL to the CloudFront distribution.
Test AWS WAF protection.
Create the regional API
For this walkthrough, use an existing PetStore API. All new APIs launch by default as the regional endpoint type. To change the endpoint type for your existing API, choose the cog icon on the top right corner:
After you have created the PetStore API on your account, deploy a stage called “prod” for the PetStore API.
On the API Gateway console, select the PetStore API and choose Actions, Deploy API.
For Stage name, type prod and add a stage description.
Choose Deploy and the new API stage is created.
Use the following AWS CLI command to update your API from edge-optimized to regional:
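The switch is a patch operation on the API's endpointConfiguration; a command along these lines performs it (substitute your own API ID for {api-id}):

$ aws apigateway update-rest-api \
    --rest-api-id {api-id} \
    --patch-operations op=replace,path=/endpointConfiguration/types/EDGE,value=REGIONAL

A successful call returns the updated API configuration, which now shows the REGIONAL endpoint type: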
{
    "description": "Your first API with Amazon API Gateway. This is a sample API that integrates via HTTP with your demo Pet Store endpoints",
    "createdDate": 1511525626,
    "endpointConfiguration": {
        "types": [
            "REGIONAL"
        ]
    },
    "id": "{api-id}",
    "name": "PetStore"
}
After you change your API endpoint to regional, you can now assign your own CloudFront distribution to this API.
Create a CloudFront distribution
To make things easier, I have provided an AWS CloudFormation template that deploys a CloudFront distribution pointing to the API that you just created. Deploy the template in the us-east-1 Region.
For Stack name, enter RegionalAPI. For APIGWEndpoint, enter your API FQDN in the following format:
{api-id}.execute-api.us-east-1.amazonaws.com
After you fill out the parameters, choose Next to continue the stack deployment. The deployment takes a couple of minutes to finish. After it finishes, the Outputs tab lists the following items:
A CloudFront domain URL
An S3 bucket for CloudFront access logs
Output from CloudFormation
Test the CloudFront distribution
To see if the CloudFront distribution was configured correctly, use a web browser and enter the URL from your distribution, appending the API stage and resource path:
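For the PetStore API deployed to the prod stage, the URL takes the following form, where {distribution-name} is the CloudFront domain from the stack outputs (the same path is used with cURL later in this post):

https://{distribution-name}.cloudfront.net/prod/pets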
With the new CloudFront distribution in place, you can now start setting up AWS WAF to protect your API.
For this demo, you deploy the AWS WAF Security Automations solution, which provides fine-grained control over the requests attempting to access your API.
For more information about deployment, see Automated Deployment. If you prefer, you can launch the solution directly into your account.
For CloudFront Access Log Bucket Name, add the name of the bucket created during the deployment of the CloudFormation stack for your CloudFront distribution.
The solution allows you to adjust thresholds and also choose which automations to enable to protect your API. After you finish configuring these settings, choose Next.
To start the deployment process in your account, follow the creation wizard and choose Create. It takes a few minutes to finish the deployment. You can follow the creation process in the CloudFormation console.
After the deployment finishes, you can see the new web ACL deployed on the AWS WAF console, AWSWAFSecurityAutomations.
Attach the AWS WAF web ACL to the CloudFront distribution
With the solution deployed, you can now attach the AWS WAF web ACL to the CloudFront distribution that you created earlier.
To assign the newly created AWS WAF web ACL, go back to your CloudFront distribution. After you open your distribution for editing, choose General, Edit.
Select the new AWS WAF web ACL that you created earlier, AWSWAFSecurityAutomations.
Save the changes to your CloudFront distribution and wait for the deployment to finish.
Test AWS WAF protection
To validate the AWS WAF Web ACL setup, use Artillery to load test your API and see AWS WAF in action.
To install Artillery on your machine, run the following command:
$ npm install -g artillery
After the installation completes, you can check if Artillery installed successfully by running the following command:
$ artillery -V
1.6.0-12
As of the time of publication, Artillery is on version 1.6.0-12.
One of the AWS WAF web ACL rules that you set up is a rate-based rule. By default, it blocks any requester that exceeds 2,000 requests in a 5-minute period. Try this out.
First, use cURL to query your distribution and see the API output:
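$ curl -s https://{distribution-name}.cloudfront.net/prod/pets

Next, use Artillery to generate the load. In Artillery's quick mode, --count sets the number of virtual users and -n sets the number of requests each user makes, so a command along these lines produces the traffic described below:

$ artillery quick --count 10 -n 200 https://{distribution-name}.cloudfront.net/prod/pets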
What you are doing is firing 2000 requests to your API from 10 concurrent users. For brevity, I am not posting the Artillery output here.
After Artillery finishes its execution, try to run the cURL request again and see what happens:
$ curl -s https://{distribution-name}.cloudfront.net/prod/pets
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<TITLE>ERROR: The request could not be satisfied</TITLE>
</HEAD><BODY>
<H1>ERROR</H1>
<H2>The request could not be satisfied.</H2>
<HR noshade size="1px">
Request blocked.
<BR clear="all">
<HR noshade size="1px">
<PRE>
Generated by cloudfront (CloudFront)
Request ID: [removed]
</PRE>
<ADDRESS>
</ADDRESS>
</BODY></HTML>
As you can see from the output above, the request was blocked by AWS WAF. Your IP address is removed from the blocked list after your request rate falls back below the limit.
Conclusion
In this first part, you saw how to use the new API Gateway regional API endpoint together with Amazon CloudFront and AWS WAF to secure your API from a series of attacks.
In the second part, I will demonstrate some other techniques to protect your API using API keys and Amazon CloudFront custom headers.
User authentication is functionality that every web application shares. We should have perfected it a long time ago, having implemented it so many times. And yet there are so many mistakes made all the time.
Part of the reason for that is that the list of things that can go wrong is long. You can store passwords incorrectly, you can have vulnerable password reset functionality, you can expose your session to a CSRF attack, your session can be hijacked, and so on. So I'll try to compile a list of best practices regarding user authentication. The OWASP Top 10 is always something you should read, every year. But that might not be enough.
So, let’s start. I’ll try to be concise, but I’ll include as many of the related pitfalls as I can cover – e.g. what could go wrong with the user session after login:
Store passwords with bcrypt/scrypt/PBKDF2. No MD5 or SHA, as they are not good for password storage. A long salt (per user) is mandatory (the aforementioned algorithms have it built in). If you don’t do this, and someone gets hold of your database, they’ll be able to extract the passwords of all your users and then try these passwords on other websites.
Use HTTPS. Period. (Otherwise user credentials can leak through unprotected networks). Force HTTPS if user opens a plain-text version.
Mark cookies as secure. Makes cookie theft harder.
Use CSRF protection (e.g. CSRF one-time tokens that are verified with each request). Frameworks have such functionality built-in.
Disallow framing (X-Frame-Options: DENY). Otherwise your website may be included in another website in a hidden iframe and “abused” through javascript.
Logout – let your users logout by deleting all cookies and invalidating the session. This makes usage of shared computers safer (yes, users should ideally use private browsing sessions, but not all of them are that savvy)
Session expiry – don’t have forever-lasting sessions. If the user closes your website, their session should expire after a while. “A while” may still be a big number depending on the service provided. For an ajax-heavy website, you can have regular ajax polling that keeps the session alive while the page stays open.
Remember me – implementing “remember me” (on this machine) functionality is actually hard due to the risks of a stolen persistent cookie. Spring-security uses this approach, which I think should be followed if you wish to implement more persistent logins.
Forgotten password flow – the forgotten password flow should rely on sending a one-time (or expiring) link to the user and asking for a new password when it’s opened. Auth0 explains it in this post and Postmark gives some best practices. How the link is formed is a separate discussion and there are several approaches. Store a password-reset token in the user profile table and then send it as a parameter in the link. Or do not store anything in the database, but send a few params: userId:expiresTimestamp:hmac(userId+expiresTimestamp). That way you have expiring links (rather than one-time links); a minimal sketch of this HMAC variant appears after this list. The HMAC relies on a secret key, so the links can’t be spoofed. It seems there’s no consensus, as the OWASP guide has a somewhat different approach.
One-time login links – this is an option used by Slack, which sends one-time login links instead of asking users for passwords. It relies on the fact that your email is well guarded and you have access to it all the time. If your service is not accessed too often, you can use that approach instead of (rather than in addition to) passwords.
Limit login attempts – brute force through the web UI should not be possible; therefore, you should block login attempts if there are too many of them. One approach is to block them based on IP address; the other is to block them based on the account attempted (Spring example here). Which one is better – I don’t know. Both can actually be combined. Instead of fully blocking the attempts, you may add a captcha after, say, the 5th attempt. But don’t add the captcha for the first attempt – it is bad user experience.
Don’t leak information through error messages – you shouldn’t allow attackers to figure out if an email is registered or not. If an email is not found, upon login report just “Incorrect credentials”. On passwords reset, it may be something like “If your email is registered, you should have received a password reset email”. This is often at odds with usability – people don’t often remember the email they used to register, and the ability to check a number of them before getting in might be important. So this rule is not absolute, though it’s desirable, especially for more critical systems.
Consider using a 3rd party authentication – OpenID Connect, OAuth by Google/Facebook/Twitter (but be careful with OAuth flaws as well). There’s an associated risk with relying on a 3rd party identity provider, and you still have to manage cookies, logout, etc., but some of the authentication aspects are simplified.
For high-risk or sensitive applications use 2-factor authentication. There’s a caveat with Google Authenticator though – if you lose your phone, you lose your accounts (unless there’s a manual process to restore it). That’s why Authy seems like a good solution for storing 2FA keys.
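To make the expiring-link option above concrete, here is a minimal Java sketch of the userId:expiresTimestamp:hmac(userId+expiresTimestamp) scheme from the forgotten-password item, using only the JDK. The class and method names are illustrative, and a real implementation should also tie the token to something that changes on use (for example, the current password hash) so a link cannot be replayed.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class PasswordResetTokens {

    private static final String HMAC_ALGORITHM = "HmacSHA256";

    // Builds "userId:expiresTimestamp:hmac(userId:expiresTimestamp)" for an expiring reset link.
    public static String buildToken(String userId, long expiresEpochSeconds, byte[] secretKey) throws Exception {
        String payload = userId + ":" + expiresEpochSeconds;
        Mac mac = Mac.getInstance(HMAC_ALGORITHM);
        mac.init(new SecretKeySpec(secretKey, HMAC_ALGORITHM));
        String signature = Base64.getUrlEncoder().withoutPadding()
                .encodeToString(mac.doFinal(payload.getBytes(StandardCharsets.UTF_8)));
        return payload + ":" + signature;
    }

    // Accepts the token only if the signature matches and the link has not expired.
    public static boolean isValid(String token, byte[] secretKey) throws Exception {
        String[] parts = token.split(":");
        if (parts.length != 3) {
            return false;
        }
        long expires = Long.parseLong(parts[1]);
        if (expires < System.currentTimeMillis() / 1000) {
            return false;
        }
        String expected = buildToken(parts[0], expires, secretKey);
        // Constant-time comparison to avoid leaking information through timing.
        return MessageDigest.isEqual(
                expected.getBytes(StandardCharsets.UTF_8),
                token.getBytes(StandardCharsets.UTF_8));
    }
}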
I’m sure I’m missing something. And you see it’s complicated. Sadly we’re still at the point where the most common functionality – authenticating users – is so tricky and cumbersome, that you almost always get at least some of it wrong.
Applying technology to healthcare data has the potential to produce many exciting and important outcomes. The analysis produced from healthcare data can empower clinicians to improve the health of individuals and populations by enabling them to make better decisions that enhance the care they provide.
The Observational Health Data Sciences and Informatics (OHDSI, pronounced “Odyssey”) program and community is working toward this goal by producing data standards and open-source solutions to store and analyze observational health data. Using the OHDSI tools, you can visualize the health of your entire population. You can build cohorts of patients, analyze incidence rates for various conditions, and estimate the effect of treatments on patients with certain conditions. You can also model health outcome predictions using machine learning algorithms.
One of the challenges often faced when working with big data tools is the expense of the infrastructure required to run them. Another challenge is the learning curve to implement and begin using these tools. Amazon Web Services has enabled us to address many of the classic IT challenges by making enterprise class infrastructure and technology available in an affordable, elastic, and automated way. This blog post demonstrates how to combine some of the OHDSI projects (Atlas, Achilles, WebAPI, and the OMOP Common Data Model) with AWS technologies. By doing so, you can quickly and inexpensively implement a health data science and informatics environment.
Shown following is just one example of the population health analysis that is possible with the OHDSI tools. This visualization shows the prevalence of various drugs within the given population of people. This information helps researchers and clinicians discover trends and make better informed decisions about patient health.
OHDSI application architecture on AWS
Before deploying an application on AWS that transmits, processes, or stores protected health information (PHI) or personally identifiable information (PII), address your organization’s compliance concerns. Make sure that you have worked with your internal compliance and legal team to ensure compliance with the laws and regulations that govern your organization. To understand how you can use AWS services as a part of your overall compliance program, see the AWS HIPAA Compliance whitepaper. With that said, we paid careful attention to the HIPAA control set during the design of this solution.
This blog post presents a complete OHDSI application environment, including a data warehouse with sample data; its components are described in the sections that follow.
Following, you can see a block diagram of how the OHDSI tools map to the services provided by AWS.
Atlas is the web application that researchers interact with to perform analysis. Atlas interacts with the underlying databases through a web services application named WebAPI. In this example, both Atlas and WebAPI are deployed and managed by AWS Elastic Beanstalk. Elastic Beanstalk is an easy-to-use service for deploying and scaling web applications. Simply upload the Atlas and WebAPI code and Elastic Beanstalk automatically handles the deployment. It covers everything from capacity provisioning, load balancing, autoscaling, and high availability, to application health monitoring. Using a feature of Elastic Beanstalk called ebextensions, the Atlas and WebAPI servers are customized to use an encrypted storage volume for the middleware application logs.
Atlas stores the state of the various patient cohorts that are analyzed in a dedicated database separate from your observational health data. This database is provided by Amazon Aurora with PostgreSQL compatibility.
Amazon Aurora is a relational database built for the cloud that combines the performance and availability of high-end commercial databases with the simplicity and cost-effectiveness of open-source databases. It provides cost-efficient and resizable capacity while automating time-consuming administration tasks such as hardware provisioning, database setup, patching, and backups. It is configured for high availability and uses encryption at rest for the database and backups, and encryption in flight for the JDBC connections.
All of your observational health data is stored inside the OHDSI Observational Medical Outcomes Partnership Common Data Model (OMOP CDM). This model also stores useful vocabulary tables that help to translate values from various data sources (like EHR systems and claims data).
The OMOP CDM schema is deployed onto Amazon Redshift. Amazon Redshift is a fast, fully managed data warehouse that allows you to run complex analytic queries against petabytes of structured data. It uses sophisticated query optimization, columnar storage on high-performance local disks, and massively parallel query execution. You can also resize an Amazon Redshift cluster as your requirements for it change.
The solution in this blog post automatically loads de-identified sample data of 1,000 people from the CMS 2008–2010 Data Entrepreneurs’ Synthetic Public Use File (DE-SynPUF). The data has helpful formatting from LTS Computing LLC. Vocabulary data from the OHDSI Athena project is also loaded into the OMOP CDM, and a results set is computed by OHDSI Achilles.
Following is a detailed technical diagram showing the configuration of the architecture to be deployed.
Deploying OHDSI on AWS
Everything just described is automatically deployed by using an AWS CloudFormation template. Using this template, you can quickly get started with the OHDSI project. The CloudFormation templates for this deployment as well as all of the supporting scripts and source code can be found in the AWS Labs GitHub repo.
From your AWS account, open the CloudFormation Management Console and choose Create Stack. From there, copy and paste the following URL in the Specify an Amazon S3 template URL box, and choose Next.
On the next screen, you provide a Stack Name (this can be anything you like) and a few other parameters for your OHDSI environment.
You use the DatabasePassword parameter to set the password for the master user account of the Amazon Redshift and Aurora databases.
You use the EBEndpoint name to generate a unique URL for Atlas to access the OHDSI environment. It is http://EBEndpoint.AWS-Region.elasticbeanstalk.com, where EBEndpoint.AWS-Region indicates the Elastic Beanstalk endpoint and AWS Region. You can configure this URL through Elastic Beanstalk if you want to change it in the future.
You use the KPair option to choose one of your existing Amazon EC2 key pairs to use with the instances that Elastic Beanstalk deploys. By doing this, you can gain administrative access to these instances in the future if you need to. If you don’t already have an Amazon EC2 key pair, you can generate one for free. You do this by going to the Key Pairs section of the EC2 console and choosing Generate Key Pair.
Finally, you use the UserIPRange parameter to specify a CIDR IP address range from which to access your OHDSI environment. By default, your OHDSI environment is accessible over the public internet. Use UserIPRange to limit access over the Internet to a single IP address or a range of IP addresses that represent users you want to have access. Through additional configuration, you can also make your OHDSI environment completely private and accessible only through a VPN or AWS Direct Connect private circuit.
When you’ve provided all Parameters, choose Next.
On the next screen, you can provide some other optional information like tags at your discretion, or just choose Next.
On the next screen, you can review what will be deployed. At the bottom of the screen, there is a check box for you to acknowledge that AWS CloudFormation might create IAM resources with custom names. This is correct; the template being deployed creates four custom roles that give permission for the AWS services involved to communicate with each other. Details of these permissions are inside the CloudFormation template referenced in the URL given in the first step. Check the box acknowledging this and choose Next.
You can watch as CloudFormation builds out your OHDSI architecture. A CloudFormation deployment is called a stack. The parent stack creates two child stacks, one containing the VPC and IAM roles and another created by Elastic Beanstalk with the Atlas and WebAPI servers. When all three stacks have reached the green CREATE_COMPLETE status, as shown in the screenshot following, then the OHDSI architecture has been deployed.
There is still some work going on behind the scenes, though. To watch the progress, browse to the Amazon Redshift section of your AWS Management Console and choose the Amazon Redshift cluster that was created for your OHDSI architecture. After you do so, you can observe the Loads and Queries tabs.
First, on the Loads tab, you can see the CMS De-SynPUF sample data and Athena vocabulary data being loaded into the OMOP Common Data Model. After you see the VOCABULARY table reach the COMPLETED status (as shown following), all of the sample and vocabulary data has been loaded.
After the data loads, the Achilles computation starts. On the Queries tab, you can watch Achilles running queries against your database to build out the Results schema. Achilles runs a large number of queries, and the entire process can take quite some time (about 20 minutes for the sample data we’ve loaded). Eventually, no new queries show up in the Queries tab, which shows that the Achilles computation is completed. The entire process from the time you executed the CloudFormation template until the Achilles computation is completed usually takes about an hour and 15 minutes.
At this point, you can browse to the Elastic Beanstalk section of the AWS Management Console. There, you can choose the OHDSI Application and Environment (green box) that was deployed by the CloudFormation template. At the top of the dashboard, as shown following, you see a link to a URL. This URL matches the name you provided in the EBEndpoint parameter of the CloudFormation template. Choose this URL, and you can start using Atlas to explore the CMS DE-SynPUF sample data!
Cost of deploying this environment
It used to be common to see healthcare data analytics environments deployed in an on-premises data center with expensive data warehouse appliances and virtualized environments. The cloud era has democratized the availability of the infrastructure required to do this type of data analysis, so that now it is within reach of even small organizations. This environment can expand to analyze petabyte-scale health data, and you only pay for what you need. See an estimated breakdown of the monthly cost components for this environment as deployed on the AWS Solution Calculator.
It’s also worth noting that this environment does not have to be run all of the time. If you are only performing analyses periodically, you can terminate the environment when you are finished and restore it from the database backups when you want to continue working. This would reduce the cost of operation even further.
Summary
Now that you have a fully functional OHDSI environment with sample data, you can use this to explore and learn the toolset and its capabilities. After learning with the sample data, you can begin gaining insights by analyzing your own organization’s health data. You can do this using an extract, transform, load (ETL) process from one or more of your health data sources.
James Wiggins is a senior healthcare solutions architect at AWS. He is passionate about using technology to help organizations positively impact world health. He also loves spending time with his wife and three children.
The Twelve-Factor App methodology is twelve best practices for building modern, cloud-native applications. With guidance on things like configuration, deployment, runtime, and multiple service communication, the Twelve-Factor model prescribes best practices that apply to a diverse number of use cases, from web applications and APIs to data processing applications. Although serverless computing and AWS Lambda have changed how application development is done, the Twelve-Factor best practices remain relevant and applicable in a serverless world.
In this post, I directly apply and compare the Twelve-Factor methodology to serverless application development with Lambda and Amazon API Gateway.
The Twelve Factors
As you’ll see, many of these factors are not only directly applicable to serverless applications, but in fact a default mechanism or capability of the AWS serverless platform. Other factors don’t fit, and I talk about how these factors may not apply at all in a serverless approach.
A general software development best practice is to have all of your code in revision control. This is no different with serverless applications.
For a single serverless application, your code should be stored in a single repository in which a single deployable artifact is generated and from which it is deployed. This single code base should also represent the code used in all of your application environments (development, staging, production, etc.). What might be different for serverless applications is the bounds for what constitutes a “single application.”
Here are two guidelines to help you understand the scope of an application:
If events are shared (such as a common Amazon API Gateway API), then the Lambda function code for those events should be put in the same repository.
Otherwise, break functions along event sources into their own repositories.
Following these two guidelines helps you keep your serverless applications scoped to a single purpose and helps prevent complexity in your code base.
Code that needs to be used by multiple functions should be packaged into its own library and included inside your deployment package. Going back to the previous factor on codebase, if you find that you often need to include special processing or business logic, the best solution may be to create a purposeful library yourself. Every language that Lambda supports has a model for dependencies/libraries, which you can use.
Both Lambda and API Gateway allow you to set configuration information, using the environment in which each service runs.
In Lambda, these are called environment variables and are key-value pairs that can be set at deployment or when updating the function configuration. Lambda then makes these key-value pairs available to your Lambda function code using standard APIs supported by the language, like process.env for Node.js functions. For more information, see Programming Model, which contains examples for each supported language.
Lambda also allows you to encrypt these key-value pairs using KMS, such that they can be used to store secrets such as API keys or passwords for databases. You can also use them to help define application environment specifics, such as differences between testing or production environments where you might have unique databases or endpoints with which your Lambda function needs to interface. You could also use these for setting A/B testing flags or to enable or disable certain function logic.
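As a minimal sketch of reading such configuration from code (in Go, one of Lambda's supported runtimes; the TABLE_NAME and STAGE variable names are hypothetical):

package main

import (
    "context"
    "fmt"
    "os"

    "github.com/aws/aws-lambda-go/lambda"
)

// handler reads its configuration from environment variables set on the
// function, rather than hard-coding environment-specific values.
func handler(ctx context.Context) (string, error) {
    table := os.Getenv("TABLE_NAME") // e.g. a per-environment DynamoDB table
    stage := os.Getenv("STAGE")      // e.g. "test" or "prod"
    return fmt.Sprintf("using table %s in stage %s", table, stage), nil
}

func main() {
    lambda.Start(handler)
}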
For API Gateway, these configuration variables are called stage variables. Like environment variables in Lambda, these are key-value pairs that are available for API Gateway to consume or pass to your API’s backend service. Stage variables can be useful to send requests to different backend environments based on the URL from which your API is accessed. For example, a single configuration could support both beta.yourapi.com vs. prod.yourapi.com. You could also use stage variables to pass information to a Lambda function that causes it to perform different logic.
Because Lambda doesn’t allow you to run another service as part of your function execution, this factor is basically the default model for Lambda. Typically, you reference any database or data store as an external resource via HTTP endpoint or DNS name. These connection strings are ideally passed in via the configuration information, as previously covered.
The separation of build, release, and run stages follows the development best practices of continuous integration and delivery. AWS recommends that you have a CI/CD process no matter what type of application you are building. For serverless applications, this is no different. For more information, see the Building CI/CD Pipelines for Serverless Applications (SRV302) re:Invent 2017 session.
An example minimal pipeline (from presentation linked above)
This is inherent in how Lambda is designed so there is nothing more to consider. Lambda functions should always be treated as being stateless, despite the ability to potentially store some information locally between execution environment re-use. This is because there is no guaranteed affinity to any execution environment, and the potential for an execution environment to go away between invocations exists. You should always store any stateful information in a database, cache, or separate data store via a backing service.
This factor also does not apply to Lambda, as execution environments do not expose any direct networking to your functions. Instead of a port, Lambda functions are invoked via one or more triggering services or AWS APIs for Lambda, using one of three invocation models: synchronous (request/response), asynchronous (event), and stream-based (poll).
Lambda was built with massive concurrency and scale in mind. A recent post on this blog, titled Managing AWS Lambda Function Concurrency explained that for a serverless application, “the unit of scale is a concurrent execution” and that these are consumed by your functions.
Lambda automatically scales to meet the demands of invocations sent at your function. This is in contrast to a traditional compute model using physical hosts, virtual machines, or containers that you self-manage. With Lambda, you do not need to manage overall capacity or apply scaling policies.
Each AWS account has an overall AccountLimit value that is fixed at any point in time, but can be easily increased as needed. As of May 2017, the default limit is 1000 concurrent executions per AWS Region. You can also set and manage a reserved concurrency limit, which provides a limit to how much concurrency a function can have. It also reserves concurrency capacity for a given function out of the total available for an account.
Shutdown doesn’t apply to Lambda because Lambda is intrinsically event-driven. Invocations are tied directly to incoming events or triggers.
However, speed at startup does matter. Initial function execution latency, or what is called “cold starts”, can occur when there isn’t a “warmed” compute resource ready to execute against your application invocations. In the AWS Lambda Execution Model topic, it explains that:
“It takes time to set up an execution context and do the necessary “bootstrapping”, which adds some latency each time the Lambda function is invoked. You typically see this latency when a Lambda function is invoked for the first time or after it has been updated because AWS Lambda tries to reuse the execution context for subsequent invocations of the Lambda function.”
The Best Practices topic covers a number of issues around how to think about performance of your functions. This includes where to place certain logic, how to re-use execution environments, and how by configuring your function for more memory you also get a proportional increase in CPU available to your function. With AWS X-Ray, you can gather some insight as to what your function is doing during an execution and make adjustments accordingly.
Along with continuous integration and delivery, the practice of having independent application environments is a solid best practice no matter the development approach. Being able to safely test applications in a non-production environment is key to development success. Products within the AWS Serverless Platform do not charge for idle time, which greatly reduces the cost of running multiple environments. You can also use the AWS Serverless Application Model (AWS SAM), to manage the configuration of your separate environments.
SAM allows you to model your serverless applications in greatly simplified AWS CloudFormation syntax. With SAM, you can use CloudFormation’s capabilities—such as Parameters and Mappings—to build dynamic templates. Along with Lambda’s environment variables and API Gateway’s stage variables, those templates give you the ability to deploy multiple environments from a single template, such as testing, staging, and production. Whenever the non-production environments are not in use, your costs for Lambda and API Gateway would be zero. For more information, see the AWS Lambda Applications with AWS Serverless Application Model 2017 AWS online tech talk.
In a typical non-serverless application environment, you might be concerned with log files, logging daemons, and centralization of the data represented in them. Thankfully, this is not a concern for serverless applications, as most of the services in the platform handle this for you.
With Lambda, you can just output logs to the console via the native capabilities of the language in which your function is written. For more information about Go, see Logging (Go). Similar documentation pages exist for other languages. Those messages output by your code are captured and centralized in Amazon CloudWatch Logs. For more information, see Accessing Amazon CloudWatch Logs for AWS Lambda.
API Gateway provides two different methods for getting log information:
Execution logs – Include errors or execution traces (such as request or response parameter values or payloads), data used by custom authorizers, whether API keys are required, whether usage plans are enabled, and so on.
Access logs – Provide the ability to log who has accessed your API and how the caller accessed the API. You can even customize the format of these logs as desired.
Capturing logs and being able to search and view them is one thing, but CloudWatch Logs also gives you the ability to treat a log message as an event and take action on them via subscription filters in the service. With subscription filters, you could send a log message matching a certain pattern to a Lambda function, and have it take action based on that. Say, for example, that you want to respond to certain error messages or usage patterns that violate certain rules. You could do that with CloudWatch Logs, subscription filters, and Lambda. Another important capability of CloudWatch Logs is the ability to “pivot” log information into a metric in CloudWatch. With this, you could take a data point from a log entry, create a metric, and then an alarm on a metric to show a breached threshold.
This is another factor that doesn’t directly apply to Lambda due to its design. Typically, you would have your functions scoped down to single or limited use cases and have individual functions for different components of your application. Even if they share a common invoking resource, such as an API Gateway endpoint and stage, you would still separate the individual API resources and actions to their own Lambda functions.
The Seven-and-a-Half–Factor model and you
As we’ve seen, Twelve-Factor application design can still be applied to serverless applications, taking into account some small differences! The following diagram highlights the factors and how applicable or not they are to serverless applications:
NOTE: Disposability only half applies, as discussed in this post.
Conclusion
If you’ve been building applications for cloud infrastructure over the past few years, the Twelve-Factor methodology should seem familiar and straightforward. If you are new to this space, however, you should know that the general best practices and default working patterns for serverless applications overlap heavily with what I’ve discussed here.
It shouldn’t require much work to adhere rather closely to the Twelve-Factor model. When you’re building serverless applications, following the applicable points listed here helps you simplify development and any operational work involved (though already minimized by services such as Lambda and API Gateway). The Twelve-Factor methodology also isn’t all-or-nothing. You can apply only the practices that work best for you and your applications, and still benefit.
The following list includes the ten most downloaded AWS security and compliance documents in 2017. Using this list, you can learn about what other AWS customers found most interesting about security and compliance last year.
AWS Security Best Practices – This guide is intended for customers who are designing the security infrastructure and configuration for applications running on AWS. The guide provides security best practices that will help you define your Information Security Management System (ISMS) and build a set of security policies and processes for your organization so that you can protect your data and assets in the AWS Cloud.
AWS: Overview of Security Processes – This whitepaper describes the physical and operational security processes for the AWS managed network and infrastructure, and helps answer questions such as, “How does AWS help me protect my data?”
Service Organization Controls (SOC) 3 Report – This publicly available report describes internal AWS security controls, availability, processing integrity, confidentiality, and privacy.
Introduction to AWS Security – This document provides an introduction to AWS’s approach to security, including the controls in the AWS environment, and some of the products and features that AWS makes available to customers to meet your security objectives.
AWS: Risk and Compliance – This whitepaper provides information to help customers integrate AWS into their existing control framework, including a basic approach for evaluating AWS controls and a description of AWS certifications, programs, reports, and third-party attestations.
Use AWS WAF to Mitigate OWASP’s Top 10 Web Application Vulnerabilities – AWS WAF is a web application firewall that helps you protect your websites and web applications against various attack vectors at the HTTP protocol level. This whitepaper outlines how you can use AWS WAF to mitigate the application vulnerabilities that are defined in the Open Web Application Security Project (OWASP) Top 10 list of most common categories of application security flaws.
Introduction to Auditing the Use of AWS – This whitepaper provides information, tools, and approaches for auditors to use when auditing the security of the AWS managed network and infrastructure.
AWS Security and Compliance: Quick Reference Guide – By using AWS, you inherit the many security controls that we operate, thus reducing the number of security controls that you need to maintain. Your own compliance and certification programs are strengthened while at the same time lowering your cost to maintain and run your specific security assurance requirements. Learn more in this quick reference guide.
This post was written by Andy Powell, Partner Solutions Architect.
For developers, tracing and instrumentation are among the most valuable tools for debugging. When you develop locally, you can use local debugging and profiling tools, but when you deploy an application to the cloud, the task is more challenging. In this blog post, we will look at a new way to instrument your application using AWS X-Ray without adding tracing code to your business logic.
AWS X-Ray
Released to the public earlier this year, AWS X-Ray provides a mechanism for developers to instrument and trace their code while running on AWS. AWS X-Ray enables developers to analyze and debug distributed applications through the entire AWS stack. Developers and operations personnel can also follow the flow of a request through the entire AWS infrastructure. X-Ray works well in a monolithic or microservices model. Either way, developers can get a complete view of their application’s performance and behavior.
X-Ray provides two key mechanisms for analyzing an application:
The service map.
The trace view.
The service map, shown here, provides a high-level view of the services consumed by an application and their relative health:
Application developers often look for a more detailed view of their application to help them answer questions like “Which functions are my bottlenecks?” and “Where is the most latency in the application?” The trace view allows you to see the flow of an application through service calls. It shows you latency between services and the exact execution time of each X-Ray segment.
Similar to other logging and metrics frameworks, X-Ray requires the developer to insert specific code throughout the application. This might cause issues because the logging and metrics code has the potential to increase the complexity of the application code. Increased complexity leads, in turn, to increased maintenance and testing costs. Ideally, developers should add X-Ray tracing to an application in a non-invasive manner, one that does not affect the underlying business logic.
Aspect-oriented programming and the Spring Framework
Aspect-oriented programming (AOP) is a mechanism by which code runs either before, after, or around a target function. The code that runs outside the target code is called an aspect. An aspect provides the ability to perform actions like logging, transaction management, and method retries. The goal is for the aspect to provide these capabilities without affecting the target code. Pointcuts define where these aspects should act in the code. AOP allows developers to leverage powerful functionality without affecting their business logic.
An aspect-oriented approach is a perfect way to implement AWS X-Ray because it keeps the underlying code clean and provides non-invasive, reusable tracing logic.
Simply creating an aspect to invoke X-Ray is not enough. We also have to create pointcuts that tell the aspect where to act. These pointcuts will define which methods we wrap with tracing logic. After they are defined, the aspect sends the entire call stack to X-Ray for visualization.
The Spring Framework is a common application framework for the development of Java software. It provides an extensive programming and configuration model for modern Java applications.
Spring provides facilities for a range of application functions, including web applications, messaging applications, and streaming data applications. Spring applications are capable of being cloud-native from the onset.
AOP is one of the core components of the Spring Framework. Spring’s implementation of AOP “weaves” application code at runtime. Load-time weaving is also available, but it requires extra configuration at compile or application runtime to work properly. The Spring runtime weaving does not require any special compilation or agents.
AWS X-Ray Spring extensions
Starting with version 1.3.0, the AWS X-Ray SDK lets you use AOP in the Spring Framework to instrument code with no change to the application’s business logic. This means that there is now a non-invasive way to instrument your applications running remotely in AWS.
To include the extension in the code, first add the dependency to the application. If you are using Maven, you add the dependency this way:
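As a sketch (the Spring AOP support is published in the aws-xray-recorder-sdk-spring artifact; use the latest 1.3.x or later version available when you build):

<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-xray-recorder-sdk-spring</artifactId>
    <version>1.3.0</version>
</dependency>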
After you’ve included this in your application, there are a couple of steps that need configuration before tracing is enabled. First, classes must either be annotated with the @XRayEnabled annotation, or implement the XRayTraced interface. This tells the AOP system to wrap the functions of the affected class for X-Ray instrumentation.
Second, you need an interceptor to actually wrap the code. This involves extending an abstract class, AbstractXRayInterceptor, to activate X-Ray tracing in the application. The AbstractXRayInterceptor contains methods that must be overridden:
generateMetadata – This function allows customization of the metadata attached to the current function’s trace. By default, the class name of the executing function is recorded in the metadata. You can add more data if you need additional insights.
xrayEnabledClasses – This function is empty, and should remain so. It serves as the host for a pointcut instructing the interceptor about which methods to wrap. The developer should define the pointcut. You can specify which classes that are annotated with XRayEnabled you want traced. A pointcut statement of @Pointcut(“@within(com.amazonaws.xray.spring.aop.XRayEnabled) && bean(*Controller)”) tells the interceptor to wrap all controller beans annotated with the @XRayEnabled annotation.
Here is a sample implementation of the AbstractXRayInterceptor:
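(The following sketch mirrors the pattern documented for the X-Ray SDK; the class name XRayInspector is illustrative, and the pointcut is the one discussed above.)

import java.util.Map;

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Pointcut;
import org.springframework.stereotype.Component;

import com.amazonaws.xray.entities.Subsegment;
import com.amazonaws.xray.spring.aop.AbstractXRayInterceptor;

@Aspect
@Component
public class XRayInspector extends AbstractXRayInterceptor {

    // Attach the default metadata (the executing class name); add your own entries here if needed.
    @Override
    protected Map<String, Map<String, Object>> generateMetadata(ProceedingJoinPoint proceedingJoinPoint,
            Subsegment subsegment) throws Exception {
        return super.generateMetadata(proceedingJoinPoint, subsegment);
    }

    // Wrap all controller beans annotated with @XRayEnabled, as described above.
    @Override
    @Pointcut("@within(com.amazonaws.xray.spring.aop.XRayEnabled) && bean(*Controller)")
    public void xrayEnabledClasses() {
    }
}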
Here is an example of a Service class that will be instrumented by X-Ray:
@Service
@XRayEnabled
public class MyServiceImpl implements MyService {

    private final MyEntityRepository myEntityRepository;

    @Autowired
    public MyServiceImpl(MyEntityRepository myEntityRepository) {
        this.myEntityRepository = myEntityRepository;
    }

    @Transactional(readOnly = true)
    public List<MyEntity> getMyEntities() {
        try (Stream<MyEntity> entityStream = this.myEntityRepository.streamAll()) {
            return entityStream.sorted().collect(Collectors.toList());
        }
    }
}
By default, the AbstractXRayInterceptor instruments around all Spring data repository instances.
After it’s configured, the Spring application runs as normal. The X-Ray interceptor picks up annotated classes automatically and builds a trace of the call stack. You can view this call stack in the Traces section of the X-Ray console.
Looking at the trace, you will see the complete call stack of the application, from the controller down through the service calls. Stack traces are arranged in a hierarchy in the same way as typical X-Ray traces. Any exceptions that occur in the call stack are added to the trace by the interceptor automatically. This gives you a complete view of the application’s functionality, performance, and error states. X-Ray provides the convenience of a managed service of the tracing and reporting engine. Without it, the developer would have to manage the tracing and reporting infrastructure.
Conclusion
On its own, X-Ray provides powerful functionality to trace your applications running on AWS. When you combine the service with the ease of use of AOP and the Spring Framework, it is a natural fit.
The demo app code is available for download. Download it today and start integrating deep tracing into your Spring applications.
This post was written by James Bowman, Software Development Engineer, AWS X-Ray
AWS X-Ray helps developers analyze and debug distributed applications and underlying services in production. You can identify and analyze root-causes of performance issues and errors, understand customer impact, and extract statistical aggregations (such as histograms) for optimization.
In this blog post, I will provide a step-by-step walkthrough for enabling X-Ray tracing in the Go programming language. You can use these steps to add X-Ray tracing to any distributed application.
Revel: A web framework for the Go language
This section assists you with building a guestbook application. Skip to the “Integrating with AWS X-Ray” section below if you already have a Go language application.
Revel is a web framework for the Go language. It facilitates the rapid development of web applications by providing a predefined framework for controllers, views, routes, filters, and more.
To get started with Revel, run revel new github.com/jamesdbowman/guestbook. A project base is then copied to $GOPATH/src/github.com/jamesdbowman/guestbook.
A basic guestbook application can consist of just two routes: one to sign the guestbook and another to list all entries. Let’s set up these routes by adding a Book controller, which can be routed to by modifying ./conf/routes.
./app/controllers/book.go:
package controllers

import (
    "math/rand"
    "time"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/endpoints"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/dynamodb"
    "github.com/aws/aws-sdk-go/service/dynamodb/dynamodbattribute"
    "github.com/revel/revel"
)

const TABLE_NAME = "guestbook"
const SUCCESS = "Success.\n"
const DAY = 86400

var letters = []rune("ABCDEFGHIJKLMNOPQRSTUVWXYZ")

func init() {
    rand.Seed(time.Now().UnixNano())
}

// randString returns a random string of len n, used for DynamoDB Hash key.
func randString(n int) string {
    b := make([]rune, n)
    for i := range b {
        b[i] = letters[rand.Intn(len(letters))]
    }
    return string(b)
}

// Book controls interactions with the guestbook.
type Book struct {
    *revel.Controller
    ddbClient *dynamodb.DynamoDB
}

// Signature represents a user's signature.
type Signature struct {
    Message string
    Epoch   int64
    ID      string
}

// ddb returns the controller's DynamoDB client, instantiating a new client if necessary.
func (c Book) ddb() *dynamodb.DynamoDB {
    if c.ddbClient == nil {
        sess := session.Must(session.NewSession(&aws.Config{
            Region: aws.String(endpoints.UsWest2RegionID),
        }))
        c.ddbClient = dynamodb.New(sess)
    }
    return c.ddbClient
}

// Sign allows users to sign the book.
// The message is to be passed as application/json typed content, listed under the "message" top level key.
func (c Book) Sign() revel.Result {
    var s Signature
    err := c.Params.BindJSON(&s)
    if err != nil {
        return c.RenderError(err)
    }
    now := time.Now()
    s.Epoch = now.Unix()
    s.ID = randString(20)

    item, err := dynamodbattribute.MarshalMap(s)
    if err != nil {
        return c.RenderError(err)
    }

    putItemInput := &dynamodb.PutItemInput{
        TableName: aws.String(TABLE_NAME),
        Item:      item,
    }
    _, err = c.ddb().PutItem(putItemInput)
    if err != nil {
        return c.RenderError(err)
    }
    return c.RenderText(SUCCESS)
}

// List allows users to list all signatures in the book.
func (c Book) List() revel.Result {
    scanInput := &dynamodb.ScanInput{
        TableName: aws.String(TABLE_NAME),
        Limit:     aws.Int64(100),
    }
    res, err := c.ddb().Scan(scanInput)
    if err != nil {
        return c.RenderError(err)
    }

    messages := make([]string, 0)
    for _, v := range res.Items {
        messages = append(messages, *(v["Message"].S))
    }
    return c.RenderJSON(messages)
}
./conf/routes:

POST    /sign    Book.Sign
GET     /list    Book.List
Creating the resources and testing
For the purposes of this blog post, the application will be run and tested locally. We will store and retrieve messages from an Amazon DynamoDB table. Use the following AWS CLI command to create the guestbook table:
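A command along these lines creates the table, using the ID field of the Signature struct as the hash key (the throughput values are arbitrary for this local test, and the Region matches the one configured in the controller):

$ aws dynamodb create-table \
    --table-name guestbook \
    --attribute-definitions AttributeName=ID,AttributeType=S \
    --key-schema AttributeName=ID,KeyType=HASH \
    --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5 \
    --region us-west-2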
Now, start the application locally (for example, with revel run github.com/jamesdbowman/guestbook) and test the sign and list routes. If everything is working correctly, the following result appears:
$ curl -d '{"message":"Hello from cURL!"}' -H "Content-Type: application/json" http://localhost:9000/book/sign
Success.
$ curl http://localhost:9000/book/list
[
"Hello from cURL!"
]%
Integrating with AWS X-Ray
Download and run the AWS X-Ray daemon
The AWS SDKs emit trace segments over UDP on port 2000. (This port can be configured.) In order for the trace segments to make it to the X-Ray service, the daemon must listen on this port and batch the segments in calls to the PutTraceSegments API. For information about downloading and running the X-Ray daemon, see the AWS X-Ray Developer Guide.
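For local development, a command along these lines starts the daemon in local mode and points it at an example Region (the binary name, flags, and Region are examples; see the developer guide for the authoritative options):

$ ./xray -o -n us-west-2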
Installing the AWS X-Ray SDK for Go
To download the SDK from GitHub, run go get -u github.com/aws/aws-xray-sdk-go/... The SDK will appear in the $GOPATH.
Enabling the incoming request filter
The first step to instrumenting an application with AWS X-Ray is to enable the generation of trace segments on incoming requests. The SDK conveniently provides an implementation of http.Handler which does exactly that. To ensure incoming web requests travel through this handler, we can modify app/init.go, adding a custom function to be run on application start.
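The exact Revel wiring varies between framework versions, so the following is a minimal sketch against the plain net/http server that shows the shape of the change: every request served through the wrapped handler opens a segment. The fixed segment name Guestbook and the route are illustrative.

package main

import (
    "net/http"

    "github.com/aws/aws-xray-sdk-go/xray"
)

func main() {
    mux := http.NewServeMux()
    mux.HandleFunc("/list", func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("ok"))
    })

    // xray.Handler wraps any http.Handler and emits a trace segment per incoming request.
    http.ListenAndServe(":9000", xray.Handler(xray.NewFixedSegmentNamer("Guestbook"), mux))
}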
The application will now emit a segment for each incoming web request. The service graph appears:
You can customize the name of the segment to make it more descriptive by providing an alternate implementation of SegmentNamer to xray.Handler. For example, you can use xray.NewDynamicSegmentNamer(fallback, pattern) in place of the fixed namer. This namer will use the host name from the incoming web request (if it matches pattern) as the segment name. This is often useful when you are trying to separate different instances of the same application.
In addition, HTTP-centric information such as method and URL is collected in the segment’s http subsection:
To provide detailed performance metrics for distributed applications, the AWS X-Ray SDK needs to measure the time it takes to make outbound requests. Trace context is passed to downstream services using the X-Amzn-Trace-Id header. To draw a detailed and accurate representation of a distributed application, outbound call instrumentation is required.
AWS SDK calls
The AWS X-Ray SDK for Go provides a one-line AWS client wrapper that enables the collection of detailed per-call metrics for any AWS client. We can modify the DynamoDB client instantiation to include this line:
// ddb returns the controller's DynamoDB client, instantiating a new client if necessary.
func (c Book) ddb() *dynamodb.DynamoDB {
    if c.ddbClient == nil {
        sess := session.Must(session.NewSession(&aws.Config{
            Region: aws.String(endpoints.UsWest2RegionID),
        }))
        c.ddbClient = dynamodb.New(sess)
        xray.AWS(c.ddbClient.Client) // add subsegment-generating X-Ray handlers to this client
    }
    return c.ddbClient
}
We also need to ensure that the segment generated by our xray.Handler is passed to these AWS calls so that the X-Ray SDK knows to which segment these generated subsegments belong. In Go, the context.Context object is passed throughout the call path to achieve this goal. (In most other languages, some variant of ThreadLocal is used.) AWS clients provide a *WithContext method variant for each AWS operation, which we need to switch to:
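For example, the Sign action's PutItem call becomes PutItemWithContext (a sketch based on the controller shown earlier; the List action's Scan call changes to ScanWithContext in the same way):

// ... inside the Sign action, after building putItemInput as before:

// Passing the request's context ties this call to the segment opened by the
// X-Ray handler, so it is recorded as a subsegment of that trace.
_, err = c.ddb().PutItemWithContext(c.Request.Context(), putItemInput)
if err != nil {
    return c.RenderError(err)
}
return c.RenderText(SUCCESS)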
We now see much more detail in the Timeline view of the trace for the sign and list operations:
We can use this detail to help diagnose throttling on our DynamoDB table. In the following screenshot, the purple in the DynamoDB service graph node indicates that our table is underprovisioned. The red in the GuestbookApp node indicates that the application is throwing faults due to this throttling.
HTTP calls
Although the guestbook application does not make any non-AWS outbound HTTP calls in its current state, there is a similar one-liner to wrap HTTP clients that make outbound requests. xray.Client(c *http.Client) wraps an existing http.Client (or nil if you want to use a default HTTP client). For example:
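Here is a sketch (the function name and URL handling are illustrative; the context passed in must carry the segment opened upstream by xray.Handler):

// fetchExternal wraps a default HTTP client so the outbound call shows up
// as a subsegment of the trace carried in ctx.
func fetchExternal(ctx context.Context, url string) (*http.Response, error) {
    client := xray.Client(nil) // nil wraps a default HTTP client

    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        return nil, err
    }
    return client.Do(req.WithContext(ctx))
}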
X-Ray can also assist in measuring the performance of local compute operations. To see this in action, let’s create a custom subsegment inside the randString method:
// randString returns a random string of len n, used for DynamoDB Hash key.
func randString(ctx context.Context, n int) string {
    var s string
    // xray.Capture records the wrapped work as a custom subsegment named "randString".
    xray.Capture(ctx, "randString", func(innerCtx context.Context) error {
        b := make([]rune, n)
        for i := range b {
            b[i] = letters[rand.Intn(len(letters))]
        }
        s = string(b)
        return nil
    })
    return s
}

// we'll also need to change the callsite
s.ID = randString(c.Request.Context(), 20)
Summary
By now, you are an expert on how to instrument X-Ray for your Go applications. Instrumenting X-Ray with your applications is an easy way to analyze and debug performance issues and understand customer impact. Please feel free to give any feedback or comments below.
AWS X-Ray helps developers analyze and debug production applications built using microservices or serverless architectures and quantify customer impact. With X-Ray, you can understand how your application and its underlying services are performing and identify and troubleshoot the root cause of performance issues and errors. You can use these insights to identify issues and opportunities for optimization.
In this blog post, I will show you how you can use Amazon CloudWatch and Amazon SNS to get notified when X-Ray detects high latency, errors, and faults in your application. Specifically, I will show you how to use this sample app to get notified through an email or SMS message when your end users observe high latencies or server-side errors when they use your application. You can customize the alarms and events by updating the sample app code.
Sample App Overview
The sample app uses the X-Ray GetServiceGraph API to get the following information (a sketch of such a call appears after this list):
Aggregated response time.
Requests that failed with a 4xx status code (errors).
Requests that failed with a 429 status code (throttles).
Requests that failed with a 5xx status code (faults).
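For illustration, a minimal Go sketch of such a GetServiceGraph call (not the sample app's actual implementation; the Region and time window are arbitrary) that prints these statistics for the last five minutes might look like the following:

package main

import (
	"fmt"
	"log"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/xray"
)

func main() {
	// Region is illustrative; use the Region where your application runs.
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
	svc := xray.New(sess)

	end := time.Now()
	start := end.Add(-5 * time.Minute)

	out, err := svc.GetServiceGraph(&xray.GetServiceGraphInput{
		StartTime: aws.Time(start),
		EndTime:   aws.Time(end),
	})
	if err != nil {
		log.Fatal(err)
	}

	for _, s := range out.Services {
		stats := s.SummaryStatistics
		if stats == nil || stats.ErrorStatistics == nil || stats.FaultStatistics == nil {
			continue
		}
		fmt.Printf("%s: requests=%d errors=%d throttles=%d faults=%d totalResponseTime=%.3fs\n",
			aws.StringValue(s.Name),
			aws.Int64Value(stats.TotalCount),
			aws.Int64Value(stats.ErrorStatistics.TotalCount),
			aws.Int64Value(stats.ErrorStatistics.ThrottleCount),
			aws.Int64Value(stats.FaultStatistics.TotalCount),
			aws.Float64Value(stats.TotalResponseTime))
	}
}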
Overview of sample app architecture
Getting started
The sample app uses AWS CloudFormation to deploy the required resources. To install the sample app:
Run git clone to get the sample app.
Update the JSON file in the Setup folder with threshold limits and notification details.
Run the install.py script to install the sample app.
For more information about the installation steps, see the readme file on GitHub.
You can update the app configuration to include your phone number or email address so that you are notified when your application in X-Ray breaches the latency, error, and fault limits you set in the configuration. If you prefer not to provide your phone number and email, you can instead use the CloudWatch alarm deployed by the sample app to monitor your application in X-Ray.
The sample app deploys resources under the sample app namespace you provided during setup. This enables you to run multiple sample apps in the same Region.
CloudWatch rules
The sample app uses two CloudWatch rules:
SCHEDULEDLAMBDAFOR-sample_app_name to trigger the AWS Lambda function that queries the GetServiceGraph API at regular intervals.
XRAYALERTSFOR-sample_app_name to look for published CloudWatch events that match the pattern defined in this rule.
CloudWatch rules created for the sample app
CloudWatch alarms
If you did not provide your phone number or email in the JSON file, the sample app uses a CloudWatch alarm named XRayCloudWatchAlarm-sample_app_name in combination with the CloudWatch event that you can use for monitoring.
CloudWatch alarm created for the sample app
Amazon SNS messages
The sample app creates two SNS topics:
sample_app_name-cloudwatcheventsnstopic to send out an SMS message when the CloudWatch event matches a pattern published from the Lambda function.
sample_app_name-cloudwatchalarmsnstopic to send out an email message when the CloudWatch alarm goes into an ALARM state.
Amazon SNS created for the sample app
Getting notifications
The CloudWatch event rule looks for published events that match the pattern defined during setup. When a matching event occurs, the rule invokes an SNS topic that sends out an SMS message.
SMS that is sent when CloudWatch Event invokes Amazon SNS topic
The CloudWatch alarm looks for the TriggeredRules metric that is published whenever the CloudWatch event matches the event pattern. It goes into the ALARM state whenever TriggeredRules > 0 for the specified evaluation period and invokes an SNS topic that sends an email message.
Email that is sent when CloudWatch Alarm goes to ALARM state
Stopping notifications
If you provided your phone number or email address, but would like to stop getting notified, change the SUBSCRIBE_TO_EMAIL_SMS environment variable in the Lambda function to No. Then, go to the Amazon SNS console and delete the subscriptions. You can still monitor your application for elevated levels of latency, errors, and faults by using the CloudWatch console.
Change environment variable in Lambda
Delete subscriptions to stop getting notified
Uninstalling the sample app
To uninstall the sample app, run the uninstall.py script in the Setup folder.
Extending the sample app
The sample app notifies you when X-Ray detects high latency, errors, and faults in your application. You can extend it to provide more value for your use cases (for example, to perform an action on a resource when the state of a CloudWatch alarm changes).
To summarize, after this setup you will be able to get notified through Amazon SNS when X-Ray detects high latency, errors, and faults in your application.
I hope you found this information about setting up alarms and alerts for your application in AWS X-Ray helpful. Feel free to leave questions or other feedback in the comments, and to learn more about AWS X-Ray, Amazon SNS, and Amazon CloudWatch.
About the Author
Bharath Kumar is a Sr. Product Manager with AWS X-Ray. He has developed and launched mobile games and web applications on microservices and serverless architectures.
Amazon CloudFront is a web service that speeds up distribution of your static and dynamic web content to end users through a worldwide network of edge locations. CloudFront provides a number of benefits and capabilities that can help you secure your applications and content while meeting compliance requirements. For example, you can configure CloudFront to help enforce secure, end-to-end connections using HTTPS SSL/TLS encryption. You also can take advantage of CloudFront integration with AWS Shield for DDoS protection and with AWS WAF (a web application firewall) for protection against application-layer attacks, such as SQL injection and cross-site scripting.
Now, CloudFront field-level encryption helps secure sensitive data such as customer phone numbers by adding another security layer to CloudFront HTTPS. Using this functionality, you can help ensure that sensitive information in a POST request is encrypted at CloudFront edge locations. This information remains encrypted as it flows to and beyond your origin servers that terminate HTTPS connections with CloudFront and throughout the application environment. In this blog post, we demonstrate how you can enhance the security of sensitive data by using CloudFront field-level encryption.
Note: This post assumes that you understand concepts and services such as content delivery networks, HTTP forms, public-key cryptography, CloudFront, AWS Lambda, and the AWS CLI. If necessary, you should familiarize yourself with these concepts and review the solution overview in the next section before proceeding with the deployment of this post’s solution.
How field-level encryption works
Many web applications collect and store data from users as those users interact with the applications. For example, a travel-booking website may ask for your passport number and less sensitive data such as your food preferences. This data is transmitted to web servers and also might travel among a number of services to perform tasks. However, your sensitive information may need to be accessed by only a small subset of these services; most other services do not need to access your data.
User data is often stored in a database for retrieval at a later time. One approach to protecting stored sensitive data is to configure and code each service to protect that sensitive data. For example, you can develop safeguards in logging functionality to ensure sensitive data is masked or removed. However, this can add complexity to your code base and limit performance.
Field-level encryption addresses this problem by ensuring sensitive data is encrypted at CloudFront edge locations. Sensitive data fields in HTTPS form POSTs are automatically encrypted with a user-provided public RSA key. After the data is encrypted, other systems in your architecture see only ciphertext. If this ciphertext unintentionally becomes externally available, the data is cryptographically protected and only designated systems with access to the private RSA key can decrypt the sensitive data.
It is critical to secure private RSA key material to prevent unauthorized access to the protected data. Management of cryptographic key material is a larger topic that is out of scope for this blog post, but should be carefully considered when implementing encryption in your applications. For example, in this blog post we store private key material as a secure string in the Amazon EC2 Systems Manager Parameter Store. The Parameter Store provides a centralized location for managing your configuration data such as plaintext data (such as database strings) or secrets (such as passwords) that are encrypted using AWS Key Management Service (AWS KMS). You may have an existing key management system in place that you can use, or you can use AWS CloudHSM. CloudHSM is a cloud-based hardware security module (HSM) that enables you to easily generate and use your own encryption keys in the AWS Cloud.
To illustrate field-level encryption, let’s look at a simple form submission where Name and Phone values are sent to a web server using an HTTP POST. A typical form POST would contain data such as the following.
POST / HTTP/1.1
Host: example.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 60
Name=Jane+Doe&Phone=404-555-0150
Instead of taking this typical approach, field-level encryption converts this data similar to the following.
POST / HTTP/1.1
Host: example.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 1713
Name=Jane+Doe&Phone=AYABeHxZ0ZqWyysqxrB5pEBSYw4AAA...
To further demonstrate field-level encryption in action, this blog post includes a sample serverless application that you can deploy by using a CloudFormation template, which creates an application environment using CloudFront, Amazon API Gateway, and Lambda. The sample application is only intended to demonstrate field-level encryption functionality and is not intended for production use. The following diagram depicts the architecture and data flow of this sample application.
Sample application architecture and data flow
Here is how the sample solution works:
An application user submits an HTML form page with sensitive data, generating an HTTPS POST to CloudFront.
Field-level encryption intercepts the form POST, encrypts sensitive data with the public RSA key, and replaces those fields in the form POST with encrypted ciphertext. The form POST ciphertext is then sent to origin servers.
The serverless application accepts the form post data containing ciphertext where sensitive data would normally be. If a malicious user were able to compromise your application and gain access to your data, such as the contents of a form, that user would see encrypted data.
Lambda stores data in a DynamoDB table, so sensitive data remains safely encrypted at rest.
An administrator uses the AWS Management Console and a Lambda function to view the sensitive data.
During the session, the administrator retrieves ciphertext from the DynamoDB table.
Decrypted sensitive data is transmitted over SSL/TLS via the AWS Management Console to the administrator for review.
Deployment walkthrough
The high-level steps to deploy this solution are as follows:
Stage the required artifacts. When deployment packages are used with Lambda, the zipped artifacts have to be placed in an S3 bucket in the target AWS Region for deployment. This step is not required if you are deploying in the US East (N. Virginia) Region because the package has already been staged there.
Generate an RSA key pair. Create a public/private key pair that will be used to perform the encrypt/decrypt functionality.
Upload the public key to CloudFront and associate it with the field-level encryption configuration. After you create the key pair, the public key is uploaded to CloudFront so that it can be used by field-level encryption.
Launch the CloudFormation stack. Deploy the sample application for demonstrating field-level encryption by using AWS CloudFormation.
Add the field-level encryption configuration to the CloudFront distribution. After you have provisioned the application, this step associates the field-level encryption configuration with the CloudFront distribution.
Store the RSA private key in the Parameter Store. Store the private key in the Parameter Store as a SecureString data type, which uses AWS KMS to encrypt the parameter value.
Deploy the solution
1. Stage the required artifacts
(If you are deploying in the US East [N. Virginia] Region, skip to Step 2, “Generate an RSA key pair.”)
Stage the Lambda function deployment package in an Amazon S3 bucket located in the AWS Region you are using for this solution. To do this, download the zipped deployment package and upload it to your in-region bucket. For additional information about uploading objects to S3, see Uploading Objects into Amazon S3.
2. Generate an RSA key pair
In this section, you will generate an RSA key pair by using OpenSSL:
Confirm access to OpenSSL.
$ openssl version
You should see version information similar to the following.
OpenSSL <version> <date>
Create a private key using the following command.
$ openssl genrsa -out private_key.pem 2048
The command results should look similar to the following.
Restrict access to the private key.
$ chmod 600 private_key.pem
Note: You will use the public and private key material in Steps 3 and 6 to configure the sample application.
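Step 3 pastes in the contents of a public_key.pem file; if you have not yet exported the public key, you can derive it from the private key with OpenSSL, for example:
$ openssl rsa -pubout -in private_key.pem -out public_key.pem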
3. Upload the public key to CloudFront and associate it with the field-level encryption configuration
Now that you have created the RSA key pair, you will use the AWS Management Console to upload the public key to CloudFront for use by field-level encryption. Complete the following steps to upload and configure the public key.
Note: Do not include spaces or special characters when providing the configuration values in this section.
From the AWS Management Console, choose Services > CloudFront.
In the navigation pane, choose Public Key and choose Add Public Key.
Complete the Add Public Key configuration boxes:
Key Name: Type a name such as DemoPublicKey.
Encoded Key: Paste the contents of the public_key.pem file you created in Step 2c. Copy and paste the encoded key value for your public key, including the -----BEGIN PUBLIC KEY----- and -----END PUBLIC KEY----- lines.
Comment: Optionally add a comment.
Choose Create.
After adding at least one public key to CloudFront, the next step is to create a profile that tells CloudFront which input fields you want to be encrypted. While still on the CloudFront console, choose Field-level encryption in the navigation pane.
Under Profiles, choose Create profile.
Complete the Create profile configuration boxes:
Name: Type a name such as FLEDemo.
Comment: Optionally add a comment.
Public key: Select the public key you configured in Step 4.b.
Provider name: Type a provider name such as FLEDemo. This information will be used when the form data is encrypted, and must be provided to applications that need to decrypt the data, along with the appropriate private key.
Pattern to match: Type phone. This configures field-level encryption to match the phone field in the form POST.
Choose Save profile.
Configurations include options for whether to block or forward a query to your origin in scenarios where CloudFront can’t encrypt the data. Under Encryption Configurations, choose Create configuration.
Complete the Create configuration boxes:
Comment: Optionally add a comment.
Content type: Enter application/x-www-form-urlencoded. This is a common media type for encoding form data.
Default profile ID: Select the profile you added in Step 3e.
Choose Save configuration.
4. Launch the CloudFormation stack
Launch the sample application by using a CloudFormation template that automates the provisioning process.
The CloudFormation template takes the following input parameters:
ProviderID: Enter the Provider name you assigned in Step 3e. The ProviderID is used in the field-level encryption configuration in CloudFront (letters and numbers only, no special characters).
PublicKeyName: Enter the Key Name you assigned in Step 3b. This name is assigned to the public key in the field-level encryption configuration in CloudFront (letters and numbers only, no special characters).
PrivateKeySSMPath: Leave as the default: /cloudfront/field-encryption-sample/private-key
For the artifacts location parameter, leave the default path in the S3 bucket containing artifact files if you are deploying in us-east-1.
To finish creating the CloudFormation stack:
Choose Next on the Select Template page, enter the input parameters and choose Next. Note: The Artifacts configuration needs to be updated only if you are deploying outside of us-east-1 (US East [N. Virginia]). See Step 1 for artifact staging instructions.
On the Options page, accept the defaults and choose Next.
On the Review page, confirm the details, choose the I acknowledge that AWS CloudFormation might create IAM resources check box, and then choose Create. (The stack will be created in approximately 15 minutes.)
5. Add the field-level encryption configuration to the CloudFront distribution
In the Outputs section of the FLE-Sample-App CloudFormation stack, look for CloudFrontDistribution and click the URL to open the distribution configuration in the CloudFront console. Then complete the following steps:
Choose Behaviors, choose the Default (*) behavior, and then choose Edit.
For Field-level Encryption Config, choose the configuration you created in Step 3g.
Choose Yes, Edit.
While still in the CloudFront distribution configuration, choose the General tab, choose Edit, scroll down to Distribution State, and change it to Enabled.
Choose Yes, Edit.
6. Store the RSA private key in the Parameter Store
In this step, you store the private key in the EC2 Systems Manager Parameter Store as a SecureString data type, which uses AWS KMS to encrypt the parameter value. For more information about AWS KMS, see the AWS Key Management Service Developer Guide. You will need a working installation of the AWS CLI to complete this step.
Store the private key in the Parameter Store with the AWS CLI by running the following command. You will find the <KMSKeyID> value in the KMSKeyID entry of the CloudFormation stack Outputs. Substitute it for the placeholder in the following command.
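For example, assuming the default PrivateKeySSMPath shown earlier and the private_key.pem file from Step 2, the command looks similar to the following:
$ aws ssm put-parameter --type SecureString --name /cloudfront/field-encryption-sample/private-key --value file://private_key.pem --key-id "<KMSKeyID>"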
Verify the parameter. Running the ssm get-parameter command that follows should return your private key material in the Value field of the output.
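For example, again assuming the default parameter path:
$ aws ssm get-parameter --name /cloudfront/field-encryption-sample/private-key --with-decryption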
Notice we use the --with-decryption argument in this command. This returns the private key as cleartext.
Delete the private key from local storage. On Linux, for example, you can use the shred command to securely delete the private key material from your workstation, as shown below. You may also wish to store the private key material within AWS CloudHSM or another protected location suitable for your security requirements. For production implementations, you also should implement key rotation policies.
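For example (the number of overwrite passes shown here is arbitrary):
$ shred -zvu -n 100 private_key.pem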
This completes the sample application deployment. Next, we show you how to see field-level encryption in action.
Use the following steps to test the sample application with field-level encryption:
Open the sample application in your web browser by clicking the ApplicationURL link in the CloudFormation stack Outputs (for example, https://d199xe5izz82ea.cloudfront.net/prod/). Note that it may take several minutes for the CloudFront distribution to reach the Deployed status from the previous step, during which time you may not be able to access the sample application.
Fill out and submit the HTML form on the page:
Complete the three form fields: Full Name, Email Address, and Phone Number.
Choose Submit. Notice that the application response includes the form values. The phone number is returned as ciphertext encrypted with your public key. This ciphertext has been stored in DynamoDB.
Execute the Lambda decryption function to download ciphertext from DynamoDB and decrypt the phone number using the private key:
In the CloudFormation stack Outputs, locate DecryptFunction and click the URL to open the Lambda console.
Configure a test event using the “Hello World” template.
Choose the Test button.
View the encrypted and decrypted phone number data.
Summary
In this blog post, we showed you how to use CloudFront field-level encryption to encrypt sensitive data at edge locations and help prevent access from unauthorized systems. The source code for this solution is available on GitHub. For additional information about field-level encryption, see the documentation.
If you have comments about this post, submit them in the “Comments” section below. If you have questions about or issues implementing this solution, please start a new thread on the CloudFront forum.
Cloud security with scalability and innovation: at AWS, this is our top priority. To help you securely architect cloud solutions, AWS Training and Certification recently added new free digital training about security, including a new course about Amazon GuardDuty, a new managed threat-detection service.
These introductory courses, built by AWS experts, are suitable for users and decision makers alike. Each course lasts 10–15 minutes, and you can learn at your own pace.
Introduction to Amazon GuardDuty (10 minutes) Learn how to use Amazon GuardDuty, a new AWS managed threat-detection service that continuously monitors your account to detect known and unknown threats.
Introduction to Amazon Macie (10 minutes) Learn how Amazon Macie uses machine learning to automatically discover, classify, and protect sensitive data in the AWS Cloud.
Introduction to AWS Artifact (10 minutes) Get a brief overview of this service, including the definition of an audit artifact and a demonstration of AWS Artifact.
Introduction to AWS WAF (15 minutes) Review common AWS WAF use cases and learn which conditions AWS WAF (a web application firewall) can detect. A brief demonstration shows how to configure AWS WAF filters and rules.
Go deeper with AWS security courses
To supplement your foundational training, take these security-focused courses:
AWS Security Fundamentals (3 hours) is a self-paced digital course. This course introduces you to fundamental cloud computing and AWS security concepts, including AWS access control and management, governance, logging, and encryption methods. The course also covers security-related compliance protocols and risk management strategies, as well as procedures related to auditing your AWS security infrastructure.
Security Operations on AWS (3 days) is an instructor-led course that offers in-depth classroom instruction. This course demonstrates how to use AWS security services to help remain secure and compliant in the AWS Cloud. The course focuses on AWS security best practices that you can implement to enhance the security of your data and systems in the cloud.
AWS Training and Certification continually evaluates and expands the training courses available to you, so be sure to visit the website regularly to explore the latest offerings.
As part of the AWS Online Tech Talks series, AWS will present Cloud-Native DDoS Mitigation with AWS Shield on Wednesday, December 13. This tech talk will start at 9:00 A.M. Pacific Time and end at 9:40 A.M. Pacific Time.
Distributed Denial of Service (DDoS) mitigation can help you maintain application availability, but traditional solutions are hard to scale and require expensive hardware. AWS Shield is a managed DDoS protection service that helps you safeguard web applications running in the AWS Cloud. In this tech talk, you will learn simple techniques for using AWS Shield to help you build scalable DDoS defenses into your applications without investing in costly infrastructure. You also will learn how AWS Shield helps you monitor your applications to detect DDoS attempts and how to respond to in-progress events.
Today, the following Security, Compliance, & Identity sessions, workshops, and chalk talks will be presented at AWS re:Invent 2017 in Las Vegas. All sessions are in the MGM Grand and all times are local. See the re:Invent Session Catalog for complete information about every session. You can also download the AWS re:Invent 2017 Mobile App for the latest updates and information.
If you are not attending re:Invent 2017, keep in mind that all videos of and slide decks from these sessions will be made available next week. We will publish a post on the Security Blog next week that links to all available videos and slide decks.
We can’t believe that there are just a few days left before re:Invent 2017. If you are attending this year, you’ll want to check out our Big Data sessions! The Big Data and Machine Learning categories are bigger than ever. As in previous years, you can find these sessions in various tracks, including Analytics & Big Data, Deep Learning Summit, Artificial Intelligence & Machine Learning, Architecture, and Databases.
We have great sessions from organizations and companies like Vanguard, Cox Automotive, Pinterest, Netflix, FINRA, Amtrak, AmazonFresh, Sysco Foods, Twilio, American Heart Association, Expedia, Esri, Nextdoor, and many more. All sessions are recorded and made available on YouTube. In addition, all slide decks from the sessions will be available on SlideShare.net after the conference.
This post highlights the sessions that will be presented as part of the Analytics & Big Data track, as well as relevant sessions from other tracks like Architecture, Artificial Intelligence & Machine Learning, and IoT. If you’re interested in Machine Learning sessions, don’t forget to check out our Guide to Machine Learning at re:Invent 2017.
This year’s session catalog contains the following breakout sessions.
Raju Gulabani, VP, Database, Analytics and AI at AWS will discuss the evolution of database and analytics services in AWS, the new database and analytics services and features we launched this year, and our vision for continued innovation in this space. We are witnessing an unprecedented growth in the amount of data collected, in many different forms. Storage, management, and analysis of this data require database services that scale and perform in ways not possible before. AWS offers a collection of database and other data services—including Amazon Aurora, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Amazon ElastiCache, Amazon Kinesis, and Amazon EMR—to process, store, manage, and analyze data. In this session, we provide an overview of AWS database and analytics services and discuss how customers are using these services today.
Deep dive customer use cases
ABD401 – How Netflix Monitors Applications in Near Real-Time with Amazon Kinesis Thousands of services work in concert to deliver millions of hours of video streams to Netflix customers every day. These applications vary in size, function, and technology, but they all make use of the Netflix network to communicate. Understanding the interactions between these services is a daunting challenge both because of the sheer volume of traffic and the dynamic nature of deployments. In this session, we first discuss why Netflix chose Kinesis Streams to address these challenges at scale. We then dive deep into how Netflix uses Kinesis Streams to enrich network traffic logs and identify usage patterns in real time. Lastly, we cover how Netflix uses this system to build comprehensive dependency maps, increase network efficiency, and improve failure resiliency. From this session, you will learn how to build a real-time application monitoring system using network traffic logs and get real-time, actionable insights.
In this session, learn how Nextdoor replaced their home-grown data pipeline based on a topology of Flume nodes with a completely serverless architecture based on Kinesis and Lambda. By making these changes, they improved both the reliability of their data and the delivery times of billions of records of data to their Amazon S3–based data lake and Amazon Redshift cluster. Nextdoor is a private social networking service for neighborhoods.
ABD205 – Taking a Page Out of Ivy Tech’s Book: Using Data for Student Success Data speaks. Discover how Ivy Tech, the nation’s largest singly accredited community college, uses AWS to gather, analyze, and take action on student behavioral data for the betterment of over 3,100 students. This session outlines the process from inception to implementation across the state of Indiana and highlights how Ivy Tech’s model can be applied to your own complex business problems.
ABD207 – Leveraging AWS to Fight Financial Crime and Protect National Security Banks aren’t known to share data and collaborate with one another. But that is exactly what the Mid-Sized Bank Coalition of America (MBCA) is doing to fight digital financial crime—and protect national security. Using the AWS Cloud, the MBCA developed a shared data analytics utility that processes terabytes of non-competitive customer account, transaction, and government risk data. The intelligence produced from the data helps banks increase the efficiency of their operations, cut labor and operating costs, and reduce false positive volumes. The collective intelligence also allows greater enforcement of Anti-Money Laundering (AML) regulations by helping members detect internal risks—and identify the challenges to detecting these risks in the first place. This session demonstrates how the AWS Cloud supports the MBCA to deliver advanced data analytics, provide consistent operating models across financial institutions, reduce costs, and strengthen national security.
ABD208 – Cox Automotive Empowered to Scale with Splunk Cloud & AWS and Explores New Innovation with Amazon Kinesis Firehose In this session, learn how Cox Automotive is using Splunk Cloud for real time visibility into its AWS and hybrid environments to achieve near instantaneous MTTI, reduce auction incidents by 90%, and proactively predict outages. We also introduce a highly anticipated capability that allows you to ingest, transform, and analyze data in real time using Splunk and Amazon Kinesis Firehose to gain valuable insights from your cloud resources. It’s now quicker and easier than ever to gain access to analytics-driven infrastructure monitoring using Splunk Enterprise & Splunk Cloud.
ABD209 – Accelerating the Speed of Innovation with a Data Sciences Data & Analytics Hub at Takeda Historically, silos of data, analytics, and processes across functions, stages of development, and geography created a barrier to R&D efficiency. Gathering the right data necessary for decision-making was challenging due to issues of accessibility, trust, and timeliness. In this session, learn how Takeda is undergoing a transformation in R&D to increase the speed-to-market of high-impact therapies to improve patient lives. The Data and Analytics Hub was built, with Deloitte, to address these issues and support the efficient generation of data insights for functions such as clinical operations, clinical development, medical affairs, portfolio management, and R&D finance. In the AWS hosted data lake, this data is processed, integrated, and made available to business end users through data visualization interfaces, and to data scientists through direct connectivity. Learn how Takeda has achieved significant time reductions—from weeks to minutes—to gather and provision data that has the potential to reduce cycle times in drug development. The hub also enables more efficient operations and alignment to achieve product goals through cross functional team accountability and collaboration due to the ability to access the same cross domain data.
ABD210 – Modernizing Amtrak: Serverless Solution for Real-Time Data Capabilities As the nation’s only high-speed intercity passenger rail provider, Amtrak needs to know critical information to run their business such as: Who’s onboard any train at any time? How are booking and revenue trending? Amtrak was faced with unpredictable and often slow response times from existing databases, ranging from seconds to hours; existing booking and revenue dashboards were spreadsheet-based and manual; multiple copies of data were stored in different repositories, lacking integration and consistency; and operations and maintenance (O&M) costs were relatively high. Join us as we demonstrate how Deloitte and Amtrak successfully went live with a cloud-native operational database and analytical datamart for near-real-time reporting in under six months. We highlight the specific challenges and the modernization of architecture on an AWS native Platform as a Service (PaaS) solution. The solution includes cloud-native components such as AWS Lambda for microservices, Amazon Kinesis and AWS Data Pipeline for moving data, Amazon S3 for storage, Amazon DynamoDB for a managed NoSQL database service, and Amazon Redshift for near-real time reports and dashboards. Deloitte’s solution enabled “at scale” processing of 1 million transactions/day and up to 2K transactions/minute. It provided flexibility and scalability, largely eliminated the need for system management, and dramatically reduced operating costs. Moreover, it laid the groundwork for decommissioning legacy systems, anticipated to save at least $1M over 3 years.
ABD211 – Sysco Foods: A Journey from Too Much Data to Curated Insights In this session, we detail Sysco’s journey from a company focused on hindsight-based reporting to one focused on insights and foresight. For this shift, Sysco moved from multiple data warehouses to an AWS ecosystem, including Amazon Redshift, Amazon EMR, AWS Data Pipeline, and more. As the team at Sysco worked with Tableau, they gained agile insight across their business. Learn how Sysco decided to use AWS, how they scaled, and how they became more strategic with the AWS ecosystem and Tableau.
ABD217 – From Batch to Streaming: How Amazon Flex Uses Real-time Analytics to Deliver Packages on Time Reducing the time to get actionable insights from data is important to all businesses, and customers who employ batch data analytics tools are exploring the benefits of streaming analytics. Learn best practices to extend your architecture from data warehouses and databases to real-time solutions. Learn how to use Amazon Kinesis to get real-time data insights and integrate them with Amazon Aurora, Amazon RDS, Amazon Redshift, and Amazon S3. The Amazon Flex team describes how they used streaming analytics in their Amazon Flex mobile app, used by Amazon delivery drivers to deliver millions of packages each month on time. They discuss the architecture that enabled the move from a batch processing system to a real-time system, overcoming the challenges of migrating existing batch data to streaming data, and how to benefit from real-time analytics.
ABD218 – How EuroLeague Basketball Uses IoT Analytics to Engage Fans IoT and big data have made their way out of industrial applications, general automation, and consumer goods, and are now a valuable tool for improving consumer engagement across a number of industries, including media, entertainment, and sports. The low cost and ease of implementation of AWS analytics services and AWS IoT have allowed AGT, a leader in IoT, to develop their IoTA analytics platform. Using IoTA, AGT brought a tailored solution to EuroLeague Basketball for real-time content production and fan engagement during the 2017-18 season. In this session, we take a deep dive into how this solution is architected for secure, scalable, and highly performant data collection from athletes, coaches, and fans. We also talk about how the data is transformed into insights and integrated into a content generation pipeline. Lastly, we demonstrate how this solution can be easily adapted for other industries and applications.
ABD222 – How to Confidently Unleash Data to Meet the Needs of Your Entire Organization Where are you on the spectrum of IT leaders? Are you confident that you’re providing the technology and solutions that consistently meet or exceed the needs of your internal customers? Do your peers at the executive table see you as an innovative technology leader? Innovative IT leaders understand the value of getting data and analytics directly into the hands of decision makers, and into their own. In this session, Daren Thayne, Domo’s Chief Technology Officer, shares how innovative IT leaders are helping drive a culture change at their organizations. See how transformative it can be to have real-time access to all of the data that is relevant to YOUR job (including a complete view of your entire AWS environment), as well as understand how it can help you lead the way in applying that same pattern throughout your entire company.
ABD303 – Developing an Insights Platform – Sysco’s Journey from Disparate Systems to Data Lake and Beyond Sysco has nearly 200 operating companies across its multiple lines of business throughout the United States, Canada, Central/South America, and Europe. As the global leader in food services, Sysco identified the need to streamline the collection, transformation, and presentation of data produced by the distributed units and systems, into a central data ecosystem. Sysco’s Business Intelligence and Analytics team addressed these requirements by creating a data lake with scalable analytics and query engines leveraging AWS services. In this session, Sysco will outline their journey from a hindsight reporting focused company to an insights driven organization. They will cover solution architecture, challenges, and lessons learned from deploying a self-service insights platform. They will also walk through the design patterns they used and how they designed the solution to provide predictive analytics using Amazon Redshift Spectrum, Amazon S3, Amazon EMR, AWS Glue, Amazon Elasticsearch Service and other AWS services.
ABD309 – How Twilio Scaled Its Data-Driven Culture As a leading cloud communications platform, Twilio has always been strongly data-driven. But as headcount and data volumes grew—and grew quickly—they faced many new challenges. One-off, static reports work when you’re a small startup, but how do you support a growth stage company to a successful IPO and beyond? Today, Twilio’s data team relies on AWS and Looker to provide data access to 700 colleagues. Departments have the data they need to make decisions, and cloud-based scale means they get answers fast. Data delivers real-business value at Twilio, providing a 360-degree view of their customer, product, and business. In this session, you hear firsthand stories directly from the Twilio data team and learn real-world tips for fostering a truly data-driven culture at scale.
ABD310 – How FINRA Secures Its Big Data and Data Science Platform on AWS FINRA uses big data and data science technologies to detect fraud, market manipulation, and insider trading across US capital markets. As a financial regulator, FINRA analyzes highly sensitive data, so information security is critical. Learn how FINRA secures its Amazon S3 Data Lake and its data science platform on Amazon EMR and Amazon Redshift, while empowering data scientists with tools they need to be effective. In addition, FINRA shares AWS security best practices, covering topics such as AMI updates, micro segmentation, encryption, key management, logging, identity and access management, and compliance.
ABD331 – Log Analytics at Expedia Using Amazon Elasticsearch Service Expedia uses Amazon Elasticsearch Service (Amazon ES) for a variety of mission-critical use cases, ranging from log aggregation to application monitoring and pricing optimization. In this session, the Expedia team reviews how they use Amazon ES and Kibana to analyze and visualize Docker startup logs, AWS CloudTrail data, and application metrics. They share best practices for architecting a scalable, secure log analytics solution using Amazon ES, so you can add new data sources almost effortlessly and get insights quickly
ABD316 – American Heart Association: Finding Cures to Heart Disease Through the Power of Technology Combining disparate datasets and making them accessible to data scientists and researchers is a prevalent challenge for many organizations, not just in healthcare research. American Heart Association (AHA) has built a data science platform using Amazon EMR, Amazon Elasticsearch Service, and other AWS services, that corrals multiple datasets and enables advanced research on phenotype and genotype datasets, aimed at curing heart diseases. In this session, we present how AHA built this platform and the key challenges they addressed with the solution. We also provide a demo of the platform, and leave you with suggestions and next steps so you can build similar solutions for your use cases
ABD319 – Tooling Up for Efficiency: DIY Solutions @ Netflix At Netflix, we have traditionally approached cloud efficiency from a human standpoint, whether it be in-person meetings with the largest service teams or manually flipping reservations. Over time, we realized that these manual processes are not scalable as the business continues to grow. Therefore, in the past year, we have focused on building out tools that allow us to make more insightful, data-driven decisions around capacity and efficiency. In this session, we discuss the DIY applications, dashboards, and processes we built to help with capacity and efficiency. We start at the ten thousand foot view to understand the unique business and cloud problems that drove us to create these products, and discuss implementation details, including the challenges encountered along the way. Tools discussed include Picsou, the successor to our AWS billing file cost analyzer; Libra, an easy-to-use reservation conversion application; and cost and efficiency dashboards that relay useful financial context to 50+ engineering teams and managers.
ABD312 – Deep Dive: Migrating Big Data Workloads to AWS Customers are migrating their analytics, data processing (ETL), and data science workloads running on Apache Hadoop, Spark, and data warehouse appliances from on-premise deployments to AWS in order to save costs, increase availability, and improve performance. AWS offers a broad set of analytics services, including solutions for batch processing, stream processing, machine learning, data workflow orchestration, and data warehousing. This session will focus on identifying the components and workflows in your current environment; and providing the best practices to migrate these workloads to the right AWS data analytics product. We will cover services such as Amazon EMR, Amazon Athena, Amazon Redshift, Amazon Kinesis, and more. We will also feature Vanguard, an American investment management company based in Malvern, Pennsylvania with over $4.4 trillion in assets under management. Ritesh Shah, Sr. Program Manager for Cloud Analytics Program at Vanguard, will describe how they orchestrated their migration to AWS analytics services, including Hadoop and Spark workloads to Amazon EMR. Ritesh will highlight the technical challenges they faced and overcame along the way, as well as share common recommendations and tuning tips to accelerate the time to production.
ABD402 – How Esri Optimizes Massive Image Archives for Analytics in the Cloud Petabyte-scale archives of satellite, plane, and drone imagery continue to grow exponentially. They mostly exist as semi-structured data, but they are only valuable when accessed and processed by a wide range of products for both visualization and analysis. This session provides an overview of how ArcGIS indexes and structures data so that any part of it can be quickly accessed, processed, and analyzed by reading only the minimum amount of data needed for the task. In this session, we share best practices for structuring and compressing massive datasets in Amazon S3, so it can be analyzed efficiently. We also review a number of different image formats, including GeoTIFF (used for the Public Datasets on AWS program, Landsat on AWS), cloud optimized GeoTIFF, MRF, and CRF as well as different compression approaches to show the effect on processing performance. Finally, we provide examples of how this technology has been used to help image processing and analysis for the response to Hurricane Harvey.
ABD329 – A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Massive Scale Amazon’s consumer business continues to grow, and so does the volume of data and the number and complexity of the analytics done in support of the business. In this session, we talk about how Amazon.com uses AWS technologies to build a scalable environment for data and analytics. We look at how Amazon is evolving the world of data warehousing with a combination of a data lake and parallel, scalable compute engines such as Amazon EMR and Amazon Redshift.
ABD327 – Migrating Your Traditional Data Warehouse to a Modern Data Lake In this session, we discuss the latest features of Amazon Redshift and Redshift Spectrum, and take a deep dive into its architecture and inner workings. We share many of the recent availability, performance, and management enhancements and how they improve your end user experience. You also hear from 21st Century Fox, who presents a case study of their fast migration from an on-premises data warehouse to Amazon Redshift. Learn how they are expanding their data warehouse to a data lake that encompasses multiple data sources and data formats. This architecture helps them tie together siloed business units and get actionable 360-degree insights across their consumer base.
MCL202 – Ally Bank & Cognizant: Transforming Customer Experience Using Amazon Alexa Given the increasing popularity of natural language interfaces such as Voice as User technology or conversational artificial intelligence (AI), Ally® Bank was looking to interact with customers by enabling direct transactions through conversation or voice. They also needed to develop a capability that allows third parties to connect to the bank securely for information sharing and exchange, using oAuth, an authentication protocol seen as the future of secure banking technology. Cognizant’s Architecture team partnered with Ally Bank’s Enterprise Architecture group and identified the right product for oAuth integration with Amazon Alexa and third-party technologies. In this session, we discuss how building products with conversational AI helps Ally Bank offer an innovative customer experience; increase retention through improved data-driven personalization; increase the efficiency and convenience of customer service; and gain deep insights into customer needs through data analysis and predictive analytics to offer new products and services.
MCL317 – Orchestrating Machine Learning Training for Netflix Recommendations At Netflix, we use machine learning (ML) algorithms extensively to recommend relevant titles to our 100+ million members based on their tastes. Everything on the member home page is an evidence-driven, A/B-tested experience that we roll out backed by ML models. These models are trained using Meson, our workflow orchestration system. Meson distinguishes itself from other workflow engines by handling more sophisticated execution graphs, such as loops and parameterized fan-outs. Meson can schedule Spark jobs, Docker containers, bash scripts, gists of Scala code, and more. Meson also provides a rich visual interface for monitoring active workflows and inspecting execution logs. It has a powerful Scala DSL for authoring workflows as well as the REST API. In this session, we focus on how Meson trains recommendation ML models in production, and how we have re-architected it to scale up for a growing need of broad ETL applications within Netflix. As a driver for this change, we have had to evolve the persistence layer for Meson. We talk about how we migrated from Cassandra to Amazon RDS backed by Amazon Aurora
MCL350 – Humans vs. the Machines: How Pinterest Uses Amazon Mechanical Turk’s Worker Community to Improve Machine Learning Ever since the term “crowdsourcing” was coined in 2006, it’s been a buzzword for technology companies and social institutions. In the technology sector, crowdsourcing is instrumental for verifying machine learning algorithms, which, in turn, improves the user’s experience. In this session, we explore how Pinterest adapted to an increased reliability on human evaluation to improve their product, with a focus on how they’ve integrated with Mechanical Turk’s platform. This presentation is aimed at engineers, analysts, program managers, and product managers who are interested in how companies rely on Mechanical Turk’s human evaluation platform to better understand content and improve machine learning algorithms. The discussion focuses on the analysis and product decisions related to building a high quality crowdsourcing system that takes advantage of Mechanical Turk’s powerful worker community.
ABD201 – Big Data Architectural Patterns and Best Practices on AWS In this session, we simplify big data processing as a data bus comprising various stages: collect, store, process, analyze, and visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. Finally, we provide reference architectures, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost
ABD202 – Best Practices for Building Serverless Big Data Applications Serverless technologies let you build and scale applications and services rapidly without the need to provision or manage servers. In this session, we show you how to incorporate serverless concepts into your big data architectures. We explore the concepts behind and benefits of serverless architectures for big data, looking at design patterns to ingest, store, process, and visualize your data. Along the way, we explain when and how you can use serverless technologies to streamline data processing, minimize infrastructure management, and improve agility and robustness and share a reference architecture using a combination of cloud and open source technologies to solve your big data problems. Topics include: use cases and best practices for serverless big data applications; leveraging AWS technologies such as Amazon DynamoDB, Amazon S3, Amazon Kinesis, AWS Lambda, Amazon Athena, and Amazon EMR; and serverless ETL, event processing, ad hoc analysis, and real-time analytics.
ABD206 – Building Visualizations and Dashboards with Amazon QuickSight Just as a picture is worth a thousand words, a visual is worth a thousand data points. A key aspect of our ability to gain insights from our data is to look for patterns, and these patterns are often not evident when we simply look at data in tables. The right visualization will help you gain a deeper understanding in a much quicker timeframe. In this session, we will show you how to quickly and easily visualize your data using Amazon QuickSight. We will show you how you can connect to data sources, generate custom metrics and calculations, create comprehensive business dashboards with various chart types, and setup filters and drill downs to slice and dice the data.
ABD203 – Real-Time Streaming Applications on AWS: Use Cases and Patterns To win in the marketplace and provide differentiated customer experiences, businesses need to be able to use live data in real time to facilitate fast decision making. In this session, you learn common streaming data processing use cases and architectures. First, we give an overview of streaming data and AWS streaming data capabilities. Next, we look at a few customer examples and their real-time streaming applications. Finally, we walk through common architectures and design patterns of top streaming data use cases.
ABD213 – How to Build a Data Lake with AWS Glue Data Catalog As data volumes grow and customers store more data on AWS, they often have valuable data that is not easily discoverable and available for analytics. The AWS Glue Data Catalog provides a central view of your data lake, making data readily available for analytics. We introduce key features of the AWS Glue Data Catalog and its use cases. Learn how crawlers can automatically discover your data, extract relevant metadata, and add it as table definitions to the AWS Glue Data Catalog. We will also explore the integration between AWS Glue Data Catalog and Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.
ABD214 – Real-time User Insights for Mobile and Web Applications with Amazon Pinpoint With customers demanding relevant and real-time experiences across a range of devices, digital businesses are looking to gather user data at scale, understand this data, and respond to customer needs instantly. This requires tools that can record large volumes of user data in a structured fashion, and then instantly make this data available to generate insights. In this session, we demonstrate how you can use Amazon Pinpoint to capture user data in a structured yet flexible manner. Further, we demonstrate how this data can be set up for instant consumption using services like Amazon Kinesis Firehose and Amazon Redshift. We walk through example data based on real world scenarios, to illustrate how Amazon Pinpoint lets you easily organize millions of events, record them in real-time, and store them for further analysis.
ABD223 – IT Innovators: New Technology for Leveraging Data to Enable Agility, Innovation, and Business Optimization Companies of all sizes are looking for technology to efficiently leverage data and their existing IT investments to stay competitive and understand where to find new growth. Regardless of where companies are in their data-driven journey, they face greater demands for information by customers, prospects, partners, vendors and employees. All stakeholders inside and outside the organization want information on-demand or in “real time”, available anywhere on any device. They want to use it to optimize business outcomes without having to rely on complex software tools or human gatekeepers to relevant information. Learn how IT innovators at companies such as MasterCard, Jefferson Health, and TELUS are using Domo’s Business Cloud to help their organizations more effectively leverage data at scale.
ABD301 – Analyzing Streaming Data in Real Time with Amazon Kinesis Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information. In this session, we present an end-to-end streaming data solution using Kinesis Streams for data ingestion, Kinesis Analytics for real-time processing, and Kinesis Firehose for persistence. We review in detail how to write SQL queries using streaming data and discuss best practices to optimize and monitor your Kinesis Analytics applications. Lastly, we discuss how to estimate the cost of the entire system
ABD302 – Real-Time Data Exploration and Analytics with Amazon Elasticsearch Service and Kibana In this session, we use Apache web logs as an example and show you how to build an end-to-end analytics solution. First, we cover how to configure an Amazon ES cluster and ingest data using Amazon Kinesis Firehose. We look at best practices for choosing instance types, storage options, shard counts, and index rotations based on the throughput of incoming data. Then we demonstrate how to set up a Kibana dashboard and build custom dashboard widgets. Finally, we review approaches for generating custom, ad-hoc reports.
ABD304 – Best Practices for Data Warehousing with Amazon Redshift & Redshift Spectrum Most companies are over-run with data, yet they lack critical insights to make timely and accurate business decisions. They are missing the opportunity to combine large amounts of new, unstructured big data that resides outside their data warehouse with trusted, structured data inside their data warehouse. In this session, we take an in-depth look at how modern data warehousing blends and analyzes all your data, inside and outside your data warehouse without moving the data, to give you deeper insights to run your business. We will cover best practices on how to design optimal schemas, load data efficiently, and optimize your queries to deliver high throughput and performance.
ABD305 – Design Patterns and Best Practices for Data Analytics with Amazon EMR Amazon EMR is one of the largest Hadoop operators in the world, enabling customers to run ETL, machine learning, real-time processing, data science, and low-latency SQL at petabyte scale. In this session, we introduce you to Amazon EMR design patterns such as using Amazon S3 instead of HDFS, taking advantage of both long and short-lived clusters, and other Amazon EMR architectural best practices. We talk about lowering cost with Auto Scaling and Spot Instances, and security best practices for encryption and fine-grained access control. Finally, we dive into some of our recent launches to keep you current on our latest features.
ABD307 – Deep Analytics for Global AWS Marketing Organization To meet the needs of the global marketing organization, the AWS marketing analytics team built a scalable platform that allows the data science team to deliver custom econometric and machine learning models for end user self-service. To meet data security standards, we use end-to-end data encryption and different AWS services such as Amazon Redshift, Amazon RDS, Amazon S3, Amazon EMR with Apache Spark and Auto Scaling. In this session, you see real examples of how we have scaled and automated critical analysis, such as calculating the impact of marketing programs like re:Invent and prioritizing leads for our sales teams.
ABD311 – Deploying Business Analytics at Enterprise Scale with Amazon QuickSight One of the biggest tradeoffs customers usually make when deploying BI solutions at scale is agility versus governance. Large-scale BI implementations with the right governance structure can take months to design and deploy. In this session, learn how you can avoid making this tradeoff using Amazon QuickSight. Learn how to easily deploy Amazon QuickSight to thousands of users using Active Directory and Federated SSO, while securely accessing your data sources in Amazon VPCs or on-premises. We also cover how to control access to your datasets, implement row-level security, create scheduled email reports, and audit access to your data.
ABD315 – Building Serverless ETL Pipelines with AWS Glue Organizations need to gain insight and knowledge from a growing number of data sources, including Internet of Things (IoT) devices, APIs, clickstreams, and unstructured and log data. However, organizations are also often limited by legacy data warehouses and ETL processes that were designed for transactional data. In this session, we introduce key ETL features of AWS Glue and cover common use cases ranging from scheduled nightly data warehouse loads to near real-time, event-driven ETL flows for your data lake. We discuss how to build scalable, efficient, and serverless ETL pipelines using AWS Glue. Additionally, Merck will share how they built an end-to-end ETL pipeline for their application release management system, and launched it in production in less than a week using AWS Glue.
ABD318 – Architecting a data lake with Amazon S3, Amazon Kinesis, and Amazon Athena Learn how to architect a data lake where different teams within your organization can publish and consume data in a self-service manner. As organizations aim to become more data-driven, data engineering teams have to build architectures that can cater to the needs of diverse users – from developers, to business analysts, to data scientists. Each of these user groups employs different tools, has different data needs, and accesses data in different ways. In this talk, we will dive deep into assembling a data lake using Amazon S3, Amazon Kinesis, Amazon Athena, Amazon EMR, and AWS Glue. The session will feature Mohit Rao, Architect and Integration lead at Atlassian, the maker of products such as JIRA, Confluence, and Stride. First, we will look at a couple of common architectures for building a data lake. Then we will show how Atlassian built a self-service data lake, where any team within the company can publish a dataset to be consumed by a broad set of users.
Companies have valuable data that they may not be analyzing due to the complexity, scalability, and performance issues of loading the data into their data warehouse. However, with the right tools, you can extend your analytics to query data in your data lake—with no loading required. Amazon Redshift Spectrum extends the analytic power of Amazon Redshift beyond data stored in your data warehouse to run SQL queries directly against vast amounts of unstructured data in your Amazon S3 data lake. This gives you the freedom to store your data where you want, in the format you want, and have it available for analytics when you need it. Join a discussion with AWS solutions architects to ask questions.
ABD330 – Combining Batch and Stream Processing to Get the Best of Both Worlds Today, many architects and developers are looking to build solutions that integrate batch and real-time data processing, and deliver the best of both approaches. Lambda architecture (not to be confused with the AWS Lambda service) is a design pattern that leverages both batch and real-time processing within a single solution to meet the latency, accuracy, and throughput requirements of big data use cases. Come join us for a discussion on how to implement Lambda architecture (batch, speed, and serving layers) and best practices for data processing, loading, and performance tuning.
ABD335 – Real-Time Anomaly Detection Using Amazon Kinesis Amazon Kinesis Analytics offers a built-in machine learning algorithm that you can use to easily detect anomalies in your VPC network traffic and improve security monitoring. Join us for an interactive discussion on how to stream your VPC Flow Logs to Amazon Kinesis Streams and identify anomalies using Kinesis Analytics.
ABD339 – Deep Dive and Best Practices for Amazon Athena Amazon Athena is an interactive query service that enables you to process data directly from Amazon S3 without the need for infrastructure. Since its launch at re:Invent 2016, several organizations have adopted Athena as the central tool to process all their data. In this talk, we dive deep into the most common use cases, including working with other AWS services. We review the best practices for creating tables and partitions and performance optimizations. We also dive into how Athena handles security, authorization, and authentication. Lastly, we hear from a customer who has reduced costs and improved time to market by deploying Athena across their organization.
We look forward to meeting you at re:Invent 2017!
About the Author
Roy Ben-Alta is a solution architect and principal business development manager at Amazon Web Services in New York. He focuses on Data Analytics and ML Technologies, working with AWS customers to build innovative data-driven products.
Contributed by Otavio Ferreira, Manager, Software Development, AWS Messaging
Like other developers around the world, you may be tackling increasingly complex business problems. A key success factor, in that case, is the ability to break down a large project scope into smaller, more manageable components. A service-oriented architecture guides you toward designing systems as a collection of loosely coupled, independently scaled, and highly reusable services. Microservices take this even further. To improve performance and scalability, they promote fine-grained interfaces and lightweight protocols.
However, the communication among isolated microservices can be challenging. Services are often deployed onto independent servers and don’t share any compute or storage resources. Also, you should avoid hard dependencies among microservices, to preserve maintainability and reusability.
If you apply the pub/sub design pattern, you can effortlessly decouple and independently scale out your microservices and serverless architectures. A pub/sub messaging service, such as Amazon SNS, promotes event-driven computing that statically decouples event publishers from subscribers, while dynamically allowing for the exchange of messages between them. An event-driven architecture also introduces the responsiveness needed to deal with complex problems, which are often unpredictable and asynchronous.
What is event-driven computing?
Given the context of microservices, event-driven computing is a model in which subscriber services automatically perform work in response to events triggered by publisher services. This paradigm can be applied to automate workflows while decoupling the services that collectively and independently work to fulfill these workflows. Amazon SNS is an event-driven computing hub in the AWS Cloud that has native integration with several AWS publisher and subscriber services.
Which AWS services publish events to SNS natively?
Several AWS services have been integrated as SNS publishers and, therefore, can natively trigger event-driven computing for a variety of use cases. In this post, I specifically cover AWS compute, storage, database, and networking services.
Compute services
Auto Scaling: Helps you ensure that you have the correct number of Amazon EC2 instances available to handle the load for your application. You can configure Auto Scaling lifecycle hooks to trigger events, as Auto Scaling resizes your EC2 cluster. As an example, you may want to warm up the local cache store on newly launched EC2 instances, and also download log files from other EC2 instances that are about to be terminated. To make this happen, set an SNS topic as your Auto Scaling group’s notification target, then subscribe two Lambda functions to this SNS topic. The first function is responsible for handling scale-out events (to warm up cache upon provisioning), whereas the second is in charge of handling scale-in events (to download logs upon termination).
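Here is a minimal sketch of that wiring using the AWS SDK for Java (v1). The Auto Scaling group name, SNS topic ARN, and Lambda function ARNs are hypothetical placeholders, and the permission that allows SNS to invoke the Lambda functions (added separately via lambda add-permission) is not shown.
import com.amazonaws.services.autoscaling.AmazonAutoScaling;
import com.amazonaws.services.autoscaling.AmazonAutoScalingClientBuilder;
import com.amazonaws.services.autoscaling.model.PutNotificationConfigurationRequest;
import com.amazonaws.services.sns.AmazonSNS;
import com.amazonaws.services.sns.AmazonSNSClientBuilder;
import com.amazonaws.services.sns.model.SubscribeRequest;

public class AutoScalingSnsWiring {
    public static void main(String[] args) {
        // Placeholder ARNs and names -- replace with your own resources.
        String topicArn = "arn:aws:sns:us-east-1:123456789012:asg-events";
        String warmUpCacheFnArn = "arn:aws:lambda:us-east-1:123456789012:function:warmUpCache";
        String collectLogsFnArn = "arn:aws:lambda:us-east-1:123456789012:function:collectLogs";

        // Send the Auto Scaling group's launch and terminate notifications to the SNS topic.
        AmazonAutoScaling autoScaling = AmazonAutoScalingClientBuilder.defaultClient();
        autoScaling.putNotificationConfiguration(new PutNotificationConfigurationRequest()
                .withAutoScalingGroupName("web-asg")
                .withTopicARN(topicArn)
                .withNotificationTypes(
                        "autoscaling:EC2_INSTANCE_LAUNCH",
                        "autoscaling:EC2_INSTANCE_TERMINATE"));

        // Subscribe both Lambda functions to the topic; each handles the event type it cares about.
        AmazonSNS sns = AmazonSNSClientBuilder.defaultClient();
        sns.subscribe(new SubscribeRequest()
                .withTopicArn(topicArn).withProtocol("lambda").withEndpoint(warmUpCacheFnArn));
        sns.subscribe(new SubscribeRequest()
                .withTopicArn(topicArn).withProtocol("lambda").withEndpoint(collectLogsFnArn));
    }
}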
AWS Elastic Beanstalk: An easy-to-use service for deploying and scaling web applications and web services developed in a number of programming languages. You can configure event notifications for your Elastic Beanstalk environment so that notable events can be automatically published to an SNS topic, then pushed to topic subscribers. As an example, you may use this event-driven architecture to coordinate your continuous integration pipeline (such as Jenkins CI). That way, whenever an environment is created, Elastic Beanstalk publishes this event to an SNS topic, which triggers a subscribing Lambda function, which then kicks off a CI job against your newly created Elastic Beanstalk environment.
Elastic Load Balancing: Automatically distributes incoming application traffic across Amazon EC2 instances, containers, or other resources identified by IP addresses. You can configure CloudWatch alarms on Elastic Load Balancing metrics, to automate the handling of events derived from Classic Load Balancers. As an example, you may leverage this event-driven design to automate latency profiling in an Amazon ECS cluster behind a Classic Load Balancer. In this example, whenever your ECS cluster breaches your load balancer latency threshold, an event is posted by CloudWatch to an SNS topic, which then triggers a subscribing Lambda function. This function runs a task on your ECS cluster to trigger a latency profiling tool, hosted on the cluster itself. This can enhance your latency troubleshooting exercise by making it timely.
Amazon S3: Object storage built to store and retrieve any amount of data. You can enable S3 event notifications, and automatically get them posted to SNS topics, to automate a variety of workflows. For instance, imagine that you have an S3 bucket to store incoming resumes from candidates, and a fleet of EC2 instances to encode these resumes from their original format (such as Word or text) into a portable format (such as PDF). In this example, whenever new files are uploaded to your input bucket, S3 publishes these events to an SNS topic, which in turn pushes these messages into subscribing SQS queues. Then, encoding workers running on EC2 instances poll these messages from the SQS queues; retrieve the original files from the input S3 bucket; encode them into PDF; and finally store them in an output S3 bucket.
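A possible way to configure the S3-to-SNS notification with the AWS SDK for Java (v1) is sketched below. The bucket name and topic ARN are placeholders, and the SNS topic access policy that allows S3 to publish to it is assumed to be in place.
import java.util.EnumSet;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.BucketNotificationConfiguration;
import com.amazonaws.services.s3.model.S3Event;
import com.amazonaws.services.s3.model.TopicConfiguration;

public class ResumeBucketNotifications {
    public static void main(String[] args) {
        // Placeholder bucket and topic -- replace with your own.
        String bucket = "incoming-resumes";
        String topicArn = "arn:aws:sns:us-east-1:123456789012:resume-uploaded";

        // Publish an event to the SNS topic whenever a new object lands in the bucket.
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        s3.setBucketNotificationConfiguration(bucket,
                new BucketNotificationConfiguration().addConfiguration("resume-uploaded",
                        new TopicConfiguration(topicArn, EnumSet.of(S3Event.ObjectCreated))));
    }
}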
Amazon EFS: Provides simple and scalable file storage, for use with Amazon EC2 instances, in the AWS Cloud. You can configure CloudWatch alarms on EFS metrics, to automate the management of your EFS systems. For example, consider a highly parallelized genomics analysis application that runs against an EFS system. By default, this file system is instantiated on the “General Purpose” performance mode. Although this performance mode allows for lower latency, it might eventually impose a scaling bottleneck. Therefore, you may leverage an event-driven design to handle it automatically. Basically, as soon as the EFS metric “Percent I/O Limit” breaches 95%, CloudWatch could post this event to an SNS topic, which in turn would push this message into a subscribing Lambda function. This function automatically creates a new file system, this time on the “Max I/O” performance mode, then switches the genomics analysis application to this new file system. As a result, your application starts experiencing higher I/O throughput rates.
Amazon Glacier: A secure, durable, and low-cost cloud storage service for data archiving and long-term backup. You can set a notification configuration on an Amazon Glacier vault so that when a job completes, a message is published to an SNS topic. Retrieving an archive from Amazon Glacier is a two-step asynchronous operation, in which you first initiate a job, and then download the output after the job completes. Therefore, SNS helps you eliminate polling your Amazon Glacier vault to check whether your job has completed. As usual, you may subscribe SQS queues, Lambda functions, and HTTP endpoints to your SNS topic, to be notified when your Amazon Glacier job is done.
AWS Snowball: A petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data. You can leverage Snowball notifications to automate workflows related to importing data into and exporting data from AWS. More specifically, whenever your Snowball job status changes, Snowball can publish this event to an SNS topic, which in turn can broadcast the event to all its subscribers. As an example, imagine a Geographic Information System (GIS) that distributes high-resolution satellite images to users via a web browser. In this example, the GIS vendor could capture up to 80 TB of satellite images; create a Snowball job to import these files from an on-premises system to an S3 bucket; and provide an SNS topic ARN to be notified upon job status changes in Snowball. After Snowball changes the job status from “Importing” to “Completed”, Snowball publishes this event to the specified SNS topic, which delivers this message to a subscribing Lambda function, which finally creates a CloudFront web distribution for the target S3 bucket, to serve the images to end users.
Amazon RDS: Makes it easy to set up, operate, and scale a relational database in the cloud. RDS leverages SNS to broadcast notifications when RDS events occur. As usual, these notifications can be delivered via any protocol supported by SNS, including SQS queues, Lambda functions, and HTTP endpoints. As an example, imagine that you own a social network website that has experienced organic growth, and needs to scale its compute and database resources on demand. In this case, you could provide an SNS topic to listen to RDS DB instance events. When the “Low Storage” event is published to the topic, SNS pushes this event to a subscribing Lambda function, which in turn leverages the RDS API to increase the storage capacity allocated to your DB instance. The provisioning itself takes place within the specified DB maintenance window.
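The sketch below shows one way to create such an event subscription with the AWS SDK for Java (v1). The subscription name, topic ARN, and DB instance identifier are hypothetical, and the “low storage” category is simply the event category this example listens for.
import com.amazonaws.services.rds.AmazonRDS;
import com.amazonaws.services.rds.AmazonRDSClientBuilder;
import com.amazonaws.services.rds.model.CreateEventSubscriptionRequest;

public class RdsLowStorageSubscription {
    public static void main(String[] args) {
        // Placeholder topic ARN and DB instance identifier.
        String topicArn = "arn:aws:sns:us-east-1:123456789012:rds-events";

        AmazonRDS rds = AmazonRDSClientBuilder.defaultClient();
        rds.createEventSubscription(new CreateEventSubscriptionRequest()
                .withSubscriptionName("low-storage-alerts")
                .withSnsTopicArn(topicArn)
                .withSourceType("db-instance")
                .withSourceIds("social-network-db")     // the DB instance to watch
                .withEventCategories("low storage")     // notify only on low-storage events
                .withEnabled(true));
    }
}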
Amazon ElastiCache: A web service that makes it easy to deploy, operate, and scale an in-memory data store or cache in the cloud. ElastiCache can publish messages using Amazon SNS when significant events happen on your cache cluster. This feature can be used to refresh the list of servers on client machines connected to individual cache node endpoints of a cache cluster. For instance, an ecommerce website fetches product details from a cache cluster, with the goal of offloading a relational database and speeding up page load times. Ideally, you want to make sure that each web server always has an updated list of cache servers to which to connect. To automate this node discovery process, you can get your ElastiCache cluster to publish events to an SNS topic. Thus, when ElastiCache event “AddCacheNodeComplete” is published, your topic then pushes this event to all subscribing HTTP endpoints that serve your ecommerce website, so that these HTTP servers can update their list of cache nodes.
Amazon Redshift: A fully managed data warehouse that makes it simple to analyze data using standard SQL and BI (Business Intelligence) tools. Amazon Redshift uses SNS to broadcast relevant events so that data warehouse workflows can be automated. As an example, imagine a news website that sends clickstream data to a Kinesis Firehose stream, which then loads the data into Amazon Redshift, so that popular news and reading preferences might be surfaced on a BI tool. At some point though, this Amazon Redshift cluster might need to be resized, and the cluster enters a read-only mode. Hence, this Amazon Redshift event is published to an SNS topic, which delivers this event to a subscribing Lambda function, which finally deletes the corresponding Kinesis Firehose delivery stream, so that clickstream data uploads can be put on hold. At a later point, after Amazon Redshift publishes the event that the maintenance window has been closed, SNS notifies a subscribing Lambda function accordingly, so that this function can re-create the Kinesis Firehose delivery stream, and resume clickstream data uploads to Amazon Redshift.
AWS DMS: Helps you migrate databases to AWS quickly and securely. The source database remains fully operational during the migration, minimizing downtime to applications that rely on the database. DMS also uses SNS to provide notifications when DMS events occur, which can automate database migration workflows. As an example, you might create data replication tasks to migrate an on-premises Microsoft SQL Server database, composed of multiple tables, to MySQL. Thus, if replication tasks fail due to incompatible data encoding in the source tables, these events can be published to an SNS topic, which can push these messages into a subscribing SQS queue. Then, encoders running on EC2 can poll these messages from the SQS queue, encode the source tables into a compatible character set, and restart the corresponding replication tasks in DMS. This is an event-driven approach to a self-healing database migration process.
Amazon Route 53: A highly available and scalable cloud-based DNS (Domain Name System). Route 53 health checks monitor the health and performance of your web applications, web servers, and other resources. You can set CloudWatch alarms and get automated Amazon SNS notifications when the status of your Route 53 health check changes. As an example, imagine an online payment gateway that reports the health of its platform to merchants worldwide, via a status page. This page is hosted on EC2 and fetches platform health data from DynamoDB. In this case, you could configure a CloudWatch alarm for your Route 53 health check, so that when the alarm threshold is breached and the payment gateway is no longer considered healthy, CloudWatch publishes this event to an SNS topic, which pushes this message to a subscribing Lambda function, which finally updates the DynamoDB table that populates the status page. This event-driven approach avoids any kind of manual update to the status page visited by merchants.
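A rough sketch of the alarm setup with the AWS SDK for Java (v1) follows. The health check ID and topic ARN are placeholders; note that Route 53 publishes health check metrics to CloudWatch in the US East (N. Virginia) Region.
import com.amazonaws.regions.Regions;
import com.amazonaws.services.cloudwatch.AmazonCloudWatch;
import com.amazonaws.services.cloudwatch.AmazonCloudWatchClientBuilder;
import com.amazonaws.services.cloudwatch.model.ComparisonOperator;
import com.amazonaws.services.cloudwatch.model.Dimension;
import com.amazonaws.services.cloudwatch.model.PutMetricAlarmRequest;
import com.amazonaws.services.cloudwatch.model.Statistic;

public class PaymentGatewayHealthAlarm {
    public static void main(String[] args) {
        // Placeholder identifiers -- replace with your own.
        String topicArn = "arn:aws:sns:us-east-1:123456789012:gateway-health";
        String healthCheckId = "11111111-2222-3333-4444-555555555555";

        // Route 53 health check metrics live in us-east-1.
        AmazonCloudWatch cloudWatch = AmazonCloudWatchClientBuilder.standard()
                .withRegion(Regions.US_EAST_1).build();

        // HealthCheckStatus is 1 while healthy and 0 while unhealthy; alarm when it drops below 1.
        cloudWatch.putMetricAlarm(new PutMetricAlarmRequest()
                .withAlarmName("payment-gateway-unhealthy")
                .withNamespace("AWS/Route53")
                .withMetricName("HealthCheckStatus")
                .withDimensions(new Dimension().withName("HealthCheckId").withValue(healthCheckId))
                .withStatistic(Statistic.Minimum)
                .withPeriod(60)
                .withEvaluationPeriods(1)
                .withThreshold(1.0)
                .withComparisonOperator(ComparisonOperator.LessThanThreshold)
                .withAlarmActions(topicArn));
    }
}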
AWS Direct Connect (AWS DX): Makes it easy to establish a dedicated network connection from your premises to AWS, which can reduce your network costs, increase bandwidth throughput, and provide a more consistent network experience than Internet-based connections. You can monitor physical DX connections using CloudWatch alarms, and send SNS messages when alarms change their status. As an example, when a DX connection state shifts to 0 (zero), indicating that the connection is down, this event can be published to an SNS topic, which can fan out this message to impacted servers through HTTP endpoints, so that they might reroute their traffic through a different connection instead. This is an event-driven approach to connectivity resilience.
In addition to SNS, event-driven computing is also addressed by Amazon CloudWatch Events, which delivers a near real-time stream of system events that describe changes in AWS resources. With CloudWatch Events, you can route each event type to one or more targets, such as AWS Lambda functions, Amazon Kinesis streams, Amazon SNS topics, and Amazon SQS queues.
Many AWS services publish events to CloudWatch Events. As an example, you can get CloudWatch Events to capture events on your ETL (Extract, Transform, Load) jobs running on AWS Glue and push failed ones to an SQS queue, so that you can retry them later.
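As a sketch of that example with the AWS SDK for Java (v1), the rule below matches failed AWS Glue job state-change events and sends them to an SQS queue. The rule name, queue ARN, and event pattern are illustrative, and the queue policy that allows CloudWatch Events to send messages to it is assumed to exist.
import com.amazonaws.services.cloudwatchevents.AmazonCloudWatchEvents;
import com.amazonaws.services.cloudwatchevents.AmazonCloudWatchEventsClientBuilder;
import com.amazonaws.services.cloudwatchevents.model.PutRuleRequest;
import com.amazonaws.services.cloudwatchevents.model.PutTargetsRequest;
import com.amazonaws.services.cloudwatchevents.model.Target;

public class FailedGlueJobsToSqs {
    public static void main(String[] args) {
        // Placeholder queue ARN -- replace with your retry queue.
        String queueArn = "arn:aws:sqs:us-east-1:123456789012:failed-etl-jobs";

        // Match AWS Glue job state-change events whose state is FAILED.
        String pattern = "{"
                + "\"source\": [\"aws.glue\"],"
                + "\"detail-type\": [\"Glue Job State Change\"],"
                + "\"detail\": {\"state\": [\"FAILED\"]}"
                + "}";

        AmazonCloudWatchEvents events = AmazonCloudWatchEventsClientBuilder.defaultClient();
        events.putRule(new PutRuleRequest()
                .withName("glue-job-failures")
                .withEventPattern(pattern));
        events.putTargets(new PutTargetsRequest()
                .withRule("glue-job-failures")
                .withTargets(new Target().withId("retry-queue").withArn(queueArn)));
    }
}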
Conclusion
Amazon SNS is a pub/sub messaging service that can be used as an event-driven computing hub for AWS customers worldwide. By capturing events natively triggered by AWS services, such as EC2, S3, and RDS, you can automate and optimize all kinds of workflows, including scaling, testing, encoding, profiling, broadcasting, discovery, failover, and much more. Business use cases presented in this post ranged from recruiting websites, to scientific research, geographic systems, social networks, retail websites, and news portals.
Now that you can reserve seating in AWS re:Invent 2017 breakout sessions, workshops, chalk talks, and other events, the time is right to review the list of introductory, advanced, and expert content being offered this year. To learn more about breakout content types and levels, see Breakout Content.
SID201 – IAM for Enterprises: How Vanguard Strikes the Balance Between Agility, Governance, and Security For Vanguard, managing the creation of AWS Identity and Access Management (IAM) objects is key to balancing developer velocity and compliance. In this session, you learn how Vanguard designs IAM roles to control the blast radius of AWS resources and maintain simplicity for developers. Vanguard will also share best practices to help you manage governance and improve your visibility across your AWS resources.
SID202 – Deep dive about how Capital One automates the delivery of directory services across AWS accounts Traditional solutions for using Microsoft Active Directory across on-premises and AWS Cloud Windows workloads can require complex networking or syncing identities across multiple systems. AWS Directory Service for Microsoft Active Directory, also known as AWS Managed AD, offers you actual Microsoft Active Directory in the AWS Cloud as a managed service. In this session, you will learn how Capital One uses AWS Managed AD to provide highly available authentication and authorization services for its Windows workloads, such as Amazon RDS for SQL Server.
SID205 – Building the Largest Repo for Serverless Compliance-as-Code When you use the cloud to enable speed and agility, how do you know if you’ve done it correctly? We are on a mission to help builders follow industry best practices within security guardrails by creating the largest compliance-as-code repository, available to all. Compliance-as-code is the idea of translating best practices, guardrails, policies, and standards into codified unit tests. Apply this to your AWS environment to provide insights about what can or must be improved. Learn why compliance-as-code matters to gain speed (by getting developers, architects, and security pros on the same page), how it is currently used (demo), and how to start using it or being part of building it.
SID206 – Best Practices for Managing Security Operation on AWS To help prevent unexpected access to your AWS resources, it is critical to maintain strong identity and access policies and track, detect, and react to changes. In this session, you will learn how to use AWS Identity and Access Management (IAM) to control access to AWS resources and integrate your existing authentication system with IAM. We will cover how to deploy and control AWS infrastructure using code templates, including change management policies with AWS CloudFormation.
SID207 – Feedback Security in the Cloud Like many security teams, Riot has been challenged by new paradigms that came with the move to the cloud. We discuss how our security team has developed a security culture based on feedback and self-service to best thrive in the cloud. We detail how the team assessed the security gaps and challenges in our move to AWS, and then describe how the team works within Riot’s unique feedback culture.
SID208 – Less (Privilege) Is More: Getting Least Privilege Right in AWS AWS services are designed to enable control through AWS Identity and Access Management (IAM) and Amazon Virtual Private Cloud (VPC). Join us in this chalk talk to learn how to apply these toward the security principle of least privilege for applications and data and how to practically integrate them in your security operations.
SID209 – Designing and Deploying an AWS Account Factory AWS customers start off with one AWS account, but quickly realize the benefits of having multiple AWS accounts. A common learning curve for customers is how to securely baseline and set up new accounts at scale. This talk helps you understand how to use AWS Organizations, AWS Identity and Access Management (IAM), AWS CloudFormation, and other tools to baseline new accounts, set them up for federation, and make a secure and repeatable account factory to create new AWS accounts. Walk away with demos and tools to use in your own environment.
SID210 – A CISO’s Journey at Vonage: Achieving Unified Security at Scale Making sense of the risks of IT deployments that sit in hybrid environments and span multiple countries is a major challenge. When you add in multiple toolsets and global compliance requirements, including GDPR, it can get overwhelming. Listen to Vonage’s Chief Information Security Officer, Johan Hybinette, share his experiences tackling these challenges.
SID212 – Maximizing Your Move to AWS – Five Key Lessons from Vanguard and Cloud Technology Partners CTP’s Robert Christiansen and Mike Kavis describe how to maximize the value of your AWS initiative. From building a Minimum Viable Cloud to establishing a robust cloud security and compliance posture, we walk through key client success stories and lessons learned. We also explore how CTP has helped Vanguard, the leading provider of investor communications and technology, take advantage of AWS to delight customers, drive new revenue streams, and transform their business.
SID213 – Managing Regulator Expectations – Lessons Learned on Positioning AWS Services from an Audit Perspective Cloud migration in highly regulated industries can stall without a solid understanding of how (and when) to address regulatory expectations. This session provides a guide to explaining the aspects of AWS services that are most frequently the subject of an internal or regulatory audit. Because regulatory agencies and internal auditors might not share a common understanding of the cloud, this session is designed to help you to help them, regardless of their level of technical fluency.
SID214 – Best Security Practices in the Intelligence Community Executives from the Intelligence community discuss cloud security best practices in a field where security is imperative to operations. CIA security cloud chief John Nicely and NGA security cloud chief Scot Kaplan share success stories of migrating mass data to the cloud from a security perspective. Hear how they migrated their IT portfolios while managing their organizations’ unique blend of constraints, budget issues, politics, culture, and security pressures. Learn how these institutions overcame barriers to migration, and ask these panelists what actions you can take to better prepare yourself for the journey of mass migration to the cloud.
SID216 – Defending Diverse Applications Against Common Threats In this session, you learn how to adapt application defenses and operational responses based on your unique requirements. You also hear directly from customers about how they architected their applications on AWS to protect their applications. There are many ways to build secure, high-availability applications in the cloud. Services such as Amazon API Gateway, Amazon VPC, ALB, ELB, and Amazon EC2 are the basic building blocks that enable you to address a wide range of use cases. Best practices for defending your applications against Distributed Denial of Service (DDoS) attacks, exploitation attempts, and bad bots can vary with your choices in architecture.
Advanced level
SID301 – Using AWS Lambda as a Security Team Operating a security practice on AWS brings many new challenges that haven’t been faced in data center environments. The dynamic nature of infrastructure, the relationship between development team members and their applications, and the architecture paradigms have all changed as a result of building software on top of AWS. In this session, learn how your security team can leverage AWS Lambda as a tool to monitor, audit, and enforce your security policies within an AWS environment.
SID302 – Force Multiply Your Security Team with Automation and Alexa Adversaries automate. Who says the good guys can’t as well? By combining AWS offerings like AWS CloudTrail, Amazon CloudWatch, AWS Config, and AWS Lambda with the power of Amazon Alexa, you can do more security tasks faster, with fewer resources. Force multiplying your security team is all about automation! Last year, we showed off penetration testing at the push of an (AWS IoT) button, and surprise-previewed how to ask Alexa to run Inspector as-needed. Want to see other ways to ask Alexa to be your cloud security sidekick? We have crazy new demos at the ready to show security geeks how to sling security automation solutions for their AWS environments (and impress and help your boss, too).
SID303 – How You Can Use AWS’s Identity Services to be Successful on Your AWS Cloud Journey Every journey to the AWS Cloud is unique. Some customers are migrating existing applications, while others are building new applications using cloud-native services. Along each of these journeys, identity and access management helps customers protect their applications and resources. In this session, you will learn how AWS’s identity services provide you a secure, flexible, and easy solution for managing identities and access on the AWS Cloud. With AWS’s identity services, you do not have to adapt to AWS. Instead, you have a choice of services designed to meet you anywhere along your journey to the AWS Cloud.
SID304 – SecOps 2021 Today: Using AWS Services to Deliver SecOps This talk dives deep on how to build end-to-end security capabilities using AWS. Our goal is orchestrating AWS security services with other AWS building blocks to deliver enhanced security. We cover working with Amazon CloudWatch Events as a queueing mechanism for processing security events, using Amazon DynamoDB to provide a stateful layer for tailored responses to events and other ancillary functions, using DynamoDB as an attack signature engine, and the use of analytics to derive tailored signatures for detection with AWS Lambda.
SID305 – How CrowdStrike Built a Real-time Security Monitoring Service on AWS The CrowdStrike motto is “We Stop Breaches.” To do that, it needed to build a real-time security monitoring service to detect threats. Join this session to learn how CrowdStrike uses Amazon EC2 and Amazon EBS to help its customers identify vulnerabilities before they become large-scale problems.
SID306 – How Chick-fil-A Embraces DevSecOps on AWS As Chick-fil-A became a cloud-first organization, their security team didn’t want to become the bottleneck for agility. But the security team also wanted to raise the bar for their security posture on AWS. Robert Davis, security architect at Chick-fil-A, provides an overview about how he and his team recognized that writing code was the best way for their security policies to scale across the many AWS accounts that Chick-fil-A operates.
SID307 – Serverless for Security Officers: Paradigm Walkthrough and Comprehensive Security Best Practices For security practitioners, serverless represents a context switch from the familiar servers and networks to a decentralized set of code snippets and AWS platform constructs. This new ecosystem also represents new operational teams, data flows, security tooling, and faster-than-ever change velocity. In this talk, we perform live demos and provide code samples for a wide array of security best practices aligned to industry standards such as NIST 800-53 and ISO 27001.
SID308 – Multi-Account Strategies We will explore a multi-account architecture and how to approach the design/thought process around it. This chalk talk will allow attendees to dive deep into the topic and discuss the nuances of the architecture as well as provide feedback around the approach.
SID309 – Credentials, Credentials, Credentials, Oh My! For new and experienced customers alike, understanding the various credential forms and exchange mechanisms within AWS can be a daunting exercise. In this chalk talk, we clear up the confusion by performing a cartography exercise. We visually depict the right source credentials (for example, enterprise user name and password, IAM keys, AWS STS tokens, and so on) and transformation mechanisms (for example, AssumeRole and so on) to use depending on what you’re trying to do and where you’re coming from.
SID310 – Moving from the Shadows to the Throne What do you do when leadership embraces what was called “shadow IT” as the new path forward? How do you onboard new accounts while simultaneously pushing policy to secure all existing accounts? This session walks through Cisco’s journey consolidating over 700 existing accounts in the Cisco organization, while building and applying Cisco’s new cloud policies.
SID311 – Designing Security and Governance Across a Multi-Account Strategy When organizations plan their journey to cloud adoption at scale, they quickly encounter questions such as: How many accounts do we need? How do we share resources? How do we integrate with existing identity solutions? In this workshop, we present best practices and give you the hands-on opportunity to test and develop best practices. You will work in teams to set up and create an AWS environment that is enterprise-ready for application deployment and integration into existing operations, security, and procurement processes. You will get hands-on experience with cross-account roles, consolidated logging, account governance and other challenges to solve.
SID312 – DevSecOps Capture the Flag In this Capture the Flag workshop, we divide groups into teams and work on AWS CloudFormation DevSecOps. The AWS Red Team supplies an AWS DevSecOps Policy that needs to be enforced via CloudFormation static analysis. Participant Blue Teams are provided with an AWS Lambda-based reference architecture to be used to inspect CloudFormation templates against that policy. Interesting items need to be logged, and made visible via ChatOps. Dangerous items need to be logged, and recorded accurately as a template fail. The secondary challenge is building a CloudFormation template to thwart the controls being created by the other Blue teams.
SID313 – Continuous Compliance on AWS at Scale In cloud migrations, the cloud’s elastic nature is often touted as a critical capability in delivering on key business initiatives. However, you must account for it in your security and compliance plans or face some real challenges. Always counting on a virtual host to be running, for example, causes issues when that host is rebooted or retired. Managing security and compliance in the cloud is continuous, requiring forethought and automation. Learn how a leading, next generation managed cloud provider uses automation and cloud expertise to manage security and compliance at scale in an ever-changing environment.
SID314 – IAM Policy Ninja Are you interested in learning how to control access to your AWS resources? Have you wondered how to best scope permissions to achieve least-privilege permissions access control? If your answer is “yes,” this session is for you. We look at the AWS Identity and Access Management (IAM) policy language, starting with the basics of the policy language and how to create and attach policies to IAM users, groups, and roles. We explore policy variables, conditions, and tools to help you author least privilege policies. We cover common use cases, such as granting a user secure access to an Amazon S3 bucket or to launch an Amazon EC2 instance of a specific type.
SID315 – Security and DevOps: Agility and Teamwork In this session, you learn pragmatic steps to integrate security controls into DevOps processes in your AWS environment at scale. Cybersecurity expert and founder of Alert Logic Misha Govshteyn shares insights from high performing teams who are embracing the reality that an agile security program can enable faster and more secure workload deployments. Joining Misha is Joey Peloquin, Director of Cloud Security Operations at Citrix, who discusses Citrix’s DevOps experiences and how they manage their cybersecurity posture within the AWS Cloud. Session sponsored by Alert Logic.
SID316 – Using Access Advisor to Strike the Balance Between Security and Usability AWS provides a killer feature for security operations teams: Access Advisor. In this session, we discuss how Access Advisor shows the services to which an IAM policy grants access and provides a timestamp for the last time that the role authenticated against that service. At Netflix, we use this valuable data to automatically remove permissions that are no longer used. By continually removing excess permissions, we can achieve a balance of empowering developers and maintaining a best-practice, secure environment.
SID317 – Automating Security and Compliance Testing of Infrastructure-as-Code for DevSecOps Infrastructure-as-Code (IaC) has emerged as an essential element of organizational DevOps practices. Tools such as AWS CloudFormation and Terraform allow software-defined infrastructure to be deployed quickly and repeatably to AWS. But the agility of CI/CD pipelines also creates new challenges in infrastructure security hardening. This session provides a foundation for how to bring proven software hardening practices into the world of infrastructure deployment. We discuss how to build security and compliance tests for infrastructure analogous to unit tests for application code, and showcase how security, compliance and governance testing fit in a modern CI/CD pipeline.
SID318 – From Obstacle to Advantage: The Changing Role of Security & Compliance in Your Organization A surprising trend is starting to emerge among organizations who are progressing through the cloud maturity lifecycle: major improvements in revenue growth, customer satisfaction, and mission success are being directly attributed to improvements in security and compliance. At one time thought of as speed bumps in the path to deployment, security and compliance are now seen as critical ingredients that help organizations differentiate their offerings in the market, win more deals, and achieve mission-critical goals faster. This session explores how organizations like Jive Software and the National Geospatial Agency use the Evident Security Platform, AWS, and AWS Quick Starts to automate security and compliance processes in their organization to accomplish more, do it faster, and deliver better results.
SID319 – Incident Response in the Cloud In this session, we walk you through a hypothetical incident response managed on AWS. Learn how to apply existing best practices as well as how to leverage the unique security visibility, control, and automation that AWS provides. We cover how to set up your AWS environment to prevent a security event and how to build a cloud-specific incident response plan so that your organization is prepared before a security event occurs. This session also covers specific environment recovery steps available on AWS.
SID320 – Fraud Prevention, Detection, Lessons Learned, and Best Practices Fighting fraud means countering human actors that quickly adapt to whatever you do to stop them. In this presentation, we discuss the key components of a fraud prevention program in the cloud. Additionally, we provide techniques for detecting known and unknown fraud activity and explore different strategies for effectively preventing detected patterns. Finally, we discuss lessons learned from our own prevention activities as well as the best practices that you can apply to manage risk.
SID321 – How Capital One Applies AWS Organizations Best Practices to Manage Multiple AWS Accounts In this session, we review best practices for managing multiple AWS accounts using AWS Organizations. We cover how to think about the master account and your account strategy, as well as how to roll out changes. You learn how Capital One applies these best practices to manage its AWS accounts, which number over 160, and PCI workloads.
SID322 – The AWS Philosophy of Security AWS distinguished engineer Eric Brandwine speaks with hundreds of customers each year, and has noticed one question coming up more than any other: “How does AWS operationalize its own security?” In this session, Eric details both strategic and tactical considerations, along with an insider’s look at AWS tooling and processes.
SID324 – Automating DDoS Response in the Cloud If left unmitigated, Distributed Denial of Service (DDoS) attacks have the potential to harm application availability or impair application performance. DDoS attacks can also act as a smoke screen for intrusion attempts or as a harbinger for attacks against non-cloud infrastructure. Accordingly, it’s crucial that developers architect for DDoS resiliency and maintain robust operational capabilities that allow for rapid detection and engagement during high-severity events. In this session, you learn how to build a DDoS-resilient application and how to use services like AWS Shield and Amazon CloudWatch to defend against DDoS attacks and automate response to attacks in progress.
SID325 – Amazon Macie: Data Visibility Powered by Machine Learning for Security and Compliance Workloads In this session, Edmunds discusses how they create workflows to manage their regulated workloads with Amazon Macie, a newly-released security and compliance management service that leverages machine learning to classify your sensitive data and business-critical information. Amazon Macie uses recurrent neural networks (RNN) to identify and alert potential misuse of intellectual property. They do a deep dive into machine learning within the security ecosystem.
SID326 – AWS Security State of the Union Steve Schmidt, chief information security officer at AWS, addresses the current state of security in the cloud, with a particular focus on feature updates, the AWS internal “secret sauce,” and what’s on the horizon in terms of security, identity, and compliance tooling.
SID327 – How Zocdoc Achieved Security and Compliance at Scale With Infrastructure as Code In less than 12 months, Zocdoc became a cloud-first organization, diversifying their tech stack and liberating data to help drive rapid product innovation. Brian Lozada, CISO at Zocdoc, and Zhen Wang, Director of Engineering, provide an overview on how their teams recognized that infrastructure as code was the most effective approach for their security policies to scale across their AWS infrastructure. They leveraged tools such as AWS CloudFormation, hardened AMIs, and hardened containers. DevSecOps at Zocdoc has enhanced data protection through AWS services such as AWS KMS and AWS CloudHSM, auditing capabilities, and event-based policy enforcement with Amazon Elasticsearch Service and Amazon CloudWatch, all built on top of AWS.
SID328 – Cloud Adoption in Regulated Financial Services Macquarie, a global provider of financial services, identified early on that it would require strong partnership between its business, technology and risk teams to enable the rapid adoption of AWS cloud technologies. As a result, Macquarie built a Cloud Governance Platform to enable its risk functions to move as quickly as its development teams. This platform has been the backbone of Macquarie’s adoption of AWS over the past two years and has enabled Macquarie to accelerate its use of cloud technologies for the benefit of clients across multiple global markets. This talk will outline the strategy that Macquarie embarked on, describe the platform they built, and provide examples of other organizations who are on a similar journey.
SID329 – A Deep Dive into AWS Encryption Services AWS Encryption Services provide an easy and cost-effective way to protect your data in AWS. In this session, you learn about leveraging the latest encryption management features to minimize risk for your data.
SID330 – Best Practices for Implementing Your Encryption Strategy Using AWS Key Management Service AWS Key Management Service (KMS) is a managed service that makes it easy for you to create and manage the encryption keys used to encrypt your data. In this session, we will dive deep into best practices learned by implementing AWS KMS at AWS’s largest enterprise clients. We will review the different capabilities described in the AWS Cloud Adoption Framework (CAF) Security Perspective and how to implement these recommendations using AWS KMS. In addition to sharing recommendations, we will also provide examples that will help you protect sensitive information on the AWS Cloud.
SID331 – Architecting Security and Governance Across a Multi-Account Strategy Whether it is per business unit or per application, many AWS customers use multiple accounts to meet their infrastructure isolation, separation of duties, and billing requirements. In this session, we discuss considerations, limitations, and security patterns when building out a multi-account strategy. We explore topics such as identity federation, cross-account roles, consolidated logging, and account governance. Thomson Reuters shares their journey and approach to a multi-account strategy. At the end of the session, we present an enterprise-ready, multi-account architecture that you can start leveraging today.
SID332 – Identity Management for Your Users and Apps: A Deep Dive on Amazon Cognito Learn how to set up an end-user directory, secure sign-up and sign-in, manage user profiles, authenticate and authorize your APIs, federate from enterprise and social identity providers, and use OAuth to integrate with your app—all without any server setup or code. With clear blueprints, we show you how to leverage Amazon Cognito to administer and secure your end users and enable identity for the applied patterns of mobile, web, and enterprise apps.
SID334 – How Amazon Business Uses Amazon Cloud Directory as the Data Store for Its Account Management Platform Join the Amazon Business team to learn how it uses Amazon Cloud Directory as the data store for its account management platform. You will learn how Amazon Business uses Amazon DynamoDB with Cloud Directory to manage user authorization and business process workflows. You also will learn how Cloud Directory helps to manage hierarchical datasets and how to get started modeling these datasets in Cloud Directory.
SID335 – Implementing Security and Governance across a Multi-Account Strategy As existing or new organizations expand their AWS footprint, managing multiple accounts while maintaining security quickly becomes a challenge. In this chalk talk, we will demonstrate how AWS Organizations, IAM roles, identity federation, and cross-account manager can be combined to build a scalable multi-account management platform. By the end of this session, attendees will have the understanding and deployment patterns to bring a secure, flexible and automated multi-account management platform to their organizations.
SID336 – Use AWS to Effectively Manage GDPR Compliance The General Data Protection Regulation (GDPR) is considered to be the most stringent privacy regulation ever enacted. Complying with GDPR could be a challenge for organizations, and AWS services can help get you ahead of the May 2018 enforcement deadline. In this chalk talk, the Legal and Compliance GDPR leadership at AWS discusses what enforcement of GDPR might mean for you and your customer’s compliance programs.
SID337 – Best Practices for Managing Access to AWS Resources Using IAM Roles In this chalk talk, we discuss why using temporary security credentials to manage access to your AWS resources is an AWS Identity and Access Management (IAM) best practice. IAM roles help you follow this best practice by delivering and rotating temporary credentials automatically. We discuss the different types of IAM roles, the assume role functionality, and how to author fine-grained trust and access policies that limit the scope of IAM roles. We then show you how to attach IAM roles to your AWS resources, such as Amazon EC2 instances and AWS Lambda functions. We also discuss migrating applications that use long-term AWS access keys to temporary credentials managed by IAM roles.
SID338 – Governance at Scale Once customers achieve success using AWS in a few pilot projects, most look to rapidly adopt an “all-in” enterprise migration strategy. Along this journey, several new challenges emerge that quickly become blockers and slow down migrations if they are not addressed properly. At this scale, customers will deal with the governance of hundreds of accounts, as well as thousands of IT resources residing within those accounts. Humans and traditional IT management processes cannot scale at the same pace and inevitably challenging questions emerge. In this session, we discuss those questions about governance at scale.
SID339 – Deep Dive on AWS CloudHSM Organizations building applications that handle confidential or sensitive data are subject to many types of regulatory requirements and often rely on hardware security modules (HSMs) to provide validated control of encryption keys and cryptographic operations. AWS CloudHSM is a cloud-based hardware security module (HSM) that enables you to easily generate and use your own encryption keys on the AWS Cloud using FIPS 140-2 Level 3 validated HSMs. This chalk talk will provide you a deep-dive on CloudHSM, and demonstrate how you can quickly and easily use CloudHSM to help secure your data and meet your compliance requirements.
SID340 – Using Infrastructure as Code to Inject Security Best Practices as Part of the Software Deployment Lifecycle A proactive approach to security is key to securing your applications as part of software deployment. In this chalk talk, T. Rowe Price, a financial asset management institution, outlines how they built their security automation process in enabling their numerous developer teams to rapidly and securely build and deploy applications at scale on AWS. Learn how they use services like AWS Identity and Access Management (IAM), HashiCorp tools, Terraform for automation, and Vault for secrets management, and incorporate certificate management and monitoring as part of the deployment process. T. Rowe Price discusses lessons learned and best practices to move from a tightly controlled legacy environment to an agile, automated software development process on AWS.
SID341 – Using AWS CloudTrail Logs for Scalable, Automated Anomaly Detection This workshop gives you an opportunity to develop a solution that can continuously monitor for and detect a realistic threat by analyzing AWS CloudTrail log data. Participants are provided with a CloudTrail data source and some clues to get started. Then you have to design a system that can process the logs, detect the threat, and trigger an alarm. You can make use of any AWS services that can assist in this endeavor, such as AWS Lambda for serverless detection logic, Amazon CloudWatch or Amazon SNS for alarming and notification, Amazon S3 for data and configuration storage, and more.
SID342 – Protect Your Web Applications from Common Attack Vectors Using AWS WAF As attacks and attempts to exploit vulnerabilities in web applications become more sophisticated, having an effective web request filtering solution becomes key to keeping your users’ data safe. In this workshop, discover how the OWASP Top 10 list of application security risks can help you secure your web applications. Learn how to use AWS services, such as AWS WAF, to mitigate vulnerabilities. This session includes hands-on labs to help you build a solution. Key learning goals include understanding the breadth and complexity of vulnerabilities customers need to protect from, understanding the AWS tools and capabilities that can help mitigate vulnerabilities, and learning how to configure effective HTTP request filtering rules using AWS WAF.
SID343 – User Management and App Authentication with Amazon Cognito Are you curious about how to authenticate and authorize your applications on AWS? Have you thought about how to integrate AWS Identity and Access Management (IAM) with your app authentication? Have you tried to integrate third-party SAML providers with your app authentication? Look no further. This workshop walks you through step by step to configure and create Amazon Cognito user pools and identity pools. This workshop presents you with the framework to build an application using Java, .NET, and serverless. You choose the stack and build the app with local users. See the service being used not only with mobile applications but with other stacks that normally don’t include Amazon Cognito.
SID344 – Soup to Nuts: Identity Federation for AWS AWS offers customers multiple solutions for federating identities on the AWS Cloud. In this session, we will embark on a tour of these solutions and the use cases they support. Along the way, we will dive deep with demonstrations and best practices to help you be successful managing identities on the AWS Cloud. We will cover how and when to use Security Assertion Markup Language 2.0 (SAML), OpenID Connect (OIDC), and other AWS native federation mechanisms. You will learn how these solutions enable federated access to the AWS Management Console, APIs, and CLI, AWS Infrastructure and Managed Services, your web and mobile applications running on the AWS Cloud, and much more.
SID345 – AWS Encryption SDK: The Busy Engineer’s Guide to Client-Side Encryption You know you want client-side encryption for your service but you don’t know exactly where to start. Join us for a hands-on workshop where we review some of your client-side encryption options and explore implementing client-side encryption using the AWS Encryption SDK. In this session, we cover the basics of client-side encryption, perform encrypt and decrypt operations using AWS KMS and the AWS Encryption SDK, and discuss security and performance considerations when implementing client-side encryption in your service.
Expert level
SID401 – Let’s Dive Deep Together: Advancing Web Application Security Beginning with a recap of best practices in CloudFront, AWS WAF, Route 53, and Amazon VPC security, we break into small teams to work together on improving the security of a typical web application. How can we creatively use the services? What additional features would help us? This technically advanced chalk talk requires certification at the solutions architect associate level or greater.
SID402 – An AWS Security Odyssey: Implementing Security Controls in the World of Internet, Big Data, IoT and E-Commerce Platforms This workshop will give participants the opportunity to take a security-focused journey across various AWS services and implement automated controls along the way. You will learn how to apply AWS security controls to services such as Amazon EC2, Amazon S3, AWS Lambda, and Amazon VPC. In short, you will learn how to use the cloud to protect the cloud. We will talk about how to adopt a workload-centric approach to your security strategy, address security issues in a cost-effective manner, and automate your security responses to promote maturity and auditability. In order to complete this workshop, attendees will need a laptop with wireless access, an AWS account, and an IAM user that has full administrative privileges within their account. AWS credits will be provided as attendees depart the session to cover the cost of running the workshop in their own account.
SID404 – Amazon Inspector – Automating the “Sec” in DevSecOps Adopting DevSecOps can be challenging using traditional security tools that are designed for on-premises infrastructure. Amazon Inspector is an automated security assessment service that helps you adopt DevSecOps by integrating security assessments directly into the development process of applications running on Amazon EC2. We dive deep on how to use Inspector to automate host security assessments. We show you how to integrate Inspector with other AWS Cloud services to provide automated security assessments throughout your development process. We demo installing the AWS agent, setting up assessment targets and templates, and running assessments. We review the findings and discuss how you can automate the management and remediation of those findings with your available AWS services.
SID405 – Five New Security Automation Improvements You Can Make by Using Amazon CloudWatch Events and AWS Config Rules This presentation will include a deep dive into the code behind multiple security automation and remediation functions. This session will consider potential use cases, as well as feature a demonstration of a proposed script, and then walk through the code set to explain the various challenges and solutions of the intended script. All examples of code will be previously unreleased and will feature integration with services such as Trusted Advisor and Macie. All code will be released as OSS after re:Invent.
It’s almost always a good idea to support two-factor authentication (2FA), especially for back-office systems. 2FA comes in many different forms, some of which include SMS, TOTP, or even hardware tokens.
Enabling any of these methods follows a similar flow:
The user goes to their profile page (skip this step if you want to force 2FA upon registration)
Clicks “Enable two-factor authentication”
Enters some data to enable the particular 2FA method (phone number, TOTP verification code, etc.)
The next time they log in, in addition to the username and password, the login form requests the second factor (verification code) and sends it along with the credentials
I will focus on Google Authenticator, which uses TOTP (time-based one-time passwords) to generate a sequence of verification codes. The idea is that the server and the client application share a secret key. Based on that key and on the current time, both come up with the same code. Of course, clocks are not perfectly synced, so there’s a window of a few codes that the server accepts as valid.
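With the GoogleAuth library used in the server-side code below, that acceptance window is configurable. Here is a minimal sketch; the 30-second time step is the TOTP default, and the window size of 3 is just an illustrative value, not a recommendation:
import com.warrenstrange.googleauth.GoogleAuthenticator;
import com.warrenstrange.googleauth.GoogleAuthenticatorConfig.GoogleAuthenticatorConfigBuilder;
// accept the current code plus one time step on either side, to tolerate clock drift
private final GoogleAuthenticator googleAuthenticator = new GoogleAuthenticator(
        new GoogleAuthenticatorConfigBuilder()
                .setTimeStepSizeInMillis(30_000)
                .setWindowSize(3)
                .build());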
How to implement that with Java (on the server)? Using the GoogleAuth library. The flow is as follows:
The user goes to their profile page
Clicks “Enable two-factor authentication”
The server generates a secret key, stores it as part of the user profile and returns a URL to a QR code
The user scans the QR code with their Google Authenticator app thus creating a new profile in the app
The user enters the verification code shown in the app in a field that has appeared together with the QR code and clicks “confirm”
The server marks the 2FA as enabled in the user profile
If the user doesn’t scan the code or doesn’t complete the verification, the user profile will contain just an orphaned secret key, and 2FA won’t be marked as enabled
There should be an option to later disable the 2FA from their user profile page
The most important bit from a theoretical point of view here is the sharing of the secret key. The crypto is symmetric, so both sides (the authenticator app and the server) have the same key, which is shared via a QR code that the user scans. If an attacker has control of the user’s machine at that point, the secret can be leaked and the second factor abused by the attacker as well. But that’s not in the threat model – in other words, if the attacker has access to the user’s machine, the damage is already done anyway.
Upon login, the flow is as follows:
The user enters username and password and clicks “Login”
Using an AJAX request, the page asks the server whether this email has 2FA enabled
If 2FA is not enabled, just submit the username & password form
If 2FA is enabled, the login form is not submitted, but instead an additional field is shown to let the user input the verification code from the authenticator app
After the user enters the code and presses login, the form can be submitted – either using the same login button or a separate “verify” button; alternatively, the verification input and button could be an entirely new screen (hiding the username/password inputs).
The server then checks again whether the user has 2FA enabled and, if so, verifies the verification code. If it matches, login is successful; if not, login fails and the user is allowed to re-enter the credentials and the verification code (a minimal sketch of this check appears after the controller code below). Note that you can return different responses depending on whether the username/password are wrong or the verification code is wrong. You can also attempt the login before even showing the verification code input, which is arguably better, because it doesn’t reveal to a potential attacker that the user has 2FA enabled.
While I’m speaking of username and password, this applies to any other authentication method: after you get a success confirmation from an OAuth / OpenID Connect / SAML provider, or after you get a token from SecureLogin, you can request the second factor (verification code).
In code, the above processes look as follows (using Spring MVC; I’ve merged the controller and service layers for brevity. You can replace the @AuthenticationPrincipal bit with your own way of supplying the currently logged-in user’s details to the controllers). Assuming the methods are in a controller mapped to “/user/”:
@RequestMapping(value = "/init2fa", method = RequestMethod.POST)
@ResponseBody
public String initTwoFactorAuth(@AuthenticationPrincipal LoginAuthenticationToken token) {
User user = getLoggedInUser(token);
GoogleAuthenticatorKey googleAuthenticatorKey = googleAuthenticator.createCredentials();
user.setTwoFactorAuthKey(googleAuthenticatorKey.getKey());
dao.update(user);
// assumes the User entity exposes its email via getEmail()
return GoogleAuthenticatorQRGenerator.getOtpAuthURL(GOOGLE_AUTH_ISSUER, user.getEmail(), googleAuthenticatorKey);
}
@RequestMapping(value = "/confirm2fa", method = RequestMethod.POST)
@ResponseBody
public boolean confirmTwoFactorAuth(@AuthenticationPrincipal LoginAuthenticationToken token, @RequestParam("code") int code) {
User user = getLoggedInUser(token);
boolean result = googleAuthenticator.authorize(user.getTwoFactorAuthKey(), code);
user.setTwoFactorAuthEnabled(result);
dao.update(user);
return result;
}
@RequestMapping(value = "/disable2fa", method = RequestMethod.GET)
@ResponseBody
public void disableTwoFactorAuth(@AuthenticationPrincipal LoginAuthenticationToken token) {
User user = getLoggedInUser(token);
user.setTwoFactorAuthKey(null);
user.setTwoFactorAuthEnabled(false);
dao.update(user);
}
@RequestMapping(value = "/requires2fa", method = RequestMethod.POST)
@ResponseBody
public boolean login(@RequestParam("email") String email) {
// TODO consider verifying the password here in order not to reveal that a given user uses 2FA
return userService.getUserDetailsByEmail(email).isTwoFactorAuthEnabled();
}
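The verification-at-login step described above is not part of these controller methods. A minimal sketch of what it could look like follows; the passwordEncoder check, the getPasswordHash() and getTwoFactorAuthKey() accessors on UserDetails, and the exact endpoint are assumptions layered on top of the snippets above:
@RequestMapping(value = "/login", method = RequestMethod.POST)
@ResponseBody
public boolean performLogin(@RequestParam("email") String email,
        @RequestParam("password") String password,
        @RequestParam(value = "code", required = false) Integer code) {
    UserDetails details = userService.getUserDetailsByEmail(email);
    // first factor: unknown user or wrong password
    if (details == null || !passwordEncoder.matches(password, details.getPasswordHash())) {
        return false;
    }
    // second factor: only checked if the user has enabled 2FA
    if (details.isTwoFactorAuthEnabled()
            && (code == null || !googleAuthenticator.authorize(details.getTwoFactorAuthKey(), code))) {
        return false;
    }
    // both factors passed: establish the session or issue a token here
    return true;
}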
On the client side it’s a matter of simple AJAX requests to the above methods (sidenote: I feel the term AJAX is no longer trendy, but I don’t know what else to call them. Async? Background? JavaScript?).
The login form code depends very much on the existing login form you are using, but the point is to call /requires2fa with the email (and password) to check whether 2FA is enabled, and then show a verification code input.
Overall, the implementation of two-factor authentication is simple and I’d recommend it for most systems where security is more important than simplicity of the user experience.
AWS Developer Tools is a set of services that include AWS CodeCommit, AWS CodePipeline, AWS CodeBuild, and AWS CodeDeploy. Together, these services help you securely store and maintain version control of your application’s source code and automatically build, test, and deploy your application to AWS or your on-premises environment. These services are designed to enable developers and IT professionals to rapidly and safely deliver software.
As part of our continued commitment to extend the AWS Developer Tools ecosystem to third-party tools and services, we’re pleased to announce that AWS CodeStar and AWS CodeBuild now integrate with GitHub. This will make it easier for GitHub users to set up a continuous integration and continuous delivery toolchain as part of their release process using AWS Developer Tools.
AWS CodeStar enables you to quickly develop, build, and deploy applications on AWS. Its unified user interface helps you easily manage your software development activities in one place. With AWS CodeStar, you can set up your entire continuous delivery toolchain in minutes, so you can start releasing code faster.
When AWS CodeStar launched in April of this year, it used AWS CodeCommit as the hosted source repository. You can now choose between AWS CodeCommit or GitHub as the source control service for your CodeStar projects. In addition, your CodeStar project dashboard lets you centrally track GitHub activities, including commits, issues, and pull requests. This makes it easy to manage project activity across the components of your CI/CD toolchain. Adding the GitHub dashboard view will simplify development of your AWS applications.
In this section, I will show you how to use GitHub as the source provider for your CodeStar projects. I’ll also show you how to work with recent commits, issues, and pull requests in the CodeStar dashboard.
Sign in to the AWS Management Console and from the Services menu, choose CodeStar. In the CodeStar console, choose Create a new project. You should see the Choose a project template page.
Choose an option by programming language, application category, or AWS service. I am going to choose the Ruby on Rails web application that will be running on Amazon EC2.
On the Project details page, you’ll now see the GitHub option. Type a name for your project, and then choose Connect to GitHub.
You’ll see a message requesting authorization to connect to your GitHub repository. When prompted, choose Authorize, and then type your GitHub account password.
This connects your GitHub identity to AWS CodeStar through OAuth. You can always review your settings by navigating to your GitHub application settings.
You’ll see AWS CodeStar is now connected to GitHub:
You can choose a public or private repository. GitHub offers free accounts for users and organizations working on public and open source projects and paid accounts that offer unlimited private repositories and optional user management and security features.
In this example, I am going to choose the public repository option. Edit the repository description, if you like, and then choose Next.
Review your CodeStar project details, and then choose Create Project. On the Choose an Amazon EC2 Key Pair page, select a key pair, and then choose Create Project.
On the Review project details page, you’ll see Edit Amazon EC2 configuration. Choose this link to configure instance type, VPC, and subnet options. AWS CodeStar requires a service role to create and manage AWS resources and IAM permissions. This role will be created for you when you select the AWS CodeStar would like permission to administer AWS resources on your behalf check box.
Choose Create Project. It might take a few minutes to create your project and resources.
When you create a CodeStar project, you’re added to the project team as an owner. If this is the first time you’ve used AWS CodeStar, you’ll be asked to provide the following information, which will be shown to others:
Your display name.
Your email address.
This information is used in your AWS CodeStar user profile. User profiles are not project-specific, but they are limited to a single AWS region. If you are a team member in projects in more than one region, you’ll have to create a user profile in each region.
User settings
Choose Next. AWS CodeStar will create a GitHub repository with your configuration settings (for example, https://github.com/biyer/ruby-on-rails-service).
When you integrate your integrated development environment (IDE) with AWS CodeStar, you can continue to write and develop code in your preferred environment. The changes you make will be included in the AWS CodeStar project each time you commit and push your code.
After setting up your IDE, choose Next to go to the CodeStar dashboard. Take a few minutes to familiarize yourself with the dashboard. You can easily track progress across your entire software development process, from your backlog of work items to recent code deployments.
After the application deployment is complete, choose the endpoint that will display the application.
This is what you’ll see when you open the application endpoint:
The Commit history section of the dashboard lists the commits made to the Git repository. Choosing the commit ID or the Open in GitHub option takes you directly to the corresponding page in your GitHub repository.
Your AWS CodeStar project dashboard is where you and your team view the status of your project resources, including the latest commits to your project, the state of your continuous delivery pipeline, and the performance of your instances. This information is displayed on tiles that are dedicated to a particular resource. To see more information about any of these resources, choose the details link on the tile. The console for that AWS service will open on the details page for that resource.
You can also filter issues based on their status and the assigned user.
AWS CodeBuild Now Supports Building GitHub Pull Requests
CodeBuild is a fully managed build service that compiles source code, runs tests, and produces software packages that are ready to deploy. With CodeBuild, you don’t need to provision, manage, and scale your own build servers. CodeBuild scales continuously and processes multiple builds concurrently, so your builds are not left waiting in a queue. You can use prepackaged build environments to get started quickly or you can create custom build environments that use your own build tools.
We recently announced support for GitHub pull requests in AWS CodeBuild. This functionality makes it easier to collaborate across your team while editing and building your application code with CodeBuild. You can use the AWS CodeBuild or AWS CodePipeline consoles to run AWS CodeBuild. You can also automate the running of AWS CodeBuild by using the AWS Command Line Interface (AWS CLI), the AWS SDKs, or the AWS CodeBuild Plugin for Jenkins.
In this section, I will show you how to trigger a build in AWS CodeBuild with a pull request from GitHub through webhooks.
Open the AWS CodeBuild console at https://console.aws.amazon.com/codebuild/. Choose Create project. If you already have a CodeBuild project, you can choose Edit project, and then follow along. CodeBuild can connect to AWS CodeCommit, S3, BitBucket, and GitHub to pull source code for builds. For Source provider, choose GitHub, and then choose Connect to GitHub.
After you’ve successfully linked GitHub and your CodeBuild project, you can choose a repository in your GitHub account. CodeBuild also supports connections to any public repository. You can review your settings by navigating to your GitHub application settings.
On Source: What to Build, for Webhook, select the Rebuild every time a code change is pushed to this repository check box.
Note: You can select this option only if, under Repository, you chose Use a repository in my account.
In Environment: How to build, for Environment image, select Use an image managed by AWS CodeBuild. For Operating system, choose Ubuntu. For Runtime, choose Base. For Version, choose the latest available version. For Build specification, you can provide a collection of build commands and related settings, in YAML format (buildspec.yml) or you can override the build spec by inserting build commands directly in the console. AWS CodeBuild uses these commands to run a build. In this example, the output is the string “hello.”
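As a rough illustration, a minimal buildspec.yml that produces that output could look like the following (a sketch, not the exact file used in this walkthrough):
version: 0.2
phases:
  build:
    commands:
      - echo "hello"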
On Artifacts: Where to put the artifacts from this build project, for Type, choose No artifacts. (This is also the type to choose if you are just running tests or pushing a Docker image to Amazon ECR.) You also need an AWS CodeBuild service role so that AWS CodeBuild can interact with dependent AWS services on your behalf. Unless you already have a role, choose Create a role, and for Role name, type a name for your role.
In this example, leave the advanced settings at their defaults.
If you expand Show advanced settings, you’ll see options for customizing your build, including:
A build timeout.
A KMS key to encrypt all the artifacts that the builds for this project will use.
Options for building a Docker image.
Elevated permissions during your build action (for example, accessing Docker inside your build container to build a Dockerfile).
Resource options for the build compute type.
Environment variables (built-in or custom). For more information, see Create a Build Project in the AWS CodeBuild User Guide.
You can use the AWS CodeBuild console to create a parameter in Amazon EC2 Systems Manager. Choose Create a parameter, and then follow the instructions in the dialog box. (In that dialog box, for KMS key, you can optionally specify the ARN of an AWS KMS key in your account. Amazon EC2 Systems Manager uses this key to encrypt the parameter’s value during storage and decrypt during retrieval.)
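You can also create such a parameter with the AWS CLI; for example (the parameter name, value, and key ID below are placeholders):
aws ssm put-parameter --name /CodeBuild/dockerLoginPassword --value MySecretValue --type SecureString --key-id <your-kms-key-id>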
Choose Continue. On the Review page, either choose Save and build or choose Save to run the build later.
Choose Start build. When the build is complete, the Build logs section should display detailed information about the build.
To demonstrate a pull request, I will fork the repository as a different GitHub user, make commits to the forked repo, check in the changes to a newly created branch, and then open a pull request.
As soon as the pull request is submitted, you’ll see CodeBuild start executing the build.
GitHub sends an HTTP POST payload to the webhook’s configured URL, which CodeBuild uses to download the latest source code and execute the build phases.
If you expand the Show all checks option for the GitHub pull request, you’ll see that CodeBuild has completed the build, all checks have passed, and a deep link is provided in Details, which opens the build history in the CodeBuild console.
Summary:
In this post, I showed you how to use GitHub as the source provider for your CodeStar projects and how to work with recent commits, issues, and pull requests in the CodeStar dashboard. I also showed you how you can use GitHub pull requests to automatically trigger a build in AWS CodeBuild — specifically, how this functionality makes it easier to collaborate across your team while editing and building your application code with CodeBuild.
About the author:
Balaji Iyer is an Enterprise Consultant for the Professional Services Team at Amazon Web Services. In this role, he has helped several customers successfully navigate their journey to AWS. His specialties include architecting and implementing highly scalable distributed systems, serverless architectures, large scale migrations, operational security, and leading strategic AWS initiatives. Before he joined Amazon, Balaji spent more than a decade building operating systems, big data analytics solutions, mobile services, and web applications. In his spare time, he enjoys experiencing the great outdoors and spending time with his family.
Amazon Cognito user pools are full-fledged identity providers (IdP) that you can use to maintain a user directory. The directory can scale to hundreds of millions of users and also add sign-up and sign-in support to your mobile or web applications.
You can provide access from an Amazon Cognito user pool to Amazon QuickSight with IAM identity federation in a serverless way. This solution uses Amazon API Gateway and AWS Lambda as a backend fronted by a simple web app. New features such as Amazon Cognito user pools app Integration make it even easier to add sign-in and sign-up logic to your application and federation use cases.
Solution architecture
Here’s how the solution works:
In this scenario, your web app hosted on Amazon S3 integrates with Amazon Cognito User Pools to authenticate users. It uses Amazon Cognito Federated Identities to authorize access to Amazon QuickSight on behalf of the authenticated user, with temporary AWS credentials and appropriate permissions. The app then uses an ID token generated by Amazon Cognito to call API Gateway and Lambda to obtain a sign-in token for Amazon QuickSight from AWS Sign-In Federation. With this token, the app redirects access to Amazon QuickSight:
The Amazon Cognito hosted UI provided by the app integration domain performs all sign-in, sign-up, verification, and authentication logic for the web app. This allows you to register and authenticate users.
After a user is authenticated with a valid user name and password, an OpenID Connect token (ID token) is sent to Amazon Cognito Federated Identities. The token retrieves temporary AWS credentials based on an IAM role with “quickSight:CreateUser” permissions. These credentials are used to build a session string that is encoded into the URL https://signin.aws.amazon.com/federation?Action=getSigninToken.
The ID token, along with the encoded URL, is sent to API Gateway, which in turn verifies the token with a user pool authorizer to authorize the API call.
The URL is passed on to a Lambda function that calls the AWS sign-in federation endpoint to retrieve a sign-in token.
The sign-in federation service processes the federation request and returns a sign-in token, which is forwarded to Amazon QuickSight; QuickSight then uses the token to authorize user access.
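To make this flow more concrete, here is a rough sketch of how the federation URLs are built from the temporary credentials. This illustrates the getSigninToken and login actions of the sign-in federation endpoint; it is not the actual Lambda code shipped with the SAM template, and the variable names and the fetchSigninToken helper are assumptions:
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
// session JSON built from the temporary credentials obtained via Cognito Federated Identities
String sessionJson = String.format(
        "{\"sessionId\":\"%s\",\"sessionKey\":\"%s\",\"sessionToken\":\"%s\"}",
        accessKeyId, secretAccessKey, sessionToken);
// 1. exchange the temporary credentials for a sign-in token
String getTokenUrl = "https://signin.aws.amazon.com/federation"
        + "?Action=getSigninToken"
        + "&Session=" + URLEncoder.encode(sessionJson, StandardCharsets.UTF_8);
// an HTTP GET to getTokenUrl returns a JSON document containing "SigninToken"
String signinToken = fetchSigninToken(getTokenUrl); // hypothetical helper doing the GET and JSON parsing
// 2. build the login URL that the web app redirects the browser to
String loginUrl = "https://signin.aws.amazon.com/federation"
        + "?Action=login"
        + "&Issuer=" + URLEncoder.encode(webAppUrl, StandardCharsets.UTF_8)
        + "&Destination=" + URLEncoder.encode("https://quicksight.aws.amazon.com/", StandardCharsets.UTF_8)
        + "&SigninToken=" + signinToken;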
How can you use, configure and test this serverless solution in your own AWS account? I created a simple SAM (Serverless Application Model) template that can be used to spin up all the resources needed for the solution.
Implementation steps
Using the AWS CLI, create an S3 bucket in the same region where you will deploy all the resources:
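For example (the bucket name is a placeholder; use a globally unique name and your own region):
aws s3 mb s3://cognito-quicksight-artifacts-example --region us-east-1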
Then, execute a couple of commands in a terminal or command prompt from inside the uncompressed folder where all files are located, referring to the S3 bucket created earlier:
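A minimal sketch using the standard CloudFormation/SAM packaging workflow (the template and output file names are assumptions):
aws cloudformation package --template-file template.yaml --s3-bucket cognito-quicksight-artifacts-example --output-template-file packaged.yaml
aws cloudformation deploy --template-file packaged.yaml --stack-name CognitoQuickSight --capabilities CAPABILITY_IAM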
AWS CloudFormation automatically creates and configures the following resources in your account:
Amazon CloudFront distribution
S3 static website
Amazon Cognito user pool
Amazon Cognito identity pool
IAM role for authenticated users
API Gateway API
Lambda function
You can follow the progress of the stack creation from the CloudFormation console. View the Outputs tab for the completed stack to get the identifiers of all created resources. You could also execute the following command with the AWS CLI:
aws cloudformation describe-stacks --query 'Stacks[0].[Outputs[].[OutputKey,OutputValue]]|[]' --output text --stack-name CognitoQuickSight
Use the information from the console or CLI command to replace the related resource identifiers in the file “auth.js”.
In the Amazon Cognito User Pools console, select the pool named QuickSightUsers generated by CloudFormation.
Under App integration, choose Domain name and create a domain. Domain names must be unique to the region. Add the domain to the “auth.js” file accordingly:
Choose App integration, App client settings and then select the option Cognito User Pool. Add the CloudFront distribution address (with https://, as SSL is a requirement for the callback/sign out URLs) and make sure that the address matches the related settings in the “auth.js” file exactly. For Allowed OAuth Flows, select implicit grant. For Allowed OAuth Scopes, select openid.
The app integration configuration is now done, and your “auth.js” file should contain all of these identifiers.
Note that the resources used in this example don’t exist anymore; the CloudFormation stack that generated them was deleted. I recommend that you delete your own stack after testing, for cleanup purposes. Deleting the stack also deletes all the resources.
Next, upload the four JS and HTML files to the S3 bucket named “cognitoquicksight-s3website-xxxxxxxxx”. Make sure that all files are publicly readable:
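For example, from the folder that contains the web app files (the bucket name will differ in your account):
aws s3 cp . s3://cognitoquicksight-s3website-xxxxxxxxx/ --recursive --acl public-read --exclude "*" --include "*.js" --include "*.html"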
Congratulations, the configuration part is now finished!
It’s time to create your first user. Access your CloudFront distribution address in a browser and choose SIGN IN / SIGN UP.
On the Amazon Cognito hosted UI, choose SIGN UP and provide a user name, password and a valid email.
You receive a verification code in email to confirm the user.
In a production system, you might not want to allow open access to your dashboards. As you now have a confirmed user, you can disable the sign-up functionality altogether to avoid letting other users sign themselves up.
In the Amazon Cognito User Pools console, choose General settings, Policies and select Only allow administrators to create users.
In the web app, you can now sign in as the Amazon Cognito user to access the Amazon QuickSight console. Because this is the first time this user is accessing Amazon QuickSight with an IAM role, provide your email address and sign up as an Amazon QuickSight user.
Enjoy your federated access to Amazon QuickSight!
After you’re done testing, go to the CloudFormation console and delete the CognitoQuickSight stack to remove all the resources.
Extending and customizing the solution
Additionally, you could configure SAML federation for your user pool with a couple of clicks, following the instructions in the Amazon Cognito User Pools supports federation with SAML post. If you add more than one SAML IdP, Amazon Cognito can identify the provider to which to redirect the authentication request, based on the user’s corporate email address.
It’s important to understand that while Amazon Cognito User Pools is authenticating (AuthN) the user, the IAM role created for the identity pool is authorizing (AuthZ) the user to perform actions on specific resources. As it is configured, the role only allows “quickSight:CreateUser” permissions. For additional permissions, modify the role accordingly, as in Setting Your IAM Policy. If your users create datasets, remember to add access to data sources such as Amazon S3.
You can customize this solution further by adding multiple groups to your user pool and associating each group with different IAM roles. For more information, see Amazon Cognito Groups and Fine-Grained Role-Based Access Control. For instance, with role-based access control, it’s possible to have a group for Amazon QuickSight administrators and one for users.
You can also modify the sign-in URL (Step 6) to redirect your user to any other service console, provided that the IAM role has the appropriate permissions. After receiving a valid sign-in token from the sign-in federation endpoint, the user is redirected to https://quicksight.aws.amazon.com. However, you can change the redirection to the main AWS console at https://console.aws.amazon.com, or to specific services, such as Amazon Redshift, Amazon EMR, Amazon Elasticsearch Service and Kibana (in this particular case, the application needs to return an AWS Signature Version 4 – SigV4 – signed URL based on the temporary credentials received from Cognito, instead of calling the sign-in federation endpoint), Amazon Kinesis, or AWS Lambda. You can even build a frontend portal with separate links to multiple specific services and resources, as long as the IAM role being assumed has access to them.
Conclusion
With the power, flexibility, security, scalability, and the new federation and application integration features of Amazon Cognito user pools, there’s no need to worry about the undifferentiated heavy lifting of maintaining your own identity servers. This allows you to focus on application logic and secure access to other great AWS services, such as Amazon QuickSight.
If you have questions or suggestions, please comment below.
About the Author
Ed Lima is a Solutions Architect who helps AWS customers with their journey in the cloud. He has provided thought leadership to define and drive strategic direction for the adoption of Amazon platforms and technologies, skillfully blending business requirements with technical aspects to help implement well-architected solutions. In his spare time, he enjoys snowboarding.
As part of the AWS Online Tech Talks series, AWS will present How to Use AWS WAF to Mitigate OWASP Top 10 Attacks on Thursday, September 28. This tech talk will start at 9:00 A.M. Pacific Time and end at 9:40 A.M. Pacific Time.
The Open Web Application Security Project (OWASP) Top 10 identifies the most critical vulnerabilities that web developers must address in their applications. AWS WAF, a web application firewall, helps you address the vulnerabilities identified in the OWASP Top 10. In this webinar, you will learn how to use AWS WAF to write rules to match common patterns of exploitation and block malicious requests from reaching your web servers.
The SecureLogin protocol is very interesting, as it does not rely on any central party (e.g. OAuth providers like Facebook and Twitter), thus avoiding all the pitfalls of OAuth (which Homakov has often criticized). It is not a password manager either. It is just client-side software that performs a bit of crypto in order to prove to the server that it is indeed the right user. For that to work, two parts are key:
Using a master password to generate a private key. It uses a key-derivation function, which guarantees that the produced private key has sufficient entropy. That way, using the same master password and the same email, you will get the same private key every time, and therefore the same public key. And you are the only one who can prove this public key is yours, by signing a message with your private key (see the sketch after this list).
Service providers (websites) identify you by your public key by storing it in the database when you register and then looking it up on each subsequent login
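This is not the actual SecureLogin key derivation – the protocol defines its own KDF and signature scheme – but a minimal Java sketch of the general idea that the same (email, master password) pair always yields the same key material:
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import java.nio.charset.StandardCharsets;
// derive key material deterministically from the master password, using the email as salt;
// the same inputs always produce the same bytes, which can then seed a signing key pair
// (PBKDF2 is used here purely for illustration)
byte[] keyMaterial = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
        .generateSecret(new PBEKeySpec(masterPassword.toCharArray(),
                email.getBytes(StandardCharsets.UTF_8), 100_000, 256))
        .getEncoded();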
The client-side part is ideally performed by a native client – a browser plugin (one is available for Chrome) or an OS-specific application (including mobile ones). That may sound tedious, but it’s actually quick and easy, and a one-time event (and is easier than password managers).
I have to admit – I like it, because I’ve been having a similar idea for a while. In my “biometric identification” presentation (where I discuss the pitfalls of using biometrics-only identification schemes), I proposed (slide 23) an identification scheme that uses biometrics (e.g. scanned with your phone) + a password to produce a private key (using a key-derivation function). And the biometric can easily be added to SecureLogin in the future.
It’s not all roses, of course, as one issue isn’t fully resolved yet – revocation. In case someone steals your master password (or you suspect it might be stolen), you may want to change it and notify all service providers of that change so that they can replace your old public key with a new one. That has two implications – first, you may not have a full list of sites that you registered on, and since you may have changed devices, or used multiple devices, there may be websites that never get to know about your password change. There are proposed solutions (points 3 and 4), but they are not intrinsic to the protocol and rely on centralized services. The second issue is – what if the attacker changes your password first? To prevent that, service providers should probably rely on email verification, which is neither part of the protocol, nor is encouraged by it. But you may have to do it anyway, as a safeguard.
Homakov has not only defined a protocol, but also provided implementations of the native clients, so that anyone can start using it. So I decided to add it to a project I’m currently working on (the login page is here). For that I needed a Java implementation of the server-side verification, and since no such implementation existed (only Ruby and Node.js ones are provided for now), I implemented it myself. So if you are going to use SecureLogin with a Java web application, you can use that instead of rolling your own. While implementing it, I hit a few minor issues that may lead to protocol changes, so I guess backward compatibility should also somehow be included in the protocol (through versioning).
So, what does the code look like? On the client side you have a button and a little JavaScript:
<!-- get the latest sdk.js from the GitHub repo of securelogin
or include it from https://securelogin.pw/sdk.js -->
<script src="js/securelogin/sdk.js"></script>
....
<p class="slbutton" id="securelogin">⚡ SecureLogin</p>
$("#securelogin").click(function() {
SecureLogin(function(sltoken){
// TODO: consider adding csrf protection as in the demo applications
// Note - pass as request body, not as param, as the token relies
// on url-encoding which some frameworks mess with
$.post('/app/user/securelogin', sltoken, function(result) {
if(result == 'ok') {
window.location = "/app/";
} else {
$.notify("Login failed, try again later", "error");
}
});
});
return false;
});
A single button can be used for both login and signup, or you can have a separate signup form, if it has to include additional details rather than just an email. Since I added SecureLogin in addition to my password-based login, I kept the two forms.
On the server, you simply do the following:
@RequestMapping(value = "/securelogin/register", method = RequestMethod.POST)
@ResponseBody
public String secureloginRegister(@RequestBody String token, HttpServletResponse response) {
try {
// verify the signed token posted by the client against this site's root URL
SecureLogin login = SecureLogin.verify(token, Options.create(websiteRootUrl));
UserDetails details = userService.getUserDetailsByEmail(login.getEmail());
if (details == null || !login.getRawPublicKey().equals(details.getSecureLoginPublicKey())) {
return "failure";
}
// sets the proper cookies to the response
// 'secure' is assumed to be a boolean flag indicating whether cookies should be marked Secure
TokenAuthenticationService.addAuthentication(response, login.getEmail(), secure);
return "ok";
} catch (SecureLoginVerificationException e) {
return "failure";
}
}
This is Spring MVC, but it can be any web framework. You can also incorporate it into a spring-security flow somehow; I’ve never liked spring-security’s complexity, so I did it manually. Also, instead of strings, you can return proper status codes. Note that I’m doing a lookup by email and only then checking the public key (as if it were a password). You can do it the other way around if you have the proper index on the public key column.
I wouldn’t suggest having a SecureLogin-only system, as the project is still in an early stage and users may not be comfortable with it. But certainly adding it as an option is a good idea.