Tag Archives: Amazon RDS

Amazon Aurora Fast Database Cloning

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/amazon-aurora-fast-database-cloning/

Today, I want to quickly show off a feature of Amazon Aurora that I find incredibly useful: Fast Database Cloning. By taking advantage of Aurora’s underlying distributed storage engine you’re able to quickly and cheaply create a copy-on-write clone of your database.

In my career I’ve frequently spent time waiting on some representative sample of data to use in development, experiments, or analytics. If I had a 2TB database, it could take hours just waiting for a copy of the data to be ready before I could perform my tasks. Even with RDS MySQL, I would still have to wait several hours for a snapshot copy to complete before I was able to test a schema migration or perform some analytics. Aurora solves this problem in a very interesting way.

The distributed storage engine for Aurora allows us to do things which are normally not feasible or cost-effective with a traditional database engine. By creating pointers to individual pages of data the storage engine enables fast database cloning. Then, when you make changes to the data in the source or the clone, a copy-on-write protocol creates a new copy of that page and updates the pointers. This means my 2TB snapshot restore job that used to take an hour is now ready in about 5 minutes – and most of that time is spent provisioning a new RDS instance.

The time it takes to create the clone is independent of the size of the database, since we’re pointing at the same storage. It also makes cloning a very cost-effective operation, since I only pay storage costs for the changed pages instead of an entire copy. The database clone is still a regular Aurora Database Cluster with all the same durability guarantees.

Let’s clone a database. First, I’ll select an Aurora (MySQL) instance and choose “create-clone” from the Instance Actions menu.

Next, I’ll name our clone dolly-the-sheep and provision it.

It took about 5 minutes and 30 seconds for my clone to become available. I then started making some large schema changes and saw no performance impact. The schema changes themselves completed faster than they would have on traditional MySQL, thanks to improvements the Aurora team made to enable faster DDL operations. I could subsequently create a clone-of-a-clone, or even a clone-of-a-clone-of-a-clone (and so on), if I wanted another team member to run some tests on my schema changes while I continued to make changes of my own. It’s important to note here that clones are first-class databases from the perspective of RDS. I still have all of the features that every other Aurora database supports: snapshots, backups, monitoring, and more.
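
If you’d rather script the clone than click through the console, the same operation is exposed through the RDS API as a point-in-time restore with a copy-on-write restore type. Here’s a minimal boto3 sketch; the cluster and instance identifiers are placeholders, and the clone still needs an instance added to it before it can serve queries:

import boto3

rds = boto3.client('rds')

# Create a copy-on-write clone of an existing Aurora cluster.
rds.restore_db_cluster_to_point_in_time(
    SourceDBClusterIdentifier='my-aurora-cluster',
    DBClusterIdentifier='dolly-the-sheep',
    RestoreType='copy-on-write',
    UseLatestRestorableTime=True
)

# The clone is a regular Aurora cluster, so add an instance to serve queries.
rds.create_db_instance(
    DBInstanceIdentifier='dolly-the-sheep-instance',
    DBClusterIdentifier='dolly-the-sheep',
    DBInstanceClass='db.r3.large',
    Engine='aurora'
)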

I hope this feature will allow you and your teams to save a lot of time and money on experimenting and developing applications based on Amazon Aurora. You can read more about this feature in the Amazon Aurora User Guide and I strongly suggest following the AWS Database Blog. Anurag Gupta’s posts on quorums and Amazon Aurora storage are particularly interesting.

Have follow-up questions or feedback? Ping us at [email protected], or leave a comment here. We’d love to get your thoughts and suggestions.

Randall

Deploy a Data Warehouse Quickly with Amazon Redshift, Amazon RDS for PostgreSQL and Tableau Server

Post Syndicated from Jorge A. Lopez original https://aws.amazon.com/blogs/big-data/deploy-a-data-warehouse-quickly-with-amazon-redshift-amazon-rds-for-postgresql-and-tableau-server/

One of the benefits of a data warehouse environment using both Amazon Redshift and Amazon RDS for PostgreSQL is that you can leverage the advantages of each service. Amazon Redshift is a high performance, petabyte-scale data warehouse service optimized for the online analytical processing (OLAP) queries typical of analytic reporting and business intelligence applications. On the other hand, a service like RDS excels at transactional OLTP workloads such as inserting, deleting, or updating rows.

In the recent JOIN Amazon Redshift AND Amazon RDS PostgreSQL WITH dblink post, we showed how you can deploy such an environment. Now, you can deploy a similar architecture using the Modern Data Warehouse on AWS Quick Start. The Quick Start is an automated deployment that uses AWS CloudFormation templates to launch, configure, and run the services required to deploy a data warehousing environment on AWS, based on Amazon Redshift and RDS for PostgreSQL.

The Quick Start also includes an instance of Tableau Server, running on Amazon EC2. This gives you the ability to host and serve analytic dashboards, workbooks and visualizations, supported by a trial license. You can play with the sample data source and dashboard, or create your own analyses by uploading your own data sets.

For more information about the Modern Data Warehouse on AWS Quick Start, download the full deployment guide. If you’re ready to get started, use one of the buttons below:

Option 1: Deploy Quick Start into a new VPC on AWS

Option 2: Deploy Quick Start into an existing VPC

If you have questions, please leave a comment below.


Next Steps

You can also join us for the webinar Unlock Insights and Reduce Costs by Modernizing Your Data Warehouse on AWS on Tuesday, August 22, 2017. Pearson, the education and publishing company, will present best practices and lessons learned during their journey to Amazon Redshift and Tableau.

Newly Updated: Example AWS IAM Policies for You to Use and Customize

Post Syndicated from Deren Smith original https://aws.amazon.com/blogs/security/newly-updated-example-policies-for-you-to-use-and-customize/

To help you grant access to specific resources and conditions, the Example Policies page in the AWS Identity and Access Management (IAM) documentation now includes more than thirty policies for you to use or customize to meet your permissions requirements. The AWS Support team developed these policies from their experiences working with AWS customers over the years. The example policies cover common permissions use cases you might encounter across services such as Amazon DynamoDB, Amazon EC2, AWS Elastic Beanstalk, Amazon RDS, Amazon S3, and IAM.

In this blog post, I introduce the updated Example Policies page and explain how to use and customize these policies for your needs.

The new Example Policies page

The Example Policies page in the IAM User Guide now provides an overview of the example policies and includes a link to view each policy on a separate page. Note that each of these policies has been reviewed and approved by AWS Support. If you would like to submit a policy that you have found to be particularly useful, post it on the IAM forum.

To give you an idea of the policies we have included on this page, the following are a few of the EC2 policies on the page:

To see the full list of available policies, see the Example Policies page.

In the following section, I demonstrate how to use a policy from the Example Policies page and customize it for your needs.

How to customize an example policy for your needs

Suppose you want to allow an IAM user, Bob, to start and stop EC2 instances with a specific resource tag. After looking through the Example Policies page, you see the policy, Allows Starting or Stopping EC2 Instances a User Has Tagged, Programmatically and in the Console.

To apply this policy to your specific use case:

  1. Navigate to the Policies section of the IAM console.
  2. Choose Create policy.
    Screenshot of choosing "Create policy"
  3. Choose the Select button next to Create Your Own Policy. You will see an empty policy document with boxes for Policy Name, Description, and Policy Document, as shown in the following screenshot.
  4. Type a name for the policy, copy the policy from the Example Policies page, and paste the policy in the Policy Document box. In this example, I use “start-stop-instances-for-owner-tag” as the policy name and “Allows users to start or stop instances if the instance tag Owner has the value of their user name” as the description.
  5. Update the placeholder text in the policy (see the full policy that follows this step). For example, replace <REGION> with a region from AWS Regions and Endpoints and <ACCOUNTNUMBER> with your 12-digit account number. The IAM policy variable, ${aws:username}, is a dynamic property in the policy that automatically applies to the user to which it is attached. For example, when the policy is attached to Bob, the policy replaces ${aws:username} with Bob. If you do not want to use the key-value pair of Owner and ${aws:username}, you can edit the policy to include your desired key-value pair. For example, if you want to use the key-value pair CostCenter:1234, you can modify “ec2:ResourceTag/Owner”: “${aws:username}” to “ec2:ResourceTag/CostCenter”: “1234”.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ec2:StartInstances",
                    "ec2:StopInstances"
                ],
                "Resource": "arn:aws:ec2:<REGION>:<ACCOUNTNUMBER>:instance/*",
                "Condition": {
                    "StringEquals": {
                        "ec2:ResourceTag/Owner": "${aws:username}"
                    }
                }
            },
            {
                "Effect": "Allow",
                "Action": "ec2:DescribeInstances",
                "Resource": "*"
            }
        ]
    }

  6. After you have edited the policy, choose Create policy.

You have created a policy that allows an IAM user to stop and start EC2 instances in your account, as long as these instances have the correct resource tag and the policy is attached to your IAM users. You also can attach this policy to an IAM group and apply the policy to users by adding them to that group.
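
If you manage policies with the AWS SDKs or the AWS CLI rather than the console, the same steps can be scripted. The following boto3 sketch creates the customized policy and attaches it to the user Bob; the region, account ID, and user name are placeholders for your own values:

import json
import boto3

iam = boto3.client('iam')

policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["ec2:StartInstances", "ec2:StopInstances"],
            "Resource": "arn:aws:ec2:us-east-1:123456789012:instance/*",
            "Condition": {"StringEquals": {"ec2:ResourceTag/Owner": "${aws:username}"}}
        },
        {
            "Effect": "Allow",
            "Action": "ec2:DescribeInstances",
            "Resource": "*"
        }
    ]
}

# Create the customer managed policy.
response = iam.create_policy(
    PolicyName='start-stop-instances-for-owner-tag',
    PolicyDocument=json.dumps(policy_document),
    Description='Allows users to start or stop EC2 instances they have tagged with their user name'
)

# Attach the policy to the IAM user Bob (or attach it to a group instead).
iam.attach_user_policy(
    UserName='Bob',
    PolicyArn=response['Policy']['Arn']
)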

Summary

We updated the Example Policies page in the IAM User Guide so that you have a central location where you can find examples of the most commonly requested and used IAM policies. In addition to these example policies, we recommend that you review the list of AWS managed policies, including the AWS managed policies for job functions. You can choose these predefined policies from the IAM console and associate them with your IAM users, groups, and roles.

We will add more IAM policies to the Example Policies page over time. If you have a useful policy you would like to share with others, post it on the IAM forum. If you have comments about this post, submit them in the “Comments” section below.

– Deren

AWS Hot Startups – July 2017

Post Syndicated from Tina Barr original https://aws.amazon.com/blogs/aws/aws-hot-startups-july-2017/

Welcome back to another month of Hot Startups! Every day, startups are creating innovative and exciting businesses, applications, and products around the world. Each month we feature a handful of startups doing cool things using AWS.

July is all about learning! These companies are focused on providing access to tools and resources to expand knowledge and skills in different ways.

This month’s startups:

  • CodeHS – provides fun and accessible computer science curriculum for middle and high schools.
  • Insight – offers intensive fellowships to grow technical talent in Data Science.
  • iTranslate – enables people to read, write, and speak in over 90 languages, anywhere in the world.

CodeHS (San Francisco, CA)

In 2012, Stanford students Zach Galant and Jeremy Keeshin were computer science majors and TAs for introductory classes when they noticed a trend among their peers. Many wished that they had been exposed to computer science earlier in life. In their senior year, Zach and Jeremy launched CodeHS to give middle and high schools the opportunity to provide a fun, accessible computer science education to students everywhere. CodeHS is a web-based curriculum pathway complete with teacher resources, lesson plans, and professional development opportunities. The curriculum is supplemented with time-saving teacher tools to help with lesson planning, grading and reviewing student code, and managing their classroom.

CodeHS aspires to empower all students to meaningfully impact the future, and believes that coding is becoming a new foundational skill, along with reading and writing, that allows students to further explore any interest or area of study. At the time CodeHS was founded in 2012, only 10% of high schools in America offered a computer science course. Zach and Jeremy set out to change that by providing a solution that made it easy for schools and districts to get started. With CodeHS, thousands of teachers have been trained and are teaching hundreds of thousands of students all over the world. To use CodeHS, all that’s needed is the internet and a web browser. Students can write and run their code online, and teachers can immediately see what the students are working on and how they are doing.

Amazon EC2, Amazon RDS, Amazon ElastiCache, Amazon CloudFront, and Amazon S3 make it possible for CodeHS to scale their site to meet the needs of schools all over the world. CodeHS also relies on AWS to compile and run student code in the browser, which is extremely important when teaching server-side languages like Java, which powers the AP course. Since usage rises and falls based on school schedules, Amazon CloudWatch and Elastic Load Balancing are used to easily scale up when students are running code so they have a seamless experience.

Be sure to visit the CodeHS website, and to learn more about bringing computer science to your school, click here!

Insight (Palo Alto, CA)

Insight was founded in 2012 to create a new educational model, optimize hiring for data teams, and facilitate successful career transitions among data professionals. Over the last 5 years, Insight has kept ahead of market trends and launched a series of professional training fellowships including Data Science, Health Data Science, Data Engineering, and Artificial Intelligence. Finding individuals with the right skill set, background, and culture fit is a challenge for big companies and startups alike, and Insight is focused on developing top talent through intensive 7-week fellowships. To date, Insight has over 1,000 alumni at over 350 companies including Amazon, Google, Netflix, Twitter, and The New York Times.

The Data Engineering team at Insight is well-versed in the current ecosystem of open source tools and technologies and provides mentorship on the best practices in this space. The technical teams are continually working with external groups in a variety of data advisory and mentorship capacities, but the majority of Insight partners participate in professional sessions. Companies visit the Insight office to speak with fellows in an informal setting and provide details on the type of work they are doing and how their teams are growing. These sessions have proved invaluable as fellows experience a significantly better interview process and companies yield engaged and enthusiastic new team members.

An important aspect of Insight’s fellowships is the opportunity for hands-on work, focusing on everything from building big-data pipelines to contributing novel features to industry-standard open source efforts. Insight provides free AWS resources for all fellows to use, in addition to mentorship from the Data Engineering team. Fellows regularly utilize Amazon S3, Amazon EC2, Amazon Kinesis, Amazon EMR, AWS Lambda, Amazon Redshift, and Amazon RDS, among other services. The experience with AWS gives fellows a solid skill set as they transition into the industry. Fellowships are currently being offered in Boston, New York, Seattle, and the Bay Area.

Check out the Insight blog for more information on trends in data infrastructure, artificial intelligence, and cutting-edge data products.

 

iTranslate (Austria)

When the App Store was introduced in 2008, the founders of iTranslate saw an opportunity to be part of something big. The group of four fully believed that the iPhone and apps were going to change the world, and together they brainstormed ideas for their own app. The combination of translation and mobile devices seemed a natural fit, and by 2009 iTranslate was born. iTranslate’s mission is to enable travelers, students, business professionals, employers, and medical staff to read, write, and speak in all languages, anywhere in the world. The app allows users to translate text, voice, websites and more into nearly 100 languages on various platforms. Today, iTranslate is the leading player for conversational translation and dictionary apps, with more than 60 million downloads and 6 million monthly active users.

iTranslate is breaking language barriers through disruptive technology and innovation, enabling people to translate in real time. The app has a variety of features designed to optimize productivity including offline translation, website and voice translation, and language auto detection. iTranslate also recently launched the world’s first ear translation device in collaboration with Bragi, a company focused on smart earphones. The Dash Pro allows people to communicate freely, while having a personal translator right in their ear.

iTranslate started using Amazon Polly soon after it was announced. CEO Alexander Marktl said, “As the leading translation and dictionary app, it is our mission at iTranslate to provide our users with the best possible tools to read, write, and speak in all languages across the globe. Amazon Polly provides us with the ability to efficiently produce and use high quality, natural sounding synthesized speech.” The stable and simple-to-use API, low latency, and free caching allow iTranslate to scale as they continue adding features to their app. Customers also enjoy the option to change the speech rate and switch between male and female voices. To assure quality, speed, and reliability of their products, iTranslate also uses Amazon EC2, Amazon S3, and Amazon Route 53.

To get started with iTranslate, visit their website here.

—–

Thanks for reading!

-Tina

Deploying Java Microservices on Amazon EC2 Container Service

Post Syndicated from Nathan Taber original https://aws.amazon.com/blogs/compute/deploying-java-microservices-on-amazon-ec2-container-service/

This post and accompanying code graciously contributed by:

Huy Huynh
Sr. Solutions Architect
Magnus Bjorkman
Solutions Architect

Java is a popular language used by many enterprises today. To simplify and accelerate Java application development, many companies are moving from a monolithic to microservices architecture. For some, it has become a strategic imperative. Containerization technology, such as Docker, lets enterprises build scalable, robust microservice architectures without major code rewrites.

In this post, I cover how to containerize a monolithic Java application to run on Docker. Then, I show how to deploy it on AWS using Amazon EC2 Container Service (Amazon ECS), a high-performance container management service. Finally, I show how to break the monolith into multiple services, all running in containers on Amazon ECS.

Application Architecture

For this example, I use the Spring Pet Clinic, a monolithic Java application for managing a veterinary practice. It is a simple REST API, which allows the client to manage and view Owners, Pets, Vets, and Visits.

It is a simple three-tier architecture:

  • Client
    You simulate this by using curl commands.
  • Web/app server
    This is the Java and Spring-based application that you run using the embedded Tomcat. As part of this post, you run this within Docker containers.
  • Database server
    This is the relational database for your application that stores information about owners, pets, vets, and visits. For this post, you use MySQL on Amazon RDS.

I decided not to put the database inside a container, as containers were designed for applications and are transient in nature. The choice was made even easier because you have a fully managed database service available with Amazon RDS.

RDS manages the work involved in setting up a relational database, from provisioning the infrastructure capacity that you request to installing the database software. After your database is up and running, RDS automates common administrative tasks, such as performing backups and patching the software that powers your database. With optional Multi-AZ deployments, Amazon RDS also manages synchronous data replication across Availability Zones with automatic failover.
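
To give a sense of how little setup that involves, here is a rough boto3 sketch that provisions a Multi-AZ MySQL instance for the application; the identifier, instance class, storage size, and credentials are illustrative placeholders:

import boto3

rds = boto3.client('rds')

# Provision a managed MySQL database with a synchronous standby in a second AZ.
rds.create_db_instance(
    DBInstanceIdentifier='petclinic-db',
    Engine='mysql',
    DBInstanceClass='db.t2.medium',
    AllocatedStorage=20,               # storage in GiB
    MultiAZ=True,                      # synchronous replication and automatic failover
    MasterUsername='petclinic',
    MasterUserPassword='change-me-please',
    BackupRetentionPeriod=7            # days of automated backups
)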

Walkthrough

You can find the code for the example covered in this post at amazon-ecs-java-microservices on GitHub.

Prerequisites

You need the following to walk through this solution:

  • An AWS account
  • An access key and secret key for a user in the account
  • The AWS CLI installed

Also, install the latest versions of the following:

  • Java
  • Maven
  • Python
  • Docker

Step 1: Move the existing Java Spring application to a container deployed using Amazon ECS

First, move the existing monolith application to a container and deploy it using Amazon ECS. This is a great first step because you get some benefits even before breaking the monolith apart:

  • An improved pipeline. The container allows an engineering organization to create a standard pipeline for the application lifecycle.
  • No mutations to machines.

You can find the monolith example at 1_ECS_Java_Spring_PetClinic.

Container deployment overview

The following diagram is an overview of what the setup looks like for Amazon ECS and related services:

This setup consists of the following resources:

  • The client application that makes a request to the load balancer.
  • The load balancer that distributes requests across all available ports and instances registered in the application’s target group using round-robin.
  • The target group that is updated by Amazon ECS to always have an up-to-date list of all the service containers in the cluster. This includes the port on which they are accessible.
  • One Amazon ECS cluster that hosts the container for the application.
  • A VPC network to host the Amazon ECS cluster and associated security groups.

Each container has a single application process that is bound to port 8080 within its namespace. In reality, all the containers are exposed on a different, randomly assigned port on the host.

The architecture is containerized but still monolithic, because each container has the same features as all the other containers.

The following is also part of the solution but not depicted in the above diagram:

  • One Amazon EC2 Container Registry (Amazon ECR) repository for the application.
  • A service/task definition that spins up containers on the instances of the Amazon ECS cluster.
  • A MySQL RDS instance that hosts the application’s schema. The information about the MySQL RDS instance is sent in through environment variables to the containers, so that the application can connect to the MySQL RDS instance.

I have automated the setup with the 1_ECS_Java_Spring_PetClinic/ecs-cluster.cf AWS CloudFormation template and a Python script.

The Python script calls the CloudFormation template for the initial setup of the VPC, Amazon ECS cluster, and RDS instance. It then extracts the outputs from the template and uses those for API calls to create Amazon ECR repositories, tasks, services, Application Load Balancer, and target groups.
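
The script itself isn’t reproduced here, but the pattern looks roughly like the following sketch; the stack and repository names are illustrative:

import boto3

cfn = boto3.client('cloudformation')
ecr = boto3.client('ecr')

# Read the stack outputs (VPC, subnets, cluster name, RDS endpoint, and so on).
stack = cfn.describe_stacks(StackName='petclinic-ecs-cluster')['Stacks'][0]
outputs = {o['OutputKey']: o['OutputValue'] for o in stack['Outputs']}

# Create a repository for the application image; later API calls would register
# the task definition and create the service, load balancer, and target group.
repo = ecr.create_repository(repositoryName='spring-petclinic-rest')
print(repo['repository']['repositoryUri'])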

Environment variables and Spring properties binding

As part of the Python script, you pass in a number of environment variables to the container as part of the task/container definition:

'environment': [
    {
        'name': 'SPRING_PROFILES_ACTIVE',
        'value': 'mysql'
    },
    {
        'name': 'SPRING_DATASOURCE_URL',
        'value': my_sql_options['dns_name']
    },
    {
        'name': 'SPRING_DATASOURCE_USERNAME',
        'value': my_sql_options['username']
    },
    {
        'name': 'SPRING_DATASOURCE_PASSWORD',
        'value': my_sql_options['password']
    }
],

The preceding environment variables work in concert with the Spring property system. The value in the variable SPRING_PROFILES_ACTIVE makes Spring use the MySQL version of the application property file. The other environment variables override the following properties in that file:

  • spring.datasource.url
  • spring.datasource.username
  • spring.datasource.password

Optionally, you can also encrypt sensitive values by using Amazon EC2 Systems Manager Parameter Store. Instead of handing in the password, you pass in a reference to the parameter and fetch the value as part of the container startup. For more information, see Managing Secrets for Amazon ECS Applications Using Parameter Store and IAM Roles for Tasks.
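
As a rough illustration of that approach (the parameter name here is hypothetical and not part of the sample project), the container entry point could resolve the secret at startup with a call like this:

import os
import boto3

ssm = boto3.client('ssm')

# Fetch the database password from an encrypted (SecureString) parameter
# instead of receiving it as a plain environment variable.
param = ssm.get_parameter(
    Name='/petclinic/prod/spring.datasource.password',
    WithDecryption=True
)
os.environ['SPRING_DATASOURCE_PASSWORD'] = param['Parameter']['Value']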

Spotify Docker Maven plugin

Use the Spotify Docker Maven plugin to create the image and push it directly to Amazon ECR. This lets you build and push the image as part of the regular Maven build, and it integrates image generation into the overall build process. Use an explicit Dockerfile as input to the plugin.

FROM frolvlad/alpine-oraclejdk8:slim
VOLUME /tmp
ADD spring-petclinic-rest-1.7.jar app.jar
RUN sh -c 'touch /app.jar'
ENV JAVA_OPTS=""
ENTRYPOINT [ "sh", "-c", "java $JAVA_OPTS -Djava.security.egd=file:/dev/./urandom -jar /app.jar" ]

The Python script discussed earlier uses the AWS CLI to authenticate you with AWS. The script places the token in the appropriate location so that the plugin can work directly against the Amazon ECR repository.
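
If you want to reproduce that authentication step yourself, it boils down to something like the following sketch: fetch a temporary ECR authorization token (valid for 12 hours) and hand the decoded credentials to Docker, for example by writing them to ~/.docker/config.json, which the Spotify plugin reads:

import base64
import boto3

ecr = boto3.client('ecr')

# Request a temporary authorization token for the registry.
auth = ecr.get_authorization_token()['authorizationData'][0]
user, password = base64.b64decode(auth['authorizationToken']).decode().split(':')
registry = auth['proxyEndpoint']

# 'user' is normally the literal string 'AWS'; 'password' is the temporary token.
print(user, registry)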

Test setup

You can test the setup by running the Python script:
python setup.py -m setup -r <your region>

After the script has successfully run, you can test by querying an endpoint:
curl <your endpoint from output above>/owner

You can clean this up before going to the next section:
python setup.py -m cleanup -r <your region>

Step 2: Converting the monolith into microservices running on Amazon ECS

The second step is to convert the monolith into microservices. For a real application, you would likely not do this as one step, but re-architect the application piece by piece. You would continue to run your monolith, but it would keep getting smaller with each piece that you break apart.

By migrating to microservices, you get four benefits associated with them:

  • Isolation of crashes
    If one microservice in your application is crashing, then only that part of your application goes down. The rest of your application continues to work properly.
  • Isolation of security
    When microservice best practices are followed, the result is that if an attacker compromises one service, they only gain access to the resources of that service. They can’t horizontally access other resources from other services without breaking into those services as well.
  • Independent scaling
    When features are broken out into microservices, then the amount of infrastructure and number of instances of each microservice class can be scaled up and down independently.
  • Development velocity
    In a monolith, adding a new feature can potentially impact every other feature that the monolith contains. On the other hand, a proper microservice architecture has new code for a new feature going into a new service. You can be confident that any code you write won’t impact the existing code at all, unless you explicitly write a connection between two microservices.

Find the microservices example at 2_ECS_Java_Spring_PetClinic_Microservices.
You break apart the Spring Pet Clinic application by creating a microservice for each REST API operation, as well as creating one for the system services.

Java code changes

Comparing the project structure between the monolith and the microservices version, you can see that each service is now its own separate build.
First, the monolith version:

You can clearly see how each API operation is its own subpackage under the org.springframework.samples.petclinic package, all part of the same monolithic application.
This changes as you break it apart in the microservices version:

Now, each API operation is its own separate build, which you can build independently and deploy. You have also duplicated some code across the different microservices, such as the classes under the model subpackage. This is intentional as you don’t want to introduce artificial dependencies among the microservices and allow these to evolve differently for each microservice.

Also, make the dependencies among the API operations more loosely coupled. In the monolithic version, the components are tightly coupled and use object-based invocation.

Here is an example of this from the OwnerController operation, where the class is directly calling PetRepository to get information about pets. PetRepository is the Repository class (Spring data access layer) to the Pet table in the RDS instance for the Pet API:

@RestController
class OwnerController {

    @Inject
    private PetRepository pets;
    @Inject
    private OwnerRepository owners;
    private static final Logger logger = LoggerFactory.getLogger(OwnerController.class);

    @RequestMapping(value = "/owner/{ownerId}/getVisits", method = RequestMethod.GET)
    public ResponseEntity<List<Visit>> getOwnerVisits(@PathVariable int ownerId){
        List<Pet> petList = this.owners.findById(ownerId).getPets();
        List<Visit> visitList = new ArrayList<Visit>();
        petList.forEach(pet -> visitList.addAll(pet.getVisits()));
        return new ResponseEntity<List<Visit>>(visitList, HttpStatus.OK);
    }
}

In the microservice version, call the Pet API operation and not PetRepository directly. Decouple the components by using interprocess communication; in this case, the Rest API. This provides for fault tolerance and disposability.

@RestController
class OwnerController {

    @Value("#{environment['SERVICE_ENDPOINT'] ?: 'localhost:8080'}")
    private String serviceEndpoint;

    @Inject
    private OwnerRepository owners;
    private static final Logger logger = LoggerFactory.getLogger(OwnerController.class);

    @RequestMapping(value = "/owner/{ownerId}/getVisits", method = RequestMethod.GET)
    public ResponseEntity<List<Visit>> getOwnerVisits(@PathVariable int ownerId){
        List<Pet> petList = this.owners.findById(ownerId).getPets();
        List<Visit> visitList = new ArrayList<Visit>();
        petList.forEach(pet -> {
            logger.info(getPetVisits(pet.getId()).toString());
            visitList.addAll(getPetVisits(pet.getId()));
        });
        return new ResponseEntity<List<Visit>>(visitList, HttpStatus.OK);
    }

    private List<Visit> getPetVisits(int petId){
        List<Visit> visitList = new ArrayList<Visit>();
        RestTemplate restTemplate = new RestTemplate();
        Pet pet = restTemplate.getForObject("http://"+serviceEndpoint+"/pet/"+petId, Pet.class);
        logger.info(pet.getVisits().toString());
        return pet.getVisits();
    }
}

You now have an additional method that calls the API. You are also handing in the service endpoint that should be called, so that you can easily inject dynamic endpoints based on the current deployment.

Container deployment overview

Here is an overview of what the setup looks like for Amazon ECS and the related services:

This setup consists of the following resources:

  • The client application that makes a request to the load balancer.
  • The Application Load Balancer that inspects the client request. Based on routing rules, it directs the request to an instance and port from the target group that matches the rule.
  • The Application Load Balancer that has a target group for each microservice. The target groups are used by the corresponding services to register available container instances. Each target group has a path, so when you call the path for a particular microservice, it is mapped to the correct target group. This allows you to use one Application Load Balancer to serve all the different microservices, accessed by path. For example, a request whose path matches /owner/* is mapped and directed to the Owner microservice.
  • One Amazon ECS cluster that hosts the containers for each microservice of the application.
  • A VPC network to host the Amazon ECS cluster and associated security groups.

Because you are running multiple containers on the same instances, use dynamic port mapping to avoid port clashing. By using dynamic port mapping, the container is allocated an anonymous port on the host to which the container port (8080) is mapped. The anonymous port is registered with the Application Load Balancer and target group so that traffic is routed correctly.
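
In a task definition, dynamic port mapping simply means declaring a host port of 0. A trimmed-down sketch of what that looks like through the ECS API follows; the family, container name, and image URI are placeholders:

import boto3

ecs = boto3.client('ecs')

ecs.register_task_definition(
    family='petclinic-owner',
    containerDefinitions=[
        {
            'name': 'owner-service',
            'image': '123456789012.dkr.ecr.us-east-1.amazonaws.com/spring-petclinic-rest-owner:1.7',
            'memory': 512,
            'essential': True,
            # hostPort 0 lets the ECS agent pick a free ephemeral port on the
            # instance; the ALB target group tracks whichever port is assigned.
            'portMappings': [{'containerPort': 8080, 'hostPort': 0}]
        }
    ]
)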

The following is also part of the solution but not depicted in the above diagram:

  • One Amazon ECR repository for each microservice.
  • A service/task definition per microservice that spins up containers on the instances of the Amazon ECS cluster.
  • A MySQL RDS instance that hosts the application’s schema. The information about the MySQL RDS instance is sent in through environment variables to the containers. That way, the application can connect to the MySQL RDS instance.

I have again automated the setup with the 2_ECS_Java_Spring_PetClinic_Microservices/ecs-cluster.cf CloudFormation template and a Python script.

The CloudFormation template remains the same as in the previous section. In the Python script, you are now building five different Java applications, one for each microservice (also includes a system application). There is a separate Maven POM file for each one. The resulting Docker image gets pushed to its own Amazon ECR repository, and is deployed separately using its own service/task definition. This is critical to get the benefits described earlier for microservices.

Here is an example of the POM file for the Owner microservice:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.springframework.samples</groupId>
    <artifactId>spring-petclinic-rest</artifactId>
    <version>1.7</version>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>1.5.2.RELEASE</version>
    </parent>
    <properties>
        <!-- Generic properties -->
        <java.version>1.8</java.version>
        <docker.registry.host>${env.docker_registry_host}</docker.registry.host>
    </properties>
    <dependencies>
        <dependency>
            <groupId>javax.inject</groupId>
            <artifactId>javax.inject</artifactId>
            <version>1</version>
        </dependency>
        <!-- Spring and Spring Boot dependencies -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-rest</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-cache</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-jpa</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <!-- Databases - Uses HSQL by default -->
        <dependency>
            <groupId>org.hsqldb</groupId>
            <artifactId>hsqldb</artifactId>
            <scope>runtime</scope>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <scope>runtime</scope>
        </dependency>
        <!-- caching -->
        <dependency>
            <groupId>javax.cache</groupId>
            <artifactId>cache-api</artifactId>
        </dependency>
        <dependency>
            <groupId>org.ehcache</groupId>
            <artifactId>ehcache</artifactId>
        </dependency>
        <!-- end of webjars -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-devtools</artifactId>
            <scope>runtime</scope>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
            <plugin>
                <groupId>com.spotify</groupId>
                <artifactId>docker-maven-plugin</artifactId>
                <version>0.4.13</version>
                <configuration>
                    <imageName>${env.docker_registry_host}/${project.artifactId}</imageName>
                    <dockerDirectory>src/main/docker</dockerDirectory>
                    <useConfigFile>true</useConfigFile>
                    <registryUrl>${env.docker_registry_host}</registryUrl>
                    <!--dockerHost>https://${docker.registry.host}</dockerHost-->
                    <resources>
                        <resource>
                            <targetPath>/</targetPath>
                            <directory>${project.build.directory}</directory>
                            <include>${project.build.finalName}.jar</include>
                        </resource>
                    </resources>
                    <forceTags>false</forceTags>
                    <imageTags>
                        <imageTag>${project.version}</imageTag>
                    </imageTags>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

Test setup

You can test this by running the Python script:

python setup.py -m setup -r <your region>

After the script has successfully run, you can test by querying an endpoint:

curl <your endpoint from output above>/owner

Conclusion

Migrating a monolithic application to a containerized set of microservices can seem like a daunting task. By following the steps outlined in this post, you can begin to containerize monolithic Java apps, take advantage of the container runtime environment, and start the process of re-architecting into microservices. On the whole, containerized microservices are faster to develop, easier to iterate on, and more cost-effective to maintain and secure.

This post focused on the first steps of microservice migration. You can learn more about optimizing and scaling your microservices with components such as service discovery, blue/green deployment, circuit breakers, and configuration servers at http://aws.amazon.com/containers.

If you have questions or suggestions, please comment below.

Analysis of Top-N DynamoDB Objects using Amazon Athena and Amazon QuickSight

Post Syndicated from Rendy Oka original https://aws.amazon.com/blogs/big-data/analysis-of-top-n-dynamodb-objects-using-amazon-athena-and-amazon-quicksight/

If you run an operation that continuously generates a large amount of data, you may want to know what kind of data is being inserted by your application. The ability to analyze data intake quickly can be very valuable for business units, such as operations and marketing. For many operations, it’s important to see what is driving the business at any particular moment. For retail companies, for example, understanding which products are currently popular can aid in planning for future growth. Similarly, for PR companies, understanding the impact of an advertising campaign can help them market their products more effectively.

This post covers an architecture that helps you analyze your streaming data. You’ll build a solution using Amazon DynamoDB Streams, AWS Lambda, Amazon Kinesis Firehose, and Amazon Athena to analyze data intake at a frequency that you choose. And because this is a serverless architecture, you can use all of the services here without the need to provision or manage servers.

The data source

You’ll collect a random sampling of tweets via Twitter’s API and store a variety of attributes in your DynamoDB table, such as: Twitter handle, tweet ID, hashtags, location, and Time-To-Live (TTL) value.

In DynamoDB, the partition key is used as an input to an internal hash function. The output from this function determines the partition in which the data will be stored. When using a combination of partition key and sort key as a DynamoDB schema, you need to make sure that no single partition key contains many more objects than the other partition keys, because this can cause partition-level throttling. For the demonstration in this blog, the Twitter handle will be the partition key and the tweet ID will be the sort key. This allows you to group and sort tweets from each user.

To help you get started, I have written a script that pulls a live Twitter stream that you can use to generate your data. All you need to do is provide your own Twitter Apps credentials, and it should generate the data immediately. Alternatively, I have also provided a script that you can use to generate random Tweets with little effort.

You can find both scripts in the Github repository:

https://github.com/awslabs/aws-blog-dynamodb-analysis

There are some modules that you may need to install to run these scripts; you can find them in the Python Package Index (PyPI).

To get your own Twitter credentials, go to https://www.twitter.com/ and sign up for a free account, if you don’t already have one. After your account is set up, go to https://apps.twitter.com/. On the main landing page, choose the Create New App button. After the application is created, go to Keys and Access Tokens to get your credentials to use the Twitter API. You’ll need to generate a consumer key/secret and an access token/secret. All four keys will be used to authenticate your request.

Architecture overview

Before we begin, let’s take a look at what the overall flow of information will look like, from data ingestion into DynamoDB to visualization of results in Amazon QuickSight.

As illustrated in the architecture diagram above, any changes made to the items in DynamoDB will be captured and processed using DynamoDB Streams. Next, a Lambda function will be invoked by a trigger that is configured to respond to events in DynamoDB Streams. The Lambda function processes the data prior to pushing it to Amazon Kinesis Firehose, which will output to Amazon S3. Finally, you use Amazon Athena to analyze the streaming data landing in Amazon S3. The results can be explored and visualized in Amazon QuickSight for your company’s business analytics.

You’ll need to implement your own Lambda function to help transform the raw <key, value> data stored in DynamoDB into a JSON format that Athena can digest, but I provide sample code that you are free to modify.

Implementation

In the following sections, I’ll walk through how you can set up the architecture discussed earlier.

Create your DynamoDB table

First, let’s create a DynamoDB table and enable DynamoDB Streams. This will enable data to be copied out of this table. From the console, use the user_id as the partition key and tweet_id as the sort key:

After the table is ready, you can enable DynamoDB Streams. This process operates asynchronously, so there is no performance impact on the table when you enable this feature. The easiest way to manage DynamoDB Streams is also through the DynamoDB console.

In the Overview tab of your newly created table, click Manage Stream. In the window, choose the information that will be written to the stream whenever data in the table is added or modified. In this example, you can choose either New image or New and old images.
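
If you prefer to script this step instead of using the console, the equivalent boto3 call creates the table with the stream already enabled; the table name and throughput values here are illustrative:

import boto3

dynamodb = boto3.client('dynamodb')

dynamodb.create_table(
    TableName='twitter-feed',
    KeySchema=[
        {'AttributeName': 'user_id', 'KeyType': 'HASH'},   # partition key
        {'AttributeName': 'tweet_id', 'KeyType': 'RANGE'}  # sort key
    ],
    AttributeDefinitions=[
        {'AttributeName': 'user_id', 'AttributeType': 'S'},
        {'AttributeName': 'tweet_id', 'AttributeType': 'N'}
    ],
    ProvisionedThroughput={'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5},
    StreamSpecification={'StreamEnabled': True, 'StreamViewType': 'NEW_IMAGE'}
)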

For more details on this process, check out our documentation:

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html

Configure Kinesis Firehose

Before creating the Lambda function, you need to configure the Kinesis Firehose delivery stream so that it’s ready to accept data from Lambda. Open the Firehose console and choose Create Firehose Delivery Stream. From here, choose S3 as the destination and use the following information to configure the resource. Note the Delivery stream name because you will use it in the next step.
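
If you’d rather script this part as well, a minimal boto3 sketch looks like the following; the stream name, IAM role, bucket, and prefix are placeholders (the prefix should match the S3 location you later point Athena at):

import boto3

firehose = boto3.client('firehose')

firehose.create_delivery_stream(
    DeliveryStreamName='twitter-feed-to-s3',
    S3DestinationConfiguration={
        'RoleARN': 'arn:aws:iam::123456789012:role/firehose-delivery-role',
        'BucketARN': 'arn:aws:s3:::myBucket',
        'Prefix': 'dynamodb/streams/transactions/',
        # Buffer up to 5 MB or 5 minutes of data before writing an object to S3.
        'BufferingHints': {'SizeInMBs': 5, 'IntervalInSeconds': 300},
        'CompressionFormat': 'UNCOMPRESSED'
    }
)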

For more details on this process, check out our documentation:

http://docs.aws.amazon.com/firehose/latest/dev/basic-create.html#console-to-s3

Create your Lambda function

Now that Kinesis Firehose is ready to accept data, you can create your Lambda function.

From the AWS Lambda console, choose the Create a Lambda function button and use the Blank Function. Enter a name and description, and choose Python 2.7 as the Runtime. Note your Lambda function name because you’ll need it in the next step.

In the Lambda function code field, you can paste the script that I have written for this purpose. All this function needs is the name of your Firehose delivery stream, set as an environment variable.

import boto3
import os

# Initiate Firehose client
firehose_client = boto3.client('firehose')

def lambda_handler(event, context):
    records = []
    try:
        for record in event['Records']:
            # Build one JSON line per stream record: the table name (parsed from
            # the event source ARN), the item keys, and the approximate creation time.
            t_stats = '{ "table_name":"%s", "user_id":"%s", "tweet_id":"%s", "approx_post_time":"%d" }\n' \
                      % ( record['eventSourceARN'].split('/')[1],
                          record['dynamodb']['Keys']['user_id']['S'],
                          record['dynamodb']['Keys']['tweet_id']['N'],
                          int(record['dynamodb']['ApproximateCreationDateTime']) )
            records.append({'Data': t_stats})
        # Forward the batch to the delivery stream named in the environment
        # variable (PutRecordBatch accepts up to 500 records per call).
        firehose_client.put_record_batch(
            DeliveryStreamName=os.environ['firehose_stream_name'],
            Records=records
        )
        return 'Successfully processed {} records.'.format(len(event['Records']))
    except Exception as e:
        # Don't block the stream on a bad record, but surface the error in CloudWatch Logs.
        print('Error processing records: {}'.format(e))

The handler should be set to lambda_function.lambda_handler and you can use the existing lambda_dynamodb_streams role that’s been created by default.

Enable DynamoDB trigger and start collecting data

Everything is ready to go. Open your table using the DynamoDB console and go to the Triggers tab. Select the Create trigger drop down list and choose Existing Lambda function. In the pop-up window, select the function that you just created, and choose the Create button.
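
Under the hood, that console action creates an event source mapping between the table’s stream and your function. If you prefer to script it, the call looks roughly like this; the stream ARN and function name are placeholders:

import boto3

lambda_client = boto3.client('lambda')

lambda_client.create_event_source_mapping(
    EventSourceArn='arn:aws:dynamodb:us-east-1:123456789012:table/twitter-feed/stream/2017-05-17T00:00:00.000',
    FunctionName='ddb-stream-to-firehose',
    StartingPosition='TRIM_HORIZON',  # process records from the oldest available
    BatchSize=100
)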

At this point, you can start collecting data with the Python scripts that I’ve provided. The first one pulls public Twitter data, and the other generates fake tweets using Lorem Ipsum text.

Configure Amazon Athena to read the data

Next, you will configure Amazon Athena so that it can read the data that Kinesis Firehose outputs to Amazon S3, allowing you to analyze the data as needed. You can connect to Athena directly from the Athena console, and you can establish a connection using JDBC or the Athena API. In this example, I’m going to demonstrate what this looks like on the Athena console.

First, create a new database and a new table. You can do this by running the following two queries. The first query creates a new database:

CREATE DATABASE IF NOT EXISTS ddbtablestats

And the second query creates a new table:

CREATE EXTERNAL TABLE IF NOT EXISTS ddbtablestats.twitterfeed (
    `table_name` string,
    `user_id` string,
    `tweet_id` bigint,
    `approx_post_time` timestamp 
) PARTITIONED BY (
    year string,
    month string,
    day string,
    hour string 
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('serialization.format' = '1')
LOCATION 's3://myBucket/dynamodb/streams/transactions/'

Note that this table is created using partitions. Partitioning separates your data into logical parts based on certain criteria, such as date, location, language, etc. This allows Athena to selectively pull your data without needing to process the entire data set. This effectively minimizes the query execution time, and it also allows you to have greater control over the data that you want to query.

After the query has completed, you should be able to see the table in the left side pane of the Athena dashboard.

After the database and table have been created, execute the ALTER TABLE query to populate the partitions in your table. Replace the date with the date on which the script was executed.

ALTER TABLE ddbtablestats.TwitterFeed ADD IF NOT EXISTS
PARTITION (year='2017',month='05',day='17',hour='01') location 's3://myBucket/dynamodb/streams/transactions/2017/05/17/01/'

Using the Athena console, you’ll need to manually add each additional partition that you’d like to analyze. However, you can automate this process programmatically by using the JDBC driver or any AWS SDK of your choice.
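
For example, a small boto3 snippet run on a schedule could add the partition for the current hour; the query-results bucket below is a placeholder:

import datetime
import boto3

athena = boto3.client('athena')

now = datetime.datetime.utcnow()
query = (
    "ALTER TABLE ddbtablestats.twitterfeed ADD IF NOT EXISTS "
    "PARTITION (year='{y}', month='{m}', day='{d}', hour='{h}') "
    "LOCATION 's3://myBucket/dynamodb/streams/transactions/{y}/{m}/{d}/{h}/'"
).format(y=now.strftime('%Y'), m=now.strftime('%m'),
         d=now.strftime('%d'), h=now.strftime('%H'))

# Athena runs queries asynchronously; results land in the output location.
athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={'Database': 'ddbtablestats'},
    ResultConfiguration={'OutputLocation': 's3://myBucket/athena-query-results/'}
)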

For more information on partitioning in Athena, check out our documentation:

http://docs.aws.amazon.com/athena/latest/ug/partitions.html

Querying the data in Amazon Athena

This is it! Let’s run this query to see the top 10 most active Twitter users in the last 24 hours. You can do this from the Athena console:

SELECT user_id, COUNT(DISTINCT tweet_id) tweets FROM ddbTableStats.TwitterFeed
WHERE year='2017' AND month='05' AND day='17'
GROUP BY user_id
ORDER BY tweets DESC
LIMIT 10

The result should look similar to the following:

Linking Athena to Amazon QuickSight

Finally, to make this data available to a larger audience, let’s visualize this data in Amazon QuickSight. Amazon QuickSight provides native connectivity to AWS data sources such as Amazon Redshift, Amazon RDS, and Amazon Athena. Amazon QuickSight can also connect to on-premises databases, Excel, or CSV files, and it can connect to cloud data sources such as Salesforce.com. For this solution, we will connect Amazon QuickSight to the Athena table we just created.

Amazon QuickSight has a free tier that provides 1 user and 1GB of SPICE (Superfast Parallel In-memory Calculated Engine) capacity free. So you can sign up and use QuickSight free of charge.

When you are signing up for Amazon QuickSight, ensure that you grant permissions for QuickSight to connect to Athena and the S3 bucket where the data is stored.

After you’ve signed up, choose the New analysis button, choose New data set, and then select the Athena data source option. Enter a name for your data source and proceed to the next prompt. At this point, you should see the Athena table you created earlier.

Choose the option to import the data to SPICE for a quicker analysis. SPICE is an in-memory optimized calculation engine that is designed for quick data visualization through parallel processing. SPICE also enables you to refresh your data sets at a regular interval or on-demand as you want.

In the dialog box, confirm this data set creation, and you’ll arrive on the landing page where you can start building your graph. The X-axis will represent the user_id and the Value will be used to represent the SUM total of the tweets from each user.

The Amazon QuickSight report looks like this:

Through this visualization, I can easily see that there are 3 users that tweeted over 20 times that day and that the majority of the users have fewer than 10 tweets that day. I can also set up a scheduled refresh of my SPICE dataset so that I have a dashboard that is regularly updated with the latest data.

Closing thoughts

Here are the benefits that you can gain from using this architecture:

  1. You can optimize the design of your DynamoDB schema to follow AWS best practice recommendations.
  2. You can run analysis and data intelligence in order to understand current customer demand for your business.
  3. You can store incremental backups for future auditing.

The flexibility of our AWS services invites you to create and design the ideal workflow for your production at any scale, and, as always, if you ever need some guidance, don’t hesitate to reach out to us. I hope this has been helpful to you! Please leave any questions and comments below.

 


Additional Reading

Learn how to analyze VPC Flow Logs with Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight.


About the Author

Rendy Oka is a Big Data Support Engineer for Amazon Web Services. He provides consultations and architectural designs and partners with the TAMs, Solution Architects, and AWS product teams to help develop solutions for our customers. He is also a team lead for the big data support team in Seattle. Rendy has traveled to dozens of countries around the world and takes every opportunity to experience the local culture wherever he goes.

AWS Hot Startups – June 2017

Post Syndicated from Tina Barr original https://aws.amazon.com/blogs/aws/aws-hot-startups-june-2017/

Thanks for stopping by for another round of AWS Hot Startups! This month we are featuring:

  • CloudRanger – helping companies understand the cloud with visual representation.
  • quintly – providing social media analytics for brands on a single dashboard.
  • Tango Card – reinventing rewards programs for businesses and their customers worldwide.

Don’t forget to check out May’s Hot Startups in case you missed them.

CloudRanger (Letterkenny, Ireland)   

The idea for CloudRanger started where most great ideas do – at a bar in Las Vegas. During a late-night conversation with his friends at re:Invent 2014, Dave Gildea (Founder and CEO) used cocktail napkins and drink coasters to visually illustrate servers and backups, and the light on his phone to represent scheduling. By the end of the night, the idea for automated visual server management was born. With CloudRanger, companies can easily create backup and retention policies, visual scheduling, and simple restoration of snapshots and AMIs. The team behind CloudRanger believes that when servers and cloud resources are represented visually, they are easier to manage and understand. Users are able to see their servers, which turns them into a tangible and important piece of business inventory.

CloudRanger is an excellent platform for MSPs who manage many different AWS accounts, and need a quick method to display many servers and audit certain attributes. The company’s goal is to give anyone the ability to create backup policies in multiple regions, apply them using a tag-based methodology, and manage backups. Servers can be scheduled from one simple dashboard, and restoration is easy and step-by-step. With CloudRanger’s visual representation of resources, customers are encouraged to fully understand their backup policies, schedules, and servers.

As an AWS Partner, CloudRanger has built a globally redundant system after going all-in with AWS. They are using over 25 AWS services for everything including enterprise-level security, automation and 24/7 runtimes, and an emphasis on Machine Learning for efficiency in the sales process. CloudRanger continues to rely more on AWS as new services and features are released, and are replacing current services with AWS CodePipeline and AWS CodeBuild. CloudRanger was also named Startup Company of the Year at a recent Irish tech event!

To learn more about CloudRanger, visit their website.

quintly (Cologne, Germany)

In 2010, brothers Alexander Peiniger and Frederik Peiniger started a journey to help companies track their social media profiles and improve their strategies against competitors. The startup began under the name “Social.Media.Tracking” and then “AllFacebook Stats” before officially becoming quintly in 2013. With quintly, brands and agencies can analyze, benchmark, and optimize their social media activities on a global scale. The innovative dashboarding system gives clients an overview across all social media profiles on the most important networks (Facebook, Twitter, YouTube, Google+, LinkedIn, Instagram, etc.) and then derives an optimal social media strategy from those profiles. Today, quintly has users in over 180 countries and paying clients in over 65 countries including major agency networks and Fortune 500 companies.

Getting an overview of a brand’s social media activities can be time-consuming, and turning insights into actions is a challenge that not all brands master. Quintly offers a variety of features designed to help clients improve their social media reach. With their web-based SaaS product, brands and agencies can compare their social media performance against competitors and their best practices. Not only can clients learn from their own historic performance, but they can leverage data from any other brand around the world.

Since the company’s founding, quintly has built and operated its SaaS offering on top of AWS services, leveraging Amazon EC2, Amazon ECS, Elastic Load Balancing, and Amazon Route 53 to host their Docker-based environment. Large amounts of data are stored in Amazon DynamoDB and Amazon RDS, and they use Amazon CloudWatch to monitor the environment and scale seamlessly to current needs. In addition, quintly is using Amazon Machine Learning to add additional attributes to the data and to drive better decisions for their clients. With the help of AWS, quintly has been able to focus on their core business while having a scalable and well-performing solution to solve their technical needs.

For more on quintly, check out their Social Media Analytics blog.

Tango Card (Seattle, Washington)

Based in the heart of West Seattle, Tango Card is revolutionizing rewards programs for companies around the world. Too often customers redeem points in a loyalty or rebate program only to wait weeks for their prize to arrive. Companies generously give their employees appreciation gifts, but the gifts can be generic and impersonal. With Tango Card, companies can choose from a variety of rewards that fit the needs of their specific program, event, or business incentive. The extensive Rewards Catalog includes options for e-gift cards that are sure to excite any recipient. There are plenty of options for everyone from traditional e-gift cards to nonprofit donations to cash equivalent rewards.

Tango Card uses a combination of desired rewards, modern technology, and expert service to change the rewards and incentive experience. The Reward Delivery Platform offers solutions including Blast Rewards, Reward Link, and Rewards as a Service API (RaaS). Blast Rewards enables companies to purchase and send e-gift cards in bulk in just one business day. Reward Link lets recipients choose from an assortment of e-gift cards, prepaid cards, digital checks, and donations and is delivered instantly. Finally, Rewards as a Service is a robust digital gift card API that is built to support apps and platforms. With RaaS, Tango Card can send out e-gift cards on company-branded email templates or deliver them directly within a user interface.

The entire Tango Card Reward Delivery Platform leverages many AWS services. They use Amazon EC2 Container Service (ECS) for rapid deployment of containerized microservices, and Amazon Relational Database Service (RDS) for low-overhead managed databases. Tango Card is also leveraging Amazon Virtual Private Cloud (VPC), AWS Key Management Service (KMS), and AWS Identity and Access Management (IAM).

To learn more about Tango Card, check out their blog!

I would also like to thank Alexander Moss-Bolanos for helping with the Hot Startups posts this year.

Thanks for reading and we’ll see you next month!

-Tina Barr

Building Loosely Coupled, Scalable, C# Applications with Amazon SQS and Amazon SNS

Post Syndicated from Tara Van Unen original https://aws.amazon.com/blogs/compute/building-loosely-coupled-scalable-c-applications-with-amazon-sqs-and-amazon-sns/

Stephen Liedig, Solutions Architect

One of the many challenges professional software architects and developers face is how to make cloud-native applications scalable, fault-tolerant, and highly available.

Fundamental to your project success is understanding the importance of making systems highly cohesive and loosely coupled. That means considering the multi-dimensional facets of system coupling to support the distributed nature of the applications that you are building for the cloud.

By that, I mean addressing not only the application-level coupling (managing incoming and outgoing dependencies), but also considering the impacts of platform, spatial, and temporal coupling of your systems. Platform coupling relates to the interoperability, or lack thereof, of heterogeneous system components. Spatial coupling deals with managing components at a network topology level or protocol level. Temporal, or runtime coupling, refers to the ability of a component within your system to do any kind of meaningful work while it is performing a synchronous, blocking operation.

The AWS messaging services, Amazon SQS and Amazon SNS, help you deal with these forms of coupling by providing mechanisms for:

  • Reliable, durable, and fault-tolerant delivery of messages between application components
  • Logical decomposition of systems and increased autonomy of components
  • Creating unidirectional, non-blocking operations, temporarily decoupling system components at runtime
  • Decreasing the dependencies that components have on each other through standard communication and network channels

Following on the recent topic, Building Scalable Applications and Microservices: Adding Messaging to Your Toolbox, in this post, I look at some of the ways you can introduce SQS and SNS into your architectures to decouple your components, and show how you can implement them using C#.

Walkthrough

To illustrate some of these concepts, consider a web application that processes customer orders. As good architects and developers, you have followed best practices and made your application scalable and highly available. Your solution included implementing load balancing, dynamic scaling across multiple Availability Zones, and persisting orders in a Multi-AZ Amazon RDS database instance, as in the following diagram.


In this example, the application is responsible for handling and persisting the order data, as well as dealing with increases in traffic for popular items.

One potential point of vulnerability in the order processing workflow is in saving the order in the database. The business expects that every order has been persisted into the database. However, any potential deadlock, race condition, or network issue could cause the persistence of the order to fail. Then, the order is lost with no recourse to restore the order.

With good logging capability, you may be able to identify when an error occurred and which customer’s order failed. This wouldn’t allow you to “restore” the transaction, and by that stage, your customer is no longer your customer.

As illustrated in the following diagram, introducing an SQS queue helps improve your ordering application. Using the queue isolates the processing logic into its own component and runs it in a separate process from the web application. This, in turn, allows the system to be more resilient to spikes in traffic, while allowing work to be performed only as fast as necessary in order to manage costs.


In addition, you now have a mechanism for persisting orders as messages (with the queue acting as a temporary database), and have moved the scope of your transaction with your database further down the stack. In the event of an application exception or transaction failure, this ensures that the order processing can be retried or redirected to the Amazon SQS Dead Letter Queue (DLQ), for re-processing at a later stage. (See the recent post, Using Amazon SQS Dead-Letter Queues to Control Message Failure, for more information on dead-letter queues.)

Scaling the order processing nodes

This change allows you now to scale the web application frontend independently from the processing nodes. The frontend application can continue to scale based on metrics such as CPU usage, or the number of requests hitting the load balancer. Processing nodes can scale based on the number of orders in the queue. Here is an example of scale-in and scale-out alarms that you would associate with the scaling policy.

Scale-out Alarm

aws cloudwatch put-metric-alarm --alarm-name AddCapacityToCustomerOrderQueue --metric-name ApproximateNumberOfMessagesVisible --namespace "AWS/SQS" 
--statistic Average --period 300 --threshold 3 --comparison-operator GreaterThanOrEqualToThreshold --dimensions Name=QueueName,Value=customer-orders
--evaluation-periods 2 --alarm-actions <arn of the scale-out autoscaling policy>

Scale-in Alarm

aws cloudwatch put-metric-alarm --alarm-name RemoveCapacityFromCustomerOrderQueue --metric-name ApproximateNumberOfMessagesVisible --namespace "AWS/SQS" 
 --statistic Average --period 300 --threshold 1 --comparison-operator LessThanOrEqualToThreshold --dimensions Name=QueueName,Value=customer-orders
 --evaluation-periods 2 --alarm-actions <arn of the scale-in autoscaling policy>

The above alarms use the ApproximateNumberOfMessagesVisible metric to measure the queue length and drive the scaling policy of the Auto Scaling group. Another useful metric is ApproximateAgeOfOldestMessage, which helps when applications have time-sensitive messages and developers need to ensure that they are processed within a specific time period.
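
If you would rather define such an alarm from application code, here is a minimal sketch (not part of the original walkthrough) using the AWS SDK for .NET (AWSSDK.CloudWatch); the alarm name, threshold, and scaling policy ARN are placeholder assumptions.

// Illustrative sketch only: define a scale-out alarm on ApproximateAgeOfOldestMessage.
// All names and values below are placeholders.
using (var cloudWatch = new AmazonCloudWatchClient())
{
    var request = new PutMetricAlarmRequest
    {
        AlarmName = "AddCapacityWhenOrdersAreTooOld",
        Namespace = "AWS/SQS",
        MetricName = "ApproximateAgeOfOldestMessage",
        Dimensions = new List<Dimension>
        {
            new Dimension { Name = "QueueName", Value = "customer-orders" }
        },
        Statistic = Statistic.Maximum,
        Period = 300,
        EvaluationPeriods = 1,
        Threshold = 600, // alarm when the oldest order is more than 10 minutes old
        ComparisonOperator = ComparisonOperator.GreaterThanOrEqualToThreshold,
        AlarmActions = new List<string> { "<arn of the scale-out autoscaling policy>" }
    };

    cloudWatch.PutMetricAlarmAsync(request).Wait();
}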

Scaling the order processing implementation

On top of scaling at an infrastructure level using Auto Scaling, make sure to take advantage of the processing power of your Amazon EC2 instances by using as many of the available threads as possible. There are several ways to implement this. In this post, we build a Windows service that uses the BackgroundWorker class to process the messages from the queue.

Here’s a closer look at the implementation. In the first section of the consuming application, use a loop to continually poll the queue for new messages, and construct a ReceiveMessageRequest variable.

public static void PollQueue()
{
    while (_running)
    {
        Task<ReceiveMessageResponse> receiveMessageResponse;

        // Pull messages off the queue
        using (var sqs = new AmazonSQSClient())
        {
            const int maxMessages = 10;  // 1-10

            //Receiving a message
            var receiveMessageRequest = new ReceiveMessageRequest
            {
                // Get URL from Configuration
                QueueUrl = _queueUrl, 
                // The maximum number of messages to return. 
                // Fewer messages might be returned. 
                MaxNumberOfMessages = maxMessages, 
                // A list of attributes that need to be returned with message.
                AttributeNames = new List<string> { "All" },
                // Enable long polling. 
                // Time to wait for message to arrive on queue.
                WaitTimeSeconds = 5 
            };

            receiveMessageResponse = sqs.ReceiveMessageAsync(receiveMessageRequest);
        }

The WaitTimeSeconds property of the ReceiveMessageRequest specifies the duration (in seconds) that the call waits for a message to arrive in the queue before returning a response to the calling application. There are a few benefits to using long polling:

  • It reduces the number of empty responses by allowing SQS to wait until a message is available in the queue before sending a response.
  • It eliminates false empty responses by querying all (rather than a limited number) of the servers.
  • It returns messages as soon as any message becomes available.

For more information, see Amazon SQS Long Polling.

After you have returned messages from the queue, you can start to process them by looping through each message in the response and invoking a new BackgroundWorker thread.

// Process messages
if (receiveMessageResponse.Result.Messages != null)
{
    foreach (var message in receiveMessageResponse.Result.Messages)
    {
        Console.WriteLine("Received SQS message, starting worker thread");

        // Create background worker to process message
        BackgroundWorker worker = new BackgroundWorker();
        worker.DoWork += (obj, e) => ProcessMessage(message);
        worker.RunWorkerAsync();
    }
}
else
{
    Console.WriteLine("No messages on queue");
}

The event handler, ProcessMessage, is where you implement business logic for processing orders. It is important to have a good understanding of how long a typical transaction takes so you can set a message VisibilityTimeout that is long enough to complete your operation. If order processing takes longer than the specified timeout period, the message becomes visible on the queue again. Other nodes may then pick it up and process the same order twice, leading to unintended consequences.
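
If a particular order legitimately needs more time, one option is to extend the timeout for that single in-flight message while work continues. Here is a minimal sketch of that idea; it assumes the _queueUrl field from the earlier snippet and an illustrative 120-second extension.

// Illustrative sketch: extend the visibility timeout of a single in-flight message
// so other consumers do not receive it while this node is still processing the order.
using (var sqs = new AmazonSQSClient())
{
    var changeVisibilityRequest = new ChangeMessageVisibilityRequest
    {
        QueueUrl = _queueUrl,                   // from the earlier snippet
        ReceiptHandle = message.ReceiptHandle,
        VisibilityTimeout = 120                 // seconds; size to your worst-case processing time
    };

    sqs.ChangeMessageVisibilityAsync(changeVisibilityRequest).Wait();
}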

Handling Duplicate Messages

In order to manage duplicate messages, seek to make your processing application idempotent. In mathematics, idempotent describes a function that produces the same result if it is applied to itself:

f(x) = f(f(x))

No matter how many times you process the same message, the end result is the same (definition from Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions, Hohpe and Wolf, 2004).

There are several strategies you could apply to achieve this:

  • Create messages that have inherent idempotent characteristics. That is, they are non-transactional in nature and are unique at a specified point in time. Rather than saying “place new order for Customer A,” which adds a duplicate order for the customer each time it is processed, use “place order <orderid> on <timestamp> for Customer A,” which creates a single order no matter how often it is persisted.
  • Deliver your messages via an Amazon SQS FIFO queue, which provides the benefits of message sequencing, but also mechanisms for content-based deduplication. You can deduplicate using the MessageDeduplicationId property on the SendMessage request or by enabling content-based deduplication on the queue, which generates a hash for MessageDeduplicationId, based on the content of the message, not the attributes.
var sendMessageRequest = new SendMessageRequest
{
    QueueUrl = _queueUrl,
    MessageBody = JsonConvert.SerializeObject(order),
    MessageGroupId = Guid.NewGuid().ToString("N"),
    MessageDeduplicationId = Guid.NewGuid().ToString("N")
};
  • If using SQS FIFO queues is not an option, keep a log of the attributes of all messages processed for a specified period of time, as an alternative to message deduplication on the receiving end. Verifying the existence of a message in the log before processing it adds computational overhead, which can be minimized through low-latency persistence solutions such as Amazon DynamoDB (a minimal sketch of this approach follows the list). Bear in mind that this solution depends on the successful, distributed transaction of the message and the message log.
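
Here is a minimal sketch of such a message log, assuming a hypothetical DynamoDB table named ProcessedMessages with MessageId as its partition key; the table name, schema, and helper method are illustrative rather than part of the original design, and the snippet assumes the AWSSDK.DynamoDBv2 package.

// Illustrative sketch of a message log backed by DynamoDB. The conditional put
// succeeds only the first time a MessageId is seen; ConditionalCheckFailedException
// signals a duplicate that can be deleted without reprocessing.
private static async Task<bool> TryRecordMessageAsync(Message message)
{
    using (var dynamoDb = new AmazonDynamoDBClient())
    {
        try
        {
            await dynamoDb.PutItemAsync(new PutItemRequest
            {
                TableName = "ProcessedMessages", // hypothetical table, MessageId as partition key
                Item = new Dictionary<string, AttributeValue>
                {
                    { "MessageId", new AttributeValue { S = message.MessageId } }
                },
                ConditionExpression = "attribute_not_exists(MessageId)"
            });
            return true;   // first time this message has been seen
        }
        catch (ConditionalCheckFailedException)
        {
            return false;  // duplicate message
        }
    }
}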

Handling exceptions

Because of the distributed nature of SQS queues, SQS does not automatically delete a message after it has been received. Therefore, you must explicitly delete the message from the queue after processing it, using the message ReceiptHandle property (see the following code example).

However, if at any stage you have an exception, avoid handling it as you normally would. The intention is to make sure that the message ends back on the queue, so that you can gracefully deal with intermittent failures. Instead, log the exception to capture diagnostic information, and swallow it.

By not explicitly deleting the message from the queue, you can take advantage of the VisibilityTimeout behavior described earlier. Gracefully handle the message processing failure and make the unprocessed message available to other nodes to process.

In the event that subsequent retries fail, SQS automatically moves the message to the configured DLQ after the configured number of receives has been reached. You can further investigate why the order process failed. Most importantly, the order has not been lost, and your customer is still your customer.

private static void ProcessMessage(Message message)
{
    using (var sqs = new AmazonSQSClient())
    {
        try
        {
            Console.WriteLine("Processing message id: {0}", message.MessageId);

            // Implement messaging processing here
            // Ensure no downstream resource contention (parallel processing)
            // <your order processing logic in here…>
            Console.WriteLine("{0} Thread {1}: {2}", DateTime.Now.ToString("s"), Thread.CurrentThread.ManagedThreadId, message.MessageId);
            
            // Delete the message off the queue. 
            // Receipt handle is the identifier you must provide 
            // when deleting the message.
            var deleteRequest = new DeleteMessageRequest(_queueUrl, message.ReceiptHandle);
            sqs.DeleteMessageAsync(deleteRequest);
            Console.WriteLine("Processed message id: {0}", message.MessageId);

        }
        catch (Exception ex)
        {
            // Do nothing.
            // Swallow exception, message will return to the queue when 
            // visibility timeout has been exceeded.
            Console.WriteLine("Could not process message due to error. Exception: {0}", ex.Message);
        }
    }
}
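
If you configure the dead-letter queue from code rather than the console, a redrive policy can be attached to the order queue as in the following sketch; the DLQ ARN and the five-receive limit are placeholder assumptions.

// Illustrative sketch: attach a dead-letter queue to the order queue so that
// messages are moved to the DLQ after five unsuccessful receives.
using (var sqs = new AmazonSQSClient())
{
    var redrivePolicy = "{\"deadLetterTargetArn\":\"arn:aws:sqs:us-east-1:123456789012:customer-orders-dlq\",\"maxReceiveCount\":\"5\"}";

    sqs.SetQueueAttributesAsync(new SetQueueAttributesRequest
    {
        QueueUrl = _queueUrl, // the customer-orders queue
        Attributes = new Dictionary<string, string>
        {
            { "RedrivePolicy", redrivePolicy }
        }
    }).Wait();
}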

Using SQS to adapt to changing business requirements

One of the benefits of introducing a message queue is that you can accommodate new business requirements without dramatically affecting your application.

If, for example, the business decided that all orders placed over $5000 are to be handled as a priority, you could introduce a new “priority order” queue. The way the orders are processed does not change. The only significant change to the processing application is to ensure that messages from the “priority order” queue are processed before the “standard order” queue.

The following diagram shows how this logic could be isolated in an “order dispatcher,” whose only purpose is to route order messages to the appropriate queue based on whether the order exceeds $5000. Nothing on the web application or the processing nodes changes other than the target queue to which the order is sent. The rate at which orders are processed can then be tuned by modifying the poll rates and scaling settings that I have already discussed.
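
A minimal sketch of such a dispatcher might look like the following; the Order type, its Total property, and the queue URLs are assumptions made for illustration.

// Illustrative sketch of an order dispatcher that routes high-value orders
// to a priority queue and everything else to the standard queue.
private static async Task DispatchOrderAsync(Order order)
{
    var queueUrl = order.Total > 5000m
        ? "https://sqs.us-east-1.amazonaws.com/123456789012/priority-orders"
        : "https://sqs.us-east-1.amazonaws.com/123456789012/standard-orders";

    using (var sqs = new AmazonSQSClient())
    {
        await sqs.SendMessageAsync(new SendMessageRequest
        {
            QueueUrl = queueUrl,
            MessageBody = JsonConvert.SerializeObject(order)
        });
    }
}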

Extending the design pattern with Amazon SNS

Amazon SNS supports reliable publish-subscribe (pub-sub) scenarios and push notifications to known endpoints across a wide variety of protocols. It eliminates the need to periodically check or poll for new information and updates. SNS supports:

  • Reliable storage of messages for immediate or delayed processing
  • Publish / subscribe – direct, broadcast, targeted “push” messaging
  • Multiple subscriber protocols
  • Amazon SQS, HTTP, HTTPS, email, SMS, mobile push, AWS Lambda

With these capabilities, you can provide parallel asynchronous processing of orders in the system and extend it to support any number of different business use cases without affecting the production environment. This is commonly referred to as a “fanout” scenario.

Rather than your web application pushing orders to a queue for processing, send a notification via SNS. The SNS messages are sent to a topic and then replicated and pushed to multiple SQS queues and Lambda functions for processing.
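
On the web application side, the only code change is to publish to a topic instead of sending to a queue. Here is a minimal sketch, assuming the AWSSDK.SimpleNotificationService package; the topic ARN is a placeholder.

// Illustrative sketch: publish the order to an SNS topic so that it fans out
// to every subscribed SQS queue and Lambda function.
using (var sns = new AmazonSimpleNotificationServiceClient())
{
    var publishRequest = new PublishRequest
    {
        TopicArn = "arn:aws:sns:us-east-1:123456789012:customer-orders", // placeholder
        Message = JsonConvert.SerializeObject(order)
    };

    sns.PublishAsync(publishRequest).Wait();
}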

As the diagram above shows, you have the development team consuming “live” data as they work on the next version of the processing application, or potentially using the messages to troubleshoot issues in production.

Marketing is consuming all order information, via a Lambda function that has subscribed to the SNS topic, inserting the records into an Amazon Redshift warehouse for analysis.

All of this, of course, is happening without affecting your order processing application.

Summary

While I haven’t dived deep into the specifics of each service, I have discussed how these services can be applied at an architectural level to build loosely coupled systems that facilitate multiple business use cases. I’ve also shown you how to use infrastructure and application-level scaling techniques, so you can get the most out of your EC2 instances.

One of the many benefits of using these managed services is how quickly and easily you can implement powerful messaging capabilities in your systems, and lower the capital and operational costs of managing your own messaging middleware.

Using Amazon SQS and Amazon SNS together can provide you with a powerful mechanism for decoupling application components. This should be part of design considerations as you architect for the cloud.

For more information, see the Amazon SQS Developer Guide and Amazon SNS Developer Guide. You’ll find tutorials on all the concepts covered in this post, and more. To get started, use the AWS Management Console or the AWS SDK of your choice.

Happy messaging!

AWS Hot Startups – May 2017

Post Syndicated from Tina Barr original https://aws.amazon.com/blogs/aws/aws-hot-startups-may-2017/

April showers bring May startups! This month we have three hot startups for you to check out. Keep reading to find out what they’re up to, and how they’re using AWS to do it.

Today’s post features the following startups:

  • Lobster – an AI-powered platform connecting creative social media users to professionals.
  • Visii – helping consumers find the perfect product using visual search.
  • Tiqets – a curated marketplace for culture and entertainment.

Lobster (London, England)

Every day, social media users generate billions of authentic images and videos to rival typical stock photography. Powered by Artificial Intelligence, Lobster enables brands, agencies, and the press to license visual content directly from social media users so they can find that piece of content that perfectly fits their brand or story. Lobster does the work of sorting through major social networks (Instagram, Flickr, Facebook, Vk, YouTube, and Vimeo) and cloud storage providers (Dropbox, Google Photos, and Verizon) to find media, saving brands and agencies time and energy. Using filters like gender, color, age, and geolocation can help customers find the unique content they’re looking for, while Lobster’s AI and visual recognition finds images instantly. Lobster also runs photo challenges to help customers discover the perfect image to fit their needs.

Lobster is an excellent platform for creative people to get their work discovered while also protecting their content. Users are treated as copyright holders and earn 75% of the final price of every sale. The platform is easy to use: new users simply sign in with an existing social media or cloud account and can start showcasing their artistic talent right away. Lobster allows users to connect to any number of photo storage sources so they’re able to choose which items to share and which to keep private. Once users have selected their favorite photos and videos to share, they can sit back and watch as their work is picked to become the signature for a new campaign or featured on a cool website – and start earning money for their work.

Lobster is using a variety of AWS services to keep everything running smoothly. The company uses Amazon S3 to store photography that was previously ordered by customers. When a customer purchases content, the respective piece of content must be available at any given moment, independent from the original source. Lobster is also using Amazon EC2 for its application servers and Elastic Load Balancing to monitor the state of each server.

To learn more about Lobster, check them out here!

Visii (London, England)

In today’s vast web, a growing number of products are being sold online and searching for something specific can be difficult. Visii was created to cater to businesses and help them extract value from an asset they already have – their images. Their SaaS platform allows clients to leverage an intelligent visual search on their websites and apps to help consumers find the perfect product for them. With Visii, consumers can choose an image and immediately discover more based on their tastes and preferences. Whether it’s clothing, artwork, or home decor, Visii will make recommendations to get consumers to search visually and subsequently help businesses increase their conversion rates.

There are multiple ways for businesses to integrate Visii on their website or app. Many of Visii’s clients choose to build against their API, but Visii also works closely with many clients to figure out the most effective way to do this for each unique case. This has led Visii to help build innovative user interfaces and figure out the best integration points to get consumers to search visually. Businesses can also integrate Visii on their website with a widget – they just need to provide a list of links to their products and Visii does the rest.

Visii runs their entire infrastructure on AWS. Their APIs and pipeline all sit in Auto Scaling groups, with ELBs in front of them, passing data into Amazon Simple Queue Service and Amazon Aurora. Recently, Visii moved from Amazon RDS to Aurora and noted that the process was incredibly quick and easy. Because they make heavy use of machine learning, it is crucial that their pipeline only runs when required and that they maximize the efficiency of their uptime.

To see how companies are using Visii, check out Style Picker and Saatchi Art.

Tiqets (Amsterdam, Netherlands)

Tiqets is making the ticket-buying experience faster and easier for travelers around the world.  Founded in 2013, Tiqets is one of the leading curated marketplaces for admission tickets to museums, zoos, and attractions. Their mission is to help travelers get the most out of their trips by helping them find and experience a city’s culture and entertainment. Tiqets partners directly with vendors to adapt to a customer’s specific needs, and is now active in over 30 cities in the US, Europe, and the Middle East.

With Tiqets, travelers can book tickets either ahead of time or at their destination for a wide range of attractions. The Tiqets app provides real-time availability and delivers tickets straight to customers’ phones via email, direct download, or in the app. Customers save time by skipping long lines (a perk of the app!), save trees (no need to physically print tickets), and, most importantly, can make the most of their leisure time. For each attraction featured on Tiqets, there is a lot of helpful information including best modes of transportation, hours, commonly asked questions, and reviews from other customers.

The Tiqets platform consists of the consumer-facing website, the internal and external-facing APIs, and the partner self-service portals. For the app hosting and infrastructure, Tiqets uses AWS services such as Elastic Load Balancing, Amazon EC2, Amazon RDS, Amazon CloudFront, Amazon Route 53, and Amazon ElastiCache. By orchestrating their AWS infrastructure through configuration, they can easily set up separate development and test environments that stay close to the production environment.

Tiqets is hiring! Be sure to check out their jobs page if you are interested in joining the Tiqets team.

Thanks for reading and don’t forget to check out April’s Hot Startups if you missed it.

-Tina Barr

Building a Secure Cross-Account Continuous Delivery Pipeline

Post Syndicated from Anuj Sharma original https://aws.amazon.com/blogs/devops/aws-building-a-secure-cross-account-continuous-delivery-pipeline/

Most organizations create multiple AWS accounts because they provide the highest level of resource and security isolation. In this blog post, I will discuss how to use cross account AWS Identity and Access Management (IAM) access to orchestrate continuous integration and continuous deployment.

Do I need multiple accounts?

If you answer “yes” to any of the following questions you should consider creating more AWS accounts:

  • Does your business require administrative isolation between workloads? Administrative isolation by account is the most straightforward way to grant independent administrative groups different levels of administrative control over AWS resources based on workload, development lifecycle, business unit (BU), or data sensitivity.
  • Does your business require limited visibility and discoverability of workloads? Accounts provide a natural boundary for visibility and discoverability. Workloads cannot be accessed or viewed unless an administrator of the account enables access to users managed in another account.
  • Does your business require isolation to minimize blast radius? Separate accounts help define boundaries and provide natural blast-radius isolation to limit the impact of a critical event such as a security breach, an unavailable AWS Region or Availability Zone, account suspensions, and so on.
  • Does your business require a particular workload to operate within AWS service limits without impacting the limits of another workload? You can use AWS account service limits to impose restrictions on a business unit, development team, or project. For example, if you create an AWS account for a project group, you can limit the number of Amazon Elastic Compute Cloud (Amazon EC2) or high performance computing (HPC) instances that can be launched by the account.
  • Does your business require strong isolation of recovery or auditing data? If regulatory requirements require you to control access and visibility to auditing data, you can isolate the data in an account separate from the one where you run your workloads (for example, by writing AWS CloudTrail logs to a different account).
  • Do your workloads depend on specific instance reservations to support high availability (HA) or disaster recovery (DR) capacity requirements? Reserved Instances (RIs) ensure reserved capacity for services such as Amazon EC2 and Amazon Relational Database Service (Amazon RDS) at the individual account level.

Use case

The identities in this use case are set up as follows:

  • DevAccount

Developers check the code into an AWS CodeCommit repository. It stores all the repositories as a single source of truth for application code. Developers have full control over this account. This account is usually used as a sandbox for developers.

  • ToolsAccount

A central location for all the tools related to the organization, including continuous delivery/deployment services such as AWS CodePipeline and AWS CodeBuild. Developers have limited/read-only access in this account. The Operations team has more control.

  • TestAccount

Applications using the CI/CD orchestration for test purposes are deployed from this account. Developers and the Operations team have limited/read-only access in this account.

  • ProdAccount

Applications using the CI/CD orchestration tested in the ToolsAccount are deployed to production from this account. Developers and the Operations team have limited/read-only access in this account.

Solution

In this solution, we will check in sample code for an AWS Lambda function in the Dev account. This will trigger the pipeline (created in AWS CodePipeline) and run the build using AWS CodeBuild in the Tools account. The pipeline will then deploy the Lambda function to the Test and Prod accounts.

Setup

  1. Clone this repository. It contains the AWS CloudFormation templates that we will use in this walkthrough.
git clone https://github.com/awslabs/aws-refarch-cross-account-pipeline.git
  2. Follow the instructions in the repository README to push the sample AWS Lambda application to an AWS CodeCommit repository in the Dev account.
  3. Install the AWS Command Line Interface as described here. To prepare your access keys or assume-role to make calls to AWS, configure the AWS CLI as described here.

Walkthrough

Note: Follow the steps in the order they’re written. Otherwise, the resources might not be created correctly.

  1. In the Tools account, deploy this CloudFormation template. It will create the customer master keys (CMK) in AWS Key Management Service (AWS KMS), grant access to Dev, Test, and Prod accounts to use these keys, and create an Amazon S3 bucket to hold artifacts from AWS CodePipeline.
aws cloudformation deploy --stack-name pre-reqs \
--template-file ToolsAcct/pre-reqs.yaml --parameter-overrides \
DevAccount=ENTER_DEV_ACCT TestAccount=ENTER_TEST_ACCT \
ProductionAccount=ENTER_PROD_ACCT

In the output section of the CloudFormation console, make a note of the Amazon Resource Name (ARN) of the CMK and the S3 bucket name. You will need them in the next step.

  2. In the Dev account, which hosts the AWS CodeCommit repository, deploy this CloudFormation template. This template will create the IAM roles, which will later be assumed by the pipeline running in the Tools account. Enter the AWS account number for the Tools account and the CMK ARN.
aws cloudformation deploy --stack-name toolsacct-codepipeline-role \
--template-file DevAccount/toolsacct-codepipeline-codecommit.yaml \
--capabilities CAPABILITY_NAMED_IAM \
--parameter-overrides ToolsAccount=ENTER_TOOLS_ACCT CMKARN=FROM_1st_Step
  3. In the Test and Prod accounts where you will deploy the Lambda code, execute this CloudFormation template. This template creates IAM roles, which will later be assumed by the pipeline to create, deploy, and update the sample AWS Lambda function through CloudFormation.
aws cloudformation deploy --stack-name toolsacct-codepipeline-cloudformation-role \
--template-file TestAccount/toolsacct-codepipeline-cloudformation-deployer.yaml \
--capabilities CAPABILITY_NAMED_IAM \
--parameter-overrides ToolsAccount=ENTER_TOOLS_ACCT CMKARN=FROM_1st_STEP  \
S3Bucket=FROM_1st_STEP
  4. In the Tools account, which hosts AWS CodePipeline, execute this CloudFormation template. This creates a pipeline, but does not add permissions for the cross accounts (Dev, Test, and Prod).
aws cloudformation deploy --stack-name sample-lambda-pipeline \
--template-file ToolsAcct/code-pipeline.yaml \
--parameter-overrides DevAccount=ENTER_DEV_ACCT TestAccount=ENTER_TEST_ACCT \
ProductionAccount=ENTER_PROD_ACCT CMKARN=FROM_1st_STEP \
S3Bucket=FROM_1st_STEP --capabilities CAPABILITY_NAMED_IAM
  5. In the Tools account, execute this CloudFormation template, which gives access to the role created in step 4. This role will be assumed by AWS CodeBuild to decrypt artifacts in the S3 bucket. This is the same template that was used in step 1, but with different parameters.
aws cloudformation deploy --stack-name pre-reqs \
--template-file ToolsAcct/pre-reqs.yaml \
--parameter-overrides CodeBuildCondition=true
  6. In the Tools account, execute this CloudFormation template, which will do the following:
    1. Add the IAM role created in step 2. This role is used by AWS CodePipeline in the Tools account for checking out code from the AWS CodeCommit repository in the Dev account.
    2. Add the IAM role created in step 3. This role is used by AWS CodePipeline in the Tools account for deploying the code package to the Test and Prod accounts.
aws cloudformation deploy --stack-name sample-lambda-pipeline \
--template-file ToolsAcct/code-pipeline.yaml \
--parameter-overrides CrossAccountCondition=true \
--capabilities CAPABILITY_NAMED_IAM

What did we just do?

  1. The pipeline created in step 4 and updated in step 6 checks out code from the AWS CodeCommit repository. It uses the IAM role created in step 2. The IAM role created in step 4 has permissions to assume the role created in step 2. This role will be assumed by AWS CodeBuild to decrypt artifacts in the S3 bucket, as described in step 5.
  2. The IAM role created in step 2 has permission to check out code. See here.
  3. The IAM role created in step 2 also has permission to upload the checked-out code to the S3 bucket created in step 1. It uses the KMS keys created in step 1 for server-side encryption.
  4. Upon successfully checking out the code, AWS CodePipeline triggers AWS CodeBuild. The AWS CodeBuild project created in step 4 is configured to use the CMK created in step 1 for cryptography operations. See here. The AWS CodeBuild role is created later in step 4. In step 5, access is granted to the AWS CodeBuild role to allow the use of the CMK for cryptography.
  5. AWS CodeBuild uses pip to install any libraries for the sample Lambda function. It also executes the aws cloudformation package command to create a Lambda function deployment package, uploads the package to the specified S3 bucket, and adds a reference to the uploaded package to the CloudFormation template. See here.
  6. Using the role created in step 3, AWS CodePipeline executes the transformed CloudFormation template (received as an output from AWS CodeBuild) in the Test account. The AWS CodePipeline role created in step 4 has permissions to assume the IAM role created in step 3, as described in step 5.
  7. The IAM role assumed by AWS CodePipeline passes the role to an IAM role that can be assumed by CloudFormation. AWS CloudFormation creates and updates the Lambda function using the code that was built and uploaded by AWS CodeBuild.

This is what the pipeline looks like using the sample code:

Conclusion

Creating multiple AWS accounts provides the highest degree of isolation and is appropriate for a number of use cases. However, keeping a centralized account to orchestrate continuous delivery and deployment using AWS CodePipeline and AWS CodeBuild eliminates the need to duplicate the delivery pipeline. You can secure the pipeline through the use of cross account IAM roles and the encryption of artifacts using AWS KMS. For more information, see Providing Access to an IAM User in Another AWS Account That You Own in the IAM User Guide.

References

New – USASpending.gov on an Amazon RDS Snapshot

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-usaspending-gov-on-an-amazon-rds-snapshot/

My colleague Jed Sundwall runs the AWS Public Datasets program. He wrote the guest post below to tell you about an important new dataset that is available as an Amazon RDS Snapshot. In the post, Jed introduces the dataset and shows you how to create an Amazon RDS DB Instance from the snapshot.

Jeff;


I am very excited to announce that, starting today, the entire public USAspending.gov database is available for anyone to copy via Amazon Relational Database Service (RDS). USAspending.gov includes data on all spending by the federal government, including contracts, grants, loans, employee salaries, and more. The data is available via a PostgreSQL snapshot, which provides bulk access to the entire USAspending.gov database, and is updated nightly. At this time, the database includes all USAspending.gov data for the second quarter of fiscal year 2017, and data going back to the year 2000 will be added over the summer. You can learn more about the database and how to access it on its AWS Public Dataset landing page.

Through the AWS Public Datasets program, we work with AWS customers to experiment with ways that the cloud can make data more accessible to more people. Most of our AWS Public Datasets are made available through Amazon S3 because of its tremendous flexibility and ability to scale to serve any volume of any kind of data files. What’s exciting about the USAspending.gov database is that it provides a great example of how Amazon RDS can be used to share an entire relational database quickly and easily. Typically, sharing a relational database requires extract, transform, and load (ETL) processes that require redundant storage capacity, time for data transfer, and often scripts to migrate your database schema from one database engine to another. ETL processes can be so intimidating and cumbersome that they’re effectively impossible for many people to carry out.

By making their data available as a public Amazon RDS snapshot, the team at USASPending.gov has made it easy for anyone to get a copy of their entire production database for their own use within minutes. This will be useful for researchers and businesses who want to work with real data about all US Government spending and quickly combine it with their own data or other data resources.

Deploying the USASpending.gov Database Using the AWS Management Console
Let’s go through the steps involved in deploying the database in your AWS account using the AWS Management Console.

  1. Sign in to the AWS Management Console and select the US East (N. Virginia) region in the menu bar.
  2. Open the Amazon RDS Console and choose Snapshots in the navigation pane.
  3. In the filter for the search bar, select All Public Snapshots and search for 515495268755:
  4. Select the snapshot named arn:aws:rds:us-east-1:515495268755:snapshot:usaspending-db.
  5. Select Snapshot Actions -> Restore Snapshot. Select an instance size, and enter the other details, then click on Restore DB Instance.
  6. You will see that a DB Instance is being created from the snapshot, within your AWS account.
  7. After a few minutes, the status of the instance will change to Available.
  8. You can see the endpoint for your database on the main page along with other useful info:

Deploying the USASpending.gov Database Using the AWS CLI
You can also install the AWS Command Line Interface (CLI) and use it to create a DB Instance from the snapshot. Here’s a sample command:

$ aws rds restore-db-instance-from-db-snapshot --db-instance-identifier my-test-db-cli \
  --db-snapshot-identifier arn:aws:rds:us-east-1:515495268755:snapshot:usaspending-db \
  --region us-east-1

This will give you an ARN (Amazon Resource Name) that you can use to reference the DB Instance. For example:

$ aws rds describe-db-instances \
  --db-instance-identifier arn:aws:rds:us-east-1:917192695859:db:my-test-db-cli

This command will display the Endpoint.Address that you use to connect to the database.

Connecting to the DB Instance
After following the AWS Management Console or AWS CLI instructions above, you will have access to the full USAspending.gov database within this Amazon RDS DB instance, and you can connect to it using any PostgreSQL client using the following credentials:

  • Username: root
  • Password: password
  • Database: data_store_api

If you use psql, you can access the database using this command:

$ psql -h my-endpoint.rds.amazonaws.com -U root -d data_store_api

You should change the database password after you log in:

ALTER USER "root" WITH ENCRYPTED PASSWORD '{new password}';

If you can’t connect to your instance but think you should be able to, you may need to check your VPC Security Groups and make sure inbound and outbound traffic on the port (usually 5432) is allowed from your IP address.

Exploring the Data
The USAspending.gov data is very rich, so it will be hard to do it justice in this blog post, but hopefully these queries will give you an idea of what’s possible. To learn about the contents of the database, please review the USAspending.gov Data Dictionary.

The following query will return the total amount of money the government is obligated to pay for contracts awarded by NASA that include “Mars” or “Martian” in the description of the award:

select sum(total_obligation) from awards, subtier_agency 
  where (awards.description like '% MARTIAN %' OR awards.description like '% MARS %') 
  AND subtier_agency.name = 'National Aeronautics and Space Administration';

As I write this, the result I get for this query is $55,411,025.42. Note that the database is updated nightly and will include more historical data in the coming months, so you may get a different result if you run this query.

Now, here’s the same query, but looking for awards with “Jupiter” or “Jovian” in the description:

select sum(total_obligation) from awards, subtier_agency
  where (awards.description like '%JUPITER%' OR awards.description like '%JOVIAN%') 
  AND subtier_agency.name = 'National Aeronautics and Space Administration';

The result I get is $14,766,392.96.

Questions & Comments
I’m looking forward to seeing what people can do with this data. If you have any questions about the data, please create an issue on the USAspending.gov API’s issue tracker on GitHub.

— Jed

AWS Hot Startups – April 2017

Post Syndicated from Ana Visneski original https://aws.amazon.com/blogs/aws/aws-hot-startups-april-2017/

Spring is here, the flowers are blooming and Tina Barr is back with more great startups for you to check out!

-Ana


Welcome back to another month of hot AWS-powered startups! Today we have three exciting startups:

  • Beekeeper – simplifying employee communication in the workplace.
  • Betterment – making investing easier for everyone.
  • ClearSlide – a leading sales engagement platform.

Be sure to check out our March hot startups in case you missed them.

Beekeeper (Zurich, Switzerland)
Flavio Pfaffhauser and Christian Grossmann, both graduates of ETH Zurich, were passionate about building a technology that would connect and bring people together. What started as a student’s social community soon turned into Beekeeper – a communication platform for the workplace that allows employees to interact wherever they are. As Flavio and Christian learned how to build a social platform that engaged people properly, businesses began requesting a platform that could be adapted to their specific processes and needs. The platform started with the concept of helping people feel as if they are sitting right next to each other, whether they’re at a desk or in the field. Founded in 2012, Beekeeper is focused on improving information sharing, communication and peer collaboration, and the company strongly believes that listening to employees is crucial for organizations.

The “Mobile First, Desktop Friendly” platform has a simple and intuitive interface that easily integrates multiple operating systems into one ecosystem. The interface can be styled and customized to match a company’s brand and identity. Employees can connect with their colleagues anytime and anywhere with private and group chats, video and file sharing, and feedback surveys. With Beekeeper’s analytical dashboard, leadership teams can identify trending topics of discussion and track employee engagement and app usage in real time. Beekeeper is currently connecting users in 137 countries across industries including hospitality, construction, transportation, and more.

Beekeeper likes using AWS because it allows their engineers to focus on the things that really matter; solving customer issues. The company builds its infrastructure using services like Amazon EC2, Amazon S3, and Amazon RDS, all of which allow the technical teams to offload administrative tasks. Amazon Elastic Transcoder and Amazon QuickSight are used to build analytical dashboards and Amazon Redshift for data warehousing.

Check out the Beekeeper blog to keep up with their latest news!

Betterment (New York, NY)
Betterment is on a mission to make investing easier and more accessible for everyone, no matter their financial goal. In 2008, Jon Stein founded Betterment with the intent to reinvent the industry and save future investors from making the same common mistakes he had been making. At that time, most people only had a couple of options when it came to investing their money – either do it yourself or hire another person to do it for you. Unfortunately, financial advisors are sometimes paid to recommend certain investments even if it’s not what is best for their clients. Betterment only chooses investments that are in their customers’ best interest and align with their financial goals. Today, they are the largest independent online investment advisor, managing more than $8 billion in assets for over 240,000 customers.

Betterment uses technology to make investing easier and more efficient, while also helping to increase after-tax returns. They offer a wide range of financial planning services that are personalized to their customer’s life goals. To start an investment plan, customers can input their age, retirement status, and annual income and Betterment will recommend how much money to invest and which type of account is the right choice. They will invest and manage it in a way that many traditional investment services can’t at a lower cost.

The engineers at Betterment are constantly working to build industry-changing technology as quickly as possible to help customers maximize their money. AWS gives Betterment the flexibility to easily provision infrastructure and offload functions to various services that once required entire teams to manage. When they first started in the cloud, Betterment was using standard implementations of Amazon EC2, Amazon RDS, and Amazon S3. Since they’ve gone all in with AWS, they have been leveraging services like Amazon Redshift, AWS Lambda, AWS Database Migration Service, Amazon Kinesis, Amazon DynamoDB, and more. Today, they are using over 20 AWS services to develop, test, and deploy features and enhancements on a daily basis.

Learn more about Betterment here.

ClearSlide (San Francisco, CA)
ClearSlide is one of today’s leading sales engagement platforms, offering a complete and integrated tool that makes every customer interaction successful. Since their founding in 2009, ClearSlide has looked for ways to improve customer experiences and have developed numerous enablement tools for sales leaders and teams, marketing, customer support teams, and more. The platform puts content, communication channels, and insights at their customer’s fingertips to help drive better decisions and manage opportunities. ClearSlide serves thousands of companies including Comcast, the Sacramento Kings, The Economist, and so far their customers have generated over 750 million minutes of engagement!

ClearSlide offers a solution for all parts of the sales process. For sales leaders, ClearSlide provides engagement dashboards to improve deal visibility, coaching, and sales forecast accuracy. For marketing and sales enablement teams, they guide sellers to the right content, at the right time, in the right context, and provide insight to maximize content ROI. For sales reps, ClearSlide integrates communications, content, and analytics in a single platform experience. Communications can be made across email, in-person or online meetings, web, or social. Today, ClearSlide customers report a 10-20% increase in closed deals, 25% decrease in onboarding time for new reps, and a 50-80% reduction in selling costs.

ClearSlide uses a range of AWS services, but Amazon EC2 and Amazon RDS have made the biggest impact on their business. EC2 enables them to easily scale compute capacity, which is critical for a fast-growing startup. It also provides consistency during deployment – from development and integration to staging and production. RDS reduces overhead and allows ClearSlide to scale their database infrastructure. Since AWS takes care of time-consuming database management tasks, ClearSlide sees a reduction in operations costs and can focus on being more strategic with their customers.

Watch this video to learn how LiveIntent reduced sales cycles by 22% using ClearSlide. Get all the latest updates by following them on Twitter!

Thanks for checking out another month of awesome AWS-powered startups!

-Tina


AWS and the General Data Protection Regulation (GDPR)

Post Syndicated from Stephen Schmidt original https://aws.amazon.com/blogs/security/aws-and-the-general-data-protection-regulation/


Just over a year ago, the European Commission approved and adopted the new General Data Protection Regulation (GDPR). The GDPR is the biggest change in data protection laws in Europe since the 1995 introduction of the European Union (EU) Data Protection Directive, also known as Directive 95/46/EC. The GDPR aims to strengthen the security and protection of personal data in the EU and will replace the Directive and all local laws relating to it.

AWS welcomes the arrival of the GDPR. The new, robust requirements raise the bar for data protection, security, and compliance, and will push the industry to follow the most stringent controls, helping to make everyone more secure. I am happy to announce today that all AWS services will comply with the GDPR when it becomes enforceable on May 25, 2018.

In this blog post, I explain the work AWS is doing to help customers with the GDPR as part of our continued commitment to help ensure they can comply with EU Data Protection requirements.

What has AWS been doing?

AWS continually maintains a high bar for security and compliance across all of our regions around the world. This has always been our highest priority—truly “job zero.” The AWS Cloud infrastructure has been architected to offer customers the most powerful, flexible, and secure cloud-computing environment available today. AWS also gives you a number of services and tools to enable you to build GDPR-compliant infrastructure on top of AWS.

One tool we give you is a Data Processing Agreement (DPA). I’m happy to announce today that we have a DPA that will meet the requirements of the GDPR. This GDPR DPA is available now to all AWS customers to help you prepare for May 25, 2018, when the GDPR becomes enforceable. For additional information about the new GDPR DPA or to obtain a copy, contact your AWS account manager.

In addition to account managers, we have teams of compliance experts, data protection specialists, and security experts working with customers across Europe to answer their questions and help them prepare for running workloads in the AWS Cloud after the GDPR comes into force. To further answer customers’ questions, we have updated our EU Data Protection website. This website includes information about what the GDPR is, the changes it brings to organizations operating in the EU, the services AWS offers to help you comply with the GDPR, and advice about how you can prepare.

Another topic we cover on the EU Data Protection website is AWS’s compliance with the CISPE Code of Conduct. The CISPE Code of Conduct helps cloud customers ensure that their cloud infrastructure provider is using appropriate data protection standards to protect their data in a manner consistent with the GDPR. AWS has declared that Amazon EC2, Amazon S3, Amazon RDS, AWS Identity and Access Management (IAM), AWS CloudTrail, and Amazon Elastic Block Store (Amazon EBS) are fully compliant with the CISPE Code of Conduct. This declaration provides customers with assurances that they fully control their data in a safe, secure, and compliant environment when they use AWS. For more information about AWS’s compliance with the CISPE Code of Conduct, go to the CISPE website.

As well as giving customers a number of tools and services to build GDPR-compliant environments, AWS has achieved a number of internationally recognized certifications and accreditations. In the process, AWS has demonstrated compliance with third-party assurance frameworks such as ISO 27017 for cloud security, ISO 27018 for cloud privacy, PCI DSS Level 1, and SOC 1, SOC 2, and SOC 3. AWS also helps customers meet local security standards such as BSI’s Cloud Computing Compliance Controls Catalogue (C5), which is important in Germany. We will continue to pursue certifications and accreditations that are important to AWS customers.

What can you do?

Although the GDPR will not be enforceable until May 25, 2018, we are encouraging our customers and partners to start preparing now. If you have already implemented a high bar for compliance, security, and data privacy, the move to GDPR should be simple. However, if you have yet to start your journey to GDPR compliance, we urge you to start reviewing your security, compliance, and data protection processes now to ensure a smooth transition in May 2018.

You should consider the following key points in preparation for GDPR compliance:

  • Territorial reach – Determining whether the GDPR applies to your organization’s activities is essential to ensuring your organization’s ability to satisfy its compliance obligations.
  • Data subject rights – The GDPR enhances the rights of data subjects in a number of ways. You will need to make sure you can accommodate the rights of data subjects if you are processing their personal data.
  • Data breach notifications – If you are a data controller, you must report data breaches to the data protection authorities without undue delay and in any event within 72 hours of you becoming aware of a data breach.
  • Data protection officer (DPO) – You may need to appoint a DPO who will manage data security and other issues related to the processing of personal data.
  • Data protection impact assessment (DPIA) – You may need to conduct and, in some circumstances, you might be required to file with the supervisory authority a DPIA for your processing activities.
  • Data processing agreement (DPA) – You may need a DPA that will meet the requirements of the GDPR, particularly if personal data is transferred outside the European Economic Area.

AWS offers a wide range of services and features to help customers meet requirements of the GDPR, including services for access controls, monitoring, logging, and encryption. For more information about these services and features, see EU Data Protection.

At AWS, security, data protection, and compliance are our top priorities, and we will continue to work vigilantly to ensure that our customers are able to enjoy the benefits of AWS securely, compliantly, and without disruption in Europe and around the world. As we head toward May 2018, we will share more news and resources with you to help you comply with the GDPR.

– Steve

Manage Access to Your RDS for MySQL and Amazon Aurora Databases Using AWS IAM

Post Syndicated from Craig Liebendorfer original https://aws.amazon.com/blogs/security/manage-access-to-your-rds-for-mysql-and-amazon-aurora-databases-using-aws-iam/

RDS service image

Starting today, Amazon RDS enables you to use AWS Identity and Access Management (IAM) to manage database access for Amazon RDS for MySQL database instances and Amazon Aurora database clusters. By using IAM, you can manage user access to all AWS resources from a single location, without needing to manage users in the database. This includes expanding and restricting permission levels, associating permissions with different roles, and revoking access. IAM authentication also allows easier and safer integration with your applications running on Amazon EC2.
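
The full announcement covers the details, but the basic flow is worth sketching: you grant an IAM identity permission to connect as a specific database user (roughly, an identity policy that allows the rds-db:connect action for that user), and the application then requests a short-lived authentication token from RDS and presents it in place of a password. A minimal sketch from the AWS CLI, assuming a MySQL-compatible instance with IAM authentication enabled and a database user created for it; the endpoint, region, and user name below are placeholders, not values from this post:

# Request a short-lived authentication token for the IAM-enabled database user
# (placeholder endpoint, region, and user name; substitute your own values)
TOKEN=$(aws rds generate-db-auth-token \
    --hostname mydb.cluster-example123.us-east-1.rds.amazonaws.com \
    --port 3306 \
    --region us-east-1 \
    --username iam_db_user)

# Connect over SSL, presenting the token as the password
# (assumes the RDS CA bundle has been downloaded locally)
mysql --host=mydb.cluster-example123.us-east-1.rds.amazonaws.com --port=3306 \
    --user=iam_db_user --password="$TOKEN" \
    --ssl-ca=rds-combined-ca-bundle.pem --enable-cleartext-plugin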

To learn more, see the full announcement.

– Craig

Sign up Today – Preview of Amazon Aurora with PostgreSQL Compatibility

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/sign-up-today-preview-of-amazon-aurora-with-postgresql-compatibility/

Last year we announced that we would be bringing PostgreSQL compatibility to Amazon Aurora. At that time I invited you to sign up for our private preview so that you could take a closer look.

The response to that request was strong! Our customers already understood that Amazon Aurora would provide them with high availability and high durability, and were looking forward to running their PostgreSQL 9.6 applications in the AWS Cloud.

Opening up the Preview
Today we are opening up the preview of Amazon Aurora with PostgreSQL Compatibility to all interested customers and you can sign up today. The preview runs in the US East (Northern Virginia) Region and delivers two to three times the performance of PostgreSQL running in traditional environments. It also supports quick, easy creation of fast, low-latency read replicas.

Amazon RDS Performance Insights Included
The preview includes our new Amazon RDS Performance Insights tool. You will be able to use this tool to understand your database performance at a very detailed level, up to and including the ability to look inside of each query. You can use the Performance Insights dashboard to visualize the database load and to filter it by SQL statements, waits, users, or hosts:

Jeff;

 

 

AWS Hot Startups – March 2017

Post Syndicated from Ana Visneski original https://aws.amazon.com/blogs/aws/aws-hot-startups-march-2017/

As the madness of March winds down, take a break from all the basketball and check out the cool startups Tina Barr brings you this month!

-Ana


The arrival of spring brings five new startups this month:

  • Amino Apps – providing social networks for hundreds of thousands of communities.
  • Appboy – empowering brands to strengthen customer relationships.
  • Arterys – revolutionizing the medical imaging industry.
  • Protenus – protecting patient data for healthcare organizations.
  • Syapse – improving targeted cancer care with shared data from across the country.

In case you missed them, check out February’s hot startups here.

Amino Apps (New York, NY)
Amino Logo
Amino Apps was founded on the belief that interest-based communities were underdeveloped and outdated, particularly when it came to mobile. CEO Ben Anderson and CTO Yin Wang created the app to give users access to hundreds of thousands of communities, each of them a complete social network dedicated to a single topic. Some of the largest communities have over 1 million members and are built around topics like popular TV shows, video games, sports, and an endless number of hobbies and other interests. Amino hosts communities from around the world and is currently available in six languages with many more on the way.

Navigating the Amino app is easy. Simply download the app (iOS or Android), sign up with a valid email address, choose a profile picture, and start exploring. Users can search for communities and join any that fit their interests. Each community has chatrooms, multimedia content, quizzes, and a seamless commenting system. If a community doesn’t exist yet, users can create it in minutes using the Amino Creator and Manager app (ACM). The largest user-generated communities are turned into their own apps, which gives communities their own piece of real estate on members’ phones, as well as in app stores.

Amino’s vast global network of hundreds of thousands of communities is run on AWS services. Every day users generate, share, and engage with an enormous amount of content across hundreds of mobile applications. By leveraging AWS services including Amazon EC2, Amazon RDS, Amazon S3, Amazon SQS, and Amazon CloudFront, Amino can continue to provide new features to their users while scaling their service capacity to keep up with user growth.

Interested in joining Amino? Check out their jobs page here.

Appboy (New York, NY)
In 2011, Bill Magnuson, Jon Hyman, and Mark Ghermezian saw a unique opportunity to strengthen and humanize relationships between brands and their customers through technology. The trio created Appboy to empower brands to build long-term relationships with their customers and today they are the leading lifecycle engagement platform for marketing, growth, and engagement teams. The team recognized that as rapid mobile growth became undeniable, many brands were becoming frustrated with the lack of compelling and seamless cross-channel experiences offered by existing marketing clouds. Many of today’s top mobile apps and enterprise companies trust Appboy to take their marketing to the next level. Appboy manages user profiles for nearly 700 million monthly active users, and is used to power more than 10 billion personalized messages monthly across a multitude of channels and devices.

Appboy creates a holistic user profile that offers a single view of each customer. That user profile in turn powers contextual cross-channel messaging, lifecycle engagement automation, and robust campaign insights and optimization opportunities. Appboy offers solutions that allow brands to create push notifications, targeted emails, in-app and in-browser messages, news feed cards, and webhooks to enhance the user experience and increase customer engagement. The company prides itself on its interoperability, connecting to a variety of complimentary marketing tools and technologies so brands can build the perfect stack to enable their strategies and experiments in real time.

AWS makes it easy for Appboy to dynamically size all of their service components and automatically scale up and down as needed. They use an array of services including Elastic Load Balancing, AWS Lambda, Amazon CloudWatch, Auto Scaling groups, and Amazon S3 to help scale capacity and better deal with unpredictable customer loads.

To keep up with the latest marketing trends and tactics, visit the Appboy digital magazine, Relate. Appboy was also recently featured in the #StartupsOnAir video series where they gave insight into their AWS usage.

Arterys (San Francisco, CA)
Getting test results back from a physician can often be a time consuming and tedious process. Clinicians typically employ a variety of techniques to manually measure medical images and then make their assessments. Arterys founders Fabien Beckers, John Axerio-Cilies, Albert Hsiao, and Shreyas Vasanawala realized that much more computation and advanced analytics were needed to harness all of the valuable information in medical images, especially those generated by MRI and CT scanners. Clinicians were often skipping measurements and making assessments based mostly on qualitative data. Their solution was to start a cloud/AI software company focused on accelerating data-driven medicine with advanced software products for post-processing of medical images.

Arterys’ products provide timely, accurate, and consistent quantification of images, improve speed to results, and improve the quality of the information offered to the treating physician. This allows for much better tracking of a patient’s condition, and thus better decisions about their care. Advanced analytics, such as deep learning and distributed cloud computing, are used to process images. The first Arterys product can contour cardiac anatomy as accurately as experts, but takes only 15-20 seconds instead of the 45-60 minutes required to do it manually. Their computing cloud platform is also fully HIPAA compliant.

Arterys relies on a variety of AWS services to process their medical images. Using deep learning and other advanced analytic tools, Arterys is able to render images without latency over a web browser using AWS G2 instances. They use Amazon EC2 extensively for all of their compute needs, including inference and rendering, and Amazon S3 is used to archive images that aren’t needed immediately, as well as manage costs. Arterys also employs Amazon Route 53, AWS CloudTrail, and Amazon EC2 Container Service.

Check out this quick video about the technology that Arterys is creating. They were also recently featured in the #StartupsOnAir video series and offered a quick demo of their product.

Protenus (Baltimore, MD)
Protenus Logo
Protenus founders Nick Culbertson and Robert Lord were medical students at Johns Hopkins Medical School when they saw first-hand how Electronic Health Record (EHR) systems could be used to improve patient care and share clinical data more efficiently. With increased efficiency came a huge issue – an onslaught of serious security and privacy concerns. Over the past two years, 140 million medical records have been breached, meaning that approximately 1 in 3 Americans have had their health data compromised. Health records contain a repository of sensitive information and a breach of that data can cause major havoc in a patient’s life – namely identity theft, prescription fraud, Medicare/Medicaid fraud, and improper performance of medical procedures. Using their experience and knowledge from former careers in the intelligence community and involvement in a leading hedge fund, Nick and Robert developed the prototype and algorithms that launched Protenus.

Today, Protenus offers a number of solutions that detect breaches and misuse of patient data for healthcare organizations nationwide. Using advanced analytics and AI, Protenus’ health data insights platform understands appropriate vs. inappropriate use of patient data in the EHR. It also protects privacy, aids compliance with HIPAA regulations, and ensures trust for patients and providers alike.

Protenus built and operates its SaaS offering atop Amazon EC2, where Dedicated Hosts and encrypted Amazon EBS volumes are used to ensure compliance with HIPAA regulations for the storage of Protected Health Information. They use Elastic Load Balancing and Amazon Route 53 for DNS, enabling unique, secure, client-specific access points to their Protenus instance.

To learn more about threats to patient data, read Hospitals’ Biggest Threat to Patient Data is Hiding in Plain Sight on the Protenus blog. Also be sure to check out their recent video in the #StartupsOnAir series for more insight into their product.

Syapse (Palo Alto, CA)
Syapse provides a comprehensive software solution that enables clinicians to treat patients with precision medicine for targeted cancer therapies, meaning treatments that are designed and chosen using genetic or molecular profiling. Existing hospital IT doesn’t support the robust infrastructure and clinical workflows required to treat patients with precision medicine at scale, so Syapse centralizes and organizes patient data and delivers it to clinicians at the point of care. Syapse offers a variety of solutions for oncologists that allow them to access the full scope of patient data longitudinally, view recommended treatments or clinical trials for similar patients, and track outcomes over time. These solutions are helping health systems across the country improve patient outcomes by offering the most innovative care to cancer patients.

Leading health systems such as Stanford Health Care, Providence St. Joseph Health, and Intermountain Healthcare are using Syapse to improve patient outcomes, streamline clinical workflows, and scale their precision medicine programs. A group of experts known as the Molecular Tumor Board (MTB) reviews complex cases and evaluates patient data, documents notes, and disseminates treatment recommendations to the treating physician. Syapse also provides reports that give health system staff insight into their institution’s oncology care, which can be used toward quality improvement, business goals, and understanding variables in the oncology service line.

Syapse uses Amazon Virtual Private Cloud, Amazon EC2 Dedicated Instances, and Amazon Elastic Block Store to build a high-performance, scalable, and HIPAA-compliant data platform that enables health systems to make precision medicine part of routine cancer care for patients throughout the country.

Be sure to check out the Syapse blog to learn more and also their recent video on the #StartupsOnAir video series where they discuss their product, HIPAA compliance, and more about how they are using AWS.

Thank you for checking out another month of awesome hot startups!

-Tina Barr

 

How to Use Service Control Policies in AWS Organizations to Enforce Healthcare Compliance in Your AWS Account

Post Syndicated from Aaron Lima original https://aws.amazon.com/blogs/security/how-to-use-service-control-policies-in-aws-organizations-to-enforce-healthcare-compliance-in-your-aws-account/

AWS customers with healthcare compliance requirements such as the U.S. Health Insurance Portability and Accountability Act (HIPAA) and Good Laboratory, Clinical, and Manufacturing Practices (GxP) might want to control access to the AWS services their developers use to build and operate their GxP and HIPAA systems. For example, customers with GxP requirements might approve AWS as a supplier on the basis of AWS’s SOC certification and therefore want to ensure that only the services in scope for SOC are available to developers of GxP systems. Likewise, customers with HIPAA requirements might want to ensure that only AWS HIPAA Eligible Services are available to store and process protected health information (PHI). Now with AWS Organizations—policy-based management for multiple AWS accounts—you can programmatically control access to the services within your AWS accounts.

In this blog post, I show how to restrict an AWS account to HIPAA Eligible Services as well as explain why you should include additional supporting AWS services with service control policies (SCPs) in AWS Organizations. Although this example is HIPAA related, you can repurpose it for GxP, a database of Genotypes and Phenotypes (dbGaP) solutions, or other healthcare compliance requirements for which you want to control developers’ access to a specific scope of services.

Managing an account hierarchy with AWS Organizations

Let’s say I manage four AWS accounts: a Payer account, a Development account, a Corporate IT account, and a fourth account that contains PHI. In accordance with AWS’s Business Associate Agreement (BAA), I want to be sure that only AWS HIPAA Eligible Services are allowed in the fourth account along with supporting AWS services that help encrypt and control access to the account. The following diagram shows a logical view of the associated account structure.

Diagram showing the logical view of the account structure

As illustrated in the preceding diagram, Organizations allows me to create this account hierarchy between the four AWS accounts I manage. Before I proceed to show how to create and apply an SCP to the HIPAA account in this hierarchy, I’ll define some Organizations terminology that I use in this post:

  • Organization – A consolidated set of AWS accounts that you manage. For the preceding example, I have already created my organization and invited my accounts. For more information about creating an organization and inviting accounts, see AWS Organizations – Policy-Based Management for Multiple AWS Accounts.
  • Master account – The management hub for Organizations. This is where I invite existing accounts, create new accounts, and manage my SCPs. I run all commands demonstrated in this post from this master account. This is also my payer account in the preceding account structure diagram.
  • Service control policy (SCP) – A set of controls that the organization’s master account can apply to the organization, selected OUs, and selected accounts. SCPs allow me to whitelist or blacklist services and actions that I can delegate to the users and roles in the account to which the SCPs are applied. The resultant permissions for a user or role are the intersection of the permissions allowed by the SCP and the permissions granted by an AWS Identity and Access Management (IAM) policy. I refer to SCPs as a policy type in some of this post’s command-line arguments.
  • Organizational unit (OU) – A container for a set of AWS accounts. OUs can be arranged into a hierarchy that can be as many as five levels deep. The top of the hierarchy of OUs is also known as the administrative root. In the walkthrough, I create a HIPAA OU and apply my policy to that OU. I then move the account into the OU to have the policy applied. To manage the organization depicted above, I might create OUs for my Corporate IT account and my Development account.

To restrict services in the fourth account to HIPAA Eligible Services and required supporting services, I will show how to create and apply an SCP to the account with the following steps:

  1. Create a JSON document that lists HIPAA Eligible Services and supporting AWS services.
  2. Create an SCP with a JSON document.
  3. Create an OU for the HIPAA account, and move the account into the OU.
  4. Attach the SCP to the HIPAA OU.
  5. Verify which SCPs are attached to the HIPAA OU.
  6. Detach the default FullAWSAccess SCP from the OU.
  7. Verify SCP enforcement.

How to create and apply an SCP to an account

Let’s walk through the steps to create an SCP and apply it to an account. I can manage my organization by using the Organizations console, AWS CLI, or AWS API from my master account. For the purposes of this post, I will demonstrate the creation and application of an SCP to my account by using the AWS CLI.
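
Before creating anything, it can help to confirm what the organization currently looks like from the master account. A few read-only commands (a sketch; the last command expects the root ID, which is retrieved with list-roots in Step 3):

# Show the organization and which feature set it uses
aws organizations describe-organization

# List the member accounts that belong to the organization
aws organizations list-accounts

# List any OUs directly under the root (the root ID comes from list-roots in Step 3)
aws organizations list-organizational-units-for-parent --parent-id r-rth4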

1.  Create a JSON document that lists HIPAA Eligible Services and supporting AWS services

Creating an SCP will be familiar if you have experience writing an IAM policy because the grammar in crafting the policy is similar. I will create a JSON document that lists only the services I want to allow in my account, and I will use this JSON document to create my SCP via the command line. The SCP I create from this document allows all actions for all resources of the listed services, effectively turning on only these services in my account. I name the document HIPAAExample.json and save it to the directory from which I will demonstrate the CLI commands.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "dynamodb:*", "rds:*", "ec2:*", "s3:*", "elasticmapreduce:*",
                "glacier:*", "elasticloadbalancing:*", "cloudwatch:*",
                "importexport:*", "cloudformation:*", "redshift:*",
                "iam:*", "health:*", "config:*", "snowball:*",
                "trustedadvisor:*", "kms:*", "apigateway:*",
                "autoscaling:*", "directconnect:*",
                "execute-api:*", "sts:*"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}

Note that the SCP includes more than just the HIPAA Eligible Services.

Why include additional supporting services in a HIPAA SCP?

You can use any service in your account, but you can use only HIPAA Eligible Services to store and process PHI. Some services, such as IAM and AWS Key Management Service (KMS), can be used because these services do not directly store or process PHI, but they might still be needed for administrative and security purposes.

To those ends, I include the following supporting services in the SCP to help me with account administration and security:

  • Access controls – I include IAM to ensure that I can manage access to resources in the account. Though Organizations can limit whether a service is available, I still need the granularity of access control that IAM provides.
  • Encryption – I need a way to encrypt the data. The integration of AWS KMS with Amazon Redshift, Amazon RDS, and Amazon Elastic Block Store (Amazon EBS) helps with this security requirement.
  • Auditing – I also need to be able to demonstrate controls in practice, track changes, and discover any malicious activity in my account. You will note that AWS CloudTrail is not included in the SCP, which prohibits any mutating actions against CloudTrail from users within the account. However, when setting up the account, CloudTrail was set up to send logs to a logging account as recommended in AWS Multiple Account Security Strategy. The logs do not reside in the account, and no one has privileges to change the trail including root or administrators, which helps ensure the protection of the API logging of the account. This highlights how SCPs can be used to secure services in an account.
  • Automation – Automation can help me with my security controls as shown in How to Translate HIPAA Controls to AWS CloudFormation Templates: Part 3 of the Automating HIPAA Compliance Series; therefore, I consider including AWS CloudFormation as a way to ensure that applications deployed in the account adhere to my security and compliance policies. Auto Scaling also is an important service to include to help me scale to meet demand and control cost.
  • Monitoring and support – The remaining services in the SCP such as Amazon CloudWatch are needed to make sure that I can monitor the environment and have visibility into the health of the workloads and applications in my AWS account, helping me maintain operational control. AWS Trusted Advisor is a service that helps to make sure that my cloud environment is well architected.

Now that I have created my JSON document with the services that I will include and explained in detail why I include them, I can create my SCP.
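
Because a malformed document is rejected at policy-creation time, a quick local syntax check can save a round trip (an optional step; any JSON validator works):

# Confirm that HIPAAExample.json parses as valid JSON before creating the SCP
python -m json.tool HIPAAExample.json > /dev/null && echo "HIPAAExample.json is valid JSON"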

2.  Create an SCP with a JSON document

I will now create the SCP via the CLI with the aws organizations create-policy command. Using the name parameter, I name the SCP and define that I am creating an SCP, both of which are required parameters. I then provide a brief description of the SCP and specify the location of the JSON document I created in Step 1.

aws organizations create-policy --name hipaa-example-policy --type SERVICE_CONTROL_POLICY \
    --description "All HIPAA eligible services plus supporting AWS Services." \
    --content file://./HIPAAExample.json

Output

{
    "policy": {
        "policySummary": {
            "type": "SERVICE_CONTROL_POLICY",
            "arn": "arn:aws:organizations::012345678900:policy/o-kzceys2q4j/SERVICE_CONTROL_POLICY/p-6ldl8bll",
            "name": "hipaa-example-policy",
            "awsManaged": false,
            "id": "p-6ldl8bll",
            "description": "All HIPAA eligible services plus supporting AWS Services."
        }
    }
}

I take note of the policy-id because I need it to attach the SCP to my OU in Step 4. Note: Throughout this post, fictitious placeholder values are shown for the purposes of demonstrating this post’s solution.
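
If you move on without recording the ID, it is not lost; you can list the organization’s SCPs at any time and pick the policy out by name:

# List all service control policies in the organization to recover a policy ID
aws organizations list-policies --filter SERVICE_CONTROL_POLICY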

3.  Create an OU for the HIPAA account, and move the account into the OU

Grouping accounts by function will make it easier to manage the organization and apply policies across multiple accounts. In this step, I create an OU for the HIPAA account and move the target account into the OU. To create an OU, I need to know the ID for the parent object under which I will be placing the OU. In this case, I will place it under the root and need the ID for the root. To get the root ID, I run the list-roots command.

aws organizations list-roots

Output

{
    "Roots": [
        {
            "PolicyTypes": [
                {
                    "Status": "ENABLED", 
                    "Type": "SERVICE_CONTROL_POLICY"
                }
            ], 
            "Id": "r-rth4", 
            "Arn": "arn:aws:organizations::012345678900:root/o-p9bx61i0h1/r-rth4", 
            "Name": "Root"
        }
    ]
}

With the root ID, I can proceed to create the OU under the root.

aws organizations create-organizational-unit --parent-id r-rth4 --name HIPAA-Accounts

Output

{
    "OrganizationalUnit": {
       "Id": "ou-rth4-ezo5wonz", 
        "Arn": "arn:aws:organizations::012345678900:ou/o-p9bx61i0h1/ou-rth4-ezo5wonz", 
        "Name": "HIPAA-Accounts"
    }
}

I take note of the OU ID in the output because I need it in the next command to move my target account. I will also need the root ID in the command because I am moving the target account from the root into the OU.

aws organizations move-account --account-id 098765432110 --source-parent-id r-rth4 --destination-parent-id 
ou-rth4-ezo5wonz

No Output

 

4.  Attach the SCP to the HIPAA OU

Even though you may have enabled All Features in your organization, you still need to enable SCPs at the root level of the organization to attach SCPs to objects. To do this in my case, I will run the enable-policy-type command and provide the root ID.

aws organizations enable-policy-type --root-id r-rth4 --policy-type SERVICE_CONTROL_POLICY

Output

{
    "Root": {
        "PolicyTypes": [], 
        "Id": "r-rth4", 
        "Arn": "arn:aws:organizations::012345678900:root/o-p9bx61i0h1/r-rth4", 
        "Name": "Root"
    }
}

Now, I will attach the SCP to the OU by using the aws organizations attach-policy command. I must include the target-id, which is the OU ID noted in the previous step and the policy-id from the output of the command in Step 2.

aws organizations attach-policy --target-id ou-rth4-ezo5wonz --policy-id p-6ldl8bll

No Output

 

5.  Verify which SCPs are attached to the HIPAA OU

I will now verify which SCPs are attached to the OU by using the aws organizations list-policies-for-target command. I must provide the OU ID with the target-id parameter and then filter for the SERVICE_CONTROL_POLICY type.

aws organizations list-policies-for-target --target-id ou-rth4-ezo5wonz --filter SERVICE_CONTROL_POLICY

Output

{
    "policies": [
        {
            "awsManaged": false,
            "arn": "arn:aws:organizations::012345678900:policy/o-kzceys2q4j/SERVICE_CONTROL_POLICY/p-6ldl8bll",
            "id": "p-6ldl8bll",
            "description": "All HIPAA eligible services plus supporting AWS Services.",
            "name": "hipaa-example-policy",
            "type": "SERVICE_CONTROL_POLICY"
        },
        {
            "awsManaged": true,
            "arn": "arn:aws:organizations::aws:policy/SERVICE_CONTROL_POLICY/p-FullAWSAccess",
            "id": "p-FullAWSAccess",
            "description": "Allows access to every operation",
            "name": "FullAWSAccess",
            "type": "SERVICE_CONTROL_POLICY"
        }
    ]
}

As the output shows, two SCPs are attached to this OU. I want to detach the FullAWSAccess SCP so that the HIPAA SCP is properly in effect. The FullAWSAccess SCP is an Allow SCP that permits all AWS services. If I were to leave the default FullAWSAccess SCP in place, it would grant access to services I do not want to allow in my account. Detaching the FullAWSAccess SCP means that only the services I allow in the hipaa-example-policy are available in my account. Note that if I were to create a Deny SCP, its explicit denies would take precedence over any Allow SCP.
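
For contrast, a deny-based (blacklist) approach keeps FullAWSAccess attached and layers explicit denies on top. A minimal sketch of what such an SCP document might look like, not used in this walkthrough (the file name and the denied service are illustrative only):

# Hypothetical deny-list SCP: everything stays allowed by FullAWSAccess
# except the explicitly denied Amazon WorkSpaces actions
cat > DenyWorkSpacesExample.json <<'EOF'
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": "workspaces:*",
            "Resource": "*"
        }
    ]
}
EOF

Because an explicit Deny always wins, that style works without detaching FullAWSAccess; the whitelist style used in this post requires the detachment described next.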

6.  Detach the default FullAWSAccess SCP from the OU

Before detaching the default FullAWSAccess SCP, I call the Amazon WorkSpaces API by running the aws workspaces describe-workspaces command. I am currently not running any WorkSpaces, so the output shows an empty list. I will run the same command again after I detach the FullAWSAccess SCP and only the HIPAA SCP remains attached to the OU.

aws workspaces describe-workspaces

Output

{
    "Workspaces": []
}

In order to detach the FullAWSAccess SCP, I must run the aws organizations detach-policy command, providing it the policy-id and target-id of the OU.

aws organizations detach-policy --policy-id p-FullAWSAccess --target-id ou-rth4-ezo5wonz

No Output

 

If I rerun the list-policies-for-target command, I see that only one SCP is attached to the OU, the one that allows HIPAA Eligible Services, as shown in the following output.

aws organizations list-policies-for-target --target-id ou-rth4-ezo5wonz --filter SERVICE_CONTROL_POLICY

Output

{
    "policies": [
        {
            "name": "hipaa-example-policy",
            "arn": "arn:aws:organizations::012345678900:policy/o-kzceys2q4j/SERVICE_CONTROL_POLICY/p-6ldl8bll",
            "description": "All HIPAA eligible services plus supporting AWS Services.",
            "awsManaged": false,
            "id": "p-6ldl8bll",
            "type": "SERVICE_CONTROL_POLICY"
        }
    ]
}

Now I can test and verify the enforcement of this SCP.

7.  Verify SCP enforcement

Previously, the administrator of the account had full access to all AWS services, including Amazon WorkSpaces, because the administrator’s IAM policy allowed all Amazon WorkSpaces actions. However, after I apply the HIPAA SCP to the OU that contains the account, the effective permissions change to deny all Amazon WorkSpaces actions because Amazon WorkSpaces is not an allowed service in the SCP.

The following screenshot of the IAM policy simulator shows which permissions are in effect for the administrator after I apply the HIPAA SCP. Also note that the policy simulator shows the action being denied by Organizations. Because the policy simulator is aware of the SCPs attached to an account, it is a good tool to use when troubleshooting or validating an SCP.
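
You can run a similar check from the command line with the IAM policy simulation API, using credentials from the member account; the evaluation result should reflect the denial imposed by Organizations. A sketch, reusing this post’s placeholder account ID and user name:

# Simulate the WorkSpaces action for the member account's administrator
# (placeholder account ID and user name from this post)
aws iam simulate-principal-policy \
    --policy-source-arn arn:aws:iam::098765432110:user/admin \
    --action-names workspaces:DescribeWorkspaces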

If I run the aws workspaces describe-workspaces command again, as I did at the start of Step 6, this time I receive an AccessDeniedException error, which validates that the HIPAA SCP is working because Amazon WorkSpaces is not an allowed service in the SCP.

aws workspaces describe-workspaces

Output

An error occurred (AccessDeniedException) when calling the DescribeWorkspaces operation: 
User: arn:aws:iam::098765432110:user/admin is not authorized to perform: workspaces:DescribeWorkspaces 
on resource: arn:aws:workspaces:us-east-1:098765432110:workspace/*

This completes the process of creating and applying an SCP to my account.

Summary

In this blog post, I have shown how to create an SCP and attach it to an OU to restrict an account to HIPAA Eligible Services and additional supporting services. I also showed how to create an OU, move an account into the OU, and then validate the SCP attached to the OU. For more information, see AWS Cloud Computing in Healthcare.

If you have comments about this post, submit them in the “Comments” section below. If you have questions about or issues with implementing this solution, please start a new thread on the IAM forum.

– Aaron

Amazon Aurora Update – More Cross Region & Cross Account Support, T2.Small DB Instances, Another Region

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-aurora-update-more-cross-region-cross-account-support-t2-small-db-instances-another-region/

I’m in catch-up mode again, and would like to tell you about some recent improvements that we have made to Amazon Aurora. As a reminder, Aurora is our high-performance MySQL-compatible (and soon PostgreSQL-compatible) enterprise-class database (read Now Available – Amazon Aurora and Amazon Aurora – New Cost-Effective MySQL-Compatible Database Engine for Amazon RDS for an introduction).

Here are the newest additions to Aurora:

  • Cross Region Snapshot Copy
  • Cross Region Replication for Encrypted Databases
  • Cross Account Encrypted Snapshot Sharing
  • Availability in the US West (Northern California) Region
  • T2.Small Instance Support

Let’s take a quick look at each one!

Cross Region Snapshot Copy
You can now copy Amazon Aurora snapshots (either automatic or manual) from one region to another. Select the snapshot, choose Copy Snapshot from the Snapshot Actions menu, pick a region, and enter a name for the new snapshot:

You can also choose to encrypt the snapshot as part of this operation. To learn more, read Copying a DB Snapshot or DB Cluster Snapshot.
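
The same copy can be scripted with the CLI. A rough sketch, run against the destination region; the identifiers, regions, and KMS key alias are placeholders, and the key and source-region options matter only when the snapshot is encrypted:

# Copy an Aurora cluster snapshot from us-east-1 into us-west-2
# (placeholder identifiers; --kms-key-id and --source-region apply to encrypted snapshots)
aws rds copy-db-cluster-snapshot \
    --region us-west-2 \
    --source-db-cluster-snapshot-identifier arn:aws:rds:us-east-1:123456789012:cluster-snapshot:my-aurora-snapshot \
    --target-db-cluster-snapshot-identifier my-aurora-snapshot-copy \
    --source-region us-east-1 \
    --kms-key-id alias/my-us-west-2-key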

Cross Region Replication for Encrypted Databases
You can already enable encryption when you create a fresh Amazon Aurora DB Instance:

You can now create a read replica in another region with a couple of clicks. You can use this to build multi-region, highly available systems or to move the data closer to the user. To create a cross region read replica, simply select the existing DB Instance and choose Create Cross Region Read Replica from the menu:

Then you choose the destination region in the Network & Security settings, and click on Create:

The destination region must include a DB Subnet Group that encompasses 2 or more Availability Zones.

To learn more about this powerful new feature, read Replicating Amazon Aurora DB Clusters Across AWS Regions.
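
If you prefer the CLI to the console, the rough shape of the operation is to create a cluster in the destination region that replicates from the source cluster’s ARN, and then add an instance to it so it can serve reads. This is a sketch only, with placeholder identifiers, regions, and instance class; an encrypted source also needs a destination-region KMS key and, depending on how the call is made, a pre-signed source URL:

# Create the replica cluster in the destination region, replicating from the source cluster ARN
aws rds create-db-cluster \
    --region eu-west-1 \
    --db-cluster-identifier aurora-replica-cluster \
    --engine aurora \
    --replication-source-identifier arn:aws:rds:us-east-1:123456789012:cluster:aurora-source-cluster \
    --kms-key-id alias/my-eu-west-1-key

# Add an instance to the new cluster so it can serve read traffic
aws rds create-db-instance \
    --region eu-west-1 \
    --db-instance-identifier aurora-replica-instance-1 \
    --db-cluster-identifier aurora-replica-cluster \
    --db-instance-class db.r3.large \
    --engine aurora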

Cross Account Encrypted Snapshot Sharing
You already have the ability to configure periodic, automated snapshots when you create your Amazon Aurora DB Instance. You can also create snapshots at any desired time with a couple of clicks:

If the DB Instance is encrypted, the snapshot will be as well.

You can now share encrypted snapshots with other AWS accounts. In order to use this feature, the DB Instance (and therefore the snapshot) must be encrypted with a Master Key other than the default RDS key. Select the snapshot and choose Share Snapshot from the Snapshot Actions menu:

Then enter the target AWS Account ID(s), clicking Add after each one, and click on Save to share the snapshot:

You will also need to share the key that was used to encrypt the snapshot. To learn more about this feature, read Sharing a DB Snapshot or DB Cluster Snapshot.
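
On the CLI side, sharing boils down to adding the target account to the snapshot’s restore attribute, plus granting that account use of the custom KMS key in the key’s policy. A sketch with placeholder identifiers:

# Share a manual, encrypted cluster snapshot with another account
# (placeholder snapshot name and account ID)
aws rds modify-db-cluster-snapshot-attribute \
    --db-cluster-snapshot-identifier my-encrypted-aurora-snapshot \
    --attribute-name restore \
    --values-to-add 111122223333

# Verify which accounts the snapshot is shared with
aws rds describe-db-cluster-snapshot-attributes \
    --db-cluster-snapshot-identifier my-encrypted-aurora-snapshot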

Availability in the US West (Northern California) Region
You can now launch Amazon Aurora DB Instances in the US West (Northern California) Region. Here’s the full list of regions where Aurora is available:

  • US East (Northern Virginia)
  • US East (Ohio)
  • US West (Oregon)
  • US West (Northern California)
  • Canada (Central)
  • EU (Ireland)
  • EU (London)
  • Asia Pacific (Tokyo)
  • Asia Pacific (Sydney)
  • Asia Pacific (Seoul)
  • Asia Pacific (Mumbai)

See the Amazon Aurora Pricing page for pricing info in each region.

T2.Small Instance Support
You can now launch t2.small DB Instances:

These economical instances are a great fit for dev & test environments and for light production workloads. You can also use them to gain some experience with Amazon Aurora. These instances (along with six others, including the t2.medium that we launched last November) are available in all AWS regions where Aurora is available.
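
Adding one to an existing Aurora cluster from the CLI is a one-liner (a sketch; the cluster and instance identifiers are placeholders):

# Add a db.t2.small instance to an existing Aurora cluster
aws rds create-db-instance \
    --db-instance-identifier aurora-dev-small \
    --db-cluster-identifier my-existing-aurora-cluster \
    --db-instance-class db.t2.small \
    --engine aurora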

On-Demand pricing for t2.small DB Instances starts at $0.041 per hour in the US East (Northern Virginia) Region, dropping to $0.018 per hour for an All Upfront Reserved Instance with a 3 year term (see the Amazon Aurora Pricing page for more info).

Jeff;