Tag Archives: AWS Database Migration Service

How to map out your migration of Oracle PeopleSoft to AWS

Post Syndicated from Ashok Shanmuga Sundaram original https://aws.amazon.com/blogs/architecture/how-to-map-out-your-migration-of-oracle-peoplesoft-to-aws/

Oracle PeopleSoft Enterprise is a widely used enterprise resource planning (ERP) application. Customers run production deployments of various PeopleSoft applications on AWS, including PeopleSoft Human Capital Management (HCM), Financials and Supply Chain Management (FSCM), Interactive Hub (IAH), and Customer Relationship Management (CRM).

We published a whitepaper on Best Practices for Running Oracle PeopleSoft on AWS in December 2017. It provides architectural guidance and outlines best practices for high availability, security, scalability, and disaster recovery for running Oracle PeopleSoft applications on AWS.

It also covers highly available, scalable, and cost-effective multi-region reference architectures for deploying PeopleSoft applications on AWS, like the one illustrated below.

While migrating your Oracle PeopleSoft applications to AWS, here are some things to keep in mind:

  • Multi-AZ deployments – Deploy your PeopleSoft servers and database across multiple Availability Zones (AZs) for high availability. AWS AZs allow you to operate production applications and databases that are more highly available, fault tolerant, and scalable than would be possible from a single data center.
  • Use Amazon Relational Database Service (Amazon RDS) to deploy your PeopleSoft databaseAmazon RDS makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while managing time-consuming database administration tasks, allowing you to focus on your applications and business. Deploying an RDS for Oracle Database in multiple AZs simplifies creating a highly available architecture because you’ll have built-in support for automated failover from your primary database to a synchronously replicated secondary database in an alternative AZ.
  • Migration of large databases – Migrating large databases to Amazon RDS within a small downtime window requires careful planning:
    • We recommend that you take a point-in-time export of your database, transfer it to AWS, import it into Amazon RDS, and then apply the delta changes from on-premises.
    • Use AWS Direct Connect or AWS Snowball to transfer the export dump to AWS.
    • Use AWS Database Migration Service to apply the delta changes and sync the on-premises database with the Amazon RDS instance.
  • AWS Infrastructure Event Management (IEM) – Take advantage of AWS IEM to mitigate risks and help ensure a smooth migration. IEM is a highly focused engagement where AWS experts provide you with architectural and operational guidance, assist you in reviewing and fine-tuning your migration plan, and provide real-time support for your migration.
  • Cost optimization – There are a number of ways you can optimize your costs on AWS, including:
    • Use reserved instances for environments that are running most of the time, like production environments. A Reserved Instance is an EC2 offering that provides you with a significant discount (up to 75%) on EC2 usage compared to On-Demand pricing when you commit to a one-year or three-year term.
    • Shut down resources that are not in use. For example, development and test environments are typically used for only eight hours a day during the work week. You can stop these resources when they are not in use for a potential cost savings of 75% (40 hours vs. 168 hours). Use the AWS Instance Scheduler to automatically start and stop your Amazon EC2 and Amazon RDS instances based on a schedule.

The Configuring Amazon RDS as an Oracle PeopleSoft Database whitepaper has detailed instructions on configuring a backend Amazon RDS database for your Oracle PeopleSoft deployment on AWS. After you read the whitepaper, I recommend these other resources as your next step:

  • For a real-world case study on migrating a large Oracle database to AWS, check out this blog post about how AFG migrated their mission-critical Oracle Siebel CRM system running on Oracle Exadata on-premises to Amazon RDS for Oracle.
  • For more information on running Oracle Enterprise Solutions on AWS, check out this re:Invent 2017 video.
  • You can find more Oracle on AWS resources here and here.

About the author

Ashok Shanmuga Sundaram is a partner solutions architect with the Global System Integrator (GSI) team at Amazon Web Services. He works with the GSIs to provide guidance on enterprise cloud adoption, migration and strategy.

Migrating a multi-tier application from a Microsoft Hyper-V environment using AWS SMS and AWS Migration Hub

Post Syndicated from Martin Yip original https://aws.amazon.com/blogs/compute/migrating-a-multi-tier-application-from-a-microsoft-hyper-v-environment-using-aws-sms-and-aws-migration-hub/

Shane Baldacchino is a Solutions Architect at Amazon Web Services

Many customers ask for guidance to migrate end-to-end solutions running in their on-premises data center to AWS. This post provides an overview of moving a common blogging platform, WordPress, running on an on-premises virtualized Microsoft Hyper-V platform to AWS, including re-pointing the DNS records associated to the website.

AWS Server Migration Service (AWS SMS) is an agentless service that makes it easier and faster for you to migrate thousands of on-premises workloads to AWS. In November 2017, AWS added support for Microsoft’s Hyper-V hypervisor. AWS SMS allows you to automate, schedule, and track incremental replications of live server volumes, making it easier for you to coordinate large-scale server migrations. In this post, I guide you through migrating your multi-tier workloads using both AWS SMS and AWS Migration Hub.

Migration Hub provides a single location to track the progress of application migrations across multiple AWS and partner solutions. In this post, you use AWS SMS as a mechanism to migrate the virtual machines (VMs) and track them via Migration Hub. You can also use other third-party tools in Migration Hub, and choose the migration tools that best fit your needs. Migration Hub allows you to get progress updates across all migrations, identify and troubleshoot any issues, and reduce the overall time and effort spent on your migration projects.

Migration Hub and AWS SMS are both free. You pay only for the cost of the individual migration tools that you use, and any resources being consumed on AWS.

Walkthrough

For this walkthrough, the WordPress blog is currently running as a two-tier stack in a corporate data center. The example environment is multi-tier and polyglot in nature. The frontend uses Windows Server 2016 (running IIS 10 with PHP as an ISAPI extension) and the backend is supported by a MySQL server running on Ubuntu 16.04 LTS. All systems are hosted on a virtualized platform. As the environment consists of multiple servers, you can use Migration Hub to group the servers together as an application and manage the holistic process of migrating the application.
The key elements of this migration process involve the following steps:

  1. Establish your AWS environment.
  2. Replicate your database.
  3. Download the SMS Connector from the AWS Management Console.
  4. Configure AWS SMS and Hyper-V permissions.
  5. Install and configure the SMS Connector appliance.
  6. Configure Hyper-V host permissions.
  7. Import your virtual machine inventory and create a replication job.
  8. Use AWS Migration Hub to track progress.
  9. Launch your Amazon EC2 instance.
  10. Change your DNS records to resolve the WordPress blog to your EC2 instance.

Before you start, ensure that your source systems OS and hypervisor version are supported by AWS SMS. For more information, see the Server Migration Service FAQ. This post focuses on the Microsoft Hyper-V hypervisor.

Establish your AWS environment

First, establish your AWS environment. If your organization is new to AWS, this may include account or subaccount creation, a new virtual private cloud (VPC), and associated subnets, route tables, internet gateways, and so on. Think of this phase as setting up your software-defined data center. For more information, see Getting Started with Amazon EC2 Linux Instances.

The blog is a two-tier stack, so go with two private subnets. Because you want it to be highly available, use multiple Availability Zones. An Availability Zone resides within an AWS Region. Each Availability Zone is isolated, but the zones within a Region are connected through low-latency links. This allows architects and solution designers to build highly available solutions.

Replicate your database

WordPress uses a MySQL relational database. You could continue to manage MySQL and the associated EC2 instances associated with maintaining and scaling a database. But for this walkthrough, I am using this opportunity to migrate to an RDS instance of Amazon Aurora, as it is a MySQL-compliant database. Not only is Amazon Aurora a high-performant database engine but it frees you up to focus on application development by managing time-consuming database administration tasks, including backups, software patching, monitoring, scaling, and replication.

Use AWS Database Migration Service (AWS DMS) to migrate your MySQL database to Amazon Aurora easily and securely. You can send the results from AWS DMS to Migration Hub. This allows you to create a single pane view of your application migration.

After a database migration instance has been instantiated, configure the source and destination endpoints and create a replication task.

By attaching to the MySQL binlog, you can seed in the current data in the database and also capture all future state changes in near–real time. For more information, see Migrating a MySQL-Compatible Database to Amazon Aurora.

Finally, the task shows that you are replicating current data in your WordPress blog database and future changes from MySQL into Amazon Aurora.

Download the SMS Connector from the AWS Management Console

Now, use AWS SMS to migrate your IIS/PHP frontend. AWS SMS is delivered as a virtual appliance that can be deployed in your Hyper-V environment.

To download the SMS Connector, log in to the console and choose Server Migration Service, Connectors, SMS Connector setup guide. Download the VHD file for SCVMM/Hyper-V.

Configure SMS

Your hypervisor and AWS SMS need an appropriate user with sufficient privileges to perform migrations:

Launch a new VM in Hyper-V based on the SMS Connector that you downloaded. To configure the connector, connect to it via HTTPS. You can obtain the SMS Connector IP address from within Hyper-V. By default, the SMS Connector uses DHCP to obtain a valid IP address.

Connect to the SMS Connector via HTTPS. In the example above, the connector IP address is 10.0.0.88. In your browser, enter https://10.0.0.88. As the SMS Connector can only work with one hypervisor at a time, you must state the hypervisor with which to interface. For the purpose of this post, the examples use Microsoft Hyper-V.

Configure the connector with the IAM and hypervisor credentials that you created earlier.

After you have entered in both your AWS and Hyper-V credentials and the associated connectivity and authentication checks have passed, you are redirected to the home page of your SMS Connector. The home page provides you a status on connectivity and the health of the SMS Connector.

Configure Hyper-V host permissions

You also must modify your Hyper-V hosts to provide WinRM connectivity. AWS provides a downloadable PowerShell script to configure your Windows environment to support WinRM communications with the SMS Connector. The same script is used for configuring either standalone Hyper-V or SCVMM.

Execute the PowerShell script and follow the prompts. In the following example, Reconfigure Hyper-V not managed by SCVMM (Standalone Hyper-V)… was selected.

Import your virtual machine inventory and create a replication job

You have now configured the SMS Connector and your Microsoft Hyper-V hosts. Switch to the console to import your server catalog to AWS SMS. Within AWS SMS, choose Connectors, Import Server Catalog.

This process can take up to a few minutes and is dependent on the number of machines in your Hyper-V inventory.

Select the server to migrate and choose Create replication job. The console guides you through the process. The time that the initial replication task takes to complete is dependent on the available bandwidth and the size of your VM. After the initial seed replication, network bandwidth is minimized as AWS SMS replicates only incremental changes occurring on the VM.

Use Migration Hub to track progress

You have now successfully started your database migration via AWS DMS, set up your SMS Connector, configured your Microsoft Hyper-V environment, and started a replication job.

You can now track the collective progress of your application migration. To track migration progress, connect AWS DMS and AWS SMS to Migration Hub.

To do this, navigate to Migration Hub in the AWS Management Console. Under Migrate and Tools, connect both services so that the migration status of these services is sent to Migration Hub.

You can then group your servers into an application in Migration Hub and collectively track the progress of your migration. In this example, I created an application, Company Blog, and added in my servers from both AWS SMS and AWS DMS.

The progress updates from linked services are automatically sent to Migration Hub so that you can track tasks in progress. The dashboard reflects any status changes that occur in the linked services. You can see from the following image that one server is complete while another is in progress.

Using Migration Hub, you can view the migration progress of all applications. This allows you to quickly get progress updates across all of your migrations, easily identify and troubleshoot any issues, and reduce the overall time and effort spent on your migration projects.

Launch your EC2 instance

When your replication task is complete, the artifact created by AWS SMS is a custom AMI that you can use to deploy an EC2 instance. Follow the usual process to launch your EC2 instance, using the custom AMI created by AWS SMS, noting that you may need to replace any host-based firewalls with security groups and NACLs.

When you create an EC2 instance, ensure that you pick the most suitable EC2 instance type and size to match your performance requirements while optimizing for cost.

While your new EC2 instance is a replica of your on-premises VM, you should always validate that applications are functioning. How you do this differ on an application-by-application basis. You can use a combination of approaches, such as editing a local host file and testing your application, SSH, RDP, and Telnet.

From the RDS console, get your connection string details and update your WordPress configuration file to point to the Amazon Aurora database. As WordPress is expecting a MySQL database and Amazon Aurora is MySQL-compliant, this change of database engine is transparent to WordPress.

Change your DNS records to resolve the WordPress blog to your EC2 instance

You have validated that your WordPress application is running correctly, as you are still receiving changes from your on-premises data center via AWS DMS into your Amazon Aurora database. You can now update your DNS zone file using Amazon Route 53. Amazon Route 53 can be driven by multiple methods: console, SDK, or AWS CLI.

For this walkthrough, use Windows PowerShell for AWS to update the DNS zone file. The example shows UPSERTING the A record in the zone to resolve to the Amazon EC2 instance created with AWS SMS.

Based on the TTL of your DNS zone file, end users slowly resolve the WordPress blog to AWS.

Summary

You have now successfully migrated your WordPress blog to AWS using AWS migration services, specifically the AWS SMS Hyper-V/SCVMM Connector. Your blog now resolves to AWS. After validation, you are ready to decommission your on-premises resources.

Many architectures can be extended to use many of the inherent benefits of AWS, with little effort. For example, by using Amazon CloudWatch metrics to drive scaling policies, you can use an Application Load Balancer as your frontend. This removes the single point of failure for a single EC2 instance

AWS Achieves Spain’s ENS High Certification Across 29 Services

Post Syndicated from Oliver Bell original https://aws.amazon.com/blogs/security/aws-achieves-spains-ens-high-certification-across-29-services/

AWS has achieved Spain’s Esquema Nacional de Seguridad (ENS) High certification across 29 services. To successfully achieve the ENS High Standard, BDO España conducted an independent audit and attested that AWS meets confidentiality, integrity, and availability standards. This provides the assurance needed by Spanish Public Sector organizations wanting to build secure applications and services on AWS.

The National Security Framework, regulated under Royal Decree 3/2010, was developed through close collaboration between ENAC (Entidad Nacional de Acreditación), the Ministry of Finance and Public Administration and the CCN (National Cryptologic Centre), and other administrative bodies.

The following AWS Services are ENS High accredited across our Dublin and Frankfurt Regions:

  • Amazon API Gateway
  • Amazon DynamoDB
  • Amazon Elastic Container Service
  • Amazon Elastic Block Store
  • Amazon Elastic Compute Cloud
  • Amazon Elastic File System
  • Amazon Elastic MapReduce
  • Amazon ElastiCache
  • Amazon Glacier
  • Amazon Redshift
  • Amazon Relational Database Service
  • Amazon Simple Queue Service
  • Amazon Simple Storage Service
  • Amazon Simple Workflow Service
  • Amazon Virtual Private Cloud
  • Amazon WorkSpaces
  • AWS CloudFormation
  • AWS CloudTrail
  • AWS Config
  • AWS Database Migration Service
  • AWS Direct Connect
  • AWS Directory Service
  • AWS Elastic Beanstalk
  • AWS Key Management Service
  • AWS Lambda
  • AWS Snowball
  • AWS Storage Gateway
  • Elastic Load Balancing
  • VM Import/Export

Combine Transactional and Analytical Data Using Amazon Aurora and Amazon Redshift

Post Syndicated from Re Alvarez-Parmar original https://aws.amazon.com/blogs/big-data/combine-transactional-and-analytical-data-using-amazon-aurora-and-amazon-redshift/

A few months ago, we published a blog post about capturing data changes in an Amazon Aurora database and sending it to Amazon Athena and Amazon QuickSight for fast analysis and visualization. In this post, I want to demonstrate how easy it can be to take the data in Aurora and combine it with data in Amazon Redshift using Amazon Redshift Spectrum.

With Amazon Redshift, you can build petabyte-scale data warehouses that unify data from a variety of internal and external sources. Because Amazon Redshift is optimized for complex queries (often involving multiple joins) across large tables, it can handle large volumes of retail, inventory, and financial data without breaking a sweat.

In this post, we describe how to combine data in Aurora in Amazon Redshift. Here’s an overview of the solution:

  • Use AWS Lambda functions with Amazon Aurora to capture data changes in a table.
  • Save data in an Amazon S3
  • Query data using Amazon Redshift Spectrum.

We use the following services:

Serverless architecture for capturing and analyzing Aurora data changes

Consider a scenario in which an e-commerce web application uses Amazon Aurora for a transactional database layer. The company has a sales table that captures every single sale, along with a few corresponding data items. This information is stored as immutable data in a table. Business users want to monitor the sales data and then analyze and visualize it.

In this example, you take the changes in data in an Aurora database table and save it in Amazon S3. After the data is captured in Amazon S3, you combine it with data in your existing Amazon Redshift cluster for analysis.

By the end of this post, you will understand how to capture data events in an Aurora table and push them out to other AWS services using AWS Lambda.

The following diagram shows the flow of data as it occurs in this tutorial:

The starting point in this architecture is a database insert operation in Amazon Aurora. When the insert statement is executed, a custom trigger calls a Lambda function and forwards the inserted data. Lambda writes the data that it received from Amazon Aurora to a Kinesis data delivery stream. Kinesis Data Firehose writes the data to an Amazon S3 bucket. Once the data is in an Amazon S3 bucket, it is queried in place using Amazon Redshift Spectrum.

Creating an Aurora database

First, create a database by following these steps in the Amazon RDS console:

  1. Sign in to the AWS Management Console, and open the Amazon RDS console.
  2. Choose Launch a DB instance, and choose Next.
  3. For Engine, choose Amazon Aurora.
  4. Choose a DB instance class. This example uses a small, since this is not a production database.
  5. In Multi-AZ deployment, choose No.
  6. Configure DB instance identifier, Master username, and Master password.
  7. Launch the DB instance.

After you create the database, use MySQL Workbench to connect to the database using the CNAME from the console. For information about connecting to an Aurora database, see Connecting to an Amazon Aurora DB Cluster.

The following screenshot shows the MySQL Workbench configuration:

Next, create a table in the database by running the following SQL statement:

Create Table
CREATE TABLE Sales (
InvoiceID int NOT NULL AUTO_INCREMENT,
ItemID int NOT NULL,
Category varchar(255),
Price double(10,2), 
Quantity int not NULL,
OrderDate timestamp,
DestinationState varchar(2),
ShippingType varchar(255),
Referral varchar(255),
PRIMARY KEY (InvoiceID)
)

You can now populate the table with some sample data. To generate sample data in your table, copy and run the following script. Ensure that the highlighted (bold) variables are replaced with appropriate values.

#!/usr/bin/python
import MySQLdb
import random
import datetime

db = MySQLdb.connect(host="AURORA_CNAME",
                     user="DBUSER",
                     passwd="DBPASSWORD",
                     db="DB")

states = ("AL","AK","AZ","AR","CA","CO","CT","DE","FL","GA","HI","ID","IL","IN",
"IA","KS","KY","LA","ME","MD","MA","MI","MN","MS","MO","MT","NE","NV","NH","NJ",
"NM","NY","NC","ND","OH","OK","OR","PA","RI","SC","SD","TN","TX","UT","VT","VA",
"WA","WV","WI","WY")

shipping_types = ("Free", "3-Day", "2-Day")

product_categories = ("Garden", "Kitchen", "Office", "Household")
referrals = ("Other", "Friend/Colleague", "Repeat Customer", "Online Ad")

for i in range(0,10):
    item_id = random.randint(1,100)
    state = states[random.randint(0,len(states)-1)]
    shipping_type = shipping_types[random.randint(0,len(shipping_types)-1)]
    product_category = product_categories[random.randint(0,len(product_categories)-1)]
    quantity = random.randint(1,4)
    referral = referrals[random.randint(0,len(referrals)-1)]
    price = random.randint(1,100)
    order_date = datetime.date(2016,random.randint(1,12),random.randint(1,30)).isoformat()

    data_order = (item_id, product_category, price, quantity, order_date, state,
    shipping_type, referral)

    add_order = ("INSERT INTO Sales "
                   "(ItemID, Category, Price, Quantity, OrderDate, DestinationState, \
                   ShippingType, Referral) "
                   "VALUES (%s, %s, %s, %s, %s, %s, %s, %s)")

    cursor = db.cursor()
    cursor.execute(add_order, data_order)

    db.commit()

cursor.close()
db.close() 

The following screenshot shows how the table appears with the sample data:

Sending data from Amazon Aurora to Amazon S3

There are two methods available to send data from Amazon Aurora to Amazon S3:

  • Using a Lambda function
  • Using SELECT INTO OUTFILE S3

To demonstrate the ease of setting up integration between multiple AWS services, we use a Lambda function to send data to Amazon S3 using Amazon Kinesis Data Firehose.

Alternatively, you can use a SELECT INTO OUTFILE S3 statement to query data from an Amazon Aurora DB cluster and save it directly in text files that are stored in an Amazon S3 bucket. However, with this method, there is a delay between the time that the database transaction occurs and the time that the data is exported to Amazon S3 because the default file size threshold is 6 GB.

Creating a Kinesis data delivery stream

The next step is to create a Kinesis data delivery stream, since it’s a dependency of the Lambda function.

To create a delivery stream:

  1. Open the Kinesis Data Firehose console
  2. Choose Create delivery stream.
  3. For Delivery stream name, type AuroraChangesToS3.
  4. For Source, choose Direct PUT.
  5. For Record transformation, choose Disabled.
  6. For Destination, choose Amazon S3.
  7. In the S3 bucket drop-down list, choose an existing bucket, or create a new one.
  8. Enter a prefix if needed, and choose Next.
  9. For Data compression, choose GZIP.
  10. In IAM role, choose either an existing role that has access to write to Amazon S3, or choose to generate one automatically. Choose Next.
  11. Review all the details on the screen, and choose Create delivery stream when you’re finished.

 

Creating a Lambda function

Now you can create a Lambda function that is called every time there is a change that needs to be tracked in the database table. This Lambda function passes the data to the Kinesis data delivery stream that you created earlier.

To create the Lambda function:

  1. Open the AWS Lambda console.
  2. Ensure that you are in the AWS Region where your Amazon Aurora database is located.
  3. If you have no Lambda functions yet, choose Get started now. Otherwise, choose Create function.
  4. Choose Author from scratch.
  5. Give your function a name and select Python 3.6 for Runtime
  6. Choose and existing or create a new Role, the role would need to have access to call firehose:PutRecord
  7. Choose Next on the trigger selection screen.
  8. Paste the following code in the code window. Change the stream_name variable to the Kinesis data delivery stream that you created in the previous step.
  9. Choose File -> Save in the code editor and then choose Save.
import boto3
import json

firehose = boto3.client('firehose')
stream_name = ‘AuroraChangesToS3’


def Kinesis_publish_message(event, context):
    
    firehose_data = (("%s,%s,%s,%s,%s,%s,%s,%s\n") %(event['ItemID'], 
    event['Category'], event['Price'], event['Quantity'],
    event['OrderDate'], event['DestinationState'], event['ShippingType'], 
    event['Referral']))
    
    firehose_data = {'Data': str(firehose_data)}
    print(firehose_data)
    
    firehose.put_record(DeliveryStreamName=stream_name,
    Record=firehose_data)

Note the Amazon Resource Name (ARN) of this Lambda function.

Giving Aurora permissions to invoke a Lambda function

To give Amazon Aurora permissions to invoke a Lambda function, you must attach an IAM role with appropriate permissions to the cluster. For more information, see Invoking a Lambda Function from an Amazon Aurora DB Cluster.

Once you are finished, the Amazon Aurora database has access to invoke a Lambda function.

Creating a stored procedure and a trigger in Amazon Aurora

Now, go back to MySQL Workbench, and run the following command to create a new stored procedure. When this stored procedure is called, it invokes the Lambda function you created. Change the ARN in the following code to your Lambda function’s ARN.

DROP PROCEDURE IF EXISTS CDC_TO_FIREHOSE;
DELIMITER ;;
CREATE PROCEDURE CDC_TO_FIREHOSE (IN ItemID VARCHAR(255), 
									IN Category varchar(255), 
									IN Price double(10,2),
                                    IN Quantity int(11),
                                    IN OrderDate timestamp,
                                    IN DestinationState varchar(2),
                                    IN ShippingType varchar(255),
                                    IN Referral  varchar(255)) LANGUAGE SQL 
BEGIN
  CALL mysql.lambda_async('arn:aws:lambda:us-east-1:XXXXXXXXXXXXX:function:CDCFromAuroraToKinesis', 
     CONCAT('{ "ItemID" : "', ItemID, 
            '", "Category" : "', Category,
            '", "Price" : "', Price,
            '", "Quantity" : "', Quantity, 
            '", "OrderDate" : "', OrderDate, 
            '", "DestinationState" : "', DestinationState, 
            '", "ShippingType" : "', ShippingType, 
            '", "Referral" : "', Referral, '"}')
     );
END
;;
DELIMITER ;

Create a trigger TR_Sales_CDC on the Sales table. When a new record is inserted, this trigger calls the CDC_TO_FIREHOSE stored procedure.

DROP TRIGGER IF EXISTS TR_Sales_CDC;
 
DELIMITER ;;
CREATE TRIGGER TR_Sales_CDC
  AFTER INSERT ON Sales
  FOR EACH ROW
BEGIN
  SELECT  NEW.ItemID , NEW.Category, New.Price, New.Quantity, New.OrderDate
  , New.DestinationState, New.ShippingType, New.Referral
  INTO @ItemID , @Category, @Price, @Quantity, @OrderDate
  , @DestinationState, @ShippingType, @Referral;
  CALL  CDC_TO_FIREHOSE(@ItemID , @Category, @Price, @Quantity, @OrderDate
  , @DestinationState, @ShippingType, @Referral);
END
;;
DELIMITER ;

If a new row is inserted in the Sales table, the Lambda function that is mentioned in the stored procedure is invoked.

Verify that data is being sent from the Lambda function to Kinesis Data Firehose to Amazon S3 successfully. You might have to insert a few records, depending on the size of your data, before new records appear in Amazon S3. This is due to Kinesis Data Firehose buffering. To learn more about Kinesis Data Firehose buffering, see the “Amazon S3” section in Amazon Kinesis Data Firehose Data Delivery.

Every time a new record is inserted in the sales table, a stored procedure is called, and it updates data in Amazon S3.

Querying data in Amazon Redshift

In this section, you use the data you produced from Amazon Aurora and consume it as-is in Amazon Redshift. In order to allow you to process your data as-is, where it is, while taking advantage of the power and flexibility of Amazon Redshift, you use Amazon Redshift Spectrum. You can use Redshift Spectrum to run complex queries on data stored in Amazon S3, with no need for loading or other data prep.

Just create a data source and issue your queries to your Amazon Redshift cluster as usual. Behind the scenes, Redshift Spectrum scales to thousands of instances on a per-query basis, ensuring that you get fast, consistent performance even as your dataset grows to beyond an exabyte! Being able to query data that is stored in Amazon S3 means that you can scale your compute and your storage independently. You have the full power of the Amazon Redshift query model and all the reporting and business intelligence tools at your disposal. Your queries can reference any combination of data stored in Amazon Redshift tables and in Amazon S3.

Redshift Spectrum supports open, common data types, including CSV/TSV, Apache Parquet, SequenceFile, and RCFile. Files can be compressed using gzip or Snappy, with other data types and compression methods in the works.

First, create an Amazon Redshift cluster. Follow the steps in Launch a Sample Amazon Redshift Cluster.

Next, create an IAM role that has access to Amazon S3 and Athena. By default, Amazon Redshift Spectrum uses the Amazon Athena data catalog. Your cluster needs authorization to access your external data catalog in AWS Glue or Athena and your data files in Amazon S3.

In the demo setup, I attached AmazonS3FullAccess and AmazonAthenaFullAccess. In a production environment, the IAM roles should follow the standard security of granting least privilege. For more information, see IAM Policies for Amazon Redshift Spectrum.

Attach the newly created role to the Amazon Redshift cluster. For more information, see Associate the IAM Role with Your Cluster.

Next, connect to the Amazon Redshift cluster, and create an external schema and database:

create external schema if not exists spectrum_schema
from data catalog 
database 'spectrum_db' 
region 'us-east-1'
IAM_ROLE 'arn:aws:iam::XXXXXXXXXXXX:role/RedshiftSpectrumRole'
create external database if not exists;

Don’t forget to replace the IAM role in the statement.

Then create an external table within the database:

 CREATE EXTERNAL TABLE IF NOT EXISTS spectrum_schema.ecommerce_sales(
  ItemID int,
  Category varchar,
  Price DOUBLE PRECISION,
  Quantity int,
  OrderDate TIMESTAMP,
  DestinationState varchar,
  ShippingType varchar,
  Referral varchar)
ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION 's3://{BUCKET_NAME}/CDC/'

Query the table, and it should contain data. This is a fact table.

select top 10 * from spectrum_schema.ecommerce_sales

 

Next, create a dimension table. For this example, we create a date/time dimension table. Create the table:

CREATE TABLE date_dimension (
  d_datekey           integer       not null sortkey,
  d_dayofmonth        integer       not null,
  d_monthnum          integer       not null,
  d_dayofweek                varchar(10)   not null,
  d_prettydate        date       not null,
  d_quarter           integer       not null,
  d_half              integer       not null,
  d_year              integer       not null,
  d_season            varchar(10)   not null,
  d_fiscalyear        integer       not null)
diststyle all;

Populate the table with data:

copy date_dimension from 's3://reparmar-lab/2016dates' 
iam_role 'arn:aws:iam::XXXXXXXXXXXX:role/redshiftspectrum'
DELIMITER ','
dateformat 'auto';

The date dimension table should look like the following:

Querying data in local and external tables using Amazon Redshift

Now that you have the fact and dimension table populated with data, you can combine the two and run analysis. For example, if you want to query the total sales amount by weekday, you can run the following:

select sum(quantity*price) as total_sales, date_dimension.d_season
from spectrum_schema.ecommerce_sales 
join date_dimension on spectrum_schema.ecommerce_sales.orderdate = date_dimension.d_prettydate 
group by date_dimension.d_season

You get the following results:

Similarly, you can replace d_season with d_dayofweek to get sales figures by weekday:

With Amazon Redshift Spectrum, you pay only for the queries you run against the data that you actually scan. We encourage you to use file partitioning, columnar data formats, and data compression to significantly minimize the amount of data scanned in Amazon S3. This is important for data warehousing because it dramatically improves query performance and reduces cost.

Partitioning your data in Amazon S3 by date, time, or any other custom keys enables Amazon Redshift Spectrum to dynamically prune nonrelevant partitions to minimize the amount of data processed. If you store data in a columnar format, such as Parquet, Amazon Redshift Spectrum scans only the columns needed by your query, rather than processing entire rows. Similarly, if you compress your data using one of the supported compression algorithms in Amazon Redshift Spectrum, less data is scanned.

Analyzing and visualizing Amazon Redshift data in Amazon QuickSight

Modify the Amazon Redshift security group to allow an Amazon QuickSight connection. For more information, see Authorizing Connections from Amazon QuickSight to Amazon Redshift Clusters.

After modifying the Amazon Redshift security group, go to Amazon QuickSight. Create a new analysis, and choose Amazon Redshift as the data source.

Enter the database connection details, validate the connection, and create the data source.

Choose the schema to be analyzed. In this case, choose spectrum_schema, and then choose the ecommerce_sales table.

Next, we add a custom field for Total Sales = Price*Quantity. In the drop-down list for the ecommerce_sales table, choose Edit analysis data sets.

On the next screen, choose Edit.

In the data prep screen, choose New Field. Add a new calculated field Total Sales $, which is the product of the Price*Quantity fields. Then choose Create. Save and visualize it.

Next, to visualize total sales figures by month, create a graph with Total Sales on the x-axis and Order Data formatted as month on the y-axis.

After you’ve finished, you can use Amazon QuickSight to add different columns from your Amazon Redshift tables and perform different types of visualizations. You can build operational dashboards that continuously monitor your transactional and analytical data. You can publish these dashboards and share them with others.

Final notes

Amazon QuickSight can also read data in Amazon S3 directly. However, with the method demonstrated in this post, you have the option to manipulate, filter, and combine data from multiple sources or Amazon Redshift tables before visualizing it in Amazon QuickSight.

In this example, we dealt with data being inserted, but triggers can be activated in response to an INSERT, UPDATE, or DELETE trigger.

Keep the following in mind:

  • Be careful when invoking a Lambda function from triggers on tables that experience high write traffic. This would result in a large number of calls to your Lambda function. Although calls to the lambda_async procedure are asynchronous, triggers are synchronous.
  • A statement that results in a large number of trigger activations does not wait for the call to the AWS Lambda function to complete. But it does wait for the triggers to complete before returning control to the client.
  • Similarly, you must account for Amazon Kinesis Data Firehose limits. By default, Kinesis Data Firehose is limited to a maximum of 5,000 records/second. For more information, see Monitoring Amazon Kinesis Data Firehose.

In certain cases, it may be optimal to use AWS Database Migration Service (AWS DMS) to capture data changes in Aurora and use Amazon S3 as a target. For example, AWS DMS might be a good option if you don’t need to transform data from Amazon Aurora. The method used in this post gives you the flexibility to transform data from Aurora using Lambda before sending it to Amazon S3. Additionally, the architecture has the benefits of being serverless, whereas AWS DMS requires an Amazon EC2 instance for replication.

For design considerations while using Redshift Spectrum, see Using Amazon Redshift Spectrum to Query External Data.

If you have questions or suggestions, please comment below.


Additional Reading

If you found this post useful, be sure to check out Capturing Data Changes in Amazon Aurora Using AWS Lambda and 10 Best Practices for Amazon Redshift Spectrum


About the Authors

Re Alvarez-Parmar is a solutions architect for Amazon Web Services. He helps enterprises achieve success through technical guidance and thought leadership. In his spare time, he enjoys spending time with his two kids and exploring outdoors.

 

 

 

Now Open AWS EU (Paris) Region

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/now-open-aws-eu-paris-region/

Today we are launching our 18th AWS Region, our fourth in Europe. Located in the Paris area, AWS customers can use this Region to better serve customers in and around France.

The Details
The new EU (Paris) Region provides a broad suite of AWS services including Amazon API Gateway, Amazon Aurora, Amazon CloudFront, Amazon CloudWatch, CloudWatch Events, Amazon CloudWatch Logs, Amazon DynamoDB, Amazon Elastic Compute Cloud (EC2), EC2 Container Registry, Amazon ECS, Amazon Elastic Block Store (EBS), Amazon EMR, Amazon ElastiCache, Amazon Elasticsearch Service, Amazon Glacier, Amazon Kinesis Streams, Polly, Amazon Redshift, Amazon Relational Database Service (RDS), Amazon Route 53, Amazon Simple Notification Service (SNS), Amazon Simple Queue Service (SQS), Amazon Simple Storage Service (S3), Amazon Simple Workflow Service (SWF), Amazon Virtual Private Cloud, Auto Scaling, AWS Certificate Manager (ACM), AWS CloudFormation, AWS CloudTrail, AWS CodeDeploy, AWS Config, AWS Database Migration Service, AWS Direct Connect, AWS Elastic Beanstalk, AWS Identity and Access Management (IAM), AWS Key Management Service (KMS), AWS Lambda, AWS Marketplace, AWS OpsWorks Stacks, AWS Personal Health Dashboard, AWS Server Migration Service, AWS Service Catalog, AWS Shield Standard, AWS Snowball, AWS Snowball Edge, AWS Snowmobile, AWS Storage Gateway, AWS Support (including AWS Trusted Advisor), Elastic Load Balancing, and VM Import.

The Paris Region supports all sizes of C5, M5, R4, T2, D2, I3, and X1 instances.

There are also four edge locations for Amazon Route 53 and Amazon CloudFront: three in Paris and one in Marseille, all with AWS WAF and AWS Shield. Check out the AWS Global Infrastructure page to learn more about current and future AWS Regions.

The Paris Region will benefit from three AWS Direct Connect locations. Telehouse Voltaire is available today. AWS Direct Connect will also become available at Equinix Paris in early 2018, followed by Interxion Paris.

All AWS infrastructure regions around the world are designed, built, and regularly audited to meet the most rigorous compliance standards and to provide high levels of security for all AWS customers. These include ISO 27001, ISO 27017, ISO 27018, SOC 1 (Formerly SAS 70), SOC 2 and SOC 3 Security & Availability, PCI DSS Level 1, and many more. This means customers benefit from all the best practices of AWS policies, architecture, and operational processes built to satisfy the needs of even the most security sensitive customers.

AWS is certified under the EU-US Privacy Shield, and the AWS Data Processing Addendum (DPA) is GDPR-ready and available now to all AWS customers to help them prepare for May 25, 2018 when the GDPR becomes enforceable. The current AWS DPA, as well as the AWS GDPR DPA, allows customers to transfer personal data to countries outside the European Economic Area (EEA) in compliance with European Union (EU) data protection laws. AWS also adheres to the Cloud Infrastructure Service Providers in Europe (CISPE) Code of Conduct. The CISPE Code of Conduct helps customers ensure that AWS is using appropriate data protection standards to protect their data, consistent with the GDPR. In addition, AWS offers a wide range of services and features to help customers meet the requirements of the GDPR, including services for access controls, monitoring, logging, and encryption.

From Our Customers
Many AWS customers are preparing to use this new Region. Here’s a small sample:

Societe Generale, one of the largest banks in France and the world, has accelerated their digital transformation while working with AWS. They developed SG Research, an application that makes reports from Societe Generale’s analysts available to corporate customers in order to improve the decision-making process for investments. The new AWS Region will reduce latency between applications running in the cloud and in their French data centers.

SNCF is the national railway company of France. Their mobile app, powered by AWS, delivers real-time traffic information to 14 million riders. Extreme weather, traffic events, holidays, and engineering works can cause usage to peak at hundreds of thousands of users per second. They are planning to use machine learning and big data to add predictive features to the app.

Radio France, the French public radio broadcaster, offers seven national networks, and uses AWS to accelerate its innovation and stay competitive.

Les Restos du Coeur, a French charity that provides assistance to the needy, delivering food packages and participating in their social and economic integration back into French society. Les Restos du Coeur is using AWS for its CRM system to track the assistance given to each of their beneficiaries and the impact this is having on their lives.

AlloResto by JustEat (a leader in the French FoodTech industry), is using AWS to to scale during traffic peaks and to accelerate their innovation process.

AWS Consulting and Technology Partners
We are already working with a wide variety of consulting, technology, managed service, and Direct Connect partners in France. Here’s a partial list:

AWS Premier Consulting PartnersAccenture, Capgemini, Claranet, CloudReach, DXC, and Edifixio.

AWS Consulting PartnersABC Systemes, Atos International SAS, CoreExpert, Cycloid, Devoteam, LINKBYNET, Oxalide, Ozones, Scaleo Information Systems, and Sopra Steria.

AWS Technology PartnersAxway, Commerce Guys, MicroStrategy, Sage, Software AG, Splunk, Tibco, and Zerolight.

AWS in France
We have been investing in Europe, with a focus on France, for the last 11 years. We have also been developing documentation and training programs to help our customers to improve their skills and to accelerate their journey to the AWS Cloud.

As part of our commitment to AWS customers in France, we plan to train more than 25,000 people in the coming years, helping them develop highly sought after cloud skills. They will have access to AWS training resources in France via AWS Academy, AWSome days, AWS Educate, and webinars, all delivered in French by AWS Technical Trainers and AWS Certified Trainers.

Use it Today
The EU (Paris) Region is open for business now and you can start using it today!

Jeff;

 

Now Available – Amazon Aurora with PostgreSQL Compatibility

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/now-available-amazon-aurora-with-postgresql-compatibility/

Late last year I told you about our plans to add PostgreSQL compatibility to Amazon Aurora. We launched the private beta shortly after that announcement, and followed it up earlier this year with an open preview. We’ve received lots of great feedback during the beta and the preview and have done our best to make sure that the product meets your needs and exceeds your expectations!

Now Generally Available
I am happy to report that Amazon Aurora with PostgreSQL Compatibility is now generally available and that you can use it today in four AWS Regions, with more to follow. It is compatible with PostgreSQL 9.6.3 and scales automatically to support up to 64 TB of storage, with 6-way replication behind the scenes to improve performance and availability.

Just like Amazon Aurora with MySQL compatibility, this edition is fully managed and is very easy to set up and to use. On the performance side, you can expect up to 3x the throughput that you’d get if you ran PostgreSQL on your own (you can read Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases to learn more about how we did this).

You can launch a PostgreSQL-compatible Amazon Aurora instance from the RDS Console by selecting Amazon Aurora as the engine and PostgreSQL-compatible as the edition, and clicking on Next:

Then choose your instance class, single or Multi-AZ deployment (good for dev/test and production, respectively), set the instance name, and the administrator credentials, and click on Next:

You can choose between six instance classes (2 to 64 vCPUs and 15.25 to 488 GiB of memory):

The db.r4 instance class is new addition to Aurora and to RDS, and gives you an additional size at the top-end. The db.r4.16xlarge will give you additional write performance, and may allow you to use a single Aurora database instead of two or more sharded databases.

You can also set many advanced options on the next page, starting with network options such as the VPC and public accessibility:

You can set the cluster name and other database options. Encryption is easy to use and enabled by default; you can use the built-in default master key or choose one of your own:

You can also set failover behavior, the retention period for snapshot backups, and choose to enable collection of detailed (OS-level) metrics via Enhanced Monitoring:

After you have set it up to your liking, click on Launch DB Instance to proceed!

The new instances (primary and secondary since I specified Multi-AZ) are up and running within minutes:

Each PostgreSQL-compatible instance publishes 44 metrics to CloudWatch automatically:

With enhanced monitoring enabled, each instance collects additional per-instance and per-process metrics. It can be enabled when the instance is launched, or afterward, via Modify Instance. Here are some of the metrics collected when enhanced monitoring is enabled:

Clicking on Manage Graphs lets you choose which metrics are shown:

Per-process metrics are also available:

You can scale your read capacity by creating up to 15 Aurora replicas:

The cluster provides a single reader endpoint that you can access in order to load-balance requests across the replicas:

Performance Insights
As I noted earlier, Performance Insights is turned on automatically. This Amazon Aurora feature is wired directly into the database engine and allows you to look deep inside of each query, seeing the database resources that it uses and how they contribute to the overall response time. Here’s the initial view:

I can slice the view by SQL query in order to see how many concurrent copies of each query are running:

There are more views and options than I can fit in this post; to learn more take a look at Using Performance Insights.

Migrating to Amazon Aurora with PostgreSQL Compatibility
AWS Database Migration Service and the Schema Conversion Tool are ready to help you to move data stored in commercial and open-source databases to Amazon Aurora. The Schema Conversion Tool will perform a quick assessment of your database schemas and your code in order to help you to choose between MySQL and PostgreSQL. Our new, limited-time, Free DMS program allows you to use DMS and SCT to migrate to Aurora at no cost, with access to several types of DMS Instances for up to 6 months.

If you are already using PostgreSQL, you will be happy to hear that we support a long list of extensions including PostGIS and dblink.

Available Now
You can use Amazon Aurora with PostgreSQL Compatibility today in the US East (Northern Virginia), EU (Ireland), US West (Oregon), and US East (Ohio) Regions, with others to follow as soon as possible.

Jeff;

AWS Cost Explorer Update – Better Filtering & Grouping, Report Management, RI Reports

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-cost-explorer-update-better-filtering-grouping-report-management-ri-reports/

Our customers use Cost Explorer to better understand and manage their AWS spending, making heavy use of the reporting, analytics, and visualization tools that it provides. We launched Cost Explorer in 2014 with a focus on simplicity – single click signup, preconfigured default views, and a clean user interface (take a look back at The New AWS Cost Explorer to see where we started). The Cost Explorer has been very popular and we’ve received a lot of great feedback from our customers.

Last week we launched a major upgrade to Cost Explorer. We’ve redesigned the user interface to optimize many common workflows including filtering, report management, selection of date ranges, and grouping of data. We have also included some default reports to make it easier for you to explore the costs related to your use of Reserved Instances.

Looking at Cost Explorer
Since pictures are reportedly worth 1000 words, let’s take a closer look! Cost Explorer is part of the Billing Dashboard so I can start there:

Here’s the Billing Dashboard. I click on Cost Explorer to move ahead:

I can open up Cost Explorer or access one of three preconfigured views. I’ll go for the first option:

The default report shows my EC2 costs and usage (running hours) for the past 3 months:

I can use the Group By menu to break the costs down by EC2 instance type:

I have many other grouping options:

The filtering options are now easier to access and to edit. Here’s the full set:

I can explore my EC2 costs in any set of desired regions:

I can filter and then group by instance type to see how my spending breaks down:

I can click on Download CSV and then process the data locally:

I can also exclude certain instance types from the report. Here’s how I exclude my m4.xlarge, t2.micro, and t2.nano usage:

Report Management
Cost Explorer allows me to customize my existing reports and to create new reports from scratch. I can click on Save As to save my customized report with a new name:

I can see and manage all of my reports on the Saved Reports page (The padlock denotes a default report that cannot be edited and then overwritten):

When I click on New report I can start from a template:

After I click on Create Report, I set up my date range and filters as desired, and click on Save As. I created a report that displays my year-to-date usage of several AWS database services (Amazon Redshift, DynamoDB Accelerator (DAX), Amazon Relational Database Service (RDS), and AWS Database Migration Service):

All of my reports are accessible from the Reports menu so I can check on my costs with a click:

We also simplified the process of selecting a range of dates for a report, including options to select common date ranges:

Reserved Instance Reports
Cost Explorer also includes a pair of reports that will help you to understand and optimize your usage of Reserved Instances. I don’t own an RI’s so I used screen shots supplied by the team.

The RI Utilization report allows you to see how much of your purchased RI capacity is being put to use (the dashed red line represents a utilization target that you can specify):

The RI Coverage report tells you how much of your EC2 usage is being handled by Reserved Instances (this time, the dashed red line represents the desired amount of coverage):

I hope you have enjoyed this tour of the updated Cost Explorer. It is available now and you can start using it today!

Jeff;

AWS Migration Hub – Plan & Track Enterprise Application Migration

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-migration-hub-plan-track-enterprise-application-migration/

About once a week, I speak to current and potential AWS customers in our Seattle Executive Briefing Center. While I generally focus on our innovation process, we sometimes discuss other topics, including application migration. When enterprises decide to migrate their application portfolios they want to do it in a structured, orderly fashion. These portfolios typically consist of hundreds of complex Windows and Linux applications, relational databases, and more. Customers find themselves eager yet uncertain as to how to proceed. After spending time working with these customers, we have learned that their challenges generally fall in to three major categories:

Discovery – They want to make sure that they have a deep and complete understanding of all of the moving parts that power each application.

Server & Database Migration – They need to transfer on-premises workloads and database tables to the cloud.

Tracking / Management – With large application portfolios and multiple migrations happening in parallel, they need to track and manage progress in an application-centric fashion.

Over the last couple of years we have launched a set of tools that address the first two challenges. The AWS Application Discovery Service automates the process of discovering and collecting system information, the AWS Server Migration Service takes care of moving workloads to the cloud, and the AWS Database Migration Service moves relational databases, NoSQL databases, and data warehouses with minimal downtime. Partners like Racemi and CloudEndure also offer migration tools of their own.

New AWS Migration Hub
Today we are bringing this collection of AWS and partner migration tools together in the AWS Migration Hub. The hub provides access to the tools that I mentioned above, guides you through the migration process, and tracks the status of each migration, all in accord with the methodology and tenets described in our Migration Acceleration Program (MAP).

Here’s the main screen. It outlines the migration process (discovery, migration, and tracking):

Clicking on Start discovery reveals the flow of the migration process:

It is also possible to skip the Discovery step and begin the migration immediately:

The Servers list is populated using data from an AWS migration service (Server Migration Service or Database Migration Service), partner tools, or using data collected by the AWS Application Discovery Service:

I can on Group as application to create my first application:

Once I identify some applications to migrate, I can track them in the Migrations section of the Hub:

The migration tools, if authorized, automatically send status updates and results back to Migration Hub, for display on the migration status page for the application. Here you can see that Racemi DynaCenter and CloudEndure Migration have played their parts in the migration:

I can track the status of my migrations by checking the Migration Hub Dashboard:

Migration Hub works with migration tools from AWS and our Migration Partners; see the list of integrated partner tools to learn more:

Available Now
AWS Migration Hub can manage migrations in any AWS Region that has the necessary migration tools available; the hub itself runs in the US West (Oregon) Region. There is no charge for the Hub; you pay only for the AWS services that you consume in the course of the migration.

If you are ready to begin your migration to the cloud and are in need of some assistance, please take advantage of the services offered by our Migration Acceleration Partners. These organizations have earned their migration competency by repeatedly demonstrating their ability to deliver large-scale migration.

Jeff;

AWS HIPAA Eligibility Update (July 2017) – Eight Additional Services

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-hipaa-eligibility-update-july-2017-eight-additional-services/

It is time for an update on our on-going effort to make AWS a great host for healthcare and life sciences applications. As you can see from our Health Customer Stories page, Philips, VergeHealth, and Cambia (to choose a few) trust AWS with Protected Health Information (PHI) and Personally Identifying Information (PII) as part of their efforts to comply with HIPAA and HITECH.

In May we announced that we added Amazon API Gateway, AWS Direct Connect, AWS Database Migration Service, and Amazon Simple Queue Service (SQS) to our list of HIPAA eligible services and discussed our how customers and partners are putting them to use.

Eight More Eligible Services
Today I am happy to share the news that we are adding another eight services to the list:

Amazon CloudFront can now be utilized to enhance the delivery and transfer of Protected Health Information data to applications on the Internet. By providing a completely secure and encryptable pathway, CloudFront can now be used as a part of applications that need to cache PHI. This includes applications for viewing lab results or imaging data, and those that transfer PHI from Healthcare Information Exchanges (HIEs).

AWS WAF can now be used to protect applications running on AWS which operate on PHI such as patient care portals, patient scheduling systems, and HIEs. Requests and responses containing encrypted PHI and PII can now pass through AWS WAF.

AWS Shield can now be used to protect web applications such as patient care portals and scheduling systems that operate on encrypted PHI from DDoS attacks.

Amazon S3 Transfer Acceleration can now be used to accelerate the bulk transfer of large amounts of research, genetics, informatics, insurance, or payer/payment data containing PHI/PII information. Transfers can take place between a pair of AWS Regions or from an on-premises system and an AWS Region.

Amazon WorkSpaces can now be used by researchers, informaticists, hospital administrators and other users to analyze, visualize or process PHI/PII data using on-demand Windows virtual desktops.

AWS Directory Service can now be used to connect the authentication and authorization systems of organizations that use or process PHI/PII to their resources in the AWS Cloud. For example, healthcare providers operating hybrid cloud environments can now use AWS Directory Services to allow their users to easily transition between cloud and on-premises resources.

Amazon Simple Notification Service (SNS) can now be used to send notifications containing encrypted PHI/PII as part of patient care, payment processing, and mobile applications.

Amazon Cognito can now be used to authenticate users into mobile patient portal and payment processing applications that use PHI/PII identifiers for accounts.

Additional HIPAA Resources
Here are some additional resources that will help you to build applications that comply with HIPAA and HITECH:

Keep in Touch
In order to make use of any AWS service in any manner that involves PHI, you must first enter into an AWS Business Associate Addendum (BAA). You can contact us to start the process.

Jeff;

AWS Adds 12 More Services to Its PCI DSS Compliance Program

Post Syndicated from Sara Duffer original https://aws.amazon.com/blogs/security/aws-adds-12-more-services-to-its-pci-dss-compliance-program/

Twelve more AWS services have obtained Payment Card Industry Data Security Standard (PCI DSS) compliance, giving you more options, flexibility, and functionality to process and store sensitive payment card data in the AWS Cloud. The services were audited by Coalfire to ensure that they meet strict PCI DSS standards.

The newly compliant AWS services are:

AWS now offers 42 services that meet PCI DSS standards, putting administrators in better control of their frameworks and making workloads more efficient and cost effective.

For more information about the AWS PCI DSS compliance program, see Compliance Resources, AWS Services in Scope by Compliance Program, and PCI DSS Compliance.

– Sara

Four HIPAA Eligible Services Recently Added to the AWS Business Associate Agreement

Post Syndicated from Chad Woolf original https://aws.amazon.com/blogs/security/four-hipaa-eligible-services-recently-added-to-the-aws-business-associate-agreement/

HIPAA logo

We are pleased to announce that the following four AWS services have been added in recent weeks to the AWS Business Associate Agreement (BAA):

As with all HIPAA Eligible Services covered under the BAA, Protected Health Information (PHI) must be encrypted while at rest or in transit. See Architecting for HIPAA Security and Compliance on Amazon Web Services, which explains how you can configure each AWS HIPAA Eligible Service to store, process, and transmit PHI.

For more details, see the full AWS Blog post.

– Chad

Roundup of AWS HIPAA Eligible Service Announcements

Post Syndicated from Ana Visneski original https://aws.amazon.com/blogs/aws/roundup-of-aws-hipaa-eligible-service-announcements/

At AWS we have had a number of HIPAA eligible service announcements. Patrick Combes, the Healthcare and Life Sciences Global Technical Leader at AWS, and Aaron Friedman, a Healthcare and Life Sciences Partner Solutions Architect at AWS, have written this post to tell you all about it.

-Ana


We are pleased to announce that the following AWS services have been added to the BAA in recent weeks: Amazon API Gateway, AWS Direct Connect, AWS Database Migration Service, and Amazon SQS. All four of these services facilitate moving data into and through AWS, and we are excited to see how customers will be using these services to advance their solutions in healthcare. While we know the use cases for each of these services are vast, we wanted to highlight some ways that customers might use these services with Protected Health Information (PHI).

As with all HIPAA-eligible services covered under the AWS Business Associate Addendum (BAA), PHI must be encrypted while at-rest or in-transit. We encourage you to reference our HIPAA whitepaper, which details how you might configure each of AWS’ HIPAA-eligible services to store, process, and transmit PHI. And of course, for any portion of your application that does not touch PHI, you can use any of our 90+ services to deliver the best possible experience to your users. You can find some ideas on architecting for HIPAA on our website.

Amazon API Gateway
Amazon API Gateway is a web service that makes it easy for developers to create, publish, monitor, and secure APIs at any scale. With PHI now able to securely transit API Gateway, applications such as patient/provider directories, patient dashboards, medical device reports/telemetry, HL7 message processing and more can securely accept and deliver information to any number and type of applications running within AWS or client presentation layers.

One particular area we are excited to see how our customers leverage Amazon API Gateway is with the exchange of healthcare information. The Fast Healthcare Interoperability Resources (FHIR) specification will likely become the next-generation standard for how health information is shared between entities. With strong support for RESTful architectures, FHIR can be easily codified within an API on Amazon API Gateway. For more information on FHIR, our AWS Healthcare Competency partner, Datica, has an excellent primer.

AWS Direct Connect
Some of our healthcare and life sciences customers, such as Johnson & Johnson, leverage hybrid architectures and need to connect their on-premises infrastructure to the AWS Cloud. Using AWS Direct Connect, you can establish private connectivity between AWS and your datacenter, office, or colocation environment, which in many cases can reduce your network costs, increase bandwidth throughput, and provide a more consistent network experience than Internet-based connections.

In addition to a hybrid-architecture strategy, AWS Direct Connect can assist with the secure migration of data to AWS, which is the first step to using the wide array of our HIPAA-eligible services to store and process PHI, such as Amazon S3 and Amazon EMR. Additionally, you can connect to third-party/externally-hosted applications or partner-provided solutions as well as securely and reliably connect end users to those same healthcare applications, such as a cloud-based Electronic Medical Record system.

AWS Database Migration Service (DMS)
To date, customers have migrated over 20,000 databases to AWS through the AWS Database Migration Service. Customers often use DMS as part of their cloud migration strategy, and now it can be used to securely and easily migrate your core databases containing PHI to the AWS Cloud. As your source database remains fully operational during the migration with DMS, you minimize downtime for these business-critical applications as you migrate your databases to AWS. This service can now be utilized to securely transfer such items as patient directories, payment/transaction record databases, revenue management databases and more into AWS.

Amazon Simple Queue Service (SQS)
Amazon Simple Queue Service (SQS) is a message queueing service for reliably communicating among distributed software components and microservices at any scale. One way that we envision customers using SQS with PHI is to buffer requests between application components that pass HL7 or FHIR messages to other parts of their application. You can leverage features like SQS FIFO to ensure your messages containing PHI are passed in the order they are received and delivered in the order they are received, and available until a consumer processes and deletes it. This is important for applications with patient record updates or processing payment information in a hospital.

Let’s get building!
We are beyond excited to see how our customers will use our newly HIPAA-eligible services as part of their healthcare applications. What are you most excited for? Leave a comment below.

Near Zero Downtime Migration from MySQL to DynamoDB

Post Syndicated from YongSeong Lee original https://aws.amazon.com/blogs/big-data/near-zero-downtime-migration-from-mysql-to-dynamodb/

Many companies consider migrating from relational databases like MySQL to Amazon DynamoDB, a fully managed, fast, highly scalable, and flexible NoSQL database service. For example, DynamoDB can increase or decrease capacity based on traffic, in accordance with business needs. The total cost of servicing can be optimized more easily than for the typical media-based RDBMS.

However, migrations can have two common issues:

  • Service outage due to downtime, especially when customer service must be seamlessly available 24/7/365
  • Different key design between RDBMS and DynamoDB

This post introduces two methods of seamlessly migrating data from MySQL to DynamoDB, minimizing downtime and converting the MySQL key design into one more suitable for NoSQL.

AWS services

I’ve included sample code that uses the following AWS services:

  • AWS Database Migration Service (AWS DMS) can migrate your data to and from most widely used commercial and open-source databases. It supports homogeneous and heterogeneous migrations between different database platforms.
  • Amazon EMR is a managed Hadoop framework that helps you process vast amounts of data quickly. Build EMR clusters easily with preconfigured software stacks that include Hive and other business software.
  • Amazon Kinesis can continuously capture and retain a vast amount of data such as transaction, IT logs, or clickstreams for up to 7 days.
  • AWS Lambda helps you run your code without provisioning or managing servers. Your code can be automatically triggered by other AWS services such Amazon Kinesis Streams.

Migration solutions

Here are the two options I describe in this post:

  1. Use AWS DMS

AWS DMS supports migration to a DynamoDB table as a target. You can use object mapping to restructure original data to the desired structure of the data in DynamoDB during migration.

  1. Use EMR, Amazon Kinesis, and Lambda with custom scripts

Consider this method when more complex conversion processes and flexibility are required. Fine-grained user control is needed for grouping MySQL records into fewer DynamoDB items, determining attribute names dynamically, adding business logic programmatically during migration, supporting more data types, or adding parallel control for one big table.

After the initial load/bulk-puts are finished, and the most recent real-time data is caught up by the CDC (change data capture) process, you can change the application endpoint to DynamoDB.

The method of capturing changed data in option 2 is covered in the AWS Database post Streaming Changes in a Database with Amazon Kinesis. All code in this post is available in the big-data-blog GitHub repo, including test codes.

Solution architecture

The following diagram shows the overall architecture of both options.

Option 1:  Use AWS DMS

This section discusses how to connect to MySQL, read the source data, and then format the data for consumption by the target DynamoDB database using DMS.

Create the replication instance and source and target endpoints

Create a replication instance that has sufficient storage and processing power to perform the migration job, as mentioned in the AWS Database Migration Service Best Practices whitepaper. For example, if your migration involves a large number of tables, or if you intend to run multiple concurrent replication tasks, consider using one of the larger instances. The service consumes a fair amount of memory and CPU.

As the MySQL user, connect to MySQL and retrieve data from the database with the privileges of SUPER, REPLICATION CLIENT. Enable the binary log and set the binlog_format parameter to ROW for CDC in the MySQL configuration. For more information about how to use DMS, see Getting Started  in the AWS Database Migration Service User Guide.

mysql> CREATE USER 'repl'@'%' IDENTIFIED BY 'welcome1';
mysql> GRANT all ON <database name>.* TO 'repl'@'%';
mysql> GRANT SUPER,REPLICATION CLIENT  ON *.* TO 'repl'@'%';

Before you begin to work with a DynamoDB database as a target for DMS, make sure that you create an IAM role for DMS to assume, and grant access to the DynamoDB target tables. Two endpoints must be created to connect the source and target. The following screenshot shows sample endpoints.

The following screenshot shows the details for one of the endpoints, source-mysql.

Create a task with an object mapping rule

In this example, assume that the MySQL table has a composite primary key (customerid + orderid + productid). You are going to restructure the key to the desired structure of the data in DynamoDB, using an object mapping rule.

In this case, the DynamoDB table has the hash key that is a combination of the customerid and orderid columns, and the sort key is the productid column. However, the partition key should be decided by the user in an actual migration, based on data ingestion and access pattern. You would usually use high-cardinality attributes. For more information about how to choose the right DynamoDB partition key, see the Choosing the Right DynamoDB Partition Key AWS Database blog post.

DMS automatically creates a corresponding attribute on the target DynamoDB table for the quantity column from the source table because rule-action is set to map-record-to-record and the column is not listed in the exclude-columns attribute list. For more information about map-record-to-record and map-record-to-document, see Using an Amazon DynamoDB Database as a Target for AWS Database Migration Service.

Migration starts immediately after the task is created, unless you clear the Start task on create option. I recommend enabling logging to make sure that you are informed about what is going on with the migration task in the background.

The following screenshot shows the task creation page.

You can use the console to specify the individual database tables to migrate and the schema to use for the migration, including transformations. On the Guided tab, use the Where section to specify the schema, table, and action (include or exclude). Use the Filter section to specify the column name in a table and the conditions to apply.

Table mappings also can be created in JSON format. On the JSON tab, check Enable JSON editing.

Here’s an example of an object mapping rule that determines where the source data is located in the target. If you copy the code, replace the values of the following attributes. For more examples, see Using an Amazon DynamoDB Database as a Target for AWS Database Migration Service.

  • schema-name
  • table-name
  • target-table-name
  • mapping-parameters
  • attribute-mappings
{
  "rules": [
   {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "1",
      "object-locator": {
        "schema-name": "mydatabase",
        "table-name": "purchase"
      },
      "rule-action": "include"
    },
    {
      "rule-type": "object-mapping",
      "rule-id": "2",
      "rule-name": "2",
      "rule-action": "map-record-to-record",
      "object-locator": {
        "schema-name": "mydatabase",
        "table-name": "purchase"
 
      },
      "target-table-name": "purchase",
      "mapping-parameters": {
        "partition-key-name": "customer_orderid",
        "sort-key-name": "productid",
        "exclude-columns": [
          "customerid",
          "orderid"           
        ],
        "attribute-mappings": [
          {
            "target-attribute-name": "customer_orderid",
            "attribute-type": "scalar",
            "attribute-sub-type": "string",
            "value": "${customerid}|${orderid}"
          },
          {
            "target-attribute-name": "productid",
            "attribute-type": "scalar",
            "attribute-sub-type": "string",
            "value": "${productid}"
          }
        ]
      }
    }
  ]
}

Start the migration task

If the target table specified in the target-table-name property does not exist in DynamoDB, DMS creates the table according to data type conversion rules for source and target data types. There are many metrics to monitor the progress of migration. For more information, see Monitoring AWS Database Migration Service Tasks.

The following screenshot shows example events and errors recorded by CloudWatch Logs.

DMS replication instances that you used for the migration should be deleted once all migration processes are completed. Any CloudWatch logs data older than the retention period is automatically deleted.

Option 2: Use EMR, Amazon Kinesis, and Lambda

This section discusses an alternative option using EMR, Amazon Kinesis, and Lambda to provide more flexibility and precise control. If you have a MySQL replica in your environment, it would be better to dump data from the replica.

Change the key design

When you decide to change your database from RDMBS to NoSQL, you need to find a more suitable key design for NoSQL, for performance as well as cost-effectiveness.

Similar to option #1, assume that the MySQL source has a composite primary key (customerid + orderid + productid). However, for this option, group the MySQL records into fewer DynamoDB items by customerid (hash key) and orderid (sort key). Also, remove the last column (productid) of the composite key by converting the record values productid column in MySQL to the attribute name in DynamoDB, and setting the attribute value as quantity.

This conversion method reduces the number of items. You can retrieve the same amount of information with fewer read capacity units, resulting in cost savings and better performance. For more information about how to calculate read/write capacity units, see Provisioned Throughput.

Migration steps

Option 2 has two paths for migration, performed at the same time:

  • Batch-puts: Export MySQL data, upload it to Amazon S3, and import into DynamoDB.
  • Real-time puts: Capture changed data in MySQL, send the insert/update/delete transaction to Amazon Kinesis Streams, and trigger the Lambda function to put data into DynamoDB.

To keep the data consistency and integrity, capturing and feeding data to Amazon Kinesis Streams should be started before the batch-puts process. The Lambda function should stand by and Streams should retain the captured data in the stream until the batch-puts process on EMR finishes. Here’s the order:

  1. Start real-time puts to Amazon Kinesis Streams.
  2. As soon as real-time puts commences, start batch-puts.
  3. After batch-puts finishes, trigger the Lambda function to execute put_item from Amazon Kinesis Streams to DynamoDB.
  4. Change the application endpoints from MySQL to DynamoDB.

Step 1:  Capture changing data and put into Amazon Kinesis Streams

Firstly, create an Amazon Kinesis stream to retain transaction data from MySQL. Set the Data retention period value based on your estimate for the batch-puts migration process. For data integrity, the retention period should be enough to hold all transactions until batch-puts migration finishes. However you do not necessarily need to select the maximum retention period. It depends on the amount of data to migrate.

In the MySQL configuration, set binlog_format to ROW to capture transactions by using the BinLogStreamReader module. The log_bin parameter must be set as well to enable the binlog. For more information, see the Streaming Changes in a Database with Amazon Kinesis AWS Database blog post.

 

[mysqld]
secure-file-priv = ""
log_bin=/data/binlog/binlog
binlog_format=ROW
server-id = 1
tmpdir=/data/tmp

The following sample code is a Python example that captures transactions and sends them to Amazon Kinesis Streams.

 

#!/usr/bin/env python
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import (
  DeleteRowsEvent,
  UpdateRowsEvent,
  WriteRowsEvent,
)

def main():
  kinesis = boto3.client("kinesis")

  stream = BinLogStreamReader(
    connection_settings= {
      "host": "<host IP address>",
      "port": <port number>,
      "user": "<user name>",
      "passwd": "<password>"},
    server_id=100,
    blocking=True,
    resume_stream=True,
    only_events=[DeleteRowsEvent, WriteRowsEvent, UpdateRowsEvent])

  for binlogevent in stream:
    for row in binlogevent.rows:
      event = {"schema": binlogevent.schema,
      "table": binlogevent.table,
      "type": type(binlogevent).__name__,
      "row": row
      }

      kinesis.put_record(StreamName="<Amazon Kinesis stream name>", Data=json.dumps(event), PartitionKey="default")
      print json.dumps(event)

if __name__ == "__main__":
main()

The following code is sample JSON data generated by the Python script. The type attribute defines the transaction recorded by that JSON record:

  • WriteRowsEvent = INSERT
  • UpdateRowsEvent = UPDATE
  • DeleteRowsEvent = DELETE
{"table": "purchase_temp", "row": {"values": {"orderid": "orderidA1", "quantity": 100, "customerid": "customeridA74187", "productid": "productid1"}}, "type": "WriteRowsEvent", "schema": "test"}
{"table": "purchase_temp", "row": {"before_values": {"orderid": "orderid1", "quantity": 1, "customerid": "customerid74187", "productid": "productid1"}, "after_values": {"orderid": "orderid1", "quantity": 99, "customerid": "customerid74187", "productid": "productid1"}}, "type": "UpdateRowsEvent", "schema": "test"}
{"table": "purchase_temp", "row": {"values": {"orderid": "orderid100", "quantity": 1, "customerid": "customerid74187", "productid": "productid1"}}, "type": "DeleteRowsEvent", "schema": "test"}

Step 2. Dump data from MySQL to DynamoDB

The easiest way is to use DMS, which recently added Amazon S3 as a migration target. For an S3 target, both full load and CDC data is written to CSV format. However, CDC is not a good fit as UPDATE and DELETE statements are not supported. For more information, see Using Amazon S3 as a Target for AWS Database Migration Service.

Another way to upload data to Amazon S3 is to use the INTO OUTFILE SQL clause and aws s3 sync CLI command in parallel with your own script. The degree of parallelism depends on your server capacity and local network bandwidth. You might find a third-party tool useful, such as pt-archiver (part of the Percona Toolkit see the appendix for details).

SELECT * FROM purchase WHERE <condition_1>
INTO OUTFILE '/data/export/purchase/1.csv' FIELDS TERMINATED BY ',' ESCAPED BY '\\' LINES TERMINATED BY '\n';
SELECT * FROM purchase WHERE <condition_2>
INTO OUTFILE '/data/export/purchase/2.csv' FIELDS TERMINATED BY ',' ESCAPED BY '\\' LINES TERMINATED BY '\n';
...
SELECT * FROM purchase WHERE <condition_n>
INTO OUTFILE '/data/export/purchase/n.csv' FIELDS TERMINATED BY ',' ESCAPED BY '\\' LINES TERMINATED BY '\n';

I recommend the aws s3 sync command for this use case. This command works internally with the S3 multipart upload feature. Pattern matching can exclude or include particular files. In addition, if the sync process crashes in the middle of processing, you do not need to upload the same files again. The sync command compares the size and modified time of files between local and S3 versions, and synchronizes only local files whose size and modified time are different from those in S3. For more information, see the sync command in the S3 section of the AWS CLI Command Reference.

$ aws s3 sync /data/export/purchase/ s3://<your bucket name>/purchase/ 
$ aws s3 sync /data/export/<other path_1>/ s3://<your bucket name>/<other path_1>/
...
$ aws s3 sync /data/export/<other path_n>/ s3://<your bucket name>/<other path_n>/ 

After all data is uploaded to S3, put it into DynamoDB. There are two ways to do this:

  • Use Hive with an external table
  • Write MapReduce code

Hive with an external table

Create a Hive external table against the data on S3 and insert it into another external table against the DynamoDB table, using the org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler property. To improve productivity and the scalability, consider using Brickhouse, which is a collection of UDFs for Hive.

The following sample code assumes that the Hive table for DynamoDB is created with the products column, which is of type ARRAY<STRING >.  The productid and quantity columns are aggregated, grouping by customerid and orderid, and inserted into the products column with the CollectUDAF columns provided by Brickhouse.

hive> DROP TABLE purchase_ext_s3; 
--- To read data from S3 
hive> CREATE EXTERNAL TABLE purchase_ext_s3 (
customerid string,
orderid    string,
productid  string,
quantity   string) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
LOCATION 's3://<your bucket name>/purchase/';

Hive> drop table purchase_ext_dynamodb ; 
--- To connect to DynamoDB table  
Hive> CREATE EXTERNAL TABLE purchase_ext_dynamodb (
      customerid STRING, orderid STRING, products ARRAY<STRING>)
      STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler' 
      TBLPROPERTIES ("dynamodb.table.name" = "purchase", 
      "dynamodb.column.mapping" = "customerid:customerid,orderid:orderid,products:products");

--- Batch-puts to DynamoDB using Brickhouse 
hive> add jar /<jar file path>/brickhouse-0.7.1-SNAPSHOT.jar ; 
hive> create temporary function collect as 'brickhouse.udf.collect.CollectUDAF';
hive> INSERT INTO purchase_ext_dynamodb 
select customerid as customerid , orderid as orderid
       ,collect(concat(productid,':' ,quantity)) as products
      from purchase_ext_s3
      group by customerid, orderid; 

Unfortunately, the MAP, LIST, BOOLEAN, and NULL data types are not supported by the  DynamoDBStorageHandler class, so the ARRAY<STRING> data type has been chosen. The products column of ARRAY<STRING> data type in Hive is matched to the StringSet type attribute in DynamoDB. The sample code mostly shows how Brickhouse works, and only for those who want to aggregate multiple records into one StringSet type attribute in DynamoDB.

Python MapReduce with Hadoop Streaming

A mapper task reads each record from the input data on S3, and maps input key-value pairs to intermediate key-value pairs. It divides source data from S3 into two parts (key part and value part) delimited by a TAB character (“\t”). Mapper data is sorted in order by their intermediate key (customerid and orderid) and sent to the reducer. Records are put into DynamoDB in the reducer step.

#!/usr/bin/env python
import sys
 
# get all lines from stdin
for line in sys.stdin:
    line = line.strip()
    cols = line.split(',')
# divide source data into Key and attribute part.
# example output : “cusotmer1,order1	product1,10”
    print '%s,%s\t%s,%s' % (cols[0],cols[1],cols[2],cols[3] )

Generally, the reduce task receives the output produced after map processing (which is key/list-of-values pairs) and then performs an operation on the list of values against each key.

In this case, the reducer is written in Python and is based on STDIN/STDOUT/hadoop streaming. The enumeration data type is not available. The reducer receives data sorted and ordered by the intermediate key set in the mapper, customerid and orderid (cols[0],cols[1]) in this case, and stores all attributes for the specific key in the item_data dictionary. The attributes in the item_data dictionary are put, or flushed, into DynamoDB every time a new intermediate key comes from sys.stdin.

#!/usr/bin/env python
import sys
import boto.dynamodb
 
# create connection to DynamoDB
current_keys = None
conn = boto.dynamodb.connect_to_region( '<region>', aws_access_key_id='<access key id>', aws_secret_access_key='<secret access key>')
table = conn.get_table('<dynamodb table name>')
item_data = {}

# input comes from STDIN emitted by Mapper
for line in sys.stdin:
    line = line.strip()
    dickeys, items  = line.split('\t')
    products = items.split(',')
    if current_keys == dickeys:
       item_data[products[0]]=products[1]  
    else:
        if current_keys:
          try:
              mykeys = current_keys.split(',') 
              item = table.new_item(hash_key=mykeys[0],range_key=mykeys[1], attrs=item_data )
              item.put() 
          except Exception ,e:
              print 'Exception occurred! :', e.message,'==> Data:' , mykeys
        item_data = {}
        item_data[products[0]]=products[1]
        current_keys = dickeys

# put last data
if current_keys == dickeys:
   print 'Last one:' , current_keys #, item_data
   try:
       mykeys = dickeys.split(',')
       item = table.new_item(hash_key=mykeys[0] , range_key=mykeys[1], attrs=item_data )
       item.put()
   except Exception ,e:
print 'Exception occurred! :', e.message, '==> Data:' , mykeys

To run the MapReduce job, connect to the EMR master node and run a Hadoop streaming job. The hadoop-streaming.jar file location or name could be different, depending on your EMR version. Exception messages that occur while reducers run are stored at the directory assigned as the –output option. Hash key and range key values are also logged to identify which data causes exceptions or errors.

$ hadoop fs -rm -r s3://<bucket name>/<output path>
$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
           -input s3://<bucket name>/<input path> -output s3://<bucket name>/<output path>\
           -file /<local path>/mapper.py -mapper /<local path>/mapper.py \
           -file /<local path>/reducer.py -reducer /<local path>/reducer.py

In my migration experiment using the above scripts, with self-generated test data, I found the following results, including database size and the time taken to complete the migration.

Server MySQL instance m4.2xlarge
EMR cluster

master : 1 x m3.xlarge

core  : 2 x m4.4xlarge

DynamoDB 2000 write capacity unit
Data Number of records 1,000,000,000
Database file size (.ibc) 100.6 GB
CSV files size 37 GB
Performance (time) Export to CSV 6 min 10 sec
Upload to S3 (sync) 3 min 30 sec
Import to DynamoDB depending on write capacity unit

 

The following screenshot shows the performance results by write capacity.

Note that the performance result is flexible and can vary depending on the server capacity, network bandwidth, degree of parallelism, conversion logic, program language, and other conditions. All provisioned write capacity units are consumed by the MapReduce job for data import, so the more you increase the size of the EMR cluster and write capacity units of DynamoDB table, the less time it takes to complete. Java-based MapReduce code would be more flexible for function and MapReduce framework.

Step 3: Amazon Lambda function updates DynamoDB by reading data from Amazon Kinesis

In the Lambda console, choose Create a Lambda function and the kinesis-process-record-python blueprint. Next, in the Configure triggers page, select the stream that you just created.

The Lambda function must have an IAM role with permissions to read from Amazon Kinesis and put items into DynamoDB.

The Lambda function can recognize the transaction type of the record by looking up the type attribute. The transaction type determines the method for conversion and update.

For example, when a JSON record is passed to the function, the function looks up the type attribute. It also checks whether an existing item in the DynamoDB table has the same key with the incoming record. If so, the existing item must be retrieved and saved in a dictionary variable (item, in this case). Apply a new update information command to the item dictionary before it is put back into DynamoDB table. This prevents the existing item from being overwritten by the incoming record.

from __future__ import print_function

import base64
import json
import boto3

print('Loading function')
client = boto3.client('dynamodb')

def lambda_handler(event, context):
    #print("Received event: " + json.dumps(event, indent=2))
    for record in event['Records']:
        # Amazon Kinesis data is base64-encoded so decode here
        payload = base64.b64decode(record['kinesis']['data'])
        print("Decoded payload: " + payload)
        data = json.loads(payload)
        
        # user logic for data triggered by WriteRowsEvent
        if data["type"] == "WriteRowsEvent":
            my_table = data["table"]
            my_hashkey = data["row"]["values"]["customerid"]
            my_rangekey = data["row"]["values"]["orderid"]
            my_productid = data["row"]["values"]["productid"]
            my_quantity = str( data["row"]["values"]["quantity"] )
            try:
                response = client.get_item( Key={'customerid':{'S':my_hashkey} , 'orderid':{'S':my_rangekey}} ,TableName = my_table )
                if 'Item' in response:
                    item = response['Item']
                    item[data["row"]["values"]["productid"]] = {"S":my_quantity}
                    result1 = client.put_item(Item = item , TableName = my_table )
                else:
                    item = { 'customerid':{'S':my_hashkey} , 'orderid':{'S':my_rangekey} , my_productid :{"S":my_quantity}  }
                    result2 = client.put_item( Item = item , TableName = my_table )
            except Exception, e:
                print( 'WriteRowsEvent Exception ! :', e.message  , '==> Data:' ,data["row"]["values"]["customerid"]  , data["row"]["values"]["orderid"] )
        
        # user logic for data triggered by UpdateRowsEvent
        if data["type"] == "UpdateRowsEvent":
            my_table = data["table"]
            
        # user logic for data triggered by DeleteRowsEvent    
        if data["type"] == "DeleteRowsEvent":
            my_table = data["table"]
            
            
    return 'Successfully processed {} records.'.format(len(event['Records']))

Step 4:  Switch the application endpoint to DynamoDB

Application codes need to be refactored when you change from MySQL to DynamoDB. The following simple Java code snippets focus on the connection and query part because it is difficult to cover all cases for all applications. For more information, see Programming with DynamoDB and the AWS SDKs.

Query to MySQL

The following sample code shows a common way to connect to MySQL and retrieve data.

import java.sql.* ;
...
try {
    Connection conn =  DriverManager.getConnection("jdbc:mysql://<host name>/<database name>" , "<user>" , "<password>");
    stmt = conn.createStatement();
    String sql = "SELECT quantity as quantity FROM purchase WHERE customerid = '<customerid>' and orderid = '<orderid>' and productid = '<productid>'";
    ResultSet rs = stmt.executeQuery(sql);

    while(rs.next()){ 
       int quantity  = rs.getString("quantity");   //Retrieve by column name 
       System.out.print("quantity: " + quantity);  //Display values 
       }
} catch (SQLException ex) {
    // handle any errors
    System.out.println("SQLException: " + ex.getMessage());}
...
==== Output ====
quantity:1
Query to DynamoDB

To retrieve items from DynamoDB, follow these steps:

  1. Create an instance of the DynamoDB
  2. Create an instance of the Table
  3. Add the withHashKey and withRangeKeyCondition methods to an instance of the QuerySpec
  4. Execute the query method with the querySpec instance previously created. Items are retrieved as JSON format, so use the getJSON method to look up a specific attribute in an item.
...
DynamoDB dynamoDB = new DynamoDB( new AmazonDynamoDBClient(new ProfileCredentialsProvider()));

Table table = dynamoDB.getTable("purchase");

QuerySpec querySpec = new QuerySpec()
        .withHashKey("customerid" , "customer1")  // hashkey name and its value 
        .withRangeKeyCondition(new RangeKeyCondition("orderid").eq("order1") ) ; // Ranage key and its condition value 

ItemCollection<QueryOutcome> items = table.query(querySpec); 

Iterator<Item> iterator = items.iterator();          
while (iterator.hasNext()) {
Item item = iterator.next();
System.out.println(("quantity: " + item.getJSON("product1"));   // 
}
...
==== Output ====
quantity:1

Conclusion

In this post, I introduced two options for seamlessly migrating data from MySQL to DynamoDB and minimizing downtime during the migration. Option #1 used DMS, and option #2 combined EMR, Amazon Kinesis, and Lambda. I also showed you how to convert the key design in accordance with database characteristics to improve read/write performance and reduce costs. Each option has advantages and disadvantages, so the best option depends on your business requirements.

The sample code in this post is not enough for a complete, efficient, and reliable data migration code base to be reused across many different environments. Use it to get started, but design for other variables in your actual migration.

I hope this post helps you plan and implement your migration and minimizes service outages. If you have questions or suggestions, please leave a comment below.

Appendix

To install the Percona Toolkit:

# Install Percona Toolkit

$ wget https://www.percona.com/downloads/percona-toolkit/3.0.2/binary/redhat/6/x86_64/percona-toolkit-3.0.2-1.el6.x86_64.rpm

$ yum install perl-IO-Socket-SSL

$ yum install perl-TermReadKey

$ rpm -Uvh percona-toolkit-3.0.2-1.el6.x86_64.rpm

# run pt-archiver

Example command:

$ pt-archiver –source h=localhost,D=blog,t=purchase –file ‘/data/export/%Y-%m-%d-%D.%t’  –where “1=1” –limit 10000 –commit-each

 


About the Author

Yong Seong Lee is a Cloud Support Engineer for AWS Big Data Services. He is interested in every technology related to data/databases and helping customers who have difficulties in using AWS services. His motto is “Enjoy life, be curious and have maximum experience.”

 

 

 


Converging Data Silos to Amazon Redshift Using AWS DMS

 

AWS Hot Startups – April 2017

Post Syndicated from Ana Visneski original https://aws.amazon.com/blogs/aws/aws-hot-startups-april-2017/

Spring is here, the flowers are blooming and Tina Barr is back with more great startups for you to check out!

-Ana


Welcome back to another month of hot AWS-powered startups! Today we have three exciting startups:

  • Beekeeper – simplifying employee communication in the workplace.
  • Betterment – making investing easier for everyone.
  • ClearSlide – a leading sales engagement platform.

Be sure to check out our March hot startups in case you missed them.

Beekeeper (Zurich, Switzerland)
Beekeeper logoFlavio Pfaffhauser and Christian Grossmann, both graduates of ETH Zurich, were passionate about building a technology that would connect and bring people together. What started as a student’s social community soon turned into Beekeeper – a communication platform for the workplace that allows employees to interact wherever they are. As Flavio and Christian learned how to build a social platform that engaged people properly, businesses began requesting a platform that could be adapted to their specific processes and needs. The platform started with the concept of helping people feel as if they are sitting right next to each other, whether they’re at a desk or in the field. Founded in 2012, Beekeeper is focused on improving information sharing, communication and peer collaboration, and the company strongly believes that listening to employees is crucial for organizations.

The “Mobile First, Desktop Friendly” platform has a simple and intuitive interface that easily integrates multiple operating systems into one ecosystem. The interface can be styled and customized to match a company’s brand and identity. Employees can connect with their colleagues anytime and anywhere with private and group chats, video and file sharing, and feedback surveys. With Beekeeper’s analytical dashboard leadership teams can identify trending topics of discussion and track employee engagement and app usage in real-time. Beekeeper is currently connecting users in 137 countries across industries including hospitality, construction, transportation, and more.

Beekeeper likes using AWS because it allows their engineers to focus on the things that really matter; solving customer issues. The company builds its infrastructure using services like Amazon EC2, Amazon S3, and Amazon RDS, all of which allow the technical teams to offload administrative tasks. Amazon Elastic Transcoder and Amazon QuickSight are used to build analytical dashboards and Amazon Redshift for data warehousing.

Check out the Beekeeper blog to keep up with their latest news!

Betterment (New York, NY)
Betterment logo
Betterment is on a mission to make investing easier and more accessible for everyone, no matter their financial goal. In 2008, Jon Stein founded Betterment with the intent to reinvent the industry and save future investors from making the same common mistakes he had been making. At that time, most people only had a couple of options when it came to investing their money – either do it yourself or hire another person to do it for you. Unfortunately, financial advisors are sometimes paid to recommend certain investments even if it’s not what is best for their clients. Betterment only chooses investments that are in their customer’s best interest and align with their financial goals. Today, they are the largest, independent online investment advisor managing more than $8 billion in assets for over 240,000 customers.

Betterment uses technology to make investing easier and more efficient, while also helping to increase after-tax returns. They offer a wide range of financial planning services that are personalized to their customer’s life goals. To start an investment plan, customers can input their age, retirement status, and annual income and Betterment will recommend how much money to invest and which type of account is the right choice. They will invest and manage it in a way that many traditional investment services can’t at a lower cost.

The engineers at Betterment are constantly working to build industry-changing technology as quickly as possible to help customers maximize their money. AWS gives Betterment the flexibility to easily provision infrastructure and offload functions to various services that once required entire teams to manage. When they first started in the cloud, Betterment was using standard implementations of Amazon EC2, Amazon RDS, and Amazon S3. Since they’ve gone all in with AWS, they have been leveraging services like Amazon Redshift, AWS Lambda, AWS Database Migration Service, Amazon Kinesis, Amazon DynamoDB, and more. Today, they are using over 20 AWS services to develop, test, and deploy features and enhancements on a daily basis.

Learn more about Betterment here.

ClearSlide (San Francisco, CA)
ClearSlide is one of today’s leading sales engagement platforms, offering a complete and integrated tool that makes every customer interaction successful. Since their founding in 2009, ClearSlide has looked for ways to improve customer experiences and have developed numerous enablement tools for sales leaders and teams, marketing, customer support teams, and more. The platform puts content, communication channels, and insights at their customer’s fingertips to help drive better decisions and manage opportunities. ClearSlide serves thousands of companies including Comcast, the Sacramento Kings, The Economist, and so far their customers have generated over 750 million minutes of engagement!

ClearSlide offers a solution for all parts of the sales process. For sales leaders, ClearSlide provides engagement dashboards to improve deal visibility, coaching, and sales forecast accuracy. For marketing and sales enablement teams, they guide sellers to the right content, at the right time, in the right context, and provide insight to maximize content ROI. For sales reps, ClearSlide integrates communications, content, and analytics in a single platform experience. Communications can be made across email, in-person or online meetings, web, or social. Today, ClearSlide customers report a 10-20% increase in closed deals, 25% decrease in onboarding time for new reps, and a 50-80% reduction in selling costs.

ClearSlide uses a range of AWS services, but Amazon EC2 and Amazon RDS have made the biggest impact on their business. EC2 enables them to easily scale compute capacity, which is critical for a fast-growing startup. It also provides consistency during deployment – from development and integration to staging and production. RDS reduces overhead and allows ClearSlide to scale their database infrastructure. Since AWS takes care of time-consuming database management tasks, ClearSlide sees a reduction in operations costs and can focus on being more strategic with their customers.

Watch this video to learn how LiveIntent reduced sales cycles by 22% using ClearSlide. Get all the latest updates by following them on Twitter!

Thanks for checking out another month of awesome AWS-powered startups!

-Tina

 

AWS Week in Review – March 6, 2017

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-week-in-review-march-6-2017/

This edition includes all of our announcements, content from all of our blogs, and as much community-generated AWS content as I had time for!

Monday

March 6

Tuesday

March 7

Wednesday

March 8

Thursday

March 9

Friday

March 10

Saturday

March 11

Sunday

March 12

Jeff;

 

AWS Database Migration Service – 20,000 Migrations and Counting

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-database-migration-service-20000-migrations-and-counting/

I first wrote about AWS Database Migration Service just about a year ago in my AWS Database Migration Service post. At that time I noted that over 1,000 AWS customers had already made use of the service as part of their move to AWS.

As a quick recap, AWS Database Migration Service and Schema Conversion Tool (SCT) help our customers migrate their relational data from expensive, proprietary databases and data warehouses (either on premises on in the cloud, but with restrictive licensing terms either way) to more cost-effective cloud-based databases and data warehouses such as Amazon Aurora, Amazon Redshift, MySQL, MariaDB, and PostgreSQL, with minimal downtime along the way. Our customers tell us that they love the flexibility and the cost-effective nature of these moves. For example, moving to Amazon Aurora gives them access to a database that is MySQL and PostgreSQL compatible, at 1/10th the cost of a commercial database. Take a peek at our AWS Database Migration Services Customer Testimonials to see how Expedia, Thomas Publishing, Pega, and Veoci have made use of the service.

20,000 Unique Migrations
I’m pleased to be able to announce that our customers have already used AWS Database Migration Service to migrate 20,000 unique databases to AWS and that the pace continues to accelerate (we reached 10,000 migrations in September of 2016).

We’ve added many new features to DMS and SCT over the past year. Here’s a summary:

Learn More
Here are some resources that will help you to learn more and to get your own migrations underway, starting with some recent webinars:

Migrating From Sharded to Scale-Up – Some of our customers implemented a scale-out strategy in order to deal with their relational workload, sharding their database across multiple independent pieces, each running on a separate host. As part of their migration, these customers often consolidate two or more shards onto a single Aurora instance, reducing complexity, increasing reliability, and saving money along the way. If you’d like to do this, check out the blog post, webinar recording, and presentation.

Migrating From Oracle or SQL Server to Aurora – Other customers migrate from commercial databases such as Oracle or SQL Server to Aurora. If you would like to do this, check out this presentation and the accompanying webinar recording.

We also have plenty of helpful blog posts on the AWS Database Blog:

  • Reduce Resource Consumption by Consolidating Your Sharded System into Aurora – “You might, in fact, save bunches of money by consolidating your sharded system into a single Aurora instance or fewer shards running on Aurora. That is exactly what this blog post is all about.”
  • How to Migrate Your Oracle Database to Amazon Aurora – “This blog post gives you a quick overview of how you can use the AWS Schema Conversion Tool (AWS SCT) and AWS Database Migration Service (AWS DMS) to facilitate and simplify migrating your commercial database to Amazon Aurora. In this case, we focus on migrating from Oracle to the MySQL-compatible Amazon Aurora.”
  • Cross-Engine Database Replication Using AWS Schema Conversion Tool and AWS Database Migration Service – “AWS SCT makes heterogeneous database migrations easier by automatically converting source database schema. AWS SCT also converts the majority of custom code, including views and functions, to a format compatible with the target database.”
  • Database Migration—What Do You Need to Know Before You Start? – “Congratulations! You have convinced your boss or the CIO to move your database to the cloud. Or you are the boss, CIO, or both, and you finally decided to jump on the bandwagon. What you’re trying to do is move your application to the new environment, platform, or technology (aka application modernization), because usually people don’t move databases for fun.”
  • How to Script a Database Migration – “You can use the AWS DMS console or the AWS CLI or the AWS SDK to perform the database migration. In this blog post, I will focus on performing the migration with the AWS CLI.”

The documentation includes five helpful walkthroughs:

There’s also a hands-on lab (you will need to register in order to participate).

See You at a Summit
The DMS team is planning to attend and present at many of our upcoming AWS Summits and would welcome the opportunity to discuss your database migration requirements in person.

Jeff;

 

 

Amazon RDS – 2016 in Review

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-rds-2016-in-review/

Even though we published 294 posts on this blog last year, I left out quite a number of worthwhile launches! Today I would like to focus on Amazon Relational Database Service (RDS) and recap all of the progress that the teams behind this family of services made in 2016. The team focused on four major areas last year:

  • High Availability
  • Enhanced Monitoring
  • Simplified Security
  • Database Engine Updates

Let’s take a look at each of these areas…

High Availability
Relational databases are at the heart of many types of applications. In order to allow our customers to build applications that are highly available, RDS has offered multi-AZ support since early 2010 (read Amazon RDS – Multi-AZ Deployments For Enhanced Availability & Reliability for more info). Instead of spending weeks setting up multiple instances, arranging for replication, writing scripts to detect network, instance, & network issues, making failover decisions, and bringing a new secondary instance online, you simply opt for Multi-AZ Deployment when you create the Database Instance. RDS also makes it easy for you to create cross-region read replicas.

Here are some of the other enhancements that we made in 2016 in order to help you to achieve high availability:

Enhanced Monitoring
We announced the first big step toward enhanced monitoring at the end of 2015 (New – Enhanced Monitoring for Amazon RDS) with support for MySQL, MariaDB, and Amazon Aurora and then made additional announcements in 2016:

Simplified Security
We want to make it as easy and simple as possible for you to use encryption to protect your data, whether it is at rest or in motion. Here are the enhancements that we made in this area last year:

Database Engine Updates
The open source community and the vendors of commercial databases add features and produce new releases at a rapid pace and we track their work very closely, aiming to update RDS as quickly as possible after each significant release. Here’s what we did in 2016:

Stay Tuned
We’ve already made some big announcements this year (you can find them in the AWS What’s New for 2017) with plenty more in store including the recently announced PostgreSQL-compatible version of Aurora, so stay tuned! You may also want to subscribe to the AWS Database Blog for detailed posts that will show you how to get the most from RDS, Amazon Aurora, and Amazon ElastiCache.

Jeff;

PS – This post does not include all of the enhancements that we made to AWS Database Migration Service or the Schema Conversion Tool last year. I’m working on another post on that topic.

AWS Online Tech Talks – February 2017

Post Syndicated from Tara Walker original https://aws.amazon.com/blogs/aws/aws-blog-february-2017-online-techtalks-series/

The New Year is underway, so there is no better time to dive into learning more about the latest AWS services. Each month, we have a series of webinars targeting best practices and new service features in AWS Cloud.

 

February Online Tech Talks (formerly known as Monthly Webinar Series)

I am excited to share the webinars schedule for the month of February. Remember all webinars noted are free, but they may fill up quickly so be sure to register ahead of time. Webinars are typically one hour in length and scheduled times are in Pacific Time (PT) time zone.

 

Webinars featured this month are as follows:

Tuesday, February 14

Mobile

10:30 AM – 11:30 AM: Test your Android App with Espresso and AWS Device Farm

 

Wednesday, February 15

Big Data

9:00 AM – 10:00 AM: Amazon Elasticsearch Service with Elasticsearch 5 and Kibana 5

Mobile

12:00 Noon – 1:00 PM: Deep Dive on AWS Mobile Hub for Enterprise Mobile Applications

 

Thursday, February 16

Security

9:00 AM – 10:00 AM: DNS DDoS mitigation using Amazon Route 53 and AWS Shield

 

Tuesday, February 21

Storage

9:00 AM – 10:00 AM: Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS

Databases

10:30 AM – 11:30 AM: Consolidate MySQL Shards Into Amazon Aurora Using AWS Database Migration Service

IoT

12:00 Noon – 1:00 PM: Getting Started with AWS IoT

 

Wednesday, February 22

IoT

10:30 AM – 11:30 AM: Best Practices with IoT Security

Databases

12:00 Noon – 1:00 PM: Migrate from SQL Server or Oracle into Amazon Aurora using AWS Database Migration Service

 

Wednesday, February 23

Enterprise

8:00 AM – 9:00 AM: How to Prepare for AWS Certification and Advance your Career

Storage

10:30 AM – 11:30 AM: Deep Dive on Elastic File System

12:00 Noon – 1:00 PM: Optimize MySQL Workloads with Amazon Elastic Block Store

 

Wednesday, February 24

Big Data

9:00 AM – 10:00 AM: Deep Dive of Flink & Spark on Amazon EMR

10:30 AM – 11:30 AM: Deep Dive on Amazon Redshift

 

The AWS Online Tech Talks series covers a broad range of topics at varying technical levels. These technical sessions are led by AWS solutions architects and engineers and feature live demonstrations & customer examples. You can check out the AWS online series here and the AWS on-demand webinar series on the AWS YouTube channel.