Tag Archives: Amazon RDS

Enabling job accounting for HPC with AWS ParallelCluster and Amazon RDS

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/enabling-job-accounting-for-hpc-with-aws-parallelcluster-and-amazon-rds/

This post is written by Nicola Venuti, HPC Specialist SA, and contributed to by Rex Chen, Software Development Engineer.

Introduction

Accounting, reporting, and advanced analytics used for data-driven planning and decision making are key areas of focus for High Performance Computing (HPC) Administrators. In the cloud, these areas tie directly to the cost of the services you consume, which in turn drives budgeting and forecasting of expenses. As new HPC services emerge that can analyze this data and take corrective actions, you can better optimize for performance, which reduces cost.

Solution Overview

In this blog post, we walk through an easy way to collect accounting information for every job and step executed in a cluster with job scheduling. This post uses Slurm and a new feature in the latest version (2.6.0) of AWS ParallelCluster, which makes this process easier than before. Accounting records are saved into a relational database for both currently executing jobs and jobs that have already terminated.

Prerequisites

This tutorial assumes you already have a cluster in AWS ParallelCluster. If  you don’t, refer to the AWS ParallelCluster documentation, a getting started blog post, or a how-to blog post.

Solution

Choose your architecture

There are two common architectures to save job accounting information into a database:

  1. Installing and directly managing a DBMS in the master node of your cluster (or in an additional EC2 instance dedicated to it)
  2. Using a fully managed service like Amazon Relational Database Service (RDS)

While the first option might appear to be the most economical solution, it requires heavy lifting. You must install and manage the database, which is not a core part of running your HPC workloads.  Alternatively, Amazon RDS reduces this burden of installing updates, managing security patches, and properly allocating resources.  Additionally, Amazon RDS Free Tier can get you started with a managed database service in the cloud for free. Refer to the hyperlink for a list of free resources.

Amazon RDS is my preferred choice, and the following sections implement this architecture. Bear in mind, however, that the settings and the customizations required in the AWS ParallelCluster environment are the same, regardless of which architecture you prefer.

 

Set up and configure your database

Now, with your architecture determined, let’s configure it.  First, go to Amazon RDS’ console.  Select the same Region where your AWS ParallelCluster is deployed, then click on Create Database.

There are two database engines to consider: Amazon Aurora and MySQL.

Amazon Aurora has many benefits compared to MySQL. However, in this blog post, I use MySQL to show how to build your HPC accounting database within the Free-tier offering.

The following steps are the same regardless of your database choice. So, if you’re interested in one of the many features that differentiate Amazon Aurora from MySQL, feel free to use it instead. Check out Amazon Aurora’s landing page to learn more about its benefits, such as its faster performance and cost effectiveness.

To configure your database, you must complete the following steps:

  1. Name the database
  2. Establish credential settings
  3. Select the DB instance size
  4. Identify storage type
  5. Allocate amount of storage

The following images show the settings that I chose for storage options and the “Free tier” template.  Feel free to change them according to the scope and the usage you expect.

Make sure you select the same VPC where AWS ParallelCluster deployed your “compute fleet,” and select the Security Group of your compute fleet.  You can find this information in your AWS ParallelCluster config file. The Security Group should look something like this: “parallelcluster-XXX-ComputeSecurityGroup-XYZ”.
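
If you prefer to manage the network access from the command line, the following is a minimal sketch of opening the MySQL port of the database’s security group to the compute fleet’s security group. The group IDs are placeholders that you must replace with your own values.

# sg-0123exampledb is a placeholder for the security group attached to your RDS database;
# sg-0456examplefleet is a placeholder for parallelcluster-XXX-ComputeSecurityGroup-XYZ.
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123exampledb \
    --protocol tcp \
    --port 3306 \
    --source-group sg-0456examplefleet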

At this stage, you can click on Create database and wait until the Database status moves from Creating to Available in the Amazon RDS Dashboard.

The last step for this section is to grant privileges on the database.

  1. Connect to your database. Use the master node of your AWS ParallelCluster as a client.
  2. Install the MySQL client by running sudo yum install mysql on Amazon Linux and CentOS, or sudo apt-get install mysql-client on Ubuntu.
  3. Connect to your MySQL RDS database using the following code: mysql --host=<your_rds_endpoint> --port=3306 -u admin -p. The following screenshot shows how to find your RDS endpoint and port.

 

4. Run GRANT ALL ON `%`.* TO admin@`%`; to grant the required privileges.

The following code demonstrates these steps together:

[[email protected]]$ mysql --host=parallelcluster-accounting.c68dmmc6ycyr.us-east-1.rds.amazonaws.com --port=3306 -u admin -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 251
Server version: 8.0.16 Source distribution

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> GRANT ALL ON `%`.* TO admin@`%`;

Note: Typically this command is run as GRANT ALL ON *.* TO 'admin'@'%';. With Amazon RDS, for security reasons, this is not possible because the master account does not have full access to the MySQL system database, and using *.* triggers an error. To work around this, I use the _ and % wildcards, which are permitted. To look at the actual grants, you can run the following: SHOW GRANTS;

Enable Slurm Database logging

Now, your database is fully configured. The next step is to configure Slurm, the workload manager, to log its job accounting data to it.

A few steps must occur to let Slurm log its job accounting information to an external database. The following code demonstrates the steps you must take.

  1. Add the DB configuration file, slurmdbd.conf, under /opt/slurm/etc/ 
  2. Slurm’s slurm.conf file requires a few modifications. These changes are noted after the following code examples.

 

Note: You do not need to configure each and every compute node because AWS ParallelCluster installs Slurm in a shared directory. All compute nodes share this directory, and thus the same configuration files, with the master node of your cluster.

Below, you can find two example configuration files that you can use just by modifying a few parameters according to your setup.

For more information about all the possible settings of configuration parameters, please refer to the official Slurm documentation, and in particular to the accounting section.

Add the DB configuration file

#
## Sample /opt/slurm/etc/slurmdbd.conf
#
ArchiveEvents=yes
ArchiveJobs=yes
ArchiveResvs=yes
ArchiveSteps=no
ArchiveSuspend=no
ArchiveTXN=no
ArchiveUsage=no
AuthType=auth/munge
DbdHost=ip-10-0-16-243  #YOUR_MASTER_IP_ADDRESS_OR_NAME
DbdPort=6819
DebugLevel=info
PurgeEventAfter=1month
PurgeJobAfter=12month
PurgeResvAfter=1month
PurgeStepAfter=1month
PurgeSuspendAfter=1month
PurgeTXNAfter=12month
PurgeUsageAfter=24month
SlurmUser=slurm
LogFile=/var/log/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
StorageType=accounting_storage/mysql
StorageUser=admin
StoragePass=password
StorageHost=parallelcluster-accounting.c68dmmc6ycyr.us-east-1.rds.amazonaws.com # Endpoint from RDS console
StoragePort=3306                                                                # Port from RDS console

See below for key values that you should plug into the example configuration file:

  • DbdHost: the name of the machine where the Slurm Database Daemon is executed. This is typically the master node of your AWS ParallelCluster. You can run hostname -s on your master node to get this value.
  • DbdPort: The port number that the Slurm Database Daemon (slurmdbd) listens to for work. 6819 is the default value.
  • StorageUser: Define the user name used to connect to the database. This has been defined during the Amazon RDS configuration as shown in the second step of the previous section.
  • StoragePass: Define the password used to gain access to the database. This was defined, along with the user name, in the credential settings during the Amazon RDS configuration.
  • StorageHost: Define the name of the host running the database. You can find this value in the Amazon RDS console, under “Connectivity & security”.
  • StoragePort: Define the port on which the database is listening. You can find this value in the Amazon RDS console, under “Connectivity & security” (see the screenshot below for more information, and the CLI lookup sketch after this list).
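
If you prefer to look these values up from the command line rather than from the console, the following is a minimal sketch. The database identifier parallelcluster-accounting is an assumption taken from the example endpoint above; replace it with your own instance name.

# DbdHost: short hostname of the master node (run this on the master node itself)
hostname -s

# StorageHost and StoragePort: endpoint and port of the RDS instance.
# "parallelcluster-accounting" is assumed from the example above; use your own DB identifier.
aws rds describe-db-instances \
    --db-instance-identifier parallelcluster-accounting \
    --query 'DBInstances[0].Endpoint.[Address,Port]' \
    --output text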

Modify the slurm.conf file

Add the following lines at the end of the slurm configuration file:

#
## /opt/slurm/etc/slurm.conf
#
# ACCOUNTING
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=30
#
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=ip-10-0-16-243
AccountingStorageUser=admin
AccountingStoragePort=6819

Modify the following:

  • AccountingStorageHost: The hostname or address of the host where SlurmDBD executes. In our case, this is again the master node of our AWS ParallelCluster; you can get this value by running hostname -s.
  • AccountingStoragePort: The network port that SlurmDBD accepts communication on. It must be the same as DbdPort specified in /opt/slurm/etc/slurmdbd.conf
  • AccountingStorageUser: It must be the same user name as in /opt/slurm/etc/slurmdbd.conf (specified in the “Credential Settings” of your Amazon RDS database).

 

Restart the Slurm service and start the slurmdbd daemon on the master node

 

Depending on the operating system you are running, this would look like:

  • Amazon Linux / Amazon Linux 2
[[email protected]]$ sudo /etc/init.d/slurm restart                                                                                                                                                                                                                                             
stopping slurmctld:                                        [  OK  ]
slurmctld is stopped
slurmctld is stopped
starting slurmctld:                                        [  OK  ]
[[email protected]]$ 
[[email protected]]$ sudo /opt/slurm/sbin/slurmdbd
  • CentOS7 and Ubuntu 16/18
[[email protected] ~]$ sudo /opt/slurm/sbin/slurmdbd
[[email protected] ~]$ sudo systemctl restart slurmctld

Note: even if you have jobs running, restarting the daemons will not affect them.

Check to see if your cluster is already in the Slurm Database:

/opt/slurm/bin/sacctmgr list cluster

And if it is not (see below):

[[email protected]]$ /opt/slurm/bin/sacctmgr list cluster
   Cluster     ControlHost  ControlPort   RPC     Share GrpJobs       GrpTRES GrpSubmit MaxJobs       MaxTRES MaxSubmit     MaxWall                  QOS   Def QOS 
---------- --------------- ------------ ----- --------- ------- ------------- --------- ------- ------------- --------- ----------- -------------------- --------- 
[[email protected]]$ 

You can add it as follows:

sudo /opt/slurm/bin/sacctmgr add cluster parallelcluster

You should now see something like the following:

[[email protected]]$ /opt/slurm/bin/sacctmgr list cluster
   Cluster     ControlHost  ControlPort   RPC     Share GrpJobs       GrpTRES GrpSubmit MaxJobs       MaxTRES MaxSubmit     MaxWall                  QOS   Def QOS 
---------- --------------- ------------ ----- --------- ------- ------------- --------- ------- ------------- --------- ----------- -------------------- --------- 
parallelc+     10.0.16.243         6817  8704         1                                                                                           normal           
[[email protected]]$ 

At this stage, you should be all set with your AWS ParallelCluster accounting configured to be stored in the Amazon RDS database.

Replicate the process on multiple clusters

The same database instance can easily be used by multiple clusters to log their accounting data. To do this, repeat the last configuration steps on every cluster built with AWS ParallelCluster that you want to share the same database.

The additional steps to follow are:

  • Ensure that all the clusters are in the same VPC (or, if you prefer to use multiple VPCs, you can choose to set up VPC-Peering)
  • Add the SecurityGroup of your new compute fleets (“parallelcluster-XXX-ComputeSecurityGroup-XYZ”) to your RDS database
  • In addition to the slurm configuration file (/opt/slurm/etc/slurm.conf) edits explained prior, change the ClusterName parameter at the very top of that file. By default, your cluster is called “parallelcluster.” You may want to change that to clearly identify other clusters using the same database, for instance: ClusterName=parallelcluster2 (see the sketch after this list)
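
As an illustration only, the following sketch shows what those edits could look like on the master node of a second cluster. The name parallelcluster2 and the sed one-liner are assumptions; they presume the default ClusterName=parallelcluster line is present at the top of slurm.conf.

# Rename the cluster in the Slurm configuration (run on the master node of the second cluster)
sudo sed -i 's/^ClusterName=parallelcluster$/ClusterName=parallelcluster2/' /opt/slurm/etc/slurm.conf

# Restart the scheduler so it picks up the new name, then register the cluster in the accounting DB
sudo systemctl restart slurmctld      # on Amazon Linux use: sudo /etc/init.d/slurm restart
sudo /opt/slurm/bin/sacctmgr -i add cluster parallelcluster2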

Once these additional steps are complete, you can run /opt/slurm/bin/sacctmgr list cluster  again. Now, you should see two (or multiple) clusters:

[[email protected] ]# /opt/slurm/bin/sacctmgr list cluster
   Cluster     ControlHost  ControlPort   RPC     Share GrpJobs       GrpTRES GrpSubmit MaxJobs       MaxTRES MaxSubmit     MaxWall                  QOS   Def QOS 
---------- --------------- ------------ ----- --------- ------- ------------- --------- ------- ------------- --------- ----------- -------------------- --------- 
parallelc+                            0  8704         1                                                                                           normal           
parallelc+     10.0.16.129         6817  8704         1                                                                                           normal           
[[email protected] ]#   

If you want to see the full name of your clusters, run the following:

[[email protected] ]# /opt/slurm/bin/sacctmgr list cluster format=cluster%30
                       Cluster 
------------------------------ 
               parallelcluster 
              parallelcluster2 
[[email protected] ]# 

Note: If you check the Slurm logs (under /var/log/slurm*), you may see this error:

error: Database settings not recommended values: innodb_buffer_pool_size innodb_lock_wait_timeout

This error refers to default parameters that Amazon RDS sets for you on your MySQL database. You can change them by setting up a new “parameter group” as explained in the official documentation and in this support article. Please also note that innodb_buffer_pool_size is related to the amount of memory available on your instance, so you may want to use a different instance type with more memory to avoid this warning.
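
As a rough sketch of that change, you could create a custom parameter group, raise innodb_lock_wait_timeout, and attach the group to your instance. The group name, parameter group family, and value below are assumptions; adjust them to your engine version and workload.

# Names, family, and values are illustrative; adjust to your engine version and needs.
aws rds create-db-parameter-group \
    --db-parameter-group-name slurm-acct-mysql8 \
    --db-parameter-group-family mysql8.0 \
    --description "Tuned parameters for Slurm accounting"

aws rds modify-db-parameter-group \
    --db-parameter-group-name slurm-acct-mysql8 \
    --parameters "ParameterName=innodb_lock_wait_timeout,ParameterValue=900,ApplyMethod=immediate"

# Attach the parameter group to the instance (a reboot may be required for it to fully apply)
aws rds modify-db-instance \
    --db-instance-identifier parallelcluster-accounting \
    --db-parameter-group-name slurm-acct-mysql8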

Run your first job and check the accounting

Now that the application is installed and configured, you can test it! Submit a job to Slurm, query your database, and check your job accounting information.

If you are using a brand new cluster, test it with a simple hostname job as follows:

[[email protected]]$ sbatch -N2 <<EOF
> #!/bin/sh
> srun hostname |sort
> srun sleep 10
> EOF
Submitted batch job 31
[[email protected]]$

Immediately after you have submitted the job, you should see it with a state of “pending”:

[[email protected]]$ sacct
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
38               sbatch    compute                     2    PENDING      0:0 
[[email protected]]$ 

And, after a while the job should be “completed”:

[[email protected]]$ sacct
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
38               sbatch    compute                     2  COMPLETED      0:0 
38.batch           batch                                1  COMPLETED      0:0 
38.0           hostname                                2  COMPLETED      0:0 
[[email protected]]$

Now that you know your cluster works, you can build complex queries using sacct. See a few examples below, and refer to the official documentation for more details:

[[email protected]]$ sacct --format=jobid,elapsed,ncpus,ntasks,state
       JobID    Elapsed      NCPUS   NTasks      State 
------------ ---------- ---------- -------- ---------- 
38             00:00:00          2           COMPLETED 
38.batch       00:00:00          1        1  COMPLETED 
38.0           00:00:00          2        2  COMPLETED 
39             00:00:00          2           COMPLETED 
39.batch       00:00:00          1        1  COMPLETED 
39.0           00:00:00          2        2  COMPLETED 
40             00:00:10          2           COMPLETED 
40.batch       00:00:10          1        1  COMPLETED 
40.0           00:00:00          2        2  COMPLETED 
40.1           00:00:10          2        2  COMPLETED 
[[email protected]]$ sacct --allocations
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
38               sbatch    compute                     2  COMPLETED      0:0 
39               sbatch    compute                     2  COMPLETED      0:0 
40               sbatch    compute                     2  COMPLETED      0:0 
[[email protected]]$ sacct -S2020-02-17 -E2020-02-20 -X -ojobid,start,end,state
       JobID               Start                 End      State 
------------ ------------------- ------------------- ---------- 
38           2020-02-17T12:25:12 2020-02-17T12:25:12  COMPLETED 
39           2020-02-17T12:25:12 2020-02-17T12:25:12  COMPLETED 
40           2020-02-17T12:27:59 2020-02-17T12:28:09  COMPLETED 
[[email protected]]$ 

If you have configured your cluster(s) for multiple users, you may want to look at the accounting info for all of them. If you have not yet configured your clusters for multiple users, follow this blog post. It demonstrates how to configure AWS ParallelCluster with AWS Directory Services to create a multiuser, POSIX-compliant system with centralized authentication.

Each user can only look at their own accounting data. However, Slurm admins (or root) can see accounting info for every user. The following code shows accounting data coming from two clusters (parallelcluster and parallelcluster2) and from two users (ec2-user and nicola):

[[email protected] ~]# sacct -S 2020-01-01 --clusters=parallelcluster,parallelcluster2 --format=jobid,elapsed,ncpus,ntasks,state,user,cluster%20                                                                                                                                                        
       JobID    Elapsed      NCPUS   NTasks      State      User              Cluster 
------------ ---------- ---------- -------- ---------- --------- -------------------- 
36             00:00:00          2              FAILED  ec2-user      parallelcluster 
36.batch       00:00:00          1        1     FAILED                parallelcluster 
37             00:00:00          2           COMPLETED  ec2-user      parallelcluster 
37.batch       00:00:00          1        1  COMPLETED                parallelcluster 
37.0           00:00:00          2        2  COMPLETED                parallelcluster 
38             00:00:00          2           COMPLETED  ec2-user      parallelcluster 
38.batch       00:00:00          1        1  COMPLETED                parallelcluster 
38.0           00:00:00          2        2  COMPLETED                parallelcluster 
39             00:00:00          2           COMPLETED  ec2-user      parallelcluster 
39.batch       00:00:00          1        1  COMPLETED                parallelcluster 
39.0           00:00:00          2        2  COMPLETED                parallelcluster 
40             00:00:10          2           COMPLETED  ec2-user      parallelcluster 
40.batch       00:00:10          1        1  COMPLETED                parallelcluster 
40.0           00:00:00          2        2  COMPLETED                parallelcluster 
40.1           00:00:10          2        2  COMPLETED                parallelcluster 
41             00:00:29        144              FAILED  ec2-user      parallelcluster 
41.batch       00:00:29         36        1     FAILED                parallelcluster 
41.0           00:00:00        144      144  COMPLETED                parallelcluster 
41.1           00:00:01        144      144  COMPLETED                parallelcluster 
41.2           00:00:00          3        3  COMPLETED                parallelcluster 
41.3           00:00:00          3        3  COMPLETED                parallelcluster 
42             01:22:03        144           COMPLETED  ec2-user      parallelcluster 
42.batch       01:22:03         36        1  COMPLETED                parallelcluster 
42.0           00:00:01        144      144  COMPLETED                parallelcluster 
42.1           00:00:00        144      144  COMPLETED                parallelcluster 
42.2           00:00:39          3        3  COMPLETED                parallelcluster 
42.3           00:34:55          3        3  COMPLETED                parallelcluster 
43             00:00:11          2           COMPLETED  ec2-user      parallelcluster 
43.batch       00:00:11          1        1  COMPLETED                parallelcluster 
43.0           00:00:01          2        2  COMPLETED                parallelcluster 
43.1           00:00:10          2        2  COMPLETED                parallelcluster 
44             00:00:11          2           COMPLETED    nicola      parallelcluster 
44.batch       00:00:11          1        1  COMPLETED                parallelcluster 
44.0           00:00:01          2        2  COMPLETED                parallelcluster 
44.1           00:00:10          2        2  COMPLETED                parallelcluster 
4              00:00:10          2           COMPLETED    nicola     parallelcluster2 
4.batch        00:00:10          1        1  COMPLETED               parallelcluster2 
4.0            00:00:00          2        2  COMPLETED               parallelcluster2 
4.1            00:00:10          2        2  COMPLETED               parallelcluster2 
[[email protected] ~]# 

You can also directly query your database, and look at the accounting information stored in it or link your preferred BI tool to get insights from your HPC cluster. To do so, run the following code:

[[email protected]]$ mysql --host=parallelcluster-accounting.c68dmmc6ycyr.us-east-1.rds.amazonaws.com --port=3306 -u admin -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 48
Server version: 8.0.16 Source distribution

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| slurm_acct_db      |
+--------------------+
4 rows in set (0.00 sec)

mysql> use slurm_acct_db;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+-----------------------------------------+
| Tables_in_slurm_acct_db                 |
+-----------------------------------------+
| acct_coord_table                        |
| acct_table                              |
| clus_res_table                          |
| cluster_table                           |
| convert_version_table                   |
| federation_table                        |
| parallelcluster_assoc_table             |
| parallelcluster_assoc_usage_day_table   |
| parallelcluster_assoc_usage_hour_table  |
| parallelcluster_assoc_usage_month_table |
| parallelcluster_event_table             |
| parallelcluster_job_table               |
| parallelcluster_last_ran_table          |
| parallelcluster_resv_table              |
| parallelcluster_step_table              |
| parallelcluster_suspend_table           |
| parallelcluster_usage_day_table         |
| parallelcluster_usage_hour_table        |
| parallelcluster_usage_month_table       |
| parallelcluster_wckey_table             |
| parallelcluster_wckey_usage_day_table   |
| parallelcluster_wckey_usage_hour_table  |
| parallelcluster_wckey_usage_month_table |
| qos_table                               |
| res_table                               |
| table_defs_table                        |
| tres_table                              |
| txn_table                               |
| user_table                              |
+-----------------------------------------+
29 rows in set (0.00 sec)

mysql>
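
As a starting point for your own queries, the following sketch inspects the job table and counts jobs per state. Column names in the Slurm accounting schema can change between Slurm versions, so verify them with DESCRIBE before building anything on top of this.

# Inspect the job table schema first; the state column holds numeric Slurm job-state codes.
mysql --host=parallelcluster-accounting.c68dmmc6ycyr.us-east-1.rds.amazonaws.com --port=3306 \
      -u admin -p slurm_acct_db -e "DESCRIBE parallelcluster_job_table;"

# Count jobs per (numeric) state across the whole accounting history.
mysql --host=parallelcluster-accounting.c68dmmc6ycyr.us-east-1.rds.amazonaws.com --port=3306 \
      -u admin -p slurm_acct_db -e "SELECT state, COUNT(*) AS jobs FROM parallelcluster_job_table GROUP BY state;"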

Conclusion

You’re finally all set! In this blog post you set up a database using Amazon RDS, configured AWS ParallelCluster and Slurm to enable job accounting with your database, and learned how to query your job accounting history from your database using the sacct command or by running SQL queries.

Deriving insights for your HPC workloads doesn’t end when your workloads finish running. Now, you can better understand and optimize your usage patterns and generate ideas about how to wring more price-performance out of your HPC clusters on AWS. For retrospective analysis, you can easily understand whether specific jobs, projects, or users are responsible for driving your HPC usage on AWS.  For forward-looking analysis, you can better forecast future usage to set budgets with appropriate insight into your costs and your resource consumption.

You can also use these accounting systems to identify users who may require additional training on how to make the most of cloud resources on AWS. Finally, together with your spending patterns, you can better capture and explain the return on investment from all of the valuable HPC work you do. And, this gives you the raw data to analyze how you can get even more value and price-performance out of the work you’re doing on AWS.

 

 

 

 

Amazon Lightsail Database Tips and Tricks

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/amazon-lightsail-database-tips-and-tricks/

This post is contributed by Mike Coleman | Developer Advocate for Lightsail | Twitter: @mikegcoleman

Managed Databases on Amazon Lightsail are affordably priced, and incredibly easy to run. Lightsail databases offer a solid foundation on which to build your application.  You can leverage attractive features like one-click high availability, automatic backups, and a choice of database engines to support your Lightsail apps.

While it’s super simple to do an initial deployment on Amazon Lightsail, I often get questions about how to perform some standard management tasks. Some examples of these tasks are scaling up a database or accessing that database with command line tools. I am also asked how to handle a scenario when you find that you need some of the advanced features found in Amazon Relational Database Service (RDS).

This blog answers these questions and offers general guidance on how to address these issues.

Scale Up Your Database

When I first deploy resources to the cloud, I always choose the least expensive option. Oftentimes, that choice works out and everything runs fine. But sometimes, this results in undersizing resources, which necessitates a move to resources with more horsepower.

If this happens with your Lightsail databases, it’s straightforward to move your database to a larger size. Additionally, you can check the metrics page in the Amazon Lightsail console to see your database performance, and to determine if you need to upgrade.

Let’s walk through how to size up your database.

Start by creating a snapshot of your instance.

  1. Navigate to the Lightsail home page and click databases
  2. Click on the name of your database
  3. From the horizontal menu, click on Snapshots & restore
  4. Under Manual Snapshot, click + Create snapshot
  5. Give the snapshot a name
  6. Click Create

It takes several minutes for the snapshot creation process to complete. Once the snapshot is available, you can create your new database instance choosing a larger size.

  1. Click the three-dot menu to the right of the snapshot you just created
  2. Choose Create new database
  3. Under Choose your database plan, select either a Standard or High Availability plan. If you’re running a mission-critical application, you definitely want to choose the High Availability option. Standard is great for test environments or workloads where your application can withstand downtime in the event of a database failure.
  4. Choose the size for your new database instance
  5. Give your database instance a name
  6. Click Create database

The new database is created after several minutes.
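
If you prefer scripting this over clicking through the console, the Lightsail CLI exposes the same snapshot and restore operations. The database name, snapshot name, and bundle ID below are placeholders; run get-relational-database-bundles to see the sizes available to you.

# Placeholder names; replace my-db, my-db-snap, and the bundle ID with your own values.
aws lightsail create-relational-database-snapshot \
    --relational-database-name my-db \
    --relational-database-snapshot-name my-db-snap

# List the available database sizes (bundles), then restore the snapshot into a larger one.
aws lightsail get-relational-database-bundles

aws lightsail create-relational-database-from-snapshot \
    --relational-database-snapshot-name my-db-snap \
    --relational-database-name my-db-large \
    --relational-database-bundle-id medium_1_0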

Lightsail generates a new password when you create a new database from a snapshot. You can either use this newly generated password, or change it. You can change the password using the following steps:

  1. From the Lightsail home page, click Databases
  2. Scroll down to the Connection details section
  3. If you want to use the auto-generated password, click Show in the password box to display the password
    Otherwise complete steps 4 and 5 to specify a new password.
  4. Under Password, click Change password
  5. Enter a new password and click Save
    It will take a few minutes for the password to update

Now, go into your application. Configure the application to point to the new database using the new endpoint, user name, and password values.

Note: It’s out of the scope for this blog to cover how to configure individual applications. Consult your application documentation to see how to do it for your specific application.

Command Line Access

There may be times when you need to work on your database using command line tools. You cannot connect directly to your Lightsail database instance. But, you can access the database remotely from another Lightsail instance.

You can also make your instance accessible via the public internet, and access it remotely from any internet-connected computer. However, I wouldn’t recommend this from a security perspective.

You first must create a new Lightsail instance to get started accessing your Lightsail database via the command line. I recommend basing your instance on Lightsail’s LAMP blueprint because there are MySQL command line tools already installed.

To create a new LAMP instance, do the following:

  1. From the Lightsail home page, click Create Instance
  2. Make sure you create the instance in the same Region as your Lightsail database
  3. Under Select a blueprint, choose LAMP (PHP 7)
  4. Since you’re only using this instance to run MySQL command line tools, you can choose the smallest instance size
  5. Give your instance a name
  6. Click Create Instance

It takes a few minutes for your new instance to start up.

To check that everything is working correctly, use the MySQL command line interface.

Make sure you have the database user name, password, and endpoint. These can be found by clicking on the name of your database under the Connection details section.

  1. Use either your own SSH client or the built-in web client to access the Lightsail instance you just created
  2. On the command line, enter the following command substituting the values for your database
mysql \
--host <lightsail database endpoint> \
--user <lightsail database username> \
--password

For example:

mysql \
--host ls-randomchars.us-east-2.rds.amazonaws.com \
--user dbmasteruser \
--password

Notice that you don’t actually put the password on the command line.

3. When prompted enter the password (note that the password will not show up when you enter it)

4. You should now be at the MySQL command prompt

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 87482
Server version: 5.7.26-log Source distribution

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

From here, you can use the command line as you normally would.

Migrating From a Managed Database to Amazon RDS

One of the great things about Lightsail is that it’s easy to get started quickly. It also gives you an easy migration path to more advanced AWS services, should you ever need them. For instance, you might set up your database on Lightsail, and then realize that it could benefit from read replicas to handle growing traffic. Fortunately, it’s a pretty straightforward process to migrate your data from Lightsail to RDS.

 

Deploy an Amazon RDS database

First, make sure you have an RDS database running the same engine in the same Region as your Lightsail instance, and in your default Amazon VPC. For example, if your Lightsail database is running MySQL in the Oregon Region, RDS should also be running MySQL in the Oregon Region and in the default VPC. If you’re not sure how to create an RDS database, check out their documentation.

Make sure to note the username and password for your new database.

Create a Lightsail Instance

You also need a Lightsail instance with the MySQL command line tools installed. You can set one up by following the instructions in the previous section of this blog.

Enable VPC Peering

To get started, ensure that the Lightsail VPC can communicate with the VPC where your RDS database runs. You do this by enabling VPC peering in Lightsail, and by modifying the security group for RDS to allow traffic from the Lightsail VPC.

  1. Return to the Lightsail console home page and click Account in the top-right corner. Choose Account from the pop out menu.
  2. Click Advanced on the horizontal menu
  3. Under VPC peering, ensure that the Enable VPC peering box is checked for the Region where your database is deployed (a CLI sketch follows this list).
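
The same setting can be toggled from the AWS CLI; here is a minimal sketch, assuming your database lives in us-west-2.

# Enable VPC peering for the Region where your Lightsail database is deployed (us-west-2 is an assumption).
aws lightsail peer-vpc --region us-west-2

# Confirm that the peering connection is in place.
aws lightsail is-vpc-peered --region us-west-2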

Adjust the RDS database security group

The next step is to edit the security group for the RDS instance to allow traffic from the Lightsail subnet.

  1. Return to the RDS console home page
  2. Under Resources, click on DB Instances
  3. Click on the name of the database you want to migrate data into
  4. Under Connectivity & security, click on the security group name

The security group dialog appears. From here you can add an entry for the Lightsail subnet.

  1. Click the Inbound tab near the bottom of the screen
  2. Click the Edit button
  3. Click Add rule in the pop-up box
  4. From the Type drop-down choose MySQL/Aurora
  5. In the source box, enter 172.26.0.0/16 (this is the CIDR address for the Lightsail subnet)
  6. Click Save

Migrate the data from the Lightsail Database to RDS

Now that Lightsail resources can talk with your RDS database, you can do the actual migration.

The initial step is to use mysqldump to export your database information into a file that can be imported into RDS. mysqldump has many options. In this case, you export a database named tasks. Choose the appropriate database for your use case, as well as any other options that make sense.

  1. Use either your own SSH client or the built-in web client to access the Lightsail instance you just created.
  2. Use the following mysqldump command to create a backup of your database to a text file (dump.sql). Substitute the connection values for your Lightsail database. These values  are on the details page of your database under Connection details. The database name must be specific to your environment.
mysqldump \
--host <lightsail database endpoint> \
--user <lightsail database username> \
--databases <database name> \
--password \
> dump.sql

For example:

mysqldump \
--host ls-randomchars.us-west-2.rds.amazonaws.com \
--user dbmasteruser \
--databases tasks \
--password \
--set-gtid-purged=OFF \
> dump.sql

Now that you have a database backup, you can import that into your RDS instance. You need the connection details from your RDS database. Use the username and password from when you created the database. You can find the endpoint on the details page of your database under Connectivity and security (See the following screenshot for an example).

endpoint and port for connectivity and security

If you are not already, return to the terminal session for the Lightsail instance that has the MySQL tools installed.

To import the data into the RDS database, you must provide the contents of the dump.sql file to the mysql command line tool. The cat command lists out the file, and by using | (referred to as a pipe) we can send the output directly from that command into mysql.

cat dump.sql | \
mysql \
--host <RDS database endpoint> \
--user <RDS user> \
--password

For example:

cat dump.sql | \
mysql \
--host database.randomchars.us-west-2.rds.amazonaws.com \
--user dbmasteruser \
--password

You can also use the mysql command to see if the database was created. This is similar to what we did when we passed in the file in the previous step, except this time we’re using echo to pipe in the command show databases;

echo "show databases;" | \
mysql \
--host <RDS database endpoint> \
--user <RDS user> \
--password

For example:

echo "show databases;" | \
mysql \
--host database.randomchars.us-west-2.rds.amazonaws.com \
--user dbmasteruser \
--password

From here, you reconfigure your application to access your new RDS database.

Conclusion

In this post I reviewed some common tasks that you might want to do once you created your Amazon Lightsail database. You learned how to scale up the size of your database, how to access it with command line tools, and how to migrate to RDS.

If you’ve not yet deployed a Managed Database on Lightsail, why not head over to the Lightsail console and create one now? If you need a bit of guidance to get started, we have a workshop at https://lightsailworkshop.com that will show you how to use Lightsail to deploy a two-tier web application using a MySQL database backend. Please feel free to leave comments and questions for future blog posts.

Halodoc: Building the Future of Tele-Health One Microservice at a Time

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/halodoc-building-the-future-of-tele-health-one-microservice-at-a-time/

Halodoc, a Jakarta-based healthtech platform, uses tele-health and artificial intelligence to connect patients, doctors, and pharmacies. Join builder Adrian De Luca for this special edition of This is My Architecture as he dives deep into the solutions architecture of this Indonesian healthtech platform that provides healthcare services in one of the most challenging traffic environments in the world.

Explore how the company evolved its monolithic backend into decoupled microservices with Amazon EC2 and Amazon Simple Queue Service (SQS), adopted serverless to cost effectively support new user functionality with AWS Lambda, and manages the high volume and velocity of data with Amazon DynamoDB, Amazon Relational Database Service (RDS), and Amazon Redshift.

For more content like this, subscribe to our YouTube channels This is My Architecture, This is My Code, and This is My Model, or visit the This is My Architecture AWS website, which has search functionality and the ability to filter by industry, language, and service.

ICYMI: Serverless Q4 2019

Post Syndicated from Rob Sutter original https://aws.amazon.com/blogs/compute/icymi-serverless-q4-2019/

Welcome to the eighth edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, check out what happened last quarter here.

The three months comprising the fourth quarter of 2019

AWS re:Invent

AWS re:Invent 2019

re:Invent 2019 dominated the fourth quarter at AWS. The serverless team presented a number of talks, workshops, and builder sessions to help customers increase their skills and deliver value more rapidly to their own customers.

Serverless talks from re:Invent 2019

Chris Munns presenting 'Building microservices with AWS Lambda' at re:Invent 2019

We presented dozens of sessions showing how customers can improve their architecture and agility with serverless. Here are some of the most popular.

Videos

Decks

You can also find decks for many of the serverless presentations and other re:Invent presentations on our AWS Events Content page.

AWS Lambda

For developers needing greater control over performance of their serverless applications at any scale, AWS Lambda announced Provisioned Concurrency at re:Invent. This feature enables Lambda functions to execute with consistent start-up latency, making them ideal for building latency-sensitive applications.
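
As a quick illustration, you could enable it on a function alias from the CLI; the function name, alias, and concurrency value here are placeholders.

# Provisioned concurrency is configured per published version or alias (placeholders shown).
aws lambda put-provisioned-concurrency-config \
    --function-name my-function \
    --qualifier live \
    --provisioned-concurrent-executions 100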

As shown in the below graph, provisioned concurrency reduces tail latency, directly impacting response times and providing a more responsive end user experience.

Graph showing performance enhancements with AWS Lambda Provisioned Concurrency

Lambda rolled out enhanced VPC networking to 14 additional Regions around the world. This change brings dramatic improvements to startup performance for Lambda functions running in VPCs due to more efficient usage of elastic network interfaces.

Illustration of AWS Lambda VPC to VPC NAT

New VPC to VPC NAT for Lambda functions

Lambda now supports three additional runtimes: Node.js 12, Java 11, and Python 3.8. Each of these new runtimes has new version-specific features and benefits, which are covered in the linked release posts. Like the Node.js 10 runtime, these new runtimes are all based on an Amazon Linux 2 execution environment.

Lambda released a number of controls for both stream and async-based invocations:

  • You can now configure error handling for Lambda functions consuming events from Amazon Kinesis Data Streams or Amazon DynamoDB Streams. It’s now possible to limit the retry count, limit the age of records being retried, configure a failure destination, or split a batch to isolate a problem record. These capabilities help you deal with potential “poison pill” records that would previously cause streams to pause in processing.
  • For asynchronous Lambda invocations, you can now set the maximum event age and retry attempts on the event. If either configured condition is met, the event can be routed to a dead letter queue (DLQ), Lambda destination, or it can be discarded.

AWS Lambda Destinations is a new feature that allows developers to designate an asynchronous target for Lambda function invocation results. You can set separate destinations for success and failure. This unlocks new patterns for distributed event-based applications and can replace custom code previously used to manage routing results.
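
For example, you could route successful and failed asynchronous invocations to two different SQS queues; the function name and destination ARNs below are placeholders (destinations can also be SNS topics, other Lambda functions, or EventBridge event buses).

# Placeholder function name and ARNs.
aws lambda put-function-event-invoke-config \
    --function-name my-function \
    --destination-config '{"OnSuccess":{"Destination":"arn:aws:sqs:us-east-1:123456789012:success-queue"},"OnFailure":{"Destination":"arn:aws:sqs:us-east-1:123456789012:failure-queue"}}'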

Illustration depicting AWS Lambda Destinations with success and failure configurations

Lambda Destinations

Lambda also now supports setting a Parallelization Factor, which allows you to set multiple Lambda invocations per shard for Kinesis Data Streams and DynamoDB Streams. This enables faster processing without the need to increase your shard count, while still guaranteeing the order of records processed.
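
A minimal sketch of setting this on an existing Kinesis or DynamoDB Streams event source mapping; the UUID is a placeholder for your mapping’s identifier.

# Look up the mapping UUID with: aws lambda list-event-source-mappings
aws lambda update-event-source-mapping \
    --uuid 00000000-0000-0000-0000-000000000000 \
    --parallelization-factor 4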

Illustration of multiple AWS Lambda invocations per Kinesis Data Streams shard

Lambda Parallelization Factor diagram

Lambda introduced Amazon SQS FIFO queues as an event source. “First in, first out” (FIFO) queues guarantee the order of record processing, unlike standard queues. FIFO queues support message batching via a MessageGroupID attribute, which allows parallel Lambda consumers of a single FIFO queue, enabling high throughput of record processing by Lambda.

Lambda now supports Environment Variables in the AWS China (Beijing) Region and the AWS China (Ningxia) Region.

You can now view percentile statistics for the duration metric of your Lambda functions. Percentile statistics show the relative standing of a value in a dataset, and are useful when applied to metrics that exhibit large variances. They can help you understand the distribution of a metric, discover outliers, and find hard-to-spot situations that affect customer experience for a subset of your users.

Amazon API Gateway

Screen capture of creating an Amazon API Gateway HTTP API in the AWS Management Console

Amazon API Gateway announced the preview of HTTP APIs. In addition to significant performance improvements, most customers see an average cost savings of 70% when compared with API Gateway REST APIs. With HTTP APIs, you can create an API in four simple steps. Once the API is created, additional configuration for CORS and JWT authorizers can be added.
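
For instance, the quick create flow can be driven from the CLI by pointing the new API at a Lambda function; the API name and function ARN below are placeholders.

# --target enables the HTTP API quick create flow (placeholder name and ARN).
aws apigatewayv2 create-api \
    --name my-http-api \
    --protocol-type HTTP \
    --target arn:aws:lambda:us-east-1:123456789012:function:my-function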

AWS SAM CLI

Screen capture of the new 'sam deploy' process in a terminal window

The AWS SAM CLI team simplified the bucket management and deployment process in the SAM CLI. You no longer need to manage a bucket for deployment artifacts – SAM CLI handles this for you. The deployment process has also been streamlined from multiple flagged commands to a single command, sam deploy.
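
A typical first deployment now looks like the following sketch; --guided prompts for the stack name, Region, and other settings and saves them to samconfig.toml, so later deployments are a single command.

# First deployment: interactive prompts, saved for reuse.
sam deploy --guided

# Subsequent deployments reuse the saved configuration.
sam build && sam deploy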

AWS Step Functions

One powerful feature of AWS Step Functions is its ability to integrate directly with AWS services without you needing to write complicated application code. In Q4, Step Functions expanded its integration with Amazon SageMaker to simplify machine learning workflows. Step Functions also added a new integration with Amazon EMR, making EMR big data processing workflows faster to build and easier to monitor.

Screen capture of an AWS Step Functions step with Amazon EMR

Step Functions step with EMR

Step Functions now provides the ability to track state transition usage by integrating with AWS Budgets, allowing you to monitor trends and react to usage on your AWS account.

You can now view CloudWatch Metrics for Step Functions at a one-minute frequency. This makes it easier to set up detailed monitoring for your workflows. You can use one-minute metrics to set up CloudWatch Alarms based on your Step Functions API usage, Lambda functions, service integrations, and execution details.

Step Functions now supports higher throughput workflows, making it easier to coordinate applications with high event rates. This increases the limits to 1,500 state transitions per second and a default start rate of 300 state machine executions per second in US East (N. Virginia), US West (Oregon), and Europe (Ireland). Click the above link to learn more about the limit increases in other Regions.

Screen capture of choosing Express Workflows in the AWS Management Console

Step Functions released AWS Step Functions Express Workflows. With the ability to support event rates greater than 100,000 per second, this feature is designed for high-performance workloads at a reduced cost.

Amazon EventBridge

Illustration of the Amazon EventBridge schema registry and discovery service

Amazon EventBridge announced the preview of the Amazon EventBridge schema registry and discovery service. This service allows developers to automate discovery and cataloging event schemas for use in their applications. Additionally, once a schema is stored in the registry, you can generate and download a code binding that represents the schema as an object in your code.

Amazon SNS

Amazon SNS now supports the use of dead letter queues (DLQ) to help capture unhandled events. By enabling a DLQ, you can catch events that are not processed and re-submit them, or analyze them to locate processing issues.

Amazon CloudWatch

Amazon CloudWatch announced Amazon CloudWatch ServiceLens to provide a “single pane of glass” to observe health, performance, and availability of your application.

Screenshot of Amazon CloudWatch ServiceLens in the AWS Management Console

CloudWatch ServiceLens

CloudWatch also announced a preview of a capability called Synthetics. CloudWatch Synthetics allows you to test your application endpoints and URLs using configurable scripts that mimic what a real customer would do. This enables the outside-in view of your customers’ experiences, and your service’s availability from their point of view.

CloudWatch introduced Embedded Metric Format, which helps you ingest complex high-cardinality application data as logs and easily generate actionable metrics. You can publish these metrics from your Lambda function by using the PutLogEvents API or using an open source library for Node.js or Python applications.

Finally, CloudWatch announced a preview of Contributor Insights, a capability to identify who or what is impacting your system or application performance by identifying outliers or patterns in log data.

AWS X-Ray

AWS X-Ray announced trace maps, which enable you to map the end-to-end path of a single request. Identifiers show issues and how they affect other services in the request’s path. These can help you to identify and isolate service points that are causing degradation or failures.

X-Ray also announced support for Amazon CloudWatch Synthetics, currently in preview. CloudWatch Synthetics on X-Ray support tracing canary scripts throughout the application, providing metrics on performance or application issues.

Screen capture of AWS X-Ray Service map in the AWS Management Console

X-Ray Service map with CloudWatch Synthetics

Amazon DynamoDB

Amazon DynamoDB announced support for customer-managed customer master keys (CMKs) to encrypt data in DynamoDB. This allows customers to bring your own key (BYOK) giving you full control over how you encrypt and manage the security of your DynamoDB data.

It is now possible to add global replicas to existing DynamoDB tables to provide enhanced availability across the globe.

Another new DynamoDB capability to identify frequently accessed keys and database traffic trends is currently in preview. With this, you can now more easily identify “hot keys” and understand usage of your DynamoDB tables.

Screen capture of Amazon CloudWatch Contributor Insights for DynamoDB in the AWS Management Console

CloudWatch Contributor Insights for DynamoDB

DynamoDB also released adaptive capacity. Adaptive capacity helps you handle imbalanced workloads by automatically isolating frequently accessed items and shifting data across partitions to rebalance them. This helps reduce cost by enabling you to provision throughput for a more balanced workload instead of over provisioning for uneven data access patterns.

Amazon RDS

Amazon Relational Database Services (RDS) announced a preview of Amazon RDS Proxy to help developers manage RDS connection strings for serverless applications.

Illustration of Amazon RDS Proxy

The RDS Proxy maintains a pool of established connections to your RDS database instances. This pool enables you to support a large number of application connections so your application can scale without compromising performance. It also increases security by enabling IAM authentication for database access and enabling you to centrally manage database credentials using AWS Secrets Manager.

AWS Serverless Application Repository

The AWS Serverless Application Repository (SAR) now offers Verified Author badges. These badges enable consumers to quickly and reliably know who you are. The badge appears next to your name in the SAR and links to your GitHub profile.

Screen capture of SAR Verified developer badge in the AWS Management Console

SAR Verified developer badges

AWS Developer Tools

AWS CodeCommit launched the ability for you to enforce rule workflows for pull requests, making it easier to ensure that code has passed through specific rule requirements. You can now create an approval rule specifically for a pull request, or create approval rule templates to be applied to all future pull requests in a repository.

AWS CodeBuild added beta support for test reporting. With test reporting, you can now view the detailed results, trends, and history for tests executed on CodeBuild for any framework that supports the JUnit XML or Cucumber JSON test format.

Screen capture of AWS CodeBuild

CodeBuild test trends in the AWS Management Console

Amazon CodeGuru

AWS announced a preview of Amazon CodeGuru at re:Invent 2019. CodeGuru is a machine learning based service that makes code reviews more effective and aids developers in writing code that is more secure, performant, and consistent.

AWS Amplify and AWS AppSync

AWS Amplify added iOS and Android as supported platforms. Now developers can build iOS and Android applications using the Amplify Framework with the same category-based programming model that they use for JavaScript apps.

Screen capture of 'amplify init' for an iOS application in a terminal window

The Amplify team has also improved offline data access and synchronization by announcing Amplify DataStore. Developers can now create applications that allow users to continue to access and modify data, without an internet connection. Upon connection, the data synchronizes transparently with the cloud.

For a summary of Amplify and AppSync announcements before re:Invent, read: “A round up of the recent pre-re:Invent 2019 AWS Amplify Launches”.

Illustration of AWS AppSync integrations with other AWS services

Q4 serverless content

Blog posts

October

November

December

Tech talks

We hold several AWS Online Tech Talks covering serverless tech talks throughout the year. These are listed in the Serverless section of the AWS Online Tech Talks page.

Here are the ones from Q4:

Twitch

October

There are also a number of other helpful video series covering Serverless available on the AWS Twitch Channel.

AWS Serverless Heroes

We are excited to welcome some new AWS Serverless Heroes to help grow the serverless community. We look forward to some amazing content to help you with your serverless journey.

AWS Serverless Application Repository (SAR) Apps

In this edition of ICYMI, we are introducing a section devoted to SAR apps written by the AWS Serverless Developer Advocacy team. You can run these applications and review their source code to learn more about serverless and to see examples of suggested practices.

Still looking for more?

The Serverless landing page has much more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials. We’re also kicking off a fresh series of Tech Talks in 2020 with new content providing greater detail on everything new coming out of AWS for serverless application developers.

Throughout 2020, the AWS Serverless Developer Advocates are crossing the globe to tell you more about serverless, and to hear more about what you need. Follow this blog to keep up on new launches and announcements, best practices, and examples of serverless applications in action.

You can also follow all of us on Twitter to see latest news, follow conversations, and interact with the team.

Chris Munns: @chrismunns
Eric Johnson: @edjgeek
James Beswick: @jbesw
Moheeb Zara: @virgilvox
Ben Smith: @benjamin_l_s
Rob Sutter: @rts_rob
Julian Wood: @julian_wood

Happy coding!

Urgent & Important – Rotate Your Amazon RDS, Aurora, and DocumentDB Certificates

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/urgent-important-rotate-your-amazon-rds-aurora-and-documentdb-certificates/

You may have already received an email or seen a console notification, but I don’t want you to be taken by surprise!

Rotate Now
If you are using Amazon Aurora, Amazon Relational Database Service (RDS), or Amazon DocumentDB and are taking advantage of SSL/TLS certificate validation when you connect to your database instances, you need to download & install a fresh certificate, rotate the certificate authority (CA) for the instances, and then reboot the instances.

If you are not using SSL/TLS connections or certificate validation, you do not need to make any updates, but I recommend that you do so in order to be ready in case you decide to use SSL/TLS connections in the future. In this case, you can use a new CLI option that rotates and stages the new certificates but avoids a restart.

The new certificate (CA-2019) is available as part of a certificate bundle that also includes the old certificate (CA-2015) so that you can make a smooth transition without getting into a chicken and egg situation.

What’s Happening?
The SSL/TLS certificates for RDS, Aurora, and DocumentDB expire and are replaced every five years as part of our standard maintenance and security discipline. Here are some important dates to know:

September 19, 2019 – The CA-2019 certificates were made available.

January 14, 2020 – Instances created on or after this date will have the new (CA-2019) certificates. You can temporarily revert to the old certificates if necessary.

February 5 to March 5, 2020 – RDS will stage (install but not activate) new certificates on existing instances. Restarting the instance will activate the certificate.

March 5, 2020 – The CA-2015 certificates will expire. Applications that use certificate validation but have not been updated will lose connectivity.

How to Rotate
Earlier this month I created an Amazon RDS for MySQL database instance and set it aside in preparation for this blog post. As you can see from the screen shot above, the RDS console lets me know that I need to perform a Certificate update.

I visit Using SSL/TLS to Encrypt a Connection to a DB Instance and download a new certificate. If my database client knows how to handle certificate chains, I can download the root certificate and use it for all regions. If not, I download a certificate that is specific to the region where my database instance resides. I decide to download a bundle that contains the old and new root certificates:

Next, I update my client applications to use the new certificates. This process is specific to each app and each database client library, so I don’t have any details to share.

Once the client application has been updated, I change the certificate authority (CA) to rds-ca-2019. I can Modify the instance in the console, and select the new CA:

I can also do this via the CLI:

$ aws rds modify-db-instance --db-instance-identifier database-1 \
  --ca-certificate-identifier rds-ca-2019

The change will take effect during the next maintenance window. I can also apply it immediately:

$ aws rds modify-db-instance --db-instance-identifier database-1 \
  --ca-certificate-identifier rds-ca-2019 --apply-immediately

After my instance has been rebooted (either immediately or during the maintenance window), I test my application to ensure that it continues to work as expected.

If I am not using SSL and want to avoid a restart, I use --no-certificate-rotation-restart:

$ aws rds modify-db-instance --db-instance-identifier database-1 \
  --ca-certificate-identifier rds-ca-2019 --no-certificate-rotation-restart

The database engine will pick up the new certificate during the next planned or unplanned restart.

I can also use the RDS ModifyDBInstance API function or a CloudFormation template to change the certificate authority.
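For example, here's a minimal boto3 sketch of the same change made through the API; the instance identifier matches the CLI example above:

import boto3

rds = boto3.client('rds')

# Stage the CA-2019 certificate on the instance and apply the change right away
# (equivalent to the --apply-immediately CLI flag).
rds.modify_db_instance(
    DBInstanceIdentifier='database-1',
    CACertificateIdentifier='rds-ca-2019',
    ApplyImmediately=True
)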

Once again, all of this must be completed by March 5, 2020 or your applications may be unable to connect to your database instance using SSL or TLS.

Things to Know
Here are a few important things to know:

Amazon Aurora Serverless – AWS Certificate Manager (ACM) is used to manage certificate rotations for this database engine, and no action is necessary.

Regions – Rotation is needed for database instances in all commercial AWS regions except Asia Pacific (Hong Kong), Middle East (Bahrain), and China (Ningxia).

Cluster Scaling – If you add more nodes to an existing cluster, the new nodes will receive the CA-2019 certificate if one or more of the existing nodes already have it. Otherwise, the CA-2015 certificate will be used.

Learning More
Here are some links to additional information:

Jeff;

 

ICYMI: Serverless re:Invent re:Cap 2019

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/icymi-serverless-reinvent-recap-2019/

Thank you for attending re:Invent 2019

In the week before AWS re:Invent 2019 we wrote about a number of service and feature launches leading up to the biggest event of the year for us at AWS. These included new features for AWS Lambda, integrations for AWS Step Functions, and other exciting service and feature launches for related product areas. But this was just the warm-up – AWS re:Invent 2019 itself saw several new serverless or serverless related announcements.

Here’s what’s new.

AWS Lambda

For developers needing greater control over the performance of their serverless applications at any scale, AWS Lambda announced Provisioned Concurrency. This feature enables Lambda functions to execute with consistent start-up latency, making them ideal for building latency-sensitive applications.
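As a quick illustration, here's a hedged boto3 sketch that allocates provisioned concurrency to a function alias; the function name, alias, and concurrency value are placeholders:

import boto3

lambda_client = boto3.client('lambda')

# Allocate 100 provisioned concurrent executions for the "live" alias.
lambda_client.put_provisioned_concurrency_config(
    FunctionName='my-function',
    Qualifier='live',   # must be a published version or alias, not $LATEST
    ProvisionedConcurrentExecutions=100
)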

AWS Step Functions

Express Workflows

AWS Step Functions released AWS Step Functions Express Workflows. With the ability to support event rates greater than 100,000 per second, this feature is designed for high performance workloads at a reduced cost.
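As an illustration, the following boto3 sketch creates an Express workflow by setting the state machine type; the name, role ARN, and definition are placeholders:

import json
import boto3

sfn = boto3.client('stepfunctions')

# A trivial one-state definition, used only to show the EXPRESS type setting.
definition = {
    'StartAt': 'HelloWorld',
    'States': {'HelloWorld': {'Type': 'Pass', 'End': True}}
}

sfn.create_state_machine(
    name='my-express-workflow',
    definition=json.dumps(definition),
    roleArn='arn:aws:iam::123456789012:role/my-stepfunctions-role',
    type='EXPRESS'   # Standard is the default; EXPRESS selects the new workflow type
)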

Amazon EventBridge

EventBridge schema registry and discovery

Amazon EventBridge announced the preview of the Amazon EventBridge schema registry and discovery service. This service allows developers to automate the discovery and cataloging of event schemas for use in their applications. Additionally, once a schema is stored in the registry, you can generate and download a code binding that represents the schema as an object in your code.

Amazon API Gateway

HTTP API

Amazon API Gateway announced the preview of HTTP APIs. With HTTP APIs, most customers will see an average cost saving of up to 70% when compared to API Gateway REST APIs. In addition, you will see significant performance improvements in the API Gateway service overhead. With HTTP APIs, you can create an API in four simple steps. Once the API is created, additional configuration for CORS and JWT authorizers can be added.
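If you want to try the quick-create path programmatically, here's a hedged boto3 sketch; the API name and Lambda function ARN are placeholders:

import boto3

apigw = boto3.client('apigatewayv2')

# "Quick create" an HTTP API that proxies every route to a Lambda function.
api = apigw.create_api(
    Name='my-http-api',
    ProtocolType='HTTP',
    Target='arn:aws:lambda:us-east-1:123456789012:function:my-function'
)
print(api['ApiEndpoint'])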

Databases

Amazon Relational Database Service (RDS) announced a preview of Amazon RDS Proxy to help developers manage RDS database connections for serverless applications.

RDS Proxy

The RDS proxy maintains a pool of established connections to your RDS database instances. This pool enables you to support a large number of application connections so your application can scale without compromising performance. It also increases security by enabling IAM authentication for database access and enabling you to centrally manage database credentials using AWS Secrets Manager.

AWS Amplify

Amplify platform choices

AWS Amplify has expanded their delivery platforms to include iOS and Android. Developers can now build iOS and Android applications using the Amplify Framework with the same category-based programming model that they use for JavaScript apps.

The Amplify team has also improved offline data access and synchronization by announcing Amplify DataStore. Developers can now create applications that allow users to continue to access and modify data, without an internet connection. Upon connection, the data synchronizes transparently with the cloud.

Amazon CodeGuru

Whether you are a team of one or an enterprise with thousands of developers, code review can be difficult. At re:Invent 2019, AWS announced a preview of Amazon CodeGuru, a machine learning based service to help make code reviews more effective and aid developers in writing code that is secure, performant, and consistent.

Serverless talks from re:Invent 2019

re:Invent presentation recordings

We presented dozens of sessions showing how customers can improve their architecture and agility with serverless. Here are some of the most popular.

Videos

Decks

You can also find decks for many of the serverless presentations and other re:Invent presentations on the AWS Events Content page.

Conclusion

Prior to AWS re:Invent, AWS serverless had many service and feature launches, and the pace continued throughout re:Invent itself. As we head towards 2020, follow this blog to keep up on new launches and announcements, best practices, and examples of serverless applications in action.

Additionally, the AWS Serverless Developer Advocates will be crossing the globe to tell you more about serverless, and to hear more about what you need. You can also follow all of us on Twitter to see the latest news, follow conversations, and interact with the team.

Chris Munns: @chrismunns
Eric Johnson: @edjgeek
James Beswick: @jbesw
Moheeb Zara: @virgilvox
Ben Smith: @benjamin_l_s
Rob Sutter: @rts_rob
Julian Wood: @julian_wood

Happy coding!

Using Amazon RDS Proxy with AWS Lambda

Post Syndicated from George Mao original https://aws.amazon.com/blogs/compute/using-amazon-rds-proxy-with-aws-lambda/

The AWS Serverless platform allows you to build applications that automatically scale in response to demand. During periods of high volume, Amazon API Gateway and AWS Lambda scale automatically in response to incoming load.

Often developers must access data stored in relational databases from Lambda functions. But it can be challenging to ensure that your Lambda invocations do not overload your database with too many connections. The number of maximum concurrent connections for a relational database depends on how it is sized.

This is because each connection consumes memory and CPU resources on the database server. Lambda functions can scale to tens of thousands of concurrent connections, meaning your database needs more resources to maintain connections instead of executing queries.

See the architecture blog post “How to Design your serverless apps for massive scale” for more detail on scaling.

Serverless Architecture with RDS

This design places high load on your backend relational database because Lambda can easily scale to tens of thousands of concurrent requests. In most cases, relational databases are not designed to accept the same number of concurrent connections.

Database proxy for Amazon RDS

Today, we’re excited to announce the preview for Amazon RDS Proxy. RDS Proxy acts as an intermediary between your application and an RDS database. RDS Proxy establishes and manages the necessary connection pools to your database so that your application creates fewer database connections.

You can use RDS Proxy for any application that makes SQL calls to your database. But in the context of serverless, we focus on how this improves the Lambda experience. The proxy handles all database traffic that normally flows from your Lambda functions directly to the database.

Your Lambda functions interact with RDS Proxy instead of your database instance. It handles the connection pooling necessary for scaling many simultaneous connections created by concurrent Lambda functions. This allows your Lambda applications to reuse existing connections, rather than creating new connections for every function invocation.

The RDS Proxy scales automatically so that your database instance needs less memory and CPU resources for connection management. It also uses warm connection pools to increase performance. With RDS Proxy, you no longer need code that handles cleaning up idle connections and managing connection pools. Your function code is cleaner, simpler, and easier to maintain.

Getting started

The RDS Database proxy is in preview, so there are a few things to keep in mind:

  • We currently support Amazon RDS MySQL or Aurora MySQL, running on MySQL versions 5.6 or 5.7
  • The preview is available in Asia Pacific (Tokyo), EU (Ireland), US East (Ohio), US East (N. Virginia), and US West (Oregon)
  • During the public preview, you should use the AWS Management Console to interact with RDS Proxy
  • Do not use this service for production workloads as you might encounter preview-related changes

Review the preview guide for a detailed description of the service.

Prerequisites

Start with an existing database that is either Amazon RDS MySQL or Aurora MySQL. Then, store your database credentials as a secret in AWS Secrets Manager, and create an IAM Policy that allows RDS Proxy to read this secret.
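If you prefer to script this step, here's a minimal boto3 sketch that stores the credentials as a secret; the secret name and credential values are placeholders. The console walkthrough follows.

import json
import boto3

secrets = boto3.client('secretsmanager')

# Store the database user name and password as a JSON secret.
secrets.create_secret(
    Name='my-rds-proxy-secret',
    Description='Credentials used by RDS Proxy',
    SecretString=json.dumps({'username': 'admin', 'password': 'change-me'})
)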

To create the secret:

  1. Sign into AWS Secrets Manager and choose Store a new Secret.
  2. Choose Credentials for RDS Database.
  3. Enter the user name and password.
  4. Select the RDS Database this secret is valid for. Choose Next.

    Store a new secret

  5. Enter a Secret Name and choose Next.

    Save the secret

  6. Accept all defaults and choose Store. Note the ARN assigned to this secret, as you need it later.

    Secret details

  7. Now create an IAM role that allows RDS Proxy to read this secret. RDS Proxy uses this secret to maintain a connection pool to your database. Go to your IAM console and create a new role. Add a policy that provides secretsmanager permissions to the secret you created in the previous step. For example:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "VisualEditor0",
          "Effect": "Allow",
          "Action": [
            "secretsmanager:GetResourcePolicy",
            "secretsmanager:GetSecretValue",
            "secretsmanager:DescribeSecret",
            "secretsmanager:ListSecretVersionIds"
          ],
          "Resource": [
            "arn:aws:secretsmanager:us-east-2:[your-account-number]:secret:gmao-rds-secret-YZ2MMN"
          ]
        },
        {
          "Sid": "VisualEditor1",
          "Effect": "Allow",
          "Action": [
            "secretsmanager:GetRandomPassword",
            "secretsmanager:ListSecrets"
          ],
          "Resource": "*"
        }
      ]
    }
  8. Add the following Trust Policy to allow RDS to assume the role. Save the role and take note of the IAM Role ARN, as you need it later.
    {
     "Version": "2012-10-17",
     "Statement": [
      {
       "Sid": "",
       "Effect": "Allow",
       "Principal": {
        "Service": "rds.amazonaws.com"
       },
       "Action": "sts:AssumeRole"
      }
     ]
    }

Create and attach a proxy to a Lambda function

Next, use the Lambda console to Add a Database proxy to a Lambda function.

  1. Sign into the AWS Lambda console and open the Lambda function you would like to enable RDS Proxy.
  2. Scroll to the bottom of your Lambda configuration page and choose Add Database Proxy.

    Add database proxy

  3. Follow the Add database proxy wizard, and fill in the Proxy Identifier and select your RDS Database. Then choose the Secrets Manager secret and the IAM role you created earlier. RDS Proxy uses this secret to connect to your database. Choose Add.

    Configure database proxy

  4. Wait a few minutes for the RDS Proxy to provision and the status updates to Available.

    Database proxy available

  5. Choose your proxy to view the details. Note the Proxy endpoint. You need this later in the Lambda function code.

    Available Proxy configuration

Now the Lambda function has permission to use the configured RDS Proxy, and you are ready to connect to the proxy.

Using the proxy

Instead of connecting directly to the RDS instance, connect to the RDS proxy. To do this, you have two security options. You can use IAM authentication or you can use your native database credentials stored in Secrets Manager. IAM authentication is recommended because it removes the need to embed or read credentials in your function code. For simplicity, this guide uses the database credentials created earlier in Secrets Manager.

You can use any Lambda-supported programming language. The example below uses Node.js:

let mysql = require('mysql');

let connection;
  
connection = mysql.createConnection({
  host   : process.env['endpoint'],
  user   : process.env['user'],
  password : process.env['password'],
  database : process.env['db']
});

exports.handler = async (event) => {

  console.log("Starting query ...");
  
  connection.connect(function(err) {
    if (err) {
     console.error('error connecting: ' + err.stack);
     return;
    }
    
    console.log('connected as id ' + connection.threadId);
  });

  // Do some work here
  
  connection.end(function(error, results) {
     // The connection is terminated now 
     console.log("Connection ended");
     return "success";
  });
};

You need to package the NodeJS MySQL client module with your function. I use Lambda environment variables to store the connection information. This is the best practice for database configuration settings so you can change these details without updating your code. The endpoint environment variable is the RDS Proxy Endpoint noted earlier. The user and password are the database credentials, and the db variable is the database schema name.
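For illustration, here's a hedged boto3 sketch that sets those environment variables on the function; the function name, proxy endpoint, and values are placeholders:

import boto3

lambda_client = boto3.client('lambda')

# Store the connection details as environment variables instead of hardcoding them.
lambda_client.update_function_configuration(
    FunctionName='my-rds-proxy-function',
    Environment={
        'Variables': {
            'endpoint': 'my-proxy.proxy-abc123xyz.us-east-1.rds.amazonaws.com',
            'user': 'admin',
            'password': 'change-me',
            'db': 'mydb'
        }
    }
)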

If you choose to authenticate with IAM, make sure that your Lambda execution role includes rds-db:connect permissions as outlined here. The Lambda console automatically does this on your behalf. This option allows you to retrieve a temporary token from IAM to authenticate to the database, instead of using native database credentials.
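If you go the IAM route, the stored password is replaced by a short-lived token. Here's a minimal boto3 sketch; the proxy endpoint and user name are placeholders, and the connection must use TLS:

import boto3

rds = boto3.client('rds')

# Request a short-lived authentication token to use in place of a stored password.
token = rds.generate_db_auth_token(
    DBHostname='my-proxy.proxy-abc123xyz.us-east-1.rds.amazonaws.com',
    Port=3306,
    DBUsername='admin'
)
# Pass `token` as the password when opening the database connection.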

Conclusion

RDS Proxy helps you manage a large number of connections from Lambda to an RDS database by establishing a warm connection pool to the database. Your Lambda functions can scale to meet your needs and use the RDS Proxy to serve multiple concurrent application requests. This reduces the CPU and Memory requirements for your database, and eliminates the need for connection management logic in your code.

We look forward to your feedback during this preview!

Now Available – Amazon Relational Database Service (RDS) on VMware

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/now-available-amazon-relational-database-service-rds-on-vmware/

Last year I told you that we were working to give you Amazon RDS on VMware, with the goal of bringing many of the benefits of Amazon Relational Database Service (RDS) to your on-premises virtualized environments. These benefits include the ability to provision new on-premises databases in minutes, make backups, and restore to a point in time. You get automated management of your on-premises databases, without having to provision and manage the database engine.

Now Available
Today, I am happy to report that Amazon RDS on VMware is available for production use, and that you can start using it today. We are launching with support for Microsoft SQL Server, PostgreSQL, and MySQL.

Here are some important prerequisites:

Compatibility – RDS on VMware works with vSphere clusters that run version 6.5 or better.

Connectivity – Your vSphere cluster must have outbound connectivity to the Internet, and must be able to make HTTPS connections to the public AWS endpoints.

Permissions – You will need to have Administrative privileges (and the skills to match) on the cluster in order to set up RDS on VMware. You will also need to have (or create) a second set of credentials for use by RDS on VMware.

Hardware – The hardware that you use to host RDS on VMware must be listed in the relevant VMware Hardware Compatibility Guide.

Resources – Each cluster must have at least 24 vCPUs, 24 GiB of memory, and 180 GB of storage for the on-premises management components of RDS on VMware, along with additional resources to support the on-premises database instances that you launch.

Setting up Amazon RDS on VMware
Due to the nature of this service, the setup process is more involved than usual and I am not going to walk through it at my usual level of detail. Instead, I am going to outline the process and refer you to the Amazon RDS on VMware User Guide for more information. During the setup process, you will be asked to supply details of your vCenter/ESXi configuration. For best results, I advise a dry-run through the User Guide so that you can find and organize all of the necessary information.

Here are the principal steps, assuming that you already have a running vSphere data center:

Prepare Environment – Check vSphere version, confirm storage device & free space, provision resource pool.

Configure Cluster Control Network – Create a network for control traffic and monitoring. Must be a vSphere distributed port group with 128 to 1022 ports.

Configure Application Network – This is the network that applications, users, and DBAs will use to interact with the RDS on VMware DB instances. It must be a vSphere distributed port group with 128 to 1022 ports, and it must span all of the ESXi hosts that underlie the cluster. The network must have an IPv4 subnet large enough to accommodate all of the instances that you expect to launch. In many cases your cluster will already have an Application Network.

Configure Management Network – Configure your ESXi hosts to add a route to the Edge Router (part of RDS on VMware) in the Cluster Control Network.

Configure vCenter Credentials – Create a set of credentials for use during the onboarding process.

Configure Outbound Internet Access – Confirm that outbound connections can be made from the Edge Router in your virtual data center to AWS services.

With the preparatory work out of the way, the next step is to bring the cluster onboard by creating a custom (on-premises) Availability Zone and using the installer to install the product. I open the RDS Console, choose the US East (N. Virginia) Region, and click Custom availability zones:

I can see my existing custom AZs and their status. I click Create custom AZ to proceed:

I enter a name for my AZ and for the VPN tunnel between the selected AWS region and my vSphere data center, and then I enter the IP address of the VPN. Then I click Create custom AZ:

My new AZ is visible, in status Unregistered:

To register my vSphere cluster as a Custom AZ, I click Download Installer from the AWS Console to download the RDS on VMware installer. I deploy the installer in my cluster and follow through the guided wizard to fill in the network configurations, AWS credentials, and so forth, then start the installation. After the installation is complete, the status of my custom AZ will change to Active. Behind the scenes, the installer automatically deploys the on-premises components of RDS on VMware and connects the vSphere cluster to the AWS region.

Some of the database engines require me to bring my own media and an on-premises license. I can import the installation media that I have in my data center onto RDS and use it to launch the database engine. For example, here’s my media image for SQL Server Enterprise Edition:

The steps above must be done on a cluster-by-cluster basis. Once a cluster has been set up, multiple Database instances can be launched, based on available compute, storage, and network (IP address) resources.

Using Amazon RDS on VMware
With all of the setup work complete, I can use the same interfaces (RDS Console, RDS CLI, or the RDS APIs) to launch and manage Database instances in the cloud and on my on-premises network.

I’ll use the RDS Console, and click Create database to get started. I choose On-premises and pick my custom AZ, then choose a database engine:

I enter a name for my instance, another name for the master user, and enter (or let RDS assign) a password:

Then I pick the DB instance class (the v11 in the names refers to version 11 of the VMware virtual hardware definition) and click Create database:

Here’s a more detailed look at some of the database instance sizes. As is the case with cloud-based instance sizes, the “c” instances are compute-intensive, the “r” instances are memory-intensive, and the “m” instances are general-purpose:

The status of my new database instance starts out as Creating, and progresses through Backing-up and then to Available:

Once it is ready, the endpoint is available in the console:

On-premises applications can use this endpoint to connect to the database instance across the Application Network.

Before I wrap up, let’s take a look at a few other powerful features of RDS on VMware: Snapshot backups, point-in-time restores, and the power to change the DB instance class.

Snapshot backups are a useful companion to the automated backups taken daily by RDS on VMware. I simply select Take snapshot from the Action menu:

To learn more, read Creating a DB Snapshot.
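Because RDS on VMware uses the same RDS APIs, I could also take the snapshot programmatically. Here's a minimal boto3 sketch; both identifiers are placeholders:

import boto3

rds = boto3.client('rds')

# Take a manual snapshot of the on-premises DB instance.
rds.create_db_snapshot(
    DBSnapshotIdentifier='my-onprem-db-snapshot',
    DBInstanceIdentifier='my-onprem-db'
)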

Point in time recovery allows me to create a fresh on-premises DB instance based on the state of an existing one at an earlier point in time. To learn more, read Restoring a DB Instance to a Specified Time.

I can change the DB instance class in order to scale up or down in response to changing requirements. I select Modify from the Action menu, choose the new class, and click Submit:

The modification will be made during the maintenance window for the DB instance.

A few other features that I did not have the space to cover include renaming an existing DB instance (very handy for disaster recovery), and rebooting a DB instance.

Available Now
Amazon RDS on VMware is available now and you can start using it today in the US East (N. Virginia) Region.

Jeff;

 

Things to Consider When You Build REST APIs with Amazon API Gateway

Post Syndicated from George Mao original https://aws.amazon.com/blogs/architecture/things-to-consider-when-you-build-rest-apis-with-amazon-api-gateway/

A few weeks ago, we kicked off this series with a discussion on REST vs GraphQL APIs. This post will dive deeper into the things an API architect or developer should consider when building REST APIs with Amazon API Gateway.

Request Rate (a.k.a. “TPS”)

Request rate is the first thing you should consider when designing REST APIs. By default, API Gateway allows for up to 10,000 requests per second. You should use the built in Amazon CloudWatch metrics to review how your API is being used. The Count metric in particular can help you review the total number of API requests in a given period.
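For example, here's a quick boto3 sketch that pulls the hourly Count metric for the last 24 hours; the API name is a placeholder:

from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client('cloudwatch')

# Hourly request totals for the last 24 hours.
stats = cloudwatch.get_metric_statistics(
    Namespace='AWS/ApiGateway',
    MetricName='Count',
    Dimensions=[{'Name': 'ApiName', 'Value': 'my-api'}],
    StartTime=datetime.utcnow() - timedelta(days=1),
    EndTime=datetime.utcnow(),
    Period=3600,
    Statistics=['Sum']
)
for point in sorted(stats['Datapoints'], key=lambda p: p['Timestamp']):
    print(point['Timestamp'], point['Sum'])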

It’s important to understand the actual request rate that your architecture is capable of supporting. For example, consider this architecture:

REST API 1

This API accepts GET requests to retrieve a user’s cart by using a Lambda function to perform SQL queries against a relational database managed in RDS. If you receive a large burst of traffic, both API Gateway and Lambda will scale in response to the traffic. However, relational databases typically have limited memory/CPU capacity and will quickly exhaust the total number of connections.

As an API architect, you should design your APIs to protect your downstream applications. You can start by defining API Keys and requiring your clients to deliver a key with incoming requests. This lets you track each application or client that is consuming your API. It also lets you create Usage Plans and throttle your clients according to the plan you define. For example, if you know your architecture is capable of sustaining 200 requests per second, you should define a Usage Plan that sets a rate of 200 RPS and optionally configure a quota to allow a certain number of requests per day, week, or month.
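To make this concrete, here's a minimal boto3 sketch of such a usage plan; the API ID, stage name, and API key ID are placeholders:

import boto3

apigw = boto3.client('apigateway')

# Throttle clients on this plan to 200 requests per second (burst of 100)
# and cap them at 1,000,000 requests per month.
plan = apigw.create_usage_plan(
    name='standard-plan',
    apiStages=[{'apiId': 'a1b2c3d4e5', 'stage': 'prod'}],
    throttle={'rateLimit': 200.0, 'burstLimit': 100},
    quota={'limit': 1000000, 'period': 'MONTH'}
)

# Associate an existing API key with the plan so its callers are tracked and throttled.
apigw.create_usage_plan_key(
    usagePlanId=plan['id'],
    keyId='abcdef1234',
    keyType='API_KEY'
)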

Additionally, API Gateway lets you define throttling settings for the whole stage or per method. If you know that a GET operation is less resource intensive than a POST operation you can override the stage settings and set different throttling settings for each resource.

Integrations and Design patterns

The example above describes a synchronous, tightly coupled architecture where the request must wait for a response from the backend integration (RDS in this case). This results in system scaling characteristics that are the lowest common denominator of all components. Instead, you should look for opportunities to design an asynchronous, loosely coupled architecture. A decoupled architecture separates the data ingestion from the data processing and allows you to scale each system separately. Consider this new architecture:

REST API 2

This architecture enables ingestion of orders directly into a highly scalable and durable data store such as Amazon Simple Queue Service (SQS). Your backend can process these orders at whatever speed suits your business requirements and system capacity. Most importantly, the health of the backend processing system does not impact your ability to continue accepting orders.

Security

Security with API Gateway falls into three major buckets, and I’ll outline them below. Remember, you should enable all three options to combine multiple layers of security.

Option 1 (Application Firewall)

You can enable AWS Web Application Firewall (WAF) for your entire API. WAF will inspect all incoming requests and block requests that fail your inspection rules. For example, WAF can inspect requests for SQL Injection, Cross Site Scripting, or whitelisted IP addresses.

Option 2 (Resource Policy)

You can apply a Resource Policy that protects your entire API. This is an IAM policy that is applied to your API and you can use this to white/black list client IP ranges or allow AWS accounts and AWS principals to access your API.

Option 3 (AuthZ)

  1. IAM: This AuthZ option requires clients to sign requests with the AWS v4 signing process. The associated IAM role or user must have permissions to perform the execute-api:Invoke action against the API.
  2. Cognito: This AuthZ option requires clients to login into Cognito and then pass the returned ID or Access JWT token in the Authentication header.
  3. Lambda Auth: This AuthZ option is the most flexible and lets you execute a Lambda function to perform any custom auth strategy needed. A common use case for this is OpenID Connect.

A Couple of Tips

Tip #1: Use Stage variables to avoid hard coding your backend Lambda and HTTP integrations. For example, you probably have multiple stages such as “QA” and “PROD” or “V1” and “V2.” You can define the same variable in each stage and specify different values. For example, you might have an API that executes a Lambda function. In each stage, define the same variable called functionArn. You can reference this variable as your Lambda ARN during your integration configuration using this notation: ${stageVariables.functionArn}. API Gateway will inject the corresponding value for the stage dynamically at runtime, allowing you to execute different Lambda functions by stage.

Tip #2: Use Path and Query variables to inject dynamic values into your HTTP integrations. For example, your cart API may define a userId Path variable that is used to lookup a user’s cart: /cart/profile/{userId}. You can inject this variable directly into your backend HTTP integration URL settings like this: http://myapi.someds.com/cart/profile/{userId}

Summary

This post covered strategies you should use to ensure your REST API architectures are scalable and easy to maintain.  I hope you’ve enjoyed this post and our next post will cover GraphQL API architectures with AWS AppSync.

About the Author

George MaoGeorge Mao is a Specialist Solutions Architect at Amazon Web Services, focused on the Serverless platform. George is responsible for helping customers design and operate Serverless applications using services like Lambda, API Gateway, Cognito, and DynamoDB. He is a regular speaker at AWS Summits, re:Invent, and various tech events. George is a software engineer and enjoys contributing to open source projects, delivering technical presentations at technology events, and working with customers to design their applications in the Cloud. George holds a Bachelor of Computer Science and Masters of IT from Virginia Tech.

How to Design Your Serverless Apps for Massive Scale

Post Syndicated from George Mao original https://aws.amazon.com/blogs/architecture/how-to-design-your-serverless-apps-for-massive-scale/

Serverless is one of the hottest design patterns in the cloud today, allowing you to focus on building and innovating, rather than worrying about the heavy lifting of server and OS operations. In this series of posts, we’ll discuss topics that you should consider when designing your serverless architectures. First, we’ll look at architectural patterns designed to achieve massive scale with serverless.

Scaling Considerations

In general, developers in a “serverful” world need to be worried about how many total requests can be served throughout the day, week, or month, and how quickly their system can scale. As you move into the serverless world, the most important question you need to answer becomes: “What is the concurrency that your system is designed to handle?”

The AWS Serverless platform allows you to scale very quickly in response to demand. Below is an example of a serverless design that is fully synchronous throughout the application. During periods of extremely high demand, Amazon API Gateway and AWS Lambda will scale in response to your incoming load. This design places extremely high load on your backend relational database because Lambda can easily scale from thousands to tens of thousands of concurrent requests. In most cases, your relational databases are not designed to accept the same number of concurrent connections.

Serverless at scale-1

This design risks bottlenecks at your relational database and may cause service outages. This design also risks data loss due to throttling or database connection exhaustion.

Cloud Native Design

Instead, you should consider decoupling your architecture and moving to an asynchronous model. In this architecture, you use an intermediary service to buffer incoming requests, such as Amazon Kinesis or Amazon Simple Queue Service (SQS). You can configure Kinesis or SQS as out-of-the-box event sources for Lambda. In the design below, AWS will automatically poll your Kinesis stream or SQS queue for new records and deliver them to your Lambda functions. You can control the batch size per delivery and further place throttles on a per-function basis.
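As a sketch of how this wiring looks in code, the following boto3 call creates an SQS event source mapping with a batch size of 10; the queue ARN and function name are placeholders:

import boto3

lambda_client = boto3.client('lambda')

# Poll an SQS queue and deliver messages to the function in batches of 10.
lambda_client.create_event_source_mapping(
    EventSourceArn='arn:aws:sqs:us-east-1:123456789012:orders-queue',
    FunctionName='process-orders',
    BatchSize=10
)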

Serverless at scale - 2

This design allows you to accept an extremely high volume of requests, store them in a durable data store, and process them at a speed your system can handle.

Conclusion

Serverless computing allows you to scale much more quickly than with server-based applications, but that means application architects should always consider the effects of scaling on downstream services. Always keep in mind cost, speed, and reliability when you’re building your serverless applications.

Our next post in this series will discuss the different ways to invoke your Lambda functions and how to design your applications appropriately.

About the Author

George MaoGeorge Mao is a Specialist Solutions Architect at Amazon Web Services, focused on the Serverless platform. George is responsible for helping customers design and operate Serverless applications using services like Lambda, API Gateway, Cognito, and DynamoDB. He is a regular speaker at AWS Summits, re:Invent, and various tech events. George is a software engineer and enjoys contributing to open source projects, delivering technical presentations at technology events, and working with customers to design their applications in the Cloud. George holds a Bachelor of Computer Science and Masters of IT from Virginia Tech.

How to securely provide database credentials to Lambda functions by using AWS Secrets Manager

Post Syndicated from Ramesh Adabala original https://aws.amazon.com/blogs/security/how-to-securely-provide-database-credentials-to-lambda-functions-by-using-aws-secrets-manager/

As a solutions architect at AWS, I often assist customers in architecting and deploying business applications using APIs and microservices that rely on serverless services such as AWS Lambda and database services such as Amazon Relational Database Service (Amazon RDS). Customers can take advantage of these fully managed AWS services to unburden their teams from infrastructure operations and other undifferentiated heavy lifting, such as patching, software maintenance, and capacity planning.

In this blog post, I’ll show you how to use AWS Secrets Manager to secure your database credentials and send them to Lambda functions that will use them to connect and query the backend database service Amazon RDS—without hardcoding the secrets in code or passing them through environment variables. This approach will help you secure last-mile secrets and protect your backend databases. Long-lived credentials need to be managed and regularly rotated to keep access to critical systems secure, so it’s a security best practice to periodically reset your passwords. Manually changing the passwords would be cumbersome, but AWS Secrets Manager helps by managing and rotating the RDS database passwords.

Solution overview

This is sample code: you’ll use an AWS CloudFormation template to deploy the following components to test the API endpoint from your browser:

  • An RDS MySQL database instance on a db.t2.micro instance
  • Two Lambda functions with necessary IAM roles and IAM policies, including access to AWS Secrets Manager:
    • LambdaRDSCFNInit: This Lambda function will execute immediately after the CloudFormation stack creation. It will create an “Employees” table in the database, where it will insert three sample records.
    • LambdaRDSTest: This function will query the Employees table and return the record count in an HTML string format
  • RESTful API with “GET” method on AWS API Gateway

Here’s the high level setup of the AWS services that will be created from the CloudFormation stack deployment:
 

Figure 1: Solution architecture

  1. Clients call the RESTful API hosted on AWS API Gateway
  2. The API Gateway executes the Lambda function
  3. The Lambda function retrieves the database secrets using the Secrets Manager API
  4. The Lambda function connects to the RDS database using database secrets from Secrets Manager and returns the query results

You can access the source code for the sample used in this post here: https://github.com/awslabs/automating-governance-sample/tree/master/AWS-SecretsManager-Lambda-RDS-blog.

Deploying the sample solution

Set up the sample deployment by selecting the Launch Stack button below. If you haven’t logged into your AWS account, follow the prompts to log in.

By default, the stack will be deployed in the us-east-1 region. If you want to deploy this stack in any other region, download the code from the above GitHub link, place the Lambda code zip file in a region-specific S3 bucket and make the necessary changes in the CloudFormation template to point to the right S3 bucket. (Please refer to the AWS CloudFormation User Guide for additional details on how to create stacks using the AWS CloudFormation console.)
 
Select this image to open a link that starts building the CloudFormation stack

Next, follow these steps to execute the stack:

  1. Leave the default location for the template and select Next.
     
    Figure 2: Keep the default location for the template

  2. On the Specify Details page, you’ll see the parameters pre-populated. These parameters include the name of the database and the database user name. Select Next on this screen
     
    Figure 3: Parameters on the “Specify Details” page

  3. On the Options screen, select the Next button.
  4. On the Review screen, select both check boxes, then select the Create Change Set button:
     
    Figure 4: Select the check boxes and “Create Change Set”

  5. After the change set creation is completed, choose the Execute button to launch the stack.
  6. Stack creation will take between 10 and 15 minutes. After the stack is created successfully, select the Outputs tab of the stack, then select the link.
     
    Figure 5: Select the link on the “Outputs” tab

    This action will trigger the code in the Lambda function, which will query the “Employee” table in the MySQL database and will return the results count back to the API. You’ll see the following screen as output from the RESTful API endpoint:
     

    Figure 6: Output from the RESTful API endpoint

At this point, you’ve successfully deployed and tested the API endpoint with a backend Lambda function and RDS resources. The Lambda function is able to successfully query the MySQL RDS database and is able to return the results through the API endpoint.

What’s happening in the background?

The CloudFormation stack deployed a MySQL RDS database with a randomly generated password using a secret resource. Now that the secret resource with a randomly generated password has been created, the CloudFormation stack will use a dynamic reference to resolve the value of the password from Secrets Manager in order to create the RDS instance resource. Dynamic references provide a compact, powerful way for you to specify external values that are stored and managed in other AWS services, such as Secrets Manager. The dynamic reference guarantees that CloudFormation will not log or persist the resolved value, keeping the database password safe. The CloudFormation template also creates a Lambda function to do automatic rotation of the password for the MySQL RDS database every 30 days. Native credential rotation can improve security posture, as it eliminates the need to manually handle database passwords through the lifecycle process.

Below is the CloudFormation code that covers these details:


#This is a Secret resource with a randomly generated password in its SecretString JSON.
MyRDSInstanceRotationSecret:
    Type: AWS::SecretsManager::Secret
    Properties:
        Description: 'This is my rds instance secret'
        GenerateSecretString:
            SecretStringTemplate: !Sub '{"username": "${RDSUserName}"}'
            GenerateStringKey: 'password'
            PasswordLength: 16
            ExcludeCharacters: '"@/\'
        Tags:
            -
                Key: AppName
                Value: MyApp

#This is a RDS instance resource. Its master username and password use dynamic references to resolve values from
#SecretsManager. The dynamic reference guarantees that CloudFormation will not log or persist the resolved value
#We use a ref to the Secret resource logical id in order to construct the dynamic reference, since the Secret name is being
#generated by CloudFormation
MyDBInstance2:
    Type: AWS::RDS::DBInstance
    Properties:
        AllocatedStorage: 20
        DBInstanceClass: db.t2.micro
        DBName: !Ref RDSDBName
        Engine: mysql
        MasterUsername: !Ref RDSUserName
        MasterUserPassword: !Join ['', ['{{resolve:secretsmanager:', !Ref MyRDSInstanceRotationSecret, ':SecretString:password}}' ]]
        MultiAZ: False
        PubliclyAccessible: False
        StorageType: gp2
        DBSubnetGroupName: !Ref myDBSubnetGroup
        VPCSecurityGroups:
            - !Ref RDSSecurityGroup
        BackupRetentionPeriod: 0
        DBInstanceIdentifier: 'rotation-instance'

#This is a SecretTargetAttachment resource which updates the referenced Secret resource with properties about
#the referenced RDS instance
SecretRDSInstanceAttachment:
    Type: AWS::SecretsManager::SecretTargetAttachment
    Properties:
        SecretId: !Ref MyRDSInstanceRotationSecret
        TargetId: !Ref MyDBInstance2
        TargetType: AWS::RDS::DBInstance
#This is a RotationSchedule resource. It configures rotation of password for the referenced secret using a rotation lambda
#The first rotation happens at resource creation time, with subsequent rotations scheduled according to the rotation rules
#We explicitly depend on the SecretTargetAttachment resource being created to ensure that the secret contains all the
#information necessary for rotation to succeed
MySecretRotationSchedule:
    Type: AWS::SecretsManager::RotationSchedule
    DependsOn: SecretRDSInstanceAttachment
    Properties:
        SecretId: !Ref MyRDSInstanceRotationSecret
        RotationLambdaARN: !GetAtt MyRotationLambda.Arn
        RotationRules:
            AutomaticallyAfterDays: 30

#This is a lambda Function resource. We will use this lambda to rotate secrets
#For details about rotation lambdas, see https://docs.aws.amazon.com/secretsmanager/latest/userguide/rotating-secrets.html
#The below example assumes that the lambda code has been uploaded to a S3 bucket, and that it will rotate a mysql database password
MyRotationLambda:
    Type: AWS::Serverless::Function
    Properties:
        Runtime: python2.7
        Role: !GetAtt MyLambdaExecutionRole.Arn
        Handler: mysql_secret_rotation.lambda_handler
        Description: 'This is a lambda to rotate MySql user passwd'
        FunctionName: 'cfn-rotation-lambda'
        CodeUri: 's3://devsecopsblog/code.zip'
        Environment:
            Variables:
                SECRETS_MANAGER_ENDPOINT: !Sub 'https://secretsmanager.${AWS::Region}.amazonaws.com'

Verifying the solution

To be certain that everything is set up properly, you can look at the Lambda code that’s querying the database table by following the below steps:

  1. Go to the AWS Lambda service page
  2. From the list of Lambda functions, click on the function with the name scm2-LambdaRDSTest-…
  3. You can see the environment variables at the bottom of the Lambda Configuration details screen. Notice that there should be no database password supplied as part of these environment variables:
     
    Figure 7: Environment variables

    
        import sys
        import pymysql
        import boto3
        import botocore
        import json
        import random
        import time
        import os
        import base64
        from botocore.exceptions import ClientError
        
        # rds settings
        rds_host = os.environ['RDS_HOST']
        name = os.environ['RDS_USERNAME']
        db_name = os.environ['RDS_DB_NAME']
        helperFunctionARN = os.environ['HELPER_FUNCTION_ARN']
        
        secret_name = os.environ['SECRET_NAME']
        my_session = boto3.session.Session()
        region_name = my_session.region_name
        conn = None
        
        # Get the service resource.
        lambdaClient = boto3.client('lambda')
        
        
        def invokeConnCountManager(incrementCounter):
            # return True
            response = lambdaClient.invoke(
                FunctionName=helperFunctionARN,
                InvocationType='RequestResponse',
                Payload='{"incrementCounter":' + str.lower(str(incrementCounter)) + ',"RDBMSName": "Prod_MySQL"}'
            )
            retVal = response['Payload']
            retVal1 = retVal.read()
            return retVal1
        
        
        def openConnection():
            print("In Open connection")
            global conn
            password = "None"
            # Create a Secrets Manager client
            session = boto3.session.Session()
            client = session.client(
                service_name='secretsmanager',
                region_name=region_name
            )
            
            # In this sample we only handle the specific exceptions for the 'GetSecretValue' API.
            # See https://docs.aws.amazon.com/secretsmanager/latest/apireference/API_GetSecretValue.html
            # We rethrow the exception by default.
            
            try:
                get_secret_value_response = client.get_secret_value(
                    SecretId=secret_name
                )
                print(get_secret_value_response)
            except ClientError as e:
                print(e)
                if e.response['Error']['Code'] == 'DecryptionFailureException':
                    # Secrets Manager can't decrypt the protected secret text using the provided KMS key.
                    # Deal with the exception here, and/or rethrow at your discretion.
                    raise e
                elif e.response['Error']['Code'] == 'InternalServiceErrorException':
                    # An error occurred on the server side.
                    # Deal with the exception here, and/or rethrow at your discretion.
                    raise e
                elif e.response['Error']['Code'] == 'InvalidParameterException':
                    # You provided an invalid value for a parameter.
                    # Deal with the exception here, and/or rethrow at your discretion.
                    raise e
                elif e.response['Error']['Code'] == 'InvalidRequestException':
                    # You provided a parameter value that is not valid for the current state of the resource.
                    # Deal with the exception here, and/or rethrow at your discretion.
                    raise e
                elif e.response['Error']['Code'] == 'ResourceNotFoundException':
                    # We can't find the resource that you asked for.
                    # Deal with the exception here, and/or rethrow at your discretion.
                    raise e
            else:
                # Decrypts secret using the associated KMS CMK.
                # Depending on whether the secret is a string or binary, one of these fields will be populated.
                if 'SecretString' in get_secret_value_response:
                    secret = get_secret_value_response['SecretString']
                    j = json.loads(secret)
                    password = j['password']
                else:
                    decoded_binary_secret = base64.b64decode(get_secret_value_response['SecretBinary'])
                    print("password binary:" + decoded_binary_secret)
                    password = decoded_binary_secret.password    
            
            try:
                if(conn is None):
                    conn = pymysql.connect(
                        rds_host, user=name, passwd=password, db=db_name, connect_timeout=5)
                elif (not conn.open):
                    # print(conn.open)
                    conn = pymysql.connect(
                        rds_host, user=name, passwd=password, db=db_name, connect_timeout=5)
        
            except Exception as e:
                print (e)
                print("ERROR: Unexpected error: Could not connect to MySql instance.")
                raise e
        
        
        def lambda_handler(event, context):
            if invokeConnCountManager(True) == "false":
                print ("Not enough Connections available.")
                return False
        
            item_count = 0
            try:
                openConnection()
                # Introducing artificial random delay to mimic actual DB query time. Remove this code for actual use.
                time.sleep(random.randint(1, 3))
                with conn.cursor() as cur:
                    cur.execute("select * from Employees")
                    for row in cur:
                        item_count += 1
                        print(row)
                        # print(row)
            except Exception as e:
                # Error while opening connection or processing
                print(e)
            finally:
                print("Closing Connection")
                if(conn is not None and conn.open):
                    conn.close()
                invokeConnCountManager(False)
        
            content =  "Selected %d items from RDS MySQL table" % (item_count)
            response = {
                "statusCode": 200,
                "body": content,
                "headers": {
                    'Content-Type': 'text/html',
                }
            }
            return response        
        

In the AWS Secrets Manager console, you can also look at the new secret that was created from CloudFormation execution by following the below steps:

  1. Go to the AWS Secrets Manager service page with appropriate IAM permissions
  2. From the list of secrets, click on the latest secret with the name MyRDSInstanceRotationSecret-…
  3. You will see the secret details and rotation information on the screen, as shown in the following screenshot:
     
    Figure 8: Secret details and rotation information

Conclusion

In this post, I showed you how to manage database secrets using AWS Secrets Manager and how to leverage Secrets Manager’s API to retrieve the secrets into a Lambda execution environment to improve database security and protect sensitive data. Secrets Manager helps you protect access to your applications, services, and IT resources without the upfront investment and ongoing maintenance costs of operating your own secrets management infrastructure. To get started, visit the Secrets Manager console. To learn more, visit Secrets Manager documentation.

If you have feedback about this post, add it to the Comments section below. If you have questions about implementing the example used in this post, open a thread on the Secrets Manager Forum.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Ramesh Adabala

Ramesh is a Solution Architect on the Southeast Enterprise Solution Architecture team at AWS.

Introducing AWS Solutions: Expert architectures on demand

Post Syndicated from AWS Admin original https://aws.amazon.com/blogs/architecture/introducing-aws-solutions-expert-architectures-on-demand/

AWS Solutions Architects are on the front line of helping customers succeed using our technologies. Our team members leverage their deep knowledge of AWS technologies to build custom solutions that solve specific problems for clients. But many customers want to solve common technical problems that don’t require custom solutions, or they want a general solution they can use as a reference to build their own custom solution. For these customers, we offer AWS Solutions: vetted, technical reference implementations built by AWS Solutions Architects and AWS Partner Network partners. AWS Solutions are designed to help customers solve common business and technical problems, or they can be customized for specific use cases.

AWS Solutions are built to be operationally effective, performant, reliable, secure, and cost-effective; and incorporate architectural frameworks such as the Well-Architected Framework. Every AWS Solution comes with a detailed architecture diagram, a deployment guide, and instructions for both manual and automated deployment.

Here are some Solutions we are particularly excited about.

Media2Cloud

We released the Media2Cloud solution in January 2019. This solution helps customers migrate their existing video archives to the cloud. Media2Cloud sets up a serverless end-to-end workflow to ingest your videos and establish metadata, proxy videos, and image thumbnails.

Because it can be a challenging and slow process to migrate your existing video archives to the cloud, the Media2Cloud solution builds the following architecture.

Media2Cloud architecture

The solution leverages the Media Analysis Solution to analyze and extract valuable metadata from your video archives using Amazon Rekognition, Amazon Transcribe, and Amazon Comprehend.

The solution also includes a simple web interface that helps make it easier to get started ingesting your videos to the AWS Cloud. This solution is set up to integrate with AWS Partner Network partners to help customers migrate their video archives to the cloud.

AWS Instance Scheduler

In October 2018, we updated the AWS Instance Scheduler, a solution that enables customers to easily configure custom start and stop schedules for their Amazon EC2 and Amazon RDS instances.

When you deploy the solution’s template, the solution builds the following architecture.

AWS Instance Scheduler

 

For customers who leave all of their instances running at full utilization, this solution can result in up to 70% cost savings for those instances that are only necessary during regular business hours.

The Instance Scheduler solution gives you the flexibility to automatically manage multiple schedules as necessary, to configure multiple start and stop schedules by either deploying multiple Instance Schedulers or modifying individual resource tags, and to review Instance Scheduler metrics to better assess your instance capacity and usage and calculate your cost savings.

AWS Connected Vehicle Solution

In January 2018, we updated the AWS Connected Vehicle Solution, a solution that provides secure vehicle connectivity to the AWS Cloud. This solution includes capabilities for local computing within vehicles, sophisticated event rules, and data processing and storage. The solution also allows you to implement a core framework for connected vehicle services that allows you to focus on developing new functionality rather than managing infrastructure.

When you deploy the solution’s template, the solution builds the following architecture.

Connected Vehicle solution

You can build upon this framework to address a variety of use cases such as voice interaction, navigation and other location-based services, remote vehicle diagnostics and health monitoring, predictive analytics and required maintenance alerts, media streaming services, vehicle safety and security services, head unit applications, and mobile applications.

These are just some of our current offerings. Other notable Solutions include AWS WAF Security Automations, Machine Learning for Telecommunication, and AWS Landing Zone. In the coming months, we plan to continue expanding our portfolio of AWS Solutions to address common business and technical problems that our customers face. Visit our homepage to keep up to date with the latest AWS Solutions.

Pick the Right Tool for your IT Challenge

Post Syndicated from Markus Ostertag original https://aws.amazon.com/blogs/aws/pick-the-right-tool-for-your-it-challenge/

This guest post is by AWS Community Hero Markus Ostertag. As CEO of the Munich-based ad-tech company Team Internet AG, Markus is always trying to find the best ways to leverage the cloud, loves to work with cutting-edge technologies, and is a frequent speaker at AWS events and the AWS user group Munich that he co-founded in 2014.

Picking the right tools or services for a job is a huge challenge in IT—every day and in every kind of business. With this post, I want to share some strategies and examples that we at Team Internet used to leverage the huge “tool box” of AWS to build better solutions and solve problems more efficiently.

Use existing resources or build something new? A hard decision

The usual day-to-day work of an IT engineer, architect, or developer is building a solution for a problem or transferring a business process into software. To achieve this, we usually tend to use already existing architectures or resources and build an “add-on” to them.

With the rise of microservices, we all learned that modularization and decoupling are important for being scalable and extendable. This brought us to a different type of software architecture. In reality, we still tend to use already existing resources, like the same database or existing (and perhaps underutilized) Amazon EC2 instances, because it seems easier than building something new.

Stacks as “next level microservices”?

We at Team Internet are not using the vocabulary of microservices but tend to speak about stacks and building blocks for the different use cases. Our approach is matching the idea of microservices to everything, including the database and other resources that are necessary for the specific problem we need to address.

It’s not about “just” dividing the software and code into different modules. The whole infrastructure is separated based on different needs. Each of those parts of the full architecture is our stack, which is as independent as possible from everything else in the whole system. It only communicates loosely with the other stacks or parts of the infrastructure.

Benefits of this mindset = independence and flexibility

  • Choosing the right parts. For every use case, we can choose the components or services that are best suited for the specific challenges and don’t need to work around limitations. This is especially true for databases, as we can choose from the whole palette instead of trying to squeeze requirements into a DBMS that isn’t built for that. We can differentiate the different needs of workloads like write-heavy vs. read-heavy or structured vs. unstructured data.
  • Rebuilding at will. We’re flexible in rebuilding whole stacks as they’re only loosely coupled. Because of this, a team can build a proof-of-concept with new ideas or services and run them in parallel on production workload without interfering or harming the production system.
  • Lowering costs. Because the operational overhead of running multiple resources is handled by AWS (“No undifferentiated heavy lifting”), we just need to look at the service pricing. Most AWS pricing schemes support this stack approach. For databases, you either pay for throughput (Amazon DynamoDB) or per instance (Amazon RDS, etc.). At the throughput level, it’s simple: you just split the throughput you had on one table across several tables without any overhead. At the instance level, the pricing is linear, so an r4.xlarge is half the price of an r4.2xlarge. So why not run two r4.xlarge instances and split the workload?
  • Designing for resilience. This approach also helps your architecture to be more reliable and resilient by default. As the different stacks are independent from each other, the scaling is much more granular. Scaling on larger systems is often provided with a higher “security buffer,” and failures (hardware, software, fat fingers, etc.) only happen on a small part of the whole system.
  • Taking ownership. A nice side effect we’re seeing now as we use this methodology is the positive effect on ownership and responsibility for our teams. Because of those stacks, it is easier to pinpoint and fix issues but also to be transparent and clear on who is responsible for which stack.

Benefits demand efforts, even with the right tool for the job

Every approach has its downsides. Here, it is the additional development and architecture effort required to build such systems.

Therefore, we decided to always keep the goal of a perfect system, with independent stacks and reliable, loosely coupled processes between them, in mind. In reality, we sometimes break our own rules and cheat here and there. Even so, having this approach helps us build better systems and at least know exactly at which point we risk losing the benefits. I hope the explanation and insights here help you to pick the right tool for the job.

Amazon RDS Update – Console Update, RDS Recommendations, Performance Insights, M5 Instances, MySQL 8, MariaDB 10.3, and More

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-rds-update-console-update-rds-recommendations-performance-insights-m5-instances-mysql-8-mariadb-10-3-and-more/

It is time for a quick Amazon RDS update. I’ve got lots of news to share:

Console Update – The RDS Console has a fresh, new look.

RDS Recommendations – You now get recommendations that will help you to configure your database instances per our best practices.

Performance Insights for MySQL – You can peer deep inside of MySQL and understand more about how your queries are processed.

M5 Instances – You can now use MySQL and MariaDB on M5 instances.

MySQL 8.0 – You can now use MySQL 8.0 in production form.

MariaDB 10.3 – You can now use MariaDB 10.3 in production form.

Let’s take a closer look…

Console Update
The RDS Console took on a fresh, new look earlier this year. We made it available to you in preview form during development, and it is now the standard experience for all AWS users. You can see an overview of your RDS resources at a glance, create a new database, access documentation, and more, all from the home page:

You also get direct access to Performance Insights and to the new RDS Recommendations.

RDS Recommendations
We want to make it easy for you to take our best practices into account when you configure your RDS database instances, even as those practices improve. The new RDS Recommendations feature will periodically check your configuration, usage, and performance data and display recommended changes and improvements, focusing on performance, stability, and security. It works with all of the database engines, and is very easy to use. Open the RDS Console and click Recommendations to get started:

I can see all of the recommendations at a glance:

I can open a recommendation to learn more:

I have four options that I can take with respect to this recommendation:

Fix Immediately – I select some database instances and click Apply now.

Fix Later – I select some database instances and click Schedule for the next maintenance window.

Dismiss – I select some database instances and click Dismiss to indicate that I do not want to make any changes, and to acknowledge that I have seen the recommendation.

Defer – If I do nothing, the recommendations remain active and I can revisit them at another time.

Other recommendations may offer different options, or might require me to take additional actions. For example, the procedure for enabling encryption depends on the database engine:

RDS Recommendations are available today at no charge in the US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Europe (Ireland), Europe (Frankfurt), Asia Pacific (Tokyo), Asia Pacific (Sydney), and Asia Pacific (Singapore) Regions. We plan to add additional recommendations over time, and also expect to make the recommendations available via an API.

Performance Insights for MySQL
I can now peek inside of MySQL to see which queries, hosts, and users are consuming the most time, and why:

You can identify expensive SQL queries and other bottlenecks with a couple of clicks, looking back across the timeframe of your choice: an hour, a day, a week, or even longer.

This feature was first made available for PostgreSQL (both RDS and Aurora) and is now available for MySQL (again, both RDS and Aurora). To learn more, read Using Amazon RDS Performance Insights.
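Performance Insights is enabled per DB instance. As a minimal sketch (the instance identifier and retention period are placeholder assumptions), turning it on from the CLI might look like this:

# Enable Performance Insights on an existing DB instance
# (the identifier and the 7-day retention period are placeholders).
aws rds modify-db-instance \
    --db-instance-identifier my-mysql-instance \
    --enable-performance-insights \
    --performance-insights-retention-period 7 \
    --apply-immediately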

M5 Instances
The M5 instances deliver improved price/performance compared to M4 instances, and offer up to 10 Gbps of dedicated network bandwidth for database storage.

You can now launch M5 instances (including the new high-end m5.24xlarge) when using RDS for MySQL and RDS for MariaDB. You can scale up to these new instance types by modifying your existing DB instances:
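If you prefer working from the command line, a minimal sketch of the same change (the instance identifier and target class are placeholders) might look like this:

# Move an existing RDS for MySQL instance to an M5 instance class
# (the instance identifier and class below are placeholders).
aws rds modify-db-instance \
    --db-instance-identifier my-mysql-instance \
    --db-instance-class db.m5.large \
    --apply-immediately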

MySQL 8
Version 8 of MySQL is now available on Amazon RDS. This version of MySQL offers better InnoDB performance, JSON improvements, better GIS support (new spatial datatypes, indexes, and functions), common table expressions to reduce query complexity, window functions, atomic DDLs for faster online schema modification, and much more (read the documentation to learn more).
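As a quick, hedged illustration of two of these additions, the following sketch runs a common table expression combined with a window function against an RDS for MySQL 8.0 endpoint (the endpoint, credentials, and the shop.orders table are assumptions):

# Combine a CTE with a window function, both new in MySQL 8.0
# (endpoint, credentials, and the shop.orders table are placeholders).
mysql -h my-mysql8-instance.us-east-1.rds.amazonaws.com -u admin -p -e "
WITH daily_totals AS (
    SELECT order_date, SUM(amount) AS total
    FROM shop.orders
    GROUP BY order_date
)
SELECT order_date,
       total,
       SUM(total) OVER (ORDER BY order_date) AS running_total
FROM daily_totals;"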

MariaDB 10.3
Version 10.3 of MariaDB is now available on Amazon RDS. This version of MariaDB includes a new temporal data processing feature, improved Oracle compatibility, invisible columns, performance enhancements including instant ADD COLUMN operations & fast-fail DDL operations, and much more (read the documentation for a detailed list).
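As a hedged example of the instant ADD COLUMN enhancement (the endpoint, credentials, and shop.events table are assumptions), you might run:

# Add a column without rebuilding the table, using MariaDB 10.3's instant ALTER
# (endpoint, credentials, and the shop.events table are placeholders).
mysql -h my-mariadb-instance.us-east-1.rds.amazonaws.com -u admin -p -e "
ALTER TABLE shop.events
    ADD COLUMN source_region VARCHAR(32) DEFAULT NULL,
    ALGORITHM=INSTANT;"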

Available Now
All of the new features, engines, and instance types are available now and you can start using them today!

Jeff;

 

 

How to map out your migration of Oracle PeopleSoft to AWS

Post Syndicated from Ashok Shanmuga Sundaram original https://aws.amazon.com/blogs/architecture/how-to-map-out-your-migration-of-oracle-peoplesoft-to-aws/

Oracle PeopleSoft Enterprise is a widely used enterprise resource planning (ERP) application. Customers run production deployments of various PeopleSoft applications on AWS, including PeopleSoft Human Capital Management (HCM), Financials and Supply Chain Management (FSCM), Interactive Hub (IAH), and Customer Relationship Management (CRM).

We published a whitepaper on Best Practices for Running Oracle PeopleSoft on AWS in December 2017. It provides architectural guidance and outlines best practices for high availability, security, scalability, and disaster recovery for running Oracle PeopleSoft applications on AWS.

It also covers highly available, scalable, and cost-effective multi-region reference architectures for deploying PeopleSoft applications on AWS, like the one illustrated below.

While migrating your Oracle PeopleSoft applications to AWS, here are some things to keep in mind:

  • Multi-AZ deployments – Deploy your PeopleSoft servers and database across multiple Availability Zones (AZs) for high availability. AWS AZs allow you to operate production applications and databases that are more highly available, fault tolerant, and scalable than would be possible from a single data center.
  • Use Amazon Relational Database Service (Amazon RDS) to deploy your PeopleSoft database – Amazon RDS makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while managing time-consuming database administration tasks, allowing you to focus on your applications and business. Deploying an RDS for Oracle database in multiple AZs simplifies creating a highly available architecture because you’ll have built-in support for automated failover from your primary database to a synchronously replicated secondary database in an alternative AZ (see the hedged CLI sketch after this list).
  • Migration of large databases – Migrating large databases to Amazon RDS within a small downtime window requires careful planning:
    • We recommend that you take a point-in-time export of your database, transfer it to AWS, import it into Amazon RDS, and then apply the delta changes from on-premises.
    • Use AWS Direct Connect or AWS Snowball to transfer the export dump to AWS.
    • Use AWS Database Migration Service to apply the delta changes and sync the on-premises database with the Amazon RDS instance.
  • AWS Infrastructure Event Management (IEM) – Take advantage of AWS IEM to mitigate risks and help ensure a smooth migration. IEM is a highly focused engagement where AWS experts provide you with architectural and operational guidance, assist you in reviewing and fine-tuning your migration plan, and provide real-time support for your migration.
  • Cost optimization – There are a number of ways you can optimize your costs on AWS, including:
    • Use reserved instances for environments that are running most of the time, like production environments. A Reserved Instance is an EC2 offering that provides you with a significant discount (up to 75%) on EC2 usage compared to On-Demand pricing when you commit to a one-year or three-year term.
    • Shut down resources that are not in use. For example, development and test environments are typically used for only eight hours a day during the work week. You can stop these resources when they are not in use for a potential cost savings of 75% (40 hours vs. 168 hours). Use the AWS Instance Scheduler to automatically start and stop your Amazon EC2 and Amazon RDS instances based on a schedule.
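As referenced above, here is a minimal CLI sketch of creating a Multi-AZ RDS for Oracle instance for a PeopleSoft database (the identifier, instance class, storage size, credentials, and license model are placeholder assumptions you would adjust to your own sizing exercise):

# Create a Multi-AZ RDS for Oracle instance for a PeopleSoft database
# (identifier, instance class, storage, credentials, and license model are placeholders).
aws rds create-db-instance \
    --db-instance-identifier psft-hcm-db \
    --engine oracle-ee \
    --license-model bring-your-own-license \
    --db-instance-class db.r4.2xlarge \
    --allocated-storage 500 \
    --multi-az \
    --master-username psadmin \
    --master-user-password 'ChangeMe123!'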

The Configuring Amazon RDS as an Oracle PeopleSoft Database whitepaper has detailed instructions on configuring a backend Amazon RDS database for your Oracle PeopleSoft deployment on AWS. After you read the whitepaper, I recommend these other resources as your next step:

  • For a real-world case study on migrating a large Oracle database to AWS, check out this blog post about how AFG migrated their mission-critical Oracle Siebel CRM system running on Oracle Exadata on-premises to Amazon RDS for Oracle.
  • For more information on running Oracle Enterprise Solutions on AWS, check out this re:Invent 2017 video.
  • You can find more Oracle on AWS resources here and here.

About the author

Ashok Shanmuga Sundaram is a partner solutions architect with the Global System Integrator (GSI) team at Amazon Web Services. He works with the GSIs to provide guidance on enterprise cloud adoption, migration and strategy.

In the Works – Amazon RDS on VMware

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/in-the-works-amazon-rds-on-vmware/

Database administrators spend a lot of time provisioning hardware, installing and patching operating systems and databases, and managing backups. All of this undifferentiated heavy lifting keeps the lights on but often takes time away from higher-level efforts that have a higher return on investment. For many years, Amazon Relational Database Service (RDS) has taken care of this heavy lifting, and simplified the use of MariaDB, Microsoft SQL Server, MySQL, Oracle, and PostgreSQL in the cloud. AWS customers love the high availability, scalability, durability, and management simplicity of RDS.

Earlier this week we announced that we are working to bring the benefits of RDS to on-premises virtualized environments, to hybrid environments, and to VMware Cloud on AWS. You will be able to provision new on-premises database instances in minutes with a couple of clicks, make backups to on-premises or cloud-based storage, and to establish read replicas running on-premises or in the AWS cloud. Amazon RDS on vSphere will take care of OS and database patching, and will let you migrate your on-premises databases to AWS with a single click.

Inside Amazon RDS on VMware
I sat down with the development team to learn more about Amazon RDS on VMware. Here’s a quick summary of what I learned:

Architecture – Your vSphere environment is effectively a private, local AWS Availability Zone (AZ), connected to AWS across a VPN tunnel running over the Internet or an AWS Direct Connect connection. You will be able to create Multi-AZ instances of RDS that span vSphere clusters.

Backups – Backups can make use of local (on-premises) storage or AWS, and are subject to both local and AWS retention policies. Backups are portable, and can be used to create an in-cloud Amazon RDS instance. Point in Time Recovery (PITR) will be supported, as long as you restore to the same environment.

Management – You will be able to manage your Amazon RDS on vSphere instances from the Amazon RDS Console and from vCenter. You will also be able to use the Amazon RDS CLI and the Amazon RDS APIs.

Regions – We’ll be launching in the US East (N. Virginia), US West (Oregon), Asia Pacific (Tokyo), and Europe (Frankfurt) Regions, with more to come over time.

Register for the Preview
If you would like to be among the first to take Amazon RDS on VMware for a spin, you can Register for the Preview. I’ll have more information (and a hands-on blog post) in the near future, so stay tuned!

Jeff;

 

Migrate RDBMS or On-Premise data to EMR Hive, S3, and Amazon Redshift using EMR – Sqoop

Post Syndicated from Nivas Shankar original https://aws.amazon.com/blogs/big-data/migrate-rdbms-or-on-premise-data-to-emr-hive-s3-and-amazon-redshift-using-emr-sqoop/

This blog post shows how our customers can benefit by using the Apache Sqoop tool. This tool is designed to transfer and import data from a Relational Database Management System (RDBMS) into the Hadoop Distributed File System (HDFS) on Amazon EMR, transform the data in Hadoop, and then export it into a data warehouse (for example, Hive or Amazon Redshift).

To demonstrate the Sqoop tool, this post uses Amazon RDS for MySQL as a source and imports data in the following three scenarios:

  • Scenario 1 — AWS EMR (HDFS -> Hive and HDFS)
  • Scenario 2 — Amazon S3 (EMRFS), and then to EMR Hive
  • Scenario 3 — S3 (EMRFS), and then to Redshift

 

These scenarios let customers run the data transfers in parallel, so the transfer completes more quickly and cost efficiently than with a traditional ETL tool. Once the script is developed, customers can reuse it to transfer a variety of RDBMS data sources into EMR Hadoop, such as PostgreSQL, SQL Server, Oracle, and MariaDB.

We can also simulate the same steps for an on-premises RDBMS. This requires us to have the correct JDBC driver installed, and a network connection set up between the corporate data center and the AWS Cloud environment. In this scenario, consider using either the AWS Direct Connect or AWS Snowball methods, based upon the data load volume and network constraints.

Prerequisites

To complete the procedures in this post, you need to perform the following tasks.

Step 1 — Launch an RDS Instance

By using the AWS Management Console or AWS CLI commands, launch MySQL instances with the desired capacity. The following example uses the db.t2.medium instance class with default settings.
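If you take the CLI route instead of the console, a minimal sketch (the identifier, credentials, security group, and initial database name are placeholder assumptions) might be:

# Launch the source MySQL instance used in this walkthrough
# (identifier, credentials, and security group ID are placeholders).
aws rds create-db-instance \
    --db-instance-identifier sqoopblog-mysql \
    --engine mysql \
    --db-instance-class db.t2.medium \
    --allocated-storage 100 \
    --db-name sqoopblog \
    --master-username admin \
    --master-user-password 'ChangeMe123!' \
    --vpc-security-group-ids sg-0123456789abcdef0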

To call the right services, copy the endpoint and use the following JDBC connection string exactly as shown. This example uses the US East (N. Virginia) us-east-1 AWS Region.

jdbc:mysql://<<Connection string>>.us-east-1.rds.amazonaws.com:3306/sqoopblog

Step 2 — Test the connection and load sample data into RDS – MySQL

First, I used an open source data sample from this location: https://bulkdata.uspto.gov/data/trademark/casefile/economics/2016/

Second, I loaded the following two tables:

Third, I used the MySQL Workbench tool and its Import/Export wizard to load the sample tables. This loads the data automatically and creates the table structure.

Download Steps:

The following steps can help you download the MySQL database engine and load the above-mentioned data source into tables: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_GettingStarted.CreatingConnecting.MySQL.html#CHAP_GettingStarted.Connecting.MySQL

I used the following instructions on a Mac:

Step A: Install Homebrew

  • Homebrew is an open source software package management system.
  • To install Homebrew, open a terminal and enter: $ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
  • Homebrew then downloads and installs the command line tools for Xcode 8.0 as part of the installation process.

Step B: Install MySQL

  • At the time of this blog post, Homebrew has MySQL version 5.7.15 as the default formula in its main repository. Enter the following command: $ brew info MySQL
  • Expected output: MySQL: stable 8.0.11 (bottled)
  • To install MySQL, enter: $ brew install MySQL

Fourth, when the installation is complete, provide the connection parameters. In the MySQL Workbench main console, click the (+) sign next to MySQL Connections to open the new connection window, then provide the connection name, hostname (the RDS endpoint), port, username, and password.

Step 3 — Launch EMR Cluster

Open the EMR console, choose Advanced option, and launch the cluster with the following options set:
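Alternatively, a minimal CLI sketch of the same cluster (the key pair, subnet, instance type and count, and log bucket are placeholder assumptions; the release label matches the EMR 5.15.0 version referenced later in this post) might be:

# Launch an EMR cluster with Hadoop, Hive, and Sqoop installed
# (key pair, subnet, instance type/count, and log bucket are placeholders).
aws emr create-cluster \
    --name "sqoop-blog-cluster" \
    --release-label emr-5.15.0 \
    --applications Name=Hadoop Name=Hive Name=Sqoop \
    --instance-type m4.xlarge \
    --instance-count 3 \
    --use-default-roles \
    --ec2-attributes KeyName=my-key-pair,SubnetId=subnet-0123456789abcdef0 \
    --log-uri s3://mybucket/emr-logs/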

Step 4 — Test the SSH access and install MySQL-connector-java-version-bin.jar in the EMR folder

a. From the security group for the Master node, click the link and edit the inbound rule to allow your PC or laptop IP to access the master node.

b. Download the MySQL JDBC driver to your local PC from the following location: http://www.mysql.com/downloads/connector/j/5.1.html

c. Unzip the folder and copy the latest version of the MySQL Connector available. (In my example, I use the mysql-connector-java-5.1.46-bin.jar file.)

d. Copy the file to the EMR master node. (Note: Because EMR doesn’t allow public access to the master node, I did a manual download to my local PC and then used FileZilla, a cross-platform FTP application, to push the file.)

e. From your terminal, SSH to the master node and copy this JAR file into the /usr/lib/sqoop directory.

Note: This driver copy can be automated by uploading the driver file to an S3 path and using a bootstrap script to copy it onto the master node. An example command would be:

aws s3 cp s3://mybucket/myfilefolder/mysql-connector-java-5.1.46-bin.jar /usr/lib/sqoop/

Or, if the master node has temporary internet access, download the file directly onto it with the following commands:

wget http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.46.tar.gz
tar -xvzf mysql-connector-java-5.1.46.tar.gz
sudo cp mysql-connector-java-5.1.46/mysql-connector-java-5.1.46-bin.jar /usr/lib/sqoop/ 

The /usr/lib/sqoop directory on the master node should then look like the following.

  1. Before you begin working with EMR, you need at least two AWS Identity and Access Management (IAM) service roles with sufficient permissions to access the resources in your account.
  2. The EMR master and slave nodes must be able to connect to Amazon RDS to initiate the import and export of data from the MySQL RDS instance. For example, I edit the RDS MySQL instance’s security group to allow incoming connections from the EMR nodes’ master and slave security groups (a hedged CLI sketch follows this list).
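As a minimal sketch (the security group IDs are placeholders), the ingress rules might be added like this:

# Allow inbound MySQL (port 3306) connections to the RDS instance
# from the EMR master and slave security groups (IDs are placeholders).
aws ec2 authorize-security-group-ingress \
    --group-id sg-0rdsexample000000 \
    --protocol tcp --port 3306 \
    --source-group sg-0emrmasterexample0

aws ec2 authorize-security-group-ingress \
    --group-id sg-0rdsexample000000 \
    --protocol tcp --port 3306 \
    --source-group sg-0emrslaveexample00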

Step 5 — Test the connection to MySQL RDS from EMR

After you are logged in, run the following command in the EMR master cluster to validate your connection. It also checks the MySQL RDS login and runs the sample query to check table record count.

sqoop eval --connect "jdbc:mysql://<<connection String>>.us-east-1.rds.amazonaws.com:3306/sqoopblog" --query "select count(*) from sqoopblog.event" --username admin -P

Note: The record count from the previous sample query should match the count in the MySQL table, as shown in the following example:

Import data into EMR – bulk load

To import data into EMR, you must first import the full data as a text file, by using the following query:

sqoop import --connect "jdbc:mysql://<<ConnectionString>>.us-east-1.rds.amazonaws.com:3306/sqoopblog" --table event --target-dir /user/hadoop/EVENT --username admin -P -m 1

After the import completes, validate the extract file in the corresponding Hadoop directory.

As shown in the previous example, the original table was not partitioned. Hence, it is extracted as one file and imported into the Hadoop folder. If this had been a larger table, it would have caused performance issues.

To address this issue, I show how performance increases when we use a partitioned table and the direct method to export data faster and more efficiently. I created a partitioned copy of the event table, EVENT_PARTITION, with the EVENT_DT column as the partition key, and then copied the original table data into it. In addition, I used the direct method to take advantage of MySQL partitions and optimize the efficiency of simultaneous data transfer.

Copy data and run stats.

Run the following query in MySQL Workbench to copy data and run stats:

insert into sqoopblog.event_partition select * from sqoopblog.event

analyze table sqoopblog.event_partition

After running the query in MySQL workbench, run the following Sqoop command in the EMR master node:

sqoop import --connect "jdbc:mysql://<<ConnectionString>>.us-east-1.rds.amazonaws.com:3306/sqoopblog" --table event_partition --target-dir /user/hadoop/EVENTSPARTITION --username admin -P --split-by event_dt

This example shows the performance improvement for the same import when the table is partitioned and the --split-by argument is added. It also shows the data file split into four parts; Sqoop automatically creates four map tasks based on the table’s partition stats.

We can also use the -m 10 argument to increase the number of map tasks, which equals the number of input splits:

sqoop import --connect "jdbc:mysql://<<ConnectionString>>.us-east-1.rds.amazonaws.com:3306/sqoopblog" --table event_partition --target-dir /user/hadoop/EVENTSPARTITION --username admin -P --split-by event_dt -m 10

Note: You can also split the data into more extract files during the import process by increasing the map task argument (-m <<desired number>>), as shown in the above sample code. Make sure that the extract files align with the partition distribution; otherwise, the output files will be out of order.

Consider the following additional options if you need to import only selected columns.

In the following example, add the --columns argument to specify the selected fields.

sqoop import --connect "jdbc:mysql://<<connection String>>.us-east-1.rds.amazonaws.com:3306/sqoopblog" --table event_partition --columns "EVENT_CD,SERIAL_NO,EVENT_DT" --target-dir /user/hadoop/EVENTSSELECTED --split-by EVENT_DT --username admin -P -m 5

For scenario 2, we import the table data file into an S3 bucket. Before you do, make sure that the EMR EC2 instance profile has access to the S3 bucket. Run the following command in the EMR master cluster:

sqoop import --connect "jdbc:mysql://<<connection String>>.us-east-1.rds.amazonaws.com:3306/sqoopblog" --table event_partition --target-dir s3://nivasblog/sqoopblog/ --username admin -P -m 1 --fields-terminated-by '\t' --lines-terminated-by '\n' --as-textfile

Import as Hive table – Full Load

Now, let’s try creating a Hive table directly from the Sqoop command. This is a more efficient way to create Hive tables dynamically, and we can later alter the table to an external table for any additional requirements. With this method, customers save time creating and transforming data into Hive through an automated approach.

sqoop import --connect "jdbc:mysql://<<connection String>>.us-east-1.rds.amazonaws.com:3306/sqoopblog" --username admin -P --table event_partition  --hive-import --create-hive-table --hive-table HIVEIMPORT1 --delete-target-dir --target-dir /user/hadoop/EVENTSHIVE1 --split-by EVENT_DT --hive-overwrite -m 4

Now, let’s try a direct method to see how significantly the load performance and import time improves.

sqoop import --connect "jdbc:mysql://<<connection string>>.us-east-1.rds.amazonaws.com:3306/sqoopblog"  --username admin -P --table event_partition  --hive-import --create-hive-table --hive-table HIVEIMPORT2 --delete-target-dir --target-dir /user/hadoop/nivas/EVENTSHIVE2 --split-by EVENT_DT --hive-overwrite --direct

Following are additional options to consider.

In the following example, add the --columns argument to select specific fields and import them into EMR as a Hive table.

sqoop import --connect "jdbc:mysql://<<connection string>>.us-east-1.rds.amazonaws.com:3306/sqoopblog" --username admin -P --table event_partition  --columns "event_cd,serial_no,event_dt" --hive-import --create-hive-table --hive-table HIVESELECTED --delete-target-dir --target-dir /user/hadoop/nivas/EVENTSELECTED --split-by EVENT_DT --hive-overwrite --direct

Perform a free-form query and import into EMR as a hive table.

sqoop import --connect "jdbc:mysql://<<connection string>>.us-east-1.rds.amazonaws.com:3306/sqoopblog" --username admin -P --query "select a.serial_no, a.event_cd, a.event_dt, b.us_class_cd, b.class_id from event_partition a, us_class b where a.serial_no=b.serial_no AND \$CONDITIONS" --hive-import --create-hive-table --hive-table HIVEQUERIED --delete-target-dir --target-dir /user/hadoop/EVENTSQUERIED -m 1 --hive-overwrite --direct

For scenario 2, create a Hive table manually from the S3 location. The following sample creates an external table over the S3 location; run the select statement to check the data counts.
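A minimal sketch of such an external table follows (the bucket path, tab delimiter, and column types are assumptions that should match your scenario 2 export):

# Create an external Hive table over the scenario 2 S3 extract and check the row count
# (bucket path, delimiter, and column types are assumptions; adjust to match your export).
hive -e "
CREATE EXTERNAL TABLE IF NOT EXISTS s3_event_partition (
    event_cd      STRING,
    event_dt      STRING,
    event_seq     INT,
    event_type_cd STRING,
    serial_no     INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://nivasblog/sqoopblog/';

SELECT COUNT(*) FROM s3_event_partition;"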

Important note: With Sqoop version 1.4.7, you can create Hive tables directly from scripts, as shown in the following sample code. This feature is supported in EMR 5.15.0.

sqoop import --connect "jdbc:mysql://<<connection string>>.us-east-1.rds.amazonaws.com:3306/sqoopblog" --username admin -P --table event_partition  --hive-import --target-dir s3n://nivasblog/sqoopblog/1/ --create-hive-table --hive-table s3table --split-by EVENT_DT --fields-terminated-by '\t' --lines-terminated-by '\n' --as-textfile

For the previous code samples, validate in Hive or Hue, and confirm the table records.

Import all tables in the schema into Hive.

Note: Create a Hive database in Hue or Hive first, and then run the following command in the EMR master cluster.

sqoop import-all-tables --connect "jdbc:mysql://<<Connection string>>.us-east-1.rds.amazonaws.com:3306/sqoopblog" --username admin -P --hive-database sqoopimport --create-hive-table --hive-import --compression-codec=snappy --hive-overwrite --direct

Import as Hive table – Incremental Load

Now, let’s try loading into Hive a sample incremental data feed for the partition table with the event date as the key. Use the following Sqoop command on an incremental basis.

In addition to the initial data in the EVENT_BASETABLE table, I loaded incremental data into the same table. Let’s follow the steps and commands below to perform incremental updates with Sqoop and import the results into Hive.

sqoop import --connect "jdbc:mysql://<<Connection string>>.us-east-1.rds.amazonaws.com:3306/sqoopblog" --username admin -P --table event_partition --target-dir /user/hadoop/INCRTAB --split-by event_dt -m 1 --check-column event_dt --incremental lastmodified --last-value '2018-06-29'

Once the incremental extracts are loaded into the Hadoop directory, you can create temporary, or incremental, tables in Hive and insert them into the main tables.

CREATE TABLE incremental_table (event_cd STRING, event_dt DATE, event_seq INT, event_type_cd STRING, serial_no INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/user/hadoop/INCRTAB'

INSERT INTO default.hiveimport1 SELECT * FROM default.incremental_table

Alternatively, you can use the --query argument to do the incremental operation by joining various tables with condition arguments, and then insert the results into the main table.

--query "select * from EVENT_BASETABLE where modified_date > {last_import_date} AND $CONDITIONS"

All of these steps have been created as a Sqoop job to automate the flow.
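A minimal sketch of such a saved job follows (the connection string, target directory, and starting --last-value are placeholders; --append is added so that repeat runs can write alongside the existing extracts, and Sqoop records and advances the last-imported value for you on each execution):

# Create a reusable incremental import job; Sqoop stores the job definition
# and advances --last-value automatically after each run (connection string is a placeholder).
sqoop job --create event_incremental -- import \
    --connect "jdbc:mysql://<<Connection string>>.us-east-1.rds.amazonaws.com:3306/sqoopblog" \
    --username admin -P \
    --table event_partition \
    --target-dir /user/hadoop/INCRTAB \
    --split-by event_dt -m 1 \
    --check-column event_dt \
    --incremental lastmodified \
    --last-value '2018-06-29' \
    --append

# Run the job, for example from cron or an Airflow task.
sqoop job --exec event_incremental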

Export data to Redshift

Now that data is imported into EMR HDFS and the S3 data store, let’s see how to use the Sqoop command to export data into the data warehouse layer. In this case, we use an Amazon Redshift cluster and demonstrate with an example.

Download the JDBC driver that your SQL client tool or application uses. If you’re not sure, download the latest version of the JDBC 4.2 API driver.

In this example, the driver file is RedshiftJDBC42-1.2.15.1025.jar, and the class name for the driver is com.amazon.redshift.jdbc42.Driver.

Copy this JAR file to the EMR master node: SSH to the master node, navigate to the /usr/lib/sqoop directory, and copy the JAR file there.

Note: Because EMR doesn’t allow public access to the master node, I did a manual download to a local PC. I then used FileZilla to push the file.

Log in to the EMR master cluster and run this Sqoop command to copy the S3 data file into the Redshift cluster.

Launch the Redshift cluster. This example uses the ds2.xlarge (storage node) type.
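A minimal CLI sketch of launching it (the cluster identifier, node count, and credentials are placeholder assumptions):

# Launch a two-node ds2.xlarge Redshift cluster with a database named sqoopexport
# (identifier, node count, and credentials are placeholders).
aws redshift create-cluster \
    --cluster-identifier sqoop-target \
    --node-type ds2.xlarge \
    --cluster-type multi-node \
    --number-of-nodes 2 \
    --db-name sqoopexport \
    --master-username admin \
    --master-user-password 'ChangeMe123!'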

After Redshift launches and its security group is configured to allow a connection from the EMR cluster, run the Sqoop command on the EMR master node. This exports the data from the S3 location (shown previously in the Code 6 command) into the Redshift cluster as a table.

I created a table structure in Redshift as shown in the following example.

DROP TABLE IF EXISTS sqoopexport CASCADE;

CREATE TABLE sqoopexport
(
   event_cd       varchar(25)   NOT NULL,
   event_dt       varchar(25),
   event_seq      varchar(25)   NOT NULL,
   event_type_cd  varchar(25)   NOT NULL,
   serial_no      varchar(25)   NOT NULL
);

COMMIT;

When the table is created, run the following command to export the data from S3 into the Redshift table.

sqoop export --connect jdbc:redshift://<<Connection String>>.us-east-1.redshift.amazonaws.com:5439/sqoopexport --table sqoopexport --export-dir s3://nivastest1/events/ --driver com.amazon.redshift.jdbc42.Driver --username admin -P --input-fields-terminated-by '\t'

This command inserts the data records into the table.

For more information, see Loading Data from Amazon EMR.

For information about how to copy data back into RDBMS, see Use Sqoop to Transfer Data from Amazon EMR to Amazon RDS.

Summary

You’ve learned how to use Apache Sqoop on EMR to transfer data from RDBMS to an EMR cluster. You created an EMR cluster with Sqoop, processed a sample dataset on Hive, built sample tables in MySQL-RDS, and then used Sqoop to import the data into EMR. You also created a Redshift cluster and exported data from S3 using Sqoop.

You saw that Sqoop can perform data transfers in parallel, so execution is quicker and more cost effective. You also simplified ETL data processing from the source to the target layer.

The advantages of Sqoop are:

  • Fast and parallel data transfer into EMR, taking advantage of EMR compute instances for the import process and removing external tool dependencies.
  • A direct import mode that expedites query and pull performance from MySQL into EMR Hadoop and S3.
  • Sequential (incremental) dataset imports from the source system (provided the tables have primary keys), simplifying the growing need to migrate on-premises RDBMS data without re-architecting.

Sqoop pain points include:

  • Automation must be built by the developer/operations team, for example through a workflow or job scheduler such as Airflow’s Sqoop support or other tools.
  • Tables that don’t have primary keys or that carry legacy table dependencies are challenging to import incrementally. The recommendation is to do a one-time migration through a Sqoop bulk transfer and re-architect your source ingestion mechanism.
  • Import/export works over a JDBC connection and doesn’t support other methods such as ODBC or API calls.

If you have questions or suggestions, please comment below.

 


Additional Reading

If you found this post useful, be sure to check out Use Sqoop to transfer data from Amazon EMR to Amazon RDS and Seven tips for using S3DistCp on Amazon EMR to move data efficiently between HDFS and Amazon S3.

 


About the Author

Nivas Shankar is a Senior Big Data Consultant at Amazon Web Services. He helps and works closely with enterprise customers building big data applications on the AWS platform. He holds a master’s degree in physics and is highly passionate about theoretical physics concepts. He enjoys spending time with his wife and two adorable kids. In his spare time, he takes his kids to tennis and football practice.