Tag Archives: mysql

Introducing the new Serverless LAMP stack

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/introducing-the-new-serverless-lamp-stack/

This is the first in a series of posts for PHP developers. The series will explain how to use serverless technologies with PHP. It covers the available tools, frameworks and strategies to build serverless applications, and why now is the right time to start.

In future posts, I demonstrate how to use AWS Lambda for web applications built with PHP frameworks such as Laravel and Symfony. I show how to move from using Lambda as a replacement for web hosting functionality to a decoupled, event-driven approach. I cover how to combine multiple Lambda functions of minimal scope with other serverless services to create performant, scalable microservices.

In this post, you learn how to use PHP with Lambda via the custom runtime API. Visit this GitHub repository for the sample code.

The Serverless LAMP stack

The challenges with traditional PHP applications

Scalability is an inherent challenge with the traditional LAMP stack. A scalable application is one that can handle highly variable levels of traffic. PHP applications are often scaled horizontally, by adding more web servers as needed. This is managed via a load balancer, which directs requests to various web servers. Each additional server brings additional overhead with networking, administration, storage capacity, backup and restore systems, and an update to asset management inventories. Additionally, each horizontally scaled server runs independently. This can result in configuration synchronization challenges.

Horizontal scaling with traditional LAMP stack applications.

New storage challenges arise as each server has its own disks and filesystem, often requiring developers to add a mechanism to handle user sessions. With serverless technologies, scalability is managed for the developer.

If traffic surges, the services scale to meet the demand without having to deploy additional servers. This allows applications to quickly transition from prototype to production.

The serverless LAMP architecture

A traditional web application can be split into two components:

  • The static assets (media files, CSS, JavaScript)
  • The dynamic application (PHP, MySQL)

A serverless approach to serving these two components is illustrated below:

The serverless LAMP stack

All requests for dynamic content (anything excluding /assets/*) are forwarded to Amazon API Gateway. This is a fully managed service for creating, publishing, and securing APIs at any scale. It acts as the “front door” to the PHP application, routing requests downstream to Lambda functions. The Lambda functions contain the business logic and interaction with the MySQL database. You can pass the input to the Lambda function as any combination of request headers, path variables, query string parameters, and body.
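
Assuming API Gateway's Lambda proxy integration, the event payload that reaches the function carries the request headers, path variables, query string parameters, and body as top-level keys. The following is a minimal sketch of a handler reading them; the handler name and response fields are illustrative:

<?php
// Minimal sketch of a handler reading an API Gateway Lambda proxy integration event.
// The function name and response fields are illustrative.
function search($event)
{
    $headers     = $event['headers'] ?? [];                   // request headers
    $pathParams  = $event['pathParameters'] ?? [];            // e.g. /articles/{id}
    $queryParams = $event['queryStringParameters'] ?? [];     // e.g. ?page=2
    $body        = json_decode($event['body'] ?? '', true);   // request body (for POST/PUT)

    // Business logic and MySQL interaction would go here.

    // A proxy integration expects a response object with statusCode and body.
    return json_encode([
        'statusCode' => 200,
        'body'       => json_encode([
            'id'   => $pathParams['id'] ?? null,
            'page' => $queryParams['page'] ?? 1,
        ]),
    ]);
}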

Notable AWS features for PHP developers

Amazon Aurora Serverless

During re:Invent 2017, AWS announced Aurora Serverless, an on-demand serverless relational database with a pay-per-use cost model. This takes the responsibility of provisioning and scaling a relational database away from the developer.
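
From a Lambda function, connecting to an Aurora Serverless (MySQL-compatible) cluster looks the same as connecting to any MySQL endpoint. Below is a minimal sketch using PDO, assuming the pdo_mysql extension is included in the custom PHP build; the endpoint, database name, table, and credentials are placeholders:

<?php
// Minimal sketch: query an Aurora Serverless (MySQL-compatible) cluster with PDO.
// The endpoint, database name, table, and credentials are placeholders.
$dsn = 'mysql:host=my-cluster.cluster-xxxxxxxxxxxx.eu-west-1.rds.amazonaws.com;dbname=blog;charset=utf8mb4';

$pdo = new PDO($dsn, 'admin', getenv('DB_PASSWORD'), [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,  // surface connection and query errors as exceptions
]);

$stmt = $pdo->prepare('SELECT id, title FROM posts WHERE author = :author');
$stmt->execute([':author' => 'jane']);
$posts = $stmt->fetchAll(PDO::FETCH_ASSOC);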

Lambda Layers and custom runtime API

At re:Invent 2018, AWS announced two new Lambda features. These enable developers to build custom runtimes, and share and manage common code between functions.

Improved VPC networking for Lambda functions

In September 2019, AWS announced significant improvements in cold starts for Lambda functions inside a VPC. This results in faster function startup performance and more efficient usage of elastic network interfaces, reducing VPC cold starts.

Amazon RDS Proxy

At re:Invent 2019, AWS announced the launch of a new service called Amazon RDS Proxy, a fully managed database proxy that sits between your application and your relational database. It efficiently pools and shares database connections to improve the scalability of your application.

 

Significant moments in the serverless LAMP stack timeline

Combining these services, it is now possible to build secure, performant, and scalable serverless applications with PHP and relational databases.

Custom runtime API

The custom runtime API is a simple interface to enable Lambda function execution in any programming language or a specific language version. The custom runtime API requires an executable text file called a bootstrap. The bootstrap file is responsible for the communication between your code and the Lambda environment.

To create a custom runtime, you must first compile the required version of PHP in an Amazon Linux environment compatible with the Lambda execution environment. To do this, follow these step-by-step instructions.

The bootstrap file

The file below is an example of a basic PHP bootstrap file. This example is for explanation purposes only; it contains no error handling or abstractions. To ensure that you handle exceptions appropriately, consult the runtime API documentation as you build production custom runtimes.

#!/opt/bin/php
<?php

// This invokes Composer's autoloader so that we'll be able to use Guzzle and any other 3rd party libraries we need.
require __DIR__ . '/vendor/autoload.php';

// This is the request processing loop. Barring unrecoverable failure, this loop runs until the environment shuts down.
do {
    // Ask the runtime API for a request to handle.
    $request = getNextRequest();

    // Obtain the function name from the _HANDLER environment variable and ensure the function's code is available.
    $handlerFunction = array_slice(explode('.', $_ENV['_HANDLER']), -1)[0];
    require_once $_ENV['LAMBDA_TASK_ROOT'] . '/src/' . $handlerFunction . '.php';

    // Execute the desired function and obtain the response.
    $response = $handlerFunction($request['payload']);

    // Submit the response back to the runtime API.
    sendResponse($request['invocationId'], $response);
} while (true);

function getNextRequest()
{
    $client = new \GuzzleHttp\Client();
    $response = $client->get('http://' . $_ENV['AWS_LAMBDA_RUNTIME_API'] . '/2018-06-01/runtime/invocation/next');

    return [
      'invocationId' => $response->getHeader('Lambda-Runtime-Aws-Request-Id')[0],
      'payload' => json_decode((string) $response->getBody(), true)
    ];
}

function sendResponse($invocationId, $response)
{
    $client = new \GuzzleHttp\Client();
    $client->post(
    'http://' . $_ENV['AWS_LAMBDA_RUNTIME_API'] . '/2018-06-01/runtime/invocation/' . $invocationId . '/response',
       ['body' => $response]
    );
}

The #!/opt/bin/php declaration instructs the program loader to use the PHP binary compiled for Amazon Linux.

The bootstrap file performs the following tasks, in an operational loop:

  1. Obtains the next request.
  2. Executes the code to handle the request.
  3. Returns a response.
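
As noted above, the example bootstrap contains no error handling. In a production runtime, a handler failure should be reported to the runtime API's invocation error endpoint so that Lambda records the invocation as failed. A minimal sketch of such a helper; the function name is illustrative, and the endpoint path follows the runtime API:

<?php
// Minimal sketch: report a failed invocation to the runtime API error endpoint.
// The helper name is illustrative; the endpoint path follows the runtime API.
function sendError($invocationId, Throwable $error)
{
    $client = new \GuzzleHttp\Client();
    $client->post(
        'http://' . $_ENV['AWS_LAMBDA_RUNTIME_API'] . '/2018-06-01/runtime/invocation/' . $invocationId . '/error',
        ['body' => json_encode([
            'errorMessage' => $error->getMessage(),
            'errorType'    => get_class($error),
        ])]
    );
}

Wrapping the handler call in a try/catch and invoking a helper like this keeps a single failing request from going unreported while the processing loop continues.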

Follow these steps to package the bootstrap and compiled PHP binary together into a `runtime.zip`.

Libraries and dependencies

The runtime bootstrap uses an HTTP-based local interface. This retrieves the event payload for each Lambda function invocation and returns the response from the function. This bootstrap file uses Guzzle, a popular PHP HTTP client, to make requests to the custom runtime API. The Guzzle package is installed using the Composer package manager. Installing packages in this way creates a mechanism for incorporating additional libraries and dependencies as the application evolves.

Follow these steps to create and package the runtime dependencies into a `vendors.zip` binary.

Lambda Layers provides a mechanism to centrally manage code and data that is shared across multiple functions. When a Lambda function is configured with a layer, the layer’s contents are put into the /opt directory of the execution environment. You can include a custom runtime in your function’s deployment package, or as a layer. Lambda executes the bootstrap file in your deployment package, if available. If not, Lambda looks for a runtime in the function’s layers. There are several open source PHP runtime layers available today.

The following steps show how to publish the `runtime.zip` and `vendors.zip` archives created earlier as Lambda layers and use them to build a Lambda function with a PHP runtime:

  1.  Use the AWS Command Line Interface (CLI) to publish layers from the binaries created earlier
    aws lambda publish-layer-version \
        --layer-name PHP-example-runtime \
        --zip-file fileb://runtime.zip \
        --region eu-west-1

    aws lambda publish-layer-version \
        --layer-name PHP-example-vendor \
        --zip-file fileb://vendors.zip \
        --region eu-west-1

  2. Make note of each command’s LayerVersionArn output value (for example arn:aws:lambda:eu-west-1:XXXXXXXXXXXX:layer:PHP-example-runtime:1), which you’ll need for the next steps.

Creating a PHP Lambda function

You can create a Lambda function via the AWS CLI, the AWS Serverless Application Model (SAM), or directly in the AWS Management Console. To do this using the console:

  1. Navigate to the Lambda section  of the AWS Management Console and choose Create function.
  2. Enter “PHPHello” into the Function name field, and choose Provide your own bootstrap in the Runtime field. Then choose Create function.
  3. Right click on bootstrap.sample and choose Delete.
  4. Choose the layers icon and choose Add a layer.
  5. Choose Provide a layer version ARN, then copy and paste the ARN of the custom runtime layer from step 1 into the Layer version ARN field.
  6. Repeat steps 4 and 5 for the vendor layer ARN.
  7. In the Function Code section, create a new folder called src and inside it create a new file called index.php.
  8. Paste the following code into index.php:
    //index function
    function index($data)
    {
     return "Hello, ". $data['name'];
    }
    
  9. Insert “index” into the Handler input field. This instructs Lambda to run the index function when invoked.
  10. Choose Save at the top right of the page.
  11. Choose Test at the top right of the page, and enter “PHPTest” into the Event name field. Enter the following into the event payload field and then choose Create: { "name": "world" }
  12. Choose Test and select the dropdown next to the execution result heading.

You can see that the event payload “name” value is used to return “Hello, world”. This is taken from the $data['name'] parameter provided to the Lambda function. The log output provides details about the actual duration, billed duration, and amount of memory used to execute the code.

Conclusion

This post explains how to create a Lambda function with a PHP runtime using Lambda Layers and the custom runtime API. It introduces the architecture for a serverless LAMP stack that scales with application traffic.

Lambda allows functions with mixed runtimes to interact with each other. Now, PHP developers can join other serverless development teams focusing on shipping code. With serverless technologies, you no longer have to think about restarting web hosts, scaling, or hosting.

Start building your own custom runtime for Lambda.

Enabling job accounting for HPC with AWS ParallelCluster and Amazon RDS

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/enabling-job-accounting-for-hpc-with-aws-parallelcluster-and-amazon-rds/

This post is written by Nicola Venuti, HPC Specialist SA, and contributed to by Rex Chen, Software Development Engineer.

Introduction

Accounting, reporting, and advanced analytics used for data-driven planning and decision making are key areas of focus for High Performance Computing (HPC) Administrators. In the cloud, these areas are more relevant to the costs of the services, which directly impact budgeting and forecasting of expenses. With the growth of new HPC services that perform analyses and corrective actions, you can better optimize for performance, which reduces cost.

Solution Overview

In this blog post, we walk through an easy way to collect accounting information for every job and step executed in a cluster with job scheduling. This post uses Slurm and a new feature in the latest version (2.6.0) of AWS ParallelCluster, which makes this process easier than before. Accounting records are saved into a relational database for both currently executing jobs and jobs that have already terminated.

Prerequisites

This tutorial assumes you already have a cluster in AWS ParallelCluster. If you don’t, refer to the AWS ParallelCluster documentation, a getting started blog post, or a how-to blog post.

Solution

Choose your architecture

There are two common architectures to save job accounting information into a database:

  1. Installing and directly managing a DBMS in the master node of your cluster (or in an additional EC2 instance dedicated to it)
  2. Using a fully managed service like Amazon Relational Database Service (RDS)

While the first option might appear to be the most economical solution, it requires heavy lifting. You must install and manage the database, which is not a core part of running your HPC workloads. Alternatively, Amazon RDS reduces the burden of installing updates, managing security patches, and properly allocating resources. Additionally, the Amazon RDS Free Tier can get you started with a managed database service in the cloud for free. Refer to the hyperlink for a list of free resources.

Amazon RDS is my preferred choice, and the following sections implement this architecture. Bear in mind, however, that the settings and the customizations required in the AWS ParallelCluster environment are the same, regardless of which architecture you prefer.

 

Set up and configure your database

Now, with your architecture determined, let’s configure it. First, go to the Amazon RDS console. Select the same Region where your AWS ParallelCluster is deployed, then click on Create Database.

There are two database engine options to consider: Amazon Aurora and MySQL.

Amazon Aurora has many benefits compared to MySQL. However, in this blog post, I use MySQL to show how to build your HPC accounting database within the Free-tier offering.

The following steps are the same regardless of your database choice. So, if you’re interested in one of the many features that differentiate Amazon Aurora from MySQL, feel free to use it instead. Check out Amazon Aurora’s landing page to learn more about its benefits, such as its faster performance and cost effectiveness.

To configure your database, you must complete the following steps:

  1. Name the database
  2. Establish credential settings
  3. Select the DB instance size
  4. Identify storage type
  5. Allocate amount of storage

The following images show the settings that I chose for storage options and the “Free tier” template. Feel free to change them according to the scope and the usage you expect.

Make sure you select the VPC where AWS ParallelCluster deploys your compute fleet, and that the Security Group of your compute fleet is selected. You can access information for your compute fleet in your AWS ParallelCluster config file. The Security Group should look something like this: “parallelcluster-XXX-ComputeSecurityGroup-XYZ”.

At this stage, you can click on Create database and wait until the Database status moves from Creating to Available in the Amazon RDS Dashboard.

The last step for this section is to grant privileges on the database.

  1. Connect to your database. Use the master node of your AWS ParallelCluster as a client.
  2. Install the MySQL client by running sudo yum install mysql on Amazon Linux and CentOS, or sudo apt-get install mysql-client on Ubuntu.
  3. Connect to your MySQL RDS database using the following code: mysql --host=<your_rds_endpoint> --port=3306 -u admin -p. The following screenshot shows how to find your RDS endpoint and port.

 

4. Run GRANT ALL ON `%`.* TO admin@`%`; to grant the required privileges.

The following code demonstrates these steps together:

[[email protected]]$ mysql --host=parallelcluster-accounting.c68dmmc6ycyr.us-east-1.rds.amazonaws.com --port=3306 -u admin -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 251
Server version: 8.0.16 Source distribution

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> GRANT ALL ON `%`.* TO admin@`%`;

Note: typically this command is run as GRANT ALL ON *.* TO 'admin'@'%'; With Amazon RDS, for security reasons, this is not possible as the master account does not have access to the MySQL database. Using *.* triggers an error. To work around this, I use the _ and % wildcards that are permitted. To look at the actual grants, you can run the following: SHOW GRANTS;

Enable Slurm Database logging

Now, your database is fully configured. The next step is to enable Slurm as a workload manager.

A few steps must occur to let Slurm log its job accounting information on an external database. The following code demonstrates the steps you must take.

  1. Add the DB configuration file, slurmdbd.conf, under /opt/slurm/etc/
  2. Slurm’s slurm.conf file requires a few modifications. These changes are noted after the following code examples.

 

Note: You do not need to configure each and every compute node because AWS ParallelCluster installs Slurm in a shared directory. All of these nodes share this directory, and thus the same configuration files, with the master node of your cluster.

Below, you can find two example configuration files that you can use just by modifying a few parameters according to your setup.

For more information about all the possible settings of configuration parameters, please refer to the official Slurm documentation, and in particular to the accounting section.

Add the DB configuration file

#
## Sample /opt/slurm/etc/slurmdbd.conf
#
ArchiveEvents=yes
ArchiveJobs=yes
ArchiveResvs=yes
ArchiveSteps=no
ArchiveSuspend=no
ArchiveTXN=no
ArchiveUsage=no
AuthType=auth/munge
DbdHost=ip-10-0-16-243  #YOUR_MASTER_IP_ADDRESS_OR_NAME
DbdPort=6819
DebugLevel=info
PurgeEventAfter=1month
PurgeJobAfter=12month
PurgeResvAfter=1month
PurgeStepAfter=1month
PurgeSuspendAfter=1month
PurgeTXNAfter=12month
PurgeUsageAfter=24month
SlurmUser=slurm
LogFile=/var/log/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
StorageType=accounting_storage/mysql
StorageUser=admin
StoragePass=password
StorageHost=parallelcluster-accounting.c68dmmc6ycyr.us-east-1.rds.amazonaws.com # Endpoint from RDS console
StoragePort=3306                                                                # Port from RDS console

See below for key values that you should plug into the example configuration file:

  • DbdHost: the name of the machine where the Slurm Database Daemon is executed. This is typically the master node of your AWS ParallelCluster. You can run hostname -s on your master node to get this value.
  • DbdPort: The port number that the Slurm Database Daemon (slurmdbd) listens to for work. 6819 is the default value.
  • StorageUser: Define the user name used to connect to the database. This has been defined during the Amazon RDS configuration as shown in the second step of the previous section.
  • StoragePass: Define the password used to gain access to the database. This was defined along with the user name during the Amazon RDS configuration.
  • StorageHost: Define the name of the host running the database. You can find this value in the Amazon RDS console, under “Connectivity & security”.
  • StoragePort: Define the port on which the database is listening. You can find this value in the Amazon RDS console, under “Connectivity & security”. (see the screenshot below for more information).

Modify the slurm.conf file

Add the following lines at the end of the slurm configuration file:

#
## /opt/slurm/etc/slurm.conf
#
# ACCOUNTING
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=30
#
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=ip-10-0-16-243
AccountingStorageUser=admin
AccountingStoragePort=6819

Modify the following:

  • AccountingStorageHost: The hostname or address of the host where SlurmDBD executes. In our case, this is again the master node of our AWS ParallelCluster; you can get this value by running hostname -s again.
  • AccountingStoragePort: The network port that SlurmDBD accepts communication on. It must be the same as DbdPort specified in /opt/slurm/etc/slurmdbd.conf
  • AccountingStorageUser: It must be the same as in /opt/slurm/etc/slurmdbd.conf (specified in the “Credential Settings” of your Amazon RDS database).

 

Restart the Slurm service and start the Slurm database daemon (slurmdbd) on the master node

 

Depending on the operating system you are running, this would look like:

  • Amazon Linux / Amazon Linux 2
[[email protected]]$ sudo /etc/init.d/slurm restart                                                                                                                                                                                                                                             
stopping slurmctld:                                        [  OK  ]
slurmctld is stopped
slurmctld is stopped
starting slurmctld:                                        [  OK  ]
[[email protected]]$ 
[[email protected]]$ sudo /opt/slurm/sbin/slurmdbd
  • CentOS7 and Ubuntu 16/18
[[email protected] ~]$ sudo /opt/slurm/sbin/slurmdbd
[[email protected] ~]$ sudo systemctl restart slurmctld

Note: even if you have jobs running, restarting the daemons will not affect them.

Check to see if your cluster is already in the Slurm Database:

/opt/slurm/bin/sacctmgr list cluster

And if it is not (see below):

[[email protected]]$ /opt/slurm/bin/sacctmgr list cluster
   Cluster     ControlHost  ControlPort   RPC     Share GrpJobs       GrpTRES GrpSubmit MaxJobs       MaxTRES MaxSubmit     MaxWall                  QOS   Def QOS 
---------- --------------- ------------ ----- --------- ------- ------------- --------- ------- ------------- --------- ----------- -------------------- --------- 
[[email protected]]$ 

You can add it as follows:

sudo /opt/slurm/bin/sacctmgr add cluster parallelcluster

You should now see something like the following:

[[email protected]]$ /opt/slurm/bin/sacctmgr list cluster
   Cluster     ControlHost  ControlPort   RPC     Share GrpJobs       GrpTRES GrpSubmit MaxJobs       MaxTRES MaxSubmit     MaxWall                  QOS   Def QOS 
---------- --------------- ------------ ----- --------- ------- ------------- --------- ------- ------------- --------- ----------- -------------------- --------- 
parallelc+     10.0.16.243         6817  8704         1                                                                                           normal           
[[email protected]]$ 

At this stage, you should be all set with your AWS ParallelCluster accounting configured to be stored in the Amazon RDS database.

Replicate the process on multiple clusters

The same database instance can easily be used by multiple clusters to log their accounting data. To do this, repeat the last configuration steps for any other clusters built using AWS ParallelCluster that you want to share the same database.

The additional steps to follow are:

  • Ensure that all the clusters are in the same VPC (or, if you prefer to use multiple VPCs, you can choose to set up VPC-Peering)
  • Add the SecurityGroup of your new compute fleets (“parallelcluster-XXX-ComputeSecurityGroup-XYZ”) to your RDS database
  • In addition to the slurm configuration file (/opt/slurm/etc/slurm.conf) edits explained prior, change the ClusterName parameter at the very top of that file. By default, your cluster is called “parallelcluster.” You may want to change that to clearly identify other clusters using the same database. For instance: ClusterName=parallelcluster2

Once these additional steps are complete, you can run /opt/slurm/bin/sacctmgr list cluster again. Now, you should see two (or more) clusters:

[[email protected] ]# /opt/slurm/bin/sacctmgr list cluster
   Cluster     ControlHost  ControlPort   RPC     Share GrpJobs       GrpTRES GrpSubmit MaxJobs       MaxTRES MaxSubmit     MaxWall                  QOS   Def QOS 
---------- --------------- ------------ ----- --------- ------- ------------- --------- ------- ------------- --------- ----------- -------------------- --------- 
parallelc+                            0  8704         1                                                                                           normal           
parallelc+     10.0.16.129         6817  8704         1                                                                                           normal           
[[email protected] ]#   

If you want to see the full name of your clusters, run the following:

[[email protected] ]# /opt/slurm/bin/sacctmgr list cluster format=cluster%30
                       Cluster 
------------------------------ 
               parallelcluster 
              parallelcluster2 
[[email protected] ]# 

Note: If you check the Slurm logs (under /var/log/slurm*), you may see this error:

error: Database settings not recommended values: innodb_buffer_pool_size innodb_lock_wait_timeout

This error refers to default parameters that Amazon RDS sets for you on your MySQL database. You can change them by setting new “group parameters” as explained in the official documentation and in this support article. Please also note that the innodb_buffer_pool_size is related to the amount of memory available on your instance, so you may want to use a different instance type with higher memory to avoid this warning.

Run your first job and check the accounting

Now that the application is installed and configured, you can test it! Submit a job to Slurm, query your database, and check your job accounting information.

If you are using a brand new cluster, test it with a simple hostname job as follows:

[[email protected]]$ sbatch -N2 <<EOF
> #!/bin/sh
> srun hostname |sort
> srun sleep 10
> EOF
Submitted batch job 31
[[email protected]]$

Immediately after you have submitted the job, you should see it with a state of “pending”:

[[email protected]]$ sacct
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
38               sbatch    compute                     2    PENDING      0:0 
[[email protected]]$ 

And, after a while the job should be “completed”:

[[email protected]]$ sacct
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
38               sbatch    compute                     2  COMPLETED      0:0 
38.batch           batch                                1  COMPLETED      0:0 
38.0           hostname                                2  COMPLETED      0:0 
[[email protected]]$

Now that you know your cluster works, you can build complex queries using sacct. See a few examples below, and refer to the official documentation for more details:

[[email protected]]$ sacct --format=jobid,elapsed,ncpus,ntasks,state
       JobID    Elapsed      NCPUS   NTasks      State 
------------ ---------- ---------- -------- ---------- 
38             00:00:00          2           COMPLETED 
38.batch       00:00:00          1        1  COMPLETED 
38.0           00:00:00          2        2  COMPLETED 
39             00:00:00          2           COMPLETED 
39.batch       00:00:00          1        1  COMPLETED 
39.0           00:00:00          2        2  COMPLETED 
40             00:00:10          2           COMPLETED 
40.batch       00:00:10          1        1  COMPLETED 
40.0           00:00:00          2        2  COMPLETED 
40.1           00:00:10          2        2  COMPLETED 
[[email protected]]$ sacct --allocations
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
38               sbatch    compute                     2  COMPLETED      0:0 
39               sbatch    compute                     2  COMPLETED      0:0 
40               sbatch    compute                     2  COMPLETED      0:0 
[[email protected]]$ sacct -S2020-02-17 -E2020-02-20 -X -ojobid,start,end,state
       JobID               Start                 End      State 
------------ ------------------- ------------------- ---------- 
38           2020-02-17T12:25:12 2020-02-17T12:25:12  COMPLETED 
39           2020-02-17T12:25:12 2020-02-17T12:25:12  COMPLETED 
40           2020-02-17T12:27:59 2020-02-17T12:28:09  COMPLETED 
[[email protected]]$ 

If you have configured your cluster(s) for multiple users, you may want to look at the accounting info for all of these.  If you want to configure your clusters with multiple users, follow this blog post. It demonstrates how to configure AWS ParallelCluster with AWS Directory Services to create a multiuser, POSIX-compliant system with centralized authentication.

Each user can only look at their own accounting data. However, Slurm admins (or root) can see accounting info for every user. The following code shows accounting data coming from two clusters (parallelcluster and parallelcluster2) and from two users (ec2-user and nicola):

[[email protected] ~]# sacct -S 2020-01-01 --clusters=parallelcluster,parallelcluster2 --format=jobid,elapsed,ncpus,ntasks,state,user,cluster%20                                                                                                                                                        
       JobID    Elapsed      NCPUS   NTasks      State      User              Cluster 
------------ ---------- ---------- -------- ---------- --------- -------------------- 
36             00:00:00          2              FAILED  ec2-user      parallelcluster 
36.batch       00:00:00          1        1     FAILED                parallelcluster 
37             00:00:00          2           COMPLETED  ec2-user      parallelcluster 
37.batch       00:00:00          1        1  COMPLETED                parallelcluster 
37.0           00:00:00          2        2  COMPLETED                parallelcluster 
38             00:00:00          2           COMPLETED  ec2-user      parallelcluster 
38.batch       00:00:00          1        1  COMPLETED                parallelcluster 
38.0           00:00:00          2        2  COMPLETED                parallelcluster 
39             00:00:00          2           COMPLETED  ec2-user      parallelcluster 
39.batch       00:00:00          1        1  COMPLETED                parallelcluster 
39.0           00:00:00          2        2  COMPLETED                parallelcluster 
40             00:00:10          2           COMPLETED  ec2-user      parallelcluster 
40.batch       00:00:10          1        1  COMPLETED                parallelcluster 
40.0           00:00:00          2        2  COMPLETED                parallelcluster 
40.1           00:00:10          2        2  COMPLETED                parallelcluster 
41             00:00:29        144              FAILED  ec2-user      parallelcluster 
41.batch       00:00:29         36        1     FAILED                parallelcluster 
41.0           00:00:00        144      144  COMPLETED                parallelcluster 
41.1           00:00:01        144      144  COMPLETED                parallelcluster 
41.2           00:00:00          3        3  COMPLETED                parallelcluster 
41.3           00:00:00          3        3  COMPLETED                parallelcluster 
42             01:22:03        144           COMPLETED  ec2-user      parallelcluster 
42.batch       01:22:03         36        1  COMPLETED                parallelcluster 
42.0           00:00:01        144      144  COMPLETED                parallelcluster 
42.1           00:00:00        144      144  COMPLETED                parallelcluster 
42.2           00:00:39          3        3  COMPLETED                parallelcluster 
42.3           00:34:55          3        3  COMPLETED                parallelcluster 
43             00:00:11          2           COMPLETED  ec2-user      parallelcluster 
43.batch       00:00:11          1        1  COMPLETED                parallelcluster 
43.0           00:00:01          2        2  COMPLETED                parallelcluster 
43.1           00:00:10          2        2  COMPLETED                parallelcluster 
44             00:00:11          2           COMPLETED    nicola      parallelcluster 
44.batch       00:00:11          1        1  COMPLETED                parallelcluster 
44.0           00:00:01          2        2  COMPLETED                parallelcluster 
44.1           00:00:10          2        2  COMPLETED                parallelcluster 
4              00:00:10          2           COMPLETED    nicola     parallelcluster2 
4.batch        00:00:10          1        1  COMPLETED               parallelcluster2 
4.0            00:00:00          2        2  COMPLETED               parallelcluster2 
4.1            00:00:10          2        2  COMPLETED               parallelcluster2 
[[email protected] ~]# 

You can also directly query your database, and look at the accounting information stored in it or link your preferred BI tool to get insights from your HPC cluster. To do so, run the following code:

[[email protected]]$ mysql --host=parallelcluster-accounting.c68dmmc6ycyr.us-east-1.rds.amazonaws.com --port=3306 -u admin -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 48
Server version: 8.0.16 Source distribution

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| slurm_acct_db      |
+--------------------+
4 rows in set (0.00 sec)

mysql> use slurm_acct_db;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+-----------------------------------------+
| Tables_in_slurm_acct_db                 |
+-----------------------------------------+
| acct_coord_table                        |
| acct_table                              |
| clus_res_table                          |
| cluster_table                           |
| convert_version_table                   |
| federation_table                        |
| parallelcluster_assoc_table             |
| parallelcluster_assoc_usage_day_table   |
| parallelcluster_assoc_usage_hour_table  |
| parallelcluster_assoc_usage_month_table |
| parallelcluster_event_table             |
| parallelcluster_job_table               |
| parallelcluster_last_ran_table          |
| parallelcluster_resv_table              |
| parallelcluster_step_table              |
| parallelcluster_suspend_table           |
| parallelcluster_usage_day_table         |
| parallelcluster_usage_hour_table        |
| parallelcluster_usage_month_table       |
| parallelcluster_wckey_table             |
| parallelcluster_wckey_usage_day_table   |
| parallelcluster_wckey_usage_hour_table  |
| parallelcluster_wckey_usage_month_table |
| qos_table                               |
| res_table                               |
| table_defs_table                        |
| tres_table                              |
| txn_table                               |
| user_table                              |
+-----------------------------------------+
29 rows in set (0.00 sec)

mysql>

Conclusion

You’re finally all set! In this blog post you set up a database using Amazon RDS, configured AWS ParallelCluster and Slurm to enable job accounting with your database, and learned how to query your job accounting history from your database using the sacct command or by running SQL queries.

Deriving insights for your HPC workloads doesn’t end when your workloads finish running. Now, you can better understand and optimize your usage patterns and generate ideas about how to wring more price-performance out of your HPC clusters on AWS. For retrospective analysis, you can easily understand whether specific jobs, projects, or users are responsible for driving your HPC usage on AWS.  For forward-looking analysis, you can better forecast future usage to set budgets with appropriate insight into your costs and your resource consumption.

You can also use these accounting systems to identify users who may require additional training on how to make the most of cloud resources on AWS. Finally, together with your spending patterns, you can better capture and explain the return on investment from all of the valuable HPC work you do. And, this gives you the raw data to analyze how you can get even more value and price-performance out of the work you’re doing on AWS.

DBLog: A Generic Change-Data-Capture Framework

Post Syndicated from Netflix Technology Blog original https://medium.com/netflix-techblog/dblog-a-generic-change-data-capture-framework-69351fb9099b?source=rss----2615bd06b42e---4

Andreas Andreakis, Ioannis Papapanagiotou

Overview

Change-Data-Capture (CDC) allows capturing committed changes from a database in real-time and propagating those changes to downstream consumers [1][2]. CDC is becoming increasingly popular for use cases that require keeping multiple heterogeneous datastores in sync (like MySQL and ElasticSearch) and addresses challenges that exist with traditional techniques like dual-writes and distributed transactions [3][4].

In databases like MySQL and PostgreSQL, transaction logs are the source of CDC events. As transaction logs typically have limited retention, they aren’t guaranteed to contain the full history of changes. Therefore, dumps are needed to capture the full state of a source. There are several open source CDC projects, often using the same underlying libraries, database APIs, and protocols. Nonetheless, we found a number of limitations that could not satisfy our requirements, e.g. stalling the processing of log events until a dump is complete, missing the ability to trigger dumps on demand, or implementations that block write traffic by using table locks.

This motivated the development of DBLog, which offers log and dump processing under a generic framework. In order to be supported, a database is required to fulfill a set of features that are commonly available in systems like MySQL, PostgreSQL, MariaDB, and others.

Some of DBLog’s features are:

  • Processes captured log events in-order.
  • Dumps can be taken any time, across all tables, for a specific table or specific primary keys of a table.
  • Interleaves log with dump events, by taking dumps in chunks. This way log processing can progress alongside dump processing. If the process is terminated, it can resume after the last completed chunk without needing to start from scratch. This also allows dumps to be throttled and paused if needed.
  • No locks on tables are ever acquired, which prevents impacting write traffic on the source database.
  • Supports any kind of output, so that the output can be a stream, datastore, or even an API.
  • Designed with High Availability in mind. Hence, downstream consumers receive change events as they occur on a source.

Requirements

In a previous blog post, we discussed Delta, a data enrichment and synchronization platform. The goal of Delta is to keep multiple datastores in sync, where one store is the source of truth (like MySQL) and others are derived stores (like ElasticSearch). One of the key requirements is to have low propagation delays from the source of truth to the destinations and that the flow of events is highly available. These conditions apply regardless if multiple datastores are used by the same team, or if one team is owning data which another team is consuming. In our Delta blog post, we also described use cases beyond data synchronization, such as event processing.

For data synchronization and event processing use cases, we need to fulfill the following requirements, beyond the ability to capture changes in real-time:

  • Capturing the full state. Derived stores (like ElasticSearch) must eventually store the full state of the source. We provide this via dumps from the source database.
  • Triggering repairs at any time. Instead of treating dumps as a one-time setup activity, we aim to enable them at any time: across all tables, on a specific table, or for specific primary keys. This is crucial for repairs downstream when data has been lost or corrupted.
  • Providing high availability for real-time events. The propagation of real-time changes has high availability requirements; it is undesired if the flow of events stops for a longer duration of time (such as minutes or longer). This requirement needs to be fulfilled even when repairs are in progress so that they don’t stall real-time events. We want real-time and dump events to be interleaved so that both make progress.
  • Minimizing database impact. When connecting to a database, it is important to ensure that it is impacted as little as possible in terms of its bandwidth and ability to serve reads and writes for applications. For this reason, it is preferred to avoid using APIs which can block write traffic such as locks on tables. In addition to that, controls must be put in place which allow throttling of log and dump processing, or to pause the processing if needed.
  • Writing events to any output. For streaming technology, Netflix utilizes a variety of options such as Kafka, SQS, Kinesis, and even Netflix specific streaming solutions such as Keystone. Even though having a stream as an output can be a good choice (like when having multiple consumers), it is not always an ideal choice (such as when there is only one consumer). We want to provide the ability to directly write to a destination without passing through a stream. The destination may be a datastore or an external API.
  • Supporting Relational Databases. There are services at Netflix that use RDBMS kind of databases such as MySQL or PostgreSQL via AWS RDS. We want to support these systems as a source so that they can provide their data for further consumption.

Existing Solutions

We evaluated a series of existing Open Source offerings, including: Maxwell, SpinalTap, Yelp’s MySQL Streamer, and Debezium. Existing solutions are similar in regard to capturing real-time changes that originate from a transaction log. For example by using MySQL’s binlog replication protocol, or PostgreSQL’s replication slots.

In terms of dump processing, we found that existing solutions have at least one of the following limitations:

  • Stopping log event processing while processing a dump. This limitation applies if log events are not processed while a dump is in progress. As a consequence, if a dump has a large volume, log event processing stalls for an extended period of time. This is an issue when downstream consumers rely on short propagation delays of real-time changes.
  • Missing ability to trigger dumps on demand. Most solutions execute a dump initially during a bootstrap phase or if data loss is detected at the transaction logs. However, the ability to trigger dumps on demand is crucial for bootstrapping new consumers downstream (like a new ElasticSearch index) or for repairs in case of data loss.
  • Blocking write traffic by locking tables. Some solutions use locks on tables to coordinate the dump processing. Depending on the implementation and database, the duration of locking can either be brief or can last throughout the whole dump process [5]. In the latter case, write traffic is blocked until the dump completes. In some cases, a dedicated read replica can be configured in order to avoid impacting writes on the master. However, this strategy does not work for all databases. For example in PostgreSQL RDS, changes can only be captured from the master.
  • Using proprietary database features. We found that some solutions use advanced database features that are not transferable to other systems, such as: using MySQL’s blackhole engine or getting a consistent snapshot for dumps from the creation of a PostgreSQL replication slot. This prevents code reuse across databases.

Ultimately, we decided to implement a different approach to handle dumps. One which:

  • interleaves log with dump events so that both can make progress
  • allows to trigger dumps at any time
  • does not use table locks
  • uses standardized database features

DBLog Framework

DBLog is a Java-based framework, able to capture changes in real-time and to take dumps. Dumps are taken in chunks so that they interleave with real-time events and don’t stall real-time event processing for an extended period of time. Dumps can be taken any time, via a provided API. This allows downstream consumers to capture the full database state initially or at a later time for repairs.

We designed the framework to minimize database impact. Dumps can be paused and resumed as needed. This is relevant both for recovery after failure and to stop processing if the database reached a bottleneck. We also don’t take locks on tables in order not to impact the application writes.

DBLog allows writing captured events to any output, even if it is another database or API. We use Zookeeper to store state related to log and dump processing, and for leader election. We have built DBLog with pluggability in mind allowing implementations to be swapped as desired (like replacing Zookeeper with something else).

The following subsections explain log and dump processing in more detail.

Log Processing

The framework requires a database to emit an event for each changed row in real-time and in a commit order. A transaction log is assumed to be the origin of those events. The database is sending them to a transport that DBLog can consume. We use the term ‘change log’ for that transport. An event can either be of type: create, update, or delete. For each event, the following needs to be provided: a log sequence number, the column state at the time of the operation, and the schema that applied at the time of the operation.

Each change is serialized into the DBLog event format and is sent to the writer so that it can be delivered to an output. Sending events to the writer is a non-blocking operation, as the writer runs in its own thread and collects events in an internal buffer. Buffered events are written to an output in-order. The framework allows plugging in a custom formatter for serializing events to a custom format. The output is a simple interface, allowing any desired destination to be plugged in, such as a stream, datastore, or even an API.

Dump Processing

Dumps are needed as transaction logs have limited retention, which prevents their use for reconstituting a full source dataset. Dumps are taken in chunks so that they can interleave with log events, allowing both to progress. An event is generated for each selected row of a chunk and is serialized in the same format as log events. This way, a downstream consumer does not need to be concerned if events originate from the log or dumps. Both log and dump events are sent to the output via the same writer.

Dumps can be scheduled any time via an API for all tables, a specific table, or specific primary keys of a table. A dump request per table is executed in chunks of a configured size. Additionally, a delay can be configured to hold back the processing of new chunks, allowing only log event processing during that time. The chunk size and the delay allow balancing between log and dump event processing, and both settings can be updated at runtime.

Chunks are selected by sorting a table in ascending primary key order and including rows, where the primary key is greater than the last primary key of the previous chunk. It is required for a database to execute this query efficiently, which typically applies for systems that implement range scans over primary keys.

Figure 1. Chunking a table with 4 columns c1-c4 and c1 as the primary key (pk). Pk column is of type integer and chunk size is 3. Chunk 2 is selected with the condition c1 > 4.

Chunks need to be taken in a way that does not stall log event processing for an extended period of time and which preserves the history of log changes so that a selected row with an older value can not override newer state from log events.

In order to achieve this, we create recognizable watermark events in the change log so that we can sequence the chunk selection. Watermarks are implemented via a table at the source database. The table is stored in a dedicated namespace so that no collisions occur with application tables. Only a single row is contained in the table which stores a UUID field. A watermark is generated by updating this row to a specific UUID. The row update results in a change event which is eventually received through the change log.

By using watermarks, dumps are taken using the following steps:

  1. Briefly pause log event processing.
  2. Generate low watermark by updating the watermark table.
  3. Run SELECT statement for the next chunk and store result-set in-memory, indexed by primary key.
  4. Generate a high watermark by updating the watermark table.
  5. Resume sending received log events to the output. Watch for the low and high watermark events in the log.
  6. Once the low watermark event is received, start removing entries from the result-set for all log event primary keys that are received after the low watermark.
  7. Once the high watermark event is received, send all remaining result-set entries to the output before processing new log events.
  8. Go to step 1 if more chunks present.

The SELECT is assumed to return state from a consistent snapshot, which represents committed changes up to a certain point in history. Or equivalently: the SELECT executed on a specific position of the change log, considering changes up to that point. Databases typically don’t expose the log position which corresponds to a select statement execution (MariaDB is an exception).

The core idea of our approach is to determine a window on the change log which guarantees to contain the SELECT. As the exact selection position is unknown, all selected rows are removed which collide with log events within that window. This ensures that the chunk selection can not override the history of log changes. The window is opened by writing the low watermark, then the selection runs, and finally, the window is closed by writing the high watermark. In order for this to work, the SELECT must read the latest state from the time of the low watermark or later (it is ok if the selection also includes writes that committed after the low watermark write and before the read).
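
DBLog itself is a Java framework, so the following is only an illustrative sketch of the chunk selection and watermark writes described above, expressed in PHP/PDO to match the code examples earlier on this page. The watermark table, column names, and helper names are hypothetical.

<?php
// Illustrative sketch only: chunk selection and watermark writes (steps 2-7).
// Table, column, and helper names are hypothetical; DBLog itself is written in Java.

function writeWatermark(PDO $db): string
{
    // Steps 2 and 4: updating the single-row watermark table produces a change
    // event that is eventually observed in the change log.
    $uuid = bin2hex(random_bytes(16));
    $db->prepare('UPDATE dblog_watermark SET uuid = :uuid')->execute([':uuid' => $uuid]);
    return $uuid;
}

function selectNextChunk(PDO $db, string $table, string $pkColumn, $lastPk, int $chunkSize): array
{
    // Step 3: range scan over the primary key, resuming after the last completed chunk.
    $stmt = $db->prepare(
        "SELECT * FROM {$table} WHERE {$pkColumn} > :lastPk ORDER BY {$pkColumn} ASC LIMIT {$chunkSize}"
    );
    $stmt->execute([':lastPk' => $lastPk]);

    // Index the rows by primary key so that colliding log events can evict them later.
    $chunk = [];
    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
        $chunk[$row[$pkColumn]] = $row;
    }
    return $chunk;
}

function pruneChunk(array $chunk, array $logEventPksBetweenWatermarks): array
{
    // Steps 6 and 7: rows whose primary keys also appear as log events between the
    // low and high watermarks are dropped, because those log events are newer.
    foreach ($logEventPksBetweenWatermarks as $pk) {
        unset($chunk[$pk]);
    }
    return $chunk; // remaining rows are sent to the output after the high watermark event
}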

Figures 2a and 2b illustrate the chunk selection algorithm. We provide an example with a table that has primary keys k1 to k6. Each change log entry represents a create, update, or delete event for a primary key. In figure 2a, we showcase the watermark generation and chunk selection (steps 1 to 4). Updating the watermark table at steps 2 and 4 creates two change events (magenta color) which are eventually received via the log. In figure 2b, we focus on the selected chunk rows that are removed from the result set for primary keys that appear between the watermarks (steps 5 to 7).

Figure 2a — The watermark algorithm for chunk selection (steps 1 to 4).
Figure 2b — The watermark algorithm for chunk selection (steps 5–7).

Note that a large count of log events may appear between the low and high watermark, if one or more transactions committed a large set of row changes in between. This is why our approach is briefly pausing log processing during steps 2–4 so that the watermarks are not missed. This way, log event processing can resume event-by-event afterwards, eventually discovering the watermarks, without ever needing to cache log event entries. Log processing is paused only briefly as steps 2–4 are expected to be fast: watermark updates are single write operations and the SELECT runs with a limit.

Once the high watermark is received at step 7, the non-conflicting chunk rows are handed over to the writer for in-order delivery to the output. This is a non-blocking operation as the writer runs in a separate thread, allowing log processing to quickly resume after step 7. Afterwards, log event processing continues for events that occur after the high watermark.

In Figure 2c we are depicting the order of writes throughout a chunk selection, by using the same example as figures 2a and 2b. Log events that appear up to the high watermark are written first. Then, the remaining rows from the chunk result (magenta color). And finally, log events that occur after the high watermark.

Figure 2c — Order of output writes. Interleaving log with dump events.

Database support

In order to use DBLog, a database needs to provide a change log from a linear history of committed changes and non-stale reads. These conditions are fulfilled by systems like MySQL, PostgreSQL, MariaDB, etc., so that the framework can be used uniformly across these kinds of databases.

So far, we added support for MySQL and PostgreSQL. Integrating log events required using different libraries as each database uses a proprietary protocol. For MySQL, we use shyiko/mysql-binlog-connector, which implements the binlog replication protocol in order to receive events from a MySQL host. For PostgreSQL, we are using replication slots with the wal2json plugin. Changes are received via the streaming replication protocol, which is implemented by the PostgreSQL JDBC driver. Determining the schema per captured change varies between MySQL and PostgreSQL. In PostgreSQL, wal2json contains the column names and types alongside the column values. For MySQL, schema changes must be tracked separately; they are received as binlog events.

Dump processing was integrated by using SQL and JDBC, only requiring the chunk selection and watermark update to be implemented. The same code is used for MySQL and PostgreSQL and can be used for other similar databases as well. The dump processing itself has no dependency on SQL or JDBC and allows integrating databases which fulfill the DBLog framework requirements even if they use different standards.

Figure 3 — DBLog High Level Architecture.

High Availability

DBLog uses an active-passive architecture. One instance is active and the others are passive standbys. We leverage Zookeeper for leader election to determine the active instance. The leadership is a lease and is lost if it is not refreshed in time, allowing another instance to take over. We currently deploy one instance per AZ (typically we have 3 AZs), so that if one AZ goes down, an instance in another AZ can continue processing with minimal overall downtime. Passive instances across regions are also possible, though it is recommended to operate in the same region as the database host in order to keep the change capture latencies low.

Production usage

DBLog is the foundation of the MySQL and PostgreSQL Connectors at Netflix, which are used in Delta. Delta has been used in production since 2018 for datastore synchronization and event processing use cases in Netflix studio applications. On top of DBLog, the Delta Connectors use a custom event serializer, so that the Delta event format is used when writing events to an output. Netflix specific streams, such as Keystone, are used as outputs.

Figure 4— Delta Connector.

Beyond Delta, DBLog is also used to build Connectors for other Netflix data movement platforms, which have their own data formats.

Stay Tuned

DBLog has additional capabilities which are not covered by this blog post, such as:

  • Ability to capture table schemas without using locks.
  • Schema store integration. Storing the schema of each event that is sent to an output and having a reference in the payload of each event to the schema store.
  • Monotonic writes mode. Ensuring that once the state has been written for a specific row, a less recent state can not be written afterward. This way downstream consumers experience state transitions only in a forward direction, without going back-and-forth in time.

We are planning to open source DBLog in 2020 and include additional documentation.

Credits

We would like to thank the following persons for contributing to the development of DBLog: Josh Snyder, Raghuram Onti Srinivasan, Tharanga Gamaethige, and Yun Wang.

References

[1] Das, Shirshanka, et al. “All aboard the Databus!: Linkedin’s scalable consistent change data capture platform.” Proceedings of the Third ACM Symposium on Cloud Computing. ACM, 2012

[2] “About Change Data Capture (SQL Server)”, Microsoft SQL docs, 2019

[3] Kleppmann, Martin, “Using logs to build a solid data infrastructure (or: why dual writes are a bad idea)“, Confluent, 2015

[4] Kleppmann, Martin, Alastair R. Beresford, and Boerge Svingen. “Online event processing.” Communications of the ACM 62.5 (2019): 43–49

[5] https://debezium.io/documentation/reference/0.10/connectors/mysql.html#snapshots


DBLog: A Generic Change-Data-Capture Framework was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

How to use AWS Secrets Manager to rotate credentials for all Amazon RDS database types, including Oracle

Post Syndicated from Apurv Awasthi original https://aws.amazon.com/blogs/security/how-to-use-aws-secrets-manager-rotate-credentials-amazon-rds-database-types-oracle/

You can now use AWS Secrets Manager to rotate credentials for Oracle, Microsoft SQL Server, or MariaDB databases hosted on Amazon Relational Database Service (Amazon RDS) automatically. Previously, I showed how to rotate credentials for a MySQL database hosted on Amazon RDS automatically with AWS Secrets Manager. With today’s launch, you can use Secrets Manager to automatically rotate credentials for all types of databases hosted on Amazon RDS.

In this post, I review the key features of Secrets Manager. You’ll then learn:

  1. How to store the database credential for the superuser of an Oracle database hosted on Amazon RDS
  2. How to store the Oracle database credential used by an application
  3. How to configure Secrets Manager to rotate both Oracle credentials automatically on a schedule that you define

Key features of Secrets Manager

AWS Secrets Manager makes it easier to rotate, manage, and retrieve database credentials, API keys, and other secrets throughout their lifecycle. The key features of this service include the ability to:

  1. Secure and manage secrets centrally. You can store, view, and manage all your secrets centrally. By default, Secrets Manager encrypts these secrets with encryption keys that you own and control. You can use fine-grained IAM policies or resource-based policies to control access to your secrets. You can also tag secrets to help you discover, organize, and control access to secrets used throughout your organization.
  2. Rotate secrets safely. You can configure Secrets Manager to rotate secrets automatically without disrupting your applications. Secrets Manager offers built-in integrations for rotating credentials for all Amazon RDS databases (MySQL, PostgreSQL, Oracle, Microsoft SQL Server, MariaDB, and Amazon Aurora). You can also extend Secrets Manager to meet your custom rotation requirements by creating an AWS Lambda function to rotate other types of secrets.
  3. Transmit securely. Secrets are transmitted securely over Transport Layer Security (TLS) protocol 1.2. You can also use Secrets Manager with Amazon Virtual Private Cloud (Amazon VPC) endpoints powered by AWS PrivateLink to keep this communication within the AWS network and help meet your compliance and regulatory requirements to limit public internet connectivity.
  4. Pay as you go. Pay for the secrets you store in Secrets Manager and for the use of these secrets; there are no long-term contracts, licensing fees, or infrastructure and personnel costs. For example, a typical production-scale web application will generate an estimated monthly bill of $6. If you follow along the instructions in this blog post, your estimated monthly bill for Secrets Manager will be $1. Note: you may incur additional charges for using Amazon RDS and Amazon Lambda, if you’ve already consumed the free tier for these services.

Now that you’re familiar with Secrets Manager features, I’ll show you how to store and automatically rotate credentials for an Oracle database hosted on Amazon RDS. I divided these instructions into three phases:

  1. Phase 1: Store and configure rotation for the superuser credential
  2. Phase 2: Store and configure rotation for the application credential
  3. Phase 3: Retrieve the credential from Secrets Manager programmatically

Prerequisites

To follow along, your AWS Identity and Access Management (IAM) principal (user or role) requires the SecretsManagerReadWrite AWS managed policy to store the secrets. Your principal also requires the IAMFullAccess AWS managed policy to create and configure permissions for the IAM role used by Lambda for executing rotations. You can use IAM permissions boundaries to grant an employee the ability to configure rotation without also granting them full administrative access to your account.

Phase 1: Store and configure rotation for the superuser credential

From the Secrets Manager console, on the right side, select Store a new secret.

Since I’m storing credentials for a database hosted on Amazon RDS, I select Credentials for RDS database. Next, I input the user name and password for the superuser. I start by securing the superuser because it’s the most powerful database credential, with full access to the database.
 

Figure 1: For “Select secret type,” choose “Credentials for RDS database”

For this example, I choose to use the default encryption settings. Secrets Manager will encrypt this secret using the Secrets Manager DefaultEncryptionKey in this account. Alternatively, I can choose to encrypt using a customer master key (CMK) that I have stored in AWS Key Management Service (AWS KMS). To learn more, read the Using Your AWS KMS CMK documentation.
 

Figure 2: Choose either DefaultEncryptionKey or use a CMK

Next, I view the list of Amazon RDS instances in my account and select the database this credential accesses. For this example, I select the DB instance oracle-rds-database from the list, and then I select Next.

I then specify values for Secret name and Description. For this example, I use Database/Development/Oracle-Superuser as the name and enter a description of this secret, and then select Next.
 

Figure 3: Provide values for “Secret name” and “Description”

Since this database is not yet being used, I choose to enable rotation. To do so, I select Enable automatic rotation, and then set the rotation interval to 60 days. Remember, if this database credential is currently being used, first update the application (see phase 3) to use Secrets Manager APIs to retrieve secrets before enabling rotation.
 

Figure 4: Select “Enable automatic rotation”

Next, Secrets Manager requires permissions to rotate this secret on my behalf. Because I’m storing the credentials for the superuser, Secrets Manager can use this credential itself to perform rotations. Therefore, on the same screen, I select Use this secret, and then select Next.

Finally, I review the information on the next screen. Everything looks correct, so I select Store. I have now successfully stored a secret in Secrets Manager.

Note: Secrets Manager will now create a Lambda function in the same VPC as my Oracle database and trigger this function periodically to change the password for the superuser. I can view the name of the Lambda function on the Rotation configuration section of the Secret Details page.

The banner on the next screen confirms that I’ve successfully configured rotation and the first rotation is in progress, which enables me to verify that rotation is functioning as expected. Secrets Manager will rotate this credential automatically every 60 days.
 

Figure 5: The confirmation notification

Phase 2: Store and configure rotation for the application credential

The superuser is a powerful credential that should be used only for administrative tasks. To enable your applications to access a database, create a unique database credential per application and grant these credentials limited permissions. You can use these database credentials to read or write to database tables required by the application. As a security best practice, deny the ability to perform management actions, such as creating new credentials.

In this phase, I will store the credential that my application will use to connect to the Oracle database. To get started, from the Secrets Manager console, on the right side, select Store a new secret.

Next, I select Credentials for RDS database, and input the user name and password for the application credential.

I continue to use the default encryption key. I select the DB instance oracle-rds-database, and then select Next.

I specify values for Secret Name and Description. For this example, I use Database/Development/Oracle-Application-User as the name and enter a description of this secret, and then select Next.

I now configure rotation. Once again, since my application is not using this database credential yet, I’ll configure rotation as part of storing this secret. I select Enable automatic rotation, and set the rotation interval to 60 days.

Next, Secrets Manager requires permissions to rotate this secret on behalf of my application. Earlier in the post, I mentioned that application credentials have limited permissions and are unable to change their own password. Therefore, I will use the superuser credential, Database/Development/Oracle-Superuser, that I stored in Phase 1 to rotate the application credential. With this configuration, Secrets Manager creates a clone application user.
 

Figure 6: Select the superuser credential

Note: Creating a clone application user is the preferred mechanism of rotation because the old version of the secret continues to operate and handle service requests while the new version is prepared and tested. There’s no application downtime while changing between versions.

I review the information on the next screen. Everything looks correct, so I select Store. I have now successfully stored the application credential in Secrets Manager.

As mentioned in Phase 1, AWS Secrets Manager creates a Lambda function in the same VPC as the database and then triggers this function periodically to rotate the secret. Since I chose to use the existing superuser secret to rotate the application secret, I will grant the rotation Lambda function permission to retrieve the superuser secret. To grant this permission, I first select the role link in the confirmation banner.
 

Figure 7: Select the “role” link that’s in the confirmation notification

Next, in the Permissions tab, I select SecretsManagerRDSMySQLRotationMultiUserRolePolicy0. Then I select Edit policy.
 

Figure 8: Edit the policy on the “Permissions” tab

In this step, I update the policy (see below) and select Review policy. When following along, remember to replace the placeholder ARN-OF-SUPERUSER-SECRET with the ARN of the secret you stored in Phase 1.


{
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateNetworkInterface",
        "ec2:DeleteNetworkInterface",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DetachNetworkInterface"
      ],
      "Resource": "*"
    },
    {
      "Sid": "GrantPermissionToUse",
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue"
      ],
      "Resource": "ARN-OF-SUPERUSER-SECRET"
    }
  ]
}

Here’s what it will look like:
 

Figure 9: Edit the policy

Next, I select Save changes. I have now completed all the steps required to configure rotation for the application credential, Database/Development/Oracle-Application-User.

Phase 3: Retrieve the credential from Secrets Manager programmatically

Now that I have stored the secret in Secrets Manager, I add code to my application to retrieve the database credential from Secrets Manager. I use the sample code from Phase 2 above. This code sets up the client and retrieves and decrypts the secret Database/Development/Oracle-Application-User.
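
If your application uses the AWS SDK for Python (Boto3), the retrieval looks roughly like the sketch below; treat it as illustrative rather than the post's exact sample code. Secrets created for RDS databases store a JSON string with keys such as username, password, host, and port.

import json

import boto3

client = boto3.client("secretsmanager", region_name="us-east-1")   # use your region

response = client.get_secret_value(
    SecretId="Database/Development/Oracle-Application-User"
)
secret = json.loads(response["SecretString"])

# Build the database connection from the retrieved credential
username = secret["username"]
password = secret["password"]
host = secret["host"]
port = secret["port"]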

Remember, applications require permissions to retrieve the secret, Database/Development/Oracle-Application-User, from Secrets Manager. My application runs on Amazon EC2 and uses an IAM role to obtain access to AWS services. I attach the following policy to my IAM role. This policy uses the GetSecretValue action to grant my application permission to read the secret from Secrets Manager. It also uses the resource element to limit my application to reading only the Database/Development/Oracle-Application-User secret. You can refer to the Secrets Manager documentation to understand the minimum IAM permissions required to retrieve a secret.


{
  "Version": "2012-10-17",
  "Statement": {
    "Sid": "RetrieveDbCredentialFromSecretsManager",
    "Effect": "Allow",
    "Action": "secretsmanager:GetSecretValue",
    "Resource": "arn:aws:secretsmanager:<AWS-REGION>:<ACCOUNT-NUMBER>:secret:Database/Development/Oracle-Application-User"
  }
}

In the above policy, remember to replace the placeholder <AWS-REGION> with the AWS region that you’re using and the placeholder <ACCOUNT-NUMBER> with the number of your AWS account.

Summary

I explained the key benefits of Secrets Manager as they relate to RDS and showed you how to help meet your compliance requirements by configuring Secrets Manager to rotate database credentials automatically on your behalf. Secrets Manager helps you protect access to your applications, services, and IT resources without the upfront investment and on-going maintenance costs of operating your own secrets management infrastructure. To get started, visit the Secrets Manager console. To learn more, visit Secrets Manager documentation.

If you have comments about this post, submit them in the Comments section below. If you have questions about anything in this post, start a new thread on the Secrets Manager forum.

Want more AWS Security news? Follow us on Twitter.

Apurv Awasthi

Apurv is the product manager for credentials management services at AWS, including AWS Secrets Manager and IAM Roles. He enjoys the “Day 1” culture at Amazon because it aligns with his experience building startups in the sports and recruiting industries. Outside of work, Apurv enjoys hiking. He holds an MBA from UCLA and an MS in computer science from University of Kentucky.

Build a social media follower counter

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/build-social-media-follower-counter/

In this tutorial from HackSpace magazine issue 9, Paul Freeman-Powell shows you how to keep track of your social media followers, and encourage subscribers, by building a live follower counter. Get your copy of HackSpace magazine in stores now, or download it as a free PDF here.

Issue 10 of HackSpace magazine is available online and in stores from tomorrow!

The finished build with all components connected, working, and installed in the frame ready for hanging on the wall

If you run a local business like an electronics shop or a café, or if you just want to grow your online following and influence, this project is a fun way to help you keep track of your progress. A counter could also help contribute to growing your following if you hang it on the wall and actively ask your customers to like/follow you to see the numbers go up!

You’ve probably seen those social media follower counters that feature mechanical split-flap displays. In this project, we’ll build a counter powered by RGB LEDs that scrolls through four social profiles, using APIs to pull the number of followers for each account. I’m using YouTube, Twitter, Facebook, and Instagram; you can, of course, tailor the project to your needs.

This project involves a bit of electronics, a bit of software coding, and a bit of woodwork, as well as some fairly advanced display work as we transfer a small portion of the Raspberry Pi’s HDMI output onto the LED matrices.

Let’s get social

First, you need to get your Raspberry Pi all set up and talking to the social networks that you’re going to display. Usually, it’s advisable to install Raspbian without any graphical user interface (GUI) for most electronics projects, but in this case you’ll be actively using that GUI, so make sure you start with a fresh and up-to-date installation of full-fat Raspbian.

phpMyAdmin gives you an easy web interface to allow you to access and edit the device’s settings – for example, speed and direction of scrolling, API credentials, and the social network accounts to monitor

You start by turning your humble little Raspberry Pi into your very own mini web server, which will gather your credentials, talk to the social networks, and display the follower counts. To do this, you need to install a LAMP (Linux, Apache, MySQL, PHP) stack. Start by installing the Apache web server by opening a Terminal and typing:

sudo apt-get install apache2 -y

Then, open the web browser on your Pi and type http://localhost — you will see a default page telling you that Apache is working. The page on our little ‘website’ will use code written in the PHP language, so install that by returning to your Terminal and typing:

sudo apt-get install php -y

Once that’s complete, restart Apache:

sudo service apache2 restart

Next, you’ll install the database to store your credentials, settings, and the handles of the social accounts to track. This is done with the following command in your Terminal:

sudo apt-get install mysql-server php-mysql -y

To set a root password for your database, type the following command and follow the on-screen instructions:

sudo mysql_secure_installation

Restart Apache again. Then, for easier management of the database, I recommend installing phpMyAdmin:

sudo apt-get install phpmyadmin -y

At this point, it’s a good idea to connect your Pi to a WiFi network, unless you’re going to be running a network cable to it. Either way, it’s useful to have SSH enabled and to know its IP address so we can access it remotely. Type the following to access Pi settings and enable SSH:

sudo raspi-config

To determine your Pi’s IP address (which will likely be something like 192.168.0.xxx), type either of the following two commands:

ifconfig # this gives you lots of extra info
hostname -I # this gives you less info, but all we need in this case

Now that SSH is enabled and you know the LAN IP address of the Pi, you can use PuTTY to connect to it from another computer for the rest of your work. The keyboard, mouse, and monitor can now be unplugged from the Raspberry Pi.

Social media monitor

To set up the database, type http://XXX/phpmyadmin (where XXX is your Pi’s IP address) and log in as root with the password you set previously. Head to the User Accounts section and create a new user called socialCounter.

You can now download the first bit of code for this project by running this in your Terminal window:

cd /var/www/html

sudo apt-get update

sudo apt-get install git -y

sudo git clone https://github.com/paulfp/social-media-counter.git

Next, open up the db.php script and edit it to include the password you set when creating the socialCounter user:

cd ./social-media-counter

sudo nano db.php

The database, including tables and settings, is contained in the socialCounter.sql file; this can be imported either via the Terminal or via phpMyAdmin. Once imported, open up the credentials table. To retrieve the subscriber count, YouTube requires a Google API key, so go to console.cloud.google.com and create a new Project (call it anything you like). From the left-hand menu, select ‘APIs & Services’, followed by ‘Library’, then search for the YouTube Data API and enable it. Then go to the ‘Credentials’ tab and create an API key that you can paste into the ‘googleApiKey’ database field.
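
The project’s PHP code makes this call for you; purely to show what the API key is used for, here is a rough Python equivalent of fetching a channel’s subscriber count from the YouTube Data API (the key and channel ID are placeholders).

import requests

API_KEY = "YOUR_GOOGLE_API_KEY"          # the key created in the Google Cloud console
CHANNEL_ID = "UCxxxxxxxxxxxxxxxxxxxxxx"  # placeholder channel ID

resp = requests.get(
    "https://www.googleapis.com/youtube/v3/channels",
    params={"part": "statistics", "id": CHANNEL_ID, "key": API_KEY},
)
resp.raise_for_status()

stats = resp.json()["items"][0]["statistics"]
print("Subscribers:", stats["subscriberCount"])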

Facebook requires you to create an app at developers.facebook.com, after which you can paste the details into the facebookAppId and facebookSecret fields. Unfortunately, due to recent scandals surrounding clandestine misuse of personal data on Facebook, you’ll need to submit your app for review and approval before it will work.

The ‘social_accounts’ table is where you enter the user names for the social networks you want to monitor, so replace those with your own, then open a new tab and navigate to http://XXX/social-media-counter. You should now see a black page with a tiny carousel showing the social media icons plus follower counts next to each one. It’s so small because it’s the 64×16 pixel portion of the screen that we’ll be displaying on our 64×16 LED boards.

GPIO pins to LED display

Now that you have your social network follower counts being grabbed and displayed, it’s time to get that to display on our screens. The first step is to wire them up with the DuPont jumper cables from the Raspberry Pi’s GPIO pins to the connection on the first board. This is quite fiddly, but there’s an excellent guide and diagram on GitHub within Henner Zeller’s library that we’ll be using later, so head to hsmag.cc/PLyRcK and refer to wiring.md.

The Raspberry Pi connects to the RGB LED screens with 14 jumper cables, and the screens are daisy-chained together with a ribbon cable

The second screen is daisy-chained to the first one with the ribbon cable, and the power connector that comes with the screens will plug into both panels. Once you’re done, your setup should look just like the picture on this page.

To display the Pi’s HDMI output on the LED screens, install Adafruit’s rpi-fb-matrix library (which in turn uses Henner Zeller’s library to address the panels) by typing the following commands:

sudo apt-get install -y build-essential libconfig++-dev

cd ~

git clone --recursive https://github.com/adafruit/rpi-fb-matrix.git

cd rpi-fb-matrix

Next, you must define your wiring as regular. Type the following to edit the Makefile:

nano Makefile

Look for the HARDWARE_DESC= property and ensure the line looks like this: export HARDWARE_DESC=regular before saving and exiting nano. You’re now ready to compile the code, so type this and then sit back and watch the output:

make clean all

Once that’s done, there are a few more settings to change in the matrix configuration file, so open that up:

nano matrix.cfg

You need to make several changes in here, depending on your setup:

  • Change display_width to 64 and display_height to 16
  • Set panel_width to 32 and panel_height to 16
  • Set chain_length to 2
  • Set parallel_count to 1

The panel array should look like this:

panels = ( 
  ( { order = 1; rotate = 0; }, { order = 0; rotate = 0; } )
)

Uncomment the crop_origin = (0, 0) line to tell the tool that we don’t want to squish the entire display onto our screens, just an equivalent portion starting right in the top left of the display. Press CTRL+X, then Y, then ENTER to save and exit.

It ain’t pretty…but it’s out of sight. The Raspberry Pi plus the power supply for the screens fit nice and neatly behind the screens. I left each end open to allow airflow

Finally, before you can test the output, there are some other important settings you need to change first, so open up the Raspberry Pi’s boot configuration as follows:

sudo nano /boot/config.txt

First, disable the on-board sound (as it uses hardware that the screens rely on) by looking for the line that says dtparam=audio=on and changing it to off. Also, uncomment the line that says hdmi_force_hotplug=1, to ensure that an HDMI signal is still generated even with no HDMI monitor plugged in. Save and then reboot your Raspberry Pi.

Now run the program using the config you just set:

cd ~/rpi-fb-matrix

sudo ./rpi-fb-matrix matrix.cfg

You should now see the top 64×16 pixels of your Pi’s display represented on your RGB LED panels! This probably consists of the Raspberry Pi icon and the rest of the top portion of the display bar.

No screensaver!

At this point it’s important to ensure that there’s no screensaver or screen blanking enabled on the Pi, as you want this to display all the time. To disable screen blanking, first install the xscreensaver tool:

sudo apt-get install xscreensaver

That will add a screensaver option to the Pi’s GUI menus, allowing you to disable it completely. Finally, you need to tell the Raspberry Pi to do two things each time it loads:

  • Run the rpi-fb-matrix program (like we did manually just now)
  • Open the web browser in fullscreen (‘kiosk’ mode), pointed to the Social Counter web page

To do so, edit the Pi’s autostart configuration file:

sudo nano ~/.config/lxsession/LXDE-pi/autostart

Insert the following two lines at the end:

@sudo /home/pi/rpi-fb-matrix/rpi-fb-matrix /home/pi/rpi-fb-matrix/matrix.cfg

@chromium-browser --kiosk http://localhost/social-media-counter

Et voilà!

Disconnect any keyboard, monitor, or mouse from the Pi and reboot it one more time. Once it’s started up again, you should have a fully working display cycling through each enabled social network, showing up-to-date follower counts for each.

It’s now time to make a surround to hold all the components together and allow you to wall-mount your display. The styling you go for is up to you — you could even go all out and design and 3D print a custom package.

The finished product, in pride of place on the wall of our office. Now I just need some more subscribers…!

For my surround, I went for the more rustic and homemade look, and used some spare bits of wood from an internal door frame lining. This worked really well due to the pre-cut recess. With a plywood back, you can screw everything together so that the wood holds the screens tightly enough to not require any extra fitting or gluing, making for easier future maintenance. To improve the look and readability of the display (as well as soften the light and reduce the brightness), you can use a reflective diffuser from an old broken LED TV if you can lay your hands on one from eBay or a TV repair shop, or just any other bit of translucent material. I found that two layers stapled on worked and looked great. Add some hooks to the back and — Bob’s your uncle — a finished, wall-mounted display!

Phew — that was quite an advanced build, but you now have a sophisticated display that can be used for any number of things and should delight your customers whilst helping to build your social following as well. Don’t forget to tweet us a picture of yours!

The post Build a social media follower counter appeared first on Raspberry Pi.

New – Pay-per-Session Pricing for Amazon QuickSight, Another Region, and Lots More

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-pay-per-session-pricing-for-amazon-quicksight-another-region-and-lots-more/

Amazon QuickSight is a fully managed cloud business intelligence system that gives you Fast & Easy to Use Business Analytics for Big Data. QuickSight makes business analytics available to organizations of all shapes and sizes, with the ability to access data that is stored in your Amazon Redshift data warehouse, your Amazon Relational Database Service (RDS) relational databases, flat files in S3, and (via connectors) data stored in on-premises MySQL, PostgreSQL, and SQL Server databases. QuickSight scales to accommodate tens, hundreds, or thousands of users per organization.

Today we are launching a new, session-based pricing option for QuickSight, along with additional region support and other important new features. Let’s take a look at each one:

Pay-per-Session Pricing
Our customers are making great use of QuickSight and take full advantage of the power it gives them to connect to data sources, create reports, and explore visualizations.

However, not everyone in an organization needs or wants such powerful authoring capabilities. Having access to curated data in dashboards and being able to interact with the data by drilling down, filtering, or slicing-and-dicing is more than adequate for their needs. Subscribing them to a monthly or annual plan can be seen as an unwarranted expense, so a lot of such casual users end up not having access to interactive data or BI.

In order to allow customers to provide all of their users with interactive dashboards and reports, the Enterprise Edition of Amazon QuickSight now allows Reader access to dashboards on a Pay-per-Session basis. QuickSight users are now classified as Admins, Authors, or Readers, with distinct capabilities and prices:

Authors have access to the full power of QuickSight; they can establish database connections, upload new data, create ad hoc visualizations, and publish dashboards, all for $9 per month (Standard Edition) or $18 per month (Enterprise Edition).

Readers can view dashboards, slice and dice data using drill downs, filters and on-screen controls, and download data in CSV format, all within the secure QuickSight environment. Readers pay $0.30 for 30 minutes of access, with a monthly maximum of $5 per reader.

Admins have all authoring capabilities, and can manage users and purchase SPICE capacity in the account. The QuickSight admin now has the ability to set the desired option (Author or Reader) when they invite members of their organization to use QuickSight. They can extend Reader invites to their entire user base without incurring any up-front or monthly costs, paying only for the actual usage.

To learn more, visit the QuickSight Pricing page.

A New Region
QuickSight is now available in the Asia Pacific (Tokyo) Region:

The UI is in English, with a localized version in the works.

Hourly Data Refresh
Enterprise Edition SPICE data sets can now be set to refresh as frequently as every hour. In the past, each data set could be refreshed up to 5 times a day. To learn more, read Refreshing Imported Data.

Access to Data in Private VPCs
This feature was launched in preview form late last year, and is now available in production form to users of the Enterprise Edition. As I noted at the time, you can use it to implement secure, private communication with data sources that do not have public connectivity, including on-premises data in Teradata or SQL Server, accessed over an AWS Direct Connect link. To learn more, read Working with AWS VPC.

Parameters with On-Screen Controls
QuickSight dashboards can now include parameters that are set using on-screen dropdown, text box, numeric slider or date picker controls. The default value for each parameter can be set based on the user name (QuickSight calls this a dynamic default). You could, for example, set an appropriate default based on each user’s office location, department, or sales territory. Here’s an example:

To learn more, read about Parameters in QuickSight.

URL Actions for Linked Dashboards
You can now connect your QuickSight dashboards to external applications by defining URL actions on visuals. The actions can include parameters, and become available in the Details menu for the visual. URL actions are defined like this:

You can use this feature to link QuickSight dashboards to third party applications (e.g. Salesforce) or to your own internal applications. Read Custom URL Actions to learn how to use this feature.

Dashboard Sharing
You can now share QuickSight dashboards across every user in an account.

Larger SPICE Tables
The per-data set limit for SPICE tables has been raised from 10 GB to 25 GB.

Upgrade to Enterprise Edition
The QuickSight administrator can now upgrade an account from Standard Edition to Enterprise Edition with a click. This enables provisioning of Readers with pay-per-session pricing, private VPC access, row-level security for dashboards and data sets, and hourly refresh of data sets. Enterprise Edition pricing applies after the upgrade.

Available Now
Everything I listed above is available now and you can start using it today!

You can try QuickSight for 60 days at no charge, and you can also attend our June 20th Webinar.

Jeff;

 

Security updates for Wednesday

Post Syndicated from ris original https://lwn.net/Articles/756020/rss

Security updates have been issued by Arch Linux (strongswan, wireshark-cli, wireshark-common, wireshark-gtk, and wireshark-qt), CentOS (libvirt, procps-ng, and thunderbird), Debian (apache2, git, and qemu), Gentoo (beep, git, and procps), Mageia (mariadb, microcode, python, virtualbox, and webkit2), openSUSE (ceph, pdns, and perl-DBD-mysql), Red Hat (kernel), SUSE (HA kernel modules, libmikmod, ntp, and tiff), and Ubuntu (nvidia-graphics-drivers-384).

Security updates for Wednesday

Post Syndicated from ris original https://lwn.net/Articles/754653/rss

Security updates have been issued by CentOS (dhcp), Debian (xen), Fedora (dhcp, flac, kubernetes, leptonica, libgxps, LibRaw, matrix-synapse, mingw-LibRaw, mysql-mmm, patch, seamonkey, webkitgtk4, and xen), Mageia (389-ds-base, exempi, golang, graphite2, libpam4j, libraw, libsndfile, libtiff, perl, quassel, spring-ldap, util-linux, and wget), Oracle (dhcp and kernel), Red Hat (389-ds-base, chromium-browser, dhcp, docker-latest, firefox, kernel-alt, libvirt, qemu-kvm, redhat-virtualization-host, rh-haproxy18-haproxy, and rhvm-appliance), Scientific Linux (389-ds-base, dhcp, firefox, libvirt, and qemu-kvm), and Ubuntu (poppler).

timeShift(GrafanaBuzz, 1w) Issue 44

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2018/05/11/timeshiftgrafanabuzz-1w-issue-44/

Welcome to TimeShift. Grafana v5.1.2 is available and includes an important bug fix for MySQL, plus an update for GDPR compliance. See below for more details and the full release notes.
Also, KubeCon + CloudNativeCon Europe 2018 videos are now available including talks from members of the Grafana Labs team! Check out these talks below.
If you would like your article highlighted in our weekly roundup, feel free to send me an email at [email protected]

Amazon Aurora Backtrack – Turn Back Time

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-aurora-backtrack-turn-back-time/

We’ve all been there! You need to make a quick, seemingly simple fix to an important production database. You compose the query, give it a once-over, and let it run. Seconds later you realize that you forgot the WHERE clause, dropped the wrong table, or made another serious mistake. You interrupt the query, but the damage has been done. You take a deep breath, whistle through your teeth, and wish that reality came with an Undo option. Now what?

New Amazon Aurora Backtrack
Today I would like to tell you about the new backtrack feature for Amazon Aurora. This is as close as we can come, given present-day technology, to an Undo option for reality.

This feature can be enabled at launch time for all newly-launched Aurora database clusters. To enable it, you simply specify how far back in time you might want to rewind, and use the database as usual (this is on the Configure advanced settings page):

Aurora uses a distributed, log-structured storage system (read Design Considerations for High Throughput Cloud-Native Relational Databases to learn a lot more); each change to your database generates a new log record, identified by a Log Sequence Number (LSN). Enabling the backtrack feature provisions a FIFO buffer in the cluster for storage of LSNs. This allows for quick access and recovery times measured in seconds.

After that regrettable moment when all seems lost, you simply pause your application, open up the Aurora Console, select the cluster, and click Backtrack DB cluster:

Then you select Backtrack and choose the point in time just before your epic fail, and click Backtrack DB cluster:

Then you wait for the rewind to take place, unpause your application, and proceed as if nothing had happened. When you initiate a backtrack, Aurora will pause the database, close any open connections, drop uncommitted writes, and wait for the backtrack to complete. Then it will resume normal operation and begin to accept requests. The instance state will be backtracking while the rewind is underway:

The console will let you know when the backtrack is complete:

If it turns out that you went back a bit too far, you can backtrack to a later time. Other Aurora features such as cloning, backups, and restores continue to work on an instance that has been configured for backtrack.

I’m sure you can think of some creative and non-obvious use cases for this cool new feature. For example, you could use it to restore a test database after running a test that makes changes to the database. You can initiate the restoration from the API or the CLI, making it easy to integrate into your existing test framework.
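
For example, the same rewind can be triggered from the AWS SDK for Python; this is a sketch with a placeholder cluster identifier, and it assumes the cluster was created with a backtrack window enabled.

from datetime import datetime, timedelta

import boto3

rds = boto3.client("rds", region_name="us-east-1")     # use your region

# Rewind the test cluster to five minutes ago
response = rds.backtrack_db_cluster(
    DBClusterIdentifier="my-test-aurora-cluster",      # placeholder cluster name
    BacktrackTo=datetime.utcnow() - timedelta(minutes=5),
)
print(response["Status"])                              # e.g. "pending" while the rewind is queued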

Things to Know
This option applies to newly created MySQL-compatible Aurora database clusters and to MySQL-compatible clusters that have been restored from a backup. You must opt-in when you create or restore a cluster; you cannot enable it for a running cluster.

This feature is available now in all AWS Regions where Amazon Aurora runs, and you can start using it today.

Jeff;

AWS Online Tech Talks – May and Early June 2018

Post Syndicated from Devin Watson original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-may-and-early-june-2018/

AWS Online Tech Talks – May and Early June 2018  

Join us this month to learn about some of the exciting new services and solution best practices at AWS. We also have our first re:Invent 2018 webinar series, “How to re:Invent”. Sign up now to learn more; we look forward to seeing you.

Note – All sessions are free and in Pacific Time.

Tech talks featured this month:

Analytics & Big Data

May 21, 2018 | 11:00 AM – 11:45 AM PT – Integrating Amazon Elasticsearch with your DevOps Tooling – Learn how you can easily integrate Amazon Elasticsearch Service into your DevOps tooling and gain valuable insight from your log data.

May 23, 2018 | 11:00 AM – 11:45 AM PT – Data Warehousing and Data Lake Analytics, Together – Learn how to query data across your data warehouse and data lake without moving data.

May 24, 2018 | 11:00 AM – 11:45 AM PT – Data Transformation Patterns in AWS – Discover how to perform common data transformations on the AWS Data Lake.

Compute

May 29, 2018 | 01:00 PM – 01:45 PM PT – Creating and Managing a WordPress Website with Amazon Lightsail – Learn about Amazon Lightsail and how you can create, run and manage your WordPress websites with Amazon’s simple compute platform.

May 30, 2018 | 01:00 PM – 01:45 PM PT – Accelerating Life Sciences with HPC on AWS – Learn how you can accelerate your Life Sciences research workloads by harnessing the power of high performance computing on AWS.

Containers

May 24, 2018 | 01:00 PM – 01:45 PM PT – Building Microservices with the 12 Factor App Pattern on AWS – Learn best practices for building containerized microservices on AWS, and how traditional software design patterns evolve in the context of containers.

Databases

May 21, 2018 | 01:00 PM – 01:45 PM PT – How to Migrate from Cassandra to Amazon DynamoDB – Get the benefits, best practices and guides on how to migrate your Cassandra databases to Amazon DynamoDB.

May 23, 2018 | 01:00 PM – 01:45 PM PT – 5 Hacks for Optimizing MySQL in the Cloud – Learn how to optimize your MySQL databases for high availability, performance, and disaster resilience using RDS.

DevOps

May 23, 2018 | 09:00 AM – 09:45 AM PT – .NET Serverless Development on AWS – Learn how to build a modern serverless application in .NET Core 2.0.

Enterprise & Hybrid

May 22, 2018 | 11:00 AM – 11:45 AM PT – Hybrid Cloud Customer Use Cases on AWS – Learn how customers are leveraging AWS hybrid cloud capabilities to easily extend their datacenter capacity, deliver new services and applications, and ensure business continuity and disaster recovery.

IoT

May 31, 2018 | 11:00 AM – 11:45 AM PT – Using AWS IoT for Industrial Applications – Discover how you can quickly onboard your fleet of connected devices, keep them secure, and build predictive analytics with AWS IoT.

Machine Learning

May 22, 2018 | 09:00 AM – 09:45 AM PT – Using Apache Spark with Amazon SageMaker – Discover how to use Apache Spark with Amazon SageMaker for training jobs and application integration.

May 24, 2018 | 09:00 AM – 09:45 AM PT – Introducing AWS DeepLens – Learn how AWS DeepLens provides a new way for developers to learn machine learning by pairing the physical device with a broad set of tutorials, examples, source code, and integration with familiar AWS services.

Management Tools

May 21, 2018 | 09:00 AM – 09:45 AM PT – Gaining Better Observability of Your VMs with Amazon CloudWatch – Learn how CloudWatch Agent makes it easy for customers like Rackspace to monitor their VMs.

Mobile

May 29, 2018 | 11:00 AM – 11:45 AM PT – Deep Dive on Amazon Pinpoint Segmentation and Endpoint Management – See how segmentation and endpoint management with Amazon Pinpoint can help you target the right audience.

Networking

May 31, 2018 | 09:00 AM – 09:45 AM PT – Making Private Connectivity the New Norm via AWS PrivateLink – See how PrivateLink enables service owners to offer private endpoints to customers outside their company.

Security, Identity, & Compliance

May 30, 2018 | 09:00 AM – 09:45 AM PT – Introducing AWS Certificate Manager Private Certificate Authority (CA) – Learn how AWS Certificate Manager (ACM) Private Certificate Authority (CA), a managed private CA service, helps you easily and securely manage the lifecycle of your private certificates.

June 1, 2018 | 09:00 AM – 09:45 AM PT – Introducing AWS Firewall Manager – Centrally configure and manage AWS WAF rules across your accounts and applications.

Serverless

May 22, 2018 | 01:00 PM – 01:45 PM PT – Building API-Driven Microservices with Amazon API Gateway – Learn how to build a secure, scalable API for your application in our tech talk about API-driven microservices.

Storage

May 30, 2018 | 11:00 AM – 11:45 AM PT – Accelerate Productivity by Computing at the Edge – Learn how AWS Snowball Edge support for compute instances helps accelerate data transfers, execute custom applications, and reduce overall storage costs.

June 1, 2018 | 11:00 AM – 11:45 AM PT – Learn to Build a Cloud-Scale Website Powered by Amazon EFS – Technical deep dive where you’ll learn tips and tricks for integrating WordPress, Drupal and Magento with Amazon EFS.

CI/CD with Data: Enabling Data Portability in a Software Delivery Pipeline with AWS Developer Tools, Kubernetes, and Portworx

Post Syndicated from Kausalya Rani Krishna Samy original https://aws.amazon.com/blogs/devops/cicd-with-data-enabling-data-portability-in-a-software-delivery-pipeline-with-aws-developer-tools-kubernetes-and-portworx/

This post is written by Eric Han – Vice President of Product Management Portworx and Asif Khan – Solutions Architect

Data is the soul of an application. As containers make it easier to package and deploy applications faster, testing plays an even more important role in the reliable delivery of software. Given that all applications have data, development teams want a way to reliably control, move, and test using real application data or, at times, obfuscated data.

For many teams, moving application data through a CI/CD pipeline, while honoring compliance and maintaining separation of concerns, has been a manual task that doesn’t scale. At best, it is limited to a few applications, and is not portable across environments. The goal should be to make running and testing stateful containers (think databases and message buses where operations are tracked) as easy as with stateless (such as with web front ends where they are often not).

Why is state important in testing scenarios? One reason is that many bugs manifest only when code is tested against real data. For example, we might simply want to test a database schema upgrade but a small synthetic dataset does not exercise the critical, finer corner cases in complex business logic. If we want true end-to-end testing, we need to be able to easily manage our data or state.

In this blog post, we define a CI/CD pipeline reference architecture that can automate data movement between applications. We also provide the steps to follow to configure the CI/CD pipeline.

 

Stateful Pipelines: Need for Portable Volumes

As part of continuous integration, testing, and deployment, a team may need to reproduce a bug found in production against a staging setup. Here, the hosting environment is comprised of a cluster with Kubernetes as the scheduler and Portworx for persistent volumes. The testing workflow is then automated by AWS CodeCommit, AWS CodePipeline, and AWS CodeBuild.

Portworx offers Kubernetes storage that can be used to make persistent volumes portable between AWS environments and pipelines. The addition of Portworx to the AWS Developer Tools continuous deployment for Kubernetes reference architecture adds persistent storage and storage orchestration to a Kubernetes cluster. The example uses MongoDB as the demonstration of a stateful application. In practice, the workflow applies to any containerized application such as Cassandra, MySQL, Kafka, and Elasticsearch.

Using the reference architecture, a developer calls CodePipeline to trigger a snapshot of the running production MongoDB database. Portworx then creates a block-based, writable snapshot of the MongoDB volume. Meanwhile, the production MongoDB database continues serving end users and is uninterrupted.
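
How the pipeline is started is up to the team; as a minimal illustration (the pipeline name is a placeholder), an execution can be kicked off programmatically with the AWS SDK for Python:

import boto3

codepipeline = boto3.client("codepipeline", region_name="us-east-1")

# Start the pipeline that snapshots production MongoDB and deploys staging
response = codepipeline.start_pipeline_execution(
    name="mongodb-staging-pipeline"                    # placeholder pipeline name
)
print("Execution ID:", response["pipelineExecutionId"])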

Without the Portworx integrations, a manual process would require an application-level backup of the database instance that is outside of the CI/CD process. For larger databases, this could take hours and impact production. The use of block-based snapshots follows best practices for resilient and non-disruptive backups.

As part of the workflow, CodePipeline deploys a new MongoDB instance for staging onto the Kubernetes cluster and mounts the second Portworx volume that has the data from production. CodePipeline triggers the snapshot of a Portworx volume through an AWS Lambda function, as shown here

 

 

 

AWS Developer Tools with Kubernetes: Integrated Workflow with Portworx

In the following workflow, a developer is testing changes to a containerized application that calls on MongoDB. The tests are performed against a staging instance of MongoDB. The same workflow applies if changes were on the server side. The original production deployment is scheduled as a Kubernetes deployment object and uses Portworx as the storage for the persistent volume.

The continuous deployment pipeline runs as follows:

  • Developers integrate bug fix changes into a main development branch that gets merged into a CodeCommit master branch.
  • Amazon CloudWatch triggers the pipeline when code is merged into a master branch of an AWS CodeCommit repository.
  • AWS CodePipeline sends the new revision to AWS CodeBuild, which builds a Docker container image with the build ID.
  • AWS CodeBuild pushes the new Docker container image tagged with the build ID to an Amazon ECR registry.
  • Kubernetes downloads the new container (for the database client) from Amazon ECR and deploys the application (as a pod) and staging MongoDB instance (as a deployment object).
  • AWS CodePipeline, through a Lambda function, calls Portworx to snapshot the production MongoDB and deploy a staging instance of MongoDB.
  • Portworx provides a snapshot of the production instance as the persistent storage of the staging MongoDB.
    • The MongoDB instance mounts the snapshot.

At this point, the staging setup mimics a production environment. Teams can run integration and full end-to-end tests, using partner tooling, without impacting production workloads. The full pipeline is shown here.

 

Summary

This reference architecture showcases how development teams can easily move data between production and staging for the purposes of testing. Instead of taking application-specific manual steps, all operations in this CodePipeline architecture are automated and tracked as part of the CI/CD process.

This integrated experience is part of making stateful containers as easy as stateless. With AWS CodePipeline for CI/CD process, developers can easily deploy stateful containers onto a Kubernetes cluster with Portworx storage and automate data movement within their process.

The reference architecture and code are available on GitHub:

  • Reference architecture: https://github.com/portworx/aws-kube-codesuite
  • Lambda function source code for Portworx additions: https://github.com/portworx/aws-kube-codesuite/blob/master/src/kube-lambda.py

For more information about persistent storage for containers, visit the Portworx website. For more information about Code Pipeline, see the AWS CodePipeline User Guide.

Security updates for Tuesday

Post Syndicated from ris original https://lwn.net/Articles/753257/rss

Security updates have been issued by Fedora (cups-filters, ghostscript, glusterfs, PackageKit, qpdf, and xen), Mageia (anki, libofx, ming, sox, webkit2, and xdg-user-dirs), Oracle (corosync, java-1.7.0-openjdk, and pcs), Red Hat (java-1.7.0-openjdk), Scientific Linux (corosync, firefox, gcc, glibc, golang, java-1.7.0-openjdk, java-1.8.0-openjdk, kernel, krb5, librelp, libvncserver, libvorbis, ntp, openssh, openssl, PackageKit, patch, pcs, policycoreutils, qemu-kvm, and xdg-user-dirs), Slackware (libwmf and mozilla), and Ubuntu (apache2, ghostscript, mysql-5.7, wavpack, and webkit2gtk).

Security updates for Monday

Post Syndicated from ris original https://lwn.net/Articles/753137/rss

Security updates have been issued by Arch Linux (drupal), Debian (chromium-browser, gunicorn, libvorbis, openjdk-8, roundcube, sdl-image1.2, slurm-llnl, and tor), Fedora (boost, cups-filters, ghostscript, gsoap, memcached, mod_http2, and qpdf), openSUSE (Chromium and mysql-community-server), and Red Hat (glusterfs, OpenShift Container Platform 3.1, OpenShift Container Platform 3.2, OpenShift Container Platform 3.3, OpenShift Container Platform 3.4, OpenShift Container Platform 3.5, OpenShift Container Platform 3.6, OpenShift Container Platform 3.7, OpenShift Container Platform 3.8, OpenShift Container Platform 3.9, and openvswitch).

10 visualizations to try in Amazon QuickSight with sample data

Post Syndicated from Karthik Kumar Odapally original https://aws.amazon.com/blogs/big-data/10-visualizations-to-try-in-amazon-quicksight-with-sample-data/

If you’re not already familiar with building visualizations for quick access to business insights using Amazon QuickSight, consider this your introduction. In this post, we’ll walk through some common scenarios with sample datasets to provide an overview of how you can connect your data, perform advanced analysis, and access the results from any web browser or mobile device.

The following visualizations are built from the public datasets available in the links below. Before we jump into that, let’s take a look at the supported data sources, file formats, and a typical QuickSight workflow for building any visualization.

Which data sources does Amazon QuickSight support?

At the time of publication, you can use the following data methods:

  • Connect to AWS data sources, including:
    • Amazon RDS
    • Amazon Aurora
    • Amazon Redshift
    • Amazon Athena
    • Amazon S3
  • Upload Excel spreadsheets or flat files (CSV, TSV, CLF, and ELF)
  • Connect to on-premises databases like Teradata, SQL Server, MySQL, and PostgreSQL
  • Import data from SaaS applications like Salesforce and Snowflake
  • Use big data processing engines like Spark and Presto

This list is constantly growing. For more information, see Supported Data Sources.

Answers in instants

SPICE is the Amazon QuickSight super-fast, parallel, in-memory calculation engine, designed specifically for ad hoc data visualization. SPICE stores your data in a system architected for high availability, where it is saved until you choose to delete it. Improve the performance of database datasets by importing the data into SPICE instead of using a direct database query. To calculate how much SPICE capacity your dataset needs, see Managing SPICE Capacity.

Typical Amazon QuickSight workflow

When you create an analysis, the typical workflow is as follows:

  1. Connect to a data source, and then create a new dataset or choose an existing dataset.
  2. (Optional) If you created a new dataset, prepare the data (for example, by changing field names or data types).
  3. Create a new analysis.
  4. Add a visual to the analysis by choosing the fields to visualize. Choose a specific visual type, or use AutoGraph and let Amazon QuickSight choose the most appropriate visual type, based on the number and data types of the fields that you select.
  5. (Optional) Modify the visual to meet your requirements (for example, by adding a filter or changing the visual type).
  6. (Optional) Add more visuals to the analysis.
  7. (Optional) Add scenes to the default story to provide a narrative about some aspect of the analysis data.
  8. (Optional) Publish the analysis as a dashboard to share insights with other users.

The following graphic illustrates a typical Amazon QuickSight workflow.

Visualizations created in Amazon QuickSight with sample datasets

Visualizations for a data analyst

Source:  https://data.worldbank.org/

Download and Resources:  https://datacatalog.worldbank.org/dataset/world-development-indicators

Data catalog:  The World Bank invests into multiple development projects at the national, regional, and global levels. It’s a great source of information for data analysts.

The following graph shows the percentage of the population that has access to electricity (rural and urban) during 2000 in Asia, Africa, the Middle East, and Latin America.

The following graph shows the share of healthcare costs that are paid out-of-pocket (private vs. public). Also, you can maneuver over the graph to get detailed statistics at a glance.

Visualizations for a trading analyst

Source:  Deutsche Börse Public Dataset (DBG PDS)

Download and resources:  https://aws.amazon.com/public-datasets/deutsche-boerse-pds/

Data catalog:  The DBG PDS project makes real-time data derived from Deutsche Börse’s trading market systems available to the public for free. This is the first time that such detailed financial market data has been shared freely and continually from the source provider.

The following graph shows the market trend of max trade volume for different EU banks. It builds on the data available on XETRA engines, which is made up of a variety of equities, funds, and derivative securities. This graph can be scrolled to visualize trade for a period of an hour or more.

The following graph shows the common stock beating the rest of the maximum trade volume over a period of time, grouped by security type.

Visualizations for a data scientist

Source:  https://catalog.data.gov/

Download and resources:  https://catalog.data.gov/dataset/road-weather-information-stations-788f8

Data catalog:  Data derived from different sensor stations placed on the city bridges and surface streets are a core information source. The road weather information station has a temperature sensor that measures the temperature of the street surface. It also has a sensor that measures the ambient air temperature at the station each second.

The following graph shows the present max air temperature in Seattle from different RWI station sensors.

The following graph shows the minimum temperature of the road surface at different times, which helps predicts road conditions at a particular time of the year.

Visualizations for a data engineer

Source:  https://www.kaggle.com/

Download and resources:  https://www.kaggle.com/datasnaek/youtube-new/data

Data catalog:  Kaggle has come up with a platform where people can donate open datasets. Data engineers and other community members can have open access to these datasets and can contribute to the open data movement. They have more than 350 datasets in total, with more than 200 as featured datasets. It has a few interesting datasets on the platform that are not present at other places, and it’s a platform to connect with other data enthusiasts.

The following graph shows the trending YouTube videos and presents the max likes for the top 20 channels. This is one of the most popular datasets for data engineers.

The following graph shows the YouTube daily statistics for the max views of video titles published during a specific time period.

Visualizations for a business user

Source:  New York Taxi Data

Download and resources:  https://data.cityofnewyork.us/Transportation/2016-Green-Taxi-Trip-Data/hvrh-b6nb

Data catalog: NYC Open Data hosts some very popular open datasets for all New Yorkers. This platform allows you to get involved and dive deep into the datasets to pull some useful visualizations. The 2016 Green Taxi Trip dataset includes records from all trips completed in green taxis in NYC in 2016. Records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.

The following graph presents the maximum fare amount grouped by passenger count during a period of the day. This can be further expanded to follow different days of the month, based on the business need.

The following graph shows the New York taxi data from January 2016, highlighting the dip in the number of taxi rides on January 23, 2016 across all types of taxis.

A quick search for that date and location shows you the following news report:

Summary

Using Amazon QuickSight, you can see patterns across a time-series data by building visualizations, performing ad hoc analysis, and quickly generating insights. We hope you’ll give it a try today!

 


Additional Reading

If you found this post useful, be sure to check out Amazon QuickSight Adds Support for Combo Charts and Row-Level Security and Visualize AWS Cloudtrail Logs Using AWS Glue and Amazon QuickSight.


Karthik Odapally is a Sr. Solutions Architect in AWS. His passion is to build cost effective and highly scalable solutions on the cloud. In his spare time, he bakes cookies and cupcakes for family and friends here in the PNW. He loves vintage racing cars.

 

 

 

Pranabesh Mandal is a Solutions Architect in AWS. He has over a decade of IT experience. He is passionate about cloud technology and focuses on Analytics. In his spare time, he likes to hike and explore the beautiful nature and wild life of most divine national parks around the United States alongside his wife.