Rapid7 Statement on the New Standard Contractual Clauses for International Transfers of Personal Data

Post Syndicated from Chelsea Portney original https://blog.rapid7.com/2021/09/21/rapid7-statement-on-the-new-standard-contractual-clauses-for-international-transfers-of-personal-data/

Context: On June 4, 2021, the European Commission published new standard contractual clauses (“New SCCs”). Under the General Data Protection Regulation (“GDPR”), transfers of personal data to countries outside of the European Economic Area (EEA) must meet certain conditions. The New SCCs are an approved mechanism to enable companies transferring personal data outside of the EEA to meet those conditions, and they replace the previous set of standard contractual clauses (“Old SCCs”), which were deemed inadequate by the Court of Justice of the European Union (“CJEU”). The New SCCs made a number of improvements to the previous version, including but not limited to (i) a modular design which allows parties to choose the module applicable to the personal data being transferred, (ii) use by non-EEA data exporters, and (iii) strengthened data subject rights and protections.

Rapid7 Action: In light of the European Commission’s adoption of the New SCCs, Rapid7 performed a thorough assessment of its personal data transfers which involved reviewing the technical, contractual, and organizational measures we have in place, evaluating local laws where the personal data will be transferred, and analyzing the necessity for the transfers in accordance with the type and scope of the personal data being transferred. Rapid7 will be updating our Data Processing Addendum on September 27, 2021, to incorporate the New SCCs, where required, for the transfer of personal data outside of the EEA. Rapid7’s adoption of the New SCCs helps ensure we are able to continue to serve all our clients in compliance with GDPR data transfer rules.

Ongoing Commitments: Rapid7 is committed to upholding high standards of privacy and security for our customers, and we are pleased to be able to offer the New SCCs which provide enhanced protections that better take account of the rapidly evolving data environment. We will continue to monitor ongoing changes in order to comply with applicable law and will regularly assess our technical, contractual, and organizational measures in an effort to improve our data protection safeguards. For information on how Rapid7 collects, uses, and discloses personal data, as well as the choices available regarding personal data collected by Rapid7, please see the Rapid7 Privacy Policy. Additionally, Rapid7 remains dedicated to maintaining and enhancing our robust security and privacy program which is outlined in detail on our Trust page.

For more information about our security and privacy program, please email [email protected].

Agentless Oracle database monitoring with ODBC

Post Syndicated from Aigars Kadiķis original https://blog.zabbix.com/agentless-oracle-database-monitoring-with-odbc/15589/

Did you know that Zabbix has an out-of-the-box template for collecting Oracle database metrics? With this template, we can collect database, tablespace, ASM, and many other metrics agentlessly by using ODBC. This blog post will guide you on how to set up ODBC monitoring for Oracle 11.2, 12.1, 18.5, or 19.2 database servers. This post can serve as the perfect set of guidelines for deploying Oracle database monitoring in your environment.

Download Instant client and SQLPlus

The provided commands apply to the following operating systems: CentOS 8, Oracle Linux 8, or Rocky Linux.

First we have to download the following packages:
oracle-instantclient19.12-basic-19.12.0.0.0-1.x86_64.rpm
oracle-instantclient19.12-sqlplus-19.12.0.0.0-1.x86_64.rpm
oracle-instantclient19.12-odbc-19.12.0.0.0-1.x86_64.rpm

Here we are downloading:

Oracle instant client – required to establish connectivity to an Oracle database
SQLPlus – a tool that we can use to test the connectivity to an Oracle database
Oracle ODBC package – contains the required ODBC drivers and configuration scripts to enable ODBC connectivity to an Oracle database

Upload the packages to the Zabbix server (or proxy, if you wish to monitor your Oracle DB on a proxy) and place them in:

/tmp/oracle-instantclient19.12-basic-19.12.0.0.0-1.x86_64.rpm
/tmp/oracle-instantclient19.12-sqlplus-19.12.0.0.0-1.x86_64.rpm
/tmp/oracle-instantclient19.12-odbc-19.12.0.0.0-1.x86_64.rpm

Solve OS dependencies

Install the ‘libaio’ and ‘libnsl’ libraries:

dnf -y install libaio-devel libnsl

Otherwise, we will receive errors:

# rpm -ivh /tmp/oracle-instantclient19.12-basic-19.12.0.0.0-1.x86_64.rpm
error: Failed dependencies:
        libaio is needed by oracle-instantclient19.12-basic-19.12.0.0.0-1.x86_64
        libnsl.so.1()(64bit) is needed by oracle-instantclient19.12-basic-19.12.0.0.0-1.x86_64
# rpm -ivh /tmp/oracle-instantclient19.12-basic-19.12.0.0.0-1.x86_64.rpm
error: Failed dependencies:
        libnsl.so.1()(64bit) is needed by oracle-instantclient19.12-basic-19.12.0.0.0-1.x86_64

Check if Oracle components have been previously deployed on the system. The commands below should provide an empty output:

rpm -qa | grep oracle
ldconfig -p | grep oracle

Install Oracle Instant Client

rpm -ivh /tmp/oracle-instantclient19.12-basic-19.12.0.0.0-1.x86_64.rpm

Make sure that the package ‘oracle-instantclient19.12-basic-19.12.0.0.0-1.x86_64’ is installed:

rpm -qa | grep oracle

LD config

The official Oracle template page at git.zabbix.com describes how to configure Oracle ENV usage for the service. For version 19.12 of the instant client, it is NOT REQUIRED to create a ‘/etc/sysconfig/zabbix-server’ file with the following content:

export ORACLE_HOME=/usr/lib/oracle/19.12/client64
export PATH=$PATH:$ORACLE_HOME/bin
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:/usr/lib64:/usr/lib:$ORACLE_HOME/bin
export TNS_ADMIN=$ORACLE_HOME/network/admin

When we installed the rpm package, the Oracle 19.12 client package auto-configured the LD path at the global level – this means every user on the system can use the Oracle instant client. We can see that the LD path has been configured under:

cat /etc/ld.so.conf.d/oracle-instantclient.conf

This will print:

/usr/lib/oracle/19.12/client64/lib

To ensure that the required Oracle libraries are recognized by the OS, we can run:

ldconfig -p | grep oracle

It should print:

liboramysql19.so (libc6,x86-64) => /usr/lib/oracle/19.12/client64/lib/liboramysql19.so
libocijdbc19.so (libc6,x86-64) => /usr/lib/oracle/19.12/client64/lib/libocijdbc19.so
libociei.so (libc6,x86-64) => /usr/lib/oracle/19.12/client64/lib/libociei.so
libocci.so.19.1 (libc6,x86-64) => /usr/lib/oracle/19.12/client64/lib/libocci.so.19.1
libnnz19.so (libc6,x86-64) => /usr/lib/oracle/19.12/client64/lib/libnnz19.so
libmql1.so (libc6,x86-64) => /usr/lib/oracle/19.12/client64/lib/libmql1.so
libipc1.so (libc6,x86-64) => /usr/lib/oracle/19.12/client64/lib/libipc1.so
libclntshcore.so.19.1 (libc6,x86-64) => /usr/lib/oracle/19.12/client64/lib/libclntshcore.so.19.1
libclntshcore.so (libc6,x86-64) => /usr/lib/oracle/19.12/client64/lib/libclntshcore.so
libclntsh.so.19.1 (libc6,x86-64) => /usr/lib/oracle/19.12/client64/lib/libclntsh.so.19.1
libclntsh.so (libc6,x86-64) => /usr/lib/oracle/19.12/client64/lib/libclntsh.so

Note: If for some reason the ldconfig command shows links to other dynamic libraries – that’s when we might have to create a separate ENV file for Zabbix server/Proxy, which would link the Zabbix application to the correct dynamic libraries, as per the example at the start of this section.

Check if the Oracle service port is reachable

To save us some headache down the line, let’s first check the network connectivity to our Oracle database host and verify that we can reach the default Oracle port at the network level. In this example, we will try to connect to the default Oracle database port, 1521; adjust accordingly if your Oracle database is listening for connections on a different port. Make sure the output says ‘Connected to 10.1.10.15:1521’:

nc -zv 10.1.10.15 1521

Test connection with SQLPlus

We can simulate the connection to the Oracle database before moving on with the ODBC configuration. Make sure that the Oracle username and password used in the command are correct. For this task, we will first need to install the SQLPlus package:

rpm -ivh /tmp/oracle-instantclient19.12-sqlplus-19.12.0.0.0-1.x86_64.rpm

To simulate the connection, we can use a one-liner command. In the example command I’m using the username ‘system’ together with the password ‘oracle’ to reach out to the Oracle database server ‘10.1.10.15’ via port ‘1521’ and connect to the service name ‘xe’:

sqlplus64 'system/oracle@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.1.10.15)(PORT=1521)))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=xe)))'

In the output, we can see that we are using the 19.12 client to connect to an 11.2 server:

SQL*Plus: Release 19.0.0.0.0 - Production on Mon Sep 6 13:47:36 2021
Version 19.12.0.0.0
Copyright (c) 1982, 2021, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production

Note: This gives us an extra hint regarding the Oracle instant client – newer versions of the client are backwards compatible with older versions of the Oracle database server. This doesn’t apply to every combination of Oracle client/server versions, though, so please check the Oracle instant client documentation first.

ODBC connector

When it comes to configuring ODBC, let’s first install the ODBC driver manager:

dnf -y install unixODBC

Now we can see that we have two new files – ‘/etc/odbc.ini’ (possibly empty) and ‘/etc/odbcinst.ini’.

The file ‘/etc/odbcinst.ini’ describes the installed ODBC drivers. Currently, there is no Oracle driver registered – when we ‘grep’ for the keyword ‘oracle’, the output is empty:

grep -i oracle /etc/odbcinst.ini

Our next step is to install the Oracle ODBC driver package:

rpm -ivh /tmp/oracle-instantclient19.12-odbc-19.12.0.0.0-1.x86_64.rpm

The ‘oracle-instantclient*-odbc’ package contains a script that will update the ‘/etc/odbcinst.ini’ configuration automatically:

cd /usr/lib/oracle/19.12/client64/bin
./odbc_update_ini.sh / /usr/lib/oracle/19.12/client64/lib

It will print:

 *** ODBCINI environment variable not set,defaulting it to HOME directory!

Now when we print the file on the screen:

cat /etc/odbcinst.ini

We will see that the Oracle 19 ODBC driver section has been added at the end of the file:

[Oracle 19 ODBC driver]
Description     = Oracle ODBC driver for Oracle 19
Driver          = /usr/lib/oracle/19.12/client64/lib/libsqora.so.19.1
Setup           =
FileUsage       =
CPTimeout       =
CPReuse         =

It’s important to check that no errors are produced in the output when executing the ‘ldd’ command. This ensures that the dependencies are satisfied and accessible, and that there are no conflicts with the library versioning:

ldd /usr/lib/oracle/19.12/client64/lib/libsqora.so.19.1

It will print something similar to:

linux-vdso.so.1 (0x00007fff121b5000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fb18601c000)
libm.so.6 => /lib64/libm.so.6 (0x00007fb185c9a000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fb185a7a000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x00007fb185861000)
librt.so.1 => /lib64/librt.so.1 (0x00007fb185659000)
libaio.so.1 => /lib64/libaio.so.1 (0x00007fb185456000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x00007fb18523f000)
libclntsh.so.19.1 => /usr/lib/oracle/19.12/client64/lib/libclntsh.so.19.1 (0x00007fb1810e6000)
libclntshcore.so.19.1 => /usr/lib/oracle/19.12/client64/lib/libclntshcore.so.19.1 (0x00007fb180b42000)
libodbcinst.so.2 => /lib64/libodbcinst.so.2 (0x00007fb18092c000)
libc.so.6 => /lib64/libc.so.6 (0x00007fb180567000)
/lib64/ld-linux-x86-64.so.2 (0x00007fb1864da000)
libnnz19.so => /usr/lib/oracle/19.12/client64/lib/libnnz19.so (0x00007fb17fdba000)
libltdl.so.7 => /lib64/libltdl.so.7 (0x00007fb17fbb0000)

When we executed the ‘odbc_update_ini.sh’ script, a new DSN (data source name) file was created at ‘/root/.odbc.ini’. This is a sample ODBC configuration file which describes what settings this version of the ODBC driver supports.

Let’s move this configuration file from the user directory to a location accessible system-wide:

cat /root/.odbc.ini | sudo tee -a /etc/odbc.ini

And remove the file from the user directory completely:

rm /root/.odbc.ini

This way, every user in the system will use only this one ODBC configuration file.

We can now alter the existing configuration – /etc/odbc.ini. The settings that have been changed from the defaults are explained below the configuration block:

[Oracle11g]
AggregateSQLType = FLOAT
Application Attributes = T
Attributes = W
BatchAutocommitMode = IfAllSuccessful
BindAsFLOAT = F
CacheBufferSize = 20
CloseCursor = F
DisableDPM = F
DisableMTS = T
DisableRULEHint = T
Driver = Oracle 19 ODBC driver
DSN = Oracle11g
EXECSchemaOpt =
EXECSyntax = T
Failover = T
FailoverDelay = 10
FailoverRetryCount = 10
FetchBufferSize = 64000
ForceWCHAR = F
LobPrefetchSize = 8192
Lobs = T
Longs = T
MaxLargeData = 0
MaxTokenSize = 8192
MetadataIdDefault = F
QueryTimeout = T
ResultSets = T
ServerName = //10.1.10.15:1521/xe
SQLGetData extensions = F
SQLTranslateErrors = F
StatementCache = F
Translation DLL =
Translation Option = 0
UseOCIDescribeAny = F
UserID = system
Password = oracle

DSN – Data source name. Should match the section name in brackets, e.g., [Oracle11g]
ServerName – Oracle server address
UserID – Oracle user name
Password – Oracle user password

To test the connection from the command line, let’s use the isql command-line tool, which should simulate the ODBC connection akin to what Zabbix does when gathering metrics:

isql -v Oracle11g

The isql command in this example picks up the ODBC settings (Username, Password, Server address) from the odbc.ini file. All we have to do is reference the particular DSN – Oracle11g.

On the other hand, if we prefer not to keep the password on the filesystem (/etc/odbc.ini), we can erase the lines ‘UserID’ and ‘Password’. Then we can test the ODBC connection with:

isql -v Oracle11g 'system' 'oracle'

In case of a successful connection it should print:

+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL>

And that’s it for the ODBC configuration! Now we should be able to apply the Oracle by ODBC template in Zabbix.

Don’t forget that we also need to provide the necessary Oracle credentials to start collecting Oracle database metrics.
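
Because the Zabbix server (or proxy) process runs under the zabbix service account rather than root, it can also help to confirm that the DSN resolves for that user – this is exactly why we moved the configuration to the system-wide /etc/odbc.ini. A minimal check, assuming the service account is named ‘zabbix’ and reusing the DSN and credentials from this example:

sudo -u zabbix isql -v Oracle11g 'system' 'oracle'

If this prints the ‘Connected!’ banner shown earlier, the template items should be able to reach the database once the Oracle credentials are supplied to the template (typically via user macros on the host).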

The lessons learned in this blog post can be easily applied to ODBC monitoring and troubleshooting in general, not just Oracle. If you’re having any issues or wish to share your experience with ODBC or Oracle database monitoring – feel free to leave us a comment!

Alaska’s Department of Health and Social Services Hack

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2021/09/alaskas-department-of-health-and-social-services-hack.html

Apparently, a nation-state hacked Alaska’s Department of Health and Social Services.

Not sure why Alaska’s Department of Health and Social Services is of any interest to a nation-state, but that’s probably just my failure of imagination.

Automatically tune your guitar with Raspberry Pi Pico

Post Syndicated from Ashley Whittaker original https://www.raspberrypi.org/blog/automatically-tune-your-guitar-with-raspberry-pi-pico/

You sit down with your six-string, ready to bash out that new song you recently mastered, but find you’re out of tune. Redditor u/thataintthis (Guyrandy Jean-Gilles) has taken the pain out of tuning your guitar, so those of us lacking this necessary skill can skip the boring bit and get back to playing.

Before you dismiss this project as just a Raspberry Pi Pico-powered guitar tuning box, read on, because when the maker said this is a fully automatic tuner, they meant it.

How does it work?

Guyrandy’s device listens to the sound of a string being plucked and decides which note it needs to be tuned to. Then it automatically turns the tuning keys on the guitar’s headstock just the right amount until it achieves the correct note.

Genius.

If this were a regular tuning box, it would be up to the musician to fiddle with the tuning keys while twanging the string until they hit a note that matches the one being made by the tuning box.

It’s currently hardcoded to do standard tuning, but it could be tweaked to do things like Drop D tuning.

Pico automatic guitar tuner
Waiting for that green light

Upgrade suggestions

Commenters were quick to share great ideas to make this build even better. Issues of harmonics were raised, and possible new algorithms to get around them were shared. Another commenter noticed the maker wrote their own code in C and suggested making use of the existing ulab FFT in MicroPython. And a final great idea was training the Raspberry Pi Pico to accept the guitar’s audio output as input and analyse the note that way, rather than using a microphone, which has a less clear sound quality.

These upgrades seemed to pique the maker’s interest. So maybe watch this space for a v2.0 of this project…

Shred, Otto, shred

(Watch out for some spicy language in the comments section of the original reddit post. People got pretty lively when articulating their love for this build.)

Inspiration

This project was inspired by the Roadie automatic tuning device. Roadie is sleek but it costs big cash money. And it strips you of the hours of tinkering fun you get from making your own version.

All the code for the project can be found here.

Hoyt: Structural pattern matching in Python 3.10

Post Syndicated from original https://lwn.net/Articles/869883/rss

Ben Hoyt has published a critical overview of the Python 3.10 pattern-matching feature.

As shown above, there are cases where match really shines. But they are few and far between, mostly when handling syntax trees and writing parsers. A lot of code does have if ... elif chains, but these are often either plain switch-on-value, where elif works almost as well, or the conditions they’re testing are a more complex combination of tests that don’t fit into case patterns (unless you use awkward case _ if cond clauses, but that’s strictly worse than elif).

(Pattern matching has been covered here as well).

Accelerate your data warehouse migration to Amazon Redshift – Part 4

Post Syndicated from Michael Soo original https://aws.amazon.com/blogs/big-data/part-4-accelerate-your-data-warehouse-migration-to-amazon-redshift/

This is the fourth in a series of posts. We’re excited to share dozens of new features to automate your schema conversion; preserve your investment in existing scripts, reports, and applications; accelerate query performance; and potentially reduce your overall cost to migrate to Amazon Redshift.

Check out the previous posts in the series:

Amazon Redshift is the leading cloud data warehouse. No other data warehouse makes it as easy to gain new insights from your data. With Amazon Redshift, you can query exabytes of data across your data warehouse, operational data stores, and data lake using standard SQL. You can also integrate other AWS services like Amazon EMR, Amazon Athena, Amazon SageMaker, AWS Glue, AWS Lake Formation, and Amazon Kinesis to use all the analytic capabilities in the AWS Cloud.

Many customers have asked for help migrating from self-managed data warehouse engines, like Teradata, to Amazon Redshift. In these cases, you typically have terabytes or petabytes of data, a heavy reliance on proprietary features, and thousands of extract, transform, and load (ETL) processes and reports built over a few years (or decades) of use.

Until now, migrating a data warehouse to AWS was complex and involved a significant amount of manual effort. You needed to manually remediate syntax differences, inject code to replace proprietary features, and manually tune the performance of queries and reports on the new platform.

For example, you may have a significant investment in BTEQ (Basic Teradata Query) scripting for database automation, ETL, or other tasks. Previously, you needed to manually recode these scripts as part of the conversion process to Amazon Redshift. Together with supporting infrastructure (job scheduling, job logging, error handling), this was a significant impediment to migration.

Today, we’re happy to share with you a new, purpose-built command line tool called Amazon Redshift RSQL. Some of the key features added in Amazon Redshift RSQL are enhanced flow control syntax and single sign-on support. You can also describe properties or attributes of external tables in an AWS Glue catalog or Apache Hive Metastore, external databases in Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL-Compatible Edition, and tables shared using Amazon Redshift data sharing.

We have also enhanced the AWS Schema Conversion Tool (AWS SCT) to automatically convert BTEQ scripts to Amazon Redshift RSQL scripts. The converted scripts run on Amazon Redshift with little to no changes.

In this post, we describe some of the features of Amazon Redshift RSQL, show example scripts, and demonstrate how to convert BTEQ scripts into Amazon Redshift RSQL scripts.

Amazon Redshift RSQL features

If you currently use Amazon Redshift, you may already be running scripts on Amazon Redshift using the PSQL command line client. These scripts operate on Amazon Redshift RSQL with no modification. You can think of Amazon Redshift RSQL as an Amazon Redshift-native version of PSQL.

In addition, we have designed Amazon Redshift RSQL to make it easy to transition BTEQ scripts to the tool. The following are some examples of Amazon Redshift RSQL commands that make this possible. (For full details, see Amazon Redshift RSQL.)

  • \EXIT – This command is an extension of the PSQL \quit command. Like \quit, \EXIT terminates the execution of Amazon Redshift RSQL. In addition, you can specify an optional exit code with \EXIT.
  • \LOGON – This command creates a new connection to a database. \LOGON is an alias for the PSQL \connect command. You can specify connection parameters using positional syntax or as a connection string.
  • \REMARK – This command prints the specified string to the output. \REMARK extends the PSQL \echo command by adding the ability to break the output over multiple lines, using // as a line break.
  • \RUN – This command runs the Amazon Redshift RSQL script contained in the specified file. \RUN extends the PSQL \i command by adding an option to skip any number of lines in the specified file.
  • \OS – This is an alias for the PSQL \! command. \OS runs the operating system command that is passed as a parameter. Control returns to Amazon Redshift RSQL after running the OS command.
  • \LABEL – This is a new command for Amazon Redshift RSQL. \LABEL establishes an entry point for execution, as the target for a \GOTO command.
  • \GOTO – This is a new command for Amazon Redshift RSQL. It’s used in conjunction with the \LABEL command. \GOTO skips all intervening commands and resumes processing at the specified \LABEL. The \LABEL must be a forward reference. You can’t jump to a \LABEL that lexically precedes the \GOTO.
  • \IF (\ELSEIF, \ELSE, \ENDIF) – This command is an extension of the PSQL \if (\elif, \else, \endif) command. \IF and \ELSEIF support arbitrary Boolean expressions including AND, OR, and NOT conditions. You can use the \GOTO command within a \IF block to control conditional execution.
  • \EXPORT – This command specifies the name of an export file that Amazon Redshift RSQL uses to store database information returned by a subsequent SQL SELECT statement.

We’ve also added some variables to Amazon Redshift RSQL to support converting your BTEQ scripts.

  • :ACTIVITYCOUNT – This variable returns the number of rows affected by the last submitted request. For a data-returning request, this is the number of rows returned to Amazon Redshift RSQL from the database. ACTIVITYCOUNT is similar to the PSQL variable ROW_COUNT; however, ROW_COUNT does not report an affected-row count for SELECT, COPY, or UNLOAD.
  • :ERRORCODE – This variable contains the return code for the last submitted request to the database. A zero signifies the request completed without error. The ERRORCODE variable is an alias for the variable SQLSTATE.
  • :ERRORLEVEL – This variable assigns severity levels to errors. Use the severity levels to determine a course of action based on the severity of the errors that Amazon Redshift RSQL encounters.
  • :MAXERROR – This variable designates a maximum error severity level beyond which Amazon Redshift RSQL terminates job processing.
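
As a quick illustration of how these variables support error handling, the following is a minimal sketch that checks :ERRORCODE after a statement and exits with a non-zero code if the request failed (it assumes the testschema.employees table used in the example that follows):

select count(*) from testschema.employees;

\if :ERRORCODE = 0
  \remark '****Request completed without error****'
\else
  \remark '****Request failed, stopping the job****'
  \exit 1
\endif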

An example Amazon Redshift RSQL script

Let’s look at an example. First, we log in to an Amazon Redshift database using Amazon Redshift RSQL. You specify the connection information on the command line as shown in the following code. The port and database are optional and default to 5439 and dev respectively if not provided.

$ rsql -h testcluster1.<example>.redshift.amazonaws.com -U testuser1 -d myredshift -p 5439
Password for user testuser1: 
DSN-less Connected
DBMS Name: Amazon Redshift
Driver Name: Amazon Redshift ODBC Driver
Driver Version: 1.4.34.1000
Rsql Version: 1.0.1
Redshift Version: 1.0.29551
Type "help" for help.

(testcluster1) testuser1@myredshift=#

If you choose to change the connection from within the client, you can use the \LOGON command:

(testcluster1) testuser1@myredshift=# \logon testschema testuser2 testcluster2.<example>.redshift.amazonaws.com
Password for user testuser2: 
DBMS Name: Amazon Redshift
Driver Name: Amazon Redshift ODBC Driver
Driver Version: 1.4.34.1000
Rsql Version: 1.0.1

Now, let’s run a simple script that runs a SELECT statement, checks for output, then branches depending on whether data was returned or not.

First, we inspect the script by using the \OS command to print the file to the screen:

(testcluster1) testuser1@myredshift=# \os cat activitycount.sql
select * from testschema.employees;

\if :ACTIVITYCOUNT = 0
  \remark '****No data found****'
  \goto LETSQUIT
\else
  \remark '****Data found****'
  \goto LETSDOSOMETHING
\endif

\label LETSQUIT
\remark '****We are quitting****'
\exit 0

\label LETSDOSOMETHING
\remark '****We are doing it****'
\exit 0

The script prints one of two messages depending on whether data is returned by the SELECT statement or not.

Now, let’s run the script using the \RUN command. The SELECT statement returns 11 rows of data. The script prints a “data found” message, and jumps to the LETSDOSOMETHING label.

(testcluster1) testuser1@myredshift=# \run file=activitycount.sql
  id  | name    | manager_id | last_promo_date
 -----+---------+------------+-----------------
 112  | Britney | 201        | 2041-03-30
 101  | Bob     | 100        |
 110  | Mark    | 201        |
 106  | Jeff    | 102        |
 201  | Ana     | 104        |
 104  | Chris   | 103        |
 111  | Phyllis | 103        |
 102  | Renee   | 101        | 2021-01-01
 100  | Caitlin |            | 2021-01-01
 105  | David   | 103        | 2021-01-01
 103  | John    | 101        |
 (11 rows)

****Data found****
\label LETSQUIT ignored
\label LETSDOSOMETHING processed
****We are doing it****

That’s Amazon Redshift RSQL in a nutshell. If you’re developing new scripts for Amazon Redshift, we encourage you to use Amazon Redshift RSQL and take advantage of its additional capabilities. If you have existing PSQL scripts, you can run those scripts using Amazon Redshift RSQL with no changes.

Use AWS SCT to automate your BTEQ conversions

If you’re a Teradata developer or DBA, you’ve probably built a library of BTEQ scripts that you use to perform administrative work, load or transform data, or to generate datasets and reports. If you’re contemplating a migration to Amazon Redshift, you’ll want to preserve the investment you made in creating those scripts.

AWS SCT has long had the ability to convert BTEQ to AWS Glue. Now, you can also use AWS SCT to automatically convert BTEQ scripts to Amazon Redshift RSQL. AWS SCT supports all the new Amazon Redshift RSQL features like conditional execution, escape to the shell, and branching.

Let’s see how it works. We create two Teradata tables, product_stg and product. Then we create a simple ETL script that uses a MERGE statement to update the product table using data from the product_stg table:

CREATE TABLE testschema.product_stg (
  prod_id INTEGER
, description VARCHAR(100) CHARACTER SET LATIN
,category_id INTEGER)
UNIQUE PRIMARY INDEX ( prod_id );

CREATE TABLE testschema.product (
  prod_id INTEGER
, description VARCHAR(100) CHARACTER SET LATIN
, category_id INTEGER)
UNIQUE PRIMARY INDEX ( prod_id );

We embed the MERGE statement inside a BTEQ script. The script tests error conditions and branches accordingly:

.SET WIDTH 100

SELECT COUNT(*) 
FROM testschema.product_stg 
HAVING COUNT(*) > 0;

.IF ACTIVITYCOUNT = 0 then .GOTO NODATA;

MERGE INTO testschema.product tgt 
USING testschema.product_stg stg 
   ON tgt.prod_id = stg.prod_id
WHEN MATCHED THEN UPDATE SET
      description = stg.description
    , category_id = stg.category_id
WHEN NOT MATCHED THEN INSERT VALUES (
  stg.prod_id
, stg.description
, stg.category_id
);

.GOTO ALLDONE;

.LABEL NODATA

.REMARK 'Staging table is empty. Stopping'

.LABEL ALLDONE 

.QUIT;               

Now, let’s use AWS SCT to convert the script to Amazon Redshift RSQL. AWS SCT converts the BTEQ commands to their Amazon Redshift RSQL and Amazon Redshift equivalents. The converted script is as follows:

\rset width 100
SELECT
    COUNT(*)
    FROM testschema.product_stg
    HAVING COUNT(*) > 0;
\if :ACTIVITYCOUNT = 0
    \goto NODATA
\endif
UPDATE testschema.product
SET description = stg.description, category_id = stg.category_id
FROM testschema.product_stg AS stg
JOIN testschema.product AS tgt
    ON tgt.prod_id = stg.prod_id;
INSERT INTO testschema.product
SELECT
    stg.prod_id, stg.description, stg.category_id
    FROM testschema.product_stg AS stg
    WHERE NOT EXISTS (
        SELECT 1
        FROM testschema.product AS tgt
        WHERE tgt.prod_id = stg.prod_id);
\goto ALLDONE
\label NODATA
\remark 'Staging table is empty. Stopping'
\label ALLDONE
\quit :ERRORLEVEL

The following are the main points of interest in the conversion:

  • The BTEQ .SET WIDTH command is converted to the Amazon Redshift RSQL \RSET WIDTH command.
  • The BTEQ ACTIVITYCOUNT variable is converted to the Amazon Redshift RSQL ACTIVITYCOUNT variable.
  • The BTEQ MERGE statement is converted into an UPDATE followed by an INSERT statement. Currently, Amazon Redshift doesn’t support a native MERGE statement.
  • The BTEQ .LABEL and .GOTO statements are translated to their Amazon Redshift RSQL equivalents \LABEL and \GOTO.

Let’s look at the actual process of using AWS SCT to convert a BTEQ script.

After starting AWS SCT, you create a Teradata migration project and navigate to the BTEQ scripts node in the source tree window pane. Right-click and choose Load scripts.

Then select the folder that contains your BTEQ scripts. The folder appears in the source tree. Open it and navigate to the script you want to convert. In our case, the script is contained in the file merge.sql. Right-click on the file, choose Convert script, then choose Convert to RSQL. You can inspect the converted script in the bottom middle pane. When you’re ready to save the script to a file, do that from the target tree on the right side.

If you have many BTEQ scripts, you can convert an entire folder at once by selecting the folder instead of an individual file.

Convert shell scripts

Many applications run BTEQ commands from within shell scripts. For example, you may have a shell script that redirects log output and controls login credentials, as in the following:

bteq <<EOF >> ${LOG} 2>&1

.run file $LOGON;

SELECT COUNT(*) 
FROM testschema.product_stg 
HAVING COUNT(*) > 0;
…

EOF

If you use shell scripts to run BTEQ, we’re happy to share that AWS SCT can help you convert those scripts. AWS SCT supports bash scripts now, and we’ll add additional shell dialects in the future.

The process to convert shell scripts is very similar to BTEQ conversion. You select a folder that contains your scripts by navigating to the Shell node in the source tree and then choosing Load scripts.

After the folder is loaded, you can convert one (or more) scripts by selecting them and choosing Convert script.

As before, the converted script appears in the UI, and you can save it from the target tree on the right side of the page.

Conclusion

We’re happy to share Amazon Redshift RSQL and expect it to be a big hit with customers. If you’re contemplating a migration from Teradata to Amazon Redshift, Amazon Redshift RSQL and AWS SCT can simplify the conversion of your existing Teradata scripts and help preserve your investment in existing reports, applications, and ETL.

All of the features described in this post are available for you to use today. You can download Amazon Redshift RSQL and AWS SCT and give it a try.

We’ll be back soon with the next installment in this series. Check back for more information on automating your migrations from Teradata to Amazon Redshift. In the meantime, you can learn more about Amazon Redshift, Amazon Redshift RSQL, and AWS SCT. Happy migrating!


About the Authors

Michael Soo is a Senior Database Engineer with the AWS Database Migration Service team. He builds products and services that help customers migrate their database workloads to the AWS cloud.

Po Hong, PhD, is a Principal Data Architect of the Lake House Global Specialty Practice, AWS Professional Services. He is passionate about helping customers adopt innovative solutions to reduce time to insight. Po specializes in migrating large-scale MPP on-premises data warehouses to the AWS Lake House architecture.

Entong Shen is a Software Development Manager of Amazon Redshift. He has been working on MPP databases for over 9 years and has focused on query optimization, statistics, and migration-related SQL language features such as stored procedures and data types.

Adekunle Adedotun is a Sr. Database Engineer with the Amazon Redshift service. He has been working on MPP databases for 6 years with a focus on performance tuning. He also provides guidance to the development team for new and existing service features.

Asia Khytun is a Software Development Manager for the AWS Schema Conversion Tool. She has 10+ years of software development experience in C, C++, and Java.

Illia Kratsov is a Database Developer with the AWS Project Delta Migration team. He has 10+ years of experience in data warehouse development with Teradata and other MPP databases.

Accelerate your data warehouse migration to Amazon Redshift – Part 3

Post Syndicated from Michael Soo original https://aws.amazon.com/blogs/big-data/part-3-accelerate-your-data-warehouse-migration-to-amazon-redshift/

This is the third post in a multi-part series. We’re excited to share dozens of new features to automate your schema conversion; preserve your investment in existing scripts, reports, and applications; accelerate query performance; and reduce your overall cost to migrate to Amazon Redshift.

Check out the previous posts in the series:

Amazon Redshift is the leading cloud data warehouse. No other data warehouse makes it as easy to gain new insights from your data. With Amazon Redshift, you can query exabytes of data across your data warehouse, operational data stores, and data lake using standard SQL. You can also integrate other services such as Amazon EMR, Amazon Athena, and Amazon SageMaker to use all the analytic capabilities in the AWS Cloud.

Many customers have asked for help migrating from self-managed data warehouse engines, like Teradata, to Amazon Redshift. In these cases, you may have terabytes (or petabytes) of historical data, a heavy reliance on proprietary features, and thousands of extract, transform, and load (ETL) processes and reports built over years (or decades) of use.

Until now, migrating a Teradata data warehouse to AWS was complex and involved a significant amount of manual effort.

Today, we’re happy to share recent enhancements to Amazon Redshift and the AWS Schema Conversion Tool (AWS SCT) that make it easier to automate your Teradata to Amazon Redshift migrations.

In this post, we introduce new automation for merge statements, a native function to support ASCII character conversion, enhanced error checking for string to date conversion, enhanced support for Teradata cursors and identity columns, automation for ANY and SOME predicates, automation for RESET WHEN clauses, automation for two proprietary Teradata functions (TD_NORMALIZE_OVERLAP and TD_UNPIVOT), and automation to support analytic functions (QUANTILE and QUALIFY).

Merge statement

As its name implies, the merge statement takes an input set and merges it into a target table. If an input row already exists in the target table (a row in the target table has the same primary key value), then the target row is updated. If there is no matching target row, the input row is inserted into the table.

Until now, if you used merge statements in your workload, you were forced to manually rewrite the merge statement to run on Amazon Redshift. Now, we’re happy to share that AWS SCT automates this conversion for you. AWS SCT decomposes a merge statement into an update on existing records followed by an insert for new records.

Let’s look at an example. We create two tables in Teradata: a target table, employee, and a delta table, employee_delta, where we stage the input rows:

CREATE TABLE testschema.employee(
  id INTEGER
, name VARCHAR(20)
, manager INTEGER)
UNIQUE PRIMARY INDEX (id)
;

CREATE TABLE testschema.employee_delta (
  id INTEGER
, name VARCHAR(20)
, manager INTEGER)
UNIQUE PRIMARY INDEX(id)
;

Now we create a Teradata merge statement that updates a row if it exists in the target; otherwise, it inserts the new row. We embed this merge statement into a macro so we can show you the conversion process later.

REPLACE MACRO testschema.merge_employees AS (
  MERGE INTO testschema.employee tgt
  USING testschema.employee_delta delta
    ON delta.id = tgt.id
  WHEN MATCHED THEN
    UPDATE SET name = delta.name, manager = delta.manager
  WHEN NOT MATCHED THEN
    INSERT (delta.id, delta.name, delta.manager);
);

Now we use AWS SCT to convert the macro. (See Accelerate your data warehouse migration to Amazon Redshift – Part 1 for details on macro conversion.) AWS SCT creates a stored procedure that contains an update (to implement the WHEN MATCHED condition) and an insert (to implement the WHEN NOT MATCHED condition).

CREATE OR REPLACE PROCEDURE testschema.merge_employees()
AS $BODY$
BEGIN
    UPDATE testschema.employee
    SET name = "delta".name, manager = "delta".manager
    FROM testschema.employee_delta AS delta JOIN testschema.employee AS tgt
        ON "delta".id = tgt.id;
      
    INSERT INTO testschema.employee
    SELECT
      "delta".id
    , "delta".name
    , "delta".manager
    FROM testschema.employee_delta AS delta
    WHERE NOT EXISTS (
      SELECT 1
      FROM testschema.employee AS tgt
      WHERE "delta".id = tgt.id
    );
END;
$BODY$
LANGUAGE plpgsql;

This example showed how to use merge automation for macros, but you can convert merge statements in any application context: stored procedures, BTEQ scripts, Java code, and more. Download the latest version of AWS SCT and try it out.

ASCII() function

The ASCII function takes as input a string and returns the ASCII code, or more precisely, the UNICODE code point, of the first character in the string. Previously, Amazon Redshift supported ASCII as a leader-node only function, which prevented its use with user-defined tables.

We’re happy to share that the ASCII function is now available on Amazon Redshift compute nodes and can be used with user-defined tables. In the following code, we create a table with some string data:

CREATE TABLE testschema.char_table (
  id INTEGER
, char_col  CHAR(10)
, varchar_col VARCHAR(10)
);

INSERT INTO testschema.char_table VALUES (1, 'Hello', 'world');

Now you can use the ASCII function on the string columns:

# SELECT id, char_col, ascii(char_col), varchar_col, ascii(varchar_col) FROM testschema.char_table;

 id |  char_col  | ascii | varchar_col | ascii 
  1 | Hello      |    72 | world       |   119

Lastly, if your application code uses the ASCII function, AWS SCT automatically converts any such function calls to Amazon Redshift.

The ASCII feature is available now—try it out in your own cluster.

TO_DATE() function

The TO_DATE function converts a character string into a DATE value. A quirk of this function is that it can accept a string value that isn’t a valid date and translate it into a valid date.

For example, consider the string 2021-06-31. This isn’t a valid date because the month of June has only 30 days. However, the TO_DATE function accepts this string and returns the “31st” day of June (July 1):

# SELECT to_date('2021-06-31', 'YYYY-MM-DD');
 to_date 
 2021-07-01
(1 row)

Customers have asked for strict input checking for TO_DATE, and we’re happy to share this new capability. Now, you can include a Boolean value in the function call that turns on strict checking:

# SELECT to_date('2021-06-31', 'YYYY-MM-DD', TRUE);
ERROR: date/time field date value out of range: 2021-6-31

You can turn off strict checking explicitly as well:

# SELECT to_date('2021-06-31', 'YYYY-MM-DD', FALSE);
 to_date 
 2021-07-01
(1 row)

Also, the Boolean value is optional. If you don’t include it, strict checking is turned off, and you see the same behavior as before the feature was launched.

You can learn more about the TO_DATE function and try out strict date checking in Amazon Redshift now.

CURSOR result sets

A cursor is a programming language construct that applications use to manipulate a result set one row at a time. Cursors are more relevant for OLTP applications, but some legacy applications built on data warehouses also use them.

Teradata provides a diverse set of cursor configurations. Amazon Redshift supports a more streamlined set of cursor features.

Based on customer feedback, we’ve added automation to support Teradata WITH RETURN cursors. These types of cursors are opened within stored procedures and returned to the caller for processing of the result set. AWS SCT will convert a WITH RETURN cursor to an Amazon Redshift REFCURSOR.

For example, consider the following procedure, which contains a WITH RETURN cursor. The procedure opens the cursor and returns the result to the caller as a DYNAMIC RESULT SET:

REPLACE PROCEDURE testschema.employee_cursor (IN p_mgrid INTEGER) DYNAMIC RESULT SETS 1
BEGIN
   DECLARE result_set CURSOR WITH RETURN ONLY FOR 
     SELECT id, name, manager 
     FROM testschema.employee
     WHERE manager = to_char(p_mgrid); 
   OPEN result_set;
END;

AWS SCT converts the procedure as follows. An additional parameter is added to the procedure signature to pass the REFCURSOR:

CREATE OR REPLACE PROCEDURE testschema.employee_cursor(par_p_mgrid IN INTEGER, dynamic_return_cursor INOUT refcursor)
AS $BODY$
DECLARE
BEGIN
    OPEN dynamic_return_cursor FOR
    SELECT
        id, name, manager
        FROM testschema.employee
        WHERE manager = to_char(par_p_mgrid, '99999');
END;
$BODY$
LANGUAGE plpgsql;
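
To sketch how a caller might consume the converted procedure on Amazon Redshift, the following example opens the cursor inside a transaction and fetches the result set; the cursor name employee_rs and the manager ID are arbitrary values chosen for illustration:

BEGIN;
CALL testschema.employee_cursor(201, 'employee_rs');
FETCH ALL FROM employee_rs;
COMMIT;

Note that FETCH ALL isn’t supported on single-node clusters; in that case, fetch rows incrementally with FETCH NEXT FROM employee_rs; instead.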

IDENTITY columns

Teradata supports several non-ANSI compliant features for IDENTITY columns. We have enhanced AWS SCT to automatically convert these features to Amazon Redshift, whenever possible.

Specifically, AWS SCT now converts the Teradata START WITH and INCREMENT BY clauses to the Amazon Redshift SEED and STEP clauses, respectively. For example, consider the following Teradata table:

CREATE TABLE testschema.identity_table (
  a2 BIGINT GENERATED ALWAYS AS IDENTITY (
    START WITH 1 
    INCREMENT BY 20
  )
);

The GENERATED ALWAYS clause indicates that the column is always populated automatically—a value can’t be explicitly inserted or updated into the column. The START WITH clause defines the first value to be inserted into the column, and the INCREMENT BY clause defines the next value to insert into the column.

When you convert this table using AWS SCT, the following Amazon Redshift DDL is produced. Notice that the START WITH and INCREMENT BY values are preserved in the target syntax:

CREATE TABLE IF NOT EXISTS testschema.identity_table (
  a2 BIGINT IDENTITY(1, 20)
)
DISTSTYLE KEY
DISTKEY
(a2)
SORTKEY
(a2);

Also, by default, an IDENTITY column in Amazon Redshift only contains auto-generated values, so that the GENERATED ALWAYS property in Teradata is preserved:

# INSERT INTO testschema.identity_table VALUES (100);
ERROR:  cannot set an identity column to a value

IDENTITY columns in Teradata can also be specified as GENERATED BY DEFAULT. In this case, a value can be explicitly defined in an INSERT statement. If no value is specified, the column is filled with an auto-generated value like normal. Before, AWS SCT didn’t support conversion for GENERATED BY DEFAULT columns. Now, we’re happy to share that AWS SCT automatically converts such columns for you.

For example, the following table contains an IDENTITY column that is GENERATED BY DEFAULT:

CREATE TABLE testschema.identity_by_default (
  a1 BIGINT GENERATED BY DEFAULT AS IDENTITY (
     START WITH 1 
     INCREMENT BY 20 
  )
)
PRIMARY INDEX (a1);

The IDENTITY column is converted by AWS SCT as follows. The converted column uses the Amazon Redshift GENERATED BY DEFAULT clause:

CREATE TABLE testschema.identity_by_default (
  a1 BIGINT GENERATED BY DEFAULT AS IDENTITY(1,20) DISTKEY
)                                                          
 DISTSTYLE KEY                                               
 SORTKEY (a1);

There is one additional syntax issue that requires attention. In Teradata, an auto-generated value is inserted when NULL is specified for the column value:

INSERT INTO identity_by_default VALUES (null);

Amazon Redshift uses a different syntax for the same purpose. Here, you include the keyword DEFAULT in the values list to indicate that the column should be auto-generated:

INSERT INTO testschema.identity_by_default VALUES (default);

We’re happy to share that AWS SCT automatically converts the Teradata syntax for INSERT statements like the preceding example. For example, consider the following Teradata macro:

REPLACE MACRO testschema.insert_identity_by_default AS (
  INSERT INTO testschema.identity_by_default VALUES (NULL);
);

AWS SCT removes the NULL and replaces it with DEFAULT:

CREATE OR REPLACE PROCEDURE testschema.insert_identity_by_default() LANGUAGE plpgsql
AS $$                                                              
BEGIN                                                              
  INSERT INTO testschema.identity_by_default VALUES (DEFAULT);
END;                                                               
$$                                                                 

IDENTITY column automation is available now in AWS SCT. You can download the latest version and try it out.

ANY and SOME filters with inequality predicates

The ANY and SOME filters determine if a predicate applies to one or more values in a list. For example, in Teradata, you can use <> ANY to find all employees who don’t work for a certain manager:

REPLACE MACRO testschema.not_in_103 AS (
  SELECT *
  FROM testschema.employee 
  WHERE manager <> ANY (103)
;
);

Of course, you can rewrite this query using a simple not equal filter, but you often see queries from third-party SQL generators that follow this pattern.

Amazon Redshift doesn’t support this syntax natively. Before, any queries using this syntax had to be manually converted. Now, we’re happy to share that AWS SCT automatically converts ANY and SOME clauses with inequality predicates. The macro above is converted to a stored procedure as follows.

CREATE OR REPLACE PROCEDURE testschema.not_in_103(macro_out INOUT refcursor)
AS $BODY$
BEGIN
    OPEN macro_out FOR
    SELECT *
    FROM testschema.employee
    WHERE ((manager <> 103));
END;
$BODY$
LANGUAGE plpgsql;

If the values list following the ANY contains two or more values, AWS SCT will convert this to a series of OR conditions, one for each element in the list.
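
As a hypothetical illustration (not taken from the original macros), a Teradata filter such as:

WHERE manager <> ANY (103, 104)

would be rewritten in the converted stored procedure as a chain of OR conditions along the lines of:

WHERE ((manager <> 103) OR (manager <> 104));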

ANY/SOME filter conversion is available now in AWS SCT. You can try it out in the latest version of the application.

Analytic functions with RESET WHEN

RESET WHEN is a Teradata feature used in SQL analytical window functions. It’s an extension to the ANSI SQL standard. RESET WHEN determines the partition over which a SQL window function operates based on a specified condition. If the condition evaluates to true, a new dynamic sub-partition is created inside the existing window partition.

For example, the following view uses RESET WHEN to compute a running total by store. The running total accumulates as long as sales increase month over month. If sales drop from one month to the next, the running total resets.

CREATE TABLE testschema.sales (
  store_id INTEGER
, month_no INTEGER
, sales_amount DECIMAL(9,2)
)
;

REPLACE VIEW testschema.running_total (
  store_id
, month_no
, sales_amount
, cume_sales_amount
)
AS
SELECT 
  store_id
, month_no
, sales_amount
, SUM(sales_amount) OVER (
     PARTITION BY store_id 
     ORDER BY month_no
     RESET WHEN sales_amount < SUM(sales_amount) OVER (
       PARTITION BY store_id
       ORDER BY month_no
       ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING
     )
     ROWS UNBOUNDED PRECEDING 
  )
FROM testschema.sales;

To demonstrate, we insert some test data into the table:

INSERT INTO testschema.sales VALUES (1001, 1, 35000.00);
INSERT INTO testschema.sales VALUES (1001, 2, 40000.00);
INSERT INTO testschema.sales VALUES (1001, 3, 45000.00);
INSERT INTO testschema.sales VALUES (1001, 4, 25000.00);
INSERT INTO testschema.sales VALUES (1001, 5, 30000.00);
INSERT INTO testschema.sales VALUES (1001, 6, 30000.00);
INSERT INTO testschema.sales VALUES (1001, 7, 50000.00);
INSERT INTO testschema.sales VALUES (1001, 8, 35000.00);
INSERT INTO testschema.sales VALUES (1001, 9, 60000.00);
INSERT INTO testschema.sales VALUES (1001, 10, 80000.00);
INSERT INTO testschema.sales VALUES (1001, 11, 90000.00);
INSERT INTO testschema.sales VALUES (1001, 12, 100000.00);

The sales amounts drop after months 3 and 7. The running total is reset accordingly at months 4 and 8.

SELECT * FROM testschema.running_total;

   store_id     month_no  sales_amount  cume_sales_amount
-----------  -----------  ------------  -----------------
       1001            1      35000.00           35000.00
       1001            2      40000.00           75000.00
       1001            3      45000.00          120000.00
       1001            4      25000.00           25000.00
       1001            5      30000.00           55000.00
       1001            6      30000.00           85000.00
       1001            7      50000.00          135000.00
       1001            8      35000.00           35000.00
       1001            9      60000.00           95000.00
       1001           10      80000.00          175000.00
       1001           11      90000.00          265000.00
       1001           12     100000.00          365000.00

AWS SCT converts the view as follows. The converted code uses a subquery to emulate the RESET WHEN. Essentially, a marker attribute is added to the result that flags a month over month sales drop. The flag is then used to determine the longest preceding run of increasing sales to aggregate.

CREATE OR REPLACE VIEW testschema.running_total (
  store_id
, month_no
, sales_amount
, cume_sales_amount) AS
SELECT
  store_id
, month_no
, sales_amount
, sum(sales_amount) OVER (
    PARTITION BY k1, store_id
    ORDER BY month_no NULLS FIRST
    ROWS UNBOUNDED PRECEDING)
FROM (
  SELECT
   store_id
 , month_no
 , sales_amount
 , SUM(CASE WHEN k = 1 THEN 0 ELSE 1 END) OVER (
     PARTITION BY store_id
     ORDER BY month_no NULLS FIRST
     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS k1
 FROM (
   SELECT
     store_id
   , month_no
   , sales_amount
   , CASE WHEN sales_amount < SUM(sales_amount) OVER (
        PARTITION BY store_id
        ORDER BY month_no
        ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
      OR sales_amount IS NULL THEN 0 ELSE 1 END AS k
   FROM testschema.sales
  )
);

We expect that RESET WHEN conversion will be a big hit with customers. You can try it now in AWS SCT.

TD_NORMALIZE_OVERLAP() function

The TD_NORMALIZE_OVERLAP function combines rows that have overlapping PERIOD values. The resulting normalized row contains the earliest starting bound and the latest ending bound from the PERIOD values of all the rows involved.

For example, we create a Teradata table that records employee salaries with the following code. Each row in the table is timestamped with the period that the employee was paid the given salary.

CREATE TABLE testschema.salaries (
  emp_id INTEGER
, salary DECIMAL(8,2)
, from_to PERIOD(DATE)
);

Now we add data for two employees. For emp_id = 1 and salary = 2000, there are two overlapping rows. Similarly, the two rows with emp_id = 2 and salary = 3000 are overlapping.

SELECT * FROM testschema.salaries ORDER BY emp_id, from_to;

     emp_id      salary  from_to
-----------  ----------  ------------------------
          1     1000.00  ('20/01/01', '20/05/31')
          1     2000.00  ('20/06/01', '21/02/28')
          1     2000.00  ('21/01/01', '21/06/30')
          2     3000.00  ('20/01/01', '20/03/31')
          2     3000.00  ('20/02/01', '20/04/30')

Now we create a view that uses the TD_NORMALIZE_OVERLAP function to normalize the overlapping data:

REPLACE VIEW testschema.normalize_salaries AS 
WITH sub_table(emp_id, salary, from_to) AS (
  SELECT 
    emp_id
  , salary
  , from_to
  FROM testschema.salaries
)
SELECT *
FROM 
  TABLE(TD_SYSFNLIB.TD_NORMALIZE_OVERLAP (NEW VARIANT_TYPE(sub_table.emp_id, sub_table.salary), sub_table.from_to)
    RETURNS (emp_id INTEGER, salary DECIMAL(8,2), from_to PERIOD(DATE))
    HASH BY emp_id
    LOCAL ORDER BY emp_id, salary, from_to
  ) AS DT(emp_id, salary, duration)
;

We can check that the view data is actually normalized:

select * from testschema.normalize_salaries order by emp_id, duration;

     emp_id      salary  duration
-----------  ----------  ------------------------
          1     1000.00  ('20/01/01', '20/05/31')
          1     2000.00  ('20/06/01', '21/06/30')
          2     3000.00  ('20/01/01', '20/04/30')

You can now use AWS SCT to convert any TD_NORMALIZE_OVERLAP statements. We first convert the salaries table to Amazon Redshift (see Accelerate your data warehouse migration to Amazon Redshift – Part 2 for details about period data type automation):

CREATE TABLE testschema.salaries (
  emp_id integer distkey
, salary numeric(8,2) ENCODE az64
, from_to_begin date ENCODE az64
, from_to_end date ENCODE az64    
)                                   
DISTSTYLE KEY                       
SORTKEY (emp_id);

# SELECT * FROM testschema.salaries ORDER BY emp_id, from_to_begin;
 emp_id | salary  | from_to_begin | from_to_end 
      1 | 1000.00 | 2020-01-01    | 2020-05-31
      1 | 2000.00 | 2020-06-01    | 2021-02-28
      1 | 2000.00 | 2021-01-01    | 2021-06-30
      2 | 3000.00 | 2020-01-01    | 2020-03-31
      2 | 3000.00 | 2020-02-01    | 2020-04-30

Now we use AWS SCT to convert the normalize_salaries view. AWS SCT adds a column that marks the start of a new group of rows. It then produces a single row for each group with a normalized timestamp.

CREATE VIEW testschema.normalize_salaries (emp_id, salary, from_to_begin, from_to_end) AS
WITH sub_table AS (
  SELECT
    emp_id
  , salary
  , from_to_begin AS start_date
  , from_to_end AS end_date
  , CASE
      WHEN start_date <= lag(end_date) OVER (PARTITION BY emp_id, salary ORDER BY start_date, end_date) THEN 0 
      ELSE 1
    END AS GroupStartFlag
    FROM testschema.salaries
  )
SELECT
  t2.emp_id
, t2.salary
, min(t2.start_date) AS from_to_begin
, max(t2.end_date) AS from_to_end
FROM (
  SELECT
    emp_id
  , salary
  , start_date
  , end_date
  , sum(GroupStartFlag) OVER (PARTITION BY emp_id, salary ORDER BY start_date ROWS UNBOUNDED PRECEDING) AS GroupID
  FROM 
    sub_table
) AS t2
GROUP BY 
  t2.emp_id
, t2.salary
, t2.GroupID;

We can check that the converted view returns the correctly normalized data:

# SELECT * FROM testschema.normalize_salaries ORDER BY emp_id;
 emp_id | salary  | from_to_begin | from_to_end 
      1 | 1000.00 | 2020-01-01    | 2020-05-31
      1 | 2000.00 | 2020-06-01    | 2021-06-30
      2 | 3000.00 | 2020-01-01    | 2020-04-30

You can try out TD_NORMALIZE_OVERLAP conversion in the latest release of AWS SCT. Download it now.

TD_UNPIVOT() function

The TD_UNPIVOT function transforms columns into rows. Essentially, we use it to take a row of similar metrics over different time periods and create a separate row for each metric.

For example, consider the following Teradata table. The table records customer visits by year and month for small kiosk stores:

CREATE TABLE TESTSCHEMA.kiosk_monthly_visits (
  kiosk_id INTEGER
, year_no INTEGER
, jan_visits INTEGER
, feb_visits INTEGER
, mar_visits INTEGER
, apr_visits INTEGER
, may_visits INTEGER
, jun_visits INTEGER
, jul_visits INTEGER
, aug_visits INTEGER
, sep_visits INTEGER
, oct_visits INTEGER
, nov_visits INTEGER
, dec_visits INTEGER)
PRIMARY INDEX (kiosk_id);

We insert some sample data into the table:

INSERT INTO testschema.kiosk_monthly_visits VALUES (100, 2020, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200);

Next, we create a view that unpivots the table so that the monthly visits appear on separate rows. The single row in the pivoted table creates 12 rows in the unpivoted table, one row per month.

REPLACE VIEW testschema.unpivot_kiosk_monthly_visits (
  kiosk_id
, year_no
, month_name
, month_visits
)
AS
SELECT 
  kiosk_id
, year_no
, month_name (FORMAT 'X(10)')
, month_visits
FROM TD_UNPIVOT (
 ON (SELECT * FROM testschema.kiosk_monthly_visits)
 USING
 VALUE_COLUMNS ('month_visits')
 UNPIVOT_COLUMN('month_name')
 COLUMN_LIST(
   'jan_visits'
 , 'feb_visits'
 , 'mar_visits'
 , 'apr_visits'
 , 'may_visits'
 , 'jun_visits'
 , 'jul_visits'
 , 'aug_visits'
 , 'sep_visits'
 , 'oct_visits'
 , 'nov_visits'
 , 'dec_visits'
 )
 COLUMN_ALIAS_LIST (
   'jan'
 , 'feb'
 , 'mar'
 , 'apr'
 , 'may'
 , 'jun'
 , 'jul'
 , 'aug'
 , 'sep'
 , 'oct'
 , 'nov'
 , 'dec'
 )
) a;

When you select from the view, the monthly visits are unpivoted into 12 separate rows:

SELECT * FROM testschema.unpivot_kiosk_monthly_visits;

   kiosk_id      year_no  month_name  month_visits
-----------  -----------  ----------  ------------
        100         2020  jan                 1100
        100         2020  feb                 1200
        100         2020  mar                 1300
        100         2020  apr                 1400
        100         2020  may                 1500
        100         2020  jun                 1600
        100         2020  jul                 1700
        100         2020  aug                 1800
        100         2020  sep                 1900
        100         2020  oct                 2000
        100         2020  nov                 2100
        100         2020  dec                 2200

Now we use AWS SCT to convert the view into ANSI SQL that can be run on Amazon Redshift. The conversion creates a common table expression (CTE) to place each month in a separate row. It then joins the CTE and the remaining attributes from the original pivoted table.

CREATE OR REPLACE VIEW testschema.unpivot_kiosk_monthly_visits (kiosk_id, year_no, month_name, month_visits) AS
WITH cols
AS (SELECT
    'jan' AS col
UNION ALL
SELECT
    'feb' AS col
UNION ALL
SELECT
    'mar' AS col
UNION ALL
SELECT
    'apr' AS col
UNION ALL
SELECT
    'may' AS col
UNION ALL
SELECT
    'jun' AS col
UNION ALL
SELECT
    'jul' AS col
UNION ALL
SELECT
    'aug' AS col
UNION ALL
SELECT
    'sep' AS col
UNION ALL
SELECT
    'oct' AS col
UNION ALL
SELECT
    'nov' AS col
UNION ALL
SELECT
    'dec' AS col)
SELECT
    t1.kiosk_id, t1.year_no, col AS "month_name",
    CASE col
        WHEN 'jan' THEN "jan_visits"
        WHEN 'feb' THEN "feb_visits"
        WHEN 'mar' THEN "mar_visits"
        WHEN 'apr' THEN "apr_visits"
        WHEN 'may' THEN "may_visits"
        WHEN 'jun' THEN "jun_visits"
        WHEN 'jul' THEN "jul_visits"
        WHEN 'aug' THEN "aug_visits"
        WHEN 'sep' THEN "sep_visits"
        WHEN 'oct' THEN "oct_visits"
        WHEN 'nov' THEN "nov_visits"
        WHEN 'dec' THEN "dec_visits"
        ELSE NULL
    END AS "month_visits"
    FROM testschema.kiosk_monthly_visits AS t1
    CROSS JOIN cols
    WHERE month_visits IS NOT NULL;

You can check that the converted view produces the same result as the Teradata version:

# SELECT * FROM testschema.unpivot_kiosk_monthly_visits;
 kiosk_id | year_no | month_name | month_visits 
      100 |    2020 | oct        |        2000
      100 |    2020 | nov        |        2100
      100 |    2020 | jul        |        1700
      100 |    2020 | feb        |        1200
      100 |    2020 | apr        |        1400
      100 |    2020 | aug        |        1800
      100 |    2020 | sep        |        1900
      100 |    2020 | jan        |        1100
      100 |    2020 | mar        |        1300
      100 |    2020 | may        |        1500
      100 |    2020 | jun        |        1600
      100 |    2020 | dec        |        2200

You can try out the conversion support for TD_UNPIVOT in the latest version of AWS SCT.

QUANTILE function

QUANTILE is a ranking function. It partitions the input set into a specified number of groups, each containing an equal portion of the total population. QUANTILE is a proprietary Teradata extension of the NTILE function found in ANSI SQL.

For example, we can compute the quartiles of the monthly visit data using the following Teradata view:

REPLACE VIEW testschema.monthly_visit_rank AS
SELECT
  kiosk_id
, year_no
, month_name
, month_visits
, QUANTILE(4, month_visits) qtile
FROM
 testschema.unpivot_kiosk_monthly_visits
;

When you select from the view, the QUANTILE function computes the quartile and applies it as an attribute on the output:

SELECT * FROM monthly_visit_rank;

   kiosk_id      year_no  month_name  month_visits        qtile
-----------  -----------  ----------  ------------  -----------
        100         2020  jan                 1100            0
        100         2020  feb                 1200            0
        100         2020  mar                 1300            0
        100         2020  apr                 1400            1
        100         2020  may                 1500            1
        100         2020  jun                 1600            1
        100         2020  jul                 1700            2
        100         2020  aug                 1800            2
        100         2020  sep                 1900            2
        100         2020  oct                 2000            3
        100         2020  nov                 2100            3
        100         2020  dec                 2200            3

Amazon Redshift supports a generalized NTILE function, which can implement QUANTILE, and is ANSI-compliant. We’ve enhanced AWS SCT to automatically convert QUANTILE function calls to equivalent NTILE function calls.
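
The general rewrite pattern (our summary, not literal AWS SCT output, and subject to differences in how the two functions break ties) is:

-- Teradata
QUANTILE(n, sort_expression)

-- Equivalent Amazon Redshift expression
NTILE(n) OVER (ORDER BY sort_expression) - 1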

For example, when you convert the preceding Teradata view, AWS SCT produces the following Amazon Redshift code:

SELECT 
  unpivot_kiosk_monthly_visits.kiosk_id
, unpivot_kiosk_monthly_visits.year_no
, unpivot_kiosk_monthly_visits.month_name
, unpivot_kiosk_monthly_visits.month_visits
, (ntile(4) OVER (ORDER BY unpivot_kiosk_monthly_visits.month_visits ASC NULLS FIRST) - 1) AS qtile
FROM 
  testschema.unpivot_kiosk_monthly_visits
;

QUANTILE conversion support is available now in AWS SCT.

QUALIFY filter

The QUALIFY clause in Teradata filters rows produced by an analytic function. Let’s look at an example. We use the following table, which contains store revenue by month. Our goal is to find the top five months by revenue:

CREATE TABLE testschema.sales (
  store_id INTEGER
, month_no INTEGER
, sales_amount DECIMAL(9,2))
PRIMARY INDEX (store_id);
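
The post doesn't show how the table is loaded; a sketch of INSERT statements matching the output below would be:

INSERT INTO testschema.sales VALUES (1001,  1,  35000.00);
INSERT INTO testschema.sales VALUES (1001,  2,  40000.00);
INSERT INTO testschema.sales VALUES (1001,  3,  45000.00);
INSERT INTO testschema.sales VALUES (1001,  4,  25000.00);
INSERT INTO testschema.sales VALUES (1001,  5,  30000.00);
INSERT INTO testschema.sales VALUES (1001,  6,  30000.00);
INSERT INTO testschema.sales VALUES (1001,  7,  50000.00);
INSERT INTO testschema.sales VALUES (1001,  8,  35000.00);
INSERT INTO testschema.sales VALUES (1001,  9,  60000.00);
INSERT INTO testschema.sales VALUES (1001, 10,  80000.00);
INSERT INTO testschema.sales VALUES (1001, 11,  90000.00);
INSERT INTO testschema.sales VALUES (1001, 12, 100000.00);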


SELECT * FROM sales;

   store_id     month_no  sales_amount
-----------  -----------  ------------
       1001            1      35000.00
       1001            2      40000.00
       1001            3      45000.00
       1001            4      25000.00
       1001            5      30000.00
       1001            6      30000.00
       1001            7      50000.00
       1001            8      35000.00
       1001            9      60000.00
       1001           10      80000.00
       1001           11      90000.00
       1001           12     100000.00

The data shows that July, September, October, November, and December were the top five sales months.

We create a view that uses the RANK function to rank each month by sales, then use the QUALIFY clause to select the top five months:

REPLACE VIEW testschema.top_five_months(
  store_id
, month_no
, sales_amount
, month_rank
) as
SELECT
  store_id
, month_no
, sales_amount
, RANK() OVER (PARTITION BY store_id ORDER BY sales_amount DESC) month_rank
FROM
  testschema.sales
QUALIFY RANK() OVER (PARTITION by store_id ORDER BY sales_amount DESC) <= 5
;

Before, if you used the QUALIFY clause, you had to manually recode your SQL statements. Now, AWS SCT automatically converts QUALIFY into Amazon Redshift-compatible, ANSI-compliant SQL. For example, AWS SCT rewrites the preceding view as follows:

CREATE OR REPLACE VIEW testschema.top_five_months (
  store_id
, month_no
, sales_amount
, month_rank) AS
SELECT
  qualify_subquery.store_id
, qualify_subquery.month_no
, qualify_subquery.sales_amount
, month_rank
FROM (
  SELECT
    store_id
  , month_no
  , sales_amount
  , rank() OVER (PARTITION BY store_id ORDER BY sales_amount DESC NULLS FIRST) AS month_rank
  , rank() OVER (PARTITION BY store_id ORDER BY sales_amount DESC NULLS FIRST) AS qualify_expression_1
  FROM testschema.sales) AS qualify_subquery
  WHERE 
    qualify_expression_1 <= 5;

AWS SCT converts the original query into a subquery, and applies the QUALIFY expression as a filter on the subquery. AWS SCT adds an extra column to the subquery for the purpose of filtering. This isn't strictly necessary, but it simplifies the conversion when the QUALIFY expression isn't already exposed through a column alias.

You can try QUALIFY conversion in the latest version of AWS SCT.

Summary

We’re happy to share these new features with you. If you’re contemplating a migration to Amazon Redshift, these capabilities can help automate your schema conversion and preserve your investment in existing reports and applications. If you’re looking to get started on a data warehouse migration, you can learn more about Amazon Redshift and AWS SCT from our public documentation.

This post described a few of the dozens of new features we’re introducing to automate your Teradata migrations to Amazon Redshift. We’ll share more in upcoming posts about automation for proprietary Teradata features and other exciting new capabilities.

Check back soon for more information. Until then, you can learn more about Amazon Redshift and the AWS Schema Conversion Tool. Happy migrating!


About the Authors

Michael Soo is a Senior Database Engineer with the AWS Database Migration Service team. He builds products and services that help customers migrate their database workloads to the AWS cloud.

Raza Hafeez is a Data Architect within the Lake House Global Specialty Practice of AWS Professional Services. He has over 10 years of professional experience building and optimizing enterprise data warehouses and is passionate about enabling customers to realize the power of their data. He specializes in migrating enterprise data warehouses to AWS Lake House Architecture.

Po Hong, PhD, is a Principal Data Architect in the Lake House Global Specialty Practice of AWS Professional Services. He is passionate about supporting customers to adopt innovative solutions to reduce time to insight. Po specializes in migrating large-scale MPP on-premises data warehouses to the AWS Lake House architecture.

Entong Shen is a Software Development Manager of Amazon Redshift. He has been working on MPP databases for over 9 years and has focused on query optimization, statistics and migration related SQL language features such as stored procedures and data types.

Sumit Singh is a database engineer with the Database Migration Service team at Amazon Web Services. He works closely with customers and provides technical assistance to migrate their on-premises workloads to the AWS Cloud. He also assists in continuously improving the quality and functionality of AWS data migration products.

Nelly Susanto is a Senior Database Migration Specialist of AWS Database Migration Accelerator. She has over 10 years of technical background focusing on migrating and replicating databases along with data-warehouse workloads. She is passionate about helping customers in their cloud journey.

Getting started with testing serverless applications

Post Syndicated from Talia Nassi original https://aws.amazon.com/blogs/compute/getting-started-with-testing-serverless-applications/

Testing is an essential step in the software development lifecycle. Through the different types of tests, you validate user experience, performance, and detect bugs in your code. Features should not be considered done until all of the corresponding tests are written.

The distributed nature of serverless architectures separates your application logic from other concerns like state management, request routing, workflow orchestration, and queue polling.

In this post, I cover the three main types of testing developers do when building applications. I also go through what changes and what stays the same when building serverless applications with AWS Lambda, in addition to the challenges of testing serverless applications.

The challenges of testing serverless applications

Firstly, to fully test code that relies on managed services, you would need to emulate the cloud environment on your local machine, which is usually not practical.

Secondly, using many managed services for event-driven architecture means you must also account for external resources like queues, database tables, and event buses. This means you write more integration tests than unit tests, altering the standard testing pyramid. Building more integration tests can impact the maintenance of your tests or slow your testing speed.

Lastly, with synchronous workloads, such as a traditional web service, you make a request and assert on the response. The test doesn’t need to do anything special because the thread is blocked until the response returns.

However, in the case of event-driven architectures, state changes are driven by events flowing from one resource to another. Your tests must detect side effects in downstream components and these might not be immediate. This means that the tests must tolerate asynchronous behaviors, which can make for more complicated and slower-running tests.

Unit testing

Unit tests validate individual units of your code, independent from any other components. Unit tests check the smallest unit of functionality and should only have one reason to fail – the unit is not correctly implemented.

Unit tests generally cover the smallest units of functionality although the size of each unit can vary. For example, a number of functions may provide a coherent piece of behavior and you may want to test them as a single unit. In this case, your unit test might call an entry-point function that invokes several others to do its job. You test these functions together as a single unit.
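
As a concrete sketch (the function and field names below are hypothetical, not from the original post), business logic that reshapes a DynamoDB stream record can be unit tested with no AWS dependencies at all:

def format_visit_record(record):
    """Business logic under test: flatten a DynamoDB stream record."""
    new_image = record["dynamodb"]["NewImage"]
    return {
        "kiosk_id": int(new_image["kiosk_id"]["N"]),
        "visits": int(new_image["visits"]["N"]),
    }


def test_format_visit_record():
    # Unit test: no cloud resources, only the pure function
    record = {
        "dynamodb": {
            "NewImage": {
                "kiosk_id": {"N": "100"},
                "visits": {"N": "1100"},
            }
        }
    }
    assert format_visit_record(record) == {"kiosk_id": 100, "visits": 1100}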

Integration Testing

One good practice to test how services interact with each other is to write integration tests that mock the behavior of your services in the cloud.

The point of integration tests is to make sure that two components of your application work together properly. Integration tests are important in serverless applications because they rely heavily on integrations of different services. Unless you are testing in production, the most efficient way to run automated integration tests is to emulate your services in the cloud.

This can be done with tools like moto. Moto mocks calls to AWS automatically without requiring any other dependencies. Another useful tool is localstack. Localstack allows you to mock certain AWS service APIs on your local machine that you can use for testing the integration of two or more services.
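
For example, code that writes to Amazon S3 can be tested against moto instead of the real service. This is a sketch: the bucket name and helper function are made up, and it assumes a moto release that provides the per-service mock_s3 decorator:

import boto3
from moto import mock_s3


def write_report(bucket, key, rows):
    """Hypothetical code under test: writes a small CSV report to S3."""
    body = "\n".join(",".join(map(str, r)) for r in rows).encode()
    boto3.client("s3", region_name="us-east-1").put_object(Bucket=bucket, Key=key, Body=body)


@mock_s3
def test_write_report():
    # moto intercepts the boto3 calls, so no real AWS resources are touched
    s3 = boto3.client("s3", region_name="us-east-1")
    s3.create_bucket(Bucket="example-reports")

    write_report("example-reports", "daily.csv", [("kiosk_id", "visits"), (100, 1100)])

    body = s3.get_object(Bucket="example-reports", Key="daily.csv")["Body"].read()
    assert b"100,1100" in body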

You can also configure test events and manually test directly from the Lambda console. Remember that when you test a Lambda function, you are not only testing the business logic. You must also mock its payload and invoke the function with it. There are over 200 event sources that can trigger Lambda functions. Each service has its own unique event format that contains data about the resource or request that invoked the function. Find the full list of test events in the AWS documentation.

To configure a test event for AWS Lambda:

  1. Navigate to the Lambda console and choose the function. From the Test dropdown, choose Configure Test Event.
  2. Choose Create a new Test Event and select the template for the service you want to act as the trigger for your Lambda function. In this example, you choose Amazon DynamoDB Update.
  3. Save the test event and choose Test in the Code source section. Each user can create up to 10 test events per function. Those test events are private to you. Lambda runs the function on your behalf. The function handler receives and then processes the sample event.
  4. The Execution result shows the execution status as succeeded.

End-to-end testing

When testing your serverless applications end-to-end, it’s important to understand your user and data flows. The most important business-critical flows of your software are what should be tested end-to-end in your UI.

From a business perspective, these should be the most valuable user and data flows that occur in your product. Another resource to utilize is data from your customers. From your analytics platform, find the actions that users are doing the most in production.

End-to-end tests should be running in your build pipeline and act as blockers if one of them fails. They should also be updated as new features are added to your product.

The testing pyramid

Figure: the standard testing pyramid (left) and the hexagonal serverless testing model (right)

The standard testing pyramid above on the left indicates that systems should have more unit tests than any other type of test, then a medium number of integration tests, and the least number of end-to-end tests.

However, when testing serverless applications, this standard shifts to a hexagonal structure on the right because it’s mostly made up of two or more AWS services talking to each other. You can mock out those integrations with tools such as moto or localstack.

Add automated tests to your CI/CD pipeline

As serverless applications scale, having automated tests is essential in getting fast feedback on the current state of your product. It is not scalable to test everything manually, so investing in an automation tool to run your tests is essential.

All of the tests in your build pipeline, including unit, integration, and end-to-end tests should be blocking in your CI/CD pipeline. This means if one of them fails, it should block the promotion of that code into production. And remember – there’s no such thing as a flaky test. Either the test does what it’s supposed to do, or it doesn’t.

Narrowly scope your tests

Testing asynchronous processes can be tricky. Not only must you monitor different parts of your system, you also need to know when to stop waiting and end the test. When there are multiple asynchronous steps, the delays add up to a longer-running test. It’s also more difficult to estimate how long you should wait before ending the test. There are two approaches to mitigate these issues.

Firstly, write separate, more narrowly-scoped tests over each asynchronous step. This limits the possible causes of asynchronous test failure you need to investigate. Also, with fewer asynchronous steps, these tests will run quicker and it will be easier to estimate how long to wait before timing out.

Secondly, verify as much of your system as possible using synchronous tests. Then, you only need asynchronous tests to verify residual concerns that aren’t already covered. Synchronous tests are also easier to diagnose when they fail, so you want to catch as many issues with them as possible before running your asynchronous tests.
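
A generic way to keep an asynchronous assertion narrowly scoped (not from the original post) is a small polling helper that retries a check until an explicit timeout expires:

import time


def eventually(check, timeout=30, interval=2):
    """Poll `check` until it returns a truthy value or the timeout expires."""
    deadline = time.time() + timeout
    last_error = None
    while time.time() < deadline:
        try:
            result = check()
            if result:
                return result
        except AssertionError as err:
            last_error = err
        time.sleep(interval)
    raise AssertionError(f"condition not met within {timeout}s: {last_error}")


# Usage: wait for a downstream side effect, e.g. an item landing in a table
# eventually(lambda: table.get_item(Key={"pk": "order-123"}).get("Item"))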

Conclusion

In this blog post, you learn the three types of testing – unit testing, integration testing, and end-to-end testing. Then you learn how to configure test events with Lambda. I then cover the shift from the standard testing pyramid to the hexagonal testing pyramid for serverless, and why more integration tests are necessary. Then you learn a few best practices to keep in mind for getting started with testing your serverless applications.

For more information on serverless, head to Serverless Land.

What to Consider when Selecting a Region for your Workloads

Post Syndicated from Saud Albazei original https://aws.amazon.com/blogs/architecture/what-to-consider-when-selecting-a-region-for-your-workloads/

The AWS Cloud is an ever-growing network of Regions and points of presence (PoP), with a global network infrastructure that connects them together. With such a vast selection of Regions, costs, and services available, it can be challenging for startups to select the optimal Region for a workload. This decision must be made carefully, as it has a major impact on compliance, cost, performance, and services available for your workloads.

Evaluating Regions for deployment

There are four main factors that play into evaluating each AWS Region for a workload deployment:

  1. Compliance. If your workload contains data that is bound by local regulations, then selecting the Region that complies with the regulation overrides other evaluation factors. This applies to workloads that are bound by data residency laws where choosing an AWS Region located in that country is mandatory.
  2. Latency. A major factor to consider for user experience is latency. Reduced network latency can have a substantial impact on the user experience. Choosing an AWS Region in close proximity to your user base can lower network latency. It can also increase communication quality, given that network packets have fewer exchange points to travel through.
  3. Cost. AWS services are priced differently from one Region to another. Some Regions have lower cost than others, which can result in a cost reduction for the same deployment.
  4. Services and features. Newer services and features are deployed to Regions gradually. Although all AWS Regions have the same service level agreement (SLA), some larger Regions are usually first to offer newer services, features, and software releases. Smaller Regions may not get these services or features in time for you to use them to support your workload.

Evaluating all these factors can make coming to a decision complicated. This is where your priorities as a business should influence the decision.

Assess potential Regions for the right option

Evaluate by shortlisting potential Regions.

  • Check if these Regions are compliant and have the services and features you need to run your workload using the AWS Regional Services website.
  • Check feature availability of each service and versions available, if your workload has specific requirements.
  • Calculate the cost of the workload on each Region using the AWS Pricing Calculator.
  • Test the network latency between your user base location and each AWS Region.
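
As a rough sketch for that last check (run it from, or as close as possible to, your user base location), timing a TCP connection to a public regional endpoint gives a first approximation; dedicated network monitoring tools will give more representative numbers:

import socket
import time

# Hypothetical shortlist; any public regional endpoint works for a rough check
REGIONS = ["me-south-1", "eu-west-1", "us-east-1"]

for region in REGIONS:
    host = f"ec2.{region}.amazonaws.com"
    start = time.monotonic()
    with socket.create_connection((host, 443), timeout=5):
        pass  # we only care about connection setup time
    print(f"{region}: {(time.monotonic() - start) * 1000:.0f} ms")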

At this point, you should have a list of AWS Regions with varying cost and network latency that looks something like Table 1:

Region   | Compliance | Latency | Cost | Services / Features
---------|------------|---------|------|--------------------
Region A |            | 15 ms   | $$   |
Region B |            | 20 ms   | $$$  | X
Region C |            | 80 ms   | $    |

Table 1. Region evaluation matrix

Many workloads such as high performance computing (HPC), analytics, and machine learning (ML), are not directly linked to a customer-facing application. These would not be sensitive to network latency, so you may want to select the Region with the lowest cost.

Alternatively, you may have a backend service for a game or mobile application in which network latency has a direct impact on user experience. Measure the difference in network latency between each Region, and determine if it is worth the increased cost. You can leverage the Amazon CloudFront edge network, which helps reduce latency and increases communication quality. This is because it uses a fully managed AWS network infrastructure, which connects your application to the edge location nearest to your users.

Multi-Region deployment

You can also split the workload across multiple Regions. The same workload may have some components that are sensitive to network latency and some that are not. You may determine you can benefit from both lower network latency and reduced cost at the same time. Here’s an example:

Figure 1. Multi-Region deployment optimized for feature availability

Figure 1 shows a serverless application deployed at the Bahrain Region (me-south-1) which has a close proximity to the customer base in Riyadh, Saudi Arabia. Application users enjoy a lower latency network connecting to the AWS Cloud. Analytics workloads are deployed in the Ireland Region (eu-west-1), which has a lower cost for Amazon Redshift and other features.

Note that data transfer between Regions is not free and, in this example, costs $0.115 per GB. However, even with this additional cost factored in, running the analytical workload in Ireland (eu-west-1) is still more cost-effective. You can also benefit from additional capabilities and features that may have not yet been released in the Bahrain (me-south-1) Region.

This multi-Region setup could also be beneficial for applications with a global user base. The application can be deployed in multiple secondary AWS Regions closer to the user base locations. It uses a primary AWS Region with a lower cost for consolidated services and latency-insensitive workloads.

Figure 2. Multi-Region deployment optimized for network latency

Figure 2 shows an application that spans multiple Regions to serve read requests with the lowest network latency possible. Each client will be routed to the nearest AWS Region. For read requests, an Amazon Route 53 latency routing policy will be used. For write requests, an endpoint routed to the primary Region will be used. This primary endpoint can also have periodic health checks to fail over to a secondary Region for disaster recovery (DR).

Other factors may also apply for certain applications, such as ones that require Amazon EC2 Spot Instances. Regions differ in size, with some having three Availability Zones (AZs) and others up to six. This results in varying Spot Instance capacity available for Amazon EC2. Choosing larger Regions offers larger Spot capacity, and a multi-Region deployment offers the most Spot capacity.

Conclusion

Selecting the optimal AWS Region is an important first step when deploying new workloads. There are many other scenarios in which splitting the workload across multiple AWS Regions can result in a better user experience and cost reduction. The four factors mentioned in this blog post can be evaluated together to find the most appropriate Region to deploy your workloads.

If the workload is bound by any regulations, shortlist the Regions that are compliant. Measure the network latency between each Region and the location of the user base. Estimate the workload cost for each Region. Check that the shortlisted Regions have the services and features your workload requires. And finally, determine if your workload can benefit from running in multiple Regions.

Dive deeper into the AWS Global Infrastructure Website for more information.

[$] More Rust concepts for the kernel

Post Syndicated from original https://lwn.net/Articles/869428/rss

The first day of the Kangrejos (Rust for Linux) conference introduced the project and what it was trying to accomplish; day 2 covered a number of core Rust concepts and their relevance to the kernel. On the third and final day of the conference, Wedson Almeida Filho delved deeper into how Rust can be made to work in the Linux kernel, covered some of the lessons that have been learned so far, and discussed next steps with a number of kernel developers.

Security updates for Monday

Post Syndicated from original https://lwn.net/Articles/869863/rss

Security updates have been issued by Debian (gnutls28, nettle, nextcloud-desktop, and openssl1.0), Fedora (dovecot-fts-xapian, drupal7, ghostscript, haproxy, libtpms, lynx, wordpress, and xen), openSUSE (xen), Red Hat (rh-ruby27-ruby), and SUSE (openssl, openssl1, and xen).

Building federated GraphQL on AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-federated-graphql-on-aws-lambda/

This post is written by Krzysztof Lis, Senior Software Development Engineer, IMDb.

IMDb is the world’s most popular source for movie, TV, and celebrity content. It deals with a complex business domain including movies, shows, celebrities, industry professionals, events, and a distributed ownership model. There are clear boundaries between systems and data owned by various teams.

Historically, IMDb has used a monolithic REST gateway system to serve clients. Over the years, it has become challenging to manage effectively. There are thousands of files, business logic that lacks clear ownership, and unreliable integration tests tied to the data. To fix this, the team adopted GraphQL (GQL), a query language for APIs that lets you request only the data that you need, and a runtime for fulfilling those queries with your existing data.

It’s common to implement this protocol by creating a monolithic service that hosts the complete schema and resolves all fields requested by the client. That works well for applications with a relatively small domain and clear, single-threaded ownership. IMDb chose the federated approach, which allows us to federate GQL requests across the existing data teams. This post shows how to build federated GraphQL on AWS Lambda.

Overview

This article covers migration from a monolithic REST API and monolithic frontend to a federated backend system powering a modern frontend. It enumerates challenges in the earlier system and explains why federated GraphQL addresses these problems.

I present the architecture overview and describe the decisions made when designing the new system. I also present our experiences with developing and running high-traffic, high-visibility pages on the new endpoint: improvements in IMDb's ownership model and development lifecycle, in addition to ease of scaling.

Comparing GraphQL with federated GraphQL

Federated GraphQL allows you to combine GraphQL APIs from multiple microservices into a single API. Clients can make a single request and fetch data from multiple sources, including joining across data sources, without additional support from the source services.

This is an example schema fragment:

type TitleQuote {
  "Quote ID"
  id: ID!
  "Is this quote a spoiler?"
  isSpoiler: Boolean!
  "The lines that make up this quote"
  lines: [TitleQuoteLine!]!
  "Votes from users about this quote..."
  interestScore: InterestScore!
  "The language of this quote"
  language: DisplayableLanguage!
}
"A specific line in the Title Quote. Can be a verbal line with characters speaking or stage directions"
type TitleQuoteLine {
  "The characters who speak this line, e.g.  'Rick'. Not required: a line may be non-verbal"
  characters: [TitleQuoteCharacter!]
  "The body of the quotation line, e.g 'Here's looking at you kid. '.  Not required: you may have stage directions with no dialogue."
  text: String
  "Stage direction, e.g. 'Rick gently places his hand under her chin and raises it so their eyes meet'. Not required."
  stageDirection: String
}

This is an example monolithic query: “Get the top 2 quotes from The A-Team (title identifier: tt0084967)”:

{ 
  title(id:"tt0084967"){ 
    quotes(first:2){ 
      lines { text } 
    } 
  }
}

Here is an example response:

{ 
  "data": { 
    "title": { 
      "quotes": { 
        "lines": [
          { 
            "text": "I love it when a plan comes together!"
          },
          {
            "text": "10 years ago a crack commando unit was sent to prison by a military court for a crime they didn't commit..."
          }
        ]
      } 
    }
  }
}

This is an example federated query: “What is Jackie Chan (id nm0000329) known for? Get the text, rating and image for each title”

{
  name(id: "nm0000329") {
    knownFor(first: 4) {
      title {
        titleText {
          text
        }
        ratingsSummary {
          aggregateRating
        }
        primaryImage {
          url
        }
      }
    }
  }
}

The monolithic example fetches quotes from a single service. In the federated example, knownFor, titleText, ratingsSummary, primaryImage are fetched transparently by the gateway from separate services. IMDb federates the requests across 19 graphlets, which are transparent to the clients that call the gateway.

Architecture overview

IMDb supports three channels for users: website, iOS, and Android apps. Each of the channels can request data from a single federated GraphQL gateway, which federates the request to multiple graphlets (sub-graphs). Each of the invoked graphlets returns a partial response, which the gateway merges with responses returned by other graphlets. The client receives only the data that they requested, in the shape specified in the query. This can be especially useful when the developers must be conscious of network usage (for example, over mobile networks).

This is the architecture diagram:

Architecture diagram

There are two core components in the architecture: the Gateway and Schema Manager, which run on Lambda. The Gateway is a Node.js-based Lambda function that is built on top of open-source Apollo Gateway code. It is customized with code responsible predominantly for handling authentication, caching, metrics, and logging.

Other noteworthy components are Upstream Graphlets and an A/B Testing Service that enables A/B tests in the graph. The Gateway is connected to an Application Load Balancer, which is protected by AWS WAF and fronted by Amazon CloudFront as our CDN layer. This uses Amazon ElastiCache with Redis as the engine to cache partial responses coming from graphlets. All logs and metrics generated by the system are collected in Amazon CloudWatch.

Choosing the compute platform

The gateway runs on Lambda, since it scales on demand. IMDb uses Lambda’s Provisioned Concurrency to reduce cold start latency. The infrastructure is abstracted away, so there is no need for us to manage our own capacity and handle patches.

Additionally, Application Load Balancer (ALB) has support for directing HTTP traffic to Lambda. This is an alternative to API Gateway. The workload does not need many of the features that API Gateway provides, since the gateway has a single endpoint, making ALB a better choice. ALB also supports AWS WAF.

Using Lambda, the team designed a way to pick up schema updates without needing to fetch the schema with every request. This is addressed with the Schema Manager component, which reduces latency and improves the overall customer experience.

Integration with legacy data services

The main purpose of the federated GQL migration is to deprecate a monolithic service that aggregates data from multiple backend services before sending it to the clients. Some of the data in the federated graph comes from brand new graphlets that are developed with the new system in mind.

However, much of the data powering the GQL endpoint is sourced from the existing backend services. One benefit of running on Lambda is the flexibility to choose the runtime environment that works best with the underlying data sources and data services.

For the graphlets relying on the legacy services, IMDb uses lightweight Java Lambda functions that reuse the provided client libraries written in Java. They connect to the legacy backends via AWS PrivateLink, fetch the data, and shape it into the format expected by the GQL request. For the modern graphlets, we recommend that graphlet teams explore Node.js as the first option due to its improved performance and ease of development.

Caching

The gateway supports two caching modes for graphlets: static and dynamic. Static caching allows graphlet owners to specify a default TTL for responses returned by a graphlet. Dynamic caching calculates TTL based on a custom caching extension returned with the partial response. It allows graphlet owners to decide on the optimal TTL for content returned by their graphlet. For example, it can be 24 hours for queries containing only static text.
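
For illustration only, a graphlet's partial response could carry its desired TTL in an extensions block that the gateway inspects before caching. The field names here (cacheControl, ttlSeconds) are assumptions; the post doesn't document the actual extension format:

{
  "data": {
    "titleText": { "text": "Shutter Island" }
  },
  "extensions": {
    "cacheControl": { "ttlSeconds": 86400 }
  }
}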

Permissions

Each of the graphlets runs in a separate AWS account. The graphlet accounts grant the gateway AWS account (as AWS principal) invoke permissions on the graphlet Lambda function. This uses a cross-account IAM role in the development environment that is assumed by stacks deployed in engineers’ personal accounts.
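
Conceptually, the resource-based policy on a graphlet function might look like the following sketch. The account IDs, Region, and function name are placeholders, not IMDb's actual configuration:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowGatewayAccountInvoke",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:root" },
      "Action": "lambda:InvokeFunction",
      "Resource": "arn:aws:lambda:us-east-1:444455556666:function:title-graphlet"
    }
  ]
}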

Experience with developing on federated GraphQL

The migration to federated GraphQL delivered on expected results. We moved the business logic closer to the teams that have the right expertise – the graphlet teams. At the same time, a dedicated platform team owns and develops the core technical pieces of the ecosystem. This includes the Gateway and Schema Manager, in addition to the common libraries and CDK constructs that can be reused by the graphlet teams. As a result, there is a clear ownership structure, which is aligned with the way IMDb teams are organized.

In terms of operational excellence of the platform team, this reduced support tickets related to business logic. Previously, these were routed to an appropriate data service team with a delay. Integration tests are now stable and independent from underlying data, which increases our confidence in the Continuous Deployment process. It also eliminates changing data as a potential root cause for failing tests and blocked pipelines.

The graphlet teams now have full ownership of the data available in the graph. They own the partial schema and the resolvers that provide data for that part of the graph. Since they have the most expertise in that area, the potential issues are identified early on. This leads to a better customer experience and overall health of the system.

The web and app developers groups are also impacted by the migration. The learning curve was aided by tools like GraphQL Playground and Apollo Client. The teams covered the learning gap quickly and started delivering new features.

One of the main pages at IMDb.com is the Title Page (for example, Shutter Island). This was successfully migrated to use the new GQL endpoint. This proves that the new, serverless federated system can serve production traffic at over 10,000 TPS.

Conclusion

A single, highly discoverable, and well-documented backend endpoint enabled our clients to experiment with the data available in the graph. We were able to clean up the backend API layer, introduce clear ownership boundaries, and give our clients powerful tools to speed up their development cycle.

The infrastructure uses Lambda to remove the burden of managing, patching, and scaling our EC2 fleets. The team dedicated this time to work on features and operational excellence of our systems.

Part two will cover how IMDb manages the federated schema and the guardrails used to ensure high availability of the federated GraphQL endpoint.

For more serverless learning resources, visit Serverless Land.

Login Authentication Goes Automated With New InsightAppSec Improvements

Post Syndicated from Tom Caiazza original https://blog.rapid7.com/2021/09/20/login-authentication-goes-automated-with-new-insightappsec-improvements/

Move over, macros — automated login is here.

At Rapid7, we know the most powerful tools in your security portfolio are the ones that help you understand your risks quickly. With our new automated login for InsightAppSec, you can access and scan even the most complex, modern applications quickly and easily. That means you’ll spend less time worrying about whether your scans are authenticating and more time assessing and responding to vulnerabilities.

In the world before automated login — we’ll call these the dark ages — security professionals needed to write scripts and rely on macros to navigate more complex applications with their many layers of authentication. This has always been a time-consuming process that takes resources away from the work of identifying and remediating vulnerabilities.

InsightAppSec with automated authentication analyzes and identifies the login pages, enters the credentials, and logs in to the app automatically. Then, it provides you with a confidence score so you’re sure it’s been logged in successfully. Fewer confusing steps, fewer macros — just more understanding of risk from the restricted parts of your web applications.

A look inside

So, what’s different? Well, for starters, the look and feel of the scan will be intuitive and easy to use. We’ve taken great pains to maximize your efficiency at every turn so when you start a new application scan and select authentication, automated authentication will be the default.

We’ve also improved secondary navigation to include new, more logical groupings, making settings easier to find.

The process couldn’t be easier. Simply choose the application you wish to scan from the InsightAppSec All Apps page, open Scan Config, and select Automated Authentication from the Authentication’s page. Enter your credentials once, and you’re good to save for later or start the scan now.

For more on how this works and how automated login improves this process, check out our InsightAppSec Quick Start guide.

The first of many updates

Moving to automated login is more than just a single new feature — it opens the door to more innovations. Automated login uses a new architecture that allows InsightAppSec to interact with web apps in the same way a user and their browser would behave. This is critical as applications become more complex, which in turn presents new challenges to automating certain processes. Automated login is just the first feature we’re rolling out based on this new, more innovative architecture.

As web applications become more complex, the solutions you employ to secure them should become more powerful. Automated authentication provides your security team with the ability to efficiently and accurately scan even the most complex applications quickly and in an intuitive way right out of the box. It flattens the learning curve for setting up and running scans, giving any member of your security team the ability to run scans and identify vulnerabilities.

We are including automated login through InsightAppSec for existing and new customers right away. If you want to learn more, click here for more resources.

Kernel prepatch 5.15-rc2

Post Syndicated from original https://lwn.net/Articles/869838/rss

The 5.15-rc2 kernel prepatch is out for testing.

So I’ve spent a fair amount of this week trying to sort out all the
odd warnings, and I want to particularly thank Guenter Roeck for his
work on tracking where the build failures due to -Werror come from.

Is it done? No. But on the whole I’m feeling fairly good about this
all, even if it has meant that I’ve been looking at some really odd
and grotty code. Who knew I’d still worry about some odd EISA driver
on alpha, after all these years? A slight change of pace 😉

How to customize your HTTP DDoS protection settings

Post Syndicated from Omer Yoachimik original https://blog.cloudflare.com/http-ddos-managed-rules/

We’re excited to announce the availability of the HTTP DDoS Managed Ruleset. This new feature allows Cloudflare customers to independently tailor their HTTP DDoS protection settings. Whether you’re on the Free plan or the Enterprise plan, you can now tweak and optimize the settings directly from within the Cloudflare dashboard or via API.

We expect that in most cases, Cloudflare customers won’t need to customize any settings. Our mission is to make DDoS disruptions a thing of the past, with no customer overhead. To achieve this mission we’re constantly investing in our automated detection and mitigation systems. In some rare cases, there is a need to make some configuration changes, and so now, Cloudflare customers can customize those protection mechanisms independently. The next evolutionary step is to make those settings learn and auto-tune themselves for our customers, based on their unique traffic patterns. Zero-touch DDoS protection at scale.

Unmetered DDoS Protection

Back in 2017, we announced that we will never kick a customer off of our network because they face large attacks, even if they are not paying us at all (i.e., using the Free plan). Furthermore, we committed to never charge a customer for DDoS attack traffic — no matter the size and duration of the attack. Just this summer, our systems automatically detected and mitigated one of the largest DDoS attacks of all time. As opposed to other vendors, Cloudflare customers don’t need to request a service credit for the attack traffic: we simply exclude DDoS traffic from our billing systems. This is done automatically, just like our attack detection and mitigation mechanisms.

Autonomous DDoS Protection

Our unmetered DDoS protection commitment is possible due to our ongoing investment in our network and technology stack. The global coverage and capacity of our network allows us to absorb the largest attacks ever recorded, without manual intervention. Using BGP Anycast, traffic is routed to the closest Cloudflare edge data center as a form of global inter-data center load balancing. Traffic is then load balanced efficiently inside the data center between servers with the help of Unimog, our home-grown L4 load balancer, to ensure that traffic arrives at the least loaded server. Then, each server scans for malicious traffic and, if detected, applies mitigations in the most optimal location in the tech stack. Each server detects and mitigates attacks completely autonomously without requiring any centralized consensus, and shares details with each other using multicast. This is done using our proprietary autonomous edge detection and mitigation system, and this is how we’re able to continue offering unmetered DDoS protection for free at the scale we operate at.

Configurable DDoS Protection

Our autonomous systems use a set of dynamic rules that scan for attack patterns, known attack tools, suspicious patterns, protocol violations, requests causing large amounts of origin errors, excessive traffic hitting the origin/cache, and additional attack vectors. Each rule has a predefined sensitivity level and default action that varies based on the rule’s confidence that the traffic is indeed part of an attack.

But how do we determine those confidence levels? The answer to that depends on each specific rule and what that rule is looking for. Some rules look for the patterns in HTTP attributes that are generated by known attack tools and botnets, known protocol violations and other general suspicious patterns and traffic abnormalities. If a given rule is searching for the HTTP patterns of known attack tools, then once found, the likelihood (i.e., confidence) that this traffic is part of an attack is high, and we can therefore safely block all the traffic that matches that rule. However, in other cases, the detected patterns or abnormal activity might resemble an attack but can actually be caused by faulty applications that generate abnormal HTTP calls, misbehaving API clients that flood their origin server, and even legitimate traffic that naively violates protocol standards. In those cases, we might want to rate-limit the traffic that matches the rule or serve a challenge action to verify and allow legitimate users in while blocking bad bots and attackers.

Configuring the DDoS Protection Settings

In the past, you’d have to go through our support channels to customize any of the default actions and sensitivity levels. In some cases, this may have taken longer to resolve than desired. With today’s release, you can tailor and fine-tune the settings of our autonomous edge system by yourself to quickly improve the accuracy of the protection for your specific application needs.

If you previously contacted Cloudflare Support to apply customizations, the DDoS Ruleset has been set to Essentially off or Low for your zone, based on your existing customization. You can visit the dashboard to view the settings and change them if you need.

If you’ve requested to exclude or bypass the mitigations for specific HTTP attributes or IPs, or if you’ve requested a significantly high threshold that requires Cloudflare approval, then those customizations are still active but may not yet be visible in the dashboard.

If neither of these cases applies to you, there is no action required on your end. However, if you would like to customize your DDoS protection settings, go directly to the DDoS tab or follow these steps:

  1. Log in to the Cloudflare dashboard, and select your account and website.
  2. Go to Firewall > DDoS.
  3. Next to HTTP DDoS attack protection, click Configure.
  4. In Ruleset configuration, select the action and sensitivity values for all the rules in the HTTP DDoS Managed Ruleset.

Alternatively, follow the API documentation to programmatically configure the DDoS protection settings.

In the configuration page, you can select a different Action and Sensitivity Level to override all the DDoS protection rules as a group of rules (i.e., the “ruleset”).

Alternatively, you can click Browse Rules to override specific rules, rather than all of them as a set of rules.

Mitigation Action

The mitigation action defines what action to take when the mitigation rule is applied. Our systems constantly analyze traffic and track potentially malicious activity. When the request-per-second thresholds set by the configured sensitivity level are exceeded, a mitigation rule with a dynamically generated signature is applied to mitigate the attack. The default mitigation action can vary according to the specific rule. A rule with less confidence may apply a Challenge action as a form of soft mitigation, while a rule with higher confidence that the traffic is part of an attack applies a Block action as a stricter form of mitigation.

The available values for the action are:

  • Block
  • Challenge (CAPTCHA)
  • Log
  • Use Rule Defaults

Some examples of when you may want to change the mitigation action include:

  • Safer onboarding: You’re onboarding a new HTTP application that has odd traffic patterns, naively violates protocol standards, or causes spiky behavior. In this case, you can set the action to Log and see what traffic our systems flag. Afterwards, you can make the necessary changes to the sensitivity levels as required and switch the mitigation action back to the default.
  • Stricter mitigation: A DDoS attack has been detected but a Rate-limit or Challenge action has been applied due to the rule’s default logic. However, in this specific case, you’re sure that this is malicious traffic, so you can change the action to Block for a more complete mitigation.

Mitigation Sensitivity

The sensitivity level defines when the mitigation rule is applied. Our systems constantly analyze traffic and track potentially malicious activity. When certain request-per-second thresholds are crossed, a mitigation rule with a dynamically generated signature will be applied to mitigate the attack. Toggling the sensitivity levels allows you to define when the mitigation is applied. The higher the sensitivity, the faster the mitigation is applied. The available values for sensitivity are:

  1. High (default)
  2. Medium
  3. Low
  4. Essentially Off

Essentially Off means that we’ve set an exceptionally low sensitivity level so in most cases traffic won’t be mitigated for you. However, attack traffic will be mitigated at exceptional levels to ensure the safety and stability of the Cloudflare network.

Some examples of when you may want to change the sensitivity action include:

  • Avoid impact to legitimate traffic: One of the rules has applied mitigation to your legitimate traffic due to a suspicious pattern. In this case, you may want to reduce the rule sensitivity to avoid recurrence of the issue and negative impact to your traffic.
  • Legacy applications: One of your legacy HTTP applications is violating protocol standards, or you may have mistakenly introduced a bug into your mobile application/API client. These cases may result in abnormal traffic activity that may be flagged by our systems. In such a case, you can select the Essentially Off sensitivity level until you’ve resolved the issue on your end, to avoid false positives.

Overriding Specific Rules

As mentioned above, you can also select a specific rule to override its action and sensitivity levels. The per-rule override takes priority over the ruleset override.

When configuring per-rule overrides, you’ll see that some rules have a DDoS Dynamic action. This means that the mitigation is multi-staged and will apply different mitigation actions depending on various factors including attack type, request characteristics, and various other factors. This dynamic action can also be overridden if you choose to do so.

DDoS Attack Analytics

When a DDoS attack is detected and mitigated, you’ll receive a real-time DDoS alert (if you’ve configured one) and you’ll be able to view the attack in the Firewall analytics dashboard. The attack details and the rule ID that was triggered will also be displayed in the Activity log as part of each HTTP request log that was mitigated.


Helping Build a Better Internet

At Cloudflare, everything we do is guided by our mission to help build a better Internet. A significant part of that mission is to make DDoS downtime and service disruptions a thing of the past. By giving our users the visibility and tools they need in order to understand and improve their DDoS protection, we’re hoping to make another step towards a better Internet.

Do you have feedback about the user interface or anything else? In the new DDoS tab, we’ve added a link to provide feedback, so you too can help shape the future of Cloudflare’s DDoS protection Managed Rules.

Not using Cloudflare yet? Start now.
