[$] The ABI status of filesystem formats

Post Syndicated from original https://lwn.net/Articles/833696/rss

One of the key rules of Linux kernel development is that the ABI between
the kernel and user space cannot be broken; any change that breaks
previously working programs will, outside of exceptional circumstances, be
reverted. The rule seems clear, but there are ambiguities when it comes to
determining just what constitutes the kernel ABI; tracepoints are a perennial example of this. A recent
discussion has
brought another one of those ambiguities to light: the on-disk format of Linux
filesystems.

Migrating IBM Netezza to Amazon Redshift using the AWS Schema Conversion Tool

Post Syndicated from Mattia Berlusconi original https://aws.amazon.com/blogs/big-data/migrating-ibm-netezza-to-amazon-redshift-with-the-aws-sct/

The post How to migrate a large data warehouse from IBM Netezza to Amazon Redshift with no downtime described a high-level strategy to move from an on-premises Netezza data warehouse to Amazon Redshift. In this post, we explain how a large European Enterprise customer implemented a Netezza migration strategy spanning multiple environments, using the AWS Schema Conversion Tool (AWS SCT) to accelerate schema and data migration. We also walk you through validating that the schema and data content were migrated as expected and followed Amazon Redshift best practices.

Solution overview

It’s important to build a migration plan unique to your organization’s processes and non-functional requirements. The following plan is a real-world use case from a large European Enterprise customer. It details the different environments migrated to and the tasks, tools, and scripts used to complete the work:

  1. Assess migration tasks
    1. Understand the scope of the migration
    2. Record objects to be migrated into a migration runbook
  2. Set up the migration environment
    1. Install AWS SCT
    2. Configure AWS SCT for Netezza source environments
  3. Migrate to the development environment
    1. Create users, groups, and schema
    2. Convert schema
    3. Migrate data
    4. Validate data
    5. Transform ETL, UDF, and procedures
  4. Migrate to other pre-production environments
    1. Create users, groups, and schema
    2. Convert schema
    3. Migrate data
    4. Validate data
    5. Transform ETL, UDF, and procedures
  5. Migrate to the production environment
    1. Create users, groups, and schema
    2. Convert schema
    3. Migrate data
    4. Validate data
    5. Transform ETL, UDF, and procedures
    6. Business validation (including optional dual-running)
    7. Cut over

Assessing migration tasks

To plan and keep track of the migration tasks, you should produce a tracker of all the Netezza databases, tables, and views in scope. This information forms a migration runbook that is updated during the migration to document the progress of data migration from Netezza to Amazon Redshift. For each table identified, record the number of rows and size in GB.

Some Netezza source systems contain two Netezza data warehouses, for example one for ETL loading throughout the day and one for end-user reporting. Make sure it’s clear which data warehouses are in scope for the migration.

Setting up the migration environment

The migration strategy uses the AWS SCT to accelerate schema object conversion and migrate the data from the Netezza database to the Amazon Redshift cluster. The following diagram illustrates this architecture.

The migration should ensure the following:

  • The AWS SCT is installed within the AWS account onto an Amazon Elastic Compute Cloud (Amazon EC2) instance to facilitate migration operations, orchestrate the AWS SCT data extraction agents, and provide access via a user-friendly console.
  • The AWS SCT data extraction agents are installed and run as close to the Netezza data warehouse as possible. AWS strongly recommends installing them on premises within the same subnet as the Netezza data warehouse.

During the transfer of data from the on-premises data center to the AWS account, you can use either a direct connection or offline storage. AWS Snowball is a petabyte-scale offline solution for moving large amounts of data into the AWS account when a direct connection with sufficient bandwidth isn’t available. AWS Direct Connect is a cloud service that makes it easy to establish a dedicated network connection from your premises to an AWS account. You can establish private connectivity between your AWS account and your data center, office, or co-location environment by using Direct Connect, which in many cases can reduce your network costs, increase bandwidth throughput, and provide a more consistent network experience than internet-based connections. Using Direct Connect also adds flexibility in case extract jobs need to be rerun.

Configuring AWS SCT for the Netezza source environment

The AWS SCT is installed on an EC2 instance running Microsoft Windows 10 with administrator privileges. Choosing Microsoft Windows as the operating system allows your users to graphically control the creation of projects, modify profiles, start and view the progress of the conversions, and view the output of the migration assessment reports.

Because you don’t perform the data migration directly on the AWS SCT console, a general purpose EC2 instance with 4 vCPU, 16 GB memory, 100 GB storage, and moderate network bandwidth is sufficient.

You should configure several AWS SCT data extraction agents to match the amount of data to be concurrently transferred and the number of Netezza connections available. You can install the data extraction agents on on-premises VM instances running Linux with root administration privileges. The size of each instance is 8 vCPU, 32 GB memory, and up to 10 Gb network capacity. For disk storage, we use 1TB of 500 IOPS Provisioned SSD because intermediate results are stored on disk.

It’s preferable that the on-premises instances are located as close as possible to the Netezza data warehouse, ideally only a single network hop away. This matters because each data extraction agent stages the extracted data for each table on its local file system. A relatively powerful CPU is also chosen for each agent because compressing the extracted data is processor intensive.

As stated earlier, the number of agents should be proportionate to the amount of concurrent data streams being transferred and the number of Netezza connections available for the transfer. A rule of thumb is to have one data extraction agent for each TB of compressed Netezza data to be migrated in parallel. For optimum performance, it’s recommended that each agent is installed on a single VM instance.

You should work with the DBA team to make as many Netezza concurrent connections available to the data extraction agents as possible. Allocating all the available connections gives the extraction the full capacity of the source database, but if other workloads need to run in parallel with the data extracts, a smaller allocation (for example, 21) can suffice. This is a trade-off between the resources made available and the time required to migrate the data.

For this use case, we allocated seven extraction agents, because the largest project phase extracted 6 TB of Netezza data. The DBA team configured 21 Netezza concurrent connections, so each agent was configured with three parallel data extraction processes (known as threads; see the following configuration file).

Two parameters on the data extraction agents can impact the length of time it takes for the data to migrate from Netezza to the agents: the number of connections and the number of threads.

Tuning is required for each data extraction agent to maximize throughput during the data migration phase. Tuning is achieved by modifying the file /usr/share/aws/sct-extractor/conf/settings.properties, and the file must be applied against each agent. See the following code:

# Number of connections in the pool per agent
extractor.source.connection.pool.size=5

# Number of threads per agent
extractor.extracting.thread.pool.size=3

The preceding code has the following features:

  • extractor.source.connection.pool.size defines the number of connections the agent opens against the Netezza data warehouse.
  • extractor.extracting.thread.pool.size defines the number of parallel jobs the agent can spawn concurrently. The sum of this parameter for all the agents should be smaller than the maximum concurrent connections configured from Netezza.
  • It’s an AWS recommendation to have extractor.source.connection.pool.size 1.5 times larger than extractor.extracting.thread.pool.size. This is because, while a task is running, the AWS SCT may need additional connections to retrieve metadata from Netezza to create additional tasks or for other operations, such as collecting table statistics. A worked sizing example follows this list.
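
The arithmetic behind these two settings can be checked with a short script. The following is a minimal sketch (not part of AWS SCT) that derives per-agent values from the total number of Netezza connections granted by the DBA team and the number of agents, following the 1.5x guideline above; the numbers shown reproduce the configuration used in this use case.

import math

def agent_settings(total_connections, agent_count):
    """Derive per-agent extractor settings from the Netezza connection budget."""
    # Each agent gets an equal share of the concurrent Netezza connections.
    threads_per_agent = total_connections // agent_count
    # Keep the connection pool roughly 1.5x the thread pool so AWS SCT has spare
    # connections for metadata queries and statistics collection.
    pool_size = math.ceil(threads_per_agent * 1.5)
    return {
        "extractor.extracting.thread.pool.size": threads_per_agent,
        "extractor.source.connection.pool.size": pool_size,
    }

# 21 Netezza connections shared across 7 agents -> 3 threads and a pool of 5 per agent.
print(agent_settings(total_connections=21, agent_count=7))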

Migrating to the development environment

The first task to undertake is data model schema transformation. It consists of transforming the Netezza schema objects into Amazon Redshift-compliant syntax and deploying them into the Amazon Redshift development environment. Before migrating the Netezza tables and views, you must create the schemas, groups, and users.

Creating schemas, users, and groups

If you don’t follow this step, all the objects are created in the Amazon Redshift public schema, which isn’t recommended. The following best practices aren’t specific to Netezza migration, but you can use them as a checklist during this step:

  • Create schemas to logically separate views and tables.
  • Groups are easier to maintain than many users because you can grant permissions to groups, and you can add and remove users from groups. Also, groups can direct all traffic from all users in the group to a specific Amazon Redshift WLM queue (which can control priorities as well as QMR limits).
  • Grant permissions at the schema level to allow selected groups to access the schema. This is independent of the permissions for the objects within the schema.
  • Finally, assign users to groups.

Transforming the schema

The AWS SCT analyzes the Netezza data model schema, converts the syntax into Amazon Redshift-compliant DDL statements, and applies the target schema to the Amazon Redshift cluster. The AWS SCT accelerates this phase by making sure Amazon Redshift best practices are taken into account during the transformation.

Within Amazon Redshift, column-level encoding makes sure that the most performant level of compression is applied to every data block of storage for the tables. It’s recommended that the latest ZSTD encoding is applied to all varchar, char, Boolean, and geometry columns, and the AZ64 encoding is applied to all other columns, including integers and decimals.

To improve zone map performance, don’t encode the first column of a sort key (set to raw encoding).
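
The following is a minimal sketch of applying these two encoding rules when generating or reviewing DDL; the column and type names are illustrative, not taken from the migration.

ZSTD_TYPES = ("varchar", "char", "boolean", "geometry")

def pick_encoding(column, data_type, first_sortkey_column=None):
    """Apply the encoding recommendations above to a single column."""
    if column == first_sortkey_column:
        return "ENCODE RAW"   # keep the leading sort key column unencoded for zone maps
    if data_type.lower().startswith(ZSTD_TYPES):
        return "ENCODE ZSTD"
    return "ENCODE AZ64"      # integers, decimals, dates, timestamps, and so on

print(pick_encoding("order_ts", "timestamp", first_sortkey_column="order_ts"))  # ENCODE RAW
print(pick_encoding("customer_name", "varchar(100)"))                           # ENCODE ZSTD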

Netezza supports both character-length and byte-length semantics.

If character-length semantics (the default) is selected, the length is specified in characters, and a value can consume more bytes than the declared length suggests. For example, a varchar(100) column can hold 100 characters of 1–4 bytes each, up to a maximum of 400 bytes.

If byte-length semantics is selected, the length is specified in bytes, and the column can hold only that many bytes. For example, a varchar(100) column can store at most 100 bytes, whether the characters are single-byte or multi-byte.

As of this writing, Amazon Redshift doesn’t support character-length semantics, which can lead to “String length exceeds DDL length” errors while loading the data into Amazon Redshift tables. The simplest solution is to multiply the length of such attributes by 4. A more efficient solution is to determine the maximum length of each varchar column in bytes in Netezza, add a 20% buffer to that maximum, and set the result as the maximum length of the Amazon Redshift varchar column.

If the Netezza column maximum length in bytes is less than Amazon Redshift column length in bytes, you don’t need to increase the size of the column length in Amazon Redshift. The following query gets the column datatype in Netezza:

SELECT
	DATABASE,
	NAME,
	TYPE,
	ATTNAME,
	FORMAT_TYPE
FROM
	_V_relation_column
WHERE
	DATABASE = 'YOUR_DBASE'
	AND FORMAT_TYPE LIKE '%CHAR%'
	AND name IN ('<table_name>')
ORDER BY
	NAME;

The following script generates a query to get the maximum amount of bytes actually used for each varchar column:

SELECT
	'Select max(octet_length(' || ATTNAME || ')) from ' || DATABASE || '.' || name || ';'
FROM
	_v_relation_column
WHERE
	DATABASE = 'YOUR_DBASE'
	AND FORMAT_TYPE LIKE '%CHAR%'
	AND name IN ('<table_name>')
ORDER BY
	NAME;

During data migration, you can use the following query to identify the reason for the load failure:

SELECT
	DISTINCT(ti."table"),
	ti."schema",
	starttime,
	err.tbl,
	err.colname,
	err.type,
	err.col_length,
	err.err_reason
FROM
	stl_load_errors err,
	svv_table_info ti
WHERE
	starttime > 'YYYY-MM-DD'
	AND ti.table_id = err.tbl
	AND err.err_reason = 'String length exceeds DDL length'
ORDER BY
	starttime DESC;

If the error reason is String length exceeds DDL length, you need to increase the length of the affected column.

As recommended earlier, take the maximum column length in bytes from Netezza, add a 20% buffer, and set that as the maximum length in Amazon Redshift.

For example, the script shown earlier generates a query like the following, which you run in Netezza:

SELECT
	max(octet_length(column_a))
FROM
	schema_a.table_a;

If this query returns a maximum of 50 bytes, adding the 20% buffer gives 60, so you alter the column in Amazon Redshift as follows:

ALTER TABLE schema_a.table_a 
ALTER COLUMN column_a TYPE varchar(60);
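
A minimal sketch of this sizing rule, with the values from the example above, is shown below; the function and its inputs are illustrative, not part of AWS SCT.

import math

REDSHIFT_MAX_VARCHAR = 65535  # upper bound for an Amazon Redshift varchar column

def redshift_varchar_length(declared_chars, max_bytes_observed=None):
    """Pick a byte length for Amazon Redshift from a Netezza character-length column."""
    if max_bytes_observed is None:
        # No profiling data available: assume the worst case of 4 bytes per character.
        target = declared_chars * 4
    else:
        # Observed maximum byte length plus the 20% safety buffer.
        target = math.ceil(max_bytes_observed * 1.2)
    return min(target, REDSHIFT_MAX_VARCHAR)

# max(octet_length(column_a)) returned 50 bytes, so the column becomes varchar(60).
print(redshift_varchar_length(declared_chars=100, max_bytes_observed=50))  # 60
print(redshift_varchar_length(declared_chars=100))                         # 400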

AWS SCT uses statistics from the source database with user-specified optimization strategies to determine the appropriate distribution key and sort key strategies for the target schema. These optimization strategies require collecting statistics from the source database in order to activate the most relevant optimization rule for each table.

It’s recommended to do the following:

  • Choose the current Netezza key distribution style as a good starting point for an Amazon Redshift table’s key distribution strategy. When the table is within Amazon Redshift with representative workloads, you can optimize the distribution choice if needed.
  • Set the Amazon Redshift distribution style to auto for all Netezza tables with random distribution. This makes sure that Amazon Redshift automatically chooses the most performant distribution style depending on the number of rows in the table. A sketch of generating these changes follows this list.
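
As a sketch of how these two recommendations translate into DDL, the following snippet takes (table, Netezza distribution column) pairs, with None standing for random distribution, and prints the corresponding Amazon Redshift statements; the table and column names are placeholders, and AWS SCT normally applies these choices for you during conversion.

# (table_name, netezza_dist_key) pairs; None means the Netezza table uses random distribution.
netezza_tables = [
    ("schema_a.table_a", "customer_id"),
    ("schema_a.table_b", None),
]

for table, dist_key in netezza_tables:
    if dist_key is None:
        # Random distribution in Netezza: let Amazon Redshift manage the style.
        print(f"ALTER TABLE {table} ALTER DISTSTYLE AUTO;")
    else:
        # Keyed distribution in Netezza: start from the same column in Amazon Redshift.
        print(f"ALTER TABLE {table} ALTER DISTKEY {dist_key};")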

Migrating the data

You use the AWS SCT to migrate the data from the source Netezza data warehouse to the Amazon Redshift cluster. The AWS SCT migrates data with a three-phase approach:

  • Extract – Extracts data from Netezza and stores it into the file system of on-premises AWS SCT data extraction agents
  • Upload – Uploads data from the agents to Amazon Simple Storage Service (Amazon S3)
  • Copy – Loads the data from Amazon S3 into Amazon Redshift via the COPY command

For any migration, especially ones with large volumes of data or many objects to migrate, it’s important to plan and migrate the tables in smaller tasks. This is where tracking the runs and progress via the migration runbook from the assessment phase is important.

Segment the source tables based on their size. The following groupings were successful for a 60 TB Netezza migration (a sketch of this grouping follows the list):

  • One AWS SCT task for all tables less than 5 GB
  • One AWS SCT task for all tables 5–15 GB
  • Multiple AWS SCT tasks for tables 15–50 GB; a few tables per task
  • One AWS SCT task for each table bigger than 50 GB
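
The following is a minimal sketch of that grouping, assuming the runbook is available as a list of (table name, size in GB) pairs; the thresholds mirror the list above and the table names are placeholders.

def group_into_tasks(runbook):
    """Group (table_name, size_gb) pairs into AWS SCT task buckets by size."""
    tasks = {"small_lt_5gb": [], "medium_5_15gb": [], "large_15_50gb": [], "xl_dedicated": []}
    for table, size_gb in runbook:
        if size_gb < 5:
            tasks["small_lt_5gb"].append(table)      # one task for all of these
        elif size_gb <= 15:
            tasks["medium_5_15gb"].append(table)     # one task for all of these
        elif size_gb <= 50:
            tasks["large_15_50gb"].append(table)     # a few tables per task
        else:
            tasks["xl_dedicated"].append(table)      # one task per table
    return tasks

print(group_into_tasks([("sales.orders", 2), ("sales.order_lines", 40), ("sales.events", 120)]))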

You should refine this configuration according to the available migration windows. The approach ensures the following:

  • A task is an atomic process: if it succeeds, all the tables it manages are migrated successfully
  • If a task fails, it might be more convenient to rerun it from scratch rather than check the status and consistency of each table
  • Task size should therefore balance these two opposing considerations

To manage the substitution of special characters during these phases, set the following parameters (a rough sketch of the resulting COPY options follows the list):

  • For NULL value as a string, enter ~~~~. By default, this option is not selected. Numeric and date type nulls are by default extracted as ‘\N’ and loaded into Amazon Redshift as nulls.
    • If the option is unselected, or selected with the value left blank, AWS SCT extracts char and varchar nulls as ‘\N’ and the COPY command has the NULL AS ‘\N’ parameter set. This causes issues during COPY operations when a column contains the literal value ‘\N’.
    • If the option is selected with the value ~~~~, AWS SCT extracts char and varchar nulls as ~~~~ and the COPY command has the NULL AS ‘~~~~’ parameter set. The COPY command then loads ~~~~ as NULL, so char and varchar null values are extracted and loaded correctly, and data containing the literal value ‘\N’ no longer causes issues during COPY.
    • If the option is selected with the value ‘’ (empty string), AWS SCT extracts char and varchar nulls as ‘’ and the COPY command has the NULL AS ‘’ parameter set, which is equivalent to EMPTYASNULL.
  • Deselect Use blank as null value. If BLANKSASNULL is set (which is the default setting), it replaces whitespace-only values (‘ ‘) with NULL for char and varchar datatypes, and if the column is NOT NULL, inserting NULL fails. Deselecting this option loads the data as it is in the source.
  • Deselect Use empty as null value. If EMPTYASNULL is set (which is the default setting), it replaces empty data (two delimiters in succession with no characters between them) with NULL for char and varchar datatypes. This is not needed.
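
For reference, the effect of these settings on the load step is roughly equivalent to a COPY statement of the following shape; the bucket, prefix, and IAM role are placeholders, and the exact statement AWS SCT generates may differ.

# Roughly equivalent COPY options implied by the task settings above (illustrative only).
copy_template = """
COPY {schema}.{table}
FROM 's3://<bucket>/<task_name>/'
IAM_ROLE '<redshift-copy-role-arn>'
NULL AS '~~~~';  -- BLANKSASNULL and EMPTYASNULL are deliberately not set
"""
print(copy_template.format(schema="schema_a", table="table_a"))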

The following screenshot shows our configuration for the AWS SCT tasks.

To keep track of the tasks and record them accurately in the migration runbook, on the AWS S3 settings tab, set the folder name to be the same as the task name. Using a consistent naming convention allows easier tracking of progress in the runbook, and is useful during troubleshooting for any issues encountered.

For each subject area in scope, the extraction can either run during the day, sharing connections and threads with other processes, or, as is recommended for the initial data load, be scheduled during evenings, weekends, or another agreed window with as many Netezza resources as possible.

Breaking the migration down into smaller tasks allows you to log the progress in the migration runbook and run individual tasks to completion during the allocated migration window.

It’s recommended to migrate a small sample table first to test the parameter settings. The following sample table contains specific examples of edge cases that can provide quick feedback as to the suitability of the parameter settings:

create table <schema>.test_dummy
(
       idrow integer,
       field1 integer,
       field2 character varying (50)
);
 
insert into <schema>.test_dummy (idrow, field1, field2) values (1, null, null);
insert into <schema>.test_dummy (idrow, field1, field2) values (2, null, '');
insert into <schema>.test_dummy (idrow, field1, field2) values (3, null, '    ');
insert into <schema>.test_dummy (idrow, field1, field2) values (4, 34, '  Test4   ');
insert into <schema>.test_dummy (idrow, field1, field2) values (5, 15, '');
insert into <schema>.test_dummy (idrow, field1, field2) values (6, 25, '   ');
insert into <schema>.test_dummy (idrow, field1, field2) values (7, 655, 'Test7');

When migrating large Netezza tables, data is migrated on a table-by-table basis using multiple data extraction agents. You should split large tables (for example, tables with more than 20 million rows or greater than 50 GB) into partitions using the AWS SCT virtual partitions functionality. Using virtual partitioning is a recommended best practice for data warehouse migrations using the AWS SCT extractors.

Virtual partitions decrease the migration timeline for a table by parallelizing the extraction of a configurable number of subsections. You can migrate partitions in parallel, and an extract failure is limited to a single partition instead of the entire table.

The AWS SCT creates a subtask for each table partition. Then, when the migration is running, AWS SCT assigns the subtask to an available data extractor to run. The AWS SCT orchestrates which subtask runs on which extractor, thereby keeping all extractors as busy as possible throughout the migration.

To use virtual partitioning, you should identify an attribute that you can use to evenly split the table. It’s important that the virtual partitions are well balanced in order to exploit the benefit of the parallelism. The AWS SCT usually virtually defines such partitions at extraction time—virtual partitions aren’t related to how data is stored into the source data warehouse.

AWS SCT provides three types of virtual partitioning: list, range, and auto split. For more information, see Use virtual partitioning in the AWS Schema Conversion Tool.

When using list partitioning, for very big tables (over 100 GB), the Netezza data slice IDs are an option for the partition key.
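
As an illustration of the data slice approach (not AWS SCT’s actual partition configuration format), the following sketch derives list-partition values from Netezza’s datasliceid pseudo-column and prints one extraction predicate per slice; the slice count and table are assumptions for the example.

# The candidate values come from Netezza, for example:
#   SELECT DISTINCT datasliceid FROM <schema>.<big_table> ORDER BY 1;
# Assume that query returned slice IDs 1 through 24.
data_slice_ids = list(range(1, 25))

# One virtual partition (and therefore one AWS SCT subtask) per data slice keeps the
# partitions evenly sized, because Netezza spreads rows across its data slices.
for slice_id in data_slice_ids:
    print(f"WHERE datasliceid = {slice_id}  -- virtual partition {slice_id}")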

Migrating to other pre-production environments

After the data migration has successfully been proven in the development environment, you may choose to migrate to other pre-production environments. Apply the same steps and validation checks, including:

  • Validate that the schema deployment matches the development environment.
  • Validate the data migration has completed successfully, and that no data load errors are logged into the STL_LOAD_ERRORS table. The typical reasons for errors at this stage include schema mismatch, different input file formats, or insufficient varchar length for the input data.
  • Validate the ETL deployment is loading the data as expected.

Migrating to the production environment

Migration to the production environment follows the same processes as the non-production environments, with the addition of the following steps:

  • Undertake the task of business validation with your stakeholders to measure the accuracy of the migration in meeting the program goals:
    1. Undertake a period of dual-running the ETL deployment with production data being dual-loaded into the Netezza data warehouse and the production Amazon Redshift cluster.
    2. Compare the results sets from the Netezza data warehouse and the production Amazon Redshift cluster (the data validation scripts in the following section support this task).
    3. Update the migration runbook for each source table to record the number of records migrated, which validation checks have been run, and any discrepancies found during the checks.
    4. Run reports and dashboards against the Netezza data warehouse and the production Amazon Redshift cluster and ensure the results match.
    5. Obtain sign-off upon successful completion of these business validation tests.
  • After you successfully complete the dual-running of both ETL and reporting deployments, the source of truth is transferred from the Netezza data warehouse to the production Amazon Redshift cluster by decommissioning the Netezza ETL deployment and the Netezza data warehouse, and re-pointing all reporting and dashboard connections to the Amazon Redshift cluster.
  • When the Amazon Redshift cluster is live, monitor the cluster and ensure data model best practices are being followed.

Validating the data

After you migrate the data model schema and data contents to Amazon Redshift, you should perform data-validation tests to measure the migration’s success. The scripts included in this section cover checks commonly undertaken during migration engagements. All these scripts must be run by a superuser account.

Amazon Redshift utilities

The Amazon Redshift Utilities GitHub repo contains a set of utilities to accelerate troubleshooting or analysis on Amazon Redshift. Such utilities consist of queries, views, and scripts. These scripts aren’t deployed by default into Amazon Redshift clusters. The recommendation is to deploy the views into an admin schema.

Comparing source vs. target table and view counts

For Netezza, enter the following code:

SET CATALOG <database_name>;

SELECT
	'<schema_name>' ,
	sum(CASE OBJTYPE WHEN 'TABLE' THEN 1 ELSE 0 END) AS table_count ,
	sum(CASE OBJTYPE WHEN 'VIEW' THEN 1 ELSE 0 END) AS view_count
FROM
	_v_objects
WHERE
	OBJTYPE IN ('TABLE', 'VIEW')
	AND OBJNAME IN ('<table_name>')
GROUP BY
	SCHEMA;

For Amazon Redshift, enter the following code:

SELECT
	trim(pg_namespace.nspname) AS schema_name ,
	sum(CASE pg_class.relkind WHEN 'r' THEN 1 ELSE 0 END) AS table_count ,
	sum(CASE pg_class.relkind WHEN 'v' THEN 1 ELSE 0 END) AS view_count
FROM
	pg_namespace,
	pg_class
WHERE
	pg_class.relnamespace = pg_namespace.oid
	AND pg_class.relkind IN ('r', 'v')
	AND schema_name IN ('<schema_name>')
	AND pg_class.relname IN ('<table_name>')
GROUP BY
	schema_name
ORDER BY
	1;

Comparing source vs. target table constraints

For Netezza, enter the following code:

SET CATALOG <database_name>;

SELECT
	nc.database,
	nc.Schema_Name,
	nc.Table_Name,
	ISNULL(pk_count, 0) pk_count,
	ISNULL(fk_count, 0) fk_count,
	ISNULL(uk_count, 0) uk_count,
	ISNULL(ck_count, 0) ck_count,
	ISNULL(nn_t_count, 0) nn_count,
	ISNULL(pk_count, 0)+ ISNULL(fk_count, 0)+ ISNULL(uk_count, 0)+ ISNULL(ck_count, 0)+ ISNULL(nn_t_count, 0) Total_Count
FROM
	(
	SELECT
		database, Schema_Name, Table_Name, Table_Id, sum(CASE Constraint_Type WHEN 'p' THEN 1 ELSE 0 END) AS pk_count, sum(CASE Constraint_Type WHEN 'f' THEN 1 ELSE 0 END) AS fk_count, sum(CASE Constraint_Type WHEN 'u' THEN 1 ELSE 0 END) AS uk_count, sum(CASE Constraint_Type WHEN 'c' THEN 1 ELSE 0 END) AS ck_count
	FROM
		(
		SELECT
			DISTINCT database, SCHEMA Schema_Name, relation Table_Name, contype Constraint_Type, constraintname, objid Table_Id
		FROM
			_v_relation_keydata) in1
	GROUP BY
		database, Schema_Name, Table_Name, Table_Id) oc
RIGHT OUTER JOIN (
	SELECT
		database, SCHEMA Schema_Name, name Table_Name, objid Table_Id, sum(CASE attnotnull WHEN TRUE THEN 1 ELSE 0 END) AS nn_t_count, sum(CASE attnotnull WHEN FALSE THEN 1 ELSE 0 END) AS nn_f_count, count(attnotnull) nn_total_count
	FROM
		_v_relation_column
	WHERE
		TYPE = 'TABLE'
		AND attnum>0
	GROUP BY
		database, Schema_Name, Table_Name, Table_Id) nc ON
	(oc.Table_Id = nc.Table_Id)
WHERE
	nc.database = '<database_name>'
	AND nc.table_name IN ('<table_name>')
ORDER BY
	1, 2;

For Amazon Redshift, enter the following code:

SELECT
	Schema_Name,
	Table_Name,
	pk_count,
	fk_count,
	uk_count,
	ck_count,
	ISNULL(nn_count, 0) nn_count,
	pk_count + fk_count + uk_count + ck_count + ISNULL(nn_count, 0) Total_Count
FROM
	(
	SELECT
		Schema_Name, Table_Name, Table_Id, sum(CASE Constraint_Type WHEN 'p' THEN 1 ELSE 0 END) AS pk_count, sum(CASE Constraint_Type WHEN 'f' THEN 1 ELSE 0 END) AS fk_count, sum(CASE Constraint_Type WHEN 'u' THEN 1 ELSE 0 END) AS uk_count, sum(CASE Constraint_Type WHEN 'c' THEN 1 ELSE 0 END) AS ck_count
	FROM
		(
		SELECT
			trim(pg_namespace.nspname) Schema_Name, trim(pg_class.relname) Table_Name, trim(pg_constraint.conname) Constraint_Name, pg_constraint.contype Constraint_Type, pg_class.oid Table_Id
		FROM
			pg_namespace
		INNER JOIN pg_class ON
			pg_namespace.oid = pg_class.relnamespace
		LEFT OUTER JOIN pg_constraint ON
			pg_constraint.conrelid = pg_class.oid
		WHERE
			schema_name NOT IN ('pg_catalog', 'pg_toast', 'information_schema')
			AND schema_name NOT LIKE '%_ext'
			AND pg_class.relkind = 'r')
	GROUP BY
		Schema_Name, Table_Name, Table_Id) oc
LEFT OUTER JOIN (
	SELECT
		attrelid Table_Id, count(attnotnull) nn_count
	FROM
		pg_attribute
	WHERE
		attnotnull = TRUE
		AND attnum>0
	GROUP BY
		Table_Id) nc ON
	(oc.Table_Id = nc.Table_Id)
WHERE
	schema_name IN ('<schema_name>')
	AND table_name IN ('<table_name>')
ORDER BY
	1, 2;

Generating missing constraints from Netezza

Run the following SQL statements in Netezza to generate the DDL statements to add any missing constraints in Amazon Redshift:

-- Generate Primary Key Constraints DDL
SET CATALOG <database_name>;

SELECT
	'ALTER TABLE <schema_name>.' || relation || ' ADD CONSTRAINT ' || constraintname || ' PRIMARY KEY (' || attname || ')'
FROM
	_V_RELATION_KEYDATA
WHERE
	DATABASE = '<database_name>'
	AND relation IN ('<table_name>')
	AND contype = 'p';

-- Generate Unique Key Constraints DDL
SET CATALOG <database_name>;

SELECT
	'ALTER TABLE <schema_name>.' || relation || ' ADD CONSTRAINT ' || constraintname || ' UNIQUE (' || attname || ')'
FROM
	_V_RELATION_KEYDATA
WHERE
	DATABASE = '<database_name>'
	AND relation IN ('<table_name>')
	AND contype = 'u';

-- Generate Foreign Key Constraints DDL  
SET CATALOG <database_name>;

SELECT
	'ALTER TABLE <schema_name>.' || relation || ' ADD CONSTRAINT ' || constraintname || ' FOREIGN KEY (' || attname || ') REFERENCES <schema_name>.' || pkrelation || '(' || pkattname || ')' refconstrname
FROM
	_V_RELATION_KEYDATA
WHERE
	DATABASE = '<database_name>'
	AND relation IN ('<table_name>')
	AND contype = 'f';

Run the generated script against the Amazon Redshift database.

Identifying tables with insufficient varchar column length

For Netezza, enter the following code:

-- Generate SQL for Varchar Column Length in Bytes   
SET CATALOG <database_name>;

SELECT
	'SELECT ''' || database || ''' database_name,''' || SCHEMA || ''' schema_name,''' || name || ''' table_name, ''' || attname || ''' column_name, ''' || format_type || ''' data_type, ''' || attcolleng || ''' data_type_length_char, ' || 'MAX(OCTET_LENGTH(' || attname || ')) max_bytes FROM ' || SCHEMA || '.' || name || ' UNION ALL'
FROM
	_v_relation_column
WHERE
	TYPE = 'TABLE'
	AND format_type LIKE 'CHARACTER VARYING%'
	AND database = '<database_name>'
	AND name IN ('<table_name>');

For Amazon Redshift, enter the following code:

-- Varchar Column Length in Bytes 
SELECT
	trim(pg_namespace.nspname) Schema_Name,
	trim(pg_class.relname) Table_Name,
	trim(pg_attribute.attname) Column_Name,
	trim(pg_type.typname) Data_Type,
	pg_attribute.atttypmod-4 Data_Type_Length_Bytes
FROM
	pg_attribute
JOIN pg_type ON
	pg_type.oid = pg_attribute.atttypid
JOIN pg_class ON
	pg_class.oid = pg_attribute.attrelid
JOIN pg_namespace ON
	pg_namespace.oid = pg_class.relnamespace
WHERE
	trim(pg_type.typname) LIKE 'varchar%'
	AND Data_Type_Length_Bytes <> 1
	AND Schema_Name IN ('<schema_name>')
	AND Table_Name IN ('<table_name>')
ORDER BY
	1, 2, 3;

Comparing source vs. target row count

Remove the final UNION ALL from the output of each of the following two scripts before running that output.

For Netezza, enter the following:

SET CATALOG <database_name>;

SELECT
	'SELECT ''' || database || ''' database_name,''' || SCHEMA || ''' schema_name,''' || tablename || ''' table_name,COUNT(*) count_of_rows from ' || SCHEMA || '.' || tablename || ' UNION ALL'
FROM
	_v_table
WHERE
	OBJTYPE = 'TABLE'
	AND database = '<database_name>'
	AND tablename IN ('<table_name>');

For Amazon Redshift, enter the following code:

SELECT
	'SELECT ''' || schema_name || ''' schema_name,''' || table_name || ''' table_name,COUNT(*) count_of_rows from ' || schema_name || '.' || table_name || ' UNION ALL'
FROM
	(
	SELECT
		trim(pg_namespace.nspname) schema_name, trim(pg_class.relname) table_name
	FROM
		pg_namespace
	INNER JOIN pg_class ON
		pg_namespace.oid = pg_class.relnamespace
	WHERE
		pg_class.relkind = 'r'
		AND schema_name IN ('<schema_name>')
		AND table_name IN ('<table_name>')
	ORDER BY
		1, 2 );

Comparing source vs. target columns

For Netezza, enter the following code:

SET CATALOG <database_name>;

SELECT
	database,
	SCHEMA Schema_Name,
	name Table_Name,
	attnum,
	attname Column_Name,
	Format_Type Data_Type
FROM
	_v_relation_column
WHERE
	TYPE = 'TABLE'
	AND attnum>0
	AND database = '<database_name>'
	AND name IN ('<table_name>')
ORDER BY
	name,
	attnum;

For Amazon Redshift, enter the following code:

SET search_path TO <schema_name1>,<schema_name2>;

SELECT
	trim(pgn.nspname) AS Schema_Name,
	trim(pgc.relname) AS Table_Name,
	det.attnum,
	det.attname AS Column_Name,
	def.type Data_Type
FROM
	pg_class AS pgc
JOIN pg_namespace AS pgn ON
	pgn.oid = pgc.relnamespace
LEFT OUTER JOIN (
	SELECT
		attrelid, attname, attnum
	FROM
		pg_attribute
	WHERE
		attnum>0) AS det ON
	det.attrelid = pgc.oid
LEFT OUTER JOIN pg_table_def def ON
	(def.schemaname = pgn.nspname
	AND def.tablename = pgc.relname
	AND def."column" = det.attname)
WHERE
	Schema_Name IN ('<schema_name>')
	AND Table_Name IN ('<table_name>')
ORDER BY
	1, 2, 3;

Comparing source vs. target distribution key

For Netezza, enter the following code:

SET CATALOG <database_name>;

SELECT
	t.DATABASE,
	t.SCHEMA,
	t.tablename,
	td.attname AS Dist_Key
FROM
	_v_table t
LEFT OUTER JOIN _v_table_dist_map td ON
	t.tablename = td.tablename
	AND (td.distseqno IS NULL
	OR td.distseqno = 1)
WHERE
	t.database = '<database_name>'
	AND t.tablename IN ('<table_name>')
ORDER BY
	1,
	3;

For Amazon Redshift, enter the following code:

SELECT
	trim(pgn.nspname) AS Schema_Name,
	trim(pgc.relname) AS Table_Name,
	decode(pgc.reldiststyle, 0, 'even', 1, 'key', 8, 'all') AS dist_style,
	det.dist_key AS dist_key
FROM
	pg_class AS pgc
JOIN pg_namespace AS pgn ON
	pgn.oid = pgc.relnamespace
JOIN (
	SELECT
		attrelid, min(CASE attisdistkey WHEN 't' THEN attname ELSE NULL END) AS dist_key
	FROM
		pg_attribute
	GROUP BY
		1) AS det ON
	det.attrelid = pgc.oid
WHERE
	Schema_Name IN ('<schema_name>')
	AND Table_Name IN ('<table_name>')
ORDER BY
	1, 2;

Verifying if any invalid UTF-8 characters were replaced

For Amazon Redshift, enter the following code:

-- Validate if any invalid characters are replaced with '?' during COPY

SELECT userid,
       slice,
       tbl,
       starttime,
       session,
       query,
       filename,
       line_number,
       colname,
       raw_line 
FROM   stl_replacements;

Identifying COPY errors

For Amazon Redshift, enter the following code:

SELECT
	ti."schema",
	ti."table",
	starttime,
	err.tbl,
	err.colname,
	err.type,
	err.col_length,
	err.position,
	err.err_reason,
	err.filename,
	err.raw_line,
	err.raw_field_value
FROM
	stl_load_errors err,
	svv_table_info ti
WHERE
	starttime > '<YYYY-MM-DD>'
	AND ti.table_id = err.tbl
	AND ti."table" = '<Table_Name>'
ORDER BY
	1, 2, 3 DESC;

Additional data validation checks

In addition to checking the row count for each table, you should perform tests on data quality to guarantee production data readiness:

  • During this activity, run tailored queries and validate them against Amazon Redshift tables and views. The recommendation is to run such checks against records that include NULL values as well as strings including trailing whitespaces.
  • Compute and compare statistics (min, max, average, sum, checksums) on numeric attributes against Netezza equivalents.

Conclusion

In this post, we detailed a project migration plan to migrate from Netezza to Amazon Redshift. We included examples of sizing the AWS SCT data extraction agents depending on the volume of data to migrate and the resources made available for the transfer. Validation of successful schema and data migration is vital, and we included several scripts to validate that the data model and data content meet expectations.

Special thanks go to AWS colleagues Arturo Bayo, Boopathi P, and Sunil Vora for their project delivery and support with this post.


About the Authors

Mattia Berlusconi is a Data & Analytics consultant with AWS Professional Services supporting enterprises in adopting innovative solutions for organizing and exploiting data to achieve their business objectives. He is specialized in building data platforms and managing database migrations.

Simon Dimaline has specialised in data warehousing and data modelling for more than 20 years. He currently works for the Data & Analytics practice within AWS Professional Services accelerating customers’ adoption of AWS analytics services.

Putin and Cosa Nostra companies run rampant in Bulgaria: Lukashenko builds Turkish Stream in Bulgaria with EU money

Post Syndicated from Nikolay Marchenko original https://bivol.bg/%D0%BB%D1%83%D0%BA%D0%B0%D1%88%D0%B5%D0%BD%D0%BA%D0%BE-%D1%81%D1%82%D1%80%D0%BE%D0%B8-%D1%82%D1%83%D1%80%D1%81%D0%BA%D0%B8-%D0%BF%D0%BE%D1%82%D0%BE%D0%BA-%D0%B2-%D0%B1%D1%8A%D0%BB%D0%B3%D0%B0%D1%80.html

Thursday, October 8, 2020


Bulgaria risks sanctions not only from the US but also from the European Commission (EC) over the rush-built "Turkish Stream 2" (Balkan Stream) project. An inspection by Bivol and BOEC of the construction site in the village of Makresh and of the office in the town of Brusartsi found that the people working there are predominantly Russian and Belarusian citizens. They turned out to be from the state concern Beltruboprovodstroy, a long-standing subcontractor of Gazprom. In 2019, Lithuania removed the Belarusian concern from a tender for an interconnector to Poland as a "threat to national security." The Belarusians had asked 100 million euros for a pipeline that Lithuanian companies offered to build for 80 million euros. Meanwhile, the evidence suggests that Saudi Arabia's Arkad and Italy's Bonatti, linked to the Cosa Nostra crime group, are merely the "European" front for the Russian IDC and for the EU- and US-sanctioned Gazprom that hired it. This is confirmed by BIRD.bg's revelation that 9 million in EU funds earmarked for an interconnector with Serbia were poured into construction carried out by companies banned on EU territory.

The Bulgarian ambassador in Minsk, Georgi Vasilev, handed a note to the Belarusian Foreign Ministry on Thursday informing it of the Bulgarian government's decision to recall him for consultations, Radio Free Europe reported. The news was announced by Foreign Minister Ekaterina Zaharieva on her Facebook page after her meeting with Belarusian opposition leader Svetlana Tikhanovskaya in Bratislava. But will the GERB deputy prime minister have the courage to act against Lukashenko's subordinate Beltruboprovodstroy as well?

The sign "Entry by permit only" on the gate, written in Russian, signals that Brusartsi is yet another enclave of the Russian Federation in Bulgaria, where even the signage is often not in the official Bulgarian language. And it is not only Gazprom, sanctioned over Vladimir Putin's annexation of Crimea and invasion of the Donbas, that behaves freely and even cavalierly on the territory of Bulgaria, which is still a member of the EU and NATO.

Alexander Lukashenko, the president of the Republic of Belarus who is not recognized as legitimate in the EU and the US, is successfully absorbing European funds in Bulgaria. This is happening through the construction of the "Turkish Stream 2" gas pipeline, which Prime Minister Boyko Borissov calls "Balkan Stream," Bivol established in partnership with the civic movement "Bulgaria United with One Goal" (BOEC).

Beltruboprovodstroy, named to BOEC by the Belarusian workers as their direct employer, is a state concern, meaning its principal is the government in Minsk. This is confirmed by a document from the Commercial Register of the Republic of Belarus provided to Bivol by Stas Ivas, a journalist with the opposition Belarusian television channel BELSAT TV.

Meanwhile, on Wednesday Bivol's partner outlet Bird.bg published on its Facebook page information about more than BGN 9 million in EU funds that may have been redirected to the Russian monopolist Gazprom's project in Bulgaria. The on-site inspection by the Bivol and BOEC teams in the village of Makresh and at Base 1 in the town of Brusartsi, in turn, showed the absence of Bulgarian workers on the "Turkish Stream 2" construction sites near Montana.

The government is also pouring European money into the construction of Turkish Stream, BIRD.bg warned. The funds come from the project "Technical assistance for completing the preparatory activities necessary for the start of construction of the Bulgaria – Serbia gas interconnector."

Most likely this refers to the future interconnector with the Republic of Serbia, which the EC has insisted on since the 2009 gas crisis. Yet the route has suddenly "coincided" with Balkan Stream, which according to Bulgartransgaz (BTG) is simply an "expansion of Bulgaria's gas transmission system," evidently overlapping "by chance" with the route of "Turkish Stream 2" toward Serbia and Hungary.

"Date of signing of the contract/order: 07.10.2020. 9,240,420.00 BGN. Percentage of EU co-financing: 85.00%," the outlet wrote.

Meanwhile, there is unconfirmed information that Arkad's Italian branch for Europe, with an office in Milan, may have been bought up by Russian companies. Arkad is a top subcontractor of the Saudi "Gazprom," Saudi Aramco, but IRPI journalist Luca Rinaldi told Bivol back in 2019 that the company carries out no activity either in Italy or anywhere else in Europe.
Yet suddenly this company, without a single excavator on the continent, won the tender to build a gas pipeline in Bulgaria.
"There was the circumvention of the Public Procurement Act through the front company Arkad (part of the arrangements between Gazprom and Saudi Aramco, or rather Putin's arrangements, in which Borissov simply played the page), the skipped construction permits, environmental assessments, and contracts awarded to foreign companies without permit procedures for their workers and staff."
"Not to mention the supervisors and Russian security firms that have no license for this activity, either as companies or as the specific individuals carrying it out," wrote Iliyan Vasilev after Bivol and BOEC's visit to Brusartsi.

Beltruboprovodstroy: a threat to Lithuania's national security

In 2019, Lithuania's Ministry of Energy filed a court claim against Beltruboprovodstroy seeking 10.2 million euros in penalties over the implementation of a project.

The Belarusian concern was excluded from the public procurement competition together with its partner company KRS (Kaunas, Lithuania). The two were bidding to build the Polish-Lithuanian gas link, the Gas Interconnector Poland – Lithuania (GIPL).

The reason for the exclusion from the competition was that Lukashenko's company constitutes

"a threat to the national security of the Republic of Lithuania."

The Belarusian concern priced the GIPL project at around 100 million euros. Meanwhile, the consortium of the Lithuanian companies Alvora and Siauliu dujotiekio statyba offered a nearly 30% lower price of 79.8 million euros, saving local taxpayers' money.

The Lithuanian services recommended that the government in Vilnius remove the Belarusian bidder from the competition because there was

"information about contacts with risk-posing individuals."

The commission overseeing transactions by strategic companies found that "the main clients of Beltruboprovodstroy are Russian and Belarusian enterprises, including Russia's Gazprom, and that is where its revenues come from."

Lithuania's counterpart of Bulgaria's State Agency for National Security (DANS), the State Security Department, prepared a report marked "Confidential" on Beltruboprovodstroy's activities:

"Such a deal would constitute a threat to the national security of the Republic of Lithuania."

BTS's clients include not only Gazprom but also Stroytransgaz, owned by Putin's friends the Rotenberg brothers, Rosneftegazstroy, and others. BTS took part in the project to build the Yamal – Europe gas pipeline, DELFI.lt wrote at the time.

"The risks of their participation in the project outweigh the risk of reallocating resources and delaying the Lithuania – Poland gas pipeline project."

Lithuanian counterintelligence has not overlooked the activity of the Russian and Belarusian services around pipeline construction on the ground, which may also explain the "secrecy" of the site in Brusartsi.

"The Russian and Belarusian services pay attention to studying infrastructure that is important for ensuring Lithuania's national security, and Russian intelligence shows an active interest in our country's energy sector."

A Russian boot for 10 stotinki per square meter?

Base 1 in Brusartsi serves as an office for at least 50 Russian-speaking employees and a warehouse for dozens of containers and pallets of equipment. It sits on the grounds of the municipal market, and according to BOEC's documents, which include a record of the votes in the town's Municipal Council, the land was leased to IDC for a laughable 10 stotinki per square meter.

Yet the linear meter of pipeline paid for by Bulgartransgaz, a company within the state-owned Bulgarian Energy Holding whose principal is the Minister of Economy, costs far more.

"At the same time, the documents that we at BOEC hold make clear that the same Russian company that pockets BGN 10,000 to bury a meter of pipe

"pays" the Brusartsi municipality ten stotinki per square meter in rent for its base, located on the grounds of the town's municipal market!

The decision was taken unanimously by the Municipal Council, dominated by GERB and the GERB mayor," the civic association reported.

Instead, there were workers of Beltruboprovodstroy and of the Russian company IDC, which built the Serbian section of "Turkish Stream 2." The machinery and vehicle fleet on site were brought in from Turkey and Serbia, and some of IDC's cars had Romanian registration plates.

The only Bulgarian companies that the BOEC and Bivol inspection found at the Makresh and Brusartsi sites were engaged in support activities: the bus company Union Ivkoni, the security firm SOT 161, and the freight forwarder Free Line Ltd.

The containers with pipeline equipment that Bivol spotted at the IDC and Arkad base in Brusartsi bear Greek markings as well as the stickers of the Bulgarian Free Line. Free Line Ltd was founded in 2004 and, according to its website, operates a groupage hub in Greece. It is registered at 92G Iliyantsi Blvd, Sofia.

The company states that it is, above all,

"oriented toward the individual wishes of the client,"

which in this case is the Arkad-IDC partnership. The firm stresses that it "is an associate member of the National Association of Bulgarian Freight Forwarders (NSBS) and of the largest freight forwarding network in the world, WCA, with 4,783 offices in 188 countries."

Union Ivkoni is one of the companies with the most influential lobby in the sector. Its owner, Ivaylo Konstantinov, sat on the leadership of the Bulgarian Association of Bus Carriers and, according to some reports, is the son-in-law of former MRF MP Ramadan Atalay, who spent many years on the Transport Committee of the National Assembly.

Sunflower seeds worth... BGN 6 million

"Well, what do you think, wouldn't a site like this have them?" was the reply of one of the guards at the Arkad and IDC base in Brusartsi when asked whether there were Russian military personnel or security guards there.

After Bivol's reporter and BOEC's leader were forcibly shoved back outside the barrier, a police patrol car arrived on the scene, followed by the head of the Brusartsi police department, Chief Inspector Danail Donchev.

But the assailant "disappeared": the IDC employee dressed in a Russian Federation military uniform without insignia was not brought out of the office, and his identity was not "established." The SOT 161 guard, who had described him as a "Russian colleague," also forgot that he knew the Russian: "How can I remember the names of 100 people?"

The chatty SOT 161 guards, who instead of defusing conflicts let Russian employees presented as "security" shove representatives of Bivol and BOEC, outraged many of those who watched our Facebook livestreams. Throughout, the guards cracked sunflower seeds, filmed with their phones, tossed out comments on the country's social and political state, and pleaded with Bivol and BOEC not to get them fired.

Their sunflower seeds, however, have already cost Bulgarian taxpayers BGN 6 million, because that is the value of the public contracts collected by the owners of SOT 161, according to Bivol's search engine BIRD. The formal owner is engineer Ivan Videnov, who represents the company officially.

In 2017, the newspaper Kapital declared SOT 161 the largest security company and "market leader" after it bought out its competitor Security Holding in a deal with the UK's G4S.

According to the publication, in Sofia alone, where its technical site-security business is concentrated, SOT 161 guards over 60,000 sites, with 47 patrols on shift in the capital. "The total number of guarded sites across the country exceeds 90,000. According to estimates by the Commission for Protection of Competition, this gives the company a market share in the segment on the order of 10-20%," Kapital wrote at the time.

According to 2015 figures, SOT 161's revenues were just under BGN 64 million, with a profit of BGN 8 million for the year. Ivan Videnov told Kapital at the time that

"a significant share of them comes from the physical security segment,"

though he did not specify exactly how much, or whom his company guards so profitably.

Questions without answers

The only question for the State Agency for National Security (DANS) is why companies sanctioned on EU territory, such as Gazprom and Beltruboprovodstroy, operate freely and bring their own staff into Bulgaria.

The same question can also be put to Serbia, given that the authorities in Belgrade, led by President Aleksandar Vucic, are pressing to join the EU.

For the prosecutor's office, the question remains why the Italian Bonatti, which Investigative Reporting Project Italy (IRPI), Bivol, and Kapital have repeatedly reported is linked to the Italian Cosa Nostra crime group, was not vetted and removed as a bidder in the 2019 tender for the pipeline.

On top of that, after Arkad's Italian branch with an office in Milan won the project, Bonatti quickly "requalified" itself from competitor to "subcontractor."

And the question of why Russian security guards give the orders at municipal sites of a NATO and EU member state, using force against media and NGOs, remained unanswered.

"We do not comment...," Brusartsi police chief Danail Donchev told Bivol.

Meanwhile, BOEC announced a new action, "Turkish Stream – the new Rosenets," calling on citizens to stop, by themselves, the costly project of Europe's last dictators, Recep Tayyip Erdogan, Vladimir Putin and Alexander Lukashenko. The idea is for citizens to come to Brusartsi at 1 p.m. on Saturday and say "no" to the construction of the pipeline.

Building Extensions for AWS Lambda – In preview

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-extensions-for-aws-lambda-in-preview/

AWS Lambda is announcing a preview of Lambda Extensions, a new way to easily integrate Lambda with your favorite monitoring, observability, security, and governance tools. Extensions enable tools to integrate deeply into the Lambda execution environment to control and participate in Lambda’s lifecycle. This simplified experience makes it easier for you to use your preferred tools across your application portfolio today.

In this post I explain how Lambda extensions work, the changes to the Lambda lifecycle, and how to build an extension. To learn how to use extensions with your functions, see the companion blog post “Introducing AWS Lambda extensions”.

Extensions are built using the new Lambda Extensions API, which provides a way for tools to get greater control during function initialization, invocation, and shut down. This API builds on the existing Lambda Runtime API, which enables you to bring custom runtimes to Lambda.

You can use extensions from AWS, AWS Lambda Ready Partners, and open source projects for use-cases such as application performance monitoring, secrets management, configuration management, and vulnerability detection. You can also build your own extensions to integrate your own tooling using the Extensions API.

There are extensions available today for AppDynamics, Check Point, Datadog, Dynatrace, Epsagon, HashiCorp, Lumigo, New Relic, Thundra, Splunk, AWS AppConfig, and Amazon CloudWatch Lambda Insights. For more details on these, see “Introducing AWS Lambda extensions”.

The Lambda execution environment

Lambda functions run in a sandboxed environment called an execution environment. This isolates them from other functions and provides the resources, such as memory, specified in the function configuration.

Lambda automatically manages the lifecycle of compute resources so that you pay for value. Between function invocations, the Lambda service freezes the execution environment. It is thawed if the Lambda service needs the execution environment for subsequent invocations.

Previously, only the runtime process could influence the lifecycle of the execution environment. It would communicate with the Runtime API, which provides an HTTP API endpoint within the execution environment to communicate with the Lambda service.

Lambda and Runtime API

The runtime uses the API to request invocation events from Lambda and deliver them to the function code. It then informs the Lambda service when it has completed processing an event. The Lambda service then freezes the execution environment.

The runtime process previously exposed two distinct phases in the lifecycle of the Lambda execution environment: Init and Invoke.

1. Init: During the Init phase, the Lambda service initializes the runtime, and then runs the function initialization code (the code outside the main handler). The Init phase happens either during the first invocation, or in advance if Provisioned Concurrency is enabled.

2. Invoke: During the Invoke phase, the runtime requests an invocation event from the Lambda service via the Runtime API and invokes the function handler. It then returns the function response to the Runtime API (a minimal sketch of this loop follows).
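
For context, the runtime side of this loop looks roughly like the following sketch, which polls the documented Runtime API endpoints using Python's standard library; error handling and the real handler call are omitted, so treat it as an illustration rather than a working custom runtime.

import json
import os
import urllib.request

API = f"http://{os.environ['AWS_LAMBDA_RUNTIME_API']}/2018-06-01/runtime/invocation"

while True:
    # Block until the Lambda service hands the runtime the next invocation event.
    with urllib.request.urlopen(f"{API}/next") as resp:
        request_id = resp.headers["Lambda-Runtime-Aws-Request-Id"]
        event = json.loads(resp.read())

    result = {"echo": event}  # a real runtime would invoke the function handler here

    # Report the function response; Lambda may then freeze the execution environment.
    urllib.request.urlopen(urllib.request.Request(
        f"{API}/{request_id}/response",
        data=json.dumps(result).encode(),
        method="POST",
    ))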

After the function runs, the Lambda service freezes the execution environment and maintains it for some time in anticipation of another function invocation.

If the Lambda function does not receive any invokes for a period of time, the Lambda service shuts down and removes the environment.

Previous Lambda lifecycle

With the addition of the Extensions API, extensions can now influence, control, and participate in the lifecycle of the execution environment. They can use the Extensions API to influence when the Lambda service freezes the execution environment.

AWS Lambda execution environment with the Extensions API

Extensions are initialized before the runtime and the function. They then continue to run in parallel with the function, get greater control during function invocation, and can run logic during shut down.

Extensions allow integrations with the Lambda service by introducing the following changes to the Lambda lifecycle:

  1. An updated Init phase. There are now three discrete Init tasks: extensions Init, runtime Init, and function Init. This creates an order where extensions and the runtime can perform setup tasks before the function code runs.
  2. Greater control during invocation. During the invoke phase, as before, the runtime requests the invocation event and invokes the function handler. In addition, extensions can now request lifecycle events from the Lambda service. They can run logic in response to these lifecycle events, and respond to the Lambda service when they are done. The Lambda service freezes the execution environment when it hears back from the runtime and all extensions. In this way, extensions can influence the freeze/thaw behavior.
  3. Shutdown phase: we are now exposing the shutdown phase to let extensions stop cleanly when the execution environment shuts down. The Lambda service sends a shut down event, which tells the runtime and extensions that the environment is about to be shut down.

New Lambda lifecycle with extensions

Each Lambda lifecycle phase starts with an event from the Lambda service to the runtime and all registered extensions. The runtime and extensions signal that they have completed by requesting the Next invocation event from the Runtime and Extensions APIs. Lambda freezes the execution environment and all extensions when there are no pending events.
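
The demo extension later in this post is a shell script, but the same register-then-poll pattern can be sketched in Python against the Extensions API; this is a simplified illustration with no error handling, not the demo code itself, and the extension name is a placeholder that must match the extension's file name in the layer.

import json
import os
import urllib.request

API = f"http://{os.environ['AWS_LAMBDA_RUNTIME_API']}/2020-01-01/extension"

# 1. Register before the runtime starts, declaring which lifecycle events to receive.
register = urllib.request.Request(
    f"{API}/register",
    data=json.dumps({"events": ["INVOKE", "SHUTDOWN"]}).encode(),
    headers={"Lambda-Extension-Name": "example-extension"},  # must match the file name
    method="POST",
)
with urllib.request.urlopen(register) as resp:
    extension_id = resp.headers["Lambda-Extension-Identifier"]

# 2. Poll for events. Each call to /event/next also signals that this extension has
#    finished with the previous event, allowing Lambda to freeze the environment.
while True:
    next_event = urllib.request.Request(
        f"{API}/event/next",
        headers={"Lambda-Extension-Identifier": extension_id},
    )
    with urllib.request.urlopen(next_event) as resp:
        event = json.loads(resp.read())
    if event["eventType"] == "SHUTDOWN":
        break  # run any cleanup here, then exit so the environment can shut down
    # eventType == "INVOKE": run per-invocation logic here, such as flushing telemetry.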

Lambda lifecycle for execution environment, runtime, extensions, and function

For more information on the lifecycle phases and the Extensions API, see the documentation.

How are extensions delivered and run?

You deploy extensions as Lambda layers, which are ZIP archives containing shared libraries or other dependencies.

To add a layer, use the AWS Management Console, AWS Command Line Interface (AWS CLI), or infrastructure as code tools such as AWS CloudFormation, the AWS Serverless Application Model (AWS SAM), and Terraform.

When the Lambda service starts the function execution environment, it extracts the extension files from the Lambda layer into the /opt directory. Lambda then looks for any extensions in the /opt/extensions directory and starts initializing them. Extensions need to be executable as binaries or scripts. As the function code directory is read-only, extensions cannot modify function code.

Extensions can run in either of two modes, internal and external.

  • Internal extensions run as part of the runtime process, in-process with your code. They are not separate processes. Internal extensions allow you to modify the startup of the runtime process using language-specific environment variables and wrapper scripts. You can use language-specific environment variables to add options and tools to the runtime for Java Corretto 8 and 11, Node.js 10 and 12, and .NET Core 3.1. Wrapper scripts allow you to delegate the runtime startup to your script to customize the runtime startup behavior. You can use wrapper scripts with Node.js 10 and 12, Python 3.8, Ruby 2.7, Java 8 and 11, and .NET Core 3.1. For more information, see “Modifying the runtime environment”.
  • External extensions allow you to run separate processes from the runtime but still within the same execution environment as the Lambda function. External extensions can start before the runtime process, and can continue after the runtime shuts down. External extensions work with Node.js 10 and 12, Python 3.7 and 3.8, Ruby 2.5 and 2.7, Java Corretto 8 and 11, .NET Core 3.1, and custom runtimes.

External extensions can be written in a different language to the function. We recommend implementing external extensions using a compiled language as a self-contained binary. This makes the extension compatible with all of the supported runtimes. If you use a non-compiled language, ensure that you include a compatible runtime in the extension.
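
To make the lifecycle concrete, here is a minimal sketch of an external extension written in Python. It mirrors the register and event/next calls shown with curl in the demo later in this post; the file name is a placeholder (it must match the executable placed in /opt/extensions), and real error handling is omitted.

import json
import os
import urllib.request

# The Extensions API is reached through the same AWS_LAMBDA_RUNTIME_API
# endpoint used by the Runtime API, under the 2020-01-01/extension path.
API = "http://" + os.environ["AWS_LAMBDA_RUNTIME_API"] + "/2020-01-01/extension"

def register(name, events):
    # Register for INVOKE and SHUTDOWN events; the name must match the
    # extension's file name in /opt/extensions.
    req = urllib.request.Request(
        API + "/register",
        data=json.dumps({"events": events}).encode(),
        headers={"Lambda-Extension-Name": name},
        method="POST",
    )
    with urllib.request.urlopen(req) as res:
        return res.headers["Lambda-Extension-Identifier"]

def next_event(extension_id):
    # Blocks until the Lambda service sends the next lifecycle event.
    req = urllib.request.Request(
        API + "/event/next",
        headers={"Lambda-Extension-Identifier": extension_id},
    )
    with urllib.request.urlopen(req) as res:
        return json.loads(res.read())

def main():
    ext_id = register("my-extension.py", ["INVOKE", "SHUTDOWN"])
    while True:
        event = next_event(ext_id)
        if event.get("eventType") == "SHUTDOWN":
            break  # clean up before the environment is torn down
        # INVOKE: do per-invocation work here, such as buffering telemetry.

if __name__ == "__main__":
    main()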

Extensions run in the same execution environment as the function, so they share resources such as CPU, memory, and disk storage with the function. They also share environment variables and permissions, using the same AWS Identity and Access Management (IAM) role as the function.

For more details on resources, security, and performance with extensions, see the companion blog post “Introducing AWS Lambda extensions”.

For example extensions and wrapper scripts to help you build your own extensions, see the GitHub repository.

Showing extensions in action

The demo shows how external extensions integrate deeply with functions and the Lambda runtime. It creates an example Lambda function with a single extension, using either the AWS CLI or AWS SAM.

The example shows how an external extension can start before the runtime, run during the Lambda function invocation, and shut down after the runtime shuts down.

To set up the example, visit the GitHub repo, and follow the instructions in the README.md file.

The example Lambda function uses the custom provided.al2 runtime based on Amazon Linux 2. Using the custom runtime helps illustrate in more detail how the Lambda service, Runtime API, and the function communicate. The extension is delivered using a Lambda layer.

The runtime, function, and extension log their status events to Amazon CloudWatch Logs. The extension initializes as a separate process and waits to receive the function invocation event from the Extensions API. It then sleeps for 5 seconds before calling the API again to register to receive the next event. The extension sleep simulates the processing of a parallel process. This could, for example, collect telemetry data to send to an external observability service.

When the Lambda function is invoked, the extension, runtime and function perform the following steps. I walk through the steps using the log output.

1. The Lambda service adds the configured extension Lambda layer. It then searches the /opt/extensions folder, and finds an extension called extension1.sh. The extension executable launches before the runtime initializes. It registers with the Extensions API to receive INVOKE and SHUTDOWN events using the following API call.

curl -sS -LD "$HEADERS" -XPOST "http://${AWS_LAMBDA_RUNTIME_API}/2020-01-01/extension/register" --header "Lambda-Extension-Name: ${LAMBDA_EXTENSION_NAME}" -d "{ \"events\": [\"INVOKE\", \"SHUTDOWN\"]}" > $TMPFILE
Extension discovery, registration, and start

2. The Lambda custom provided.al2 runtime initializes from the bootstrap file.

Runtime initialization

3. The runtime calls the Runtime API to get the next event using the following API call. The HTTP request is blocked until the event is received.

curl -sS -LD "$HEADERS" -X GET "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next" > $TMPFILE &

The extension calls the Extensions API and waits for the next event. The HTTP request is again blocked until one is received.

curl -sS -L -XGET "http://${AWS_LAMBDA_RUNTIME_API}/2020-01-01/extension/event/next" --header "Lambda-Extension-Identifier: ${EXTENSION_ID}" > $TMPFILE &
Runtime and extension call APIs to get the next event

4. The Lambda service receives an invocation event. It sends the event payload to the runtime using the Runtime API. It sends an event to the extension informing it about the invocation, using the Extensions API.

Runtime and extension receive event

5. The runtime invokes the function handler. The function receives the event payload.

Runtime invokes handler

6. The function runs the handler code. The Lambda runtime receives the function response and sends it to the Runtime API with the following API call.

curl -sS -X POST "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/$REQUEST_ID/response" -d "$RESPONSE" > $TMPFILE
Runtime receives function response and sends to Runtime API

7. The Lambda runtime then waits for the next invocation event (warm start).

Runtime waits for next event

8. The extension continues processing for 5 seconds, simulating the processing of a companion process. The extension finishes, and uses the Extensions API to register again to wait for the next event.

Extension processing

9. The function invocation report is logged.

Function invocation report

10. When Lambda is about to shut down the execution environment, it sends the Runtime API a shut down event.

Lambda runtime shut down event

11. Lambda then sends a shut down event to the extensions. The extension finishes processing and then shuts down after the runtime.

Lambda extension shut down event

The demo shows the steps the runtime, function, and extensions take during the Lambda lifecycle.

An external extension registers and starts before the runtime. When Lambda receives an invocation event, it sends it to the runtime. It then sends an event to the extension informing it about the invocation. The runtime invokes the function handler, and the extension does its own processing of the event. The extension continues processing after the function invocation completes. When Lambda is about to shut down the execution environment, it sends a shut down event to the runtime. It then sends one to the extension, so it can finish processing.

To see a sequence diagram of this flow, see the Extensions API documentation.

Pricing

Extensions share the same billing model as Lambda functions. When using Lambda functions with extensions, you pay for requests served and the combined compute time used to run your code and all extensions, in 100 ms increments. To learn more about the billing for extensions, visit the Lambda FAQs page.

Conclusion

Lambda extensions enable you to extend Lambda’s execution environment to more easily integrate with your favorite tools for monitoring, observability, security, and governance.

Extensions can run additional code before, during, and after a function invocation. There are extensions available today from AWS Lambda Ready Partners. These cover use cases such as application performance monitoring, secrets management, configuration management, and vulnerability detection. Extensions make it easier to use your existing tools with your serverless applications. For more information on the available extensions, see the companion post “Introducing Lambda Extensions – In preview”.

You can also build your own extensions to integrate your own tooling using the new Extensions API. For example extensions and wrapper scripts, see the GitHub repository.

Extensions are now available in preview in the following Regions: us-east-1, us-east-2, us-west-1, us-west-2, ca-central-1, eu-west-1, eu-west-2, eu-west-3, eu-central-1, eu-north-1, eu-south-1, sa-east-1, me-south-1, ap-northeast-1, ap-northeast-2, ap-northeast-3, ap-southeast-1, ap-southeast-2, ap-south-1, and ap-east-1.

For more serverless learning resources, visit https://serverlessland.com.

Introducing AWS Lambda Extensions – In preview

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/introducing-aws-lambda-extensions-in-preview/

AWS Lambda is announcing a preview of Lambda Extensions, a new way to easily integrate Lambda with your favorite monitoring, observability, security, and governance tools. In this post I explain how Lambda extensions work, how you can begin using them, and the extensions from AWS Lambda Ready Partners that are available today.

Extensions help solve a common request from customers to make it easier to integrate their existing tools with Lambda. Previously, customers told us that integrating Lambda with their preferred tools required additional operational and configuration tasks. In addition, tools such as log agents, which are long-running processes, could not easily run on Lambda.

Extensions are a new way for tools to integrate deeply into the Lambda environment. There is no complex installation or configuration, and this simplified experience makes it easier for you to use your preferred tools across your application portfolio today. You can use extensions for use-cases such as:

  • capturing diagnostic information before, during, and after function invocation
  • automatically instrumenting your code without needing code changes
  • fetching configuration settings or secrets before the function invocation
  • detecting and alerting on function activity through hardened security agents, which can run as separate processes from the function

You can use extensions from AWS, AWS Lambda Ready Partners, and open source projects. There are extensions available today for AppDynamics, Check Point, Datadog, Dynatrace, Epsagon, HashiCorp, Lumigo, New Relic, Thundra, Splunk SignalFX, AWS AppConfig, and Amazon CloudWatch Lambda Insights.

You can learn how to build your own extensions, in the companion post “Building Extensions for AWS Lambda – In preview“.

Overview

Lambda Extensions is designed to be the easiest way to plug in the tools you use today without complex installation or configuration management. You deploy extensions as Lambda layers, with the AWS Management Console and AWS Command Line Interface (AWS CLI). You can also use infrastructure as code tools such as AWS CloudFormation, the AWS Serverless Application Model (AWS SAM), Serverless Framework, and Terraform. You can use Stackery to automate the integration of extensions from Epsagon, New Relic, Lumigo, and Thundra.

There are two components to the Lambda Extensions capability: the Extensions API and extensions themselves. Extensions are built using the new Lambda Extensions API which provides a way for tools to get greater control during function initialization, invocation, and shut down. This API builds on the existing Lambda Runtime API, which enables you to bring custom runtimes to Lambda.

AWS Lambda execution environment with the Extensions API

Most customers will use extensions without needing to know about the capabilities of the Extensions API that enables them. You can just consume capabilities of an extension by configuring the options in your Lambda functions. Developers who build extensions use the Extensions API to register for function and execution environment lifecycle events.

Extensions can run in either of two modes – internal and external.

  • Internal extensions run as part of the runtime process, in-process with your code. They allow you to modify the startup of the runtime process using language-specific environment variables and wrapper scripts. Internal extensions enable use cases such as automatically instrumenting code.
  • External extensions allow you to run separate processes from the runtime but still within the same execution environment as the Lambda function. External extensions can start before the runtime process, and can continue after the runtime shuts down. External extensions enable use cases such as fetching secrets before the invocation, or sending telemetry to a custom destination outside of the function invocation. These extensions run as companion processes to Lambda functions.

For more information on the Extensions API and the changes to the Lambda lifecycle, see “Building Extensions for AWS Lambda – In preview”.

AWS Lambda Ready Partners extensions available at launch

Today, you can use extensions with the following AWS and AWS Lambda Ready Partner’s tools, and there are more to come:

  • AppDynamics provides end-to-end transaction tracing for AWS Lambda. With the AppDynamics extension, it is no longer mandatory for developers to include the AppDynamics tracer as a dependency in their function code, making tracing transactions across hybrid architectures even simpler.
  • The Datadog extension brings comprehensive, real-time visibility to your serverless applications. Combined with Datadog’s existing AWS integration, you get metrics, traces, and logs to help you monitor, detect, and resolve issues at any scale. The Datadog extension makes it easier than ever to get telemetry from your serverless workloads.
  • The Dynatrace extension makes it even easier to bring AWS Lambda metrics and traces into the Dynatrace platform for intelligent observability and automatic root cause detection. Get comprehensive, end-to-end observability with the flip of a switch and no code changes.
  • Epsagon helps you monitor, troubleshoot, and lower the cost for your Lambda functions. Epsagon’s extension reduces the overhead of sending traces to the Epsagon service, with minimal performance impact to your function.
  • HashiCorp Vault allows you to secure, store, and tightly control access to your application’s secrets and sensitive data. With the Vault extension, you can now authenticate and securely retrieve dynamic secrets before your Lambda function invokes.
  • Lumigo provides a monitoring and observability platform for serverless and microservices applications. The Lumigo extension enables the new Lumigo Lambda Profiler to see a breakdown of function resources, including CPU, memory, and network metrics. Receive actionable insights to reduce Lambda runtime duration and cost, fix bottlenecks, and increase efficiency.
  • Check Point CloudGuard provides full lifecycle security for serverless applications. The CloudGuard extension enables Function Self Protection data aggregation as an out-of-process extension, providing detection and alerting on application layer attacks.
  • New Relic provides a unified observability experience for your entire software stack. The New Relic extension uses a simpler companion process to report function telemetry data. This also requires fewer AWS permissions to add New Relic to your application.
  • Thundra provides an application debugging, observability and security platform for serverless, container and virtual machine (VM) workloads. The Thundra extension adds asynchronous telemetry reporting functionality to the Thundra agents, getting rid of network latency.
  • Splunk offers an enterprise-grade cloud monitoring solution for real-time full-stack visibility at scale. The Splunk extension provides a simplified runtime-independent interface to collect high-resolution observability data with minimal overhead. Monitor, manage, and optimize the performance and cost of your serverless applications with Splunk Observability solutions.
  • AWS AppConfig helps you manage, store, and safely deploy application configurations to your hosts at runtime. The AWS AppConfig extension integrates Lambda and AWS AppConfig seamlessly. Lambda functions have simple access to external configuration settings quickly and easily. Developers can now dynamically change their Lambda function’s configuration safely using robust validation features.
  • Amazon CloudWatch Lambda Insights enables you to efficiently monitor, troubleshoot, and optimize Lambda functions. The Lambda Insights extension simplifies the collection, visualization, and investigation of detailed compute performance metrics, errors, and logs. You can more easily isolate and correlate performance problems to optimize your Lambda environments.

You can also build and use your own extensions to integrate your organization’s tooling. For instance, the Cloud Foundations team at Square has built their own extension. They say:

The Cloud Foundations team at Square works to make the cloud accessible and secure. We partnered with the Security Infrastructure team, who builds infrastructure to secure Square’s sensitive data, to enable serverless applications at Square and provide mTLS identities to Lambda.

Since beginning work on Lambda, we have focused on creating a streamlined developer experience. Teams adopting Lambda need to learn a lot about AWS, and we see extensions as a way to abstract away common use cases. For our initial exploration, we wanted to make accessing secrets easy, as with our current tools each Lambda function usually pulls 3-5 secrets.

The extension we built and open sourced fetches secrets on cold starts, before the Lambda function is invoked. Each function includes a configuration file that specifies which secrets to pull. We knew this configuration was key, as Lambda functions should only be doing work they need to do. The secrets are cached in the local /tmp directory, which the function reads when it needs the secret data. This not only makes Lambda functions faster, but also reduces the amount of code for accessing secrets.

Showing extensions in action with AWS AppConfig

This demo shows an example of using AWS AppConfig with a Lambda function. AWS AppConfig is a capability of AWS Systems Manager to create, manage, and quickly deploy application configurations. It lets you dynamically deploy external configuration without having to redeploy your applications. As AWS AppConfig has robust validation features, all configuration changes can be tested safely before rolling out to your applications.

AWS AppConfig has an available extension which gives Lambda functions access to external configuration settings quickly and easily. The extension runs a separate local process to retrieve and cache configuration data from the AWS AppConfig service. The function code can then fetch configuration data faster using a local call rather than over the network.
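
For illustration, a function could read the cached value with a plain HTTP call to the extension’s local endpoint. The sketch below assumes the extension’s default local port (2772), placeholder application, environment, and configuration profile names, and a profile that stores a small JSON document; check the AWS AppConfig documentation for the exact URL format and port configuration.

import json
import urllib.request

# Sketch: read a configuration value through the AWS AppConfig extension's
# local HTTP endpoint. Port 2772 is the extension's default; the application,
# environment, and profile names below are placeholders.
CONFIG_URL = ("http://localhost:2772/applications/demo-app/"
              "environments/dev/configurations/loglevel-profile")

def get_loglevel():
    with urllib.request.urlopen(CONFIG_URL) as res:
        config = json.loads(res.read())
    return config.get("loglevel", "normal")

def handler(event, context):
    # Use the centrally managed value, for example to decide how much to log.
    return {"event": event, "LogLevel": get_loglevel()}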

To set up the example, visit the GitHub repo and follow the instructions in the README.md file.

The example creates an AWS AppConfig application, environment, and configuration profile. It stores a loglevel value, initially set to normal.

AWS AppConfig application, environment, and configuration profile

An AWS AppConfig deployment runs to roll out the initial configuration.

AWS AppConfig deployment

The example contains two Lambda functions that include the AWS AppConfig extension. For a list of the layers that have the AppConfig extension, see the blog post “AWS AppConfig Lambda Extension”.

As extensions share the same permissions as Lambda functions, the functions have execution roles that allow access to retrieve the AWS AppConfig configuration.

Lambda function add layer

The functions use the extension to retrieve the loglevel value from AWS AppConfig, returning the value as a response. In a production application, this value could be used within function code to determine what level of information to send to CloudWatch Logs. For example, to troubleshoot an application issue, you can change the loglevel value centrally. Subsequent function invocations for both functions use the updated value.

Both Lambda functions are configured with an environment variable that specifies which AWS AppConfig configuration profile and value to use.

Lambda environment variable specifying AWS AppConfig profile

The functions also return whether the invocation is a cold start.

Running the functions with a test payload returns the loglevel value normal. The first invocation is a cold start.

{
  "event": {
    "hello": "world"
  },
  "ColdStart": true,
  "LogLevel": "normal"
}

Subsequent invocations return the same value with ColdStart set to false.

{
  "event": {
    "hello": "world"
  },
  "ColdStart": false,
  "LogLevel": "normal"
}

Create a new AWS AppConfig hosted configuration profile version that sets the loglevel value to verbose. Run a new AWS AppConfig deployment to update the value. The extension for both functions retrieves the new value. The function configuration itself is not changed.

Running another test invocation for both functions returns the updated value still without a cold start.

{
  "event": {
    "hello": "world"
  },
  "ColdStart": false,
  "LogLevel": "verbose"
}

AWS AppConfig has worked seamlessly with Lambda to update a dynamic external configuration setting for multiple Lambda functions without having to redeploy the function configuration.

The only function configuration required is to add the layer which contains the AWS AppConfig extension.

Pricing

Extensions share the same billing model as Lambda functions. When using Lambda functions with extensions, you pay for requests served and the combined compute time used to run your code and all extensions, in 100 ms increments. To learn more about the billing for extensions, visit the Lambda FAQs page.

Resources, security, and performance with extensions

Extensions run in the same execution environment as the function code. Therefore, they share resources with the function, such as CPU, memory, disk storage, and environment variables. They also share permissions, using the same AWS Identity and Access Management (IAM) role as the function.

You can configure up to 10 extensions per function, using up to five layers at a time. Multiple extensions can be included in a single layer.

The size of the extensions counts towards the deployment package limit, which cannot exceed the unzipped deployment package size limit of 250 MB.

External extensions are initialized before the runtime is started, so they can increase the delay before the function is invoked. Today, the function invocation response is returned after all extensions have completed. An extension that takes time to complete can increase the delay before the function response is returned. If an extension performs compute-intensive operations, function execution duration may increase. Use the new PostRuntimeExtensionsDuration CloudWatch metric to measure the extra time an extension takes after the function execution. To understand the impact of a specific extension, you can use the Duration and MaxMemoryUsed CloudWatch metrics, and run different versions of your function with and without the extension. Adding more memory to a function also proportionally increases CPU and network throughput.
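
As a quick way to compare these metrics, the following sketch pulls hourly averages for Duration and PostRuntimeExtensionsDuration with boto3; the function name is a placeholder.

from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")

def hourly_average(metric_name, function_name="my-function"):
    # Average the metric over the last hour in 5-minute periods.
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName=metric_name,
        Dimensions=[{"Name": "FunctionName", "Value": function_name}],
        StartTime=datetime.utcnow() - timedelta(hours=1),
        EndTime=datetime.utcnow(),
        Period=300,
        Statistics=["Average"],
    )
    return [point["Average"] for point in stats["Datapoints"]]

print("Duration:", hourly_average("Duration"))
print("PostRuntimeExtensionsDuration:",
      hourly_average("PostRuntimeExtensionsDuration"))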

The function and all extensions must complete within the function’s configured timeout setting which applies to the entire invoke phase.

Conclusion

Lambda extensions enable you to extend the Lambda service to more easily integrate with your favorite tools for monitoring, observability, security, and governance.

Today, you can install a number of available extensions from AWS Lambda Ready Partners. These cover use-cases such as application performance monitoring, secrets management, configuration management, and vulnerability detection. Extensions make it easier to use your existing tools with your serverless applications.

You can also build extensions to integrate your own tooling using the new Extensions API. For more information, see the companion post “Building Extensions for AWS Lambda – In preview“.

Extensions are now available in preview in the following Regions: us-east-1, us-east-2, us-west-1, us-west-2, ca-central-1, eu-west-1, eu-west-2, eu-west-3, eu-central-1, eu-north-1, eu-south-1, sa-east-1, me-south-1, ap-northeast-1, ap-northeast-2, ap-northeast-3, ap-southeast-1, ap-southeast-2, ap-south-1, and ap-east-1.

For more serverless learning resources, visit https://serverlessland.com.

Enhanced Ransomware Protection: Announcing Data Immutability With Backblaze B2 and Veeam

Post Syndicated from Natasha Rabinov original https://www.backblaze.com/blog/object-lock-data-immutability/

Protecting businesses and organizations from ransomware has become one of the most, if not the most, essential responsibilities for IT directors and CIOs. Ransomware attacks are on the rise, occurring every 14 seconds, but you likely already know that. That’s why a top requested feature for Backblaze’s S3 Compatible APIs is Veeam® immutability—to increase your organization’s protection from ransomware and malicious attacks.

We heard you and are happy to announce that Backblaze B2 Cloud Storage now supports data immutability for Veeam backups. It is available immediately.

The solution, which earned a Veeam Ready-Object with Immutability qualification, means a good, clean backup is just clicks away when reliable recovery is needed.

It is the only public cloud storage alternative to Amazon S3 to earn Veeam’s certifications for both compatibility and immutability. And it offers this at a fraction of the cost.

“I am happy to see Backblaze leading the way here as the first cloud storage vendor outside of AWS to give us this feature. It will hit our labs soon, and we’re eager to test this to be able to deploy it in production.”—Didier Van Hoye, Veeam Vanguard and Technology Strategist

Using Veeam Backup & Replication™, you can now simply check a box and make recent backups immutable for a specified period of time. Once that option is selected, nobody can modify, encrypt, tamper with, or delete your protected data. Recovering from ransomware is as simple as restoring from your clean, safe backup.

Freedom From Tape, Wasted Resources, and Concern

Prevention is the most pragmatic ransomware protection to implement. Ensuring that backups are up-to-date, off-site, and protected with a 3-2-1 strategy is the industry standard for this approach. But up to now, this meant that IT directors who wanted to create truly air-gapped backups were often shuttling tapes off-site—adding time, the necessity for on-site infrastructure, and the risk of data loss in transit to the process.

With object lock functionality, there is no longer a need for tapes or a Veeam virtual tape library. You can now create virtual air-gapped backups directly in the capacity tier of a Scale-out Backup Repository (SOBR). In doing so, data is Write Once, Read Many (WORM) protected, meaning that even during the locked period, data can be restored on demand. Once the lock expires, data can safely be modified or deleted as needed.

Some organizations have already been using immutability with Veeam and Amazon S3, a storage option more complex and expensive than needed for their backups. Now, Backblaze B2’s affordable pricing and clean functionality mean that you can easily opt in to our service to save up to 75% off of your storage invoice. And with our Cloud to Cloud Migration offers, it’s easier than ever to achieve these savings.

In either scenario, there’s an opportunity to enhance data protection while freeing up financial and personnel resources for other projects.

Backblaze B2 customer Alex Acosta, Senior Security Engineer at Gladstone Institutes, an independent life science research organization now focused on fighting COVID-19, explained that immutability can help his organization maintain healthy operations. “Immutability reduces the chance of data loss,” he noted, “so our researchers can focus on what they do best: transformative scientific research.”

Enabling Immutability

How to Set Object Lock:

Data immutability begins by creating a bucket that has object lock enabled. Then within your SOBR, you can simply check a box to make recent backups immutable and specify a period of time.
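
Outside of the Veeam console, the same starting point, an Object Lock-enabled bucket, can be created through Backblaze B2’s S3 Compatible API with any S3 client, assuming your account supports Object Lock. The boto3 sketch below uses placeholder endpoint, credentials, and bucket name; when you enable immutability in the SOBR, Veeam issues the equivalent lock calls for you.

import boto3

# Sketch: create an Object Lock-enabled bucket via the B2 S3 Compatible API.
# The endpoint URL, credentials, and bucket name are placeholders.
b2 = boto3.client(
    "s3",
    endpoint_url="https://s3.us-west-002.backblazeb2.com",
    aws_access_key_id="<keyID>",
    aws_secret_access_key="<applicationKey>",
)

# Object Lock must be enabled when the bucket is created.
b2.create_bucket(
    Bucket="veeam-backups-example",
    ObjectLockEnabledForBucket=True,
)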

What Happens When Object Lock Is Set:

The true nature of immutability is to prevent modification, encryption, or deletion of protected data. As such, selecting object lock will ensure that no one can:

  • Manually remove backups from Capacity Tier.
  • Remove data using an alternate retention policy.
  • Remove data using lifecycle rules.
  • Remove data via tech support.
  • Remove data using the “Remove deleted items data after” option in Veeam.

Once the lock period expires, data can be modified or deleted as needed.

Getting Started Today

With immutability set on critical data, administrators navigating a ransomware attack can quickly restore uninfected data from their immutable Backblaze backups, deploy them, and return to business as usual without painful interruption or expense.

Get started with improved ransomware protection today. If you already have Veeam, you can create a Backblaze B2 account to get started. It’s free, easy, and quick, and you can begin protecting your data right away.

The post Enhanced Ransomware Protection: Announcing Data Immutability With Backblaze B2 and Veeam appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

New Whitepaper: Selecting & Designing Your Hybrid Connectivity Model

Post Syndicated from Santiago Freitas original https://aws.amazon.com/blogs/architecture/new-whitepaper-selecting-designing-your-hybrid-connectivity-model/

Introduction

Many organizations need to connect their on-premises data centers, remote sites, and the cloud. A hybrid network connects these different environments.

A modern organization uses an extensive array of IT resources. In the past, it was common to host these resources in an on-premises data center or a colocation facility. With the increased adoption of cloud computing, IT resources are delivered and consumed from cloud service providers over a network connection. In some cases, organizations have opted to migrate all existing IT resources to the cloud. In other cases, organizations maintain IT resources both on premises and in the cloud. In both cases, a common network is required to connect on-premises and cloud resources. Coexistence of on-premises and cloud resources is called “hybrid cloud” and the common network connecting them is referred to as a “hybrid network.” Even if your organization keeps all of its IT resources in the cloud, it may still require hybrid connectivity to remote sites.

There are several connectivity models to choose from. Although having options adds flexibility, selecting the best option requires analysis of the business and technical requirements and the elimination of options that are not suitable. Requirements can be grouped together across considerations, such as: security, time to deploy, performance, reliability, communication model, scalability, and more. Once requirements are carefully collected, analyzed, and considered, network and cloud architects identify applicable AWS hybrid network building blocks and solutions. To identify and select the optimal model(s), architects must understand advantages and disadvantages of each model. There are also technical limitations that might cause an otherwise good model to be excluded.

Figure 1 – Considerations covered in the whitepaper

A new whitepaper on Hybrid Connectivity describes AWS building blocks and the key things to consider when deciding which hybrid connectivity model is right for you. To help you determine the best solution for your business and technical requirements, we provide decision trees to guide you through the logical selection process as well as a customer use case to show how to apply the considerations and decision trees in practice.

Figure 2: Example Corp. Automotive connection type decision tree

Contributors

Contributors to this new whitepaper on Hybrid Connectivity are: Marwan Al Shawi, AWS Solutions Architect; Santiago Freitas, AWS Head of Technology; Evgeny Vaganov, AWS Specialist Solutions Architect – Networking; and Tom Adamski, AWS Specialist Solutions Architect – Networking. Special thanks to Stephen Bird, AWS Senior Program Manager – Content.

How teachers train in Computing with our free online courses

Post Syndicated from Michael Conterio original https://www.raspberrypi.org/blog/how-teachers-train-computing-free-online-courses/

Since 2017 we’ve been training Computing educators in England and around the world through our suite of free online courses on FutureLearn. Thanks to support from Google and the National Centre for Computing Education (NCCE), all of these courses are free for anyone to take, whether you are a teacher or not!

An illustration of a bootcamp for computing teachers

We’re excited that Computer Science educators at all stages in their computing journey have embraced our courses — from teachers just moving into the field to experienced educators looking for a refresher so that they can better support their colleagues.

Hear from two teachers about their experience of training with our courses and how they are benefitting!

Moving from Languages to IT to Computing

Rebecca Connell started out as a Modern Foreign Languages teacher, but now she is Head of Computing at The Cowplain School, an 11–16 secondary school in Hampshire.

Computing teacher Rebecca Connell
Computing teacher Rebecca finds our courses “really useful in building confidence and taking [her] skills further”.

Although she had plenty of experience with Microsoft Office and was happy teaching IT, at first she was daunted by the technical nature of Computing:

“The biggest challenge for me has been the move away from an IT to a Computing curriculum. To say this has been a steep learning curve is an understatement!”

However, Rebecca has worked with our courses to improve her coding knowledge, especially in Python:

“Initially, I undertook some one-day programming courses in Python. Recently, I have found the Raspberry Pi courses to be really useful in building confidence and taking my skills further. So far, I have completed Programming 101 — great for revision and teaching ideas — and am now into Programming 102.”

GCSE Computing is more than just programming, and our courses are helping Rebecca develop the rest of her Computing knowledge too:

“I am now taking some online Raspberry Pi courses on computer systems and networks to firm up my knowledge — my greatest fear is saying something that’s not strictly accurate! These courses have some good ideas to help explain complex concepts to students.”

She also highly rates the new free Teach Computing Curriculum resources we have developed for the NCCE:

“I really like the new resources and supporting materials from Raspberry Pi — these have really helped me to look again at our curriculum. They are easy to follow and include everything you need to take students forward, including lesson plans.”

And Rebecca’s not the only one in her department who is benefitting from our courses and resources:

“Our department is supported by an excellent PE teacher who delivers lessons in Years 7, 8, and 9. She has enjoyed completing some of the Raspberry Pi courses to help her to deliver the new curriculum and is also enjoying her learning journey.”

Refreshing and sharing your knowledge

Julie Price, a CAS Master Teacher and NCCE Computer Science Champion, has been “engaging with the NCCE’s Computer Science Accelerator programme, [to] be in a better position to appreciate and help to resolve any issues raised by fellow participants.”

Computing teacher Julie Price
Computer science teacher Julie Price says she is “becoming addicted” to our online courses!

“I have encountered new learning for myself and also expressions of very familiar content which I have found to be seriously impressive and, in some cases, just amazing. I must say that I am becoming addicted to the Raspberry Pi Foundation’s online courses!”

She’s been appreciating the open nature of the courses, as we make all of the materials free to use under the Open Government Licence:

“Already I have made very good use of a wide range of the videos, animations, images, and ideas from the Foundation’s courses.”

Julie particularly recommends the Programming Pedagogy in Secondary Schools: Inspiring Computing Teaching course, describing it as “a ‘must’ for anyone wishing to strengthen their key stage 3 programming curriculum.”

Join in and train with us

Rebecca and Julie are just 2 of more than 140,000 active participants we have had on our online courses so far!

With 29 courses to choose from (and more on the way!), from Introduction to Web Development to Robotics with Raspberry Pi, we have something for everyone — whether you’re a complete beginner or an experienced computer science teacher. All of our courses are free to take, so find one that inspires you, and let us support you on your computing journey, along with Google and the NCCE.

If you’re a teacher in England, you are eligible for free course certification from FutureLearn via the NCCE.

The post How teachers train in Computing with our free online courses appeared first on Raspberry Pi.

[$] Fixing our broken internet

Post Syndicated from original https://lwn.net/Articles/833625/rss

In unusually stark terms, Mozilla is trying to rally the troops to take back the internet from the forces of evil (or at least “misinformation, corruption and greed”) that have overtaken it. In a September 30 blog post, the organization behind the Firefox web browser warned that “the internet needs our love”. While there is lots to celebrate about the internet, it is increasingly under threat from various types of bad actors, so Mozilla is starting a campaign to try to push back against those threats.

Federating Amazon Redshift access from OneLogin

Post Syndicated from Veerendra Nayak original https://aws.amazon.com/blogs/big-data/federating-amazon-redshift-access-from-onelogin/

You can use federation to access AWS accounts using credentials from a corporate directory, utilizing open standards such as SAML to exchange identity and security information between an identity provider (IdP) and an application.

With this integration, you manage user identities and their access to AWS resources centrally in your IdP. This improves enterprise security and removes the need for separate database users and passwords.

In this post, we walk through the steps required to set up Amazon Redshift user federation from OneLogin. Amazon Redshift supports SAML 2.0, and can be easily configured to integrate with OneLogin. For information about integrating with other IdPs, see Federate Amazon Redshift access with Microsoft Azure AD single sign-on and Federate Amazon Redshift access with Okta as an identity provider.

Solution overview

Amazon Redshift federated login with OneLogin involves the following steps:

  1. Create a OneLogin SAML application, users, and roles.
  2. Create two AWS Identity and Access Management (IAM) roles to support OneLogin integration with Amazon Redshift:
    1. A role to establish the trust relationship between IdP and AWS.
    2. A role that defines Amazon Redshift access policies.
  3. Edit the OneLogin application configuration and parameters using the AWS roles created in the previous step.
  4. Configure JDBC and ODBC clients to connect to Amazon Redshift using corporate credentials.

Setting up your OneLogin user

If you don’t have OneLogin set up, you can sign up for a 30-day free trial.

  1. Sign in to OneLogin using the following URL: https://<orgname>.onelogin.com/admin (<orgname> is the name used when setting up the OneLogin account).
  2. On the Users page (https://<orgname>.onelogin.com/users), choose New User.
  3. On the Applications page, choose Add app.
  4. Choose Amazon Redshift JDBC/ODBC.
  5. After the application is created, choose SSO from the navigation pane.
  6. From the More Actions drop-down menu, choose SAML Metadata.

Setting up IAM

In this step, you configure your IdP in IAM and create roles to support OneLogin integration with Amazon Redshift.

Configuring IdP in IAM

To configure your IdP, complete the following steps:

  1. On the IAM console, choose Identity providers.
  2. Choose Create Provider.
  3. For Provider Type, choose SAML.
  4. For Provider Name, enter OneloginRedshift.
  5. For Metadata Document, choose the file that you downloaded in the earlier step.
  6. Choose Next.
  7. Choose Create.

Creating your IAM role

In this step, you create a new IAM role that users federated from OneLogin can assume.

  1. On the IAM console, create an IAM policy with the following permissions. In this policy, we allow Amazon Redshift to query data, create users, and allow users to join groups. For this use case, the sales and marketing groups are already created in Redshift.
    {
    	"Version": "2012-10-17",
    	"Statement": [{
        	"Effect": "Allow",
           	"Action": [
                    "redshift:CreateClusterUser",
                	"redshift:JoinGroup",
                	"redshift:GetClusterCredentials",
                    "redshift:ListSchemas",
                    "redshift:ListTables",
                    "redshift:ListDatabases",
                    "redshift:ExecuteQuery",
                    "redshift:FetchResults",
                	"redshift:CancelQuery",
                    "redshift:DescribeClusters",
                    "redshift:DescribeQuery",
                    "redshift:DescribeTable"],
           	"Resource": [
                 "arn:aws:redshift:<region>:<account>:cluster:<cluster_identifier>",
        	     "arn:aws:redshift:<region>:<account>:dbuser:<cluster_identifier>/${redshift:DBUser}",
                 "arn:aws:redshift:<region>:<account>:dbgroup:<cluster_identifier>/marketing",
                 "arn:aws:redshift:<region>:<account>:dbgroup:<cluster_identifier>/sales",
             	"arn:aws:redshift:<region>:<account>:dbname:<cluster_identifier>/${redshift:DBName}"]
     	}]
    }
    

  2. On the Roles page, choose Create role.
  3. For Role name, enter OneloginRedshiftCluster.
  4. For Role description, enter a description.
  5. For Trusted entities, choose Redshift.
  6. Choose Next: Permissions.
  7. Choose the policy you created earlier (OneloginCustomPolicy).
  8. Choose Create role.

In the next steps, we edit the trust relationships.

  1. On the Summary page for your role, choose Edit trust relationship.
  2. Add the following policy document:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Federated": "arn:aws:iam::<account>:saml-provider/OneloginRedshift"
          },
          "Action": "sts:AssumeRoleWithSAML",
          "Condition": {
              "StringLike": {
                  "SAML:aud": "http://localhost:7890/redshift/"
              }
          }
        }
      ]
    }

Setting up your OneLogin application

In this step, you edit the OneLogin application configuration and parameters using the AWS roles created in the previous step.

  1. Go to your application in OneLogin and confirm Redshift Local Host URL is set to http://localhost:7890/redshift/.
  2. On the Parameters page, add the following fields. For each field, the Include in SAML Assertion flag is set to Default (checked).
    • DbUser: Email.
    • Role: Enter the role created for Amazon Redshift access and the IdP ARN separated by a comma, in the form RoleARNforRedshiftCluster,RoleARNforIAMIDP. For example, arn:aws:iam::4XXXXXXXX4:role/OneloginRedshiftCluster,arn:aws:iam::4XXXXXXXX4:saml-provider/OneloginRedshift.
    • RoleSessionName: Email.
    • DbGroups: Choose your AD groups. If no AD integration is in place, choose user roles with semicolon-delimited input. This is to handle users associated with multiple groups.

The following section shows how to create roles and attach them to users.

Associating user roles

If you don’t have an AD association in OneLogin and you need to authorize access using groups in Amazon Redshift, complete the following steps.

  1. On the OneLogin page, under Users, choose Roles.
  2. Choose New Role.
  3. Create new roles that correspond to the Amazon Redshift user groups. Make sure that the role names are lowercase.
  4. Add the Amazon Redshift JDBC/ODBC application created earlier.
  5. Choose Save.

We associate users with the roles we created earlier so we can map users to Amazon Redshift groups.

  1. We assign the Fred Taylor user to the marketing role and Joe Bloggs to the sales role.

These roles are used to assign the users to the appropriate groups when they log in. You can also add users automatically to roles by using rules.

  1. Go to user profile and check if the role is associated with the user.
    1. If it’s not selected, choose New Role and add the application.

In the next steps, we set up the JDBC and ODBC tools.

Setting up JDBC and ODBC connections

In this post, we use SQL Workbench to demonstrate the JDBC setup, but you can extend the solution to other JDBC-compliant tools.

  1. Download the Amazon Redshift driver and ensure that the driver version is 1.2.41 or above with SDK included.
  2. In SQL Workbench/J, on the Manage drivers page, create a new Amazon Redshift driver profile and point it to the file downloaded in the previous step.
  3. Create a connection to the Amazon Redshift cluster using the driver you downloaded.
  4. For URL, enter the URL in the following format: jdbc:redshift:iam://<clusterendpoint>:5439/dev.
  5. Leave Username and Password blank (they are federated from OneLogin).
  6. Select Save password.
  7. Choose Extended properties and add the following values:
    1. login_url: https://exampleinc.onelogin.com/trust/saml2/http-post/sso/613ac582-9999999999 (from the OneLogin application setup)
    2. plugin_name: com.amazon.redshift.plugin.BrowserSamlCredentialsProvider
    3. idp_response_timeout: 15
  8. Choose Test or Connect to open the OneLogin page.
  9. Enter your corporate user name and password.

You should see the following message upon successful authentication: “Thank you for using Amazon Redshift! You can now close this window.”

  1. Navigate back to SQL Workbench and you should be connected to the Amazon Redshift cluster with the OneLogin user name and the role assigned to you in OneLogin.
  2. Verify the user name passed in via OneLogin by running the following SQL command:
    select current_user

You can now verify that the users have been associated with the correct groups. For our use case, Fred Taylor has access to the tables in the marketing schema only. The user Joe Bloggs has access to tables in the sales schema only. Using the Joe Bloggs user, you get the following results when trying to query data from each schema:

select productid from sales.monthly_sales


productid	
-------
7890
5654
2998
[…]


select * from marketing.employee


An error occurred when executing the SQL command:
select * from marketing.employee

[Amazon](500310) Invalid operation: permission denied for schema marketing;
 [SQL State=42501, DB Errorcode=500310]
1 statement failed.

For client tools that support ODBC, you can configure the ODBC driver to connect to Amazon Redshift and integrate with OneLogin. In this post, we show ODBC connectivity using the command line tool isql and Python.

isql is an interactive ODBC test tool that checks your DSNs for connectivity to databases and runs SQL statements when you’re connected to a database. It is installed with PSQL.

  1. Download and install the ODBC driver (use the macOS ODBC driver, version 1.4.16 or higher).
  2. On macOS, the installation process installs the driver files in the following directories:
    /opt/amazon/redshift/lib
    /opt/amazon/redshift/ErrorMessages
    /opt/amazon/redshift/Setup

  3. Open the /usr/local/etc/odbc.ini file and add the Amazon Redshift DSN and Login_URL. See the following screenshot.
  4. After odbc.ini is set up, we connect using isql. In a terminal, enter the following command:
    isql -v "Amazon Redshift ODBC DSN"

isql should open the browser window to ask for credentials (use your OneLogin credentials).

  1. We can also use Python3 to connect to Amazon Redshift using ODBC. See the following example code:
    import pyodbc

    # Connect through the ODBC driver, federating via OneLogin with the
    # browser SAML plugin. A browser window opens for authentication.
    cnxn = pyodbc.connect(
        'DRIVER={/opt/amazon/redshift/lib/libamazonredshiftodbc.dylib};'
        'SERVER=redshift-cluster-1.XXXXXXX.us-west-2.redshift.amazonaws.com;'
        'plugin_name=BrowserSAML;'
        'IAM=1;'
        'AutoCreate=1;'
        'Login_URL=https://exampleinc.onelogin.com/trust/saml2/http-post/sso/613ac582-9999999999;'
        'IdP_Response_Timeout=15;'
        'Database=dev;'
        'Port=5439;')

    # Run a simple query to confirm the connection works.
    cursor = cnxn.cursor()
    cursor.execute("SELECT current_date;")
    for row in cursor.fetchall():
        print(row)

Summary

In this post, we demonstrated how to set up federated login to Amazon Redshift using OneLogin. We also showed how to pass along group membership within your IdP, enabling you to manage user access to Amazon Redshift resources from within your IdP.

If you have any questions or suggestions, please leave us a comment.


About the Authors

Veerendra Nayak is a Senior Database Solution Architect with Amazon Web Services.

Sam Selvan is a Senior Database Solution Architect with Amazon Web Services.

New – Redis 6 Compatibility for Amazon ElastiCache

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-redis-6-compatibility-for-amazon-elasticache/

Since the last Redis 5.0 compatibility release for Amazon ElastiCache, there have been lots of improvements to Amazon ElastiCache for Redis, including support for upstream versions such as 5.0.6.

Earlier this year, we announced Global Datastore for Redis, which lets you replicate a cluster in one region to clusters in up to two other regions. Recently we improved your ability to monitor your Redis fleet by enabling 18 additional engine and node-level CloudWatch metrics. Also, we added support for resource-level permission policies, allowing you to assign AWS Identity and Access Management (IAM) principal permissions to specific ElastiCache resources.

Today, I am happy to announce Redis 6 compatibility to Amazon ElastiCache for Redis. This release brings several new and important features to Amazon ElastiCache for Redis:

  • Managed Role-Based Access Control – Amazon ElastiCache for Redis 6 now provides you with the ability to create and manage users and user groups that can be used to set up Role-Based Access Control (RBAC) for Redis commands. You can now simplify your architecture while maintaining security boundaries by having several applications use the same Redis cluster without being able to access each other’s data. You can also take advantage of granular access control and authorization to create administration and read-only user groups. Amazon ElastiCache enhances the new Access Control Lists (ACL) introduced in open source Redis 6 to provide a managed RBAC experience, making it easy to set up access control across several Amazon ElastiCache for Redis clusters.
  • Client-Side Caching – Amazon ElastiCache for Redis 6 comes with server-side enhancements to deliver efficient client-side caching to further improve your application performance. Redis clusters now support client-side caching by tracking client requests and sending invalidation messages for data stored on the client. In addition, you can also take advantage of a broadcast mode that allows clients to subscribe to a set of notifications from Redis clusters.
  • Significant Operational Improvements – This release also includes several enhancements that improve application availability and reliability. Specifically, Amazon ElastiCache has improved replication under low memory conditions, especially for workloads with medium/large sized keys, by reducing latency and the time it takes to perform snapshots. Open source Redis enhancements include improvements to expiry algorithm for faster eviction of expired keys and various bug fixes.

Note that open source Redis 6 also announced support for encryption-in-transit, a capability that is already available in Amazon ElastiCache for Redis 4.0.10 onwards. This release of Amazon ElastiCache for Redis 6 does not impact Amazon ElastiCache for Redis’ existing support for encryption-in-transit.

In order to apply RBAC to a new or existing Redis 6 cluster, we first need to ensure you have a user and user group created. We’ll review the process to do this below.

Using Role-Based Access Control – How it works
As an alternative to Authenticating Users with the Redis AUTH Command, Amazon ElastiCache for Redis 6 offers Role-Based Access Control (RBAC). With RBAC, you create users and assign them specific permissions via an Access String.

If you want to create, modify, or delete users and user groups, go to the User Management and User Group Management sections in the ElastiCache console.

ElastiCache automatically configures a default user with the user ID and user name “default”, and you can then add it, or newly created users, to new groups in User Group Management.

If you want to replace the default user with your own password and access settings, you need to create a new user with the user name set to “default” and then swap it with the original default user. We recommend using your own strong password for the default user.

The following example shows how to swap the original default user with another default that has a modified access string via AWS CLI.

$ aws elasticache create-user \
 --user-id "new-default-user" \
 --user-name "default" \
 --engine "REDIS" \
 --passwords "a-str0ng-pa))word" \
 --access-string "off +get ~keys*"

Create a user group and add the user you created previously.

$ aws elasticache create-user-group \
  --user-group-id "new-default-group" \
  --engine "REDIS" \
  --user-ids "default"

Swap the new default user with the original default user.

$ aws elasticache modify-user-group \
    --user-group-id "new-default-group" \
    --user-ids-to-add "new-default-user" \
    --user-ids-to-remove "default"

Also, you can modify a user’s password or change its access permissions using the modify-user command, or remove a specific user using the delete-user command. The user is removed from any user groups to which it belongs.

Similarly, you can modify a user group by adding new users and/or removing current users using the modify-user-group command, or delete a user group using the delete-user-group command. Note that the user group itself, not the users belonging to the group, is deleted.

Once you have created a user group and added users, you can assign the user group to a replication group, or migrate between Redis AUTH and RBAC. For more information, see the documentation in detail.
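
For example, here is a short boto3 sketch that attaches the new user group to an existing replication group; the replication group ID is a placeholder.

import boto3

# Sketch: attach a user group to an existing Redis replication group.
# The replication group ID is a placeholder for your cluster.
elasticache = boto3.client("elasticache")

elasticache.modify_replication_group(
    ReplicationGroupId="my-redis6-cluster",
    UserGroupIdsToAdd=["new-default-group"],
    ApplyImmediately=True,
)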

Redis 6 cluster for ElastiCache – Getting Started
As usual, you can use the ElastiCache console, CLI, APIs, or a CloudFormation template to create a new Redis 6 cluster. I’ll use the console: choose Redis from the navigation pane and click Create with the following settings:

Select the “Encryption in-transit” checkbox to ensure you can see the “Access Control” options. For Access Control, you can select either User Group Access Control List (to use the RBAC features) or Redis AUTH Default User. If you select RBAC, you can choose one of the available user groups.

My cluster is up and running within minutes. You can also use the in-place upgrade feature on an existing cluster: select the cluster, click Action and Modify, and change the Engine Version from the 5.0.6-compatible engine to 6.x.

Now Available
Amazon ElastiCache for Redis 6 is now available in all AWS regions. For a list of ElastiCache for Redis supported versions, refer to the documentation. Please send us feedback in the AWS forum for Amazon ElastiCache, through AWS Support, or via your account team.

Channy;

10 additional AWS services authorized at DoD Impact Level 6 for the AWS Secret Region

Post Syndicated from Tyler Harding original https://aws.amazon.com/blogs/security/10-additional-aws-services-authorized-dod-impact-level-6-for-aws-secret-region/

The Defense Information Systems Agency (DISA) has authorized 10 additional AWS services in the AWS Secret Region for production workloads at the Department of Defense (DoD) Impact Level (IL) 6 under the DoD’s Cloud Computing Security Requirements Guide (DoD CC SRG). With this authorization at DoD IL 6, DoD Mission Owners can process classified and mission-critical workloads for National Security Systems in the AWS Secret Region. The AWS Secret Region is available to the Department of Defense through AWS’s GSA IT Multiple Award Schedule.

AWS successfully completed an independent evaluation by members of the Intelligence Community (IC) that confirmed AWS effectively implemented 859 security controls using applicable criteria from NIST SP 800-53 Rev 4, the DoD CC SRG, and the Committee on National Security Systems Instruction No. 1253 at the Moderate Confidentiality, Moderate Integrity, and Moderate Availability impact levels.

The 10 AWS services newly authorized by DISA at IL 6 provide additional choices for DoD Mission Owners to use the capabilities of the AWS Cloud in service areas such as compute and storage, management and developer tools, analytics, and networking. With the addition of these 10 newly authorized AWS services (listed with links below), AWS expands the capabilities for DoD Mission Owners to use a total of 36 services and features.

Compute and Storage:

Management and Developer Tools:

  • AWS Personal Health Dashboard: Monitor, manage, and optimize your AWS environment with a personalized view into the performance and availability of the AWS services underlying your AWS resources.
  • AWS Systems Manager: Automatically collect software inventory, apply OS patches, create system images, configure Windows and Linux operating systems, and seamlessly bridge your existing infrastructure with AWS.
  • AWS CodeDeploy: A fully managed deployment service that automates software deployments to a variety of compute services such as Amazon EC2, AWS Lambda, and on-premises servers.

Analytics:

  • AWS Data Pipeline: Reliably process and move data between different AWS compute and storage services, as well as on-premises data sources, at specified intervals.

Networking:

  • AWS PrivateLink: Use secure private connectivity between Amazon Virtual Private Cloud (Amazon VPC), AWS services, and on-premises applications on the AWS network, and eliminate the exposure of data to the public internet.
  • AWS Transit Gateway: Easily connect Amazon VPC, AWS accounts, and on-premises networks to a single gateway.

Figure 1: 10 additional AWS services authorized at DoD Impact Level 6

Newly authorized AWS services and features at DoD Impact Level 6

  1. Amazon Elastic Container Registry (ECR)
  2. Amazon Elastic Container Service (ECS)
  3. AWS CodeDeploy
  4. AWS Data Pipeline
  5. AWS Lambda
  6. AWS Personal Health Dashboard
  7. AWS PrivateLink
  8. AWS Snowball Edge
  9. AWS Systems Manager
  10. AWS Transit Gateway

Existing authorized AWS services and features at DoD Impact Level 6

  1. Amazon CloudWatch
  2. Amazon DynamoDB (DDB)
  3. Amazon Elastic Block Store (EBS)
  4. Amazon Elastic Compute Cloud (EC2)
  5. Amazon Elastic Compute Cloud (EC2) – Auto Scaling
  6. Amazon Elastic Compute Cloud (EC2) – Elastic Load Balancing (ELB) (Classic and Application Load Balancer)
  7. Amazon ElastiCache
  8. Amazon Kinesis Data Streams
  9. Amazon Redshift
  10. Amazon S3 Glacier
  11. Amazon Simple Notification Service (SNS)
  12. Amazon Simple Queue Service (SQS)
  13. Amazon Simple Storage Service (S3)
  14. Amazon Simple Workflow (SWF)
  15. Amazon Virtual Private Cloud (VPC)
  16. AWS CloudFormation
  17. AWS CloudTrail
  18. AWS Config
  19. AWS Database Migration Service (DMS)
  20. AWS Direct Connect (Dx)
  21. AWS Identity and Access Management (IAM)
  22. AWS Key Management Service (KMS)
  23. Amazon Relational Database Service (RDS) (including MariaDB, MySQL, Oracle, Postgres, and SQL Server)
  24. AWS Snowball
  25. AWS Step Functions
  26. AWS Trusted Advisor

To learn more about AWS solutions for DoD, please see our AWS solution offerings. Follow the AWS Security Blog for future updates on our Services in Scope by Compliance Program page. If you have feedback about this post, let us know in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Tyler Harding

Tyler is the DoD Compliance Program Manager within AWS Security Assurance. He has over 20 years of experience providing information security solutions to federal civilian, DoD, and intelligence agencies.

Amazon SageMaker Continues to Lead the Way in Machine Learning and Announces up to 18% Lower Prices on GPU Instances

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/amazon-sagemaker-leads-way-in-machine-learning/

Since 2006, Amazon Web Services (AWS) has been helping millions of customers build and manage their IT workloads. From startups to large enterprises to public sector, organizations of all sizes use our cloud computing services to reach unprecedented levels of security, resiliency, and scalability. Every day, they’re able to experiment, innovate, and deploy to production in less time and at lower cost than ever before. Thus, business opportunities can be explored, seized, and turned into industrial-grade products and services.

As Machine Learning (ML) became a growing priority for our customers, they asked us to build an ML service infused with the same agility and robustness. The result was Amazon SageMaker, a fully managed service launched at AWS re:Invent 2017 that provides every developer and data scientist with the ability to build, train, and deploy ML models quickly.

Today, Amazon SageMaker is helping tens of thousands of customers in all industry segments build, train and deploy high quality models in production: financial services (Euler Hermes, Intuit, Slice Labs, Nerdwallet, Root Insurance, Coinbase, NuData Security, Siemens Financial Services), healthcare (GE Healthcare, Cerner, Roche, Celgene, Zocdoc), news and media (Dow Jones, Thomson Reuters, ProQuest, SmartNews, Frame.io, Sportograf), sports (Formula 1, Bundesliga, Olympique de Marseille, NFL, Guiness Six Nations Rugby), retail (Zalando, Zappos, Fabulyst), automotive (Atlas Van Lines, Edmunds, Regit), dating (Tinder), hospitality (Hotels.com, iFood), industry and manufacturing (Veolia, Formosa Plastics), gaming (Voodoo), customer relationship management (Zendesk, Freshworks), energy (Kinect Energy Group, Advanced Microgrid Systems), real estate (Realtor.com), satellite imagery (Digital Globe), human resources (ADP), and many more.

When we asked our customers why they decided to standardize their ML workloads on Amazon SageMaker, the most common answer was: “SageMaker removes the undifferentiated heavy lifting from each step of the ML process.” Zooming in, we identified five areas where SageMaker helps them most.

#1 – Build Secure and Reliable ML Models, Faster
As many ML models are used to serve real-time predictions to business applications and end users, making sure that they stay available and fast is of paramount importance. This is why Amazon SageMaker endpoints have built-in support for load balancing across multiple AWS Availability Zones, as well as built-in Auto Scaling to dynamically adjust the number of provisioned instances according to incoming traffic.

For even more robustness and scalability, Amazon SageMaker relies on production-grade open source model servers such as TensorFlow Serving, the Multi-Model Server, and TorchServe. A collaboration between AWS and Facebook, TorchServe is available as part of the PyTorch project, and makes it easy to deploy trained models at scale without having to write custom code.

In addition to resilient infrastructure and scalable model serving, you can also rely on Amazon SageMaker Model Monitor to catch prediction quality issues that could happen on your endpoints. By saving incoming requests as well as outgoing predictions, and by comparing them to a baseline built from a training set, you can quickly identify and fix problems like missing features or data drift.

Says Aude Giard, Chief Digital Officer at Veolia Water Technologies: “In 8 short weeks, we worked with AWS to develop a prototype that anticipates when to clean or change water filtering membranes in our desalination plants. Using Amazon SageMaker, we built a ML model that learns from previous patterns and predicts the future evolution of fouling indicators. By standardizing our ML workloads on AWS, we were able to reduce costs and prevent downtime while improving the quality of the water produced. These results couldn’t have been realized without the technical experience, trust, and dedication of both teams to achieve one goal: an uninterrupted clean and safe water supply.” You can learn more in this video.

#2 – Build ML Models Your Way
When it comes to building models, Amazon SageMaker gives you plenty of options. You can visit AWS Marketplace, pick an algorithm or a model shared by one of our partners, and deploy it on SageMaker in just a few clicks. Alternatively, you can train a model using one of the built-in algorithms, or your own code written for a popular open source ML framework (TensorFlow, PyTorch, and Apache MXNet), or your own custom code packaged in a Docker container.

You could also rely on Amazon SageMaker Autopilot, a game-changing AutoML capability. Whether you have little or no ML experience, or you’re a seasoned practitioner who needs to explore hundreds of datasets, SageMaker Autopilot takes care of everything for you with a single API call. It automatically analyzes your dataset, figures out the type of problem you’re trying to solve, builds several data processing and training pipelines, trains them, and optimizes them for maximum accuracy. In addition, the data processing and training source code is available in auto-generated notebooks that you can review and run yourself for further experimentation. SageMaker Autopilot also now creates machine learning models up to 40% faster with up to 200% higher accuracy, even with small and imbalanced datasets.

Another popular feature is Automatic Model Tuning. No more manual exploration, no more costly grid search jobs that run for days: using ML optimization, SageMaker quickly converges to high-performance models, saving you time and money, and letting you deploy the best model to production quicker.

“NerdWallet relies on data science and ML to connect customers with personalized financial products,” says Ryan Kirkman, Senior Engineering Manager. “We chose to standardize our ML workloads on AWS because it allowed us to quickly modernize our data science engineering practices, removing roadblocks and speeding time-to-delivery. With Amazon SageMaker, our data scientists can spend more time on strategic pursuits and focus more energy where our competitive advantage is—our insights into the problems we’re solving for our users.” You can learn more in this case study.

Says Tejas Bhandarkar, Senior Director of Product, Freshworks Platform: “We chose to standardize our ML workloads on AWS because we could easily build, train, and deploy machine learning models optimized for our customers’ use cases. Thanks to Amazon SageMaker, we have built more than 30,000 models for 11,000 customers while reducing training time for these models from 24 hours to under 33 minutes. With SageMaker Model Monitor, we can keep track of data drifts and retrain models to ensure accuracy. Powered by Amazon SageMaker, Freddy AI Skills is constantly evolving with smart actions, deep-data insights, and intent-driven conversations.”

#3 – Reduce Costs
Building and managing your own ML infrastructure can be costly, and Amazon SageMaker is a great alternative. In fact, we found out that the total cost of ownership (TCO) of Amazon SageMaker over a 3-year horizon is over 54% lower compared to other options, and developers can be up to 10 times more productive. This comes from the fact that Amazon SageMaker manages all the training and prediction infrastructure that ML typically requires, allowing teams to focus exclusively on studying and solving the ML problem at hand.

Furthermore, Amazon SageMaker includes many features that help training jobs run as fast and as cost-effectively as possible: optimized versions of the most popular machine learning libraries, a wide range of CPU and GPU instances with up to 100 Gbps networking, and of course Managed Spot Training, which lets you save up to 90% on your training jobs. Last but not least, Amazon SageMaker Debugger automatically identifies complex issues developing in ML training jobs. Unproductive jobs are terminated early, and you can use model information captured during training to pinpoint the root cause.

Amazon SageMaker also helps you slash your prediction costs. Thanks to Multi-Model Endpoints, you can deploy several models on a single prediction endpoint, avoiding the extra work and cost associated with running many low-traffic endpoints. For models that require some hardware acceleration without the need for a full-fledged GPU, Amazon Elastic Inference lets you save up to 90% on your prediction costs. At the other end of the spectrum, large-scale prediction workloads can rely on AWS Inferentia, a custom chip designed by AWS, for up to 30% higher throughput and up to 45% lower cost per inference compared to GPU instances.

Lyft, one of the largest transportation networks in the United States and Canada, launched its Level 5 autonomous vehicle division in 2017 to develop a self-driving system to help millions of riders. Lyft Level 5 aggregates over 10 terabytes of data each day to train ML models for their fleet of autonomous vehicles. Managing ML workloads on their own was becoming time-consuming and expensive. Says Alex Bain, Lead for ML Systems at Lyft Level 5: “Using Amazon SageMaker distributed training, we reduced our model training time from days to couple of hours. By running our ML workloads on AWS, we streamlined our development cycles and reduced costs, ultimately accelerating our mission to deliver self-driving capabilities to our customers.”

#4 – Build Secure and Compliant ML Systems
Security is always priority #1 at AWS. It’s particularly important to customers operating in regulated industries such as financial services or healthcare, as they must implement their solutions with the highest level of security and compliance. For this purpose, Amazon SageMaker implements many security features, making it compliant with the following global standards: SOC 1/2/3, PCI, ISO, FedRAMP, DoD CC SRG, IRAP, MTCS, C5, K-ISMS, ENS High, OSPAR, and HITRUST CSF. It’s also HIPAA BAA eligible.

Says Ashok Srivastava, Chief Data Officer, Intuit: “With Amazon SageMaker, we can accelerate our Artificial Intelligence initiatives at scale by building and deploying our algorithms on the platform. We will create novel large-scale machine learning and AI algorithms and deploy them on this platform to solve complex problems that can power prosperity for our customers.”

#5 – Annotate Data and Keep Humans in the Loop
As ML practitioners know, turning data into a dataset requires a lot of time and effort. To help you reduce both, Amazon SageMaker Ground Truth is a fully managed data labeling service that makes it easy to annotate and build highly accurate training datasets at any scale (text, image, video, and 3D point cloud datasets).

Says Magnus Soderberg, Director, Pathology Research, AstraZeneca: “AstraZeneca has been experimenting with machine learning across all stages of research and development, and most recently in pathology to speed up the review of tissue samples. The machine learning models first learn from a large, representative data set. Labeling the data is another time-consuming step, especially in this case, where it can take many thousands of tissue sample images to train an accurate model. AstraZeneca uses Amazon SageMaker Ground Truth, a machine learning-powered, human-in-the-loop data labeling and annotation service to automate some of the most tedious portions of this work, resulting in reduction of time spent cataloging samples by at least 50%.”

Amazon SageMaker is Evaluated
The hundreds of new features added to Amazon SageMaker since launch are testimony to our relentless innovation on behalf of customers. In fact, the service was highlighted in February 2020 as the overall leader in Gartner’s Cloud AI Developer Services Magic Quadrant. Gartner subscribers can click here to learn more about why we have an overall score of 84/100 in their “Solution Scorecard for Amazon SageMaker, July 2020”, the highest rating among our peer group. According to Gartner, we met 87% of required criteria, 73% of preferred, and 85% of optional.

Announcing a Price Reduction on GPU Instances

To thank our customers for their trust and to show our continued commitment to make Amazon SageMaker the best and most cost-effective ML service, I’m extremely happy to announce a significant price reduction on all ml.p2 and ml.p3 GPU instances. It will apply starting October 1st for all SageMaker components and across the following regions: US East (N. Virginia), US East (Ohio), US West (Oregon), EU (Ireland), EU (Frankfurt), EU (London), Canada (Central), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Seoul), Asia Pacific (Tokyo), Asia Pacific (Mumbai), and AWS GovCloud (US-Gov-West).

Instance Name Price Reduction
ml.p2.xlarge -11%
ml.p2.8xlarge -14%
ml.p2.16xlarge -18%
ml.p3.2xlarge -11%
ml.p3.8xlarge -14%
ml.p3.16xlarge -18%
ml.p3dn.24xlarge -18%

Getting Started with Amazon SageMaker
As you can see, there are a lot of exciting features in Amazon SageMaker, and I encourage you to try them out! Amazon SageMaker is available worldwide, so chances are you can easily get to work on your own datasets. The service is part of the AWS Free Tier, letting new users work with it for free for hundreds of hours during the first two months.

If you’d like to kick the tires, this tutorial will get you started in minutes. You’ll learn how to use SageMaker Studio to build, train, and deploy a classification model based on the XGBoost algorithm.

Last but not least, I just published a book named “Learn Amazon SageMaker”, a 500-page detailed tour of all SageMaker features, illustrated by more than 60 original Jupyter notebooks. It should help you get up to speed in no time.

As always, we’re looking forward to your feedback. Please share it with your usual AWS support contacts, or on the AWS Forum for SageMaker.

– Julien

Field Notes: Powering the Connected Vehicle with Amazon Alexa

Post Syndicated from Amit Kumar original https://aws.amazon.com/blogs/architecture/field-notes-powering-the-connected-vehicle-with-amazon-alexa/

Alexa has improved the in-home experience and has the potential to greatly enhance the in-car experience. This blog is a continuation of my previous blog: Field Notes: Implementing a Digital Shadow of a Connected Vehicle with AWS IoT. Multiple OEMs (Original Equipment Manufacturers) showcased this capability during CES 2020. Use cases include a person sitting in the rear seat playing a song, controlling the HVAC (heating, ventilation, and air conditioning), or paying for gas or coffee, all using Alexa. In this blog, I cover how you create a connected vehicle experience with Alexa to initiate a command such as ‘Alexa, open my trunk’.

Solution Architecture

“Alexa, open my trunk”

The preceding architecture shows the message flow for the following example:

  1. A user of a connected vehicle wants to open their trunk using an Alexa voice command. Alexa identifies the right intent based on the utterance and invokes a Lambda function. The Lambda function updates the device shadow with the desired state (desired: { "trunk": "open" }).
  2. The vehicle TCU has registered the callback function shadowRegisterDeltaCallback() and subscribes to the delta topics of the device shadow. Whenever there is a difference between the desired and reported state, the registered callback is invoked with the delta payload, so the update performed in step 1 is received here.
  3. Now the vehicle must act on the desired state, in this case the trunk status change. After performing the required action, the vehicle TCU updates the device shadow with the reported state (reported: { "trunk": "open" }).
  4. The web/mobile app subscribes to the topic $aws/things/tcu/shadow/update/accepted. Therefore, as soon as the vehicle TCU updates the shadow, the web/mobile app receives the update and synchronizes the UI state.

As part of the previous blog, we implemented #2, #3, and #4. Let’s implement #1 and incorporate it into the solution.
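Before building the Alexa skill, you can simulate the shadow update from step 1 with the AWS CLI. This is only a sketch: it assumes the thing name “tcu” from the previous blog and the AWS CLI v2 flag for passing a raw JSON payload.

$ aws iot-data update-thing-shadow \
  --thing-name "tcu" \
  --cli-binary-format raw-in-base64-out \
  --payload '{"state":{"desired":{"trunk":"open"}}}' \
  shadow-output.json

# Inspect the resulting desired/reported/delta state
$ aws iot-data get-thing-shadow --thing-name "tcu" shadow.json && cat shadow.json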

The source code (vehicle-command) of this blog is available in this code repository.

The Alexa voice command requires implementation in three key areas:

  1. Configure Alexa – listens to utterances, identifies the right intent, and invokes a Lambda function.
  2. Set up the Lambda function – interprets the command and invokes the AWS IoT Core device shadow API.
  3. Handle the command at the vehicle TCU and app – the vehicle TCU must register shadowRegisterDeltaCallback() so that any update to the device shadow triggers a callback, allowing the vehicle to perform the actual command and synchronize the state with the web/mobile app.

Let’s open the trunk using an Alexa voice command. First, set up the environment:

  • Open the AWS Cloud9 IDE created in the earlier lab and complete the following setup.

Set up permanent credentials. Note: the ASK command line interface (CLI) doesn’t work with temporary credentials, so configure it with permanent credentials.

  1. Open Cloud9 Preferences by clicking AWS Cloud9 > Preferences or by clicking the “gear” icon in the upper right corner of the Cloud9 window.
  2. Select “AWS Settings”.
  3. Disable “AWS managed temporary credentials”.
  4. Run $ aws configure.
  5. Enter the Access Key and Secret Access Key of a user that has the required permissions.
  6. Use us-east-1 as the region. The configuration is stored in ~/.aws/config.

Verify that everything worked by examining the file ~/.aws/credentials. It should resemble the following:

[default]
 aws_access_key_id = <access_key>
 aws_secret_access_key = <secrect_key>
 aws_session_token=

Remove the aws_session_token line from the credentials file.

Next, install the ASK CLI:

$ npm install ask-cli --global

Initialize the ASK CLI by issuing the following command. This creates a profile associated with your Amazon developer credentials.

$ ask configure --no-browser

When prompted, confirm that you want to link your AWS account with Alexa:

Do you want to link your AWS account in order to host your Alexa skills? Yes

# At the end, the output should look as follows:

------------------------- Initialization Complete -------------------------
Here is the summary for the profile setup:
ASK Profile: default
AWS Profile: default
Vendor ID: MXXXXXXXXXX

As part of the previous blog, you have already cloned the following git repository in the AWS Cloud9 IDE. It contains baseline code to jump-start the implementation.

$ git clone

Configure Alexa Skills

The Alexa Developer console GUI can be used, but we are doing it programmatically so it can be done at scale and supports versioning.

1. Open connected-vehicle-lab/vehicle-command/skill-package/skill.json. Two locales, en-US and en-IN, are defined in the base code for the Alexa command. Let’s add the en-GB locale to the JSON file under “manifest”/“publishingInformation”/“locales”. Similarly, you can add a locale for your preferred language:

"en-GB": {
"name": "vehicle-command",
"summary": "Control Vehicle using voice command",
"description": "Allow you to control vehicle using voice command",
"examplePhrases": [
    "Alexa open genie",
    "ask genie to lower window",
    "window up"
    ],
"keywords": []
}

If you are inserting it into the middle of the list, make sure the entries are separated by commas.

2. Create a copy of the interaction model connected-vehicle-lab/vehicle-command/skill-package/interactionModels/custom/en-US.json, rename it to en-GB.json, and add our intent:

  • We have “invocationName”: “genie”. Here, we are using “genie” as the command to invoke our Alexa skill. You can change it if needed.
  • The key elements in this JSON file are intents, slots, sample utterances, and slot types. Let’s define the slot type t_action_type with the values ‘open’, ‘close’, ‘lock’, and ‘unlock’ under “types”: [].
        {
        "name": "t_action_type",
        "values": [
            {
                "name": {
                "value": "unlock"
                }
            },
            {
                "name": {
                "value": "lock"
                }
            },
            {
                "name": {
                "value": "close"
                }
            },
            {
                "name": {
                "value": "open"
                }
            }
          ]
        }
  • Let’s add the TrunkCommandIntent under “intents”: [] and define sample utterances such as ‘lock my trunk’ and ‘open trunk’. We are using slot types to simplify the utterances and identify the operation requested by the user.
        {
            "name": "TrunkCommandIntent",
            "slots": [
            {
                "name": "t_action",
                "type": "t_action_type"
            }
            ],
            "samples": [
                "{t_action} trunk",
                "trunk {t_action}",
                "{t_action} my trunk",
                "{t_action} trunk"
            ]
}
  • Now add the same intent, slots, slot type, and sample utterances to the other locale files (en-US.json and en-IN.json) as well.

3. Let’s add the response messages in languageString.js (available at /connected-vehicle-lab/vehicle-command/lambda/custom).

TRUNK_OPEN: 'Trunk Open',
TRUNK_CLOSE: 'Trunk Close' 

If you are inserting them into the middle of the list, make sure the entries are separated by commas.

Set up the Lambda function

1. Add a Lambda function, which will be invoked by Alexa. This Lambda function handles the intent, invokes the IoT Core device shadow API, and executes the actual command (open/unlock or close/lock the trunk).

  • Open /connected-vehicle-lab/vehicle-command/lambda/custom/index.js and add our TrunkCommandIntent handler:
const TrunkCommandIntentHandler = {
                canHandle(handlerInput) {
                return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
                && Alexa.getIntentName(handlerInput.requestEnvelope) === 'TrunkCommandIntent';
                },
                    handle(handlerInput) {
                    var t_action_value = handlerInput.requestEnvelope.request.intent.slots.t_action.value;
                    console.log(t_action_value);
                    var speakOutput;
                    const obj = "trunk";
                    if (t_action_value == "unlock" || t_action_value == "open")
                    {
                        updateDeviceShadow(obj, "open");
                        speakOutput = handlerInput.t('TRUNK_OPEN')
                    }
                    else 
                    {
                        updateDeviceShadow(obj, "close");
                        speakOutput = handlerInput.t('TRUNK_CLOSE')
                    } 
                    console.log(speakOutput);
                    return handlerInput.responseBuilder
                    .speak(speakOutput)
                    //.reprompt('add a reprompt if you want to keep the session open for the user to respond')
                    .getResponse();
                }
            };
  • We have the updateDeviceShadow(obj, command) function, which actually invokes the IoT Core device shadow API:
 function updateDeviceShadow (obj, command)
    {
        shadowMessage.state.desired[obj] = command;
        var iotdata = new AWS.IotData({endpoint: ioT_EndPoint});
        var params = {
        payload: JSON.stringify(shadowMessage) , /* required */
        thingName: deviceName /* required */ 
        };
        iotdata.updateThingShadow(params, function(err, data) {
            if (err) 
            console.log(err, err.stack); // an error occurred
            else 
            console.log(data); 
            //reset the shadow 
            shadowMessage.state.desired = {}
        });
} 

2. Update the value of ioT_EndPoint from AWS IoT Core > Settings > Custom Endpoint
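You can also look up the endpoint with the AWS CLI; this is a sketch using the ATS data endpoint type, which is the one most new accounts use:

$ aws iot describe-endpoint --endpoint-type iot:Data-ATS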

3. Add TrunkCommandIntentHandler to the request handlers:

exports.handler = Alexa.SkillBuilders.custom()
    .addRequestHandlers(
        LaunchRequestHandler,
        WindowCommandIntentHandler,
        DoorCommandIntentHandler,
        TrunkCommandIntentHandler,

4. Deploy the Alexa skill:

$ cd ~/environment/connected-vehicle-lab/vehicle-command
$ ask deploy 

Handle the Command at the Vehicle TCU and App

For more detail on this section, refer to part 1 of this blog: Field Notes: Implementing a Digital Shadow of a Connected Vehicle with AWS IoT.

@ Vehicle TCU – tcuShadowRead.py has a trunk_handle() function to receive messages from the device shadow:

def trunk_handle(status):
  if status is not None:
    shadowClient.reportedShadowMessage['state']['reported']['trunk'] = status
    print ('Perform action on trunk status change : ' + str(status))

@ Web app – demo-car/js/websocket.js has a handleTrunkCommand function that receives a callback message as soon as any update happens on the device shadow:

//this function will be called by onMessageArrive
function handleTrunkCommand(trunkStatus) {
    obj = document.getElementsByClassName("action trunk")[0];
    obj.checked = trunkStatus == "open" ? true : false;
    console.log(obj.getAttribute("data-text") + " : " + obj.checked);
}

demo-car/js/demo-car.js has a handleTrunkCommand function to handle UI input and invoke the AWS IoT Core device gateway API to update the desired state:

//this function will be called when user will click on trunk checkbox
    handleTrunkCommand: function(obj) {
        obj.checked ? demoCar.shadowMessage.state.desired.trunk = "open" : demoCar.shadowMessage.state.desired.trunk = "close";
        console.log(obj.getAttribute("data-text") + " : " + demoCar.shadowMessage.state.desired.trunk);
        demoCar.accessIoTDevice();
    },

Use the Alexa skill to invoke a command

Let’s test our command, ‘Alexa, open my trunk’. We can use the command line and execute:

$ ask dialog --locale "en-GB"

Using the Alexa GUI provides an interesting visualization, as shown in the following screenshot.

  1. Open the Alexa GUI, select the ‘vehicle-command’ skill, and select the Test tab. Allow developer.amazon.com to use your microphone when prompted.
  2. Open the demo.html web app side by side with the Alexa GUI to check that the actual operation happened at the vehicle TCU and that the status is synchronized with the virtual car model.
  3. Now test the Alexa skill. You can use a voice command as well; ask or type ‘ask genie’.

Alexa developer console

Clean Up

What a fun exploration this has been! Now clean up the AWS resources created for this and the previous post to avoid incurring any future AWS service costs. Resources created by the CDK can be deleted by deleting the stack on the CloudFormation console; resources created manually need to be deleted individually.
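As a sketch, if you still have the CDK project from the previous post checked out, the stack can also be removed from the command line:

# Run from the directory that contains the CDK app from part 1
$ cdk destroy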

Conclusion

In this blog post, I showed how you can enable voice commands for a connected vehicle and enhance the in-vehicle user experience. Similarly, you can extend this solution to use cases such as ‘Alexa, open my garage’. The AWS IoT Core device shadow API does all the heavy lifting in this case: any update to the device shadow allows both the device and the user application to act. The Alexa skill acts as an interface that captures the user command and invokes the Lambda function.

Since these are all serverless services, this implementation can scale without any change to the application, and you only pay when someone invokes a command. Creating an engaging, high-quality interaction with Alexa in the vehicle is critical. You can refer to the Alexa Automotive documentation for an Alexa Built-in automotive experience.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

 
