Tag Archives: directories

Amazon SageMaker Updates – Tokyo Region, CloudFormation, Chainer, and GreenGrass ML

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/sagemaker-tokyo-summit-2018/

Today, at the AWS Summit in Tokyo we announced a number of updates and new features for Amazon SageMaker. Starting today, SageMaker is available in Asia Pacific (Tokyo)! SageMaker also now supports CloudFormation. A new machine learning framework, Chainer, is now available in the SageMaker Python SDK, in addition to MXNet and Tensorflow. Finally, support for running Chainer models on several devices was added to AWS Greengrass Machine Learning.

Amazon SageMaker Chainer Estimator


Chainer is a popular, flexible, and intuitive deep learning framework. Chainer networks work on a “Define-by-Run” scheme, where the network topology is defined dynamically via forward computation. This is in contrast to many other frameworks which work on a “Define-and-Run” scheme where the topology of the network is defined separately from the data. A lot of developers enjoy the Chainer scheme since it allows them to write their networks with native python constructs and tools.

Luckily, using Chainer with SageMaker is just as easy as using a TensorFlow or MXNet estimator. In fact, it might even be a bit easier since it’s likely you can take your existing scripts and use them to train on SageMaker with very few modifications. With TensorFlow or MXNet users have to implement a train function with a particular signature. With Chainer your scripts can be a little bit more portable as you can simply read from a few environment variables like SM_MODEL_DIR, SM_NUM_GPUS, and others. We can wrap our existing script in a if __name__ == '__main__': guard and invoke it locally or on sagemaker.


import argparse
import os

if __name__ =='__main__':

    parser = argparse.ArgumentParser()

    # hyperparameters sent by the client are passed as command-line arguments to the script.
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--batch-size', type=int, default=64)
    parser.add_argument('--learning-rate', type=float, default=0.05)

    # Data, model, and output directories
    parser.add_argument('--output-data-dir', type=str, default=os.environ['SM_OUTPUT_DATA_DIR'])
    parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
    parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])
    parser.add_argument('--test', type=str, default=os.environ['SM_CHANNEL_TEST'])

    args, _ = parser.parse_known_args()

    # ... load from args.train and args.test, train a model, write model to args.model_dir.

Then, we can run that script locally or use the SageMaker Python SDK to launch it on some GPU instances in SageMaker. The hyperparameters will get passed in to the script as CLI commands and the environment variables above will be autopopulated. When we call fit the input channels we pass will be populated in the SM_CHANNEL_* environment variables.


from sagemaker.chainer.estimator import Chainer
# Create my estimator
chainer_estimator = Chainer(
    entry_point='example.py',
    train_instance_count=1,
    train_instance_type='ml.p3.2xlarge',
    hyperparameters={'epochs': 10, 'batch-size': 64}
)
# Train my estimator
chainer_estimator.fit({'train': train_input, 'test': test_input})

# Deploy my estimator to a SageMaker Endpoint and get a Predictor
predictor = chainer_estimator.deploy(
    instance_type="ml.m4.xlarge",
    initial_instance_count=1
)

Now, instead of bringing your own docker container for training and hosting with Chainer, you can just maintain your script. You can see the full sagemaker-chainer-containers on github. One of my favorite features of the new container is built-in chainermn for easy multi-node distribution of your chainer training jobs.

There’s a lot more documentation and information available in both the README and the example notebooks.

AWS GreenGrass ML with Chainer

AWS GreenGrass ML now includes a pre-built Chainer package for all devices powered by Intel Atom, NVIDIA Jetson, TX2, and Raspberry Pi. So, now GreenGrass ML provides pre-built packages for TensorFlow, Apache MXNet, and Chainer! You can train your models on SageMaker then easily deploy it to any GreenGrass-enabled device using GreenGrass ML.

JAWS UG

I want to give a quick shout out to all of our wonderful and inspirational friends in the JAWS UG who attended the AWS Summit in Tokyo today. I’ve very much enjoyed seeing your pictures of the summit. Thanks for making Japan an amazing place for AWS developers! I can’t wait to visit again and meet with all of you.

Randall

Congratulations to Oracle on MySQL 8.0

Post Syndicated from Michael "Monty" Widenius original http://monty-says.blogspot.com/2018/04/congratulations-to-oracle-on-mysql-80.html

Last week, Oracle announced the general availability of MySQL 8.0. This is good news for database users, as it means Oracle is still developing MySQL.

I decide to celebrate the event by doing a quick test of MySQL 8.0. Here follows a step-by-step description of my first experience with MySQL 8.0.
Note that I did the following without reading the release notes, as is what I have done with every MySQL / MariaDB release up to date; In this case it was not the right thing to do.

I pulled MySQL 8.0 from [email protected]:mysql/mysql-server.git
I was pleasantly surprised that ‘cmake . ; make‘ worked without without any compiler warnings! I even checked the used compiler options and noticed that MySQL was compiled with -Wall + several other warning flags. Good job MySQL team!

I did have a little trouble finding the mysqld binary as Oracle had moved it to ‘runtime_output_directory’; Unexpected, but no big thing.

Now it’s was time to install MySQL 8.0.

I did know that MySQL 8.0 has removed mysql_install_db, so I had to use the mysqld binary directly to install the default databases:
(I have specified datadir=/my/data3 in the /tmp/my.cnf file)

> cd runtime_output_directory
> mkdir /my/data3
> ./mysqld –defaults-file=/tmp/my.cnf –install

2018-04-22T12:38:18.332967Z 1 [ERROR] [MY-011011] [Server] Failed to find valid data directory.
2018-04-22T12:38:18.333109Z 0 [ERROR] [MY-010020] [Server] Data Dictionary initialization failed.
2018-04-22T12:38:18.333135Z 0 [ERROR] [MY-010119] [Server] Aborting

A quick look in mysqld –help –verbose output showed that the right command option is –-initialize. My bad, lets try again,

> ./mysqld –defaults-file=/tmp/my.cnf –initialize

2018-04-22T12:39:31.910509Z 0 [ERROR] [MY-010457] [Server] –initialize specified but the data directory has files in it. Aborting.
2018-04-22T12:39:31.910578Z 0 [ERROR] [MY-010119] [Server] Aborting

Now I used the right options, but still didn’t work.
I took a quick look around:

> ls /my/data3/
binlog.index

So even if the mysqld noticed that the data3 directory was wrong, it still wrote things into it.  This even if I didn’t have –log-binlog enabled in the my.cnf file. Strange, but easy to fix:

> rm /my/data3/binlog.index
> ./mysqld –defaults-file=/tmp/my.cnf –initialize

2018-04-22T12:40:45.633637Z 0 [ERROR] [MY-011071] [Server] unknown variable ‘max-tmp-tables=100’
2018-04-22T12:40:45.633657Z 0 [Warning] [MY-010952] [Server] The privilege system failed to initialize correctly. If you have upgraded your server, make sure you’re executing mysql_upgrade to correct the issue.
2018-04-22T12:40:45.633663Z 0 [ERROR] [MY-010119] [Server] Aborting

The warning about the privilege system confused me a bit, but I ignored it for the time being and removed from my configuration files the variables that MySQL 8.0 doesn’t support anymore. I couldn’t find a list of the removed variables anywhere so this was done with the trial and error method.

> ./mysqld –defaults-file=/tmp/my.cnf

2018-04-22T12:42:56.626583Z 0 [ERROR] [MY-010735] [Server] Can’t open the mysql.plugin table. Please run mysql_upgrade to create it.
2018-04-22T12:42:56.827685Z 0 [Warning] [MY-010015] [Repl] Gtid table is not ready to be used. Table ‘mysql.gtid_executed’ cannot be opened.
2018-04-22T12:42:56.838501Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2018-04-22T12:42:56.848375Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables
2018-04-22T12:42:56.848863Z 0 [ERROR] [MY-013129] [Server] A message intended for a client cannot be sent there as no client-session is attached. Therefore, we’re sending the information to the error-log instead: MY-001146 – Table ‘mysql.component’ doesn’t exist
2018-04-22T12:42:56.848916Z 0 [Warning] [MY-013129] [Server] A message intended for a client cannot be sent there as no client-session is attached. Therefore, we’re sending the information to the error-log instead: MY-003543 – The mysql.component table is missing or has an incorrect definition.
….
2018-04-22T12:42:56.854141Z 0 [System] [MY-010931] [Server] /home/my/mysql-8.0/runtime_output_directory/mysqld: ready for connections. Version: ‘8.0.11’ socket: ‘/tmp/mysql.sock’ port: 3306 Source distribution.

I figured out that if there is a single wrong variable in the configuration file, running mysqld –initialize will leave the database in an inconsistent state. NOT GOOD! I am happy I didn’t try this in a production system!

Time to start over from the beginning:

> rm -r /my/data3/*
> ./mysqld –defaults-file=/tmp/my.cnf –initialize

2018-04-22T12:44:45.548960Z 5 [Note] [MY-010454] [Server] A temporary password is generated for [email protected]: px)NaaSp?6um
2018-04-22T12:44:51.221751Z 0 [System] [MY-013170] [Server] /home/my/mysql-8.0/runtime_output_directory/mysqld (mysqld 8.0.11) initializing of server has completed

Success!

I wonder why the temporary password is so complex; It could easily have been something that one could easily remember without decreasing security, it’s temporary after all. No big deal, one can always paste it from the logs. (Side note: MariaDB uses socket authentication on many system and thus doesn’t need temporary installation passwords).

Now lets start the MySQL server for real to do some testing:

> ./mysqld –defaults-file=/tmp/my.cnf

2018-04-22T12:45:43.683484Z 0 [System] [MY-010931] [Server] /home/my/mysql-8.0/runtime_output_directory/mysqld: ready for connections. Version: ‘8.0.11’ socket: ‘/tmp/mysql.sock’ port: 3306 Source distribution.

And the lets start the client:

> ./client/mysql –socket=/tmp/mysql.sock –user=root –password=”px)NaaSp?6um”
ERROR 2059 (HY000): Plugin caching_sha2_password could not be loaded: /usr/local/mysql/lib/plugin/caching_sha2_password.so: cannot open shared object file: No such file or directory

Apparently MySQL 8.0 doesn’t work with old MySQL / MariaDB clients by default 🙁

I was testing this in a system with MariaDB installed, like all modern Linux system today, and didn’t want to use the MySQL clients or libraries.

I decided to try to fix this by changing the authentication to the native (original) MySQL authentication method.

> mysqld –skip-grant-tables

> ./client/mysql –socket=/tmp/mysql.sock –user=root
ERROR 1045 (28000): Access denied for user ‘root’@’localhost’ (using password: NO)

Apparently –skip-grant-tables is not good enough anymore. Let’s try again with:

> mysqld –skip-grant-tables –default_authentication_plugin=mysql_native_password

> ./client/mysql –socket=/tmp/mysql.sock –user=root mysql
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 8.0.11 Source distribution

Great, we are getting somewhere, now lets fix “root”  to work with the old authenticaion:

MySQL [mysql]> update mysql.user set plugin=”mysql_native_password”,authentication_string=password(“test”) where user=”root”;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ‘(“test”) where user=”root”‘ at line 1

A quick look in the MySQL 8.0 release notes told me that the PASSWORD() function is removed in 8.0. Why???? I don’t know how one in MySQL 8.0 is supposed to generate passwords compatible with old installations of MySQL. One could of course start an old MySQL or MariaDB version, execute the password() function and copy the result.

I decided to fix this the easy way and use an empty password:

(Update:: I later discovered that the right way would have been to use: FLUSH PRIVILEGES;  ALTER USER’ root’@’localhost’ identified by ‘test’  ; I however dislike this syntax as it has the password in clear text which is easy to grab and the command can’t be used to easily update the mysql.user table. One must also disable the –skip-grant mode to do use this)

MySQL [mysql]> update mysql.user set plugin=”mysql_native_password”,authentication_string=”” where user=”root”;
Query OK, 1 row affected (0.077 sec)
Rows matched: 1 Changed: 1 Warnings: 0
 
I restarted mysqld:
> mysqld –default_authentication_plugin=mysql_native_password

> ./client/mysql –user=root –password=”” mysql
ERROR 1862 (HY000): Your password has expired. To log in you must change it using a client that supports expired passwords.

Ouch, forgot that. Lets try again:

> mysqld –skip-grant-tables –default_authentication_plugin=mysql_native_password

> ./client/mysql –user=root –password=”” mysql
MySQL [mysql]> update mysql.user set password_expired=”N” where user=”root”;

Now restart and test worked:

> ./mysqld –default_authentication_plugin=mysql_native_password

>./client/mysql –user=root –password=”” mysql

Finally I had a working account that I can use to create other users!

When looking at mysqld –help –verbose again. I noticed the option:

–initialize-insecure
Create the default database and exit. Create a super user
with empty password.

I decided to check if this would have made things easier:

> rm -r /my/data3/*
> ./mysqld –defaults-file=/tmp/my.cnf –initialize-insecure

2018-04-22T13:18:06.629548Z 5 [Warning] [MY-010453] [Server] [email protected] is created with an empty password ! Please consider switching off the –initialize-insecure option.

Hm. Don’t understand the warning as–initialize-insecure is not an option that one would use more than one time and thus nothing one would ‘switch off’.

> ./mysqld –defaults-file=/tmp/my.cnf

> ./client/mysql –user=root –password=”” mysql
ERROR 2059 (HY000): Plugin caching_sha2_password could not be loaded: /usr/local/mysql/lib/plugin/caching_sha2_password.so: cannot open shared object file: No such file or directory

Back to the beginning 🙁

To get things to work with old clients, one has to initialize the database with:
> ./mysqld –defaults-file=/tmp/my.cnf –initialize-insecure –default_authentication_plugin=mysql_native_password

Now I finally had MySQL 8.0 up and running and thought I would take it up for a spin by running the “standard” MySQL/MariaDB sql-bench test suite. This was removed in MySQL 5.7, but as I happened to have MariaDB 10.3 installed, I decided to run it from there.

sql-bench is a single threaded benchmark that measures the “raw” speed for some common operations. It gives you the ‘maximum’ performance for a single query. Its different from other benchmarks that measures the maximum throughput when you have a lot of users, but sql-bench still tells you a lot about what kind of performance to expect from the database.

I tried first to be clever and create the “test” database, that I needed for sql-bench, with
> mkdir /my/data3/test

but when I tried to run the benchmark, MySQL 8.0 complained that the test database didn’t exist.

MySQL 8.0 has gone away from the original concept of MySQL where the user can easily
create directories and copy databases into the database directory. This may have serious
implication for anyone doing backup of databases and/or trying to restore a backup with normal OS commands.

I created the ‘test’ database with mysqladmin and then tried to run sql-bench:

> ./run-all-tests –user=root

The first run failed in test-ATIS:

Can’t execute command ‘create table class_of_service (class_code char(2) NOT NULL,rank tinyint(2) NOT NULL,class_description char(80) NOT NULL,PRIMARY KEY (class_code))’
Error: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ‘rank tinyint(2) NOT NULL,class_description char(80) NOT NULL,PRIMARY KEY (class_’ at line 1

This happened because ‘rank‘ is now a reserved word in MySQL 8.0. This is also reserved in ANSI SQL, but I don’t know of any other database that has failed to run test-ATIS before. I have in the past run it against Oracle, PostgreSQL, Mimer, MSSQL etc without any problems.

MariaDB also has ‘rank’ as a keyword in 10.2 and 10.3 but one can still use it as an identifier.

I fixed test-ATIS and then managed to run all tests on MySQL 8.0.

I did run the test both with MySQL 8.0 and MariaDB 10.3 with the InnoDB storage engine and by having identical values for all InnoDB variables, table-definition-cache and table-open-cache. I turned off performance schema for both databases. All test are run with a user with an empty password (to keep things comparable and because it’s was too complex to generate a password in MySQL 8.0)

The result are as follows
Results per test in seconds:

Operation         |MariaDB|MySQL-8|

———————————–
ATIS              | 153.00| 228.00|
alter-table       |  92.00| 792.00|
big-tables        | 990.00|2079.00|
connect           | 186.00| 227.00|
create            | 575.00|4465.00|
insert            |4552.00|8458.00|
select            | 333.00| 412.00|
table-elimination |1900.00|3916.00|
wisconsin         | 272.00| 590.00|
———————————–

This is of course just a first view of the performance of MySQL 8.0 in a single user environment. Some reflections about the results:

  • Alter-table test is slower (as expected) in 8.0 as some of the alter tests benefits of the instant add column in MariaDB 10.3.
  • connect test is also better for MariaDB as we put a lot of efforts to speed this up in MariaDB 10.2
  • table-elimination shows an optimization in MariaDB for the  Anchor table model, which MySQL doesn’t have.
  • CREATE and DROP TABLE is almost 8 times slower in MySQL 8.0 than in MariaDB 10.3. I assume this is the cost of ‘atomic DDL’. This may also cause performance problems for any thread using the data dictionary when another thread is creating/dropping tables.
  • When looking at the individual test results, MySQL 8.0 was slower in almost every test, in many significantly slower.
  • The only test where MySQL was faster was “update_with_key_prefix”. I checked this and noticed that there was a bug in the test and the columns was updated to it’s original value (which should be instant with any storage engine). This is an old bug that MySQL has found and fixed and that we have not been aware of in the test or in MariaDB.
  • While writing this, I noticed that MySQL 8.0 is now using utf8mb4 as the default character set instead of latin1. This may affect some of the benchmarks slightly (not much as most tests works with numbers and Oracle claims that utf8mb4 is only 20% slower than latin1), but needs to be verified.
  • Oracle claims that MySQL 8.0 is much faster on multi user benchmarks. The above test indicates that they may have done this by sacrificing single user performance.
  •  We need to do more and many different benchmarks to better understand exactly what is going on. Stay tuned!

Short summary of my first run with MySQL 8.0:

  • Using the new caching_sha2_password authentication as default for new installation is likely to cause a lot of problems for users. No old application will be able to use MySQL 8.0, installed with default options, without moving to MySQL’s client libraries. While working on this blog I saw MySQL users complain on IRC that not even MySQL Workbench can authenticate with MySQL 8.0. This is the first time in MySQL’s history where such an incompatible change has ever been done!
  • Atomic DDL is a good thing (We plan to have this in MariaDB 10.4), but it should not have such a drastic impact on performance. I am also a bit skeptical of MySQL 8.0 having just one copy of the data dictionary as if this gets corrupted you will lose all your data. (Single point of failure)
  • MySQL 8.0 has several new reserved words and has removed a lot of variables, which makes upgrades hard. Before upgrading to MySQL 8.0 one has to check all one’s databases and applications to ensure that there are no conflicts.
  • As my test above shows, if you have a single deprecated variable in your configuration files, the installation of MySQL will abort and can leave the database in inconsistent state. I did of course my tests by installing into an empty data dictionary, but one can assume that some of the problems may also happen when upgrading an old installation.

Conclusions:
In many ways, MySQL 8.0 has caught up with some earlier versions of MariaDB. For instance, in MariaDB 10.0, we introduced roles (four years ago). In MariaDB 10.1, we introduced encrypted redo/undo logs (three years ago). In MariaDB 10.2, we introduced window functions and CTEs (a year ago). However, some catch-up of MariaDB Server 10.2 features still remains for MySQL (such as check constraints, binlog compression, and log-based rollback).

MySQL 8.0 has a few new interesting features (mostly Atomic DDL and JSON TABLE functions), but at the same time MySQL has strayed away from some of the fundamental corner stone principles of MySQL:

From the start of the first version of MySQL in 1995, all development has been focused around 3 core principles:

  • Ease of use
  • Performance
  • Stability

With MySQL 8.0, Oracle has sacrifices 2 of 3 of these.

In addition (as part of ease of use), while I was working on MySQL, we did our best to ensure that the following should hold:

  • Upgrades should be trivial
  • Things should be kept compatible, if possible (don’t remove features/options/functions that are used)
  • Minimize reserved words, don’t remove server variables
  • One should be able to use normal OS commands to create and drop databases, copy and move tables around within the same system or between different systems. With 8.0 and data dictionary taking backups of specific tables will be hard, even if the server is not running.
  • mysqldump should always be usable backups and to move to new releases
  • Old clients and application should be able to use ‘any’ MySQL server version unchanged. (Some Oracle client libraries, like C++, by default only supports the new X protocol and can thus not be used with older MySQL or any MariaDB version)

We plan to add a data dictionary to MariaDB 10.4 or MariaDB 10.5, but in a way to not sacrifice any of the above principles!

The competition between MySQL and MariaDB is not just about a tactical arms race on features. It’s about design philosophy, or strategic vision, if you will.

This shows in two main ways: our respective view of the Storage Engine structure, and of the top-level direction of the roadmap.

On the Storage Engine side, MySQL is converging on InnoDB, even for clustering and partitioning. In doing so, they are abandoning the advantages of multiple ways of storing data. By contrast, MariaDB sees lots of value in the Storage Engine architecture: MariaDB Server 10.3 will see the general availability of MyRocks (for write-intensive workloads) and Spider (for scalable workloads). On top of that, we have ColumnStore for analytical workloads. One can use the CONNECT engine to join with other databases. The use of different storage engines for different workloads and different hardware is a competitive differentiator, now more than ever.

On the roadmap side, MySQL is carefully steering clear of features that close the gap between MySQL and Oracle. MariaDB has no such constraints. With MariaDB 10.3, we are introducing PL/SQL compatibility (Oracle’s stored procedures) and AS OF (built-in system versioned tables with point-in-time querying). For both of those features, MariaDB is the first Open Source database doing so. I don’t except Oracle to provide any of the above features in MySQL!

Also on the roadmap side, MySQL is not working with the ecosystem in extending the functionality. In 2017, MariaDB accepted more code contributions in one year, than MySQL has done during its entire lifetime, and the rate is increasing!

I am sure that the experience I had with testing MySQL 8.0 would have been significantly better if MySQL would have an open development model where the community could easily participate in developing and testing MySQL continuously. Most of the confusing error messages and strange behavior would have been found and fixed long before the GA release.

Before upgrading to MySQL 8.0 please read https://dev.mysql.com/doc/refman/8.0/en/upgrading-from-previous-series.html to see what problems you can run into! Don’t expect that old installations or applications will work out of the box without testing as a lot of features and options has been removed (query cache, partition of myisam tables etc)! You probably also have to revise your backup methods, especially if you want to ever restore just a few tables. (With 8.0, I don’t know how this can be easily done).

According to the MySQL 8.0 release notes, one can’t use mysqldump to copy a database to MySQL 8.0. One has to first to move to a MySQL 5.7 GA version (with mysqldump, as recommended by Oracle) and then to MySQL 8.0 with in-place update. I assume this means that all old mysqldump backups are useless for MySQL 8.0?

MySQL 8.0 seams to be a one way street to an unknown future. Up to MySQL 5.7 it has been trivial to move to MariaDB and one could always move back to MySQL with mysqldump. All MySQL client libraries has worked with MariaDB and all MariaDB client libraries has worked with MySQL. With MySQL 8.0 this has changed in the wrong direction.

As long as you are using MySQL 5.7 and below you have choices for your future, after MySQL 8.0 you have very little choice. But don’t despair, as MariaDB will always be able to load a mysqldump file and it’s very easy to upgrade your old MySQL installation to MariaDB 🙂

I wish you good luck to try MySQL 8.0 (and also the upcoming MariaDB 10.3)!

How to Delegate Administration of Your AWS Managed Microsoft AD Directory to Your On-Premises Active Directory Users

Post Syndicated from Vijay Sharma original https://aws.amazon.com/blogs/security/how-to-delegate-administration-of-your-aws-managed-microsoft-ad-directory-to-your-on-premises-active-directory-users/

You can now enable your on-premises users administer your AWS Directory Service for Microsoft Active Directory, also known as AWS Managed Microsoft AD. Using an Active Directory (AD) trust and the new AWS delegated AD security groups, you can grant administrative permissions to your on-premises users by managing group membership in your on-premises AD directory. This simplifies how you manage who can perform administration. It also makes it easier for your administrators because they can sign in to their existing workstation with their on-premises AD credential to administer your AWS Managed Microsoft AD.

AWS created new domain local AD security groups (AWS delegated groups) in your AWS Managed Microsoft AD directory. Each AWS delegated group has unique AD administrative permissions. Users that are members in the new AWS delegated groups get permissions to perform administrative tasks, such as add users, configure fine-grained password policies and enable Microsoft enterprise Certificate Authority. Because the AWS delegated groups are domain local in scope, you can use them through an AD Trust to your on-premises AD. This eliminates the requirement to create and use separate identities to administer your AWS Managed Microsoft AD. Instead, by adding selected on-premises users to desired AWS delegated groups, you can grant your administrators some or all of the permissions. You can simplify this even further by adding on-premises AD security groups to the AWS delegated groups. This enables you to add and remove users from your on-premises AD security group so that they can manage administrative permissions in your AWS Managed Microsoft AD.

In this blog post, I will show you how to delegate permissions to your on-premises users to perform an administrative task–configuring fine-grained password policies–in your AWS Managed Microsoft AD directory. You can follow the steps in this post to delegate other administrative permissions, such as configuring group Managed Service Accounts and Kerberos constrained delegation, to your on-premises users.

Background

Until now, AWS Managed Microsoft AD delegated administrative permissions for your directory by creating AD security groups in your Organization Unit (OU) and authorizing these AWS delegated groups for common administrative activities. The admin user in your directory created user accounts within your OU, and granted these users permissions to administer your directory by adding them to one or more of these AWS delegated groups.

However, if you used your AWS Managed Microsoft AD with a trust to an on-premises AD forest, you couldn’t add users from your on-premises directory to these AWS delegated groups. This is because AWS created the AWS delegated groups with global scope, which restricts adding users from another forest. This necessitated that you create different user accounts in AWS Managed Microsoft AD for the purpose of administration. As a result, AD administrators typically had to remember additional credentials for AWS Managed Microsoft AD.

To address this, AWS created new AWS delegated groups with domain local scope in a separate OU called AWS Delegated Groups. These new AWS delegated groups with domain local scope are more flexible and permit adding users and groups from other domains and forests. This allows your admin user to delegate your on-premises users and groups administrative permissions to your AWS Managed Microsoft AD directory.

Note: If you already have an existing AWS Managed Microsoft AD directory containing the original AWS delegated groups with global scope, AWS preserved the original AWS delegated groups in the event you are currently using them with identities in AWS Managed Microsoft AD. AWS recommends that you transition to use the new AWS delegated groups with domain local scope. All newly created AWS Managed Microsoft AD directories have the new AWS delegated groups with domain local scope only.

Now, I will show you the steps to delegate administrative permissions to your on-premises users and groups to configure fine-grained password policies in your AWS Managed Microsoft AD directory.

Prerequisites

For this post, I assume you are familiar with AD security groups and how security group scope rules work. I also assume you are familiar with AD trusts.

The instructions in this blog post require you to have the following components running:

Solution overview

I will now show you how to manage which on-premises users have delegated permissions to administer your directory by efficiently using on-premises AD security groups to manage these permissions. I will do this by:

  1. Adding on-premises groups to an AWS delegated group. In this step, you sign in to management instance connected to AWS Managed Microsoft AD directory as admin user and add on-premises groups to AWS delegated groups.
  2. Administer your AWS Managed Microsoft AD directory as on-premises user. In this step, you sign in to a workstation connected to your on-premises AD using your on-premises credentials and administer your AWS Managed Microsoft AD directory.

For the purpose of this blog, I already have an on-premises AD directory (in this case, on-premises.com). I also created an AWS Managed Microsoft AD directory (in this case, corp.example.com) that I use with Amazon RDS for SQL Server. To enable Integrated Windows Authentication to my on-premises.com domain, I established a one-way outgoing trust from my AWS Managed Microsoft AD directory to my on-premises AD directory. To administer my AWS Managed Microsoft AD, I created an Amazon EC2 for Windows Server instance (in this case, Cloud Management). I also have an on-premises workstation (in this case, On-premises Management), that is connected to my on-premises AD directory.

The following diagram represents the relationships between the on-premises AD and the AWS Managed Microsoft AD directory.

The left side represents the AWS Cloud containing AWS Managed Microsoft AD directory. I connected the directory to the on-premises AD directory via a 1-way forest trust relationship. When AWS created my AWS Managed Microsoft AD directory, AWS created a group called AWS Delegated Fine Grained Password Policy Administrators that has permissions to configure fine-grained password policies in AWS Managed Microsoft AD.

The right side of the diagram represents the on-premises AD directory. I created a global AD security group called On-premises fine grained password policy admins and I configured it so all members can manage fine grained password policies in my on-premises AD. I have two administrators in my company, John and Richard, who I added as members of On-premises fine grained password policy admins. I want to enable John and Richard to also manage fine grained password policies in my AWS Managed Microsoft AD.

While I could add John and Richard to the AWS Delegated Fine Grained Password Policy Administrators individually, I want a more efficient way to delegate and remove permissions for on-premises users to manage fine grained password policies in my AWS Managed Microsoft AD. In fact, I want to assign permissions to the same people that manage password policies in my on-premises directory.

Diagram showing delegation of administrative permissions to on-premises users

To do this, I will:

  1. As admin user, add the On-premises fine grained password policy admins as member of the AWS Delegated Fine Grained Password Policy Administrators security group from my Cloud Management machine.
  2. Manage who can administer password policies in my AWS Managed Microsoft AD directory by adding and removing users as members of the On-premises fine grained password policy admins. Doing so enables me to perform all my delegation work in my on-premises directory without the need to use a remote desktop protocol (RDP) session to my Cloud Management instance. In this case, Richard, who is a member of On-premises fine grained password policy admins group can now administer AWS Managed Microsoft AD directory from On-premises Management workstation.

Although I’m showing a specific case using fine grained password policy delegation, you can do this with any of the new AWS delegated groups and your on-premises groups and users.

Let’s get started.

Step 1 – Add on-premises groups to AWS delegated groups

In this step, open an RDP session to the Cloud Management instance and sign in as the admin user in your AWS Managed Microsoft AD directory. Then, add your users and groups from your on-premises AD to AWS delegated groups in AWS Managed Microsoft AD directory. In this example, I do the following:

  1. Sign in to the Cloud Management instance with the user name admin and the password that you set for the admin user when you created your directory.
  2. Open the Microsoft Windows Server Manager and navigate to Tools > Active Directory Users and Computers.
  3. Switch to the tree view and navigate to corp.example.com > AWS Delegated Groups. Right-click AWS Delegated Fine Grained Password Policy Administrators and select Properties.
  4. In the AWS Delegated Fine Grained Password Policy window, switch to Members tab and choose Add.
  5. In the Select Users, Contacts, Computers, Service Accounts, or Groups window, choose Locations.
  6. In the Locations window, select on-premises.com domain and choose OK.
  7. In the Enter the object names to select box, enter on-premises fine grained password policy admins and choose Check Names.
  8. Because I have a 1-way trust from AWS Managed Microsoft AD to my on-premises AD, Windows prompts me to enter credentials for an on-premises user account that has permissions to complete the search. If I had a 2-way trust and the admin account in my AWS Managed Microsoft AD has permissions to read my on-premises directory, Windows will not prompt me.In the Windows Security window, enter the credentials for an account with permissions for on-premises.com and choose OK.
  9. Click OK to add On-premises fine grained password policy admins group as a member of the AWS Delegated Fine Grained Password Policy Administrators group in your AWS Managed Microsoft AD directory.

At this point, any user that is a member of On-premises fine grained password policy admins group has permissions to manage password policies in your AWS Managed Microsoft AD directory.

Step 2 – Administer your AWS Managed Microsoft AD as on-premises user

Any member of the on-premises group(s) that you added to an AWS delegated group inherited the permissions of the AWS delegated group.

In this example, Richard signs in to the On-premises Management instance. Because Richard inherited permissions from Delegated Fine Grained Password Policy Administrators, he can now administer fine grained password policies in the AWS Managed Microsoft AD directory using on-premises credentials.

  1. Sign in to the On-premises Management instance as Richard.
  2. Open the Microsoft Windows Server Manager and navigate to Tools > Active Directory Users and Computers.
  3. Switch to the tree view, right-click Active Directory Users and Computers, and then select Change Domain.
  4. In the Change Domain window, enter corp.example.com, and then choose OK.
  5. You’ll be connected to your AWS Managed Microsoft AD domain:

Richard can now administer the password policies. Because John is also a member of the AWS delegated group, John can also perform password policy administration the same way.

In future, if Richard moves to another division within the company and you hire Judy as a replacement for Richard, you can simply remove Richard from On-premises fine grained password policy admins group and add Judy to this group. Richard will no longer have administrative permissions, while Judy can now administer password policies for your AWS Managed Microsoft AD directory.

Summary

We’ve tried to make it easier for you to administer your AWS Managed Microsoft AD directory by creating AWS delegated groups with domain local scope. You can add your on-premises AD groups to the AWS delegated groups. You can then control who can administer your directory by managing group membership in your on-premises AD directory. Your administrators can sign in to their existing on-premises workstations using their on-premises credentials and administer your AWS Managed Microsoft AD directory. I encourage you to explore the new AWS delegated security groups by using Active Directory Users and Computers from the management instance for your AWS Managed Microsoft AD. To learn more about AWS Directory Service, see the AWS Directory Service home page. If you have questions, please post them on the Directory Service forum. If you have comments about this post, submit them in the “Comments” section below.

 

FilePursuit Finds Amazing Files All Year Round, Not Just at Christmas

Post Syndicated from Andy original https://torrentfreak.com/filepursuit-finds-amazing-files-all-year-round-not-just-at-christmas-171225/

Ask someone to name a search engine and it’s likely that 95 out of 100 will say ‘Google’. There are plenty of others, of course, but its sheer dominance means that even giants like Bing have to wait around for a mention.

However, if people are looking for something special, such as video and music files, for example, there’s an interesting search engine that’s largely flying under the radar. FilePursuit, accessible via the web or directly from its dedicated Android apps, is somewhat of a revelation.

What FilePursuit does is trawl the Internet looking for web servers that are not only packed with content but are readily accessible to the outside world. This means that a search on the site invariably turns up treasure troves of material, all of it for immediate and direct HTTP download.

TorrentFreak caught up with the operator of the site who himself is a very interesting character.

“I’m a 21-year-old undergrad student from New Delhi, India, currently studying engineering. I started this file search engine project all by myself to learn web development and this is my first project,” he informs TF.

“I picked this project because I was surprised to find that there are lots of ‘open directory’ websites and no one is maintaining any type of record or database on them. There are thousands of ‘open directory’ websites containing a lot of amazing stuff not discovered yet, so I made them discoverable.”

Plenty of files from almost any search

FilePursuit began its life around September 2016 and since then has been receiving website submission requests (sites to be indexed by FilePursuit) from people all over the world. As such the platform is somewhat of a community effort but in respect of running the operation, it’s all done by one man.

“FilePursuit saves time in two ways: by eliminating the need to find file manually, and by performing searches at high speeds efficiently. Without this, you would have to look at sites one by one and pore over the contents of each carefully – a tedious prospect,” he explains.

“FilePursuit automatically compares your criteria to billions of webpages and gives you results in a fraction of a second. You can perform hundreds of searches in the course of a few minutes, altering the criteria as you narrow down results.”

So if Google dominates the search space, why doesn’t it do a better job of finding files than the relatively low-key FilePursuit? Its operator says it’s all about functionality.

“FilePursuit is a file search engine, it generates file links as results while other search engines give out webpages as results. However, it’s possible to search for file links directly from Google too but it’s limited to documents only. On FilePursuit you can search for almost any filetype just by selecting ‘custom’ and typing filetype in search results.”

Of course, it would be impossible for FilePursuit to find any files if webmasters and server operators didn’t leave them open to the public. Considering it’s simplicity itself to find all the latest movies and TV shows widely accessible, is this a question of stupidity, kindness, carelessness, or something else?

“In my opinion, most people are unaware that they have created an open directory and on the other hand some people want to share interesting files from their servers, which is very generous of them,” FilePursuit’s creator says.

When carrying out searches it really is amazing what FilePursuit can turn up. Files lead to directory results and some can contain many thousands of files, from every music artist one can think of through to otherwise private text files that people really should take more care over. Other things are really quite odd.

“When I look for ‘open directory’ websites, sometimes I find really amazing stuff and sometimes even bizarre stuff too. This one time, I found a collection of funeral recordings,” FilePursuit’s owner says.

While even funeral recordings can have a copyright owner somewhere, it’s the more regular mainstream content that’s most easily found with the service. The site doesn’t carry any copyrighted content at all but that doesn’t mean it’s unresponsive to takedown demands.

“I have more than three million file links indexed in my database so it can be a bit hard for me to check for copyrighted content. Although whenever I receive a mail from copyright holders or someone representing copyright holders, I always uphold their request of deleting the file link from my database and also explain to them that the file link they requested me to delete, that particular file may still exist.”

In recent months, FilePursuit has enjoyed a significant upsurge in traffic but it’s still a relatively small player in the search engine space with around 7,000 to 10,000 hits per day. However, this clever site is able to deal with five times that traffic and upgrading servers to cope with surges can be carried out in two to three minutes, “at most.”

So the big question remains – What will you find under the tree today?

FilePursuit website here, Android apps (free, pro)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

How to Easily Apply Amazon Cloud Directory Schema Changes with In-Place Schema Upgrades

Post Syndicated from Mahendra Chheda original https://aws.amazon.com/blogs/security/how-to-easily-apply-amazon-cloud-directory-schema-changes-with-in-place-schema-upgrades/

Now, Amazon Cloud Directory makes it easier for you to apply schema changes across your directories with in-place schema upgrades. Your directory now remains available while Cloud Directory applies backward-compatible schema changes such as the addition of new fields. Without migrating data between directories or applying code changes to your applications, you can upgrade your schemas. You also can view the history of your schema changes in Cloud Directory by using version identifiers, which help you track and audit schema versions across directories. If you have multiple instances of a directory with the same schema, you can view the version history of schema changes to manage your directory fleet and ensure that all directories are running with the same schema version.

In this blog post, I demonstrate how to perform an in-place schema upgrade and use schema versions in Cloud Directory. I add additional attributes to an existing facet and add a new facet to a schema. I then publish the new schema and apply it to running directories, upgrading the schema in place. I also show how to view the version history of a directory schema, which helps me to ensure my directory fleet is running the same version of the schema and has the correct history of schema changes applied to it.

Note: I share Java code examples in this post. I assume that you are familiar with the AWS SDK and can use Java-based code to build a Cloud Directory code example. You can apply the concepts I cover in this post to other programming languages such as Python and Ruby.

Cloud Directory fundamentals

I will start by covering a few Cloud Directory fundamentals. If you are already familiar with the concepts behind Cloud Directory facets, schemas, and schema lifecycles, you can skip to the next section.

Facets: Groups of attributes. You use facets to define object types. For example, you can define a device schema by adding facets such as computers, phones, and tablets. A computer facet can track attributes such as serial number, make, and model. You can then use the facets to create computer objects, phone objects, and tablet objects in the directory to which the schema applies.

Schemas: Collections of facets. Schemas define which types of objects can be created in a directory (such as users, devices, and organizations) and enforce validation of data for each object class. All data within a directory must conform to the applied schema. As a result, the schema definition is essentially a blueprint to construct a directory with an applied schema.

Schema lifecycle: The four distinct states of a schema: Development, Published, Applied, and Deleted. Schemas in the Published and Applied states have version identifiers and cannot be changed. Schemas in the Applied state are used by directories for validation as applications insert or update data. You can change schemas in the Development state as many times as you need them to. In-place schema upgrades allow you to apply schema changes to an existing Applied schema in a production directory without the need to export and import the data populated in the directory.

How to add attributes to a computer inventory application schema and perform an in-place schema upgrade

To demonstrate how to set up schema versioning and perform an in-place schema upgrade, I will use an example of a computer inventory application that uses Cloud Directory to store relationship data. Let’s say that at my company, AnyCompany, we use this computer inventory application to track all computers we give to our employees for work use. I previously created a ComputerSchema and assigned its version identifier as 1. This schema contains one facet called ComputerInfo that includes attributes for SerialNumber, Make, and Model, as shown in the following schema details.

Schema: ComputerSchema
Version: 1

Facet: ComputerInfo
Attribute: SerialNumber, type: Integer
Attribute: Make, type: String
Attribute: Model, type: String

AnyCompany has offices in Seattle, Portland, and San Francisco. I have deployed the computer inventory application for each of these three locations. As shown in the lower left part of the following diagram, ComputerSchema is in the Published state with a version of 1. The Published schema is applied to SeattleDirectory, PortlandDirectory, and SanFranciscoDirectory for AnyCompany’s three locations. Implementing separate directories for different geographic locations when you don’t have any queries that cross location boundaries is a good data partitioning strategy and gives your application better response times with lower latency.

Diagram of ComputerSchema in Published state and applied to three directories

Legend for the diagrams in this post

The following code example creates the schema in the Development state by using a JSON file, publishes the schema, and then creates directories for the Seattle, Portland, and San Francisco locations. For this example, I assume the schema has been defined in the JSON file. The createSchema API creates a schema Amazon Resource Name (ARN) with the name defined in the variable, SCHEMA_NAME. I can use the putSchemaFromJson API to add specific schema definitions from the JSON file.

// The utility method to get valid Cloud Directory schema JSON
String validJson = getJsonFile("ComputerSchema_version_1.json")

String SCHEMA_NAME = "ComputerSchema";

String developmentSchemaArn = client.createSchema(new CreateSchemaRequest()
        .withName(SCHEMA_NAME))
        .getSchemaArn();

// Put the schema document in the Development schema
PutSchemaFromJsonResult result = client.putSchemaFromJson(new PutSchemaFromJsonRequest()
        .withSchemaArn(developmentSchemaArn)
        .withDocument(validJson));

The following code example takes the schema that is currently in the Development state and publishes the schema, changing its state to Published.

String SCHEMA_VERSION = "1";
String publishedSchemaArn = client.publishSchema(
        new PublishSchemaRequest()
        .withDevelopmentSchemaArn(developmentSchemaArn)
        .withVersion(SCHEMA_VERSION))
        .getPublishedSchemaArn();

// Our Published schema ARN is as follows
// arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:schema/published/ComputerSchema/1

The following code example creates a directory named SeattleDirectory and applies the published schema. The createDirectory API call creates a directory by using the published schema provided in the API parameters. Note that Cloud Directory stores a version of the schema in the directory in the Applied state. I will use similar code to create directories for PortlandDirectory and SanFranciscoDirectory.

String DIRECTORY_NAME = "SeattleDirectory"; 

CreateDirectoryResult directory = client.createDirectory(
        new CreateDirectoryRequest()
        .withName(DIRECTORY_NAME)
        .withSchemaArn(publishedSchemaArn));

String directoryArn = directory.getDirectoryArn();
String appliedSchemaArn = directory.getAppliedSchemaArn();

// This code section can be reused to create directories for Portland and San Francisco locations with the appropriate directory names

// Our directory ARN is as follows 
// arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX

// Our applied schema ARN is as follows 
// arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX/schema/ComputerSchema/1

Revising a schema

Now let’s say my company, AnyCompany, wants to add more information for computers and to track which employees have been assigned a computer for work use. I modify the schema to add two attributes to the ComputerInfo facet: Description and OSVersion (operating system version). I make Description optional because it is not important for me to track this attribute for the computer objects I create. I make OSVersion mandatory because it is critical for me to track it for all computer objects so that I can make changes such as applying security patches or making upgrades. Because I make OSVersion mandatory, I must provide a default value that Cloud Directory will apply to objects that were created before the schema revision, in order to handle backward compatibility. Note that you can replace the value in any object with a different value.

I also add a new facet to track computer assignment information, shown in the following updated schema as the ComputerAssignment facet. This facet tracks these additional attributes: Name (the name of the person to whom the computer is assigned), EMail (the email address of the assignee), Department, and department CostCenter. Note that Cloud Directory refers to the previously available version identifier as the Major Version. Because I can now add a minor version to a schema, I also denote the changed schema as Minor Version A.

Schema: ComputerSchema
Major Version: 1
Minor Version: A 

Facet: ComputerInfo
Attribute: SerialNumber, type: Integer 
Attribute: Make, type: String
Attribute: Model, type: Integer
Attribute: Description, type: String, required: NOT_REQUIRED
Attribute: OSVersion, type: String, required: REQUIRED_ALWAYS, default: "Windows 7"

Facet: ComputerAssignment
Attribute: Name, type: String
Attribute: EMail, type: String
Attribute: Department, type: String
Attribute: CostCenter, type: Integer

The following diagram shows the changes that were made when I added another facet to the schema and attributes to the existing facet. The highlighted area of the diagram (bottom left) shows that the schema changes were published.

Diagram showing that schema changes were published

The following code example revises the existing Development schema by adding the new attributes to the ComputerInfo facet and by adding the ComputerAssignment facet. I use a new JSON file for the schema revision, and for the purposes of this example, I am assuming the JSON file has the full schema including planned revisions.

// The utility method to get a valid CloudDirectory schema JSON
String schemaJson = getJsonFile("ComputerSchema_version_1_A.json")

// Put the schema document in the Development schema
PutSchemaFromJsonResult result = client.putSchemaFromJson(
        new PutSchemaFromJsonRequest()
        .withSchemaArn(developmentSchemaArn)
        .withDocument(schemaJson));

Upgrading the Published schema

The following code example performs an in-place schema upgrade of the Published schema with schema revisions (it adds new attributes to the existing facet and another facet to the schema). The upgradePublishedSchema API upgrades the Published schema with backward-compatible changes from the Development schema.

// From an earlier code example, I know the publishedSchemaArn has this value: "arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:schema/published/ComputerSchema/1"

// Upgrade publishedSchemaArn to minorVersion A. The Development schema must be backward compatible with 
// the existing publishedSchemaArn. 

String minorVersion = "A"

UpgradePublishedSchemaResult upgradePublishedSchemaResult = client.upgradePublishedSchema(new UpgradePublishedSchemaRequest()
        .withDevelopmentSchemaArn(developmentSchemaArn)
        .withPublishedSchemaArn(publishedSchemaArn)
        .withMinorVersion(minorVersion));

String upgradedPublishedSchemaArn = upgradePublishedSchemaResult.getUpgradedSchemaArn();

// The Published schema ARN after the upgrade shows a minor version as follows 
// arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:schema/published/ComputerSchema/1/A

Upgrading the Applied schema

The following diagram shows the in-place schema upgrade for the SeattleDirectory directory. I am performing the schema upgrade so that I can reflect the new schemas in all three directories. As a reminder, I added new attributes to the ComputerInfo facet and also added the ComputerAssignment facet. After the schema and directory upgrade, I can create objects for the ComputerInfo and ComputerAssignment facets in the SeattleDirectory. Any objects that were created with the old facet definition for ComputerInfo will now use the default values for any additional attributes defined in the new schema.

Diagram of the in-place schema upgrade for the SeattleDirectory directory

I use the following code example to perform an in-place upgrade of the SeattleDirectory to a Major Version of 1 and a Minor Version of A. Note that you should change a Major Version identifier in a schema to make backward-incompatible changes such as changing the data type of an existing attribute or dropping a mandatory attribute from your schema. Backward-incompatible changes require directory data migration from a previous version to the new version. You should change a Minor Version identifier in a schema to make backward-compatible upgrades such as adding additional attributes or adding facets, which in turn may contain one or more attributes. The upgradeAppliedSchema API lets me upgrade an existing directory with a different version of a schema.

// This upgrades ComputerSchema version 1 of the Applied schema in SeattleDirectory to Major Version 1 and Minor Version A
// The schema must be backward compatible or the API will fail with IncompatibleSchemaException

UpgradeAppliedSchemaResult upgradeAppliedSchemaResult = client.upgradeAppliedSchema(new UpgradeAppliedSchemaRequest()
        .withDirectoryArn(directoryArn)
        .withPublishedSchemaArn(upgradedPublishedSchemaArn));

String upgradedAppliedSchemaArn = upgradeAppliedSchemaResult.getUpgradedSchemaArn();

// The Applied schema ARN after the in-place schema upgrade will appear as follows
// arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX/schema/ComputerSchema/1

// This code section can be reused to upgrade directories for the Portland and San Francisco locations with the appropriate directory ARN

Note: Cloud Directory has excluded returning the Minor Version identifier in the Applied schema ARN for backward compatibility and to enable the application to work across older and newer versions of the directory.

The following diagram shows the changes that are made when I perform an in-place schema upgrade in the two remaining directories, PortlandDirectory and SanFranciscoDirectory. I make these calls sequentially, upgrading PortlandDirectory first and then upgrading SanFranciscoDirectory. I use the same code example that I used earlier to upgrade SeattleDirectory. Now, all my directories are running the most current version of the schema. Also, I made these schema changes without having to migrate data and while maintaining my application’s high availability.

Diagram showing the changes that are made with an in-place schema upgrade in the two remaining directories

Schema revision history

I can now view the schema revision history for any of AnyCompany’s directories by using the listAppliedSchemaArns API. Cloud Directory maintains the five most recent versions of applied schema changes. Similarly, to inspect the current Minor Version that was applied to my schema, I use the getAppliedSchemaVersion API. The listAppliedSchemaArns API returns the schema ARNs based on my schema filter as defined in withSchemaArn.

I use the following code example to query an Applied schema for its version history.

// This returns the five most recent Minor Versions associated with a Major Version
ListAppliedSchemaArnsResult listAppliedSchemaArnsResult = client.listAppliedSchemaArns(new ListAppliedSchemaArnsRequest()
        .withDirectoryArn(directoryArn)
        .withSchemaArn(upgradedAppliedSchemaArn));

// Note: The listAppliedSchemaArns API without the SchemaArn filter returns all the Major Versions in a directory

The listAppliedSchemaArns API returns the two ARNs as shown in the following output.

arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX/schema/ComputerSchema/1
arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX/schema/ComputerSchema/1/A

The following code example queries an Applied schema for current Minor Version by using the getAppliedSchemaVersion API.

// This returns the current Applied schema's Minor Version ARN 

GetAppliedSchemaVersion getAppliedSchemaVersionResult = client.getAppliedSchemaVersion(new GetAppliedSchemaVersionRequest()
	.withSchemaArn(upgradedAppliedSchemaArn));

The getAppliedSchemaVersion API returns the current Applied schema ARN with a Minor Version, as shown in the following output.

arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX/schema/ComputerSchema/1/A

If you have a lot of directories, schema revision API calls can help you audit your directory fleet and ensure that all directories are running the same version of a schema. Such auditing can help you ensure high integrity of directories across your fleet.

Summary

You can use in-place schema upgrades to make changes to your directory schema as you evolve your data set to match the needs of your application. An in-place schema upgrade allows you to maintain high availability for your directory and applications while the upgrade takes place. For more information about in-place schema upgrades, see the in-place schema upgrade documentation.

If you have comments about this blog post, submit them in the “Comments” section below. If you have questions about implementing the solution in this post, start a new thread in the Directory Service forum or contact AWS Support.

– Mahendra

 

NSA "Red Disk" Data Leak

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/11/nsa_red_disk_da.html

ZDNet is reporting about another data leak, this one from US Army’s Intelligence and Security Command (INSCOM), which is also within to the NSA.

The disk image, when unpacked and loaded, is a snapshot of a hard drive dating back to May 2013 from a Linux-based server that forms part of a cloud-based intelligence sharing system, known as Red Disk. The project, developed by INSCOM’s Futures Directorate, was slated to complement the Army’s so-called distributed common ground system (DCGS), a legacy platform for processing and sharing intelligence, surveillance, and reconnaissance information.

[…]

Red Disk was envisioned as a highly customizable cloud system that could meet the demands of large, complex military operations. The hope was that Red Disk could provide a consistent picture from the Pentagon to deployed soldiers in the Afghan battlefield, including satellite images and video feeds from drones trained on terrorists and enemy fighters, according to a Foreign Policy report.

[…]

Red Disk was a modular, customizable, and scalable system for sharing intelligence across the battlefield, like electronic intercepts, drone footage and satellite imagery, and classified reports, for troops to access with laptops and tablets on the battlefield. Marking files found in several directories imply the disk is “top secret,” and restricted from being shared to foreign intelligence partners.

A couple of points. One, this isn’t particularly sensitive. It’s an intelligence distribution system under development. It’s not raw intelligence. Two, this doesn’t seem to be classified data. Even the article hedges, using the unofficial term of “highly sensitive.” Three, it doesn’t seem that Chris Vickery, the researcher that discovered the data, has published it.

Chris Vickery, director of cyber risk research at security firm UpGuard, found the data and informed the government of the breach in October. The storage server was subsequently secured, though its owner remains unknown.

This doesn’t feel like a big deal to me.

Slashdot thread.

UI Testing at Scale with AWS Lambda

Post Syndicated from Stas Neyman original https://aws.amazon.com/blogs/devops/ui-testing-at-scale-with-aws-lambda/

This is a guest blog post by Wes Couch and Kurt Waechter from the Blackboard Internal Product Development team about their experience using AWS Lambda.

One year ago, one of our UI test suites took hours to run. Last month, it took 16 minutes. Today, it takes 39 seconds. Here’s how we did it.

The backstory:

Blackboard is a global leader in delivering robust and innovative education software and services to clients in higher education, government, K12, and corporate training. We have a large product development team working across the globe in at least 10 different time zones, with an internal tools team providing support for quality and workflows. We have been using Selenium Webdriver to perform automated cross-browser UI testing since 2007. Because we are now practicing continuous delivery, the automated UI testing challenge has grown due to the faster release schedule. On top of that, every commit made to each branch triggers an execution of our automated UI test suite. If you have ever implemented an automated UI testing infrastructure, you know that it can be very challenging to scale and maintain. Although there are services that are useful for testing different browser/OS combinations, they don’t meet our scale needs.

It used to take three hours to synchronously run our functional UI suite, which revealed the obvious need for parallel execution. Previously, we used Mesos to orchestrate a Selenium Grid Docker container for each test run. This way, we were able to run eight concurrent threads for test execution, which took an average of 16 minutes. Although this setup is fine for a single workflow, the cracks started to show when we reached the scale required for Blackboard’s mature product lines. Going beyond eight concurrent sessions on a single container introduced performance problems that impact the reliability of tests (for example, issues in Webdriver or the browser popping up frequently). We tried Mesos and considered Kubernetes for Selenium Grid orchestration, but the answer to scaling a Selenium Grid was to think smaller, not larger. This led to our breakthrough with AWS Lambda.

The solution:

We started using AWS Lambda for UI testing because it doesn’t require costly infrastructure or countless man hours to maintain. The steps we outline in this blog post took one work day, from inception to implementation. By simply packaging the UI test suite into a Lambda function, we can execute these tests in parallel on a massive scale. We use a custom JUnit test runner that invokes the Lambda function with a request to run each test from the suite. The runner then aggregates the results returned from each Lambda test execution.

Selenium is the industry standard for testing UI at scale. Although there are other options to achieve the same thing in Lambda, we chose this mature suite of tools. Selenium is backed by Google, Firefox, and others to help the industry drive their browsers with code. This makes Lambda and Selenium a compelling stack for achieving UI testing at scale.

Making Chrome Run in Lambda

Currently, Chrome for Linux will not run in Lambda due to an absent mount point. By rebuilding Chrome with a slight modification, as Marco Lüthy originally demonstrated, you can run it inside Lambda anyway! It took about two hours to build the current master branch of Chromium to build on a c4.4xlarge. Unfortunately, the current version of ChromeDriver, 2.33, does not support any version of Chrome above 62, so we’ll be using Marco’s modified version of version 60 for the near future.

Required System Libraries

The Lambda runtime environment comes with a subset of common shared libraries. This means we need to include some extra libraries to get Chrome and ChromeDriver to work. Anything that exists in the java resources folder during compile time is included in the base directory of the compiled jar file. When this jar file is deployed to Lambda, it is placed in the /var/task/ directory. This allows us to simply place the libraries in the java resources folder under a folder named lib/ so they are right where they need to be when the Lambda function is invoked.

To get these libraries, create an EC2 instance and choose the Amazon Linux AMI.

Next, use ssh to connect to the server. After you connect to the new instance, search for the libraries to find their locations.

sudo find / -name libgconf-2.so.4
sudo find / -name libORBit-2.so.0

Now that you have the locations of the libraries, copy these files from the EC2 instance and place them in the java resources folder under lib/.

Packaging the Tests

To deploy the test suite to Lambda, we used a simple Gradle tool called ShadowJar, which is similar to the Maven Shade Plugin. It packages the libraries and dependencies inside the jar that is built. Usually test dependencies and sources aren’t included in a jar, but for this instance we want to include them. To include the test dependencies, add this section to the build.gradle file.

shadowJar {
   from sourceSets.test.output
   configurations = [project.configurations.testRuntime]
}

Deploying the Test Suite

Now that our tests are packaged with the dependencies in a jar, we need to get them into a running Lambda function. We use  simple SAM  templates to upload the packaged jar into S3, and then deploy it to Lambda with our settings.

{
   "AWSTemplateFormatVersion": "2010-09-09",
   "Transform": "AWS::Serverless-2016-10-31",
   "Resources": {
       "LambdaTestHandler": {
           "Type": "AWS::Serverless::Function",
           "Properties": {
               "CodeUri": "./build/libs/your-test-jar-all.jar",
               "Runtime": "java8",
               "Handler": "com.example.LambdaTestHandler::handleRequest",
               "Role": "<YourLambdaRoleArn>",
               "Timeout": 300,
               "MemorySize": 1536
           }
       }
   }
}

We use the maximum timeout available to ensure our tests have plenty of time to run. We also use the maximum memory size because this ensures our Lambda function can support Chrome and other resources required to run a UI test.

Specifying the handler is important because this class executes the desired test. The test handler should be able to receive a test class and method. With this information it will then execute the test and respond with the results.

public LambdaTestResult handleRequest(TestRequest testRequest, Context context) {
   LoggerContainer.LOGGER = new Logger(context.getLogger());
  
   BlockJUnit4ClassRunner runner = getRunnerForSingleTest(testRequest);
  
   Result result = new JUnitCore().run(runner);

   return new LambdaTestResult(result);
}

Creating a Lambda-Compatible ChromeDriver

We provide developers with an easily accessible ChromeDriver for local test writing and debugging. When we are running tests on AWS, we have configured ChromeDriver to run them in Lambda.

To configure ChromeDriver, we first need to tell ChromeDriver where to find the Chrome binary. Because we know that ChromeDriver is going to be unzipped into the root task directory, we should point the ChromeDriver configuration at that location.

The settings for getting ChromeDriver running are mostly related to Chrome, which must have its working directories pointed at the tmp/ folder.

Start with the default DesiredCapabilities for ChromeDriver, and then add the following settings to enable your ChromeDriver to start in Lambda.

public ChromeDriver createLambdaChromeDriver() {
   ChromeOptions options = new ChromeOptions();

   // Set the location of the chrome binary from the resources folder
   options.setBinary("/var/task/chrome");

   // Include these settings to allow Chrome to run in Lambda
   options.addArguments("--disable-gpu");
   options.addArguments("--headless");
   options.addArguments("--window-size=1366,768");
   options.addArguments("--single-process");
   options.addArguments("--no-sandbox");
   options.addArguments("--user-data-dir=/tmp/user-data");
   options.addArguments("--data-path=/tmp/data-path");
   options.addArguments("--homedir=/tmp");
   options.addArguments("--disk-cache-dir=/tmp/cache-dir");
  
   DesiredCapabilities desiredCapabilities = DesiredCapabilities.chrome();
   desiredCapabilities.setCapability(ChromeOptions.CAPABILITY, options);
  
   return new ChromeDriver(desiredCapabilities);
}

Executing Tests in Parallel

You can approach parallel test execution in Lambda in many different ways. Your approach depends on the structure and design of your test suite. For our solution, we implemented a custom test runner that uses reflection and JUnit libraries to create a list of test cases we want run. When we have the list, we create a TestRequest object to pass into the Lambda function that we have deployed. In this TestRequest, we place the class name, test method, and the test run identifier. When the Lambda function receives this TestRequest, our LambdaTestHandler generates and runs the JUnit test. After the test is complete, the test result is sent to the test runner. The test runner compiles a result after all of the tests are complete. By executing the same Lambda function multiple times with different test requests, we can effectively run the entire test suite in parallel.

To get screenshots and other test data, we pipe those files during test execution to an S3 bucket under the test run identifier prefix. When the tests are complete, we link the files to each test execution in the report generated from the test run. This lets us easily investigate test executions.

Pro Tip: Dynamically Loading Binaries

AWS Lambda has a limit of 250 MB of uncompressed space for packaged Lambda functions. Because we have libraries and other dependencies to our test suite, we hit this limit when we tried to upload a function that contained Chrome and ChromeDriver (~140 MB). This test suite was not originally intended to be used with Lambda. Otherwise, we would have scrutinized some of the included libraries. To get around this limit, we used the Lambda functions temporary directory, which allows up to 500 MB of space at runtime. Downloading these binaries at runtime moves some of that space requirement into the temporary directory. This allows more room for libraries and dependencies. You can do this by grabbing Chrome and ChromeDriver from an S3 bucket and marking them as executable using built-in Java libraries. If you take this route, be sure to point to the new location for these executables in order to create a ChromeDriver.

private static void downloadS3ObjectToExecutableFile(String key) throws IOException {
   File file = new File("/tmp/" + key);

   GetObjectRequest request = new GetObjectRequest("s3-bucket-name", key);

   FileUtils.copyInputStreamToFile(s3client.getObject(request).getObjectContent(), file);
   file.setExecutable(true);
}

Lambda-Selenium Project Source

We have compiled an open source example that you can grab from the Blackboard Github repository. Grab the code and try it out!

https://blackboard.github.io/lambda-selenium/

Conclusion

One year ago, one of our UI test suites took hours to run. Last month, it took 16 minutes. Today, it takes 39 seconds. Thanks to AWS Lambda, we can reduce our build times and perform automated UI testing at scale!

How to Enable Caching for AWS CodeBuild

Post Syndicated from Karthik Thirugnanasambandam original https://aws.amazon.com/blogs/devops/how-to-enable-caching-for-aws-codebuild/

AWS CodeBuild is a fully managed build service. There are no servers to provision and scale, or software to install, configure, and operate. You just specify the location of your source code, choose your build settings, and CodeBuild runs build scripts for compiling, testing, and packaging your code.

A typical application build process includes phases like preparing the environment, updating the configuration, downloading dependencies, running unit tests, and finally, packaging the built artifact.

Downloading dependencies is a critical phase in the build process. These dependent files can range in size from a few KBs to multiple MBs. Because most of the dependent files do not change frequently between builds, you can noticeably reduce your build time by caching dependencies.

In this post, I will show you how to enable caching for AWS CodeBuild.

Requirements

  • Create an Amazon S3 bucket for storing cache archives (You can use existing s3 bucket as well).
  • Create a GitHub account (if you don’t have one).

Create a sample build project:

1. Open the AWS CodeBuild console at https://console.aws.amazon.com/codebuild/.

2. If a welcome page is displayed, choose Get started.

If a welcome page is not displayed, on the navigation pane, choose Build projects, and then choose Create project.

3. On the Configure your project page, for Project name, type a name for this build project. Build project names must be unique across each AWS account.

4. In Source: What to build, for Source provider, choose GitHub.

5. In Environment: How to build, for Environment image, select Use an image managed by AWS CodeBuild.

  • For Operating system, choose Ubuntu.
  • For Runtime, choose Java.
  • For Version,  choose aws/codebuild/java:openjdk-8.
  • For Build specification, select Insert build commands.

Note: The build specification file (buildspec.yml) can be configured in two ways. You can package it along with your source root directory, or you can override it by using a project environment configuration. In this example, I will use the override option and will use the console editor to specify the build specification.

6. Under Build commands, click Switch to editor to enter the build specification.

Copy the following text.

version: 0.2

phases:
  build:
    commands:
      - mvn install
      
cache:
  paths:
    - '/root/.m2/**/*'

Note: The cache section in the build specification instructs AWS CodeBuild about the paths to be cached. Like the artifacts section, the cache paths are relative to $CODEBUILD_SRC_DIR and specify the directories to be cached. In this example, Maven stores the downloaded dependencies to the /root/.m2/ folder, but other tools use different folders. For example, pip uses the /root/.cache/pip folder, and Gradle uses the /root/.gradle/caches folder. You might need to configure the cache paths based on your language platform.

7. In Artifacts: Where to put the artifacts from this build project:

  • For Type, choose No artifacts.

8. In Cache:

  • For Type, choose Amazon S3.
  • For Bucket, choose your S3 bucket.
  • For Path prefix, type cache/archives/

9. In Service role, the Create a service role in your account option will display a default role name.  You can accept the default name or type your own.

If you already have an AWS CodeBuild service role, choose Choose an existing service role from your account.

10. Choose Continue.

11. On the Review page, to run a build, choose Save and build.

Review build and cache behavior:

Let us review our first build for the project.

In the first run, where no cache exists, overall build time would look something like below (notice the time for DOWNLOAD_SOURCE, BUILD and POST_BUILD):

If you check the build logs, you will see log entries for dependency downloads. The dependencies are downloaded directly from configured external repositories. At the end of the log, you will see an entry for the cache uploaded to your S3 bucket.

Let’s review the S3 bucket for the cached archive. You’ll see the cache from our first successful build is uploaded to the configured S3 path.

Let’s try another build with the same CodeBuild project. This time the build should pick up the dependencies from the cache.

In the second run, there was a cache hit (cache was generated from the first run):

You’ll notice a few things:

  1. DOWNLOAD_SOURCE took slightly longer. Because, in addition to the source code, this time the build also downloaded the cache from user’s s3 bucket.
  2. BUILD time was faster. As the dependencies didn’t need to get downloaded, but were reused from cache.
  3. POST_BUILD took slightly longer, but was relatively the same.

Overall, build duration was improved with cache.

Best practices for cache

  • By default, the cache archive is encrypted on the server side with the customer’s artifact KMS key.
  • You can expire the cache by manually removing the cache archive from S3. Alternatively, you can expire the cache by using an S3 lifecycle policy.
  • You can override cache behavior by updating the project. You can use the AWS CodeBuild the AWS CodeBuild console, AWS CLI, or AWS SDKs to update the project. You can also invalidate cache setting by using the new InvalidateProjectCache API. This API forces a new InvalidationKey to be generated, ensuring that future builds receive an empty cache. This API does not remove the existing cache, because this could cause inconsistencies with builds currently in flight.
  • The cache can be enabled for any folders in the build environment, but we recommend you only cache dependencies/files that will not change frequently between builds. Also, to avoid unexpected application behavior, don’t cache configuration and sensitive information.

Conclusion

In this blog post, I showed you how to enable and configure cache setting for AWS CodeBuild. As you see, this can save considerable build time. It also improves resiliency by avoiding external network connections to an artifact repository.

I hope you found this post useful. Feel free to leave your feedback or suggestions in the comments.

Backing Up the Modern Enterprise with Backblaze for Business

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/endpoint-backup-solutions/

Endpoint backup diagram

Organizations of all types and sizes need reliable and secure backup. Whether they have as few as 3 or as many as 300,000 computer users, an organization’s computer data is a valuable business asset that needs to be protected.

Modern organizations are changing how they work and where they work, which brings new challenges to making sure that company’s data assets are not only available, but secure. Larger organizations have IT departments that are prepared to address these needs, but often times in smaller and newer organizations the challenge falls upon office management who might not be as prepared or knowledgeable to face a work environment undergoing dramatic changes.

Whether small or large, local or world-wide, for-profit or non-profit, organizations need a backup strategy and solution that matches the new ways of working in the enterprise.

The Enterprise Has Changed, and So Has Data Use

More and more, organizations are working in the cloud. These days organizations can operate just fine without their own file servers, database servers, mail servers, or other IT infrastructure that used to be standard for all but the smallest organization.

The reality is that for most organizations, though, it’s a hybrid work environment, with a combination of cloud-based and PC and Macintosh-based applications. Legacy apps aren’t going away any time soon. They will be with us for a while, with their accompanying data scattered amongst all the desktops, laptops and other endpoints in corporate headquarters, home offices, hotel rooms, and airport waiting areas.

In addition, the modern workforce likely combines regular full-time employees, remote workers, contractors, and sometimes interns, volunteers, and other temporary workers who also use company IT assets.

The Modern Enterprise Brings New Challenges for IT

These changes in how enterprises work present a problem for anyone tasked with making sure that data — no matter who uses it or where it lives — is adequately backed-up. Cloud-based applications, when properly used and managed, can be adequately backed up, provided that users are connected to the internet and data transfers occur regularly — which is not always the case. But what about the data on the laptops, desktops, and devices used by remote employees, contractors, or just employees whose work keeps them on the road?

The organization’s backup solution must address all the needs of the modern organization or enterprise using both cloud and PC and Mac-based applications, and not be constrained by employee or computer location.

A Ten-Point Checklist for the Modern Enterprise for Backing Up

What should the modern enterprise look for when evaluating a backup solution?

1) Easy to deploy to workers’ computers

Whether installed by the computer user or an IT person locally or remotely, the backup solution must be easy to implement quickly with minimal demands on the user or administrator.

2) Fast and unobtrusive client software

Backups should happen in the background by efficient (native) PC and Macintosh software clients that don’t consume valuable processing power or take memory away from applications the user needs.

3) Easy to configure

The backup solutions must be easy to configure for both the user and the IT professional. Ease-of-use means less time to deploy, configure, and manage.

4) Defaults to backing up all valuable data

By default, the solution backs up commonly used files and folders or directories, including desktops. Some backup solutions are difficult and intimidating because they require that the user chose what needs to be backed up, often missing files and folders/directories that contain valuable data.

5) Works automatically in the background

Backups should happen automatically, no matter where the computer is located. The computer user, especially the remote or mobile one, shouldn’t be required to attach cables or drives, or remember to initiate backups. A working solution backs up automatically without requiring action by the user or IT administrator.

6) Data restores are fast and easy

Whether it’s a single file, directory, or an entire system that must be restored, a user or IT sysadmin needs to be able to restore backed up data as quickly as possible. In cases of large restores to remote locations, the ability to send a restore via physical media is a must.

7) No limitations on data

Throttling, caps, and data limits complicate backups and require guesses about how much storage space will be needed.

8) Safe & Secure

Organizations require that their data is secure during all phases of initial upload, storage, and restore.

9) Easy-to-manage

The backup solution needs to provide a clear and simple web management interface for all functions. Designing for ease-of-use leads to efficiency in management and operation.

10) Affordable and transparent pricing

Backup costs should be predictable, understandable, and without surprises.

Two Scenarios for the Modern Enterprise

Enterprises exist in many forms and types, but wanting to meet the above requirements is common across all of them. Below, we take a look at two common scenarios showing how enterprises face these challenges. Three case studies are available that provide more information about how Backblaze customers have succeeded in these environments.

Enterprise Profile 1

The needs of a smaller enterprise differ from those of larger, established organizations. This organization likely doesn’t have anyone who is devoted full-time to IT. The job of on-boarding new employees and getting them set up with a computer likely falls upon an executive assistant or office manager. This person might give new employees a checklist with the software and account information and lets users handle setting up the computer themselves.

Organizations in this profile need solutions that are easy to install and require little to no configuration. Backblaze, by default, backs up all user data, which lets the organization be secure in knowing all the data will be backed up to the cloud — including files left on the desktop. Combined with Backblaze’s unlimited data policy, organizations have a truly “set it and forget it” platform.

Customizing Groups To Meet Teams’ Needs

The Groups feature of Backblaze for Business allows an organization to decide whether an individual client’s computer will be Unmanaged (backups and restores under the control of the worker), or Managed, in which an administrator can monitor the status and frequency of backups and handle restores should they become necessary. One group for the entire organization might be adequate at this stage, but the organization has the option to add additional groups as it grows and needs more flexibility and control.

The organization, of course, has the choice of managing and monitoring users using Groups. With Backblaze’s Groups, organizations can set user-based access rules, which allows the administrator to create restores for lost files or entire computers on an employee’s behalf, to centralize billing for all client computers in the organization, and to redeploy a recovered computer or new computer with the backed up data.

Restores

In this scenario, the decision has been made to let each user manage her own backups, including restores, if necessary, of individual files or entire systems. If a restore of a file or system is needed, the restore process is easy enough for the user to handle it by herself.

Case Study 1

Read about how PagerDuty uses Backblaze for Business in a mixed enterprise of cloud and desktop/laptop applications.

PagerDuty Case Study

In a common approach, the employee can retrieve an accidentally deleted file or an earlier version of a document on her own. The Backblaze for Business interface is easy to navigate and was designed with feedback from thousands of customers over the course of a decade.

In the event of a lost, damaged, or stolen laptop,  administrators of Managed Groups can  initiate the restore, which could be in the form of a download of a restore ZIP file from the web management console, or the overnight shipment of a USB drive directly to the organization or user.

Enterprise Profile 2

This profile is for an organization with a full-time IT staff. When a new worker joins the team, the IT staff is tasked with configuring the computer and delivering it to the new employee.

Backblaze for Business Groups

Case Study 2

Global charitable organization charity: water uses Backblaze for Business to back up workers’ and volunteers’ laptops as they travel to developing countries in their efforts to provide clean and safe drinking water.

charity: water Case Study

This organization can take advantage of additional capabilities in Groups. A Managed Group makes sense in an organization with a geographically dispersed work force as it lets IT ensure that workers’ data is being regularly backed up no matter where they are. Billing can be company-wide or assigned to individual departments or geographical locations. The organization has the choice of how to divide the organization into Groups (location, function, subsidiary, etc.) and whether the Group should be Managed or Unmanaged. Using Managed Groups might be suitable for most of the organization, but there are exceptions in which sensitive data might dictate using an Unmanaged Group, such as could be the case with HR, the executive team, or finance.

Deployment

By Invitation Email, Link, or Domain

Backblaze for Business allows a number of options for deploying the client software to workers’ computers. Client installation is fast and easy on both Windows and Macintosh, so sending email invitations to users or automatically enrolling users by domain or invitation link, is a common approach.

By Remote Deployment

IT might choose to remotely and silently deploy Backblaze for Business across specific Groups or the entire organization. An administrator can silently deploy the Backblaze backup client via the command-line, or use common RMM (Remote Monitoring and Management) tools such as Jamf and Munki.

Restores

Case Study 3

Read about how Bright Bear Technology Solutions, an IT Managed Service Provider (MSP), uses the Groups feature of Backblaze for Business to manage customer backups and restores, deploy Backblaze licenses to their customers, and centralize billing for all their client-based backup services.

Bright Bear Case Study

Some organizations are better equipped to manage or assist workers when restores become necessary. Individual users will be pleased to discover they can roll-back files to an earlier version if they wish, but IT will likely manage any complete system restore that involves reconfiguring a computer after a repair or requisitioning an entirely new system when needed.

This organization might chose to retain a client’s entire computer backup for archival purposes, using Backblaze B2 as the cloud storage solution. This is another advantage of having a cloud storage provider that combines both endpoint backup and cloud object storage among its services.

The Next Step: Server Backup & Data Archiving with B2 Cloud Storage

As organizations grow, they have increased needs for cloud storage beyond Macintosh and PC data backup. Backblaze’s object cloud storage, Backblaze B2, provides low-cost storage and archiving of records, media, and server data that can grow with the organization’s size and needs.

B2 Cloud Storage is available through the same Backblaze management console as Backblaze Computer Backup. This means that Admins have one console for billing, monitoring, deployment, and role provisioning. B2 is priced at 1/4 the cost of Amazon S3, or $0.005 per month per gigabyte (which equals $5/month per terabyte).

Why Modern Enterprises Chose Backblaze

Backblaze for Business

Businesses and organizations select Backblaze for Business for backup because Backblaze is designed to meet the needs of the modern enterprise. Backblaze customers are part of a a platform that has a 10+ year track record of innovation and over 400 petabytes of customer data already under management.

Backblaze’s backup model is proven through head-to-head comparisons to back up data that other backup solutions overlook in their default configurations — including valuable files that are needed after an accidental deletion, theft, or computer failure.

Backblaze is the only enterprise-level backup company that provides TOTP (Time-based One-time Password) via both SMS and Authentication app to all accounts at no incremental charge. At just $50/year/computer, Backblaze is affordable for any size of enterprise.

Modern Enterprises can Meet The Challenge of The Changing Data Environment

With the right backup solution and strategy, the modern enterprise will be prepared to ensure that its data is protected from accident, disaster, or theft, whether its data is in one office or dispersed among many locations, and remote and mobile employees.

Backblaze for Business is an affordable solution that enables organizations to meet the evolving data demands facing the modern enterprise.

The post Backing Up the Modern Enterprise with Backblaze for Business appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

dirsearch – Website Directory Scanner For Files & Structure

Post Syndicated from Darknet original https://www.darknet.org.uk/2017/10/dirsearch-website-directory-scanner-files-structure/?utm_source=rss&utm_medium=social&utm_campaign=darknetfeed

dirsearch – Website Directory Scanner For Files & Structure

dirsearch is a Python-based command-line website directory scanner designed to brute force site structure including directories and files in websites.

dirsearch Website Directory Scanner Features

dirsearch supports the following:

  • Multithreaded
  • Keep alive connections
  • Support for multiple extensions (-e|–extensions asp,php)
  • Reporting (plain text, JSON)
  • Heuristically detects invalid web pages
  • Recursive brute forcing
  • HTTP proxy support
  • User agent randomization
  • Batch processing
  • Request delaying

dirsearch Web Directory Structure Scanner & Wordlists

Dictionaries must be text files.

Read the rest of dirsearch – Website Directory Scanner For Files & Structure now! Only available at Darknet.

Federate Database User Authentication Easily with IAM and Amazon Redshift

Post Syndicated from Thiyagarajan Arumugam original https://aws.amazon.com/blogs/big-data/federate-database-user-authentication-easily-with-iam-and-amazon-redshift/

Managing database users though federation allows you to manage authentication and authorization procedures centrally. Amazon Redshift now supports database authentication with IAM, enabling user authentication though enterprise federation. No need to manage separate database users and passwords to further ease the database administration. You can now manage users outside of AWS and authenticate them for access to an Amazon Redshift data warehouse. Do this by integrating IAM authentication and a third-party SAML-2.0 identity provider (IdP), such as AD FS, PingFederate, or Okta. In addition, database users can also be automatically created at their first login based on corporate permissions.

In this post, I demonstrate how you can extend the federation to enable single sign-on (SSO) to the Amazon Redshift data warehouse.

SAML and Amazon Redshift

AWS supports Security Assertion Markup Language (SAML) 2.0, which is an open standard for identity federation used by many IdPs. SAML enables federated SSO, which enables your users to sign in to the AWS Management Console. Users can also make programmatic calls to AWS API actions by using assertions from a SAML-compliant IdP. For example, if you use Microsoft Active Directory for corporate directories, you may be familiar with how Active Directory and AD FS work together to enable federation. For more information, see the Enabling Federation to AWS Using Windows Active Directory, AD FS, and SAML 2.0 AWS Security Blog post.

Amazon Redshift now provides the GetClusterCredentials API operation that allows you to generate temporary database user credentials for authentication. You can set up an IAM permissions policy that generates these credentials for connecting to Amazon Redshift. Extending the IAM authentication, you can configure the federation of AWS access though a SAML 2.0–compliant IdP. An IAM role can be configured to permit the federated users call the GetClusterCredentials action and generate temporary credentials to log in to Amazon Redshift databases. You can also set up policies to restrict access to Amazon Redshift clusters, databases, database user names, and user group.

Amazon Redshift federation workflow

In this post, I demonstrate how you can use a JDBC– or ODBC-based SQL client to log in to the Amazon Redshift cluster using this feature. The SQL clients used with Amazon Redshift JDBC or ODBC drivers automatically manage the process of calling the GetClusterCredentials action, retrieving the database user credentials, and establishing a connection to your Amazon Redshift database. You can also use your database application to programmatically call the GetClusterCredentials action, retrieve database user credentials, and connect to the database. I demonstrate these features using an example company to show how different database users accounts can be managed easily using federation.

The following diagram shows how the SSO process works:

  1. JDBC/ODBC
  2. Authenticate using Corp Username/Password
  3. IdP sends SAML assertion
  4. Call STS to assume role with SAML
  5. STS Returns Temp Credentials
  6. Use Temp Credentials to get Temp cluster credentials
  7. Connect to Amazon Redshift using temp credentials

Walkthrough

Example Corp. is using Active Directory (idp host:demo.examplecorp.com) to manage federated access for users in its organization. It has an AWS account: 123456789012 and currently manages an Amazon Redshift cluster with the cluster ID “examplecorp-dw”, database “analytics” in us-west-2 region for its Sales and Data Science teams. It wants the following access:

  • Sales users can access the examplecorp-dw cluster using the sales_grp database group
  • Sales users access examplecorp-dw through a JDBC-based SQL client
  • Sales users access examplecorp-dw through an ODBC connection, for their reporting tools
  • Data Science users access the examplecorp-dw cluster using the data_science_grp database group.
  • Partners access the examplecorp-dw cluster and query using the partner_grp database group.
  • Partners are not federated through Active Directory and are provided with separate IAM user credentials (with IAM user name examplecorpsalespartner).
  • Partners can connect to the examplecorp-dw cluster programmatically, using language such as Python.
  • All users are automatically created in Amazon Redshift when they log in for the first time.
  • (Optional) Internal users do not specify database user or group information in their connection string. It is automatically assigned.
  • Data warehouse users can use SSO for the Amazon Redshift data warehouse using the preceding permissions.

Step 1:  Set up IdPs and federation

The Enabling Federation to AWS Using Windows Active Directory post demonstrated how to prepare Active Directory and enable federation to AWS. Using those instructions, you can establish trust between your AWS account and the IdP and enable user access to AWS using SSO.  For more information, see Identity Providers and Federation.

For this walkthrough, assume that this company has already configured SSO to their AWS account: 123456789012 for their Active Directory domain demo.examplecorp.com. The Sales and Data Science teams are not required to specify database user and group information in the connection string. The connection string can be configured by adding SAML Attribute elements to your IdP. Configuring these optional attributes enables internal users to conveniently avoid providing the DbUser and DbGroup parameters when they log in to Amazon Redshift.

The user-name attribute can be set up as follows, with a user ID (for example, nancy) or an email address (for example. [email protected]):

<Attribute Name="https://redshift.amazon.com/SAML/Attributes/DbUser">  
  <AttributeValue>user-name</AttributeValue>
</Attribute>

The AutoCreate attribute can be defined as follows:

<Attribute Name="https://redshift.amazon.com/SAML/Attributes/AutoCreate">
    <AttributeValue>true</AttributeValue>
</Attribute>

The sales_grp database group can be included as follows:

<Attribute Name="https://redshift.amazon.com/SAML/Attributes/DbGroups">
    <AttributeValue>sales_grp</AttributeValue>
</Attribute>

For more information about attribute element configuration, see Configure SAML Assertions for Your IdP.

Step 2: Create IAM roles for access to the Amazon Redshift cluster

The next step is to create IAM policies with permissions to call GetClusterCredentials and provide authorization for Amazon Redshift resources. To grant a SQL client the ability to retrieve the cluster endpoint, region, and port automatically, include the redshift:DescribeClusters action with the Amazon Redshift cluster resource in the IAM role.  For example, users can connect to the Amazon Redshift cluster using a JDBC URL without the need to hardcode the Amazon Redshift endpoint:

Previous:  jdbc:redshift://endpoint:port/database

Current:  jdbc:redshift:iam://clustername:region/dbname

Use IAM to create the following policies. You can also use an existing user or role and assign these policies. For example, if you already created an IAM role for IdP access, you can attach the necessary policies to that role. Here is the policy created for sales users for this example:

Sales_DW_IAM_Policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "redshift:DescribeClusters"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:cluster:examplecorp-dw"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "redshift:GetClusterCredentials"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:cluster:examplecorp-dw",
                "arn:aws:redshift:us-west-2:123456789012:dbuser:examplecorp-dw/${redshift:DbUser}"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:userid": "AIDIODR4TAW7CSEXAMPLE:${redshift:DbUser}@examplecorp.com"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "redshift:CreateClusterUser"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:dbuser:examplecorp-dw/${redshift:DbUser}"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "redshift:JoinGroup"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:dbgroup:examplecorp-dw/sales_grp"
            ]
        }
    ]
}

The policy uses the following parameter values:

  • Region: us-west-2
  • AWS Account: 123456789012
  • Cluster name: examplecorp-dw
  • Database group: sales_grp
  • IAM role: AIDIODR4TAW7CSEXAMPLE
Policy Statement Description
{
"Effect":"Allow",
"Action":[
"redshift:DescribeClusters"
],
"Resource":[
"arn:aws:redshift:us-west-2:123456789012:cluster:examplecorp-dw"
]
}

Allow users to retrieve the cluster endpoint, region, and port automatically for the Amazon Redshift cluster examplecorp-dw. This specification uses the resource format arn:aws:redshift:region:account-id:cluster:clustername. For example, the SQL client JDBC can be specified in the format jdbc:redshift:iam://clustername:region/dbname.

For more information, see Amazon Resource Names.

{
"Effect":"Allow",
"Action":[
"redshift:GetClusterCredentials"
],
"Resource":[
"arn:aws:redshift:us-west-2:123456789012:cluster:examplecorp-dw",
"arn:aws:redshift:us-west-2:123456789012:dbuser:examplecorp-dw/${redshift:DbUser}"
],
"Condition":{
"StringEquals":{
"aws:userid":"AIDIODR4TAW7CSEXAMPLE:${redshift:DbUser}@examplecorp.com"
}
}
}

Generates a temporary token to authenticate into the examplecorp-dw cluster. “arn:aws:redshift:us-west-2:123456789012:dbuser:examplecorp-dw/${redshift:DbUser}” restricts the corporate user name to the database user name for that user. This resource is specified using the format: arn:aws:redshift:region:account-id:dbuser:clustername/dbusername.

The Condition block enforces that the AWS user ID should match “AIDIODR4TAW7CSEXAMPLE:${redshift:DbUser}@examplecorp.com”, so that individual users can authenticate only as themselves. The AIDIODR4TAW7CSEXAMPLE role has the Sales_DW_IAM_Policy policy attached.

{
"Effect":"Allow",
"Action":[
"redshift:CreateClusterUser"
],
"Resource":[
"arn:aws:redshift:us-west-2:123456789012:dbuser:examplecorp-dw/${redshift:DbUser}"
]
}
Automatically creates database users in examplecorp-dw, when they log in for the first time. Subsequent logins reuse the existing database user.
{
"Effect":"Allow",
"Action":[
"redshift:JoinGroup"
],
"Resource":[
"arn:aws:redshift:us-west-2:123456789012:dbgroup:examplecorp-dw/sales_grp"
]
}
Allows sales users to join the sales_grp database group through the resource “arn:aws:redshift:us-west-2:123456789012:dbgroup:examplecorp-dw/sales_grp” that is specified in the format arn:aws:redshift:region:account-id:dbgroup:clustername/dbgroupname.

Similar policies can be created for Data Science users with access to join the data_science_grp group in examplecorp-dw. You can now attach the Sales_DW_IAM_Policy policy to the role that is mapped to IdP application for SSO.
 For more information about how to define the claim rules, see Configuring SAML Assertions for the Authentication Response.

Because partners are not authorized using Active Directory, they are provided with IAM credentials and added to the partner_grp database group. The Partner_DW_IAM_Policy is attached to the IAM users for partners. The following policy allows partners to log in using the IAM user name as the database user name.

Partner_DW_IAM_Policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "redshift:DescribeClusters"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:cluster:examplecorp-dw"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "redshift:GetClusterCredentials"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:cluster:examplecorp-dw",
                "arn:aws:redshift:us-west-2:123456789012:dbuser:examplecorp-dw/${redshift:DbUser}"
            ],
            "Condition": {
                "StringEquals": {
                    "redshift:DbUser": "${aws:username}"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "redshift:CreateClusterUser"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:dbuser:examplecorp-dw/${redshift:DbUser}"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "redshift:JoinGroup"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:dbgroup:examplecorp-dw/partner_grp"
            ]
        }
    ]
}

redshift:DbUser“: “${aws:username}” forces an IAM user to use the IAM user name as the database user name.

With the previous steps configured, you can now establish the connection to Amazon Redshift through JDBC– or ODBC-supported clients.

Step 3: Set up database user access

Before you start connecting to Amazon Redshift using the SQL client, set up the database groups for appropriate data access. Log in to your Amazon Redshift database as superuser to create a database group, using CREATE GROUP.

Log in to examplecorp-dw/analytics as superuser and create the following groups and users:

CREATE GROUP sales_grp;
CREATE GROUP datascience_grp;
CREATE GROUP partner_grp;

Use the GRANT command to define access permissions to database objects (tables/views) for the preceding groups.

Step 4: Connect to Amazon Redshift using the JDBC SQL client

Assume that sales user “nancy” is using the SQL Workbench client and JDBC driver to log in to the Amazon Redshift data warehouse. The following steps help set up the client and establish the connection:

  1. Download the latest Amazon Redshift JDBC driver from the Configure a JDBC Connection page
  2. Build the JDBC URL with the IAM option in the following format:
    jdbc:redshift:iam://examplecorp-dw:us-west-2/sales_db

Because the redshift:DescribeClusters action is assigned to the preceding IAM roles, it automatically resolves the cluster endpoints and the port. Otherwise, you can specify the endpoint and port information in the JDBC URL, as described in Configure a JDBC Connection.

Identify the following JDBC options for providing the IAM credentials (see the “Prepare your environment” section) and configure in the SQL Workbench Connection Profile:

plugin_name=com.amazon.redshift.plugin.AdfsCredentialsProvider 
idp_host=demo.examplecorp.com (The name of the corporate identity provider host)
idp_port=443  (The port of the corporate identity provider host)
user=examplecorp\nancy(corporate user name)
password=***(corporate user password)

The SQL workbench configuration looks similar to the following screenshot:

Now, “nancy” can connect to examplecorp-dw by authenticating using the corporate Active Directory. Because the SAML attributes elements are already configured for nancy, she logs in as database user nancy and is assigned the sales_grp. Similarly, other Sales and Data Science users can connect to the examplecorp-dw cluster. A custom Amazon Redshift ODBC driver can also be used to connect using a SQL client. For more information, see Configure an ODBC Connection.

Step 5: Connecting to Amazon Redshift using JDBC SQL Client and IAM Credentials

This optional step is necessary only when you want to enable users that are not authenticated with Active Directory. Partners are provided with IAM credentials that they can use to connect to the examplecorp-dw Amazon Redshift clusters. These IAM users are attached to Partner_DW_IAM_Policy that assigns them to be assigned to the public database group in Amazon Redshift. The following JDBC URLs enable them to connect to the Amazon Redshift cluster:

jdbc:redshift:iam//examplecorp-dw/analytics?AccessKeyID=XXX&SecretAccessKey=YYY&DbUser=examplecorpsalespartner&DbGroup= partner_grp&AutoCreate=true

The AutoCreate option automatically creates a new database user the first time the partner logs in. There are several other options available to conveniently specify the IAM user credentials. For more information, see Options for providing IAM credentials.

Step 6: Connecting to Amazon Redshift using an ODBC client for Microsoft Windows

Assume that another sales user “uma” is using an ODBC-based client to log in to the Amazon Redshift data warehouse using Example Corp Active Directory. The following steps help set up the ODBC client and establish the Amazon Redshift connection in a Microsoft Windows operating system connected to your corporate network:

  1. Download and install the latest Amazon Redshift ODBC driver.
  2. Create a system DSN entry.
    1. In the Start menu, locate the driver folder or folders:
      • Amazon Redshift ODBC Driver (32-bit)
      • Amazon Redshift ODBC Driver (64-bit)
      • If you installed both drivers, you have a folder for each driver.
    2. Choose ODBC Administrator, and then type your administrator credentials.
    3. To configure the driver for all users on the computer, choose System DSN. To configure the driver for your user account only, choose User DSN.
    4. Choose Add.
  3. Select the Amazon Redshift ODBC driver, and choose Finish. Configure the following attributes:
    Data Source Name =any friendly name to identify the ODBC connection 
    Database=analytics
    user=uma(corporate user name)
    Auth Type-Identity Provider: AD FS
    password=leave blank (Windows automatically authenticates)
    Cluster ID: examplecorp-dw
    idp_host=demo.examplecorp.com (The name of the corporate IdP host)

This configuration looks like the following:

  1. Choose OK to save the ODBC connection.
  2. Verify that uma is set up with the SAML attributes, as described in the “Set up IdPs and federation” section.

The user uma can now use this ODBC connection to establish the connection to the Amazon Redshift cluster using any ODBC-based tools or reporting tools such as Tableau. Internally, uma authenticates using the Sales_DW_IAM_Policy  IAM role and is assigned the sales_grp database group.

Step 7: Connecting to Amazon Redshift using Python and IAM credentials

To enable partners, connect to the examplecorp-dw cluster programmatically, using Python on a computer such as Amazon EC2 instance. Reuse the IAM users that are attached to the Partner_DW_IAM_Policy policy defined in Step 2.

The following steps show this set up on an EC2 instance:

  1. Launch a new EC2 instance with the Partner_DW_IAM_Policy role, as described in Using an IAM Role to Grant Permissions to Applications Running on Amazon EC2 Instances. Alternatively, you can attach an existing IAM role to an EC2 instance.
  2. This example uses Python PostgreSQL Driver (PyGreSQL) to connect to your Amazon Redshift clusters. To install PyGreSQL on Amazon Linux, use the following command as the ec2-user:
    sudo easy_install pip
    sudo yum install postgresql postgresql-devel gcc python-devel
    sudo pip install PyGreSQL

  1. The following code snippet demonstrates programmatic access to Amazon Redshift for partner users:
    #!/usr/bin/env python
    """
    Usage:
    python redshift-unload-copy.py <config file> <region>
    
    * Copyright 2014, Amazon.com, Inc. or its affiliates. All Rights Reserved.
    *
    * Licensed under the Amazon Software License (the "License").
    * You may not use this file except in compliance with the License.
    * A copy of the License is located at
    *
    * http://aws.amazon.com/asl/
    *
    * or in the "license" file accompanying this file. This file is distributed
    * on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
    * express or implied. See the License for the specific language governing
    * permissions and limitations under the License.
    """
    
    import sys
    import pg
    import boto3
    
    REGION = 'us-west-2'
    CLUSTER_IDENTIFIER = 'examplecorp-dw'
    DB_NAME = 'sales_db'
    DB_USER = 'examplecorpsalespartner'
    
    options = """keepalives=1 keepalives_idle=200 keepalives_interval=200
                 keepalives_count=6"""
    
    set_timeout_stmt = "set statement_timeout = 1200000"
    
    def conn_to_rs(host, port, db, usr, pwd, opt=options, timeout=set_timeout_stmt):
        rs_conn_string = """host=%s port=%s dbname=%s user=%s password=%s
                             %s""" % (host, port, db, usr, pwd, opt)
        print "Connecting to %s:%s:%s as %s" % (host, port, db, usr)
        rs_conn = pg.connect(dbname=rs_conn_string)
        rs_conn.query(timeout)
        return rs_conn
    
    def main():
        # describe the cluster and fetch the IAM temporary credentials
        global redshift_client
        redshift_client = boto3.client('redshift', region_name=REGION)
        response_cluster_details = redshift_client.describe_clusters(ClusterIdentifier=CLUSTER_IDENTIFIER)
        response_credentials = redshift_client.get_cluster_credentials(DbUser=DB_USER,DbName=DB_NAME,ClusterIdentifier=CLUSTER_IDENTIFIER,DurationSeconds=3600)
        rs_host = response_cluster_details['Clusters'][0]['Endpoint']['Address']
        rs_port = response_cluster_details['Clusters'][0]['Endpoint']['Port']
        rs_db = DB_NAME
        rs_iam_user = response_credentials['DbUser']
        rs_iam_pwd = response_credentials['DbPassword']
        # connect to the Amazon Redshift cluster
        conn = conn_to_rs(rs_host, rs_port, rs_db, rs_iam_user,rs_iam_pwd)
        # execute a query
        result = conn.query("SELECT sysdate as dt")
        # fetch results from the query
        for dt_val in result.getresult() :
            print dt_val
        # close the Amazon Redshift connection
        conn.close()
    
    if __name__ == "__main__":
        main()

You can save this Python program in a file (redshiftscript.py) and execute it at the command line as ec2-user:

python redshiftscript.py

Now partners can connect to the Amazon Redshift cluster using the Python script, and authentication is federated through the IAM user.

Summary

In this post, I demonstrated how to use federated access using Active Directory and IAM roles to enable single sign-on to an Amazon Redshift cluster. I also showed how partners outside an organization can be managed easily using IAM credentials.  Using the GetClusterCredentials API action, now supported by Amazon Redshift, lets you manage a large number of database users and have them use corporate credentials to log in. You don’t have to maintain separate database user accounts.

Although this post demonstrated the integration of IAM with AD FS and Active Directory, you can replicate this solution across with your choice of SAML 2.0 third-party identity providers (IdP), such as PingFederate or Okta. For the different supported federation options, see Configure SAML Assertions for Your IdP.

If you have questions or suggestions, please comment below.


Additional Reading

Learn how to establish federated access to your AWS resources by using Active Directory user attributes.


About the Author

Thiyagarajan Arumugam is a Big Data Solutions Architect at Amazon Web Services and designs customer architectures to process data at scale. Prior to AWS, he built data warehouse solutions at Amazon.com. In his free time, he enjoys all outdoor sports and practices the Indian classical drum mridangam.

 

Implementing Default Directory Indexes in Amazon S3-backed Amazon CloudFront Origins Using [email protected]

Post Syndicated from Ronnie Eichler original https://aws.amazon.com/blogs/compute/implementing-default-directory-indexes-in-amazon-s3-backed-amazon-cloudfront-origins-using-lambdaedge/

With the recent launch of [email protected], it’s now possible for you to provide even more robust functionality to your static websites. Amazon CloudFront is a content distribution network service. In this post, I show how you can use [email protected] along with the CloudFront origin access identity (OAI) for Amazon S3 and still provide simple URLs (such as www.example.com/about/ instead of www.example.com/about/index.html).

Background

Amazon S3 is a great platform for hosting a static website. You don’t need to worry about managing servers or underlying infrastructure—you just publish your static to content to an S3 bucket. S3 provides a DNS name such as <bucket-name>.s3-website-<AWS-region>.amazonaws.com. Use this name for your website by creating a CNAME record in your domain’s DNS environment (or Amazon Route 53) as follows:

www.example.com -> <bucket-name>.s3-website-<AWS-region>.amazonaws.com

You can also put CloudFront in front of S3 to further scale the performance of your site and cache the content closer to your users. CloudFront can enable HTTPS-hosted sites, by either using a custom Secure Sockets Layer (SSL) certificate or a managed certificate from AWS Certificate Manager. In addition, CloudFront also offers integration with AWS WAF, a web application firewall. As you can see, it’s possible to achieve some robust functionality by using S3, CloudFront, and other managed services and not have to worry about maintaining underlying infrastructure.

One of the key concerns that you might have when implementing any type of WAF or CDN is that you want to force your users to go through the CDN. If you implement CloudFront in front of S3, you can achieve this by using an OAI. However, in order to do this, you cannot use the HTTP endpoint that is exposed by S3’s static website hosting feature. Instead, CloudFront must use the S3 REST endpoint to fetch content from your origin so that the request can be authenticated using the OAI. This presents some challenges in that the REST endpoint does not support redirection to a default index page.

CloudFront does allow you to specify a default root object (index.html), but it only works on the root of the website (such as http://www.example.com > http://www.example.com/index.html). It does not work on any subdirectory (such as http://www.example.com/about/). If you were to attempt to request this URL through CloudFront, CloudFront would do a S3 GetObject API call against a key that does not exist.

Of course, it is a bad user experience to expect users to always type index.html at the end of every URL (or even know that it should be there). Until now, there has not been an easy way to provide these simpler URLs (equivalent to the DirectoryIndex Directive in an Apache Web Server configuration) to users through CloudFront. Not if you still want to be able to restrict access to the S3 origin using an OAI. However, with the release of [email protected], you can use a JavaScript function running on the CloudFront edge nodes to look for these patterns and request the appropriate object key from the S3 origin.

Solution

In this example, you use the compute power at the CloudFront edge to inspect the request as it’s coming in from the client. Then re-write the request so that CloudFront requests a default index object (index.html in this case) for any request URI that ends in ‘/’.

When a request is made against a web server, the client specifies the object to obtain in the request. You can use this URI and apply a regular expression to it so that these URIs get resolved to a default index object before CloudFront requests the object from the origin. Use the following code:

'use strict';
exports.handler = (event, context, callback) => {
    
    // Extract the request from the CloudFront event that is sent to [email protected] 
    var request = event.Records[0].cf.request;

    // Extract the URI from the request
    var olduri = request.uri;

    // Match any '/' that occurs at the end of a URI. Replace it with a default index
    var newuri = olduri.replace(/\/$/, '\/index.html');
    
    // Log the URI as received by CloudFront and the new URI to be used to fetch from origin
    console.log("Old URI: " + olduri);
    console.log("New URI: " + newuri);
    
    // Replace the received URI with the URI that includes the index page
    request.uri = newuri;
    
    // Return to CloudFront
    return callback(null, request);

};

To get started, create an S3 bucket to be the origin for CloudFront:

Create bucket

On the other screens, you can just accept the defaults for the purposes of this walkthrough. If this were a production implementation, I would recommend enabling bucket logging and specifying an existing S3 bucket as the destination for access logs. These logs can be useful if you need to troubleshoot issues with your S3 access.

Now, put some content into your S3 bucket. For this walkthrough, create two simple webpages to demonstrate the functionality:  A page that resides at the website root, and another that is in a subdirectory.

<s3bucketname>/index.html

<!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title>Root home page</title>
    </head>
    <body>
        <p>Hello, this page resides in the root directory.</p>
    </body>
</html>

<s3bucketname>/subdirectory/index.html

<!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title>Subdirectory home page</title>
    </head>
    <body>
        <p>Hello, this page resides in the /subdirectory/ directory.</p>
    </body>
</html>

When uploading the files into S3, you can accept the defaults. You add a bucket policy as part of the CloudFront distribution creation that allows CloudFront to access the S3 origin. You should now have an S3 bucket that looks like the following:

Root of bucket

Subdirectory in bucket

Next, create a CloudFront distribution that your users will use to access the content. Open the CloudFront console, and choose Create Distribution. For Select a delivery method for your content, under Web, choose Get Started.

On the next screen, you set up the distribution. Below are the options to configure:

  • Origin Domain Name:  Select the S3 bucket that you created earlier.
  • Restrict Bucket Access: Choose Yes.
  • Origin Access Identity: Create a new identity.
  • Grant Read Permissions on Bucket: Choose Yes, Update Bucket Policy.
  • Object Caching: Choose Customize (I am changing the behavior to avoid having CloudFront cache objects, as this could affect your ability to troubleshoot while implementing the Lambda code).
    • Minimum TTL: 0
    • Maximum TTL: 0
    • Default TTL: 0

You can accept all of the other defaults. Again, this is a proof-of-concept exercise. After you are comfortable that the CloudFront distribution is working properly with the origin and Lambda code, you can re-visit the preceding values and make changes before implementing it in production.

CloudFront distributions can take several minutes to deploy (because the changes have to propagate out to all of the edge locations). After that’s done, test the functionality of the S3-backed static website. Looking at the distribution, you can see that CloudFront assigns a domain name:

CloudFront Distribution Settings

Try to access the website using a combination of various URLs:

http://<domainname>/:  Works

› curl -v http://d3gt20ea1hllb.cloudfront.net/
*   Trying 54.192.192.214...
* TCP_NODELAY set
* Connected to d3gt20ea1hllb.cloudfront.net (54.192.192.214) port 80 (#0)
> GET / HTTP/1.1
> Host: d3gt20ea1hllb.cloudfront.net
> User-Agent: curl/7.51.0
> Accept: */*
>
< HTTP/1.1 200 OK
< ETag: "cb7e2634fe66c1fd395cf868087dd3b9"
< Accept-Ranges: bytes
< Server: AmazonS3
< X-Cache: Miss from cloudfront
< X-Amz-Cf-Id: -D2FSRwzfcwyKZKFZr6DqYFkIf4t7HdGw2MkUF5sE6YFDxRJgi0R1g==
< Content-Length: 209
< Content-Type: text/html
< Last-Modified: Wed, 19 Jul 2017 19:21:16 GMT
< Via: 1.1 6419ba8f3bd94b651d416054d9416f1e.cloudfront.net (CloudFront), 1.1 iad6-proxy-3.amazon.com:80 (Cisco-WSA/9.1.2-010)
< Connection: keep-alive
<
<!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title>Root home page</title>
    </head>
    <body>
        <p>Hello, this page resides in the root directory.</p>
    </body>
</html>
* Curl_http_done: called premature == 0
* Connection #0 to host d3gt20ea1hllb.cloudfront.net left intact

This is because CloudFront is configured to request a default root object (index.html) from the origin.

http://<domainname>/subdirectory/:  Doesn’t work

› curl -v http://d3gt20ea1hllb.cloudfront.net/subdirectory/
*   Trying 54.192.192.214...
* TCP_NODELAY set
* Connected to d3gt20ea1hllb.cloudfront.net (54.192.192.214) port 80 (#0)
> GET /subdirectory/ HTTP/1.1
> Host: d3gt20ea1hllb.cloudfront.net
> User-Agent: curl/7.51.0
> Accept: */*
>
< HTTP/1.1 200 OK
< ETag: "d41d8cd98f00b204e9800998ecf8427e"
< x-amz-server-side-encryption: AES256
< Accept-Ranges: bytes
< Server: AmazonS3
< X-Cache: Miss from cloudfront
< X-Amz-Cf-Id: Iqf0Gy8hJLiW-9tOAdSFPkL7vCWBrgm3-1ly5tBeY_izU82ftipodA==
< Content-Length: 0
< Content-Type: application/x-directory
< Last-Modified: Wed, 19 Jul 2017 19:21:24 GMT
< Via: 1.1 6419ba8f3bd94b651d416054d9416f1e.cloudfront.net (CloudFront), 1.1 iad6-proxy-3.amazon.com:80 (Cisco-WSA/9.1.2-010)
< Connection: keep-alive
<
* Curl_http_done: called premature == 0
* Connection #0 to host d3gt20ea1hllb.cloudfront.net left intact

If you use a tool such like cURL to test this, you notice that CloudFront and S3 are returning a blank response. The reason for this is that the subdirectory does exist, but it does not resolve to an S3 object. Keep in mind that S3 is an object store, so there are no real directories. User interfaces such as the S3 console present a hierarchical view of a bucket with folders based on the presence of forward slashes, but behind the scenes the bucket is just a collection of keys that represent stored objects.

http://<domainname>/subdirectory/index.html:  Works

› curl -v http://d3gt20ea1hllb.cloudfront.net/subdirectory/index.html
*   Trying 54.192.192.130...
* TCP_NODELAY set
* Connected to d3gt20ea1hllb.cloudfront.net (54.192.192.130) port 80 (#0)
> GET /subdirectory/index.html HTTP/1.1
> Host: d3gt20ea1hllb.cloudfront.net
> User-Agent: curl/7.51.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 20 Jul 2017 20:35:15 GMT
< ETag: "ddf87c487acf7cef9d50418f0f8f8dae"
< Accept-Ranges: bytes
< Server: AmazonS3
< X-Cache: RefreshHit from cloudfront
< X-Amz-Cf-Id: bkh6opXdpw8pUomqG3Qr3UcjnZL8axxOH82Lh0OOcx48uJKc_Dc3Cg==
< Content-Length: 227
< Content-Type: text/html
< Last-Modified: Wed, 19 Jul 2017 19:21:45 GMT
< Via: 1.1 3f2788d309d30f41de96da6f931d4ede.cloudfront.net (CloudFront), 1.1 iad6-proxy-3.amazon.com:80 (Cisco-WSA/9.1.2-010)
< Connection: keep-alive
<
<!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title>Subdirectory home page</title>
    </head>
    <body>
        <p>Hello, this page resides in the /subdirectory/ directory.</p>
    </body>
</html>
* Curl_http_done: called premature == 0
* Connection #0 to host d3gt20ea1hllb.cloudfront.net left intact

This request works as expected because you are referencing the object directly. Now, you implement the [email protected] function to return the default index.html page for any subdirectory. Looking at the example JavaScript code, here’s where the magic happens:

var newuri = olduri.replace(/\/$/, '\/index.html');

You are going to use a JavaScript regular expression to match any ‘/’ that occurs at the end of the URI and replace it with ‘/index.html’. This is the equivalent to what S3 does on its own with static website hosting. However, as I mentioned earlier, you can’t rely on this if you want to use a policy on the bucket to restrict it so that users must access the bucket through CloudFront. That way, all requests to the S3 bucket must be authenticated using the S3 REST API. Because of this, you implement a [email protected] function that takes any client request ending in ‘/’ and append a default ‘index.html’ to the request before requesting the object from the origin.

In the Lambda console, choose Create function. On the next screen, skip the blueprint selection and choose Author from scratch, as you’ll use the sample code provided.

Next, configure the trigger. Choosing the empty box shows a list of available triggers. Choose CloudFront and select your CloudFront distribution ID (created earlier). For this example, leave Cache Behavior as * and CloudFront Event as Origin Request. Select the Enable trigger and replicate box and choose Next.

Lambda Trigger

Next, give the function a name and a description. Then, copy and paste the following code:

'use strict';
exports.handler = (event, context, callback) => {
    
    // Extract the request from the CloudFront event that is sent to [email protected] 
    var request = event.Records[0].cf.request;

    // Extract the URI from the request
    var olduri = request.uri;

    // Match any '/' that occurs at the end of a URI. Replace it with a default index
    var newuri = olduri.replace(/\/$/, '\/index.html');
    
    // Log the URI as received by CloudFront and the new URI to be used to fetch from origin
    console.log("Old URI: " + olduri);
    console.log("New URI: " + newuri);
    
    // Replace the received URI with the URI that includes the index page
    request.uri = newuri;
    
    // Return to CloudFront
    return callback(null, request);

};

Next, define a role that grants permissions to the Lambda function. For this example, choose Create new role from template, Basic Edge Lambda permissions. This creates a new IAM role for the Lambda function and grants the following permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:*:*:*"
            ]
        }
    ]
}

In a nutshell, these are the permissions that the function needs to create the necessary CloudWatch log group and log stream, and to put the log events so that the function is able to write logs when it executes.

After the function has been created, you can go back to the browser (or cURL) and re-run the test for the subdirectory request that failed previously:

› curl -v http://d3gt20ea1hllb.cloudfront.net/subdirectory/
*   Trying 54.192.192.202...
* TCP_NODELAY set
* Connected to d3gt20ea1hllb.cloudfront.net (54.192.192.202) port 80 (#0)
> GET /subdirectory/ HTTP/1.1
> Host: d3gt20ea1hllb.cloudfront.net
> User-Agent: curl/7.51.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 20 Jul 2017 21:18:44 GMT
< ETag: "ddf87c487acf7cef9d50418f0f8f8dae"
< Accept-Ranges: bytes
< Server: AmazonS3
< X-Cache: Miss from cloudfront
< X-Amz-Cf-Id: rwFN7yHE70bT9xckBpceTsAPcmaadqWB9omPBv2P6WkIfQqdjTk_4w==
< Content-Length: 227
< Content-Type: text/html
< Last-Modified: Wed, 19 Jul 2017 19:21:45 GMT
< Via: 1.1 3572de112011f1b625bb77410b0c5cca.cloudfront.net (CloudFront), 1.1 iad6-proxy-3.amazon.com:80 (Cisco-WSA/9.1.2-010)
< Connection: keep-alive
<
<!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title>Subdirectory home page</title>
    </head>
    <body>
        <p>Hello, this page resides in the /subdirectory/ directory.</p>
    </body>
</html>
* Curl_http_done: called premature == 0
* Connection #0 to host d3gt20ea1hllb.cloudfront.net left intact

You have now configured a way for CloudFront to return a default index page for subdirectories in S3!

Summary

In this post, you used [email protected] to be able to use CloudFront with an S3 origin access identity and serve a default root object on subdirectory URLs. To find out some more about this use-case, see [email protected] integration with CloudFront in our documentation.

If you have questions or suggestions, feel free to comment below. For troubleshooting or implementation help, check out the Lambda forum.