Tag Archives: IAM

Recovering from a rough Monday morning: An Amazon GuardDuty threat detection and remediation scenario

Post Syndicated from Greg McConnel original https://aws.amazon.com/blogs/security/amazon-guardduty-threat-detection-and-remediation-scenario/

Amazon GuardDuty is a managed threat detection service that continuously monitors for malicious or unauthorized behavior to help you protect your AWS accounts and workloads. Given the many log types that Amazon GuardDuty analyzes (Amazon Virtual Private Cloud (VPC) Flow Logs, AWS CloudTrail, and DNS logs), you never know what it might discover in your AWS account. After enabling GuardDuty, you might quickly find serious threats lurking in your account or, preferably, just end up staring at a blank dashboard for weeks…or even longer.

A while back at an AWS Loft event, one of the customers enabled GuardDuty in their AWS account for a lab we were running. Soon after, GuardDuty alerts (findings) popped up that indicated multiple Amazon Elastic Compute Cloud (EC2) instances were communicating with known command and control servers. This means that GuardDuty detected activity commonly seen in the situation where an EC2 instance has been taken over as part of a botnet. The customer asked if this was part of the lab, and we explained it wasn’t and that the findings should be immediately investigated. This led to an investigation by that customer’s security team and luckily the issue was resolved quickly.

Then there was the time we spoke to a customer that had been running GuardDuty for a few days but had yet to see any findings in the dashboard. They were concerned that the service wasn’t working. We explained that the lack of findings was actually a good thing, and we discussed how to generate sample findings to test GuardDuty and their remediation pipeline.

This post, and the corresponding GitHub repository, will help prepare you for either type of experience by walking you through a threat detection and remediation scenario. The scenario will show you how to quickly enable GuardDuty, generate and examine test findings, and then review automated remediation examples using AWS Lambda.
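If you'd rather script the first steps than use the console, the following minimal sketch (not part of the scenario's CloudFormation) shows roughly how enabling GuardDuty and generating sample findings looks with boto3. The region and the finding type shown are illustrative assumptions.

# Minimal sketch: enable GuardDuty and generate sample findings with boto3.
# The region and the finding type below are assumptions for illustration only.
import boto3

guardduty = boto3.client('guardduty', region_name='us-east-1')

# GuardDuty allows one detector per account per region; creating it enables the service.
detector_id = guardduty.create_detector(Enable=True)['DetectorId']

# Generate sample findings so you can safely exercise a remediation pipeline.
guardduty.create_sample_findings(
    DetectorId=detector_id,
    FindingTypes=['UnauthorizedAccess:EC2/MaliciousIPCaller.Custom']
)

# Inspect whatever findings the detector currently has.
finding_ids = guardduty.list_findings(DetectorId=detector_id)['FindingIds']
if finding_ids:
    print(guardduty.get_findings(DetectorId=detector_id, FindingIds=finding_ids)['Findings'])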

Scenario overview

The instructions and AWS CloudFormation template for setting everything up are provided in a GitHub repository. The CloudFormation template sets up a test environment in your AWS Account, configures everything needed to run through the scenario, generates GuardDuty findings and provides automatic remediation for the simulated threats in the scenario. All you need to do is run the CloudFormation template in the GitHub repository and then follow the instructions to investigate what occurred.

The scenario presented is that you manage an IT organization and Alice, your security engineer, has enabled GuardDuty in a production AWS account and configured a few automated remediations. In threat detection and remediation, the standard pattern starts with a threat, which is then investigated and finally remediated. These remediations can be manual or automated. Alice focused on a few specific attack vectors, which represent a small sample of what GuardDuty is capable of detecting. Alice set all of this up on Thursday but isn't in the office on Monday. Unfortunately, as soon as you arrive at the office, GuardDuty notifies you that multiple threats have been detected (and, given the automated remediation setup, these threats have already been addressed, but you still need to investigate). The documentation in GitHub will guide you through the analysis of the findings and discuss how the automatic remediation works. You will also have the opportunity to manually trigger a GuardDuty finding and view that automated remediation.
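The automated remediations in the scenario are built from Lambda functions that run when GuardDuty findings are delivered through CloudWatch Events. As a rough, hedged illustration of that pattern (not the repository's actual code), a handler might quarantine a compromised instance by swapping its security groups:

# Hedged sketch of one remediation pattern: a Lambda function triggered by a
# CloudWatch Events rule for GuardDuty findings isolates the affected EC2
# instance behind a restrictive security group. The group ID is a placeholder.
import boto3

ec2 = boto3.client('ec2')

QUARANTINE_SG_ID = 'sg-0123456789abcdef0'  # assumption: a pre-created quarantine security group

def handler(event, context):
    finding = event['detail']
    instance_id = finding['resource']['instanceDetails']['instanceId']
    # Replace all security groups on the instance with the quarantine group.
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[QUARANTINE_SG_ID])
    print('Isolated {} in response to {}'.format(instance_id, finding['type']))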

The GuardDuty findings generated in the scenario are listed in the documentation in the GitHub repository. You can view all of the available GuardDuty finding types in the GuardDuty documentation.

You can get started immediately by browsing to the GitHub repository for this scenario, where you will find the instructions and AWS CloudFormation template. This scenario will show you how easy it is to enable GuardDuty, in addition to demonstrating some of the threats GuardDuty can discover. To learn more about Amazon GuardDuty, please see the GuardDuty site and GuardDuty documentation.

If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the Amazon GuardDuty forum or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Control access to your APIs using Amazon API Gateway resource policies

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/control-access-to-your-apis-using-amazon-api-gateway-resource-policies/

This post courtesy of Tapodipta Ghosh, AWS Solutions Architect

Amazon API Gateway provides you with a simple, flexible, secure, and fully managed service that lets you focus on building core business services. API Gateway supports multiple mechanisms of access control using AWS Identity and Access Management (IAM), AWS Lambda authorizers, and Amazon Cognito.

You may want to enforce strict control on the locations from which your APIs are invoked. For example, if you are an AWS Partner who offers APIs over a SaaS model, you can take advantage of the new Amazon API Gateway resource policies feature to control access to your APIs using predefined IP address ranges. API Gateway resource policies are JSON policy documents that you attach to an API to control whether a specified principal (typically, an IAM user or role) can invoke the API.

After a customer subscribes to your SaaS product in AWS Marketplace, you can ask for IP address ranges in the registration information. Then you can enable access to your API from only those IP addresses, making it a secure integration. For example, if you know that your customers are spread across a certain geography, you could blacklist all other countries. Alternatively, if you have global customers, you can whitelist only specific IP address ranges.

What problems do resource policies solve?

In a distributed development team with separate AWS accounts, integration testing can be challenging. Allowing users from a different AWS account to access your API requires writing and maintaining code for assuming a role in the API owner's account. Also, if you work with a third party, you have to write a Lambda authorizer to implement a bearer token–based authorization scheme.

Now, you can use resource policies, much like S3 bucket policies, to provide overarching controls on your APIs without writing custom authorizers or complicated application logic. In this post, I demonstrate how you can use API Gateway resource policies to enable users from a different AWS account to access your API securely. You can also allow the API to be invoked only from specified source IP address ranges or CIDR blocks, without writing any code.

Solution overview

Imagine a company has two teams, Team A and Team B. Team B has created an API that is backed by a Lambda function and a DynamoDB database. They want to make the API public to third parties. First, they want Team A to run integration tests. After the API goes live, Team B wants to allow only users who access the API from a known IP address range.

The following diagram shows the sequence:

Start with building an API. For this walkthrough, use a SAM template and the AWS CLI to create the API. For the code to create an API and attach the resource policy to it, see the Sam-moviesapi-resourcepolicy GitHub repo.

Here’s a walkthrough of the steps, so you can get a deeper understanding of what’s happening under the covers.

  • Create the API
  • Turn on IAM authentication
  • Grant user access
  • Test the access permissions

Create the API

Assume that you are hosting the API in AccountB. Run the following commands:

git clone https://github.com/aws-samples/aws-sam-movies-api-resource-policy.git
mkdir ./build

cp -p -r ./movies ./build/movies

pip install -r requirements.txt -t ./build

aws cloudformation package --template-file template.yaml --output-template-file template-out.yaml --s3-bucket $S3Bucket --profile AccountB

aws cloudformation deploy --template-file template-out.yaml --stack-name apigw-resource-policies-demo --capabilities CAPABILITY_IAM --profile AccountB

Note: You’ll need an S3 bucket to store your artifact for the “package” step.

Turn on IAM authentication

After the movie API is set up, turn on IAM authentication, so that it’s protected from unauthenticated attempts.
It should look like the following screenshot:

Also, make sure that you are getting a valid response when you make a GET request, as shown in the following screenshot:
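If you prefer to script this step rather than use the console, here is a hedged sketch with boto3; the REST API ID, resource ID, and stage name are placeholders:

# Hedged sketch: switch a method's authorization type to AWS_IAM with boto3.
# The REST API ID, resource ID, and stage name below are placeholders.
import boto3

apigw = boto3.client('apigateway', region_name='us-east-1')

apigw.update_method(
    restApiId='qxz8y9c8a4',   # placeholder API ID
    resourceId='abc123',      # placeholder resource ID for /movies
    httpMethod='GET',
    patchOperations=[{'op': 'replace', 'path': '/authorizationType', 'value': 'AWS_IAM'}]
)

# Redeploy the stage so the change takes effect.
apigw.create_deployment(restApiId='qxz8y9c8a4', stageName='Prod')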

Grant user access

Now grant AccountA user access to your API. In the API Gateway console, choose Movies API, Resource Policy.

Note: All the IP address ranges recorded in this post are for illustration purposes only.

Here is a screenshot of how it would look in the console:

The entire policy is listed here:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::<account_idA>:user/<user>",
                    "arn:aws:iam::<account_idA>:root"
                ]
            },
            "Action": "execute-api:Invoke",
            "Resource": "arn:aws:execute-api:us-east-1:<account_idB>:qxz8y9c8a4/*/*/*"
        },
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "execute-api:Invoke",
            "Resource": "arn:aws:execute-api:us-east-1:<account_idB>:qxz8y9c8a4/*",
            "Condition": {
                "IpAddress": {
                    "aws:SourceIp": " 203.0.113.0/24"
                }
            }
        }
    ]
}

Here are a few points worth noting. The first policy statement shows how you could provide granular access to certain API IDs down to the specific resource paths in the resource section of the policy. To provide the AccountA user with access only to GET requests, change the resource line to the following:

"Resource": "arn:aws:execute-api:us-east-1:<account_idB>:qxz8y9c8a4/*/GET/*"

In the second statement, you are whitelisting the entire 203.0.113.0/24 network to make all calls to the API.

While whitelisting IP addresses is a good way to start when launching the API for the first time, maintaining an up-to-date list can prove challenging. For a stable product, blacklisting bad actors might be more practical.

A blacklist implementation could look like the following:

{
	"Effect": "Deny",
	"Principal": "*",
	"Action": "execute-api:Invoke",
	"Resource": "arn:aws:execute-api:us-east-1:<account_idB>:qxz8y9c8a4/*",
	"Condition": {
		"IpAddress": {
			"aws:SourceIp": "203.0.113.0/24"
		}
	}
}

Suppose you have access logs turned on for the API and your log analysis tool has flagged bad actors from a particular IP address range, for example 203.0.113.0/24. You can now blacklist this IP address range in the resource policy.
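If you manage the policy from code rather than the console, a hedged boto3 sketch for attaching or updating the resource policy could look like the following; the API ID and account ID are the placeholders used throughout this post:

# Hedged sketch: set an API Gateway resource policy with boto3 and redeploy.
import boto3
import json

apigw = boto3.client('apigateway', region_name='us-east-1')

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Deny",
        "Principal": "*",
        "Action": "execute-api:Invoke",
        "Resource": "arn:aws:execute-api:us-east-1:<account_idB>:qxz8y9c8a4/*",
        "Condition": {"IpAddress": {"aws:SourceIp": "203.0.113.0/24"}}
    }]
}

apigw.update_rest_api(
    restApiId='qxz8y9c8a4',  # placeholder API ID
    patchOperations=[{'op': 'replace', 'path': '/policy', 'value': json.dumps(policy)}]
)

# Resource policy changes take effect only after a new deployment.
apigw.create_deployment(restApiId='qxz8y9c8a4', stageName='Prod')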

Test the access permissions

You can now test, using Postman, to ensure that the user from AccountA can indeed call the API hosted in AccountB. Also verify that attempts from other accounts are rejected.

In the following examples, the AWS Signature is configured with the AccessKey and SecretKey values of an AccountA user who was granted access to the API.

Successful response from an authorized user in AccountA: Got a 200 OK

Failure from an unauthorized account/user: Got 401 Unauthorized
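If you'd rather reproduce the Postman test from code, the following hedged sketch signs a request with SigV4 using botocore; the invoke URL, stage, and profile name are placeholders:

# Hedged sketch: call the IAM-protected API with a SigV4-signed GET request.
# The invoke URL, stage, and named profile are placeholders.
import boto3
import requests
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

url = 'https://qxz8y9c8a4.execute-api.us-east-1.amazonaws.com/Prod/movies'

# Credentials of the AccountA user that the resource policy allows.
credentials = boto3.Session(profile_name='AccountA').get_credentials()

request = AWSRequest(method='GET', url=url)
SigV4Auth(credentials, 'execute-api', 'us-east-1').add_auth(request)

response = requests.get(url, headers=dict(request.headers))
print(response.status_code, response.text)  # expect 200 OK for allowed callers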

Summary

In this post, I showed you the different ways that you can use resource policies to lock down access to your API. Want to restrict a dev API endpoint to the office IP address range? Now you can. Cross-account API access is also made much simpler without having to write complex authentication/authorization schemes.

Amazon Neptune Generally Available

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/amazon-neptune-generally-available/

Amazon Neptune is now Generally Available in US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland). Amazon Neptune is a fast, reliable, fully-managed graph database service that makes it easy to build and run applications that work with highly connected datasets. At the core of Neptune is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with millisecond latencies. Neptune supports two popular graph models, Property Graph and RDF, through Apache TinkerPop Gremlin and SPARQL, allowing you to easily build queries that efficiently navigate highly connected datasets. Neptune can be used to power everything from recommendation engines and knowledge graphs to drug discovery and network security. Neptune is fully-managed with automatic minor version upgrades, backups, encryption, and fail-over. I wrote about Neptune in detail for AWS re:Invent last year and customers have been using the preview and providing great feedback that the team has used to prepare the service for GA.

Now that Amazon Neptune is generally available, there are a few changes from the preview:

Launching an Amazon Neptune Cluster

Launching a Neptune cluster is as easy as navigating to the AWS Management Console and clicking create cluster. Of course you can also launch with CloudFormation, the CLI, or the SDKs.
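If you'd rather script the launch, a hedged boto3 sketch could look like the following; the identifiers and instance class are placeholders, and networking details are omitted:

# Hedged sketch: create a Neptune cluster and a single instance with boto3.
# Identifiers and the instance class are placeholders; VPC/subnet settings omitted.
import boto3

neptune = boto3.client('neptune', region_name='us-east-1')

neptune.create_db_cluster(
    DBClusterIdentifier='my-neptune-cluster',
    Engine='neptune'
)

neptune.create_db_instance(
    DBInstanceIdentifier='my-neptune-instance-1',
    DBInstanceClass='db.r4.large',
    Engine='neptune',
    DBClusterIdentifier='my-neptune-cluster'
)

# Once the instance is available, read the cluster endpoint to connect Gremlin or SPARQL clients.
cluster = neptune.describe_db_clusters(DBClusterIdentifier='my-neptune-cluster')['DBClusters'][0]
print(cluster['Endpoint'], cluster['Port'])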

You can monitor your cluster health and the health of individual instances through Amazon CloudWatch and the console.

Additional Resources

We’ve created two repos with some additional tools and examples here. You can expect continuous development on these repos as we add additional tools and examples.

  • Amazon Neptune Tools Repo
    This repo has a useful tool for converting GraphML files into Neptune compatible CSVs for bulk loading from S3.
  • Amazon Neptune Samples Repo
    This repo has a really cool example of building a collaborative filtering recommendation engine for video game preferences.

Purpose Built Databases

There’s an industry trend where we’re moving more and more toward purpose-built databases. Developers and businesses want to access their data in the format that makes the most sense for their applications. As cloud resources make transforming large datasets easier with tools like AWS Glue, we have a lot more options than we used to for accessing our data. With tools like Amazon Redshift, Amazon Athena, Amazon Aurora, Amazon DynamoDB, and more, we get to choose the best database for the job or even enable entirely new use cases. Amazon Neptune is perfect for workloads where the data is highly connected across data-rich edges.

I’m really excited about graph databases and I see a huge number of applications. Looking for ideas of cool things to build? I’d love to build a web crawler in AWS Lambda that uses Neptune as the backing store. You could further enrich it by running Amazon Comprehend or Amazon Rekognition on the text and images found and creating a search engine on top of Neptune.

As always, feel free to reach out in the comments or on twitter to provide any feedback!

Randall

The Copyright Directive: the course of the revision: act now

Post Syndicated from nellyo original https://nellyo.wordpress.com/2018/05/26/copyright-5/

A new development in the EU copyright revision becomes clear from the announcements of the Bulgarian Presidency, from participants in the revision, and from Julia Reda, who had a very clear view of what she wanted changed in the legal framework (a common regime of exceptions and an update, so that the legal framework is adequate to technological developments) and who is now following the legislative process closely.

The governments of the EU member states have adopted a position on the copyright reform without substantial changes to Article 11 (the new right for publishers) and Article 13 (upload filters). The draft is on Reda's site, and Politico shows the amendments affecting the publishers' right in color.

Now Parliament must stop them, Reda writes.

You now have a chance to exert influence, a chance that will disappear in two years, when everyone will "suddenly" face the challenge of implementing filters and the link tax. Experts almost unanimously agree that the draft copyright reform is genuinely bad.

Update: Member State governments have just adopted their position on #copyright, with no significant changes to the #CensorshipMachines and #LinkTax provisions. It is now up to Parliament to stop them and #FixCopyright. https://t.co/1JwNvQn24n pic.twitter.com/KAgqV3YYG1


Two charts from Reda's site, covering the two provisions against which support is being gathered (see also the academics), show the positions by member state and by political group in the European Parliament:

 

 

Access to documents, trilogues: another step forward

Post Syndicated from nellyo original https://nellyo.wordpress.com/2018/05/22/access_ep/

Trilogue documents often end up in the hands of privileged lobbyists only minutes after they are produced. Giving citizens the same access to these documents is not merely a matter of fairness; it is necessary to allow public scrutiny of the entire legislative process. Parliament has repeatedly called for proactive publication of trilogue documents on the institutions' websites.

This is Julia Reda's comment on the news that the Committee on Legal Affairs (JURI) recommended that the European Parliament not appeal the General Court's judgment in De Capitani v. European Parliament, which grants access to trilogue documents (see more on the judgment).

 

Williams: Introducing Git protocol version 2

Post Syndicated from corbet original https://lwn.net/Articles/754872/rss

Brandon Williams writes
about the new Git remote protocol
that will debut in the 2.18 release.
We recently rolled out support for protocol version 2 at Google and
have seen a performance improvement of 3x for no-op fetches of a single
branch on repositories containing 500k references. Protocol v2 has also
enabled a reduction of 8x of the overhead bytes (non-packfile) sent from
googlesource.com servers. A majority of this improvement is due to
filtering references advertised by the server to the refs the client has
expressed interest in.

Analyze data in Amazon DynamoDB using Amazon SageMaker for real-time prediction

Post Syndicated from YongSeong Lee original https://aws.amazon.com/blogs/big-data/analyze-data-in-amazon-dynamodb-using-amazon-sagemaker-for-real-time-prediction/

Many companies across the globe use Amazon DynamoDB to store and query historical user-interaction data. DynamoDB is a fast NoSQL database used by applications that need consistent, single-digit millisecond latency.

Often, customers want to turn their valuable data in DynamoDB into insights by analyzing a copy of their table stored in Amazon S3. Doing this separates their analytical queries from their low-latency critical paths. This data can be the primary source for understanding customers’ past behavior, predicting future behavior, and generating downstream business value. Customers often turn to DynamoDB because of its great scalability and high availability. After a successful launch, many customers want to use the data in DynamoDB to predict future behaviors or provide personalized recommendations.

DynamoDB is a good fit for low-latency reads and writes, but it’s not practical to scan all data in a DynamoDB database to train a model. In this post, I demonstrate how you can use DynamoDB table data copied to Amazon S3 by AWS Data Pipeline to predict customer behavior. I also demonstrate how you can use this data to provide personalized recommendations for customers using Amazon SageMaker. You can also run ad hoc queries using Amazon Athena against the data. DynamoDB recently released on-demand backups to create full table backups with no performance impact. However, it’s not suitable for our purposes in this post, so I chose AWS Data Pipeline instead to create managed backups that are accessible from other services.

To do this, I describe how to read the DynamoDB backup file format in Data Pipeline. I also describe how to convert the objects in S3 to a CSV format that Amazon SageMaker can read. In addition, I show how to schedule regular exports and transformations using Data Pipeline. The sample data used in this post is from the UCI Bank Marketing Data Set.

The solution that I describe provides the following benefits:

  • Separates analytical queries from production traffic on your DynamoDB table, preserving your DynamoDB read capacity units (RCUs) for important production requests
  • Automatically updates your model to get real-time predictions
  • Optimizes for performance (so it doesn’t compete with DynamoDB RCUs after the export) and for cost (using data you already have)
  • Makes it easier for developers of all skill levels to use Amazon SageMaker

All code and data set in this post are available in this .zip file.

Solution architecture

The following diagram shows the overall architecture of the solution.

The steps that data follows through the architecture are as follows:

  1. Data Pipeline regularly copies the full contents of a DynamoDB table as JSON into an S3 bucket.
  2. Exported JSON files are converted to comma-separated value (CSV) format to use as a data source for Amazon SageMaker.
  3. Amazon SageMaker renews the model artifact and updates the endpoint.
  4. The converted CSV is available for ad hoc queries with Amazon Athena.
  5. Data Pipeline controls this flow and repeats the cycle based on the schedule defined by customer requirements.

Building the auto-updating model

This section discusses details about how to read the DynamoDB exported data in Data Pipeline and build automated workflows for real-time prediction with a regularly updated model.

Download sample scripts and data

Before you begin, take the following steps:

  1. Download sample scripts in this .zip file.
  2. Unzip the src.zip file.
  3. Find the automation_script.sh file and edit it for your environment. For example, you need to replace 's3://<your bucket>/<datasource path>/' with your own S3 path to the data source for Amazon SageMaker. In the script, the text enclosed by angle brackets (< and >) should be replaced with your own path.
  4. Upload the json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar file to your S3 path so that the ADD jar command in Apache Hive can refer to it.

For this solution, the banking.csv file should be imported into a DynamoDB table.
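As a hedged illustration of one way to do that with boto3 (the table name and the synthetic customer_id partition key are assumptions, not part of the sample scripts):

# Hedged sketch: load banking.csv into a DynamoDB table with boto3.
# The table name 'bank_marketing' and the string partition key 'customer_id'
# are assumptions; numeric-looking values are stored as DynamoDB numbers.
import csv
from decimal import Decimal

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('bank_marketing')

def to_attr(value):
    try:
        return Decimal(value)  # store numeric columns as DynamoDB numbers
    except Exception:
        return value           # keep everything else as strings

with open('banking.csv', newline='') as f:
    reader = csv.DictReader(f)
    with table.batch_writer() as batch:
        for i, row in enumerate(reader):
            item = {k: to_attr(v) for k, v in row.items() if v != ''}
            item['customer_id'] = str(i)  # synthetic partition key
            batch.put_item(Item=item)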

Export a DynamoDB table

To export the DynamoDB table to S3, open the Data Pipeline console and choose the Export DynamoDB table to S3 template. In this template, Data Pipeline creates an Amazon EMR cluster and performs an export in the EMRActivity activity. Set proper intervals for backups according to your business requirements.

One core node (m3.xlarge) provides the default capacity for the EMR cluster and should be suitable for the solution in this post. Leave the option to resize the cluster before running enabled in the TableBackupActivity activity so that Data Pipeline can scale the cluster to match the table size. The process of converting to CSV format and renewing models happens in this EMR cluster.

For a more in-depth look at how to export data from DynamoDB, see Export Data from DynamoDB in the Data Pipeline documentation.

Add the script to an existing pipeline

After you export your DynamoDB table, you add an additional EMR step to EMRActivity by following these steps:

  1. Open the Data Pipeline console and choose the ID for the pipeline that you want to add the script to.
  2. For Actions, choose Edit.
  3. In the editing console, choose the Activities category and add an EMR step using the custom script downloaded in the previous section, as shown below.

Paste the following command into the new step after the data upload step:

s3://#{myDDBRegion}.elasticmapreduce/libs/script-runner/script-runner.jar,s3://<your bucket name>/automation_script.sh,#{output.directoryPath},#{myDDBRegion}

The element #{output.directoryPath} references the S3 path where the data pipeline exports DynamoDB data as JSON. The path should be passed to the script as an argument.

The bash script has two goals: converting data formats and renewing the Amazon SageMaker model. Subsequent sections discuss the contents of the automation script.

Automation script: Convert JSON data to CSV with Hive

We use Apache Hive to transform the data into a new format. The Hive QL script to create an external table and transform the data is included in the custom script that you added to the Data Pipeline definition.

When you run the Hive scripts, do so with the -e option. Also, define the Hive table with the 'org.openx.data.jsonserde.JsonSerDe' row format to parse and read JSON format. The SQL creates a Hive EXTERNAL table, and it reads the DynamoDB backup data on the S3 path passed to it by Data Pipeline.

Note: You should create the table with the “EXTERNAL” keyword to avoid the backup data being accidentally deleted from S3 if you drop the table.

The full automation script for the conversion follows. Replace the placeholders in angle brackets with your own bucket name and data source path.

#!/bin/bash
hive -e "
ADD jar s3://<your bucket name>/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar ; 
DROP TABLE IF EXISTS blog_backup_data ;
CREATE EXTERNAL TABLE blog_backup_data (
 customer_id map<string,string>,
 age map<string,string>, job map<string,string>, 
 marital map<string,string>,education map<string,string>, 
 default map<string,string>, housing map<string,string>,
 loan map<string,string>, contact map<string,string>, 
 month map<string,string>, day_of_week map<string,string>, 
 duration map<string,string>, campaign map<string,string>,
 pdays map<string,string>, previous map<string,string>, 
 poutcome map<string,string>, emp_var_rate map<string,string>, 
 cons_price_idx map<string,string>, cons_conf_idx map<string,string>,
 euribor3m map<string,string>, nr_employed map<string,string>, 
 y map<string,string> ) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' 
LOCATION '$1/';

INSERT OVERWRITE DIRECTORY 's3://<your bucket name>/<datasource path>/' 
SELECT concat( customer_id['s'],',', 
 age['n'],',', job['s'],',', 
 marital['s'],',', education['s'],',', default['s'],',', 
 housing['s'],',', loan['s'],',', contact['s'],',', 
 month['s'],',', day_of_week['s'],',', duration['n'],',', 
 campaign['n'],',',pdays['n'],',',previous['n'],',', 
 poutcome['s'],',', emp_var_rate['n'],',', cons_price_idx['n'],',',
 cons_conf_idx['n'],',', euribor3m['n'],',', nr_employed['n'],',', y['n'] ) 
FROM blog_backup_data
WHERE customer_id['s'] > 0 ;
"

After creating the external table, the script reads the backup data and uses the INSERT OVERWRITE DIRECTORY ... SELECT statement to write CSV data to the S3 path that you designated as the data source for Amazon SageMaker.

Depending on your requirements, you can eliminate or process the columns in the SELECT clause in this step to optimize data analysis. For example, you might remove columns that have unpredictable correlations with the target value, because keeping the wrong columns might expose your model to overfitting during training. In this post, the customer_id column is removed. Overfitting can make your prediction weak. More information about overfitting can be found in the topic Model Fit: Underfitting vs. Overfitting in the Amazon ML documentation.

Automation script: Renew the Amazon SageMaker model

After the CSV data is replaced and ready to use, create a new model artifact for Amazon SageMaker with the updated dataset on S3. To renew the model artifact, you must create a new training job. Training jobs can be run using the AWS SDK (for example, boto3 for Amazon SageMaker), the Amazon SageMaker Python SDK (which can be installed with the "pip install sagemaker" command), or the AWS CLI for Amazon SageMaker, as described in this post.

In addition, consider how to renew your existing model smoothly and without service impact, because your model is called by applications in real time. To do this, create a new endpoint configuration first, and then update the current endpoint with the endpoint configuration you just created.

#!/bin/bash
## Define variable 
REGION=$2
DTTIME=`date +%Y-%m-%d-%H-%M-%S`
ROLE="<your AmazonSageMaker-ExecutionRole>" 


# Select containers image based on region.  
case "$REGION" in
"us-west-2" )
    IMAGE="174872318107.dkr.ecr.us-west-2.amazonaws.com/linear-learner:latest"
    ;;
"us-east-1" )
    IMAGE="382416733822.dkr.ecr.us-east-1.amazonaws.com/linear-learner:latest" 
    ;;
"us-east-2" )
    IMAGE="404615174143.dkr.ecr.us-east-2.amazonaws.com/linear-learner:latest" 
    ;;
"eu-west-1" )
    IMAGE="438346466558.dkr.ecr.eu-west-1.amazonaws.com/linear-learner:latest" 
    ;;
 *)
    echo "Invalid Region Name"
    exit 1 ;  
esac

# Start training job and creating model artifact 
TRAINING_JOB_NAME=TRAIN-${DTTIME} 
S3OUTPUT="s3://<your bucket name>/model/" 
INSTANCETYPE="ml.m4.xlarge"
INSTANCECOUNT=1
VOLUMESIZE=5 
aws sagemaker create-training-job --training-job-name ${TRAINING_JOB_NAME} --region ${REGION}  --algorithm-specification TrainingImage=${IMAGE},TrainingInputMode=File --role-arn ${ROLE}  --input-data-config '[{ "ChannelName": "train", "DataSource": { "S3DataSource": { "S3DataType": "S3Prefix", "S3Uri": "s3://<your bucket name>/<datasource path>/", "S3DataDistributionType": "FullyReplicated" } }, "ContentType": "text/csv", "CompressionType": "None" , "RecordWrapperType": "None"  }]'  --output-data-config S3OutputPath=${S3OUTPUT} --resource-config  InstanceType=${INSTANCETYPE},InstanceCount=${INSTANCECOUNT},VolumeSizeInGB=${VOLUMESIZE} --stopping-condition MaxRuntimeInSeconds=120 --hyper-parameters feature_dim=20,predictor_type=binary_classifier  

# Wait until job completed 
aws sagemaker wait training-job-completed-or-stopped --training-job-name ${TRAINING_JOB_NAME}  --region ${REGION}

# Get newly created model artifact and create model
MODELARTIFACT=`aws sagemaker describe-training-job --training-job-name ${TRAINING_JOB_NAME} --region ${REGION}  --query 'ModelArtifacts.S3ModelArtifacts' --output text `
MODELNAME=MODEL-${DTTIME}
aws sagemaker create-model --region ${REGION} --model-name ${MODELNAME}  --primary-container Image=${IMAGE},ModelDataUrl=${MODELARTIFACT}  --execution-role-arn ${ROLE}

# create a new endpoint configuration 
CONFIGNAME=CONFIG-${DTTIME}
aws sagemaker  create-endpoint-config --region ${REGION} --endpoint-config-name ${CONFIGNAME}  --production-variants  VariantName=Users,ModelName=${MODELNAME},InitialInstanceCount=1,InstanceType=ml.m4.xlarge

# create or update the endpoint
STATUS=`aws sagemaker describe-endpoint --endpoint-name  ServiceEndpoint --query 'EndpointStatus' --output text --region ${REGION} `
if [[ $STATUS != "InService" ]] ;
then
    aws sagemaker  create-endpoint --endpoint-name  ServiceEndpoint  --endpoint-config-name ${CONFIGNAME} --region ${REGION}    
else
    aws sagemaker  update-endpoint --endpoint-name  ServiceEndpoint  --endpoint-config-name ${CONFIGNAME} --region ${REGION}
fi

Grant permission

Before you execute the script, you must grant proper permission to Data Pipeline. Data Pipeline uses the DataPipelineDefaultResourceRole role by default. I added the following policy to DataPipelineDefaultResourceRole to allow Data Pipeline to create, delete, and update the Amazon SageMaker model and data source in the script.

{
 "Version": "2012-10-17",
 "Statement": [
 {
 "Effect": "Allow",
 "Action": [
 "sagemaker:CreateTrainingJob",
 "sagemaker:DescribeTrainingJob",
 "sagemaker:CreateModel",
 "sagemaker:CreateEndpointConfig",
 "sagemaker:DescribeEndpoint",
 "sagemaker:CreateEndpoint",
 "sagemaker:UpdateEndpoint",
 "iam:PassRole"
 ],
 "Resource": "*"
 }
 ]
}

Use real-time prediction

After you deploy a model into production using Amazon SageMaker hosting services, your client applications use this API to get inferences from the model hosted at the specified endpoint. This approach is useful for interactive web, mobile, or desktop applications.

Following, I provide a simple Python code example that queries the Amazon SageMaker endpoint by its name (“ServiceEndpoint”) and uses the response for real-time prediction.

=== Python sample for real-time prediction ===

#!/usr/bin/env python
import boto3
import json 

client = boto3.client('sagemaker-runtime', region_name ='<your region>' )
new_customer_info = '34,10,2,4,1,2,1,1,6,3,190,1,3,4,3,-1.7,94.055,-39.8,0.715,4991.6'
response = client.invoke_endpoint(
    EndpointName='ServiceEndpoint',
    Body=new_customer_info, 
    ContentType='text/csv'
)
result = json.loads(response['Body'].read().decode())
print(result)
--- output(response) ---
{u'predictions': [{u'score': 0.7528127431869507, u'predicted_label': 1.0}]}

Solution summary

The solution takes the following steps:

  1. Data Pipeline exports DynamoDB table data into S3. The original JSON data should be kept to recover the table in the rare event that this is needed. Data Pipeline then converts JSON to CSV so that Amazon SageMaker can read the data. Note: You should select only meaningful attributes when you convert to CSV. For example, if you judge that the “campaign” attribute is not correlated, you can eliminate this attribute from the CSV.
  2. Train the Amazon SageMaker model with the new data source.
  3. When a new customer comes to your site, you can judge how likely it is for this customer to subscribe to your new product based on the prediction score provided by Amazon SageMaker.
  4. If the new user subscribes to your new product, your application must update the attribute “y” to the value 1 (for yes). This updated data is provided for the next model renewal as a new data source, and it serves to improve the accuracy of your prediction. With each new entry, your application can become smarter and deliver better predictions.

Running ad hoc queries using Amazon Athena

Amazon Athena is a serverless query service that makes it easy to analyze large amounts of data stored in Amazon S3 using standard SQL. Athena is useful for examining data and collecting statistics or informative summaries about data. You can also use the powerful analytic functions of Presto, as described in the topic Aggregate Functions of Presto in the Presto documentation.

With the Data Pipeline scheduled activity, recent CSV data is always located in S3 so that you can run ad hoc queries against the data using Amazon Athena. I show this with example SQL statements following. For an in-depth description of this process, see the post Interactive SQL Queries for Data in Amazon S3 on the AWS News Blog. 

Creating an Amazon Athena table and running queries

You can simply create an EXTERNAL table for the CSV data on S3 in the Amazon Athena console.

=== Table Creation ===
CREATE EXTERNAL TABLE datasource (
 age int, 
 job string, 
 marital string , 
 education string, 
 default string, 
 housing string, 
 loan string, 
 contact string, 
 month string, 
 day_of_week string, 
 duration int, 
 campaign int, 
 pdays int , 
 previous int , 
 poutcome string, 
 emp_var_rate double, 
 cons_price_idx double,
 cons_conf_idx double, 
 euribor3m double, 
 nr_employed double, 
 y int 
)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' ESCAPED BY '\\' LINES TERMINATED BY '\n' 
LOCATION 's3://<your bucket name>/<datasource path>/';

The following query calculates the correlation coefficient between the target attribute and other attributes using Amazon Athena.

=== Sample Query ===

SELECT corr(age,y) AS correlation_age_and_target, 
 corr(duration,y) AS correlation_duration_and_target, 
 corr(campaign,y) AS correlation_campaign_and_target,
 corr(contact,y) AS correlation_contact_and_target
FROM ( SELECT age , duration , campaign , y , 
 CASE WHEN contact = 'telephone' THEN 1 ELSE 0 END AS contact 
 FROM datasource 
 ) datasource ;
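If you want to run such queries programmatically instead of in the console, a hedged boto3 sketch could look like this; the database name and the query results location are placeholders:

=== Sample Query via the Athena API (sketch) ===

# Hedged sketch: run an Athena query with boto3 and print the result rows.
# The database name and the results OutputLocation are placeholders.
import time
import boto3

athena = boto3.client('athena', region_name='us-east-1')

query = """
SELECT corr(age, y)      AS correlation_age_and_target,
       corr(duration, y) AS correlation_duration_and_target,
       corr(campaign, y) AS correlation_campaign_and_target
FROM datasource
"""

execution_id = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={'Database': 'default'},
    ResultConfiguration={'OutputLocation': 's3://<your bucket name>/athena-results/'}
)['QueryExecutionId']

# Poll until the query finishes, then fetch the result rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=execution_id)['QueryExecution']['Status']['State']
    if state in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
        break
    time.sleep(2)

if state == 'SUCCEEDED':
    for row in athena.get_query_results(QueryExecutionId=execution_id)['ResultSet']['Rows']:
        print([col.get('VarCharValue') for col in row['Data']])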

Conclusion

In this post, I introduce an example of how to analyze data in DynamoDB by using table data in Amazon S3 to optimize DynamoDB table read capacity. You can then use the analyzed data as a new data source to train an Amazon SageMaker model for accurate real-time prediction. In addition, you can run ad hoc queries against the data on S3 using Amazon Athena. I also present how to automate these procedures by using Data Pipeline.

You can adapt this example to your specific use case, and hopefully this post helps you accelerate your development. You can find more examples and use cases for Amazon SageMaker in the video AWS 2017: Introducing Amazon SageMaker on the AWS website.

 


Additional Reading

If you found this post useful, be sure to check out Serving Real-Time Machine Learning Predictions on Amazon EMR and Analyzing Data in S3 using Amazon Athena.

 


About the Author

Yong Seong Lee is a Cloud Support Engineer for AWS Big Data Services. He is interested in every technology related to data/databases and helping customers who have difficulties in using AWS services. His motto is “Enjoy life, be curious and have maximum experience.”

 

 

[$] The trouble with get_user_pages()

Post Syndicated from corbet original https://lwn.net/Articles/753027/rss

When kernel code needs to work directly with user-space pages, it often
calls get_user_pages()
(or one of several variants) to fault those pages into RAM and pin them
there. This function is not entirely easy to use, though, and recent
changes have made it harder to use safely. Jan Kara and Dan Williams led a
plenary session at the 2018 Linux Storage, Filesystem, and
Memory-Management Summit to discuss potential solutions, but it is not
entirely clear that any were found.

Easier way to control access to AWS regions using IAM policies

Post Syndicated from Sulay Shah original https://aws.amazon.com/blogs/security/easier-way-to-control-access-to-aws-regions-using-iam-policies/

We made it easier for you to comply with regulatory standards by controlling access to AWS Regions using IAM policies. For example, if your company requires users to create resources in a specific AWS region, you can now add a new condition to the IAM policies you attach to your IAM principal (user or role) to enforce this for all AWS services. In this post, I review conditions in policies, introduce the new condition, and review a policy example to demonstrate how you can control access across multiple AWS services to a specific region.

Condition concepts

Before I introduce the new condition, let’s review the condition element of an IAM policy. A condition is an optional IAM policy element that lets you specify special circumstances under which the policy grants or denies permission. A condition includes a condition key, operator, and value for the condition. There are two types of conditions: service-specific conditions and global conditions. Service-specific conditions are specific to certain actions in an AWS service. For example, the condition key ec2:InstanceType supports specific EC2 actions. Global conditions support all actions across all AWS services.

Now that I’ve reviewed the condition element in an IAM policy, let me introduce the new condition.

aws:RequestedRegion condition key

The new global condition key, aws:RequestedRegion, supports all actions across all AWS services. You can use any string operator and specify any AWS region for its value.

Condition key: aws:RequestedRegion
Description: Allows you to specify the region to which the IAM principal (user or role) can make API calls
Operator(s): All string operators (for example, StringEquals)
Value: Any AWS region (for example, us-east-1)

I’ll now demonstrate the use of the new global condition key.

Example: Policy with region-level control

Let’s say a group of software developers in my organization is working on a project using Amazon EC2 and Amazon RDS. The project requires a web server running on an EC2 instance using Amazon Linux and a MySQL database instance in RDS. The developers also want to test Amazon Lambda, an event-driven platform, to retrieve data from the MySQL DB instance in RDS for future use.

My organization requires all the AWS resources to remain in the Frankfurt, eu-central-1, region. To make sure this project follows these guidelines, I create a single IAM policy for all the AWS services that this group is going to use and apply the new global condition key aws:RequestedRegion for all the services. This way I can ensure that any new EC2 instances launched or any database instances created using RDS are in Frankfurt. This policy also ensures that any Lambda functions this group creates for testing are also in the Frankfurt region.


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeAccountAttributes",
                "ec2:DescribeAvailabilityZones",
                "ec2:DescribeInternetGateways",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVpcAttribute",
                "ec2:DescribeVpcs",
                "ec2:DescribeInstances",
                "ec2:DescribeImages",
                "ec2:DescribeKeyPairs",
                "rds:Describe*",
                "iam:ListRolePolicies",
                "iam:ListRoles",
                "iam:GetRole",
                "iam:ListInstanceProfiles",
                "iam:AttachRolePolicy",
                "lambda:GetAccountSettings"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:RunInstances",
                "rds:CreateDBInstance",
                "rds:CreateDBCluster",
                "lambda:CreateFunction",
                "lambda:InvokeFunction"
            ],
            "Resource": "*",
      "Condition": {"StringEquals": {"aws:RequestedRegion": "eu-central-1"}}

        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:PassRole"
            ],
            "Resource": "arn:aws:iam::account-id:role/*"
        }
    ]
}

The first statement in the above example contains all the read-only actions that let my developers use the console for EC2, RDS, and Lambda. The permissions for IAM-related actions are required to launch EC2 instances with a role, enable enhanced monitoring in RDS, and for AWS Lambda to assume the IAM execution role to execute the Lambda function. I’ve combined all the read-only actions into a single statement for simplicity. The second statement is where I give write access to my developers for the three services and restrict the write access to the Frankfurt region using the aws:RequestedRegion condition key. You can also list multiple AWS regions with the new condition key if your developers are allowed to create resources in multiple regions. The third statement grants permissions for the IAM action iam:PassRole required by AWS Lambda. For more information on allowing users to create a Lambda function, see Using Identity-Based Policies for AWS Lambda.
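To put a policy like this into effect, you could create it as a customer managed policy and attach it to the developers' IAM group or role. A hedged boto3 sketch follows; the file name, policy name, and group name are placeholders:

# Hedged sketch: create the region-restricted policy as a managed policy and
# attach it to an IAM group. File, policy, and group names are placeholders.
import boto3

iam = boto3.client('iam')

with open('region-restricted-policy.json') as f:
    policy_document = f.read()

policy_arn = iam.create_policy(
    PolicyName='FrankfurtOnlyDeveloperAccess',
    PolicyDocument=policy_document
)['Policy']['Arn']

iam.attach_group_policy(GroupName='project-developers', PolicyArn=policy_arn)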

Summary

You can now use the aws:RequestedRegion global condition key in your IAM policies to specify the region to which the IAM principal (user or role) can invoke an API call. This capability makes it easier for you to restrict the AWS regions your IAM principals can use to comply with regulatory standards and improve account security. For more information about this global condition key and policy examples using aws:RequestedRegion, see the IAM documentation.

If you have comments about this post, submit them in the Comments section below. If you have questions about or suggestions for this solution, start a new thread on the IAM forum.

Want more AWS Security news? Follow us on Twitter.