Tag Archives: IAM

Guidelines for protecting your AWS account while using programmatic access

Post Syndicated from Mahmoud Matouk original https://aws.amazon.com/blogs/security/guidelines-for-protecting-your-aws-account-while-using-programmatic-access/

One of the most important things you can do as a customer to ensure the security of your resources is to maintain careful control over who has access to them. This is especially true if any of your AWS users have programmatic access. Programmatic access allows you to invoke actions on your AWS resources either through an application that you write or through a third-party tool. You use an access key ID and a secret access key to sign your requests for authorization to AWS. Programmatic access can be quite powerful, so implementing best practices to protect access key IDs and secret access keys is important in order to prevent accidental or malicious account activity. In this post, I’ll highlight some general guidelines to help you protect your account, as well as some of the options you have when you need to provide programmatic access to your AWS resources.

Protect your root account

Your AWS root account—the account that’s created when you initially sign up with AWS—has unrestricted access to all your AWS resources. There’s no way to limit permissions on a root account. For this reason, AWS always recommends that you do not generate access keys for your root account. This would give your users the power to do things like close the entire account—an ability that they probably don’t need. Instead, you should create individual AWS Identity and Access Management (IAM) users, then grant each user permissions based on the principle of least privilege: Grant them only the permissions required to perform their assigned tasks. To more easily manage the permissions of multiple IAM users, you should assign users with the same permissions to an IAM group.

Your root account should always be protected by Multi-Factor Authentication (MFA). This additional layer of security helps protect against unauthorized logins to your account by requiring two factors: something you know (a password) and something you have (for example, an MFA device). AWS supports virtual and hardware MFA devices, U2F security keys, and SMS text message-based MFA.

Decide how to grant access to your AWS account

To allow users access to the AWS Management Console and AWS Command Line Interface (AWS CLI), you have two options. The first is to create identities and allow users to log in using a username and password managed by the IAM service. The second is to use federation to allow your users to log in to the AWS console and CLI using their existing corporate credentials.

Each approach has its use cases. Federation is generally better for enterprises that have an existing central directory or that plan to exceed the current limit of 5,000 IAM users per account.

Note: Access to all AWS accounts is managed by AWS IAM. Regardless of the approach you choose, make sure to familiarize yourself with and follow IAM best practices.

Decide when to use access keys

Applications running outside of an AWS environment will need access keys for programmatic access to AWS resources. For example, monitoring tools running on-premises and third-party automation tools will need access keys.

However, if the resources that need programmatic access are running inside AWS, the best practice is to use IAM roles instead. An IAM role is a defined set of permissions—it’s not associated with a specific user or group. Instead, any trusted entity can assume the role to perform a specific business task.

By utilizing roles, you can grant a resource access without hardcoding an access key ID and secret access key into the configuration file. For example, you can grant an Amazon Elastic Compute Cloud (EC2) instance access to an Amazon Simple Storage Service (Amazon S3) bucket by attaching a role with a policy that defines this access to the EC2 instance. This approach improves your security, as IAM will dynamically manage the credentials for you with temporary credentials that are rotated automatically.
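
Here’s a minimal CLI sketch of this pattern. The role, instance profile, and instance ID below are hypothetical, and AmazonS3ReadOnlyAccess is an AWS managed policy that you would typically replace with a narrower custom policy:

# trust.json — trust policy that lets EC2 assume the role:
# {
#   "Version": "2012-10-17",
#   "Statement": [{
#     "Effect": "Allow",
#     "Principal": {"Service": "ec2.amazonaws.com"},
#     "Action": "sts:AssumeRole"
#   }]
# }

aws iam create-role --role-name s3-app-role --assume-role-policy-document file://trust.json
aws iam attach-role-policy --role-name s3-app-role --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
aws iam create-instance-profile --instance-profile-name s3-app-profile
aws iam add-role-to-instance-profile --instance-profile-name s3-app-profile --role-name s3-app-role
aws ec2 associate-iam-instance-profile --instance-id i-0123456789abcdef0 --iam-instance-profile Name=s3-app-profile

Applications on the instance then pick up automatically rotated temporary credentials from the instance metadata, with no keys in any configuration file.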

Grant least privileges to service accounts

If you decided to create service accounts (that is, accounts used for programmatic access by applications running outside of the AWS environment) and generate access keys for them, you should create a dedicated service account for each use case. This will allow you to restrict the associated policy to only the permissions needed for the particular use case, limiting the blast radius if the credentials are compromised. For example, if a monitoring tool and a release management tool both require access to your AWS environment, create two separate service accounts with two separate policies that define the minimum set of permissions for each tool.

In addition to this, it’s also a best practice to add conditions to the policy that further restrict access—such as restricting access to only the source IP address range of your clients.

Below is an example policy that implements least privilege. It grants the needed permission (s3:PutObject) on a specific resource (an S3 bucket named “examplebucket”), with an added condition that requests must come from the IP range 203.0.113.0/24.


{
    "Version": "2012-10-17",
    "Id": "S3PolicyRestrictPut",
    "Statement": [
        {
            "Sid": "IPAllow",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::examplebucket/*",
            "Condition": {
                "IpAddress": {"aws:SourceIp": "203.0.113.0/24"}
            }
        }
    ]
}

Use temporary credentials from AWS STS

AWS Security Token Service (AWS STS) is a web service that enables you to request temporary credentials for use in your code, CLI, or third-party tools. It allows you to assume an IAM role with which you have a trusted relationship and then generate temporary, time-limited credentials based on the permissions associated with the role. These credentials can only be used during the validity period, which reduces your risk.

There are two ways to generate temporary credentials. You can generate them from the CLI, which is helpful when you need credentials for testing from your local machine or from an on-premises or third-party tool. You can also generate them from code using one of the AWS SDKs. This approach is helpful if you need credentials in your application, or if you have multiple user types that require different permission levels.

Create temporary credentials using the CLI

If you have access to the AWS CLI, you can use it to generate temporary credentials with limited permissions to use in your local testing or with third-party tools. To be able to use this approach, here’s what you need:

  • Access to the AWS CLI through your primary user account or through federation. To learn how to configure CLI access using your IAM credentials, follow this link. If you use federation, you can still use the CLI by following the instructions in this blog post.
  • An IAM role that represents the permissions needed for your test client. In the example below, I use “s3-read”. This role should have a policy attached that grants the least privileges needed for the use case.
  • A trusted relationship between the service role (“s3-read”) and your user account, to allow you to assume the service role and generate temporary credentials. Visit this link for the steps to create this trust relationship.

The example command below will generate a temporary access key ID and secret access key that are valid for 15 minutes, based on permissions associated with the role named “s3-read”. You can replace the values below with your own account number, service role, and duration, then use the secret access key and access key ID in your local clients.


aws sts assume-role --role-arn <arn:aws:iam::AWS-ACCOUNT-NUMBER:role/s3-read> --role-session-name <s3-access> --duration-seconds <900>

Here are my results from running the command:


{ "AssumedRoleUser": 
    { 
        "AssumedRoleId": "AROAIEGLQIIQUSJ2I5XRM:s3-access", 
        "Arn": "arn:aws:sts::AWS-ACCOUNT-NUMBER:assumed-role/s3-read/s3-access" 
    }, 
    "Credentials": { 
        "SecretAccessKey":"wZJph6PX3sn0ZU4g6yfXdkyXp5m+nwkEtdUHwC3w",  
        "SessionToken": "FQoGZXIvYXdzENr//////////<<REST-OF-TOKEN>>",
        "Expiration": "2018-11-02T16:46:23Z",
        "AccessKeyId": "ASIAXQZXUENECYQBAAQG" 
    } 
  }
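
One way to use the returned values is to export them as the standard environment variables that the AWS CLI and SDKs read. Here I reuse the example output above, with examplebucket as a hypothetical bucket name:

export AWS_ACCESS_KEY_ID=ASIAXQZXUENECYQBAAQG
export AWS_SECRET_ACCESS_KEY=wZJph6PX3sn0ZU4g6yfXdkyXp5m+nwkEtdUHwC3w
export AWS_SESSION_TOKEN=FQoGZXIvYXdzENr//////////<<REST-OF-TOKEN>>

# Subsequent calls use the temporary credentials until they expire.
aws s3 ls s3://examplebucket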

Create temporary credentials from your code

If you have an application that already uses the AWS SDK, you can use AWS STS to generate temporary credentials right from the code instead of hard-coding credentials into your configurations. This approach is recommended if you have client-side code that requires credentials, or if you have multiple types of users (for example, admins, power-users, and regular users) since it allows you to avoid hardcoding multiple sets of credentials for each user type.
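
As one example, here’s a minimal sketch using the AWS SDK for Python (Boto3); the role ARN and session name are hypothetical placeholders:

import boto3

# Ask AWS STS for temporary credentials by assuming a role the caller is trusted to use.
sts = boto3.client("sts")
response = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/s3-read",  # hypothetical role ARN
    RoleSessionName="s3-access",
    DurationSeconds=900,  # credentials expire after 15 minutes
)
creds = response["Credentials"]

# Use the temporary credentials with another client; no long-term keys are stored.
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
for obj in s3.list_objects_v2(Bucket="examplebucket").get("Contents", []):
    print(obj["Key"])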

For more information about using temporary credentials from the AWS SDK, visit this link.

Utilize Access Advisor

The IAM console provides information about when an AWS service was last accessed by different principals. This information is called service last accessed data.

Using this tool, you can view when an IAM user, group, role, or policy last attempted to access services to which they have permissions. Based on this information, you can decide if certain permissions need to be revoked or restricted further.

Make this tool part of your periodic security check. Use it to evaluate the permissions of all your IAM entities and to revoke unused permissions until they’re needed. You can also automate the process of periodic permissions evaluation using Access Advisor APIs. If you want to learn how, this blog post is a good starting point.

Other tools for credentials management

While least privilege access and temporary credentials are important, it’s equally important that your users are managing their credentials properly—from rotation to storage. Below is a set of services and features that can help to securely store, retrieve, and rotate credentials.

AWS Systems Manager Parameter Store

AWS Systems Manager offers a capability called Parameter Store that provides secure, centralized storage for configuration parameters and secrets across your AWS account. You can store data such as configuration parameters, credentials, and license keys as plain text or encrypted values. Once stored, you can configure granular access that specifies who can obtain these parameters in your application, adding another layer of security to protect your data.

Parameter Store is a good choice for use cases in which you need hierarchical storage for configuration data management across your account. For example, you can store database access credentials (username and password) in Parameter Store, encrypt them with an encryption key managed by AWS Key Management Service, and grant EC2 instances running your application permissions to read and decrypt those credentials.
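
A short CLI sketch of that flow, using a hypothetical parameter name and value (SecureString parameters are encrypted with a KMS key):

# Store the secret as an encrypted SecureString parameter.
aws ssm put-parameter --name /myapp/prod/db-password --type SecureString --value "example-password"

# Read it back, decrypting it in the same call (requires kms:Decrypt on the key).
aws ssm get-parameter --name /myapp/prod/db-password --with-decryption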

For more information on using AWS Systems Manager Parameter Store, visit this link.

AWS Secrets Manager

AWS Secrets Manager is a service that allows you to centrally manage the lifecycle of secrets used in your organization, including rotation, audits, and access control. By enabling you to rotate secrets automatically, Secrets Manager can help you meet your security and compliance requirements. Secrets Manager also offers built-in integration for MySQL, PostgreSQL, and Amazon Aurora on Amazon RDS and can be extended to other services.
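
A brief sketch of storing and retrieving a secret with the CLI, using a hypothetical secret name and value:

# Create the secret; the secret string here is a placeholder JSON document.
aws secretsmanager create-secret --name myapp/prod/db-credentials --secret-string '{"username":"dbadmin","password":"example-password"}'

# Retrieve the current version of the secret at runtime.
aws secretsmanager get-secret-value --secret-id myapp/prod/db-credentials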

For more information about using AWS Secrets Manager to store and retrieve secrets, visit this link.

Amazon Cognito

Amazon Cognito lets you add user registration, sign-in, and access management features to your web and mobile applications.

Cognito can be used as an Identity Provider (IdP), where it stores and maintains users and credentials securely for your applications, or it can be integrated with external identity providers through OpenID Connect and SAML, as well as popular web identity providers like Amazon.com.

Using Amazon Cognito, you can generate temporary access credentials for your clients to access AWS services, eliminating the need to store long-term credentials in client applications.

To learn more about using Amazon Cognito as an IdP, visit our developer guide to Amazon Cognito User Pools. If you’re interested in information about using Amazon Cognito with a third party IdP, review our guide to Amazon Cognito Identity Pools (Federated Identities).

AWS Trusted Advisor

AWS Trusted Advisor is a service that provides a real-time review of your AWS account and offers guidance on how to optimize your resources to reduce cost, increase performance, expand reliability, and improve security.

The “Security” section of AWS Trusted Advisor should be reviewed on a regular basis to evaluate the health of your AWS account. Currently, there are multiple security-specific checks—from IAM access keys that haven’t been rotated to insecure security groups. Trusted Advisor is a tool that helps you more easily perform a daily or weekly review of your AWS account.

git-secrets

git-secrets, available from the AWS Labs GitHub account, helps you avoid committing passwords and other sensitive credentials to a git repository. It scans commits, commit messages, and --no-ff merges to prevent your users from inadvertently adding secrets to your repositories.
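
A typical per-repository setup after installing git-secrets looks like this:

# Install the git hooks into the current repository.
git secrets --install

# Register the built-in patterns that match AWS access keys and credentials.
git secrets --register-aws

# Optionally scan the existing working tree for anything that matches.
git secrets --scan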

Conclusion

In this blog post, I’ve introduced some options to replace long-term credentials in your applications with temporary access credentials that can be generated using various tools and services on the AWS platform. Using temporary credentials can reduce the risk of falling victim to a compromised environment, further protecting your business.

I also discussed the concept of least privilege and provided some helpful services and procedures to maintain and audit the permissions given to various identities in your environment.

If you have questions or feedback about this blog post, submit comments in the Comments section below, or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Mahmoud Matouk

Mahmoud is a member of our worldwide public sector Solutions Architecture team, helping higher education customers build innovative, secure, and highly available solutions using various AWS services.

Author

Joe Chapman

Joe is a Solutions Architect with Amazon Web Services. He primarily serves AWS EdTech customers, providing architectural guidance and best practice recommendations for new and existing workloads. Outside of work, he enjoys spending time with his wife and dog, and finding new adventures while traveling the world.

How to quickly find and update your access keys, password, and MFA setting using the AWS Management Console

Post Syndicated from Sulay Shah original https://aws.amazon.com/blogs/security/how-to-find-update-access-keys-password-mfa-aws-management-console/

You can now more quickly view and update all your security credentials from one place using the “My Security Credentials” page in the AWS Management Console. When you grant your developers programmatic access or AWS Management Console access, they receive credentials, such as a password or access keys, to access AWS resources. For example, creating users in AWS Identity and Access Management (IAM) generates long-term credentials for your developers. Understanding how to use these credentials can be confusing, especially for people who are new to AWS; developers often end up reaching out to their administrators for guidance about using their credentials. Today, we’ve updated the My Security Credentials page to help developers discover, create, or modify security credentials for their IAM users on their own. This includes passwords to access the AWS console, access keys for programmatic AWS access, and multi-factor authentication (MFA) devices. By making it easier to discover and learn about AWS security credentials, developers can get started with AWS more quickly.

If you need to create IAM users, you can use the My Security Credentials page to manage long-term credentials. However, as a best practice, AWS recommends relying on temporary credentials using federation when accessing AWS accounts. Federation enables you to use your existing identity provider to access AWS. You can also use AWS Single Sign-On (SSO) to manage your identities and their access to multiple AWS accounts and business applications centrally. In this post, I review the IAM user experience in the AWS Management Console for retrieving and configuring security credentials.

Access your security credentials

When you interact with AWS, you need security credentials to verify who you are and whether you have permissions to access the resources that you’re requesting. For example, you need a user name and password to sign in to the AWS Management Console, and you need access keys to make programmatic calls to AWS API operations.

To access and manage your security credentials, sign in to your AWS console as an IAM user, then navigate to your user name in the upper right section of the navigation bar. From the drop-down menu, select My Security Credentials, as shown in Figure 1.
 

Figure 1: How to find the “My Security Credentials” page

The My Security Credentials page includes all your security credentials. As an IAM user, you should navigate to this central location (Figure 2) to manage all your credentials.
 

Figure 2: The “My security credentials” page

Next, I’ll show you how IAM users can make changes to their AWS console access password, generate access keys, configure MFA devices, and set AWS CodeCommit credentials using the My Security Credentials page.

Change your password for AWS console access

To change your password, navigate to the My Security Credentials page and, under the Password for console access section, select Change password. In this section, you can also see how old your current password is. In the example in Figure 3, my password is 121 days old. This information can help you determine whether you need to change your password. Based on AWS best practices, I need to update mine.
 

Figure 3: Where to find your password’s age

To update your password, select the Change password button.

Based on the permissions assigned to your IAM user, you might not see the password requirements set by your admin. The image below shows the password requirements that my administrator has set for my AWS account. I can see the password requirements since my IAM user has access to view the password policy.
 

Figure 4: How to change your password

Once you select Change password and the password meets all the requirements, your IAM user’s password will update.

Generate access keys for programmatic access

An access key ID and secret access key are required to sign requests that you make using the AWS Command Line, the AWS SDKs, or direct API calls. If you have created an access key previously, you might have forgotten to save the secret key. In such cases, AWS recommends deleting the existing access key and creating a new one. You can create new access keys from the My Security Credentials page.
 

Figure 5: How to create a new access key

To create a new key, select the Create access key button. This generates a new secret access key. This is the only time you can view or download the secret access key. As a security best practice, AWS does not allow retrieval of a secret access key after its initial creation.

Next, select the Download .csv file button (shown in the image below) and save this file in a secure location only accessible to you.
 

Figure 6: Select the “Download .csv file” button

Note: If you already have the maximum of two access keys—active or inactive—you must delete one before creating a new key.

If you have reason to believe someone has access to your access key ID and secret access key, you need to delete them immediately and create new ones. To delete your existing key, select Delete next to your access key ID, as shown below. You can learn more by visiting best practices to manage access keys.
 

Figure 7: How to delete or suspend a key

The Delete access key dialog now shows you the last time your key was used. This information is critical to helping you understand if an existing system is using the access key, and if deleting the key will break something.
 

Figure 8: The “Delete access key” confirmation window
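
If you prefer the CLI and your IAM user has the relevant iam: permissions, a rough equivalent of this create/inspect/delete workflow looks like the following (the key ID is AWS’s documentation example value):

# Create a new access key for the calling IAM user; the secret is shown only once.
aws iam create-access-key

# Check when an existing key was last used before removing it.
aws iam get-access-key-last-used --access-key-id AKIAIOSFODNN7EXAMPLE

# Deactivate the old key first, so you can re-enable it if something breaks...
aws iam update-access-key --access-key-id AKIAIOSFODNN7EXAMPLE --status Inactive

# ...then delete it once you're confident nothing depends on it.
aws iam delete-access-key --access-key-id AKIAIOSFODNN7EXAMPLE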

Assign MFA devices

As a best practice, AWS recommends enabling multi-factor authentication (MFA) on all IAM users. MFA adds an extra layer of security because it requires users to provide unique authentication from an AWS-supported MFA mechanism in addition to their sign-in credentials when they access AWS. Now, IAM users can assign or view their current MFA settings through the My Security Credentials page.
 

Figure 9: How to view MFA settings

To learn about MFA support in AWS and about configuring MFA devices for an IAM user, please visit Enabling MFA Devices.

Generate AWS CodeCommit credentials

The My Security Credentials page lets you configure Git credentials for AWS CodeCommit, a version control service for privately storing and managing assets such as documents and source code in the cloud. Additionally, to access CodeCommit repositories without installing the AWS CLI, you can set up an SSH connection by uploading your SSH public key on the My Security Credentials page, as shown below. To learn more about AWS CodeCommit and the different configuration options, visit the AWS CodeCommit User Guide.
 

Figure 10: How to generate CodeCommit credentials

Summary

The My Security Credentials page for IAM users makes it easier to manage and configure security credentials to help developers get up and running in AWS more quickly. To learn more about the security credentials and best practices, read the Identity and Access Management documentation.

If you have comments about this post, submit them in the Comments section below. If you have questions about or suggestions for this solution, start a new thread on the IAM forum.

Want more AWS Security news? Follow us on Twitter.

The author

Sulay Shah

Sulay is the product manager for the Identity and Access Management service at AWS. He strongly believes in the customer-first approach and is always looking for new opportunities to assist customers. Outside of work, Sulay enjoys playing soccer and watching movies. Sulay holds a master’s degree in computer science from North Carolina State University.

How to automate SAML federation to multiple AWS accounts from Microsoft Azure Active Directory

Post Syndicated from Sepehr Samiei original https://aws.amazon.com/blogs/security/how-to-automate-saml-federation-to-multiple-aws-accounts-from-microsoft-azure-active-directory/

You can use federation to centrally manage access to multiple AWS accounts using credentials from your corporate directory. Federation is the practice of establishing trust between a system acting as an identity provider and other systems, often called service providers, that accept authentication tokens from that identity provider. Amazon Web Services (AWS) supports open federation standards, including Security Assertion Markup Language (SAML) 2.0, to make it easier for the systems and service providers to interact. Here, I’m going to explain how to automate federation between AWS Identity and Access Management (IAM) in multiple AWS accounts and Microsoft Azure Active Directory (Azure AD). I’ll be following the same general patterns that allow SAML federation to AWS from any other identity provider that supports SAML 2.0, but I’m also adding some automation that is specific to Azure AD. I’ll show you how to perform the initial configuration, and then how to automatically keep Azure AD in sync with your AWS IAM roles.

AWS supports any SAML 2.0-compliant identity provider. If you’re interested in configuring federated access using an identity provider other than Azure AD, these links might be useful:

In this post, I’m going to focus on the nuances of using Azure AD as a SAML identity provider for AWS. The approach covered here gives you a solution that makes this option easier and adheres to AWS best practices. The primary objectives of this step-by-step walkthrough, along with the accompanying packaged solution, are:

  • Support any number of AWS accounts and roles, making it easier to scale.
  • Keep configuration of both sides updated automatically.
  • Use AWS short-term credentials so you don’t have to store your credentials with your application. This enhances your security posture because these credentials are dynamically generated, securely delivered, naturally expire after their limited lifetime, and are automatically rotated for you.

Solution overview

I’ll discuss:

  • How to configure Microsoft Azure Active Directory and show the steps needed to prepare it for federation with AWS.
  • How to configure AWS IAM Identity Providers and Roles, and explain the steps you need to carry out in your AWS accounts.
  • How to automatically import your AWS configuration into the Azure AD SSO app for AWS.

The following diagram shows the high-level flow of SAML authentication and how your users will be federated into the AWS Management console:
 

Figure 1: SAML federation between Azure AD and AWS

Key to the interactions in the diagram

  1. User opens a browser and navigates to Azure AD MyApps access panel (myapps.microsoft.com).
  2. If the user isn’t authenticated, she’ll be redirected to the login endpoint for authentication.
  3. User enters her credentials and the login endpoint will verify them against Azure AD tenant.
  4. Upon successful login, user will be redirected back to the access panel.
  5. User will see the list of available applications, including the AWS Console app, and will select the AWS Console app icon.
  6. The access panel redirects the user to the federated application endpoint, passing the application ID of the AWS SSO app.
  7. The AWS SSO application queries Azure AD and generates a SAML assertion, including all the AWS IAM roles assigned to the user.
  8. SAML assertion is sent back to the user.
  9. User is redirected to AWS federation endpoint, presenting the SAML assertion. The AWS federation endpoint verifies the SAML assertion. The user will choose which of their authorized roles they currently want to operate in. Note: If there’s only one role included, the selection is automatic.
  10. The AWS federation endpoint invokes the AssumeRoleWithSAML API of AWS Security Token Service (STS) and exchanges the SAML assertion for temporary AWS credentials (see the sketch after this list).
  11. Temporary IAM credentials are used to formulate a specific AWS Console URL that’s passed back to the client browser.
  12. User is redirected to AWS Management Console with permissions of the assumed role.
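
For reference, step 10 corresponds to an STS call like the one sketched below; the ARNs are hypothetical placeholders, and the assertion is the base64-encoded SAML response:

aws sts assume-role-with-saml --role-arn arn:aws:iam::123456789012:role/azure-ad-admin --principal-arn arn:aws:iam::123456789012:saml-provider/AzureAD --saml-assertion file://assertion.b64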

Automated solution components and flow

At the core of this automated solution, there’s a Docker container that runs inside an AWS ECS Fargate task. The container includes a number of PowerShell scripts that iterate through your IAM Roles, find roles that are associated with the Identity Provider of Azure AD, and update the Azure AD SSO app manifest with the necessary values.

The Fargate task is invoked through an AWS Lambda function that’s scheduled through a CloudWatch Rule to run with the frequency you specify during setup.

All of these components require a number of parameters to run correctly, and you provide these parameters through the setup.ps1 script. The setup.ps1 script is run once and acquires all required parameters from you. It then stores these parameters with encryption inside the SSM Parameter Store. Azure credentials are stored in AWS Secrets Manager. This means you could even go another step further and use Secrets Manager lifecycle management capabilities to automatically rotate your Azure credentials. For encryption of Azure credentials, the template creates a new KMS key, exclusive to this application. If you prefer to use an existing key or a Customer Managed Key (CMK), you can modify the CloudFormation template, or simply pass your own key name to the setup.ps1 script.

The following diagram shows all components of the solution:
 

Figure 2: Solution architecture

  1. You’ll want any ongoing changes in AWS IAM roles to be replicated into Azure AD. Therefore, you need to have the update task run periodically. A CloudWatch Rule triggers an event and an AWS Lambda Function starts running as a result of this event.
  2. The Lambda Function runs an ECS Fargate Task.
  3. The ECS Task is associated with a Task Role with permission to fetch parameters from Systems Manager (SSM) Parameter Store and Secrets Manager. The task will request parameters from SSM PS, and SSM PS decrypts parameter values using the associated key in AWS Key Management Service (KMS). Azure credentials are securely stored in AWS Secrets Manager.
  4. Fargate Task queries AWS Organizations and gets a list of child accounts. It then constructs cross-account role ARNs. The ECS Task then assumes those cross-account roles and iterates through all IAM roles in each account to find those associated with your IdP for Azure AD.
  5. The ECS Task connects to the Azure AD SSO application and retrieves the existing manifest. Notice that, although you manually retrieved the manifest file during setup, it still needs to be fetched again every time to make sure it’s the latest version. The one you manually downloaded is used to retrieve parameters needed for setup, such as the application identifier or entity ID.
  6. ECS Task stores the existing manifest as a backup in a highly-durable S3 bucket. In case anything goes wrong, the last working state of the application manifest is always available in the S3 bucket. These files are stored with the exact time of their retrieval as their file name. You can find the correct version based on the point in time it was retrieved.
  7. The ECS Task generates a new manifest based on your AWS account/roles as inspected in the preceding steps. It uses the Azure AD credentials retrieved from AWS Secrets Manager and uses them to update the Azure AD SSO app with the new manifest. It also creates any required Azure AD Groups according to the specified custom naming convention. This makes it easier for the Azure AD administrator to map Azure AD users to AWS roles and entitle them to assume those roles.

Prerequisites

To start, download a copy of the sample code package.

You must have AWS Organizations enabled on all of your accounts to take advantage of this solution’s automation. Using AWS Organizations, you can configure one of your accounts as the root account, and all other accounts will join your organization as child accounts. The root account is trusted by all child accounts, so you can manage your child account resources from your root account. This trust is enabled using a role in each of your child accounts. AWS Organizations creates a default role with full permissions on child accounts that are directly created using AWS Organizations. Best practice is to delete this default role and create one with privileges restricted to your requirements. A sample role, named AWSCloudFormationStackSetExecutionRole, is included in cross-account-roles-cfn.json of my code package. You should modify this template based on your requirements.

Setup steps

In the following sections, I’ll show the steps to set up federation and deploy the automation package. First, I’ll show the steps to prepare Azure Active Directory for federation. After that, you’ll see how you can configure all of your AWS accounts from a central place, regardless of how many you have. The last step is to deploy the automation package in your master AWS account to automatically handle ongoing changes.

Step 1: Configure Microsoft Azure Active Directory

You need to create two resources on your Azure AD tenant: a User and an Enterprise Application.

The first thing you need for accessing Azure AD is an Azure AD user. Following the principle of least privilege, you want a user that can only manipulate the SSO application. Azure AD users with the directory role of User only have access to resources they “own.” Therefore, you can create a new user specifically for this purpose and assign it as the owner of your SSO app. This user will be used by the automation to access Azure AD and update the SSO app.

Here’s how you can create a user with the directory role of User (default):

  1. Open Azure Portal.
  2. Open Azure Active Directory.
  3. In the left pane, select Users.
  4. In the Manage pane, select All users.
  5. Select New user.
  6. Enter values for the Name and User name fields.
  7. Select the Show Password box and note the auto-generated password for this user. You will need it when you change the password.
  8. Select Create.
  9. Open a browser window and go to https://login.microsoftonline.com.
  10. Log in with the new user. You’ll be prompted to change your password. Note the new password so you don’t forget it.

Next, create an Enterprise Application from the Azure AD application gallery:

  1. Open Azure Portal.
  2. Open Azure Active Directory.
  3. In the Manage pane, select Enterprise applications.
  4. Select New application.
  5. In the gallery text box, type AWS.
  6. You’ll see an option with the name Amazon Web Services (AWS). Select that application. Make sure you don’t choose the other option with the name “AWS Console.” That option uses an alternate integration method that isn’t relevant to this post.
  7.  

    Figure 3: Select “Amazon Web Services (AWS)”

  8. Select Add. You can change the name to any name you would prefer.
  9. Open the application using this path: Azure Portal > Azure Active Directory > Enterprise Applications > All Applications > your application name (for example, “Amazon Web Services (AWS)”).
  10. From left pane, select Single Sign-on, and then set Single Sign-on mode to SAML-based Sign-on.
  11. The first instance of the app is pre-integrated with Azure AD and requires no mandatory URL settings. However, if you previously created a similar application, you’ll see this:
  12.  

    Figure 4: Azure AD Application Identifier

  13. If you see the red “Required” value in the Identifier field, select the Edit button and enter a value for it. This can be any value you prefer (the default is https://signin.aws.amazon.com/saml), but it has to be unique within your Azure AD tenant. If you don’t see the Identifier field, it means it’s already prepopulated and you can proceed with the default value. However, if for any reason you prefer to have a custom Identifier value, you can select the Show advanced URL settings checkbox and enter the preferred value.
  14. In the User Attributes section, select the Edit button.
  15. You need to tell Azure AD what SAML attributes and values are expected and accepted on the AWS side. AWS requires two mandatory attributes in any incoming SAML assertion: the Role attribute, which defines the roles the federated user is allowed to assume, and the RoleSessionName attribute, which provides a specific, traceable identifier for the user in AWS CloudTrail logs. You can also use the optional SessionDuration attribute to specify how long each session is valid before the user must request a new token. Add the following attributes to the User Attributes & Claims section in the Azure AD SSO application. You can also remove the existing default attributes, if you want, because they’ll be ignored by AWS:

    Name (case-sensitive) | Value | Namespace (case-sensitive) | Required or optional?
    RoleSessionName | user.userprincipalname (shows the logged-in user ID in the AWS portal; to show the user name instead, use user.displayName) | https://aws.amazon.com/SAML/Attributes | Required
    Role | user.assignedroles | https://aws.amazon.com/SAML/Attributes | Required
    SessionDuration | An integer between 900 seconds (15 minutes) and 43200 seconds (12 hours) | https://aws.amazon.com/SAML/Attributes | Optional

    Note: I assume that you’re using users created directly within your Azure AD tenant. If you’re using an external user such as a Hotmail, Live, or Gmail account for proof-of-concept purposes, set RoleSessionName to user.mail instead.

  16. As a good practice, you can rotate your SAML certificate when it approaches its expiration date. For this purpose, Azure AD allows you to create additional certificates, but only one certificate can be active at a time. In the SAML Signing Certificate section, make sure the status of this certificate is Active, and then select Federation Metadata XML to download the XML document.
  17. Download the Metadata XML file and save it in the setup directory of the package you downloaded in the beginning of this walkthrough. Make sure you save it with file extension of .xml.
  18. Open Azure Portal > Azure Active Directory > App Registrations > your application name (for example, “Amazon Web Services (AWS)”). If you don’t see your application in the list on the App Registrations page, select All apps from the drop-down list on top of that page and search for it.
  19. Select Manifest. All Azure AD applications are described as a JavaScript Object Notification (JSON) document called manifest. For AWS, this manifest defines all AWS to Azure AD role mappings. Later, we’ll be using automation to generate updates to this file.
     
    Figure 5: Azure AD Application Manifest

  20. Select Download to download the app manifest JSON file. Save it in the setup directory of the package you downloaded in the beginning of this walkthrough. Make sure you save it with file extension of .json.
  21. Now, back on your registered app, select Settings.
  22. In the Settings pane, select Owners.
     
    Figure 6: Application Owner

  23. Select Add owner and add the user you created previously as owner of this application. Adding the Azure AD user as owner enables the user to manipulate this object. Since this application is the only Azure AD resource owned by our user, it means we’re enforcing the principle of least privilege on Azure AD side.
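
One detail worth knowing before moving on: the app role values that end up in the manifest must use the format AWS expects for the SAML Role attribute, which is a comma-separated pair of the IAM role ARN and the ARN of the IAM identity provider that the role trusts. A hypothetical value (account number and names are placeholders) looks like this:

arn:aws:iam::123456789012:role/azure-ad-admin,arn:aws:iam::123456789012:saml-provider/AzureAD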

At this point, we’re done with the initial configuration of Azure AD. All remaining steps will be performed in your AWS accounts.

Step 2: Configure AWS IAM Identity Providers and Roles

In the previous section, I showed how to configure the Azure AD side represented in the Solution architecture in Figure 1. This section explains the AWS side.

As seen in Figure 1, enabling SAML federation in any AWS account requires two types of AWS IAM resources:

  • An IAM SAML identity provider (IdP) that holds your Azure AD federation metadata
  • IAM roles that trust that IdP

You’ll have to create these two resources in all of your AWS accounts participating in SAML federation. There are various options for doing this. You can:

  • Manually create IAM IdP and Roles using AWS Management Console. For one or two accounts, this might be the easiest way. But as the number of your AWS accounts and roles increase, this method becomes more difficult.
  • Use AWS CLI or AWS Tools for PowerShell. You can use these tools to write automation scripts and simplify both creation and maintenance of your roles.
  • Use AWS CloudFormation. CloudFormation templates enable structured definition of all resources and minimize the effort required to create and maintain them.

Here, I’m going to use CloudFormation and show how it can help you create up to thousands of roles in your organization, if you need that many.

Managing multiple AWS accounts from a root account

AWS CloudFormation simplifies provisioning and management on AWS. You can create templates for the service or application architectures you want and have AWS CloudFormation use those templates for quick and reliable provisioning of the services or applications (called “stacks”). You can also easily update or replicate the stacks as needed. Each stack is deployed in a single AWS account and a specific AWS Region. For example, you can write a template that defines your organization roles in AWS IAM and deploy it in your first AWS account and the US East (N. Virginia) Region.

But if you have hundreds of accounts, it wouldn’t be easy, and if you have time or budget constraints, sometimes not even possible to manually deploy your template in all accounts. Ideally, you’d want to manage all your accounts from a central place. AWS Organizations is the service that gives you this capability.

In my GitHub package there is a CloudFormation template named cross-account-roles-cfn.json. It’s located under the cfn directory. This template includes two cross-account roles. The first one is a role for cross-account access with the minimum required privileges for this solution that trusts your AWS Organizations master account. This role is used to deploy AWS IAM Identity Provider (IdP) for Azure AD and all SAML federation roles, trusting that IdP within all of your AWS child accounts. The second one is used by the automation to inspect your AWS accounts (through describe calls) and keep the Azure AD SSO application updated. I’ve created two roles to ensure that each component executes with the least privilege required. To recap, you’ll have two cross account roles for two different purposes:

  1. A role with full IAM access and Lambda execution permissions. This one is used for creation and maintenance of SAML IdP and associated IAM roles in all accounts.
  2. A role with IAM read-only access. This one is used by the update task to read and detect any changes in your federation IAM roles so it can update Azure AD SSO app with those changes.

You can deploy CloudFormation templates in your child accounts using CloudFormation StackSets. Log in to your root account, go to the CloudFormation console, and select StackSets.

Select Template is ready, select Upload a template file, and then select the cross-account-roles-cfn.json template to deploy it in all of your accounts. AWS IAM is a global service, so it makes no difference which region you choose for this template. You can select any region, such as us-east-1.
 

Figure 7: Upload template to StackSets console

This template includes a parameter prompting you to enter root account number. For instructions to find your account number, see this page.

If you create your child accounts through AWS Organizations, you’ll be able to directly deploy StackSets in those child accounts. But if you add existing accounts to your organization, you have to first manually deploy cross-account-roles-cfn.json in those existing accounts. This template includes the IAM role and policies needed to enable your root account to execute StackSets on them.

Configure the SAML Identity Provider and Roles

A sample template to create your organization roles as SAML federation IAM roles is included in the saml-roles.json file in the same cfn directory. This template includes the SAML IdP and three sample roles trusting that IdP. The IdP is implemented as an AWS Lambda-backed CloudFormation custom resource. Included roles are samples using AWS IAM Job Functions for Administrator, Observer, and DBA. Modify this template by adding or removing roles as needed in your organization.

If you need different roles in some of your accounts, you’ll have to create separate copies of this template and modify them accordingly. From the CloudFormation StackSets console, you can choose the accounts to which your template should be deployed.

The last modification to make is on the IdentityProvider custom resource. It includes a <Metadata> property. Its value is defined as <MetadataDocument>. You’d have to replace the value with the content of the SAML certificate metadata XML document that you previously saved in the setup directory (see the Configure Microsoft Azure Active Directory section above). You’ll need to escape all of the quotation marks (“) in the XML string with a backslash (\). If you don’t want to do this manually, you can copy the saml-roles.json template file in the setup directory and as you follow the remainder of instructions in this post, my setup script will do that for you.

Step 3: Updating Azure AD from the root AWS account

The third and last template in the cfn directory is setup-env-cfn-template.json. You have to deploy this template only in your root account. This template creates all the components in your root account, as shown in Figure 8. These are the resources needed to run the update task and keep the Azure AD SSO app updated with your IAM roles. It also creates a temporary EC2 instance for initial configuration of that update task. The update task uses AWS Fargate, a serverless service that allows you to run Docker containers in AWS. You have to deploy the setup-env-cfn-template.json template in a region where Fargate is available. Check the AWS Region Table to make sure Fargate is available in your target region. Follow these steps to deploy the stack:

  1. Log in to your root account and open the CloudFormation console page.
  2. Select Create Stack, upload the setup-env-cfn-template.json file, and then select Next.
  3. Enter a stack name, such as aws-iam-aad. The stack name must be all lowercase letters. The template uses the stack name to create an S3 bucket, and because S3 does not allow capital letters, if you choose a stack name containing capital letters, the stack creation will fail. The stack name is also used as the appName parameter in all scripts, and all Parameter Store parameter names are prefixed with it.
  4. Enter and select values for the following parameters:
    1. azureADTenantName: The name of your Azure Active Directory tenant, used as the value for this parameter. You can get it from the Azure Portal: go to the Azure Active Directory Overview page, and the tenant name appears at the top of the page.
    2. ExecFrequency is the time period for the update task to run. For example, if you enter 30, every 30 minutes Azure AD will be updated with any changes in IAM roles of your AWS accounts.
    3. KeyName is a key pair used to log in to and access the setup EC2 instance. You need to have a key pair created before deploying this template; to create one, follow these instructions: Amazon EC2 Key Pairs. For convenience, if you’re using a Mac or Linux, you can copy your private key into the setup directory. Don’t forget to run chmod 600 <key name> to change the permissions on the key.
    4. NamingConvention is used to map AWS IAM roles to Azure AD roles. The default naming convention is: “AWS {0} - {1}”. The value of {0} is your account number, and the value of {1} is the name of your IAM role. For example, with the default convention, the role Administrator in account 123456789012 maps to the Azure AD role name “AWS 123456789012 - Administrator”.
    5. SSHLocation is used in a Security Group that restricts access to the setup EC2 instance. You only need this instance for initial setup; therefore, the best practice and most secure option is to change this value to your specific IP address. In any case, make sure you only allow access to your internal IP address range.
    6. Subnet is the VPC subnet in which you want the update task to run. This subnet must have egress (outgoing) internet connectivity. The update task needs this to reach Azure AD Graph API endpoints.
       
      Figure 8: Enter parameters for automation stack

Once you deploy this template in CloudFormation and the associated stack is successfully created, you can get the IP address of the setup EC2 instance from the Output tab in CloudFormation. Now, follow the steps below to complete the setup.

Note: At this point, in addition to all the files already included in the original package, you have two additional, modified files in the setup directory:

  • The SAML Certificate XML file from Azure AD
  • The App Manifest JSON file from Azure AD

Make sure you have the following information handy; it’s required in some of the steps:

Now, follow these steps to complete the setup:

  1. If you’re using Mac, Linux, or UNIX, run the initiate_setup.sh script in the setup directory and, when prompted, provide the IP address from the previous procedure. It will copy all the required files to the target setup EC2 instance and automatically take you to the setup.ps1 script. Now, skip to step 3 below.
  2. If you’re using Windows on your local computer, use your favorite tool (such as WinSCP) to copy both setup and docker directories from your local computer to the /home/ec2-user/scripts directory on the target EC2 instance.
  3. Once copied, use your favorite SSH tool to log in to the target setup EC2 instance. For example, you can use PuTTY for this purpose. As soon as you log in, Setup.ps1 will automatically run for you.
  4. Setup.ps1 is interactive. It will prompt for the path to the three files you saved in the setup directory, and also for your Azure AD user credentials. Use the credentials of the user you created in step 1 of the Configure Microsoft Azure Active Directory section. The script will perform following tasks:
    1. Store Azure AD credentials securely in AWS Secrets Manager. The script also extracts necessary values out of the three input files and stores them as additional parameters in AWS Systems Manager (SSM) Parameter Store.

      Important: The credentials of your Azure user will be stored in AWS Secrets Manager. You must make sure that access to Secrets Manager is restricted to users who are also authorized to retrieve these credentials.

    2. Create a Docker image and push it into an AWS Elastic Container Registry (ECR) repository that’s created as part of the CloudFormation template.
    3. The script checks if saml-roles.json is available in setup directory. If it’s available, the script will replace the value of the Metadata property in the IdP custom resource with content of the SAML metadata XML file. It also generates a text file containing a comma-separated list of all your child accounts, extracting account numbers from cross-account-roles-cfn.json. Both of these are copied to the S3 bucket that is created as part of the template. You can use these at any time to deploy, maintain, and manage your SAML roles in child accounts using CloudFormation StackSets.
    4. If saml-roles.json is available, the script will prompt whether you want it to deploy your roles on your behalf. If you select yes (“y”), it will immediately deploy the template in all child accounts. You can also select no (“n”) if you prefer to do this at another time, or if you need different templates and roles in some accounts.
  5. Once the script executes and successfully completes, you should terminate the setup EC2 instance.

You’ve now completed setting up federation on both sides. All AWS IAM roles that trust an IdP with the SAML certificate of your Azure AD (the Metadata XML file) will now automatically be replicated into your Azure AD tenant. This replication takes place with the frequency you defined: if you set the ExecFrequency parameter to “30”, you’ll see the roles replicated in Azure AD after 30 minutes.

But to enable your users to use this federation, you have to entitle them to assume roles, which is what I’ll cover in the next section.

Entitling Azure AD users to assume AWS Roles

  1. Open Azure Portal > Azure Active Directory > Enterprise applications > All applications > (your application name) > Users and groups.
  2. Select Add user.
  3. In the Users and groups pane, select one of your Azure AD users (or groups), and then select Select.
  4. Select the Select role pane; on the right-hand side, you should now see your AWS roles listed.

You can add and map Azure AD users or groups to AWS IAM roles this way. By adding users to these groups, you’re giving them access to those roles in AWS through the pre-established trust. In the case of groups, any Azure AD user inside a group will have SSO access to the AWS Console and be permitted to assume the AWS roles/accounts associated with their Azure AD group. Azure AD users who are authenticated against login.microsoftonline.com can go to their Access Panel (myapps.microsoft.com) and select the AWS app icon.

Application maintenance

Most of the time, you will not need to do anything else because the Fargate task will execute on each interval and keep the Azure AD manifest aligned with your AWS accounts and roles. However, there are two situations that might require you to take action:

  • If you rotate your Azure AD SAML certificate
  • If you rotate the Azure user credentials used for synchronization

You can use AWS Secrets Manager lifecycle management capabilities to automate the process for the second case. Otherwise, in the event of either of these two situations, you can modify the corresponding values using the AWS Systems Manager Parameter Store and Secrets Manager consoles. Open the Parameter Store console and find the parameters whose names are prefixed with your setup-env-cfn-template.json stack name (you entered this name when you were creating the stack). If you rotate your Azure AD SAML certificate, you should also update all of your IdP resources in AWS accounts to use the new certificate. Here again, StackSets can do the heavy lifting for you: use the same saml-roles.json template to update all of your Stack Instances through CloudFormation. You’ll have to replace the Metadata property value with the content of the new certificate, and escape all quotation marks (") with a backslash (\"), as you did during setup.

Summary

I’ve demonstrated how to set up and configure SAML federation and SSO using Azure Active Directory to AWS Console following these principles and requirements:

  • Using security best practices to keep both sides of federation (AWS and Azure) secure
  • Saving time and effort by automating the manual effort needed to synchronize two sides of federation
  • Keeping operation cost to a minimum through a serverless solution

If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread in the forums.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Sepehr Samiei

Sepehr is currently a Senior Microsoft Tech Specialized Solutions Architect at AWS. He started his professional career as a .Net developer, which continued for more than 10 years. Early on, he quickly became a fan of cloud computing and he loves to help customers utilise the power of Microsoft tech on AWS. His wife and daughter are the most precious parts of his life.

Automate analyzing your permissions using IAM access advisor APIs

Post Syndicated from Ujjwal Pugalia original https://aws.amazon.com/blogs/security/automate-analyzing-permissions-using-iam-access-advisor/

As an administrator that grants access to AWS, you might want to enable your developers to get started with AWS quickly by granting them broad access. However, as your developers gain experience and your applications stabilize, you want to limit permissions to only what they need. To do this, access advisor will determine the permissions your developers have used by analyzing the last timestamp when an IAM entity (for example, a user, role, or group) accessed an AWS service. This information helps you audit service access, remove unnecessary permissions, and set appropriate permissions across different environments. For example, you can grant broad access to services in development accounts and then reduce permissions for access to specific services in production accounts. Finally, as you manage more IAM entities and AWS accounts, you need a way to scale these processes through automation. To help you achieve this automation, you can now use IAM access advisor APIs with the AWS Command Line Interface (AWS CLI) or a programmatic client.

In this post, I first provide the details of the access advisor APIs. Next, I walk through an example to demonstrate how you can use the AWS CLI to create a report of the last-accessed timestamps for the services used by the roles in your account. In this post, I assume that you’re familiar with access advisor and how to Remove Unnecessary Permissions in Your IAM Policies by Using Service Last Accessed Data from the IAM console. Before I share an example, I’ll describe the new IAM access advisor APIs:

  • generate-service-last-accessed-details: Generates the service last accessed data for an IAM resource (user, role, group, or policy). You call this API first to start a job that generates the service last accessed data for the IAM resource. The API returns a JobId that you then use with the other APIs, such as get-service-last-accessed-details, to check the status of job completion.
  • get-service-last-accessed-details: Use this to retrieve the service last accessed data for an IAM resource based on the JobId you pass in.
  • get-service-last-accessed-details-with-entities: Use this to retrieve the service last accessed data for a specific AWS service. The API provides you with a list of all the IAM entities that have access to the service and includes the last accessed date for each IAM entity.
  • list-policies-granting-service-access: Use this to retrieve all the IAM policies that grant permissions to the services accessed by an IAM entity. This helps you identify the policies you need to modify to remove any unused permissions.

Now that you understand the different IAM access advisor APIs, I’ll walk through an example to demonstrate how to use them to set permissions based on service last accessed information.

Example use case: Setting permissions for IAM roles

Assume Arnav Desai is a security administrator for Example Corp. He works with several development teams and monitors their access across multiple accounts. To get his development teams up and running quickly, he initially created multiple roles with broad permissions, based on job function, in the development accounts. Now, his developers are ready to deploy workloads to production accounts. The developers need access to configure AWS; however, Arnav only wants to grant them access to what they need. To determine these permissions, he uses the access advisor APIs to automate a process that helps him understand the services developers accessed in the last six months. Using this information, he authors policies to grant access to specific services in production. I'll now show you an example to achieve this in one account using AWS CLI commands.

First, Arnav uses the list-roles command to list the IAM roles in his development account. For this example, there are two roles in his development account: DBAdminRole and NetworkAdminRole.

For each role, he uses the generate-service-last-accessed-details command to generate the service last accessed data for the role. Here’s an example of the command that he uses:


aws iam generate-service-last-accessed-details --arn arn:aws:iam::123456789012:role/DBAdminRole

The command above provides Arnav with a JobId for each role, signaling that the job has started generating the service last accessed details. Arnav waits for each job to complete successfully before retrieving the access advisor information. In the meantime, he can call the get-service-last-accessed-details command to view the JobStatus of the job. Once the jobs for both roles are COMPLETED, Arnav can view the service last accessed report for both roles, as shown below.
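
Before looking at the report output, here is a minimal boto3 sketch of the same generate-and-poll flow. The role ARN is illustrative, and error handling is omitted:

import time

import boto3

iam = boto3.client("iam")

# Start the job that generates the service last accessed data (role ARN is illustrative)
job = iam.generate_service_last_accessed_details(
    Arn="arn:aws:iam::123456789012:role/DBAdminRole")

# Poll the JobStatus until the report is ready
details = iam.get_service_last_accessed_details(JobId=job["JobId"])
while details["JobStatus"] == "IN_PROGRESS":
    time.sleep(2)
    details = iam.get_service_last_accessed_details(JobId=job["JobId"])

# Print each service namespace with its last authenticated timestamp
for svc in details["ServicesLastAccessed"]:
    print(svc["ServiceNamespace"], svc.get("LastAuthenticated"))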

DBAdminRole


"ServicesLastAccessed": [
        {
            "LastAuthenticated": "2018-11-01T17:41:15Z",
            "LastAuthenticatedEntity": "arn:aws:iam::123456789012:role/ DBAdminRole",
            "ServiceName": "Amazon DynamoDB",
            "ServiceNamespace": "dynamodb",
            "TotalAuthenticatedEntities": 1
        },
        {
            "LastAuthenticated": "2018-08-25T17:41:15Z",
            "LastAuthenticatedEntity": "arn:aws:iam::123456789012:role/ DBAdminRole",
            "ServiceName": "Amazon S3",
            "ServiceNamespace": "s3",
            "TotalAuthenticatedEntities": 1
        },
	.
	.
	.
    ]

Note: I’ve truncated the output because the DBAdminRole doesn’t access other services.

NetworkAdminRole


"ServicesLastAccessed": [
        {
            "LastAuthenticated": "2018-11-21T17:41:15Z",
            "LastAuthenticatedEntity": "arn:aws:iam::123456789012:role/ NetworkAdminRole",
            "ServiceName": "Amazon EC2",
            "ServiceNamespace": "ec2",
            "TotalAuthenticatedEntities": 1
        },
	.
	.
	.
    ]

Note: I’ve truncated the output because the NetworkAdminRole doesn’t access other services.

Based on the output above, you can see that the two roles in development accessed Amazon DynamoDB, Amazon S3, and Amazon EC2 in the last six months. Using this information, Arnav can author a policy to grant access to these specific services for the production accounts.
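
For example, a starting point for such a production policy might look like the following sketch. The service list comes from the report above, but the wildcard actions and resource are placeholders that you would narrow to the specific actions and ARNs your workloads need:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:*",
                "s3:*",
                "ec2:*"
            ],
            "Resource": "*"
        }
    ]
}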

Conclusion

In this post, I reviewed the IAM access advisor APIs and showed how you can use them to determine service last accessed information programmatically. You can use this information to audit access, remove unused permissions, or grant appropriate permissions across your accounts.

If you have comments about retrieving Access Advisor service last accessed information programmatically, submit them in the Comments section below. If you have issues using access advisor commands, start a thread on the IAM forum or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Ujjwal Pugalia

Ujjwal is the product manager for the console sign-in and sign-up experience at AWS. He enjoys working in the customer-centric environment at Amazon because it aligns with his prior experience building an enterprise marketplace. Outside of work, Ujjwal enjoys watching crime dramas on Netflix. He holds an MBA from Carnegie Mellon University (CMU) in Pittsburgh.

Stream Amazon CloudWatch Logs to a Centralized Account for Audit and Analysis

Post Syndicated from David Bailey original https://aws.amazon.com/blogs/architecture/stream-amazon-cloudwatch-logs-to-a-centralized-account-for-audit-and-analysis/

A key component of enterprise multi-account environments is logging. Centralized logging provides a single point of access to all salient logs generated across accounts and regions, and is critical for auditing, security and compliance. While some customers use the built-in ability to push Amazon CloudWatch Logs directly into Amazon Elasticsearch Service for analysis, others would prefer to move all logs into a centralized Amazon Simple Storage Service (Amazon S3) bucket location for access by several custom and third-party tools. In this blog post, I will show you how to forward existing and any new CloudWatch Logs log groups created in the future to a cross-account centralized logging Amazon S3 bucket.

The streaming architecture I use in the destination logging account is a streamlined version of the architecture and AWS CloudFormation templates from the Central logging in Multi-Account Environments blog post by Mahmoud Matouk. This blog post assumes some knowledge of CloudFormation, Python3 and the boto3 AWS SDK. You will need to have or configure an AWS working account and logging account, an IAM access and secret key for those accounts, and a working environment containing Python and the boto3 SDK. (For assistance, see the Getting Started Resource Center and Start Building with SDKs and Tools.) All CloudFormation templates and Python code used in this article can be found in this GitHub Repository.

Setting Up the Solution

You need to create or use an existing S3 bucket for storing CloudFormation templates and Python code for an AWS Lambda function. This S3 bucket is referred to throughout the blog post as the <S3 infrastructure-bucket>. Ensure that the bucket does not block new bucket policies or cross-account access by checking the bucket’s Permissions tab and the Public access settings button.

You also need a bucket policy that allows each account that will stream logs to access the bucket when we create the AWS Lambda function below. To do so, modify the following template, adding each new account you create to the Principal list and using the <S3 infrastructure-bucket> ARN shown at the top of the Bucket policy editor page:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                  "03XXXXXXXX85",
                  "29XXXXXXXX02",
                  "13XXXXXXXX96",
                  "37XXXXXXXX30",
                  "86XXXXXXXX95"
                ]
            },
            "Action": [
                "s3:Get*",
                "s3:List*"
            ],
            "Resource": [
                "arn:aws:s3:::<S3 infrastructure-bucket>",
                "arn:aws:s3:::<S3 infrastructure-bucket>/*"
            ]
        }
    ]
}

Clone a local copy of the CloudFormation templates and Python code from the GitHub repository. Compress the CentralLogging.py and lambda.py files into a .zip file named AddSubscriptionFilter.zip for the Lambda function we create below. Load these local files into the <S3 infrastructure-bucket>. I recommend using folders called /python for the .py files, /lambdas for the AddSubscriptionFilter.zip file, and /cfn for the CloudFormation templates.

Multi-Account Configuration and the Central Logging Account

One form of multi-account configuration is the Landing Zone offering, which provides a core logging account for storing all logs for auditing. I use this account configuration as an example in this blog post. Initially, the Landing Zone setup creates several stack sets and resources, including roles, security groups, alarms, Lambda functions, a CloudTrail stream, and an S3 bucket.

If you are not using a Landing Zone, create an appropriately named S3 bucket in the account you have chosen as a logging account. This S3 bucket will be referred to later as the <LoggingS3Bucket>. To mimic what the Landing Zone calls its logging bucket, you can use the format aws-landing-zone-logs-<Account Number><Region>, or simply pick an appropriate name for the centralized logging location. In a production environment, remember that it is critical to lock down the access to logging resources and the permissions allowed within the account to prevent deletion or tampering with the logs.


Figure 1 – Initial Landing Zone logging account resources

The S3 bucket, aws-landing-zone-logs-<Account Number><Region>, is the most important resource created by the stack sets for logging purposes. It contains all of the logs streamed to it from all of the accounts. Initially, the Landing Zone only sends the AWS CloudTrail and AWS Config logs to this S3 bucket.

In order to send all of the other CloudWatch Logs that are necessary for auditing, we need to add a destination and streaming mechanism to the logging account.

Logging Account Infrastructure

The additional infrastructure required in the central logging account provides a destination for the log group subscription filters and a stream for log events that are sent from all accounts and appropriate regions to load them into the <LoggingS3Bucket> repository. The selection of these particular AWS resources is important, because Kinesis Data Streams is the only resource currently supported as a destination for cross-account CloudWatch Logs subscription filters.

The centralLogging.yml CloudFormation template automates the creation of the entire required infrastructure in the core logging account. Make sure to run it in each of the regions in which you need to centralize logs. The log group subscription filter and destination regions must match in order to successfully stream the logs.

Installation Instructions:

  1. Modify the centralLogging.yml template to add your account numbers for all of the accounts you want to stream logs from into the DestinationPolicy where you see the <AccountNumberHere> placeholders. Remove any unused placeholders.
  2. In the same DestinationPolicy, modify the final arn statement, replacing <region> with the region it will be run in (e.g., us-east-1), and the <logging account number> with the account number of the logging account where this template is to be run.
  3. Log in to the core logging account and access the AWS management console using administrator credentials.
  4. Navigate to CloudFormation and click the Create Stack button.
  5. Select Specify an Amazon S3 template URL and enter the Link for the centralLogging.yml template found in the <S3 infrastructure-bucket>.
  6. Enter a stack name, such as CentralizedLogging, and the one parameter called LoggingS3Bucket. Enter the ARN of the logging bucket: arn:aws:s3:::<LoggingS3Bucket>. This can be obtained by opening the S3 console, clicking on the bucket icon next to this bucket, and then clicking the Copy Bucket ARN button.
  7. Skip the next page, acknowledge the creation of IAM resources, and Create the stack.
  8. When the stack completes, select the stack name to go to stack details and open the Outputs. Copy the value of the DestinationArnExport, which will be needed as a parameter for the script in the next section.

Upon successful creation of this CloudFormation stack, the following new resources will be created:

  • Amazon CloudWatch Logs Destination
  • Amazon Kinesis Stream
  • Amazon Kinesis Firehose Stream
  • Two AWS Identity and Access Management (IAM) Roles

Figure 2 – New infrastructure required in the centralized logging account

Because the Landing Zone is a multi-account offering, the Log Destination is required to be the destination for all subscription filters. The key feature of the destination is its DestinationPolicy. Whenever a new account is added to the environment, its account number needs to be added to this DestinationPolicy in order for logs to be sent to it from the new account. Add the new account number in the centralLogging.yml CloudFormation template, and run an update in CloudFormation to complete the addition. A sample Destination Policy looks like this:

{
  "Version" : "2012-10-17",
  "Statement" : [
    {
      "Effect" : "Allow",
      "Principal" : {
        "AWS" : [
          "03XXXXXXXX85",
          "29XXXXXXXX02",
          "13XXXXXXXX96",
          "37XXXXXXXX30",
          "86XXXXXXXX95"
        ]
      },
      "Action" : "logs:PutSubscriptionFilter",
      "Resource" : "arn:aws:logs:<Region>:<LoggingAccountNumber>:destination:CentralLogDestination"
    }
  ]
}

The Kinesis Stream gets records from the Log Destination and holds them for 48 hours. Kinesis Streams scale by adding shards, and the CloudFormation template starts the stream with two shards. Because all CloudWatch log objects will flow through this stream, you need to monitor it as instances and applications are deployed into the accounts; it will need to be scaled up at some point. To scale, change the number of shards (ShardCount) in the Kinesis Stream resource (KinesisLoggingStream) to the required number. See the Amazon Kinesis Data Streams FAQ documentation to confirm the capacity and throughput of each shard.
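
Updating ShardCount in the template and running a stack update keeps CloudFormation in sync. For reference, an equivalent direct resize with boto3 would look like the following sketch; the physical stream name is an assumption, since CloudFormation generates it unless a name is set explicitly:

import boto3

kinesis = boto3.client("kinesis")

# Uniformly rescale the stream; the stream name below is an assumption
kinesis.update_shard_count(
    StreamName="KinesisLoggingStream",
    TargetShardCount=4,
    ScalingType="UNIFORM_SCALING",
)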

Kinesis Firehose provides a simple and efficient mechanism to retrieve the records from the Kinesis Stream and load them into the <LoggingS3Bucket> repository. It uses the CloudFormation template parameter to know where to load the logs. All of the CloudWatch logs loaded by Firehose will be under the prefix /CentralizedAccountsLog. The buffering hints for Firehose suggest that the logs be loaded every 5 minutes or 50 MB. Leave the CompressionFormat UNCOMPRESSED, since the logs are already compressed.

There are two AWS Identity and Access Management (IAM) roles created for this infrastructure. The first, CWLtoKinesisRole, is used by the destination to allow CloudWatch Logs from all regions to put the log object records into the Kinesis Stream, as well as to pass the role. The second, FirehoseDeliveryRole, allows Firehose to get the log object records from the Kinesis Stream and then load them into the S3 logging bucket.

Once you have successfully created this infrastructure, the next step is to add the subscription filters to existing log groups.

Adding Subscription Filters to Existing Log Groups

The next step in the process is to add subscription filters for the Log Destination in the core logging account to all existing log groups. Several log groups are created by the Landing Zone, or you may have created them by using various AWS services or by logging application events. For every new AWS account, you will need to run the init_account_central_logging.py Python script to add the subscription filters to all the existing log groups.

The init_account_central_logging.py script takes one parameter, which is the Log Destination ARN. Use the Destination ARN you copied from the stack details output in the previous section as the parameter to the script.

The init_account_central_logging.py script first adds this Destination ARN to the AWS Systems Manager Parameter Store so that the core logic that creates the subscription filter can use it. The script then gets a list of all existing log groups, iterates over them, deletes any existing subscription filters (because there can be only one subscription filter per log group, and attempting to create another would cause an error), and then adds a new subscription filter pointing to the Log Destination in the centralized logging account.
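
At its core, that logic is one PutSubscriptionFilter call per log group. The following boto3 sketch shows the idea in simplified form; the destination ARN placeholders must be filled in, and the filter name matches the one the script creates:

import boto3

logs = boto3.client("logs")
destination_arn = "arn:aws:logs:<region>:<logging account number>:destination:CentralLogDestination"

# Walk every log group, replace any existing filter, and point it at the central destination
paginator = logs.get_paginator("describe_log_groups")
for page in paginator.paginate():
    for group in page["logGroups"]:
        name = group["logGroupName"]
        existing = logs.describe_subscription_filters(logGroupName=name)["subscriptionFilters"]
        for f in existing:
            logs.delete_subscription_filter(logGroupName=name, filterName=f["filterName"])
        logs.put_subscription_filter(
            logGroupName=name,
            filterName="Logs (CentralLogDestination)",
            filterPattern="",  # an empty pattern forwards every log event
            destinationArn=destination_arn,
        )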


Figure 3 – Run script to add subscription filters to existing log groups

Installation Instructions:

  1. Make sure that Python and boto3 are installed and accessible in the client computer – consider loading into a virtual environment to keep dependencies separate.
  2. Set the AWS_PROFILE environment variable to the appropriate AWS account profile.
  3. Log in to the proper account, and obtain administrator or other credentials with appropriate permissions, and add the account access key and secret key to the AWS credentials file.
  4. Set the region and output in the AWS config file.
  5. Download and place two python files into a working directory: init_account_central_logging.py and CentralLogging.py.
  6. Run the script using the command python3 ./init_account_central_logging.py -d <LogDestinationArn>.

Use the AWS Management Console to validate the results. Navigate to CloudWatch Logs and view all of the log groups. Each one should now have a subscription filter named “Logs (CentralLogDestination).”
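
You can also spot-check a single log group programmatically. This small boto3 sketch (the log group name is illustrative) prints each filter and its destination:

import boto3

logs = boto3.client("logs")

# The log group name below is illustrative
response = logs.describe_subscription_filters(logGroupName="/aws/lambda/my-function")
for f in response["subscriptionFilters"]:
    print(f["filterName"], "->", f["destinationArn"])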

Automatically Adding Subscription Filters to New Log Groups

The final step to set up the centralized log streaming capability is to run a CloudFormation script to create resources that automatically add subscription filters to new log groups. New log groups are created in accounts by resources (e.g., Lambda functions) and by applications. A subscription filter must be added to every new log group in order to deliver its log events to the logging account.

The AddSubscriptionFilter.yml CloudFormation template contains resources to automatically add subscription filters.

First, it creates a role that allows it to access the lambda code that is stored in a centralized location – the <S3 infrastructure-bucket>. (Remember that its S3 bucket policy must contain this account number in order to access the lambda code.)

Second, the template creates the AddSubscriptionLambda, which reuses the core logic shared by the script in the last section. It retrieves the proper destination from the Parameter Store, deletes any existing subscription filter from the log group, and adds the new subscription filter to the newly created log group. This lambda function is triggered by a CloudWatch event rule.

Third, the CloudFormation template creates a Lambda Permission, which allows the event trigger to invoke this particular Lambda function.

Finally, the CloudFormation template creates an Amazon CloudWatch Events Rule that acts as a trigger for the lambda. This rule looks for an event coming from CloudTrail that signals the creation of a new log group. For each create log group event found, it invokes the AddSubscriptionLambda.
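
The template defines this rule in CloudFormation; conceptually, the rule's event pattern matches the CloudTrail-recorded CreateLogGroup call, as in this boto3 sketch (the rule name is illustrative):

import json

import boto3

events = boto3.client("events")

# Match the CloudTrail record emitted when a new log group is created
events.put_rule(
    Name="NewLogGroupRule",  # illustrative name; the template generates its own
    EventPattern=json.dumps({
        "source": ["aws.logs"],
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {
            "eventSource": ["logs.amazonaws.com"],
            "eventName": ["CreateLogGroup"],
        },
    }),
)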


Figure 4 – Infrastructure to automatically add a subscription filter to a new log group and the log flow to the centralized account

Installation Instructions:

(Important note: This functionality requires that the LogDestination parameter be properly set to the LogDestinationArn in the Parameter Store before the Lambda will run successfully. The script in the previous step sets this parameter, or it can be done manually. Make certain that the destination specified is in this same region.)

  1. Ensure that the <S3 infrastructure-bucket> has the AddSubscriptionFilter.zip file containing the Python code files lambda.py and CentralLogging.py.
  2. Log in to the appropriate account, and access using administrator credentials. Make sure that the region is set properly.
  3. Navigate to CloudFormation and click the Create Stack button.
  4. Select Specify an Amazon S3 template URL and enter the Link for the AddSubscriptionFilter.yml template found in the <S3 infrastructure-bucket>.
  5. Enter a stack name, such as AddSubscription.
  6. Enter the two parameters: the <S3 infrastructure-bucket> name (not ARN) and the folder and file name (e.g., lambdas/AddSubscriptionFilter.zip).
  7. Skip the next page, acknowledge the creation of IAM resources, and Create the stack.

In order to test that the automated addition of subscription filters is working properly, use the AWS Management Console to navigate to CloudWatch Logs and click the Actions button. Select Create New Log Group and enter a random log group name, such as “testLogGroup.” When first created, the log group will not have a subscription filter. After a few minutes, refresh the display and you should see the new subscription filter on the log group. At this point, you can delete the test log group.

New Account Setup

As a reminder, when you add new accounts that you want to have stream log events to the central logging account, you will need to configure the new accounts in two places in order for this functionality to work properly.

First, add the account number to the LoggingDestination property DestinationPolicy in the centralLogging.yml template. Then, update the CloudFormation stack.

Second, modify the bucket policy for the <S3 infrastructure-bucket>. Select the Permissions tab, then the Bucket Policy button. Add the new account to allow cross-account access to the lambda code by adding the line “arn:aws:iam::<new account number>:root” to the Principal.AWS list.

Conclusion

Centralized logging is a key component in enterprise multi-account architectures. In this blog post, I have built on the central logging in multi-account environments streaming architecture to automatically subscribe all CloudWatch Logs log groups to send all log events to an S3 bucket in a designated logging account. The solution uses a script to add subscription filters to existing log groups, and a lambda function to automatically place a subscription filter on all new log groups created within the account. This can be used to forward application logs, security logs, VPC flow logs, or any other important logs that are required for audit, security, or compliance purposes.

About the author

David BaileyDavid Bailey is a Cloud Infrastructure Architect with AWS Professional Services specializing in serverless application architecture, IoT, and artificial intelligence. He has spent decades architecting and developing complex custom software applications, as well as teaching internationally on object-oriented design, expert systems, and neural networks.


Recovering from a rough Monday morning: An Amazon GuardDuty threat detection and remediation scenario

Post Syndicated from Greg McConnel original https://aws.amazon.com/blogs/security/amazon-guardduty-threat-detection-and-remediation-scenario/

Amazon GuardDuty is a managed threat detection service that continuously monitors for malicious or unauthorized behavior to help you protect your AWS accounts and workloads. Given the many log types that Amazon GuardDuty analyzes (Amazon Virtual Private Cloud (VPC) Flow Logs, AWS CloudTrail, and DNS logs), you never know what it might discover in your AWS account. After enabling GuardDuty, you might quickly find serious threats lurking in your account or, preferably, just end up staring at a blank dashboard for weeks…or even longer.

A while back at an AWS Loft event, one of the customers enabled GuardDuty in their AWS account for a lab we were running. Soon after, GuardDuty alerts (findings) popped up that indicated multiple Amazon Elastic Compute Cloud (EC2) instances were communicating with known command and control servers. In other words, GuardDuty detected activity commonly seen when an EC2 instance has been taken over as part of a botnet. The customer asked if this was part of the lab, and we explained it wasn't and that the findings should be investigated immediately. This led to an investigation by that customer's security team, and luckily the issue was resolved quickly.

Then there was the time we spoke to a customer who had been running GuardDuty for a few days but had yet to see any findings in the dashboard. They were concerned that the service wasn't working. We explained that the lack of findings was actually a good thing, and we discussed how to generate sample findings to test GuardDuty and their remediation pipeline.
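
Generating sample findings is a single API call. As a rough boto3 sketch, assuming GuardDuty is already enabled in the region:

import boto3

guardduty = boto3.client("guardduty")

# Look up the region's detector, then generate sample findings for it
detector_id = guardduty.list_detectors()["DetectorIds"][0]
guardduty.create_sample_findings(DetectorId=detector_id)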

This post, and the corresponding GitHub repository, will help prepare you for either type of experience by walking you through a threat detection and remediation scenario. The scenario will show you how to quickly enable GuardDuty, generate and examine test findings, and then review automated remediation examples using AWS Lambda.

Scenario overview

The instructions and AWS CloudFormation template for setting everything up are provided in a GitHub repository. The CloudFormation template sets up a test environment in your AWS Account, configures everything needed to run through the scenario, generates GuardDuty findings and provides automatic remediation for the simulated threats in the scenario. All you need to do is run the CloudFormation template in the GitHub repository and then follow the instructions to investigate what occurred.

The scenario presented is that you manage an IT organization and Alice, your security engineer, has enabled GuardDuty in a production AWS account and configured a few automated remediations. In threat detection and remediation, the standard pattern starts with a threat, which is then investigated and finally remediated. These remediations can be manual or automated. Alice focused on a few specific attack vectors, which represent a small sample of what GuardDuty is capable of detecting. Alice set all this up on Thursday but isn't in the office on Monday. Unfortunately, as soon as you arrive at the office, GuardDuty notifies you that multiple threats have been detected (and, given the automated remediation setup, these threats have been addressed, but you still need to investigate). The documentation in GitHub will guide you through the analysis of the findings and discuss how the automatic remediation works. You will also have the opportunity to manually trigger a GuardDuty finding and view that automated remediation.

The GuardDuty findings generated in the scenario are listed in the GitHub repository's documentation, and you can view the complete list of GuardDuty finding types in the GuardDuty documentation.

You can get started immediately by browsing to the GitHub repository for this scenario where you will find the instructions and AWS CloudFormation template. This scenario will show you how easy it is to enable GuardDuty in addition to demonstrating some of the threats GuardDuty can discover. To learn more about Amazon GuardDuty please see the GuardDuty site and GuardDuty documentation.

If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the Amazon GuardDuty forum or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Control access to your APIs using Amazon API Gateway resource policies

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/control-access-to-your-apis-using-amazon-api-gateway-resource-policies/

This post courtesy of Tapodipta Ghosh, AWS Solutions Architect

Amazon API Gateway provides you with a simple, flexible, secure, and fully managed service that lets you focus on building core business services. API Gateway supports multiple mechanisms of access control using AWS Identity and Access Management (IAM), AWS Lambda authorizers, and Amazon Cognito.

You may want to enforce strict control on the locations from which your APIs are invoked. For example, if you are an AWS Partner who offers APIs over a SaaS model, you can take advantage of the new Amazon API Gateway resource policies feature to control access to your APIs using predefined IP address ranges. API Gateway resource policies are JSON policy documents that you attach to an API to control whether a specified principal (typically, an IAM user or role) can invoke the API.

After a customer subscribes to your SaaS product in AWS Marketplace, you can ask for IP address ranges in the registration information. Then you can enable access to your API from only those IP addresses, making it a secure integration. For example, if you know that your customers are spread across a certain geography, you could blacklist all other countries. Alternatively, if you have global customers, you can whitelist only specific IP address ranges.

What problems do resource policies solve?

In a distributed development team with separate AWS accounts, integration testing can be challenging. Allowing users from a different AWS account to access your API requires writing and maintaining code for assuming a role in the API owner's account. Also, if you work with a third party, you have to write a Lambda authorizer to implement a bearer token–based authorization scheme.

Now, you can use resource policies much like S3 bucket policies, to provide overarching controls on your APIs without writing custom authorizers or complicated application logic. In this post, I demonstrate how you can use API Gateway resource policies to enable users from a different AWS account to access your API securely. You can also allow the API to be invoked only from specified source IP address ranges or CIDR blocks, without writing any code.

Solution overview

Imagine a company has two teams, Team A and Team B. Team B has created an API that is backed by a Lambda function and a DynamoDB database. They want to make the API public to third parties. First, they want Team A to run integration tests. After the API goes live, Team B wants to allow only users who access the API from a known IP address range.

The following diagram shows the sequence:
Flow Diagram

Start with building an API. For this walkthrough, use a SAM template and the AWS CLI to create the API. For the code to create an API and attach the resource policy to it, see the Sam-moviesapi-resourcepolicy GitHub repo.

Here’s a walkthrough of the steps, so you can get a deeper understanding of what’s happening under the covers.

  • Create the API
  • Turn on IAM authentication
  • Grant user access
  • Test the access permissions

Create the API

Assume that you are hosting the API in AccountB. Run the following commands:

git clone https://github.com/aws-samples/aws-sam-movies-api-resource-policy.git
mkdir ./build

cp -p -r ./movies ./build/movies

pip install -r requirements.txt -t ./build

aws cloudformation package --template-file template.yaml --output-template-file template-out.yaml --s3-bucket $S3Bucket --profile AccountB

aws cloudformation deploy --template-file template-out.yaml --stack-name apigw-resource-policies-demo --capabilities CAPABILITY_IAM --profile AccountB

Note: You’ll need an S3 bucket to store your artifact for the “package” step.

Turn on IAM authentication

After the movie API is set up, turn on IAM authentication, so that it’s protected from unauthenticated attempts.
It should look like the following screenshot:
Figure: IAM authentication turned on for the API

Also, make sure that you are getting a valid response when you make a GET request, as shown in the following screenshot:

Grant user access

Now grant AccountA user access to your API. In the API Gateway console, choose Movies API, Resource Policy.

Note: All the IP address ranges recorded in this post are for illustration purposes only.

Here is a screenshot of how it would look in the console:

The entire policy is listed here:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::<account_idA>:user/<user>",
                    "arn:aws:iam::<account_idA>:root"
                ]
            },
            "Action": "execute-api:Invoke",
            "Resource": "arn:aws:execute-api:us-east-1:<account_idB>:qxz8y9c8a4/*/*/*"
        },
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "execute-api:Invoke",
            "Resource": "arn:aws:execute-api:us-east-1:<account_idB>:qxz8y9c8a4/*",
            "Condition": {
                "IpAddress": {
                    "aws:SourceIp": " 203.0.113.0/24"
                }
            }
        }
    ]
}

Here are a few points worth noting. The first policy statement shows how you could provide granular access to certain API IDs down to the specific resource paths in the resource section of the policy. To provide the AccountA user with access only to GET requests, change the resource line to the following:

"Resource": "arn:aws:execute-api:us-east-1:<account_idB>:qxz8y9c8a4/*/GET/*"

In the second statement, you are whitelisting the entire 203.0.113.0/24 network to make all calls to the API.

While whitelisting IP addresses is a good way to start when launching an API for the first time, maintaining an up-to-date list can prove challenging. For a stable product, blacklisting bad actors might be more practical.

A blacklist implementation could look like the following:

{
	"Effect": "Deny",
	"Principal": "*",
	"Action": "execute-api:Invoke",
	"Resource": "arn:aws:execute-api:us-east-1:<account_idB>:qxz8y9c8a4/*",
	"Condition": {
		"IpAddress": {
			"aws:SourceIp": "203.0.113.0/24"
		}
	}
}

Suppose you have access logs turned on for the API, and your log analysis tool has flagged bad actors from a particular IP address range (for example, 203.0.113.0/24). You can then blacklist that IP address range in the resource policy.

Test the access permissions

You can now test, using Postman, to ensure that the user from AccountA can indeed call the API hosted in AccountB. Also verify that attempts from other accounts are rejected.

In the following examples, the AWS Signature is configured with the AccessKey and SecretKey values from an AccountA user who was granted access to the API.

Successful response from an authorized user from AccountA – Got a 200 OK

Failure from an unauthorized account/user: Got 401 Unauthorized
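
If you'd rather script the test than use Postman, the following Python sketch signs a GET request with SigV4 via botocore. The endpoint URL is a placeholder, the requests library is a third-party dependency, and credentials are taken from the active profile (use the AccountA profile to reproduce the success case):

import boto3
import requests
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

# Placeholder endpoint; substitute your API's invoke URL
url = "https://qxz8y9c8a4.execute-api.us-east-1.amazonaws.com/Prod/movies"

# Sign the request with the caller's credentials for the execute-api service
session = boto3.Session()
request = AWSRequest(method="GET", url=url)
SigV4Auth(session.get_credentials(), "execute-api", "us-east-1").add_auth(request)

response = requests.get(url, headers=dict(request.headers))
print(response.status_code, response.text)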

Summary

In this post, I showed you the different ways that you can use resource policies to lock down access to your API. Want to restrict a dev API endpoint to the office IP address range? Now you can. Cross-account API access is also made much simpler without having to write complex authentication/authorization schemes.

Amazon Neptune Generally Available

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/amazon-neptune-generally-available/

Amazon Neptune is now Generally Available in US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland). Amazon Neptune is a fast, reliable, fully-managed graph database service that makes it easy to build and run applications that work with highly connected datasets. At the core of Neptune is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with millisecond latencies. Neptune supports two popular graph models, Property Graph and RDF, through Apache TinkerPop Gremlin and SPARQL, allowing you to easily build queries that efficiently navigate highly connected datasets. Neptune can be used to power everything from recommendation engines and knowledge graphs to drug discovery and network security. Neptune is fully-managed with automatic minor version upgrades, backups, encryption, and fail-over. I wrote about Neptune in detail for AWS re:Invent last year and customers have been using the preview and providing great feedback that the team has used to prepare the service for GA.

Now that Amazon Neptune is generally available, there are a few changes from the preview.

Launching an Amazon Neptune Cluster

Launching a Neptune cluster is as easy as navigating to the AWS Management Console and clicking create cluster. Of course you can also launch with CloudFormation, the CLI, or the SDKs.
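
For example, a minimal boto3 sketch for creating a cluster and one instance might look like this; the identifiers and instance class are illustrative:

import boto3

neptune = boto3.client("neptune", region_name="us-east-1")

# Create the cluster, then add a database instance to it (names are illustrative)
neptune.create_db_cluster(DBClusterIdentifier="my-neptune-cluster", Engine="neptune")
neptune.create_db_instance(
    DBInstanceIdentifier="my-neptune-instance",
    DBInstanceClass="db.r4.large",
    Engine="neptune",
    DBClusterIdentifier="my-neptune-cluster",
)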

You can monitor your cluster health and the health of individual instances through Amazon CloudWatch and the console.

Additional Resources

We’ve created two repos with some additional tools and examples here. You can expect continuous development on these repos as we add additional tools and examples.

  • Amazon Neptune Tools Repo
    This repo has a useful tool for converting GraphML files into Neptune compatible CSVs for bulk loading from S3.
  • Amazon Neptune Samples Repo
    This repo has a really cool example of building a collaborative filtering recommendation engine for video game preferences.

Purpose Built Databases

There's an industry trend where we're moving more and more onto purpose-built databases. Developers and businesses want to access their data in the format that makes the most sense for their applications. As cloud resources make transforming large datasets easier with tools like AWS Glue, we have a lot more options than we used to for accessing our data. With tools like Amazon Redshift, Amazon Athena, Amazon Aurora, Amazon DynamoDB, and more, we get to choose the best database for the job or even enable entirely new use cases. Amazon Neptune is perfect for workloads where the data is highly connected across data-rich edges.

I’m really excited about graph databases and I see a huge number of applications. Looking for ideas of cool things to build? I’d love to build a web crawler in AWS Lambda that uses Neptune as the backing store. You could further enrich it by running Amazon Comprehend or Amazon Rekognition on the text and images found and creating a search engine on top of Neptune.

As always, feel free to reach out in the comments or on Twitter to provide any feedback!

Randall

The Copyright Directive: the course of the revision: act now

Post Syndicated from nellyo original https://nellyo.wordpress.com/2018/05/26/copyright-5/

There is a new development in the EU copyright revision, as is clear from the announcements of the Bulgarian Presidency, participants in the revision, and Julia Reda, who had a very clear view of what she wants changed in the legal framework (a common regime of exceptions, and an update so that we have a legal framework adequate to technological developments) and is now following the legislative process closely.

The governments of the EU member states have adopted a position on the copyright reform without substantial changes to Article 11 (the new right for publishers) and Article 13 (upload filters). The draft is on Reda's site, and Politico shows the amendments affecting the publishers' right in color.

Now Parliament must stop them, Reda writes.

You now have a chance to exert influence, a chance that will disappear in two years, when everyone will "suddenly" face the challenge of implementing filters and the link tax. Experts almost unanimously agree that the draft copyright reform is really bad.

Update: Member State governments have just adopted their position on #copyright, with no significant changes to the #CensorshipMachines and #LinkTax provisions. It is now up to Parliament to stop them and #FixCopyright. https://t.co/1JwNvQn24n pic.twitter.com/KAgqV3YYG1


Two charts from Reda's site, covering the two provisions against which support is being gathered (see also the academics), show the positions by member state and by party in the European Parliament:


Access to documents: trilogues, one more step forward

Post Syndicated from nellyo original https://nellyo.wordpress.com/2018/05/22/access_ep/

Trilogue documents often end up in the hands of privileged lobbyists within minutes of being produced. Giving citizens the same access to these documents is not just a matter of fairness; it is necessary to allow public scrutiny of the entire legislative process. Parliament has repeatedly called for proactive publication of trilogue documents on the institutions' websites.

This is Julia Reda's comment on the news that the Committee on Legal Affairs (JURI) recommended that the European Parliament not appeal the General Court's judgment in De Capitani v. European Parliament, which grants access to trilogue documents (see more on the judgment).


Williams: Introducing Git protocol version 2

Post Syndicated from corbet original https://lwn.net/Articles/754872/rss

Brandon Williams writes about the new Git remote protocol that will debut in the 2.18 release: "We recently rolled out support for protocol version 2 at Google and have seen a performance improvement of 3x for no-op fetches of a single branch on repositories containing 500k references. Protocol v2 has also enabled a reduction of 8x of the overhead bytes (non-packfile) sent from googlesource.com servers. A majority of this improvement is due to filtering references advertised by the server to the refs the client has expressed interest in."

Analyze data in Amazon DynamoDB using Amazon SageMaker for real-time prediction

Post Syndicated from YongSeong Lee original https://aws.amazon.com/blogs/big-data/analyze-data-in-amazon-dynamodb-using-amazon-sagemaker-for-real-time-prediction/

Many companies across the globe use Amazon DynamoDB to store and query historical user-interaction data. DynamoDB is a fast NoSQL database used by applications that need consistent, single-digit millisecond latency.

Often, customers want to turn their valuable data in DynamoDB into insights by analyzing a copy of their table stored in Amazon S3. Doing this separates their analytical queries from their low-latency critical paths. This data can be the primary source for understanding customers’ past behavior, predicting future behavior, and generating downstream business value. Customers often turn to DynamoDB because of its great scalability and high availability. After a successful launch, many customers want to use the data in DynamoDB to predict future behaviors or provide personalized recommendations.

DynamoDB is a good fit for low-latency reads and writes, but it's not practical to scan all data in a DynamoDB database to train a model. In this post, I demonstrate how you can use DynamoDB table data copied to Amazon S3 by AWS Data Pipeline to predict customer behavior. I also demonstrate how you can use this data to provide personalized recommendations for customers using Amazon SageMaker. You can also run ad hoc queries using Amazon Athena against the data. DynamoDB recently released on-demand backups to create full table backups with no performance impact. However, it's not suitable for our purposes in this post, so I chose AWS Data Pipeline instead to create managed backups that are accessible from other services.

To do this, I describe how to read the DynamoDB backup file format in Data Pipeline. I also describe how to convert the objects in S3 to a CSV format that Amazon SageMaker can read. In addition, I show how to schedule regular exports and transformations using Data Pipeline. The sample data used in this post is from the UCI Bank Marketing Data Set.

The solution that I describe provides the following benefits:

  • Separates analytical queries from production traffic on your DynamoDB table, preserving your DynamoDB read capacity units (RCUs) for important production requests
  • Automatically updates your model to get real-time predictions
  • Optimizes for performance (so it doesn’t compete with DynamoDB RCUs after the export) and for cost (using data you already have)
  • Makes it easier for developers of all skill levels to use Amazon SageMaker

All code and data set in this post are available in this .zip file.

Solution architecture

The following diagram shows the overall architecture of the solution.

The steps that data follows through the architecture are as follows:

  1. Data Pipeline regularly copies the full contents of a DynamoDB table as JSON into an S3 bucket.
  2. Exported JSON files are converted to comma-separated value (CSV) format to use as a data source for Amazon SageMaker.
  3. Amazon SageMaker renews the model artifact and updates the endpoint.
  4. The converted CSV is available for ad hoc queries with Amazon Athena.
  5. Data Pipeline controls this flow and repeats the cycle based on the schedule defined by customer requirements.

Building the auto-updating model

This section discusses details about how to read the DynamoDB exported data in Data Pipeline and build automated workflows for real-time prediction with a regularly updated model.

Download sample scripts and data

Before you begin, take the following steps:

  1. Download sample scripts in this .zip file.
  2. Unzip the src.zip file.
  3. Find the automation_script.sh file and edit it for your environment. For example, you need to replace 's3://<your bucket>/<datasource path>/' with your own S3 path to the data source for Amazon ML. In the script, the text enclosed by angle brackets—< and >—should be replaced with your own path.
  4. Upload the json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar file to your S3 path so that the ADD jar command in Apache Hive can refer to it.

For this solution, the banking.csv file should be imported into a DynamoDB table.

Export a DynamoDB table

To export the DynamoDB table to S3, open the Data Pipeline console and choose the Export DynamoDB table to S3 template. In this template, Data Pipeline creates an Amazon EMR cluster and performs an export in the EMRActivity activity. Set proper intervals for backups according to your business requirements.

One core node (m3.xlarge) provides the default capacity for the EMR cluster and should be suitable for the solution in this post. Leave the option to resize the cluster before running enabled in the TableBackupActivity activity, to let Data Pipeline scale the cluster to match the table size. The process of converting to CSV format and renewing models happens in this EMR cluster.

For a more in-depth look at how to export data from DynamoDB, see Export Data from DynamoDB in the Data Pipeline documentation.

Add the script to an existing pipeline

After you export your DynamoDB table, you add an additional EMR step to EMRActivity by following these steps:

  1. Open the Data Pipeline console and choose the ID for the pipeline that you want to add the script to.
  2. For Actions, choose Edit.
  3. In the editing console, choose the Activities category and add an EMR step using the custom script downloaded in the previous section, as shown below.

Paste the following command into the new step after the data upload step:

s3://#{myDDBRegion}.elasticmapreduce/libs/script-runner/script-runner.jar,s3://<your bucket name>/automation_script.sh,#{output.directoryPath},#{myDDBRegion}

The element #{output.directoryPath} references the S3 path where the data pipeline exports DynamoDB data as JSON. The path should be passed to the script as an argument.

The bash script has two goals, converting data formats and renewing the Amazon SageMaker model. Subsequent sections discuss the contents of the automation script.

Automation script: Convert JSON data to CSV with Hive

We use Apache Hive to transform the data into a new format. The Hive QL script to create an external table and transform the data is included in the custom script that you added to the Data Pipeline definition.

When you run the Hive scripts, do so with the -e option. Also, define the Hive table with the 'org.openx.data.jsonserde.JsonSerDe' row format to parse and read JSON format. The SQL creates a Hive EXTERNAL table, and it reads the DynamoDB backup data on the S3 path passed to it by Data Pipeline.

Note: You should create the table with the “EXTERNAL” keyword to avoid the backup data being accidentally deleted from S3 if you drop the table.

The full automation script for converting follows. Add your own bucket name and data source path in the highlighted areas.

#!/bin/bash
hive -e "
ADD jar s3://<your bucket name>/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar ; 
DROP TABLE IF EXISTS blog_backup_data ;
CREATE EXTERNAL TABLE blog_backup_data (
 customer_id map<string,string>,
 age map<string,string>, job map<string,string>, 
 marital map<string,string>,education map<string,string>, 
 default map<string,string>, housing map<string,string>,
 loan map<string,string>, contact map<string,string>, 
 month map<string,string>, day_of_week map<string,string>, 
 duration map<string,string>, campaign map<string,string>,
 pdays map<string,string>, previous map<string,string>, 
 poutcome map<string,string>, emp_var_rate map<string,string>, 
 cons_price_idx map<string,string>, cons_conf_idx map<string,string>,
 euribor3m map<string,string>, nr_employed map<string,string>, 
 y map<string,string> ) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' 
LOCATION '$1/';

INSERT OVERWRITE DIRECTORY 's3://<your bucket name>/<datasource path>/' 
SELECT concat( customer_id['s'],',', 
 age['n'],',', job['s'],',', 
 marital['s'],',', education['s'],',', default['s'],',', 
 housing['s'],',', loan['s'],',', contact['s'],',', 
 month['s'],',', day_of_week['s'],',', duration['n'],',', 
 campaign['n'],',',pdays['n'],',',previous['n'],',', 
 poutcome['s'],',', emp_var_rate['n'],',', cons_price_idx['n'],',',
 cons_conf_idx['n'],',', euribor3m['n'],',', nr_employed['n'],',', y['n'] ) 
FROM blog_backup_data
WHERE customer_id['s'] > 0 ;
"

After creating the external table, you read the data and then use the INSERT OVERWRITE DIRECTORY ... SELECT command to write CSV data to the S3 path that you designated as the data source for Amazon SageMaker.

Depending on your requirements, you can eliminate or process the columns in the SELECT clause in this step to optimize data analysis. For example, you might remove some columns that have unpredictable correlations with the target value, because keeping the wrong columns might expose your model to overfitting during training. In this post, the customer_id column is removed. Overfitting can weaken your prediction. More information about overfitting can be found in the topic Model Fit: Underfitting vs. Overfitting in the Amazon ML documentation.

Automation script: Renew the Amazon SageMaker model

After the CSV data is replaced and ready to use, create a new model artifact for Amazon SageMaker with the updated dataset on S3. To renew the model artifact, you must create a new training job. Training jobs can be run using the AWS SDK (for example, Amazon SageMaker boto3), the Amazon SageMaker Python SDK (which can be installed with the pip install sagemaker command), or the AWS CLI for Amazon SageMaker, as described in this post.

In addition, consider how to smoothly renew your existing model without service impact, because your model is called by applications in real time. To do this, you first create a new endpoint configuration and then update the current endpoint with the newly created endpoint configuration.

#!/bin/bash
## Define variable 
REGION=$2
DTTIME=`date +%Y-%m-%d-%H-%M-%S`
ROLE="<your AmazonSageMaker-ExecutionRole>" 


# Select containers image based on region.  
case "$REGION" in
"us-west-2" )
    IMAGE="174872318107.dkr.ecr.us-west-2.amazonaws.com/linear-learner:latest"
    ;;
"us-east-1" )
    IMAGE="382416733822.dkr.ecr.us-east-1.amazonaws.com/linear-learner:latest" 
    ;;
"us-east-2" )
    IMAGE="404615174143.dkr.ecr.us-east-2.amazonaws.com/linear-learner:latest" 
    ;;
"eu-west-1" )
    IMAGE="438346466558.dkr.ecr.eu-west-1.amazonaws.com/linear-learner:latest" 
    ;;
 *)
    echo "Invalid Region Name"
    exit 1 ;  
esac

# Start training job and creating model artifact 
TRAINING_JOB_NAME=TRAIN-${DTTIME} 
S3OUTPUT="s3://<your bucket name>/model/" 
INSTANCETYPE="ml.m4.xlarge"
INSTANCECOUNT=1
VOLUMESIZE=5 
aws sagemaker create-training-job --training-job-name ${TRAINING_JOB_NAME} --region ${REGION}  --algorithm-specification TrainingImage=${IMAGE},TrainingInputMode=File --role-arn ${ROLE}  --input-data-config '[{ "ChannelName": "train", "DataSource": { "S3DataSource": { "S3DataType": "S3Prefix", "S3Uri": "s3://<your bucket name>/<datasource path>/", "S3DataDistributionType": "FullyReplicated" } }, "ContentType": "text/csv", "CompressionType": "None" , "RecordWrapperType": "None"  }]'  --output-data-config S3OutputPath=${S3OUTPUT} --resource-config  InstanceType=${INSTANCETYPE},InstanceCount=${INSTANCECOUNT},VolumeSizeInGB=${VOLUMESIZE} --stopping-condition MaxRuntimeInSeconds=120 --hyper-parameters feature_dim=20,predictor_type=binary_classifier  

# Wait until job completed 
aws sagemaker wait training-job-completed-or-stopped --training-job-name ${TRAINING_JOB_NAME}  --region ${REGION}

# Get newly created model artifact and create model
MODELARTIFACT=`aws sagemaker describe-training-job --training-job-name ${TRAINING_JOB_NAME} --region ${REGION}  --query 'ModelArtifacts.S3ModelArtifacts' --output text `
MODELNAME=MODEL-${DTTIME}
aws sagemaker create-model --region ${REGION} --model-name ${MODELNAME}  --primary-container Image=${IMAGE},ModelDataUrl=${MODELARTIFACT}  --execution-role-arn ${ROLE}

# create a new endpoint configuration 
CONFIGNAME=CONFIG-${DTTIME}
aws sagemaker  create-endpoint-config --region ${REGION} --endpoint-config-name ${CONFIGNAME}  --production-variants  VariantName=Users,ModelName=${MODELNAME},InitialInstanceCount=1,InstanceType=ml.m4.xlarge

# create or update the endpoint
STATUS=`aws sagemaker describe-endpoint --endpoint-name  ServiceEndpoint --query 'EndpointStatus' --output text --region ${REGION} `
if [[ "$STATUS" != "InService" ]] ;
then
    aws sagemaker  create-endpoint --endpoint-name  ServiceEndpoint  --endpoint-config-name ${CONFIGNAME} --region ${REGION}    
else
    aws sagemaker  update-endpoint --endpoint-name  ServiceEndpoint  --endpoint-config-name ${CONFIGNAME} --region ${REGION}
fi

Grant permission

Before you execute the script, you must grant proper permission to Data Pipeline. Data Pipeline uses the DataPipelineDefaultResourceRole role by default. I added the following policy to DataPipelineDefaultResourceRole to allow Data Pipeline to create, delete, and update the Amazon SageMaker model and data source in the script.

{
 "Version": "2012-10-17",
 "Statement": [
 {
 "Effect": "Allow",
 "Action": [
 "sagemaker:CreateTrainingJob",
 "sagemaker:DescribeTrainingJob",
 "sagemaker:CreateModel",
 "sagemaker:CreateEndpointConfig",
 "sagemaker:DescribeEndpoint",
 "sagemaker:CreateEndpoint",
 "sagemaker:UpdateEndpoint",
 "iam:PassRole"
 ],
 "Resource": "*"
 }
 ]
}

Use real-time prediction

After you deploy a model into production using Amazon SageMaker hosting services, your client applications use this API to get inferences from the model hosted at the specified endpoint. This approach is useful for interactive web, mobile, or desktop applications.

Following, I provide a simple Python code example that queries the Amazon SageMaker endpoint by its name (“ServiceEndpoint”) and uses the result for real-time prediction.

=== Python sample for real-time prediction ===

#!/usr/bin/env python
import boto3
import json 

client = boto3.client('sagemaker-runtime', region_name ='<your region>' )
new_customer_info = '34,10,2,4,1,2,1,1,6,3,190,1,3,4,3,-1.7,94.055,-39.8,0.715,4991.6'
response = client.invoke_endpoint(
    EndpointName='ServiceEndpoint',
    Body=new_customer_info, 
    ContentType='text/csv'
)
result = json.loads(response['Body'].read().decode())
print(result)
--- output(response) ---
{u'predictions': [{u'score': 0.7528127431869507, u'predicted_label': 1.0}]}

Solution summary

The solution takes the following steps:

  1. Data Pipeline exports DynamoDB table data into S3. The original JSON data should be kept to recover the table in the rare event that this is needed. Data Pipeline then converts JSON to CSV so that Amazon SageMaker can read the data. Note: You should select only meaningful attributes when you convert to CSV. For example, if you judge that the “campaign” attribute is not correlated, you can eliminate this attribute from the CSV.
  2. Train the Amazon SageMaker model with the new data source.
  3. When a new customer comes to your site, you can judge how likely it is for this customer to subscribe to your new product based on “predictedScores” provided by Amazon SageMaker.
  4. If the new user subscribes your new product, your application must update the attribute “y” to the value 1 (for yes). This updated data is provided for the next model renewal as a new data source. It serves to improve the accuracy of your prediction. With each new entry, your application can become smarter and deliver better predictions.
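
The following is a minimal sketch of the update in step 4. The table name and the partition key customerId are placeholders for illustration; substitute your own table name and key schema.

=== Updating the target attribute (sketch) ===

aws dynamodb update-item \
  --table-name <your table name> \
  --key '{"customerId": {"S": "customer-0001"}}' \
  --update-expression "SET y = :subscribed" \
  --expression-attribute-values '{":subscribed": {"N": "1"}}' \
  --region <your region>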

Running ad hoc queries using Amazon Athena

Amazon Athena is a serverless query service that makes it easy to analyze large amounts of data stored in Amazon S3 using standard SQL. Athena is useful for examining data and collecting statistics or informative summaries about data. You can also use the powerful analytic functions of Presto, as described in Aggregate Functions in the Presto documentation.

With the Data Pipeline scheduled activity, recent CSV data is always located in S3, so you can run ad hoc queries against the data using Amazon Athena. The following example SQL statements show this. For an in-depth description of this process, see the post Interactive SQL Queries for Data in Amazon S3 on the AWS News Blog.

Creating an Amazon Athena table and running queries

You can simply create an EXTERNAL table for the CSV data on S3 in the Amazon Athena console.

=== Table Creation ===
CREATE EXTERNAL TABLE datasource (
  age int,
  job string,
  marital string,
  education string,
  default string,
  housing string,
  loan string,
  contact string,
  month string,
  day_of_week string,
  duration int,
  campaign int,
  pdays int,
  previous int,
  poutcome string,
  emp_var_rate double,
  cons_price_idx double,
  cons_conf_idx double,
  euribor3m double,
  nr_employed double,
  y int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',' ESCAPED BY '\\' LINES TERMINATED BY '\n'
LOCATION 's3://<your bucket name>/<datasource path>/';

The following query calculates the correlation coefficient between the target attribute and other attributes using Amazon Athena.

=== Sample Query ===

SELECT corr(age, y) AS correlation_age_and_target,
  corr(duration, y) AS correlation_duration_and_target,
  corr(campaign, y) AS correlation_campaign_and_target,
  corr(contact, y) AS correlation_contact_and_target
FROM (SELECT age, duration, campaign, y,
    CASE WHEN contact = 'telephone' THEN 1 ELSE 0 END AS contact
  FROM datasource
) datasource;
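
If you prefer to run queries from the command line instead of the console, the following is a minimal sketch using the AWS CLI. The database name (default) and the S3 output location are assumptions that you should replace with your own values.

=== Running the query with the AWS CLI (sketch) ===

# Submit the query and capture its execution ID
QUERY_ID=$(aws athena start-query-execution \
  --query-string "SELECT corr(age, y) AS correlation_age_and_target FROM datasource" \
  --query-execution-context Database=default \
  --result-configuration OutputLocation=s3://<your bucket name>/athena-results/ \
  --region <your region> --output text --query 'QueryExecutionId')

# Fetch the results after the query finishes (poll get-query-execution if needed)
aws athena get-query-results --query-execution-id ${QUERY_ID} --region <your region>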

Conclusion

In this post, I introduced an example of how to analyze DynamoDB table data by exporting it to Amazon S3, which avoids consuming the table's read capacity for analytics. You can then use the exported data as a data source to train an Amazon SageMaker model for accurate real-time predictions. In addition, you can run ad hoc queries against the data on S3 using Amazon Athena. I also showed how to automate these procedures by using Data Pipeline.

You can adapt this example to your specific use case, and hopefully this post helps you accelerate your development. You can find more examples and use cases for Amazon SageMaker in the video AWS 2017: Introducing Amazon SageMaker on the AWS website.

Additional Reading

If you found this post useful, be sure to check out Serving Real-Time Machine Learning Predictions on Amazon EMR and Analyzing Data in S3 using Amazon Athena.

About the Author

Yong Seong Lee is a Cloud Support Engineer for AWS Big Data Services. He is interested in every technology related to data/databases and helping customers who have difficulties in using AWS services. His motto is “Enjoy life, be curious and have maximum experience.”