Tag Archives: security

Email Link Isolation: your safety net for the latest phishing attacks

Post Syndicated from Joao Sousa Botto original https://blog.cloudflare.com/area1-eli-ga/

Email Link Isolation: your safety net for the latest phishing attacks

Email Link Isolation: your safety net for the latest phishing attacks

Email is one of the most ubiquitous and also most exploited tools that businesses use every single day. Baiting users into clicking malicious links within an email has been a particularly long-standing tactic for the vast majority of bad actors, from the most sophisticated criminal organizations to the least experienced attackers.

Even though this is a commonly known approach to gain account access or commit fraud, users are still being tricked into clicking malicious links that, in many cases, lead to exploitation. The reason is simple: even the best trained users (and security solutions) cannot always distinguish a good link from a bad link.

On top of that, securing employees’ mailboxes often results in multiple vendors, complex deployments, and a huge drain of resources.

Email Link Isolation turns Cloudflare Area 1 into the most comprehensive email security solution when it comes to protecting against phishing attacks. It rewrites links that could be exploited, keeps users vigilant by alerting them of the uncertainty around the website they’re about to visit, and protects against malware and vulnerabilities through the user-friendly Cloudflare Browser Isolation service. Also, in true Cloudflare fashion,  it’s a one-click deployment.

With more than a couple dozen customers in beta and over one million links protected (so far), we can now clearly see the significant value and potential that this solution can deliver. To extend these benefits to more customers and continue to expand on the multitude of ways we can apply this technology, we’re making Email Link Isolation generally available (GA) starting today.

Email Link Isolation is included with Cloudflare Area 1 enterprise plan at no extra cost, and can be enabled with three clicks:

1. Log in to the Area 1 portal.

2. Go to Settings (the gear icon).

3. On Email Configuration, go to Email Policies > Link Actions.

4. Scroll to Email Link Isolation and enable it.

Email Link Isolation: your safety net for the latest phishing attacks

Defense in layers

Applying multiple layers of defense becomes ever more critical as threat actors continuously look for ways to navigate around each security measure and develop more complex attacks. One of the best examples that demonstrates these evolving techniques is a deferred phishing attack, where an embedded URL is benign when the email reaches your email security stack and eventually your users’ inbox, but is later weaponized post-delivery.

Email Link Isolation: your safety net for the latest phishing attacks

To combat evolving email-borne threats, such as malicious links, Area 1 continually updates its machine learning (ML) models to account for all potential attack vectors, and leverages post-delivery scans and retractions as additional layers of defense. And now, customers on the Enterprise plan also have access to Email Link Isolation as one last defense – a safety net.

The key to successfully adding layers of security is to use a strong Zero Trust suite, not a disjointed set of products from multiple vendors. Users need to be kept safe without disrupting their productivity – otherwise they’ll start seeing important emails being quarantined or run into a poor experience when accessing websites, and soon enough they’ll be the ones looking for ways around the company’s security measures.

Built to avoid productivity impacts

Email Link Isolation provides an additional layer of security with virtually no disruption to the user experience. It’s smart enough to decide which links are safe, which are malicious, and which are still dubious. Those dubious links are then changed (rewritten to be precise) and Email Link Isolation keeps evaluating them until it reaches a verdict with a high degree of confidence. When a user clicks on one of those rewritten links, Email Link Isolation checks for a verdict (benign or malign) and takes the corresponding action – benign links open in the local browser as if they hadn’t been changed, while malign links are prevented from opening altogether.

Most importantly, when Email Link Isolation is unable to confidently determine a verdict based on all available intelligence, an interstitial page is presented to ask the user to be extra vigilant. The interstitial page calls out that the website is suspicious, and that the user should refrain from entering any personal information and passwords unless they know and fully trust the website. Over the last few months of beta, we’ve seen that over two thirds of users don’t proceed to the website after seeing this interstitial – that’s a good thing!

For the users that still want to navigate to the website after seeing the interstitial page, Email Link Isolation uses Cloudflare Browser Isolation to automatically open the link in an isolated browser running in Cloudflare’s closest data center to the user. This delivers an experience virtually indistinguishable from using the local browser, thanks to our Network Vector Rendering (NVR) technology and Cloudflare’s expansive, low-latency network. By opening the suspicious link in an isolated browser, the user is protected against potential browser attacks (including malware, zero days, and other types of malicious code execution).

In a nutshell, the interstitial page is displayed when Email Link Isolation is uncertain about the website, and provides another layer of awareness and protection against phishing attacks. Then, Cloudflare Browser Isolation is used to protect against malicious code execution when a user decides to still proceed to such a website.

What we’ve seen in the beta

As expected, the percentage of rewritten links that users actually click is quite small (single digit percentage). That’s because the majority of such links are not delivered in messages the users are expecting, and aren’t coming from trusted colleagues or partners of theirs. So, even when a user clicks on such a link, they will often see the interstitial page and decide not to proceed any further. We see that less than half of all clicks lead to the user actually visiting the website (in Browser Isolation, to protect against malicious code that could otherwise be executing behind the scenes).

Email Link Isolation: your safety net for the latest phishing attacks
Email Link Isolation: your safety net for the latest phishing attacks

You may be wondering why we’re not seeing a larger amount of clicks on these rewritten links. The answer is quite simply that link Email Link Isolation is indeed that last layer of protection against attack vectors that may have evaded other lines of defense. Virtually all the well crafted phishing attacks that try and trick users into clicking malicious links are already being stopped by the Area 1 email security, and such messages don’t reach users’ inboxes.

The balance is very positive. From all the customers using Email Link Isolation beta in production, some Fortune 500, we received no negative feedback on the user experience. That means that we’re meeting one of the most challenging goals – to provide additional security without negatively affecting users and without adding the burden of tuning/administration to the SOC and IT teams.

One interesting thing we uncover is how valuable our customers are finding our click-time inspection of link shorteners. The fact that a shortened URL (e.g. bit.ly) can be modified at any time to point to a different website has been making some of our customers anxious. Email Link Isolation inspects the link at time-of-click, evaluates the actual website that it’s going to open, and proceeds to open locally, block or present the interstitial page as adequate. We’re now working on full link shortener coverage through Email Link Isolation.

All built on Cloudflare

Cloudflare’s intelligence is driving the decisions of what gets rewritten. We have earlier signals than others.

Email Link Isolation has been built on Cloudflare’s unique capabilities in many areas.

First, because Cloudflare sees enough Internet traffic for us to confidently identify new/low confidence and potentially dangerous domains earlier than anyone else – leveraging the Cloudflare intelligence for this early signal is key to the user experience, to not add speed bumps to legitimate websites that are part of our users’ daily routines. Next, we’re using Cloudflare Workers to process this data and serve the interstitial without introducing frustrating delays to the user. And finally, only Cloudflare Browser Isolation can protect against malicious code with a low-latency experience that is invisible to end users and feels like a local browser.

If you’re not yet a Cloudflare Area 1 customer, start your free trial and phishing risk assessment here.

How Cloudflare Area 1 and DLP work together to protect data in email

Post Syndicated from Ayush Kumar original https://blog.cloudflare.com/dlp-area1-to-protect-data-in-email/

How Cloudflare Area 1 and DLP work together to protect data in email

How Cloudflare Area 1 and DLP work together to protect data in email

Threat prevention is not limited to keeping external actors out, but also keeping sensitive data in. Most organizations do not realize how much confidential information resides within their email inboxes. Employees handle vast amounts of sensitive data on a daily basis, such as intellectual property, internal documentation, PII, or payment information and often share this information internally via email making email one of the largest locations confidential information is stored within a company. It comes as no shock that organizations worry about protecting the accidental or malicious egress of sensitive data and often address these concerns by instituting strong Data Loss Prevention policies. Cloudflare makes it easy for customers to manage the data in their email inboxes with Area 1 Email Security and Cloudflare One.

Cloudflare One, our SASE platform that delivers network-as-a-service (NaaS) with Zero Trust security natively built-in, connects users to enterprise resources, and offers a wide variety of opportunities to secure corporate traffic, including the inspection of data transferred to your corporate email. Area 1 email security, as part of our composable Cloudflare One platform, delivers the most complete data protection for your inbox and offers a cohesive solution when including additional services, such as Data Loss Prevention (DLP). With the ability to easily adopt and implement Zero Trust services as needed, customers have the flexibility to layer on defenses based on their most critical use cases. In the case of Area 1 + DLP, the combination can collectively and preemptively address the most pressing use cases that represent high-risk areas of exposure for organizations. Combining these products provides the in-depth defense of your corporate data.

Preventing egress of cloud email data via HTTPs

Email provides a readily available outlet for corporate data, so why let sensitive data reach email in the first place? An employee can accidentally attach an internal file rather than a public white paper in a customer email, or worse, attach a document with the wrong customers’ information to an email.

With Cloudflare Data Loss Prevention (DLP) you can prevent the upload of sensitive information, such as PII or intellectual property, to your corporate email. DLP is offered as part of Cloudflare One, which runs traffic from data centers, offices, and remote users through the Cloudflare network.  As traffic traverses Cloudflare, we offer protections including validating identity and device posture and filtering corporate traffic.

How Cloudflare Area 1 and DLP work together to protect data in email

Cloudflare One offers HTTP(s) filtering, enabling you to inspect and route the traffic to your corporate applications. Cloudflare Data Loss Prevention (DLP) leverages the HTTP filtering abilities of Cloudflare One. You can apply rules to your corporate traffic and route traffic based on information in an HTTP request. There are a wide variety of options for filtering, such as domain, URL, application, HTTP method, and many more. You can use these options to segment the traffic you wish to DLP scan. All of this is done with the performance of our global network and managed with one control plane.

You can apply DLP policies to corporate email applications, such as Google Suite or O365.  As an employee attempts to upload an attachment to an email, the upload is inspected for sensitive data, and then allowed or blocked according to your policy.

How Cloudflare Area 1 and DLP work together to protect data in email

Inside your corporate email extend more core data protection principles with Area 1 in the following ways:

Enforcing data security between partners

With Cloudflare’s Area 1, you can also enforce strong TLS standards. Having TLS configured adds an extra layer of security as it ensures that emails are encrypted, preventing any attackers from reading sensitive information and changing the message if they intercept the email in transit (on-path-attack). This is especially useful for G Suite customers whose internal emails still go out to the whole Internet in front of prying eyes or for customers who have contractual obligations to communicate with partners with SSL/TLS.

Area 1 makes it easy to enforce SSL/TLS inspections. From the Area 1 portal, you can configure Partner Domain(s) TLS by navigating “Partner Domains TLS” within “Domains & Routing” and adding a partner domain with which you want to enforce TLS. If TLS is required then all emails from that domain with no TLS will be automatically dropped. Our TLS ensures strong TLS rather than the best effort in order to make sure that all traffic is encrypted with strong ciphers preventing a malicious attacker from being able to decrypt any intercepted emails.

How Cloudflare Area 1 and DLP work together to protect data in email

Stopping passive email data loss

Organizations often forget that exfiltration also can be done without ever sending any email. Attackers who are able to compromise a company account are able to passively sit by, monitoring all communications and picking out information manually.

Once an attacker has reached this stage, it is incredibly difficult to know an account is compromised and what information is being tracked. Indicators like email volume, IP address changes, and others do not work since the attacker is not taking any actions that would cause suspicion. At Cloudflare, we have a strong thesis on preventing these account takeovers before they take place, so no attacker is able to fly under the radar.

In order to stop account takeovers before they happen, we place great emphasis on filtering emails that pose a risk for stealing employee credentials. The most common attack vector used by malicious actors are phishing emails. Given its ability to have a high impact in accessing confidential data when successful, it’s no shock that this is the go-to tool in the attackers tool kit. Phishing emails pose little threat to an email inbox protected by Cloudflare’s Area 1 product. Area 1’s models are able to assess if a message is a suspected phishing email by analyzing different metadata. Anomalies detected by the models like domain proximity (how close a domain is to the legitimate one), sentiment of email, or others can quickly determine if an email is legitimate or not. If Area 1 determines an email to be a phishing attempt, we automatically retract the email and prevent the recipient from receiving the email in their inbox ensuring that the employee’s account remains uncompromised and unable to be used to exfiltrate data.

Attackers who are looking to exfiltrate data from an organization also often rely on employees clicking on links sent to them via email. These links can point to online forms which on the surface look innocuous but serve to gather sensitive information. Attackers can use these websites to initiate scripts which gather information about the visitor without any interaction from the employee. This presents a strong concern since an errant click by an employee can lead to the exfiltration of sensitive information. Other malicious links can contain exact copies of websites which the user is accustomed to accessing. However, these links are a form of phishing where the credentials entered by the employee are sent to the attacker rather than logging them into the website.

Area 1 covers this risk by providing Email Link Isolation as part of our email security offering. With email link isolation, Area 1 looks at every link sent and accesses its domain authority. For anything that’s on the margin (a link we cannot confidently say is safe), Area 1 will launch a headless Chromium browser and open the link there with no interruption. This way, any malicious scripts that execute will run on an isolated instance far from the company’s infrastructure, stopping the attacker from getting company information. This is all accomplished instantaneously and reliably.

Stopping Ransomware

Attackers have many tools in their arsenal to try to compromise employee accounts. As we mentioned above, phishing is a common threat vector, but it’s far from the only one. At Area 1, we are also vigilant in preventing the propagation of ransomware.

A common mechanism that attackers use to disseminate ransomware is to disguise attachments by renaming them. A ransomware payload could be renamed from petya.7z to Invoice.pdf in order to try to trick an employee into downloading the file. Depending on how urgent the email made this invoice seem, the employee could blindly try to open the attachment on their computer causing the organization to suffer a ransomware attack. Area 1’s models detect these mismatches and stop malicious ones from arriving into their target’s email inbox.

How Cloudflare Area 1 and DLP work together to protect data in email

A successful ransomware campaign can not only stunt the daily operations of any company, but can also lead to the loss of local data if the encryption is unable to be reversed. Cloudflare’s Area 1 product has dedicated payload models which analyze not only the attachment extensions but also the hashed value of the attachment to compare it to known ransomware campaigns. Once Area 1 finds an attachment deemed to be ransomware, we prohibit the email from going any further.

How Cloudflare Area 1 and DLP work together to protect data in email

Cloudflare’s DLP vision

We aim for Cloudflare products to give you the layered security you need to protect your organization, whether its malicious attempts to get in or sensitive data getting out. As email continues to be the largest surface of corporate data, it is crucial for companies to have strong DLP policies in place to prevent the loss of data. With Area 1 and Cloudflare One working together, we at Cloudflare are able to provide organizations with more confidence about their DLP policies.

If you are interested in these email security or DLP services, contact us for a conversation about your security and data protection needs.

Or if you currently subscribe to Cloudflare services, consider reaching out to your Cloudflare customer success manager to discuss adding additional email security or DLP protection.

Secure CDK deployments with IAM permission boundaries

Post Syndicated from Brian Farnhill original https://aws.amazon.com/blogs/devops/secure-cdk-deployments-with-iam-permission-boundaries/

The AWS Cloud Development Kit (CDK) accelerates cloud development by allowing developers to use common programming languages when modelling their applications. To take advantage of this speed, developers need to operate in an environment where permissions and security controls don’t slow things down, and in a tightly controlled environment this is not always the case. Of particular concern is the scenario where a developer has permission to create AWS Identity and Access Management (IAM) entities (such as users or roles), as these could have permissions beyond that of the developer who created them, allowing for an escalation of privileges. This approach is typically controlled through the use of permission boundaries for IAM entities, and in this post you will learn how these boundaries can now be applied more effectively to CDK development – allowing developers to stay secure and move fast.

Time to read 10 minutes
Learning level Advanced (300)
Services used

AWS Cloud Development Kit (CDK)

AWS Identity and Access Management (IAM)

Applying custom permission boundaries to CDK deployments

When the CDK deploys a solution, it assumes a AWS CloudFormation execution role to perform operations on the user’s behalf. This role is created during the bootstrapping phase by the AWS CDK Command Line Interface (CLI). This role should be configured to represent the maximum set of actions that CloudFormation can perform on the developers behalf, while not compromising any compliance or security goals of the organisation. This can become complicated when developers need to create IAM entities (such as IAM users or roles) and assign permissions to them, as those permissions could be escalated beyond their existing access levels. Taking away the ability to create these entities is one way to solve the problem. However, doing this would be a significant impediment to developers, as they would have to ask an administrator to create them every time. This is made more challenging when you consider that security conscious practices will create individual IAM roles for every individual use case, such as each AWS Lambda Function in a stack. Rather than taking this approach, IAM permission boundaries can help in two ways – first, by ensuring that all actions are within the overlap of the users permissions and the boundary, and second by ensuring that any IAM entities that are created also have the same boundary applied. This blocks the path to privilege escalation without restricting the developer’s ability to create IAM identities. With the latest version of the AWS CLI these boundaries can be applied to the execution role automatically when running the bootstrap command, as well as being added to IAM entities that are created in a CDK stack.

To use a permission boundary in the CDK, first create an IAM policy that will act as the boundary. This should define the maximum set of actions that the CDK application will be able to perform on the developer’s behalf, both during deployment and operation. This step would usually be performed by an administrator who is responsible for the security of the account, ensuring that the appropriate boundaries and controls are enforced. Once created, the name of this policy is provided to the bootstrap command. In the example below, an IAM policy called “developer-policy” is used to demonstrate the command.
cdk bootstrap –custom-permissions-boundary developer-policy
Once this command runs, a new bootstrap stack will be created (or an existing stack will be updated) so that the execution role has this boundary applied to it. Next, you can ensure that any IAM entities that are created will have the same boundaries applied to them. This is done by either using a CDK context variable, or the permissionBoundary attribute on those resources. To explain this in some detail, let’s use a real world scenario and step through an example that shows how this feature can be used to restrict developers from using the AWS Config service.

Installing or upgrading the AWS CDK CLI

Before beginning, ensure that you have the latest version of the AWS CDK CLI tool installed. Follow the instructions in the documentation to complete this. You will need version 2.54.0 or higher to make use of this new feature. To check the version you have installed, run the following command.

cdk --version

Creating the policy

First, let’s begin by creating a new IAM policy. Below is a CloudFormation template that creates a permission policy for use in this example. In this case the AWS CLI can deploy it directly, but this could also be done at scale through a mechanism such as CloudFormation Stack Sets. This template has the following policy statements:

  1. Allow all actions by default – this allows you to deny the specific actions that you choose. You should carefully consider your approach to allow/deny actions when creating your own policies though.
  2. Deny the creation of users or roles unless the “developer-policy” permission boundary is used. Additionally limit the attachment of permissions boundaries on existing entities to only allow “developer-policy” to be used. This prevents the creation or change of an entity that can escalate outside of the policy.
  3. Deny the ability to change the policy itself so that a developer can’t modify the boundary they will operate within.
  4. Deny the ability to remove the boundary from any user or role
  5. Deny any actions against the AWS Config service

Here items 2, 3 and 4 all ensure that the permission boundary works correctly – they are controls that prevent the boundary being removed, tampered with, or bypassed. The real focus of this policy in terms of the example are items 1 and 5 – where you allow everything, except the specific actions that are denied (creating a deny list of actions, rather than an allow list approach).

Resources:
  PermissionsBoundary:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      PolicyDocument:
        Statement:
          # ----- Begin base policy ---------------
          # If permission boundaries do not have an explicit allow
          # then the effect is deny
          - Sid: ExplicitAllowAll
            Action: "*"
            Effect: Allow
            Resource: "*"
          # Default permissions to prevent privilege escalation
          - Sid: DenyAccessIfRequiredPermBoundaryIsNotBeingApplied
            Action:
              - iam:CreateUser
              - iam:CreateRole
              - iam:PutRolePermissionsBoundary
              - iam:PutUserPermissionsBoundary
            Condition:
              StringNotEquals:
                iam:PermissionsBoundary:
                  Fn::Sub: arn:${AWS::Partition}:iam::${AWS::AccountId}:policy/developer-policy
            Effect: Deny
            Resource: "*"
          - Sid: DenyPermBoundaryIAMPolicyAlteration
            Action:
              - iam:CreatePolicyVersion
              - iam:DeletePolicy
              - iam:DeletePolicyVersion
              - iam:SetDefaultPolicyVersion
            Effect: Deny
            Resource:
              Fn::Sub: arn:${AWS::Partition}:iam::${AWS::AccountId}:policy/developer-policy
          - Sid: DenyRemovalOfPermBoundaryFromAnyUserOrRole
            Action: 
              - iam:DeleteUserPermissionsBoundary
              - iam:DeleteRolePermissionsBoundary
            Effect: Deny
            Resource: "*"
          # ----- End base policy ---------------
          # -- Begin Custom Organization Policy --
          - Sid: DenyModifyingOrgCloudTrails
            Effect: Deny
            Action: config:*
            Resource: "*"
          # -- End Custom Organization Policy --
        Version: "2012-10-17"
      Description: "Bootstrap Permission Boundary"
      ManagedPolicyName: developer-policy
      Path: /

Save the above locally as developer-policy.yaml and then you can deploy it with a CloudFormation command in the AWS CLI:

aws cloudformation create-stack --stack-name DeveloperPolicy \
        --template-body file://developer-policy.yaml \
        --capabilities CAPABILITY_NAMED_IAM

Creating a stack to test the policy

To begin, create a new CDK application that you will use to test and observe the behaviour of the permission boundary. Create a new directory with a TypeScript CDK application in it by executing these commands.

mkdir DevUsers && cd DevUsers
cdk init --language typescript

Once this is done, you should also make sure that your account has a CDK bootstrap stack deployed with the cdk bootstrap command – to start with, do not apply a permission boundary to it, you can add that later an observe how it changes the behaviour of your deployment. Because the bootstrap command is not using the --cloudformation-execution-policies argument, it will default to arn:aws:iam::aws:policy/AdministratorAccess which means that CloudFormation will have full access to the account until the boundary is applied.

cdk bootstrap

Once the command has run, create an AWS Config Rule in your application to be sure that this works without issue before the permission boundary is applied. Open the file lib/dev_users-stack.ts and edit its contents to reflect the sample below.


import * as cdk from 'aws-cdk-lib';
import { ManagedRule, ManagedRuleIdentifiers } from 'aws-cdk-lib/aws-config';
import { Construct } from "constructs";

export class DevUsersStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    new ManagedRule(this, 'AccessKeysRotated', {
      configRuleName: 'access-keys-policy',
      identifier: ManagedRuleIdentifiers.ACCESS_KEYS_ROTATED,
      inputParameters: {
        maxAccessKeyAge: 60, // default is 90 days
      },
    });
  }
}

Next you can deploy with the CDK CLI using the cdk deploy command, which will succeed (the output below has been truncated to show a summary of the important elements).

❯ cdk deploy
✨  Synthesis time: 3.05s
✅  DevUsersStack
✨  Deployment time: 23.17s

Stack ARN:
arn:aws:cloudformation:ap-southeast-2:123456789012:stack/DevUsersStack/704a7710-7c11-11ed-b606-06d79634f8d4

✨  Total time: 26.21s

Before you deploy the permission boundary, remove this stack again with the cdk destroy command.

❯ cdk destroy
Are you sure you want to delete: DevUsersStack (y/n)? y
DevUsersStack: destroying... [1/1]
✅ DevUsersStack: destroyed

Using a permission boundary with the CDK test application

Now apply the permission boundary that you created above and observe the impact it has on the same deployment. To update your booststrap with the permission boundary, re-run the cdk bootstrap command with the new custom-permissions-boundary parameter.

cdk bootstrap --custom-permissions-boundary developer-policy

After this command executes, the CloudFormation execution role will be updated to use that policy as a permission boundary, which based on the deny rule for config:* will cause this same application deployment to fail. Run cdk deploy again to confirm this and observe the error message.

❌ Deployment failed: Error: Stack Deployments Failed: Error: The stack
named DevUsersStack failed creation, it may need to be manually deleted 
from the AWS console: 
  ROLLBACK_COMPLETE: 
    User: arn:aws:sts::123456789012:assumed-role/cdk-hnb659fds-cfn-exec-role-123456789012-ap-southeast-2/AWSCloudFormation
    is not authorized to perform: config:PutConfigRule on resource: access-keys-policy with an explicit deny in a
    permissions boundary

This shows you that the action was denied specifically due to the use of a permissions boundary, which is what was expected.

Applying permission boundaries to IAM entities automatically

Next let’s explore how the permission boundary can be extended to IAM entities that are created by a CDK application. The concern here is that a developer who is creating a new IAM entity could assign it more permissions than they have themselves – the permission boundary manages this by ensuring that entities can only be created that also have the boundary attached. You can validate this by modifying the stack to deploy a Lambda function that uses a role that doesn’t include the boundary. Open the file lib/dev_users-stack.ts again and edit its contents to reflect the sample below.

import * as cdk from 'aws-cdk-lib';
import { PolicyStatement } from "aws-cdk-lib/aws-iam";
import {
  AwsCustomResource,
  AwsCustomResourcePolicy,
  PhysicalResourceId,
} from "aws-cdk-lib/custom-resources";
import { Construct } from "constructs";

export class DevUsersStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    new AwsCustomResource(this, "Resource", {
      onUpdate: {
        service: "ConfigService",
        action: "putConfigRule",
        parameters: {
          ConfigRule: {
            ConfigRuleName: "SampleRule",
            Source: {
              Owner: "AWS",
              SourceIdentifier: "ACCESS_KEYS_ROTATED",
            },
            InputParameters: '{"maxAccessKeyAge":"60"}',
          },
        },
        physicalResourceId: PhysicalResourceId.of("SampleConfigRule"),
      },
      policy: AwsCustomResourcePolicy.fromStatements([
        new PolicyStatement({
          actions: ["config:*"],
          resources: ["*"],
        }),
      ]),
    });
  }
}

Here the AwsCustomResource is used to provision a Lambda function that will attempt to create a new config rule. This is the same result as the previous stack but in this case the creation of the rule is done by a new IAM role that is created by the CDK construct for you. Attempting to deploy this will result in a failure – run cdk deploy to observe this.

❌ Deployment failed: Error: Stack Deployments Failed: Error: The stack named 
DevUsersStack failed creation, it may need to be manually deleted from the AWS 
console: 
  ROLLBACK_COMPLETE: 
    API: iam:CreateRole User: arn:aws:sts::123456789012:assumed-
role/cdk-hnb659fds-cfn-exec-role-123456789012-ap-southeast-2/AWSCloudFormation
    is not authorized to perform: iam:CreateRole on resource:
arn:aws:iam::123456789012:role/DevUsersStack-
AWS679f53fac002430cb0da5b7982bd2287S-1EAD7M62914OZ
    with an explicit deny in a permissions boundary

The error message here details that the stack was unable to deploy because the call to iam:CreateRole failed because the boundary wasn’t applied. The CDK now offers a straightforward way to set a default permission boundary on all IAM entities that are created, via the CDK context variable core:permissionsBoundary in the cdk.json file.

{
  "context": {
     "@aws-cdk/core:permissionsBoundary": {
       "name": "developer-policy"
     }
  }
}

This approach is useful because now you can import constructs that create IAM entities (such as those found on Construct Hub or out of the box constructs that create default IAM roles) and have the boundary apply to them as well. There are alternative ways to achieve this, such as setting a boundary on specific roles, which can be used in scenarios where this approach does not fit. Make the change to your cdk.json file and run the CDK deploy again. This time the custom resource will attempt to create the config rule using its IAM role instead of the CloudFormation execution role. It is expected that the boundary will also protect this Lambda function in the same way – run cdk deploy again to confirm this. Note that the deployment updates from CloudFormation show that this time the role creation succeeds this time, and a new error message is generated.

❌ Deployment failed: Error: Stack Deployments Failed: Error: The stack named
DevUsersStack failed creation, it may need to be manually deleted from the AWS 
console:
  ROLLBACK_COMPLETE: 
    Received response status [FAILED] from custom resource. Message returned: User:
    arn:aws:sts::123456789012:assumed-role/DevUsersStack-
AWS679f53fac002430cb0da5b7982bd2287S-84VFVA7OGC9N/DevUsersStack-
AWS679f53fac002430cb0da5b7982bd22872-MBnArBmaaLJp
    is not authorized to perform: config:PutConfigRule on resource: SampleRule with an explicit deny in a permissions boundary

In this error message you can see that the user it refers to is DevUsersStack-AWS679f53fac002430cb0da5b7982bd2287S-84VFVA7OGC9N rather than the CloudFormation execution role. This is the role being used by the custom Lambda function resource, and when it attempts to create the Config rule it is rejected because of the permissions boundary in the same way. Here you can see how the boundary is being applied consistently to all IAM entities that are created in your CDK app, which ensures the administrative controls can be applied consistently to everything a developer does with a minimal amount of overhead.

Cleanup

At this point you can either choose to remove the CDK bootstrap stack if you no longer require it, or remove the permission boundary from the stack. To remove it, delete the CDKToolkit stack from CloudFormation with this AWS CLI command.

aws cloudformation delete-stack --stack-name CDKToolkit

If you want to keep the bootstrap stack, you can remove the boundary by following these steps:

  1. Browse to the CloudFormation page in the AWS console, and select the CDKToolit stack.
  2. Select the ‘Update’ button. Choose “Use Current Template” and then press ‘Next’
  3. On the parameters page, find the value InputPermissionsBoundary which will have developer-policy as the value, and delete the text in this input to leave it blank. Press ‘Next’ and the on the following page, press ‘Next’ again
  4. On the final page, scroll to the bottom and check the box acknowledging that CloudFormation might create IAM resources with custom names, and choose ‘Submit’

With the permission boundary no longer being used, you can now remove the stack that created it as the final step.

aws cloudformation delete-stack --stack-name DeveloperPolicy

Conclusion

Now you can see how IAM permission boundaries can easily be integrated in to CDK development, helping ensure developers have the control they need while administrators can ensure that security is managed in a way that meets the needs of the organisation as well.

With this being understood, there are next steps you can take to further expand on the use of permission boundaries. The CDK Security and Safety Developer Guide document on GitHub outlines these approaches, as well as ways to think about your approach to permissions on deployment. It’s recommended that developers and administrators review this, and work to develop and appropriate approach to permission policies that suit your security goals.

Additionally, the permission boundary concept can be applied in a multi-account model where each Stage has a unique boundary name applied. This can allow for scenarios where a lower-level environment (such as a development or beta environment) has more relaxed permission boundaries that suit troubleshooting and other developer specific actions, but then the higher level environments (such as gamma or production) could have the more restricted permission boundaries to ensure that security risks are more appropriately managed. The mechanism for implement this is defined in the security and safety developer guide also.

About the authors:

Brian Farnhill

Brian Farnhill is a Software Development Engineer at AWS, helping public sector customers in APAC create impactful solutions running in the cloud. His background is in building solutions and helping customers improve DevOps tools and processes. When he isn’t working, you’ll find him either coding for fun or playing online games.

David Turnbull

David Turnbull is a Software Development Engineer at AWS, helping public sector customers in APAC create impactful solutions running in the cloud. He likes to comprehend new programming languages and has used this to stray out of his line. David writes computer simulations for fun.

How to query and visualize Macie sensitive data discovery results with Athena and QuickSight

Post Syndicated from Keith Rozario original https://aws.amazon.com/blogs/security/how-to-query-and-visualize-macie-sensitive-data-discovery-results-with-athena-and-quicksight/

Amazon Macie is a fully managed data security service that uses machine learning and pattern matching to help you discover and protect sensitive data in Amazon Simple Storage Service (Amazon S3). With Macie, you can analyze objects in your S3 buckets to detect occurrences of sensitive data, such as personally identifiable information (PII), financial information, personal health information, and access credentials.

In this post, we walk you through a solution to gain comprehensive and organization-wide visibility into which types of sensitive data are present in your S3 storage, where the data is located, and how much is present. Once enabled, Macie automatically starts discovering sensitive data in your S3 storage and builds a sensitive data profile for each bucket. The profiles are organized in a visual, interactive data map, and you can use the data map to run targeted sensitive data discovery jobs. Both automated data discovery and targeted jobs produce rich, detailed sensitive data discovery results. This solution uses Amazon Athena and Amazon QuickSight to deep-dive on the Macie results, and to help you analyze, visualize, and report on sensitive data discovered by Macie, even when the data is distributed across millions of objects, thousands of S3 buckets, and thousands of AWS accounts. Athena is an interactive query service that makes it simpler to analyze data directly in Amazon S3 using standard SQL. QuickSight is a cloud-scale business intelligence tool that connects to multiple data sources, including Athena databases and tables.

This solution is relevant to data security, data governance, and security operations engineering teams.

The challenge: how to summarize sensitive data discovered in your growing S3 storage

Macie issues findings when an object is found to contain sensitive data. In addition to findings, Macie keeps a record of each S3 object analyzed in a bucket of your choice for long-term storage. These records are known as sensitive data discovery results, and they include additional context about your data in Amazon S3. Due to the large size of the results file, Macie exports the sensitive data discovery results to an S3 bucket, so you need to take additional steps to query and visualize the results. We discuss the differences between findings and results in more detail later in this post.

With the increasing number of data privacy guidelines and compliance mandates, customers need to scale their monitoring to encompass thousands of S3 buckets across their organization. The growing volume of data to assess, and the growing list of findings from discovery jobs, can make it difficult to review and remediate issues in a timely manner. In addition to viewing individual findings for specific objects, customers need a way to comprehensively view, summarize, and monitor sensitive data discovered across their S3 buckets.

To illustrate this point, we ran a Macie sensitive data discovery job on a dataset created by AWS. The dataset contains about 7,500 files that have sensitive information, and Macie generated a finding for each sensitive file analyzed, as shown in Figure 1.

Figure 1: Macie findings from the dataset

Figure 1: Macie findings from the dataset

Your security team could spend days, if not months, analyzing these individual findings manually. Instead, we outline how you can use Athena and QuickSight to query and visualize the Macie sensitive data discovery results to understand your data security posture.

The additional information in the sensitive data discovery results will help you gain comprehensive visibility into your data security posture. With this visibility, you can answer questions such as the following:

  • What are the top 5 most commonly occurring sensitive data types?
  • Which AWS accounts have the most findings?
  • How many S3 buckets are affected by each of the sensitive data types?

Your security team can write their own customized queries to answer questions such as the following:

  • Is there sensitive data in AWS accounts that are used for development purposes?
  • Is sensitive data present in S3 buckets that previously did not contain sensitive information?
  • Was there a change in configuration for S3 buckets containing the greatest amount of sensitive data?

How are findings different from results?

As a Macie job progresses, it produces two key types of output: sensitive data findings (or findings for short), and sensitive data discovery results (or results).

Findings provide a report of potential policy violations with an S3 bucket, or the presence of sensitive data in a specific S3 object. Each finding provides a severity rating, information about the affected resource, and additional details, such as when Macie found the issue. Findings are published to the Macie console, AWS Security Hub, and Amazon EventBridge.

In contrast, results are a collection of records for each S3 object that a Macie job analyzed. These records contain information about objects that do and do not contain sensitive data, including up to 1,000 occurrences of each sensitive data type that Macie found in a given object, and whether Macie was unable to analyze an object because of issues such as permissions settings or use of an unsupported format. If an object contains sensitive data, the results record includes detailed information that isn’t available in the finding for the object.

One of the key benefits of querying results is to uncover gaps in your data protection initiatives—these gaps can occur when data in certain buckets can’t be analyzed because Macie was denied access to those buckets, or was unable to decrypt specific objects. The following table maps some of the key differences between findings and results.

Findings Results
Enabled by default Yes No
Location of published results Macie console, Security Hub, and EventBridge S3 bucket
Details of S3 objects that couldn’t be scanned No Yes
Details of S3 objects in which no sensitive data was found No Yes
Identification of files inside compressed archives that contain sensitive data No Yes
Number of occurrences reported per object Up to 15 Up to 1,000
Retention period 90 days in Macie console Defined by customer

Architecture

As shown in Figure 2, you can build out the solution in three steps:

  1. Enable the results and publish them to an S3 bucket
  2. Build out the Athena table to query the results by using SQL
  3. Visualize the results with QuickSight
Figure 2: Architecture diagram showing the flow of the solution

Figure 2: Architecture diagram showing the flow of the solution

Prerequisites

To implement the solution in this blog post, you must first complete the following prerequisites:

Figure 3: Sample data loaded into three different AWS accounts

Figure 3: Sample data loaded into three different AWS accounts

Note: All data in this blog post has been artificially created by AWS for demonstration purposes and has not been collected from any individual person. Similarly, such data does not, nor is it intended, to relate back to any individual person.

Step 1: Enable the results and publish them to an S3 bucket

Publication of the discovery results to Amazon S3 is not enabled by default. The setup requires that you specify an S3 bucket to store the results (we also refer to this as the discovery results bucket), and use an AWS Key Management Service (AWS KMS) key to encrypt the bucket.

If you are analyzing data across multiple accounts in your organization, then you need to enable the results in your delegated Macie administrator account. You do not need to enable results in individual member accounts. However, if you’re running Macie jobs in a standalone account, then you should enable the Macie results directly in that account.

To enable the results

  1. Open the Macie console.
  2. Select the AWS Region from the upper right of the page.
  3. From the left navigation pane, select Discovery results.
  4. Select Configure now.
  5. Select Create Bucket, and enter a unique bucket name. This will be the discovery results bucket name. Make note of this name because you will use it when you configure the Athena tables later in this post.
  6. Under Encryption settings, select Create new key. This takes you to the AWS KMS console in a new browser tab.
  7. In the AWS KMS console, do the following:
    1. For Key type, choose symmetric, and for Key usage, choose Encrypt and Decrypt.
    2. Enter a meaningful key alias (for example, macie-results-key) and description.
    3. (Optional) For simplicity, set your current user or role as the Key Administrator.
    4. Set your current user/role as a user of this key in the key usage permissions step. This will give you the right permissions to run the Athena queries later.
    5. Review the settings and choose Finish.
  8. Navigate to the browser tab with the Macie console.
  9. From the AWS KMS Key dropdown, select the new key.
  10. To view KMS key policy statements that were automatically generated for your specific key, account, and Region, select View Policy. Copy these statements in their entirety to your clipboard.
  11. Navigate back to the browser tab with the AWS KMS console and then do the following:
    1. Select Customer managed keys.
    2. Choose the KMS key that you created, choose Switch to policy view, and under Key policy, select Edit.
    3. In the key policy, paste the statements that you copied. When you add the statements, do not delete any existing statements and make sure that the syntax is valid. Policies are in JSON format.
  12. Navigate back to the Macie console browser tab.
  13. Review the inputs in the Settings page for Discovery results and then choose Save. Macie will perform a check to make sure that it has the right access to the KMS key, and then it will create a new S3 bucket with the required permissions.
  14. If you haven’t run a Macie discovery job in the last 90 days, you will need to run a new discovery job to publish the results to the bucket.

In this step, you created a new S3 bucket and KMS key that you are using only for Macie. For instructions on how to enable and configure the results using existing resources, see Storing and retaining sensitive data discovery results with Amazon Macie. Make sure to review Macie pricing details before creating and running a sensitive data discovery job.

Step 2: Build out the Athena table to query the results using SQL

Now that you have enabled the discovery results, Macie will begin publishing them into your discovery results bucket in the form of jsonl.gz files. Depending on the amount of data, there could be thousands of individual files, with each file containing multiple records. To identify the top five most commonly occurring sensitive data types in your organization, you would need to query all of these files together.

In this step, you will configure Athena so that it can query the results using SQL syntax. Before you can run an Athena query, you must specify a query result bucket location in Amazon S3. This is different from the Macie discovery results bucket that you created in the previous step.

If you haven’t set up Athena previously, we recommend that you create a separate S3 bucket, and specify a query result location using the Athena console. After you’ve set up the query result location, you can configure Athena.

To create a new Athena database and table for the Macie results

  1. Open the Athena console, and in the query editor, enter the following data definition language (DDL) statement. In the context of SQL, a DDL statement is a syntax for creating and modifying database objects, such as tables. For this example, we named our database macie_results.
    CREATE DATABASE macie_results;
    

    After running this step, you’ll see a new database in the Database dropdown. Make sure that the new macie_results database is selected for the next queries.

    Figure 4: Create database in the Athena console

    Figure 4: Create database in the Athena console

  2. Create a table in the database by using the following DDL statement. Make sure to replace <RESULTS-BUCKET-NAME> with the name of the discovery results bucket that you created previously.
    CREATE EXTERNAL TABLE maciedetail_all_jobs(
    	accountid string,
    	category string,
    	classificationdetails struct<jobArn:string,result:struct<status:struct<code:string,reason:string>,sizeClassified:string,mimeType:string,sensitiveData:array<struct<category:string,totalCount:string,detections:array<struct<type:string,count:string,occurrences:struct<lineRanges:array<struct<start:string,`end`:string,`startColumn`:string>>,pages:array<struct<pageNumber:string>>,records:array<struct<recordIndex:string,jsonPath:string>>,cells:array<struct<row:string,`column`:string,`columnName`:string,cellReference:string>>>>>>>,customDataIdentifiers:struct<totalCount:string,detections:array<struct<arn:string,name:string,count:string,occurrences:struct<lineRanges:array<struct<start:string,`end`:string,`startColumn`:string>>,pages:array<string>,records:array<string>,cells:array<string>>>>>>,detailedResultsLocation:string,jobId:string>,
    	createdat string,
    	description string,
    	id string,
    	partition string,
    	region string,
    	resourcesaffected struct<s3Bucket:struct<arn:string,name:string,createdAt:string,owner:struct<displayName:string,id:string>,tags:array<string>,defaultServerSideEncryption:struct<encryptionType:string,kmsMasterKeyId:string>,publicAccess:struct<permissionConfiguration:struct<bucketLevelPermissions:struct<accessControlList:struct<allowsPublicReadAccess:boolean,allowsPublicWriteAccess:boolean>,bucketPolicy:struct<allowsPublicReadAccess:boolean,allowsPublicWriteAccess:boolean>,blockPublicAccess:struct<ignorePublicAcls:boolean,restrictPublicBuckets:boolean,blockPublicAcls:boolean,blockPublicPolicy:boolean>>,accountLevelPermissions:struct<blockPublicAccess:struct<ignorePublicAcls:boolean,restrictPublicBuckets:boolean,blockPublicAcls:boolean,blockPublicPolicy:boolean>>>,effectivePermission:string>>,s3Object:struct<bucketArn:string,key:string,path:string,extension:string,lastModified:string,eTag:string,serverSideEncryption:struct<encryptionType:string,kmsMasterKeyId:string>,size:string,storageClass:string,tags:array<string>,embeddedFileDetails:struct<filePath:string,fileExtension:string,fileSize:string,fileLastModified:string>,publicAccess:boolean>>,
    	schemaversion string,
    	severity struct<description:string,score:int>,
    	title string,
    	type string,
    	updatedat string)
    ROW FORMAT SERDE
    	'org.openx.data.jsonserde.JsonSerDe'
    WITH SERDEPROPERTIES (
    	'paths'='accountId,category,classificationDetails,createdAt,description,id,partition,region,resourcesAffected,schemaVersion,severity,title,type,updatedAt')
    STORED AS INPUTFORMAT
    	'org.apache.hadoop.mapred.TextInputFormat'
    OUTPUTFORMAT
    	'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION
    	's3://<RESULTS-BUCKET-NAME>/AWSLogs/'
    

    After you complete this step, you will see a new table named maciedetail_all_jobs in the Tables section of the query editor.

  3. Query the results to start gaining insights. For example, to identify the top five most common sensitive data types, run the following query:
    select sensitive_data.category,
    	detections_data.type,
    	sum(cast(detections_data.count as INT)) total_detections
    from maciedetail_all_jobs,
    	unnest(classificationdetails.result.sensitiveData) as t(sensitive_data),
    	unnest(sensitive_data.detections) as t(detections_data)
    where classificationdetails.result.sensitiveData is not null
    and resourcesaffected.s3object.embeddedfiledetails is null
    group by sensitive_data.category, detections_data.type
    order by total_detections desc
    LIMIT 5
    

    Running this query on the sample dataset gives the following output.

    Results of a query showing the five most common sensitive data types in the dataset

    Figure 5: Results of a query showing the five most common sensitive data types in the dataset

  4. (Optional) The previous query ran on all of the results available for Macie. You can further query which accounts have the greatest amount of sensitive data detected.
    select accountid,
    	sum(cast(detections_data.count as INT)) total_detections
    from maciedetail_all_jobs,
    	unnest(classificationdetails.result.sensitiveData) as t(sensitive_data),
    	unnest(sensitive_data.detections) as t(detections_data)
    where classificationdetails.result.sensitiveData is not null
    and resourcesaffected.s3object.embeddedfiledetails is null
    group by accountid
    order by total_detections desc
    

    To test this query, we distributed the synthetic dataset across three member accounts in our organization, ran the query, and received the following output. If you enable Macie in just a single account, then you will only receive results for that one account.

    Figure 6: Query results for total number of sensitive data detections across all accounts in an organization

    Figure 6: Query results for total number of sensitive data detections across all accounts in an organization

For a list of more example queries, see the amazon-macie-results-analytics GitHub repository.

Step 3: Visualize the results with QuickSight

In the previous step, you used Athena to query your Macie discovery results. Although the queries were powerful, they only produced tabular data as their output. In this step, you will use QuickSight to visualize the results of your Macie jobs.

Before creating the visualizations, you first need to grant QuickSight the right permissions to access Athena, the results bucket, and the KMS key that you used to encrypt the results.

To allow QuickSight access to the KMS key

  1. Open the AWS Identity and Access Management (IAM) console, and then do the following:
    1. In the navigation pane, choose Roles.
    2. In the search pane for roles, search for aws-quicksight-s3-consumers-role-v0. If this role does not exist, search for aws-quicksight-service-role-v0.
    3. Select the role and copy the role ARN. You will need this role ARN to modify the KMS key policy to grant permissions for this role.
  2. Open the AWS KMS console and then do the following:
    1. Select Customer managed keys.
    2. Choose the KMS key that you created.
    3. Paste the following statement in the key policy. When you add the statement, do not delete any existing statements, and make sure that the syntax is valid. Replace <QUICKSIGHT_SERVICE_ROLE_ARN> and <KMS_KEY_ARN> with your own information. Policies are in JSON format.
	{ "Sid": "Allow Quicksight Service Role to use the key",
		"Effect": "Allow",
		"Principal": {
			"AWS": <QUICKSIGHT_SERVICE_ROLE_ARN>
		},
		"Action": "kms:Decrypt",
		"Resource": <KMS_KEY_ARN>
	}

To allow QuickSight access to Athena and the discovery results S3 bucket

  1. In QuickSight, in the upper right, choose your user icon to open the profile menu, and choose US East (N.Virginia). You can only modify permissions in this Region.
  2. In the upper right, open the profile menu again, and select Manage QuickSight.
  3. Select Security & permissions.
  4. Under QuickSight access to AWS services, choose Manage.
  5. Make sure that the S3 checkbox is selected, click on Select S3 buckets, and then do the following:
    1. Choose the discovery results bucket.
    2. You do not need to check the box under Write permissions for Athena workgroup. The write permissions are not required for this post.
    3. Select Finish.
  6. Make sure that the Amazon Athena checkbox is selected.
  7. Review the selections and be careful that you don’t inadvertently disable AWS services and resources that other users might be using.
  8. Select Save.
  9. In QuickSight, in the upper right, open the profile menu, and choose the Region where your results bucket is located.

Now that you’ve granted QuickSight the right permissions, you can begin creating visualizations.

To create a new dataset referencing the Athena table

  1. On the QuickSight start page, choose Datasets.
  2. On the Datasets page, choose New dataset.
  3. From the list of data sources, select Athena.
  4. Enter a meaningful name for the data source (for example, macie_datasource) and choose Create data source.
  5. Select the database that you created in Athena (for example, macie_results).
  6. Select the table that you created in Athena (for example, maciedetail_all_jobs), and choose Select.
  7. You can either import the data into SPICE or query the data directly. We recommend that you use SPICE for improved performance, but the visualizations will still work if you query the data directly.
  8. To create an analysis using the data as-is, choose Visualize.

You can then visualize the Macie results in the QuickSight console. The following example shows a delegated Macie administrator account that is running a visualization, with account IDs on the y axis and the count of affected resources on the x axis.

Figure 7: Visualize query results to identify total number of sensitive data detections across accounts in an organization

Figure 7: Visualize query results to identify total number of sensitive data detections across accounts in an organization

You can also visualize the aggregated data in QuickSight. For example, you can view the number of findings for each sensitive data category in each S3 bucket. The Athena table doesn’t provide aggregated data necessary for visualization. Instead, you need to query the table and then visualize the output of the query.

To query the table and visualize the output in QuickSight

  1. On the Amazon QuickSight start page, choose Datasets.
  2. On the Datasets page, choose New dataset.
  3. Select the data source that you created in Athena (for example, macie_datasource) and then choose Create Dataset.
  4. Select the database that you created in Athena (for example, macie_results).
  5. Choose Use Custom SQL, enter the following query below, and choose Confirm Query.
    	select resourcesaffected.s3bucket.name as bucket_name,
    		sensitive_data.category,
    		detections_data.type,
    		sum(cast(detections_data.count as INT)) total_detections
    	from macie_results.maciedetail_all_jobs,
    		unnest(classificationdetails.result.sensitiveData) as t(sensitive_data),unnest(sensitive_data.detections) as t(detections_data)
    where classificationdetails.result.sensitiveData is not null
    and resourcesaffected.s3object.embeddedfiledetails is null
    group by resourcesaffected.s3bucket.name, sensitive_data.category, detections_data.type
    order by total_detections desc
    	

  6. You can either import the data into SPICE or query the data directly.
  7. To create an analysis using the data as-is, choose Visualize.

Now you can visualize the output of the query that aggregates data across your S3 buckets. For example, we used the name of the S3 bucket to group the results, and then we created a donut chart of the output, as shown in Figure 6.

Figure 8: Visualize query results for total number of sensitive data detections across each S3 bucket in an organization

Figure 8: Visualize query results for total number of sensitive data detections across each S3 bucket in an organization

From the visualizations, we can identify which buckets or accounts in our organizations contain the most sensitive data, for further action. Visualizations can also act as a dashboard to track remediation.

If you encounter permissions issues, see Insufficient permissions when using Athena with Amazon QuickSight and Troubleshooting key access for troubleshooting steps.

You can replicate the preceding steps by using the sample queries from the amazon-macie-results-analytics GitHub repo to view data that is aggregated across S3 buckets, AWS accounts, or individual Macie jobs. Using these queries with the results of your Macie results will help you get started with tracking the security posture of your data in Amazon S3.

Conclusion

In this post, you learned how to enable sensitive data discovery results for Macie, query those results with Athena, and visualize the results in QuickSight.

Because Macie sensitive data discovery results provide more granular data than the findings, you can pursue a more comprehensive incident response when sensitive data is discovered. The sample queries in this post provide answers to some generic questions that you might have. After you become familiar with the structure, you can run other interesting queries on the data.

We hope that you can use this solution to write your own queries to gain further insights into sensitive data discovered in S3 buckets, according to the business needs and regulatory requirements of your organization. You can consider using this solution to better understand and identify data security risks that need immediate attention. For example, you can use this solution to answer questions such as the following:

  • Is financial information present in an AWS account where it shouldn’t be?
  • Are S3 buckets that contain PII properly hardened with access controls and encryption?

You can also use this solution to understand gaps in your data security initiatives by tracking files that Macie couldn’t analyze due to encryption or permission issues. To further expand your knowledge of Macie capabilities and features, see the following resources:

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on Amazon Macie re:Post.

Want more AWS Security news? Follow us on Twitter.

Author

Keith Rozario

Keith is a Sr. Solution Architect at Amazon Web Services based in Singapore, where he helps customers develop solutions for their most complex business problems. He loves road cycling, reading comics from DC, and enjoying the sweet sound of music from AC/DC.

Author

Scott Ward

Scott is a Principal Solutions Architect with AWS External Security Services (ESS) and has been with Amazon for over 20 years. Scott provides technical guidance to the ESS services, such as GuardDuty, Security Hub, Macie, Inspector and Detective, and helps customers make their applications secure. Scott has a deep background in supporting, enhancing, and building global financial solutions to meet the needs of large companies, including many years of supporting the global financial systems for Amazon.com.

Author

Koulick Ghosh

Koulick is a Senior Product Manager in AWS Security based in Seattle, WA. He loves speaking with customers on how AWS Security services can help make them more secure. In his free-time, he enjoys playing the guitar, reading, and exploring the Pacific Northwest.

Amazon S3 Encrypts New Objects By Default

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/amazon-s3-encrypts-new-objects-by-default/

At AWS, security is job zero. Starting today, Amazon Simple Storage Service (Amazon S3) encrypts all new objects by default. Now, S3 automatically applies server-side encryption (SSE-S3) for each new object, unless you specify a different encryption option. SSE-S3 was first launched in 2011. As Jeff wrote at the time: “Amazon S3 server-side encryption handles all encryption, decryption, and key management in a totally transparent fashion. When you PUT an object, we generate a unique key, encrypt your data with the key, and then encrypt the key with a [root] key.”

This change puts another security best practice into effect automatically—with no impact on performance and no action required on your side. S3 buckets that do not use default encryption will now automatically apply SSE-S3 as the default setting. Existing buckets currently using S3 default encryption will not change.

As always, you can choose to encrypt your objects using one of the three encryption options we provide: S3 default encryption (SSE-S3, the new default), customer-provided encryption keys (SSE-C), or AWS Key Management Service keys (SSE-KMS). To have an additional layer of encryption, you might also encrypt objects on the client side, using client libraries such as the Amazon S3 encryption client.

While it was simple to enable, the opt-in nature of SSE-S3 meant that you had to be certain that it was always configured on new buckets and verify that it remained configured properly over time. For organizations that require all their objects to remain encrypted at rest with SSE-S3, this update helps meet their encryption compliance requirements without any additional tools or client configuration changes.

With today’s announcement, we have now made it “zero click” for you to apply this base level of encryption on every S3 bucket.

Verify Your Objects Are Encrypted
The change is visible today in AWS CloudTrail data event logs. You will see the changes in the S3 section of the AWS Management Console, Amazon S3 Inventory, Amazon S3 Storage Lens, and as an additional header in the AWS CLI and in the AWS SDKs over the next few weeks. We will update this blog post and documentation when the encryption status is available in these tools in all AWS Regions.

To verify the change is effective on your buckets today, you can configure CloudTrail to log data events. By default, trails do not log data events, and there is an extra cost to enable it. Data events show the resource operations performed on or within a resource, such as when a user uploads a file to an S3 bucket. You can log data events for Amazon S3 buckets, AWS Lambda functions, Amazon DynamoDB tables, or a combination of those.

Once enabled, search for PutObject API for file uploads or InitiateMultipartUpload for multipart uploads. When Amazon S3 automatically encrypts an object using the default encryption settings, the log includes the following field as the name-value pair: "SSEApplied":"Default_SSE_S3". Here is an example of a CloudTrail log (with data event logging enabled) when I uploaded a file to one of my buckets using the AWS CLI command aws s3 cp backup.sh s3://private-sst.

Cloudtrail log for S3 with default encryption enabled

Amazon S3 Encryption Options
As I wrote earlier, SSE-S3 is now the new base level of encryption when no other encryption-type is specified. SSE-S3 uses Advanced Encryption Standard (AES) encryption with 256-bit keys managed by AWS.

You can choose to encrypt your objects using SSE-C or SSE-KMS rather than with SSE-S3, either as “one click” default encryption settings on the bucket, or for individual objects in PUT requests.

SSE-C lets Amazon S3 perform the encryption and decryption of your objects while you retain control of the keys used to encrypt objects. With SSE-C, you don’t need to implement or use a client-side library to perform the encryption and decryption of objects you store in Amazon S3, but you do need to manage the keys that you send to Amazon S3 to encrypt and decrypt objects.

With SSE-KMS, AWS Key Management Service (AWS KMS) manages your encryption keys. Using AWS KMS to manage your keys provides several additional benefits. With AWS KMS, there are separate permissions for the use of the KMS key, providing an additional layer of control as well as protection against unauthorized access to your objects stored in Amazon S3. AWS KMS provides an audit trail so you can see who used your key to access which object and when, as well as view failed attempts to access data from users without permission to decrypt the data.

When using an encryption client library, such as the Amazon S3 encryption client, you retain control of the keys and complete the encryption and decryption of objects client-side using an encryption library of your choice. You encrypt the objects before they are sent to Amazon S3 for storage. The Java, .Net, Ruby, PHP, Go, and C++ AWS SDKs support client-side encryption.

You can follow the instructions in this blog post if you want to retroactively encrypt existing objects in your buckets.

Available Now
This change is effective now, in all AWS Regions, including on AWS GovCloud (US) and AWS China Regions. There is no additional cost for default object-level encryption.

— seb

Graph service platform

Post Syndicated from Grab Tech original https://engineering.grab.com/graph-service-platform

Introduction

In earlier articles of this series, we covered the importance of graph networks, graph concepts, how graph visualisation makes fraud investigations easier and more effective, and how graphs for fraud detection work. In this article, we elaborate on the need for a graph service platform and how it works.

In the present age, data linkages can generate significant business value. Whether we want to learn about the relationships between users in online social networks, between users and products in e-commerce, or understand credit relationships in financial networks, the capability to understand and analyse large amounts of highly interrelated data is becoming more important to businesses.

As the amount of consumer data grows, the GrabDefence team must continuously enhance fraud detection on mobile devices to proactively identify the presence of fraudulent or malicious users. Even simple financial transactions between users must be monitored for transaction loops and money laundering. To preemptively detect such scenarios, we need a graph service platform to help discover data linkages. 

Background

As mentioned in an earlier article, a graph is a model representation of the association of entities and holds knowledge in a structured way by marginalising entities and relationships. In other words, graphs hold a natural interpretability of linked data and graph technology plays an important role. Since the early days, large tech companies started to create their own graph technology infrastructure, which is used for things like social relationship mining, web search, and sorting and recommendation systems with great commercial success.

As graph technology was developed, the amount of data gathered from graphs started to grow as well, leading to a need for graph databases. Graph databases1 are used to store, manipulate, and access graph data on the basis of graph models. It is similar to the relational database with the feature of Online Transactional Processing (OLTP), which supports transactions, persistence, and other features.

A key concept of graphs is the edge or relationship between entities. The graph relates the data items in the store to a collection of nodes and edges, the edges representing the relationships between the nodes. These relationships allow data in the store to be linked directly and retrieved with one operation.

With graph databases, relationships between data can be queried fast as they are perpetually stored in the database. Additionally, relationships can be intuitively visualised using graph databases, making them useful for heavily interconnected data. To have real-time graph search capabilities, we must leverage the graph service platform and graph databases.

Architecture details

Graph services with graph databases are Platforms as a Service (PaaS) that encapsulate the underlying implementation of graph technology and support easier discovery of data association relationships with graph technologies.

They also provide universal graph operation APIs and service management for users. This means that users do not need to build graph runtime environments independently and can explore the value of data with graph service directly.

Fig. 1 Graph service platform system architecture

As shown in Fig. 1, the system can be divided into four layers:

  1. Storage backend – Different forms of data (for example, CSV files) are stored in Amazon S3, graph data stores in Neptune and meta configuration stores in DynamoDB.
  2. Driver – Contains drivers such as Gremlin, Neptune, S3, and DynamoDB.
  3. Service – Manages clusters, instances, databases etc, provides management API, includes schema and data load management, graph operation logic, and other graph algorithms.
  4. RESTful APIs – Currently supports the standard and uniform formats provided by the system, the Management API, Search API for OLTP, and Analysis API for online analytical processing (OLAP).

How it works

Graph flow

Fig. 2 Graph flow

CSV files stored in Amazon S3 are processed by extract, transform, and load (ETL) tools to generate graph data. This data is then managed by an Amazon Neptune DB cluster, which can only be accessed by users through graph service. Graph service converts user requests into asynchronous interactions with Neptune Cluster, which returns the results to users.

When users launch data load tasks, graph service synchronises the entity and attribute information with the CSV file in S3, and the schema stored in DynamoDB. The data is only imported into Neptune if there are no inconsistencies.

The most important component in the system is the graph service, which provides RESTful APIs for two scenarios: graph search for real-time streams and graph analysis for batch processing. At the same time, the graph service manages clusters, databases, instances, users, tasks, and meta configurations stored in DynamoDB, which implements features of service monitor and data loading offline or stream ingress online.

Use case in fraud detection

In Grab’s mobility business, we have come across situations where multiple accounts use shared physical devices to maximise their earning potential. With the graph capabilities provided by the graph service platform, we can clearly see the connections between multiple accounts and shared devices.

Historical device and account data are stored in the graph service platform via offline data loading or online stream injection. If the device and account data exists in the graph service platform, we can find the adjacent account IDs or the shared device IDs by using the device ID or account ID respectively specified in the user request.

In our experience, fraudsters tend to share physical resources to maximise their revenue. The following image shows a device that is shared by many users. With our Graph Visualisation platform based on graph service, you can see exactly what this pattern looks like.

Fig 3. Example of a device being shared with many users

Data injection

Fig. 4 Data injection

Graph service also supports data injection features, including data load by request (task with a type of data load) and real-time stream write by Kafka.  

When connected to GrabDefence’s infrastructure, Confluent with Kafka is used as the streaming engine.  The purpose of using Kafka as a streaming write engine is two-fold: to provide primary user authentication and to relieve the pressure on Neptune.

Impact

Graph service supports data management of Labelled Property Graphs and provides the capability to add, delete, update, and get vertices, edges, and properties for some graph models. Graph traversal and searching relationships with RESTful APIs are also more convenient with graph service.

Businesses usually do not need to focus on the underlying data storage, just designing graph schemas for model definition according to their needs. With the graph service platform, platforms or systems can be built for personalised search, intelligent Q&A, financial fraud, etc.

For big organisations, extensive graph algorithms provide the power to mine various entity connectivity relationships in massive amounts of data. The growth and expansion of new businesses is driven by discovering the value of data.

What’s next?

Fig. 5 Graph-centric ecosystems

We are building an integrated graph ecosystem inside and outside Grab. The infrastructure and service, or APIs are key components in graph-centric ecosystems; they provide graph arithmetic and basic capabilities of graphs in relation to search, computing, analysis etc. Besides that, we will also consider incorporating applications such as risk prediction and fraud detection in order to serve our current business needs.

Join us

Grab is the leading superapp platform in Southeast Asia, providing everyday services that matter to consumers. More than just a ride-hailing and food delivery app, Grab offers a wide range of on-demand services in the region, including mobility, food, package and grocery delivery services, mobile payments, and financial services across 428 cities in eight countries.

Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. If this mission speaks to you, join our team today!

References

How Cloudflare can help stop malware before it reaches your app

Post Syndicated from Michael Tremante original https://blog.cloudflare.com/waf-content-scanning/

How Cloudflare can help stop malware before it reaches your app

How Cloudflare can help stop malware before it reaches your app

Let’s assume you manage a job advert site. On a daily basis job-seekers will be uploading their CVs, cover letters and other supplementary documents to your servers. What if someone tried to upload malware instead?

Today we’re making your security team job easier by providing a file content scanning engine integrated with our Web Application Firewall (WAF), so that malicious files being uploaded by end users get blocked before they reach application servers.

Enter WAF Content Scanning.

If you are an enterprise customer, reach out to your account team to get access.

Making content scanning easy

At Cloudflare, we pride ourselves on making our products very easy to use. WAF Content Scanning was built with that goal in mind. The main requirement to use the Cloudflare WAF is that application traffic is proxying via the Cloudflare network. Once that is done, turning on Content Scanning requires a single API call.

Once on, the WAF will automatically detect any content being uploaded, and when found, scan it and provide the results for you to use when writing WAF Custom Rules or reviewing security analytics dashboards.

The entire process runs inline with your HTTP traffic and requires no change to your application.

As of today, we scan files up to 1 MB. You can easily block files that exceed this size or perform other actions such as log the upload.

To block a malicious file, you could write a simple WAF Custom Rule like the following:

if: (cf.waf.content_scan.has_malicious_obj)

then: BLOCK

In the dashboard the rule would look like this:

How Cloudflare can help stop malware before it reaches your app

Many other use cases can be achieved by leveraging the metadata exposed by the WAF Content Scanning engine. For example, let’s say you only wanted to allow PDF files to be uploaded on a given endpoint. You would achieve this by deploying the following WAF Custom Rule:

if: any(cf.waf.content_scan.obj_types[*] != "application/pdf") and http.request.uri.path eq "/upload"

then: BLOCK

This rule will, for any content file being uploaded to the /upload endpoint, block the HTTP request if at least one file is not a PDF.

More generally, let’s assume your application does not expect content to be uploaded at all. In this case, you can block any upload attempts with:

if: (cf.waf.content_scan.has_obj)

then: BLOCK

Another very common use case is supporting file upload endpoints that accept JSON content. In this instance files are normally embedded into a JSON payload after being base64-encoded. If your application has such an endpoint, you can provide additional metadata to the scanning engine to recognise the traffic by submitting a custom scan expression. Once submitted, files within JSON payloads will be parsed, decoded, and scanned automatically. In the event you want to issue a block action, you can use a JSON custom response type so that your web application front end can easily parse and display error messages:

How Cloudflare can help stop malware before it reaches your app

The full list of fields exposed by the WAF Content Scanning engine can be found on our developer documentation.

The engine

Scanned content objects

A lot of time designing the system was spent defining what should be scanned. Defining this properly helps us ensure that we are not scanning unnecessary content reducing latency impact and CPU usage, and that there are no bypasses making the system complementary to existing WAF functionality.

The complexity stems from the fact that there is no clear definition of a “file” in HTTP terms. That’s why in this blog post and in the system design, we refer to “content object” instead.

At a high level, although this can loosely be defined as a “file”, not all “content objects” may end up being stored in the file system of an application server! Therefore, we need a definition that applies to HTTP. Additional complexity is given by the fact this is a security product, and attackers will always try to abuse HTTP to obfuscate/hide true intentions. So for example, although a Content-Type header may indicate that the request body is a jpeg image, it may actually be a pdf.

With the above in mind, a “content object” as of today, is any request payload that is detected by heuristics (so no referring to the Content-Type header) to be anything that is not text/html, text/x-shellscript, application/json or text/xml. All other content types are considered a content object.

Detecting via heuristics the content type of an HTTP request body is not enough, as content objects might be found within portions of the HTTP body or encoded following certain rules, such as when using multipart/form-data, which is the most common encoding used when creating standard HTML file input forms.

So when certain payload formats are found, additional parsing applies. As of today the engine will automatically parse and perform content type heuristics on individual components of the payload, when the payload is either encoded using multipart/form-data or multipart/mixed or a JSON string that may have “content objects” embedded in base64 format as defined by the customer

In these cases, we don’t scan the entire payload as a single content object, but we parse it following the relevant standard and apply scanning, if necessary, to the individual portions of the payload. That allows us to support scanning of more than one content object per request, such as an HTML form that has multiple file inputs. We plan to add additional automatic detections in the future on complex payloads moving forward.

In the event we end up finding a malicious match, but we were not able to detect the content type correctly, we will default to reporting a content type of application/octet-stream in the Cloudflare logs/dashboards.

Finally, it is worth noting that we explicitly avoid scanning anything that is plain text (HTML, JSON, XML etc.) as finding attack vectors in these payloads is already covered by the WAF, API Gateway and other web application security solutions already present in Cloudflare’s portfolio.

Local scans

At Cloudflare, we try to leverage our horizontal architecture to build scalable software. This means the underlying scanner is deployed on every server that handles customer HTTP/S traffic. The diagram below describes the setup:

How Cloudflare can help stop malware before it reaches your app

Having each server perform the scanning locally helps ensure latency impact is reduced to a minimum to applicable HTTP requests. The actual scanning engine is the same one used by the Cloudflare Web Gateway, our forward proxy solution that among many other things, helps keep end user devices safe by blocking attempts to download malware.

Consequently, the scanning capabilities provided match those exposed by the Web Gateway AV scanning. The main difference as of today, is the maximum file size currently limited at 1 MB versus 15 MB in Web Gateway. We are working on increasing this to match the Web Gateway in the coming months.

Separating detection from mitigation

A new approach that we are adopting within our application security portfolio is the separation of detection from mitigation. The WAF Content Scanning features follow this approach, as once turned on, it simply enhances all available data and fields with scan results. The benefits here are twofold.

First, this allows us to provide visibility into your application traffic, without you having to deploy any mitigation. This automatically opens up a great use case: discovery. For large enterprise applications security teams may not be aware of which paths or endpoints might be expecting file uploads from the Internet. Using our WAF Content Scanning feature in conjunction with our new Security Analytics they can now filter on request traffic that has a file content object (a file being uploaded) to observe top N paths and hostnames, exposing such endpoints.

How Cloudflare can help stop malware before it reaches your app

Second, as mentioned in the prior section, exposing the intelligence provided by Cloudflare as fields that can be used in our WAF Custom Rules allows us to provide a very flexible platform. As a plus, you don’t need to learn how to use a new feature, as you are likely already familiar with our WAF Custom Rule builder.

This is not a novel idea, and our Bot Management solution was the first to trial it with great success. Any customer who uses Bot Management today gains access to a bot score field that indicates the likelihood of a request coming from automation or a human. Customers use this field to deploy rules that block bots.

To that point, let’s assume you run a job applications site, and you do not wish for bots and crawlers to automatically submit job applications. You can now block file uploads coming from bots!

if: (cf.bot_management.score lt 10 and cf.waf.content_scan.has_obj)

then: BLOCK

And that’s the power we wish to provide at your fingertips.

Next steps

Our WAF Content Scanning is a new feature, and we have several improvements planned, including increasing the max content size scanned, exposing the “rewrite” action, so you can send malicious files to a quarantine server, and exposing better analytics that allow you to explore the data more easily without deploying rules. Stay tuned!

A Security Issue in Android That Remains Unfixed – Pull-down Menu On Lock Screen

Post Syndicated from Bozho original https://techblog.bozho.net/a-security-issue-in-android-that-remains-unfixed-pull-down-menu-on-lock-screen/

Having your phone lying around when your kids are playing with everything they find is a great security test. They immediately discover new features and ways to go beyond the usual flow.

This is the way I recently discovered a security issue with Android. Apparently, even if the phone is locked, the pull-down menu with quick settings works. Also, volume control works. Not every functionality inside the quick settings menu works fully while unlocked, but you can disable mobile data and Wi-Fi, you can turn on your hotspot, you can switch to Airplane mode.

While this has been pointed out on Google Pixel forums, on reddit and Stack Exchange, it has not been fixed in stock Android. Different manufacturers seem to have acknowledged the issue in their custom ROMs, but that’s not a reliable long-term solution.

Let me explain why this is an issue. First, it breaks the assumption that when the phone is locked nothing works. Breaking user assumptions is bad by itself.

Second, it allows criminals to steal your phone and put in in Airplane mode, thus disabling any ability to track the phone – either through “find my phone” services, or by the police through mobile carriers. They can silence the phone, so that it’s not found with “ring my phone” functionality. It’s true that an attacker can just take out the SIM card, but having the Wi-Fi on still allows tracking using wifi networks through which the phone passes.

Third, the hotspot (similar issues go with Bluetooth). Allowing a connection can be used to attack the device. It’s not trivial, but it’s not impossible either. It can also be used to do all sorts of network attacks on other devices connected to the hotspot (e.g. you enable the hotspot, a laptop connects automatically, and you execute an APR poisoning attack). The hotspot also allows attackers to use a device to commit online crimes and frame the owner. Especially if they do not steal the phone, but leave it lying where it originally was, just with the hotspot turned on. Of course, they would need to get the password for the hotspot, but this can be obtained through social engineering.

The interesting thing is that when you use Google’s Family Link to lock a device that’s given to a child, the pull-down menu doesn’t work. So the basic idea that “once locked, nothing should be accessible” is there, it’s just not implemented in the default use-case.

While the things described above are indeed edge-cases and may be far fetched, I think they should be fixed. The more functionality is available on a locked phone, the more attack surface it has (including for the exploitation of 0days).

The post A Security Issue in Android That Remains Unfixed – Pull-down Menu On Lock Screen appeared first on Bozho's tech blog.

AWS CIRT announces the release of five publicly available workshops

Post Syndicated from Steve de Vera original https://aws.amazon.com/blogs/security/aws-cirt-announces-the-release-of-five-publicly-available-workshops/

Greetings from the AWS Customer Incident Response Team (CIRT)! AWS CIRT is dedicated to supporting customers during active security events on the customer side of the AWS Shared Responsibility Model.

Over the past year, AWS CIRT has responded to hundreds of such security events, including the unauthorized use of AWS Identity and Access Management (IAM) credentials, ransomware and data deletion in an AWS account, and billing increases due to the creation of unauthorized resources to mine cryptocurrency.

We are excited to release five workshops that simulate these security events to help you learn the tools and procedures that AWS CIRT uses on a daily basis to detect, investigate, and respond to such security events. The workshops cover AWS services and tools, such as Amazon GuardDuty, Amazon CloudTrail, Amazon CloudWatch, Amazon Athena, and AWS WAF, as well as some open source tools written and published by AWS CIRT.

To access the workshops, you just need an AWS account, an internet connection, and the desire to learn more about incident response in the AWS Cloud! Choose the following links to access the workshops.

Unauthorized IAM Credential Use – Security Event Simulation and Detection

During this workshop, you will simulate the unauthorized use of IAM credentials by using a script invoked within AWS CloudShell. The script will perform reconnaissance and privilege escalation activities that have been commonly seen by AWS CIRT and that are typically performed during similar events of this nature. You will also learn some tools and processes that AWS CIRT uses, and how to use these tools to find evidence of unauthorized activity by using IAM credentials.

Ransomware on S3 – Security Event Simulation and Detection

During this workshop, you will use an AWS CloudFormation template to replicate an environment with multiple IAM users and five Amazon Simple Storage Service (Amazon S3) buckets. AWS CloudShell will then run a bash script that simulates data exfiltration and data deletion events that replicate a ransomware-based security event. You will also learn the tools and processes that AWS CIRT uses to respond to similar events, and how to use these tools to find evidence of unauthorized S3 bucket and object deletions.

Cryptominer Based Security Events – Simulation and Detection

During this workshop, you will simulate a cryptomining security event by using a CloudFormation template to initialize three Amazon Elastic Compute Cloud (Amazon EC2) instances. These EC2 instances will mimic cryptomining activity by performing DNS requests to known cryptomining domains. You will also learn the tools and processes that AWS CIRT uses to respond to similar events, and how to use these tools to find evidence of unauthorized creation of EC2 instances and communication with known cryptomining domains.

SSRF on IMDSv1 – Simulation and Detection

During this workshop, you will simulate the unauthorized use of a web application that is hosted on an EC2 instance configured to use Instance Metadata Service Version 1 (IMDSv1) and vulnerable to server side request forgery (SSRF). You will learn how web application vulnerabilities, such as SSRF, can be used to obtain credentials from an EC2 instance. You will also learn the tools and processes that AWS CIRT uses to respond to this type of access, and how to use these tools to find evidence of the unauthorized use of EC2 instance credentials through web application vulnerabilities such as SSRF.

AWS CIRT Toolkit For Automating Incident Response Preparedness

During this workshop, you will install and experiment with some common tools and utilities that AWS CIRT uses on a daily basis to detect security misconfigurations, respond to active events, and assist customers with protecting their infrastructure.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Author

Steve de Vera

Steve is the Incident Response Watch Lead for the US Pacific region of the AWS CIRT. He is passionate about American-style BBQ and is a certified competition BBQ judge. He has a dog named Brisket.

How we use GitHub to be more productive, collaborative, and secure

Post Syndicated from Mike Hanley original https://github.blog/2022-12-20-how-we-use-github-to-be-more-productive-collaborative-and-secure/

It’s that time of year where we’re all looking back at what we’ve accomplished and thinking ahead to goals and plans for the calendar year to come. As part of GitHub Universe, I shared some numbers that provided a window into the work our engineering and security teams drive each day on behalf of our community, customers, and Hubbers. As someone who loves data, it’s not just fun to see how we operate GitHub at scale, but it’s also rewarding to see how this work contributes to our vision to be the home for all developers–which includes our own engineering and security teams.

Over the course of the past year1, GitHub staff made millions of commits across all of our internal repositories. That’s a ton of branches, pull requests, Issues, and more. We processed billions of API requests daily. And we ran tens of thousands of production deployments across the internal apps that power GitHub’s services. If you do the math, that’s hundreds of deploys per day.

GitHub is big. But the reality is, no matter your size, your scale, or your stage, we’re all dealing with the same questions. Those questions boil down to how to optimize for productivity, collaboration, and, of course, security.

It’s a running joke internally that you have to type “GitHub” three times to get to the monolith. So, let’s take a look at how we at GitHub (1) use GitHub (2) to build the GitHub (3) you rely on.

Productivity

GitHub’s cloud-powered experiences, namely Codespaces and GitHub Copilot, have been two of the biggest game changers for us in the past few years.

Codespaces

It’s no secret that local development hasn’t evolved much in the past decade. The github/github repository, where much of what you experience on GitHub.com lives, is fairly large and took several minutes to clone even on a good network connection. Combine this with setting up dependencies and getting your environment the way you like it, spinning up a local environment used to take 45 minutes to go from checkout to a built local developer environment.

But now, with Codespaces, a few clicks and less than 60 seconds later, you’re in a working development environment that’s running on faster hardware than the MacBook I use daily.

Heating my home office in the chilly Midwest with my laptop doing a local build was nice, but it’s a thing of the past. Moving to Codespaces last year has truly impacted our day-to-day developer experience, and we’re not looking back.

GitHub Copilot

We’ve been using GitHub Copilot for more than a year internally, and it still feels like magic to me every day. We recently published a study that looked at GitHub Copilot performance across two groups of developers–one that used GitHub Copilot and one that didn’t. To no one’s surprise, the group that used GitHub Copilot was able to complete the same task 55% faster than the group that didn’t have GitHub Copilot.

Getting the job done faster is great, but the data also provided incredible insight into developer satisfaction. Almost three-quarters of the developers surveyed said that GitHub Copilot helped them stay in the flow and spend more time focusing on the fun parts of their jobs. When was the last time you adopted an experience that made you love your job more? It’s an incredible example of putting developers first that has completely changed how we build here at GitHub.

Collaboration

At GitHub, we’re remote-first and we have highly distributed teams, so we prioritize discoverability and how we keep teams up-to-date across our work. That’s where tools like Issues and projects come into play. They allow us to plan, track, and collaborate in a centralized place that’s right next to the code we’re working on.

Incorporating projects across our security team has made it easier for us to not only track our work, but also to help people understand how their work fits into the company’s broader mission and supports our customers.

Projects gives us a big picture view of our work, but what about the more tactical discovery of a file, function, or new feature another team is building? When you’re working on a massive 15-year-old codebase (looking at you, GitHub), sometimes you need to find code that was written well before you even joined the company, and that can feel like trying to find a needle in a haystack.

So, we’ve adopted the new code search and code view, which has helped our developers quickly find what they need without losing velocity. This improved discoverability, along with the enhanced organization offered by Issues and projects, has had huge implications for our teams in terms of how we’ve been able to collaborate across groups.

Shifting security left

Like we saw when we looked at local development environments, the security industry still struggles with the same issues that have plagued us for more than a decade. Exposed credentials, as an example, are still the root cause for more than half of all data breaches today2. Phishing is still the best, and cheapest, way for an adversary to get into organizations and wreak havoc. And we’re still pleading with organizations to implement multi-factor authentication to keep the most basic techniques from bad actors at bay.

It’s time to build security into everything we do across the developer lifecycle.

The software supply chain starts with the developer. Normalizing the use of strong authentication is one of the most important ways that we at GitHub, the home of open source, can help defend the entire ecosystem against supply chain attacks. We enforce multi-factor authentication with security keys for our internal developers, and we’re requiring that every developer who contributes software on GitHub.com enable 2FA by the end of next year. The closer we can bring our security and engineering teams together, the better the outcomes and security experiences we can create together.

Another way we do that is by scaling the knowledge of our security teams with tools like CodeQL to create checks that are deployed for all our developers, protecting all our users. And because the CodeQL queries are open source, the vulnerability patterns shared by security teams at GitHub or by our customers end up as CodeQL queries that are then available for everyone. This acts like a global force multiplier for security knowledge in the developer and security communities.

Security shouldn’t be gatekeeping your teams from shipping. It should be the process that enables them to ship quickly–remember our hundreds of production deployments per day?–and with confidence.

Big, small, or in-between

As you see, GitHub has the same priorities as any other development team out there.

It doesn’t matter if you’re processing billions of API requests a day, like we are, or if you’re just starting on that next idea that will be launched into the world.

These are just a few ways over the course of the last year that we’ve used GitHub to build our own platform securely and improve our own developer experiences, not only to be more productive, collaborative, and secure, but to be creative, to be happier, and to build the best work of our lives.

To learn more about how we use GitHub to build GitHub, and to see demos of the features highlighted here, take a look at this talk from GitHub Universe 2022.

Notes


  1. Data collected January-October 2022 
  2. Verizon DBIR 

Introducing the Security Design of the AWS Nitro System whitepaper

Post Syndicated from J.D. Bean original https://aws.amazon.com/blogs/security/introducing-the-security-design-of-the-aws-nitro-system-whitepaper/

AWS recently released a whitepaper on the Security Design of the AWS Nitro System. The Nitro System is a combination of purpose-built server designs, data processors, system management components, and specialized firmware that serves as the underlying virtualization technology that powers all Amazon Elastic Compute Cloud (Amazon EC2) instances launched since early 2018. With the Nitro System, AWS undertook an effort to reimagine the architecture of virtualization to deliver security, isolation, performance, cost savings, and a pace of innovation that our customers require.

This whitepaper is a detailed design document on the inner workings of the AWS Nitro System, and how we use it to help secure your most critical workloads. This is the first time that AWS has provided such a detailed design document on the Nitro System and how it offers a no-operator access design and strong tenant isolation. The whitepaper describes the security design of the Nitro System in detail to help you evaluate Amazon EC2 for your sensitive workloads.

Three key components of the Nitro System are used to implement this design:

  • Purpose-built Nitro Cards – Hardware devices designed by AWS that provide overall system control and I/O virtualization that is independent of the main system board with its CPUs and memory.
  • Nitro Security Chip – Enables a secure boot process for the overall system based on a hardware root of trust, the ability to offer bare metal instances, and defense-in-depth that offers protection to the server from unauthorized modification of system firmware.
  • Nitro Hypervisor – A deliberately minimized and firmware-like hypervisor designed to provide strong resource isolation, and performance that is nearly indistinguishable from a bare metal server.

The whitepaper describes the fundamental architectural change introduced by the Nitro System compared to previous approaches to virtualization. It discusses the three key components of the Nitro System, and provides a demonstration of how these components work together by walking through what happens when a new Amazon Elastic Block Store (Amazon EBS) volume is added to a running EC2 instance. The whitepaper also discusses how the Nitro System is designed to eliminate the possibility of administrator access to an EC2 server, the overall passive communications design of the Nitro System, and the Nitro System change management process. Finally, the paper surveys important aspects of the EC2 system design that provide mitigations against potential side-channel issues that can arise in compute environments.

The whitepaper dives deep into each of these considerations, offering a detailed picture of the Nitro System security design. For more information about cloud security at AWS, contact us.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

J.D. Bean

J.D. is a Principal Security Architect for Amazon EC2 based out of New York City. His interests include security, privacy, and compliance. He is passionate about his work enabling AWS customers’ successful cloud journeys. J.D. holds a Bachelor of Arts from The George Washington University and a Juris Doctor from New York University School of Law.

AWS achieves GNS Portugal certification for classified information

Post Syndicated from Rodrigo Fiuza original https://aws.amazon.com/blogs/security/aws-achieves-gns-portugal-certification-for-classified-information/

GNS Logo

We continue to expand the scope of our assurance programs at Amazon Web Services (AWS), and we are pleased to announce that our Regions and AWS Edge locations in Europe are now certified by the Portuguese GNS/NSO (National Security Office) at the National Restricted level. This certification demonstrates our ongoing commitment to adhere to the heightened expectations for cloud service providers to process, transmit, and store classified data.

The GNS certification is based on NIST SP800-53 R4 and CSA CCM v4 frameworks, with the goal of protecting the processing and transmission of classified information.

AWS was evaluated by Adyta Lda, an independent third-party auditor, and by GNS Portugal. The Certificate of Compliance illustrating the compliance status of AWS is available on the GNS Certifications page and through AWS Artifact. AWS Artifact is a self-service portal for on-demand access to AWS compliance reports. Sign in to AWS Artifact in the AWS Management Console, or learn more at Getting Started with AWS Artifact.

As of this writing, 26 services offered in Europe are in scope of this certification. For up-to-date information, including when additional services are added, see the AWS Services in Scope by Compliance Program and select GNS.

AWS strives to continuously bring services into the scope of its compliance programs to help you meet your architectural and regulatory needs. If you have questions or feedback about GNS Portugal compliance, reach out to your AWS account team.

To learn more about our compliance and security programs, see AWS Compliance Programs. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Contact Us page.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Rodrigo Fiuza

Rodrigo is a Security Audit Manager at AWS, based in São Paulo. He leads audits, attestations, certifications, and assessments across Latin America, Caribbean and Europe. Rodrigo has previously worked in risk management, security assurance, and technology audits for the past 12 years.

Approaches for authenticating external applications in a machine-to-machine scenario

Post Syndicated from Patrick Sard original https://aws.amazon.com/blogs/security/approaches-for-authenticating-external-applications-in-a-machine-to-machine-scenario/

December 8, 2022: This post has been updated to reflect changes for M2M options with the new service of IAMRA. This blog post was first published November 19, 2013.

August 10, 2022: This blog post has been updated to reflect the new name of AWS Single Sign-On (SSO) – AWS IAM Identity Center. Read more about the name change here.


Amazon Web Services (AWS) supports multiple authentication mechanisms (AWS Signature v4, OpenID Connect, SAML 2.0, and more), essential in providing secure access to AWS resources. However, in a strictly machine-to machine (m2m) scenario, not all are a good fit. In these cases, a human is not present to provide user credential input. An example of such a scenario is when an on-premises application sends data to an AWS environment, as shown in Figure 1.

This post is designed to help you decide which approach is best to securely connect your applications, either residing on premises or hosted outside of AWS, to your AWS environment when no human interaction comes into play. We will go through the various alternatives available and highlight the pros and cons of each.

Figure 1: Securely connect your external applications to AWS in machine-to-machine scenarios

Figure 1: Securely connect your external applications to AWS in machine-to-machine scenarios

Determining the best approach

Let’s start by looking at possible authentication mechanisms that AWS supports in the following table. We’ll first identify the AWS service or services where the authentication can be set up—called the AWS front-end service. Then we’ll point out the AWS service that actually handles the authentication with AWS in the background—called the AWS backend service. We will also assess each mechanism based on use case.

Table 1: Authentication mechanisms available in AWS
Authentication mechanism AWS front-end service AWS backend service Good for m2m communication?
AWS Signature v4
  • All
AWS Security Token Service (AWS STS) Yes
Mutual TLS AWS STS Yes
OpenID Connect AWS STS Yes
SAML AWS STS Yes
Kerberos
  • n/a
AWS STS Yes
Microsoft Active Directory communication AWS STS No
IAM Roles Anywhere AWS STS Yes

Notes

We’ll now review each of these alternatives and also evaluate two additional characteristics on a 5-grade scale (from very low to very high) for each authentication mechanism:

  • Complexity: How complex is it to implement the authentication mechanism?
  • Convenience: How convenient is it to use the authentication mechanism on an ongoing basis?

As you’ll see, not all of the mechanisms are necessarily a good fit for a machine-to-machine scenario. Our focus here is on authentication of external applications, but not authentication of servers or other computers or Internet of Things (IoT) devices, which has already been documented extensively.

Active Directory–based authentication is available through either AWS IAM Identity Center or a limited set of AWS services and is meant in both cases to provide end users with access to AWS accounts and business applications. Active Directory–based authentication is also used broadly to authenticate devices such as Windows or Linux computers on a network. However, it isn’t used for authenticating applications with AWS. For that reason, we’ll exclude it from further scrutiny in this article.

Let’s look at the remaining authentication mechanisms one by one, with their respective pros and cons.

AWS Signature v4

The purpose of AWS Signature v4 is to authenticate incoming HTTP(S) requests to AWS services APIs. The AWS Signature v4 process is explained in detail in the documentation for the AWS APIs but, in a nutshell, the caller computes a signature using their credentials and then adds it to the header of the HTTP(S) request. On the other end, AWS accepts the request only if the provided signature is valid.

Figure 2: AWS Signature v4 authentication

Figure 2: AWS Signature v4 authentication

Native to AWS, low in complexity and highly convenient, AWS Signature v4 is the natural choice for machine-to-machine authentication scenarios with AWS. It is used behind the scenes by the AWS Command Line Interface (AWS CLI) and the AWS SDKs.

Pros

  • AWS Signature v4 is very convenient: the signature is built in the SDKs provided by AWS and is automatically computed on the caller’s behalf. If you prefer not to use an SDK, the signature process is a simple computation that can be implemented in any programming language.
  • There are fewer credentials to manage. No need to manage tedious digital certificates or even long-lived AWS credentials, because the AWS Signature v4 process supports temporary AWS credentials.
  • There is no need to interact with a third-party identity provider: once the request is signed, you’re good to go, provided that the signature is valid.

Cons

  • If you prefer not to store long-lived AWS credentials for your on-premises applications, you must first perform authentication through a third-party identity provider to obtain temporary AWS credentials. This would require using either OpenID Connect or SAML, in addition to AWS Signature v4. You could also use IAM Roles Anywhere, which exchanges a trusted certificate for temporary AWS credentials.

Mutual TLS

Mutual TLS, more specifically the mutual authentication mechanism of the Transport Layer Security (TLS) Protocol, allows the authentication of both ends—the client and the server sides—of a communication channel. By default, the server side of the TLS channel is always authenticated. With mutual TLS, the clients must also present a valid X.509 certificate for their identity to be verified.

Amazon API Gateway has recently announced native support for mutual TLS authentication (see this blog post for more details on the new feature). You can enable mutual TLS authentication on custom domains to authenticate your regional REST and HTTP APIs (except for private or edge APIs, for which the new feature is not supported at the time of this writing).

Figure 3: Mutual TLS authentication

Figure 3: Mutual TLS authentication

Mutual TLS can be both time-consuming and complicated to set up, but it is a widespread authentication mechanism.

Pros

  • Mutual TLS is widespread for IoT and business-to-business applications

Cons

  • You need to manage the digital certificates and their lifecycles. This can add significant burden and complexity to your IT operations.
  • You also need, at an application level, to pay special care to revoked certificates to reduce the risk of misuse. Since API Gateway doesn’t automatically verify if a client certificate has been revoked, you have to implement your own logic to do so, such as by using a Lambda authorizer.

OpenID Connect

OpenID Connect (OIDC), specifically OIDC 1.0, is a standard built on top of the OAuth 2.0 authorization framework to provide authentication for mobile and web-based applications. The OIDC client authentication method can be used by a client application to gain access to APIs exposed through Amazon API Gateway. The client application typically authenticates to an OAuth 2.0 authorization server, such as Amazon Cognito or another solution supporting that standard. As a result, the client application obtains a JSON Web Token (JWT) from the OAuth 2.0 authorization server. API Gateway then allows or denies the request based on the JWT validation. For more information about the access control part of this process, see the Amazon API Gateway documentation.

Figure 4: OIDC client authentication

Figure 4: OIDC client authentication

OIDC can be complex to put in place, but it’s a widespread authentication mechanism, especially for mobile and web applications and microservices architecture, including machine-to-machine scenarios.

Pros

  • With OIDC, you avoid storing long-lived AWS credentials for your on-premises applications.
  • OIDC uses REST or JSON message flows over HTTP, which makes it a particularly good fit (compared to SAML) for application developers today.

Cons

  • You need to store and maintain a set of credentials for each client application (such as client id and client secret) and make it accessible to the application. This can add complexity to your IT operations.

SAML

SAML 2.0 is an open standard for exchanging identity and security information between applications and service providers. SAML can be used to delegate authentication to a third-party identity provider, such as an Active Directory environment that is running on premises, and to gain access to AWS by providing a valid SAML assertion. (See About SAML 2.0-based federation to learn how to configure your AWS environment to leverage SAML 2.0.)

IAM validates the SAML assertion with your identity provider and, upon success, provides a set of AWS temporary credentials to the requesting party. The whole process is described in the IAM documentation.

Figure 5: SAML authentication

Figure 5: SAML authentication

SAML can be complex to put in place, but it’s a versatile authentication mechanism that can fit a lot of different use cases, including machine-to-machine scenarios.

Pros

  • With SAML, you not only avoid storing long-lived AWS credentials for your on-premises applications, but you can also use an existing on-premises directory, such as Active Directory, as an identity provider.
  • SAML doesn’t prescribe any particular technology or protocol by which the authentication should take place. The developer has total freedom to employ whichever is more convenient or makes more sense: key-based (such as X.509 certificates), ticket-based (such as Kerberos), or another applicable mechanism.
  • SAML is also a good fit when protocol bindings other than HTTP are needed.

Cons

  • Using SAML with AWS requires a third-party identity provider for your on-premises environment.
  • SAML also requires a trust to be established between your identity provider and your AWS environment, which adds more complexity to the process.
  • Because SAML is XML-based, it isn’t as concise or nimble as AWS Signature v4 or OIDC, for example.
  • You need to manage the SAML assertions and their lifecycles. This can add significant burden and complexity to your IT operations.

Kerberos

Initially developed by MIT, Kerberos v5 is an IETF standard protocol that enables client/server authentication on an unprotected network. It isn’t supported out-of-the-box by AWS, but you can use an identity provider, such as Active Directory, to exchange the Kerberos ticket provided to your application for either an OIDC/OAuth token or a SAML assertion that can be validated by AWS.

Figure 6: Kerberos authentication (through SAML or OIDC)

Figure 6: Kerberos authentication (through SAML or OIDC)

Kerberos is highly complex to set up, but it can make sense in cases where you already have an on-premises environment with Kerberos authentication in place.

Pros

  • With Kerberos, you not only avoid storing long-lived AWS credentials for your on-premises applications, but you can also use an existing on-premises directory, such as Active Directory, as an identity provider.

Cons

  • Using Kerberos with AWS requires the Kerberos ticket to be converted into something that can be accepted by AWS. Therefore, it requires you to use either the OIDC or SAML authentication mechanisms, as described previously.

IAM Roles Anywhere

IAM Roles Anywhere establishes a trust between your AWS account and the certificate authority (CA) that issues certificates to your on-premises workloads using public key infrastructure (PKI). For a detailed overview, see the blog post Extend AWS IAM roles to workloads outside of AWS with IAM Roles Anywhere. Your workloads outside of AWS use IAM Roles Anywhere to exchange x.509 certificates for temporary AWS credentials in order to interact with AWS APIs, thus removing the need for long-term credentials in your on-premises applications. IAM Roles Anywhere enables short-term credentials for numerous hybrid environment use cases including machine-to-machine scenarios.

Figure 7: IAMRA authentication process

Figure 7: IAMRA authentication process

IAM Roles Anywhere is a versatile authentication mechanism that can fit a lot of different use cases, including machine-to-machine scenarios where your on-premises workload is accessing AWS resources.

Pros

  • With IAM Roles Anywhere you avoid storing long-lived AWS credentials for your on-premises workloads.
  • You can import a certificate revocation list (CRL) from your certificate authority (CA) to support certificate revocation.

Cons

  • You need to manage the digital certificates and their lifecycles. This can add complexity to your IT operations.
  • IAM Roles Anywhere does not support callbacks to CRL distribution points (CDPs) or Online Certificate Status Protocol (OCSP) endpoints.

Conclusion

Now we’ll collect and summarize this discussion in the following table, with the pros and cons of each approach.

Authentication mechanism AWS front-end service Complexity Convenience
AWS Signature v4
  • All
Low Very High
Mutual TLS
  • AWS IoT Core
  • Amazon API Gateway
Medium High
OpenID Connect
  • Amazon Cognito
  • Amazon API Gateway
Medium High
SAML
  • Amazon Cognito
  • AWS Identity and Access Management (IAM)
High Medium
Kerberos
  • n/a
Very High Low
IAM Roles Anywhere
  • AWS Identity and Access Management (IAM)
Medium High

AWS Signature v4 is the most convenient and least complex mechanism of these options, but as for every situation, it’s important to start from your own requirements and context before making a choice. Additional factors may influence your choice, such as the structure or the culture of your organization, or the resources available for your project. Keeping the discussion focused on simple factors on purpose, we’ve come up with the following actionable decision helper.

Use AWS Signature v4 when:

  • You have access to AWS credentials (temporary or long-lived)
  • You want to call AWS services directly through their APIs

Use mutual TLS when:

  • The cost and effort of maintaining digital certificates is acceptable for your organization
  • Your organization already has a process in place to maintain digital certificates
  • You plan to call AWS services indirectly through custom-built APIs

Use OpenID Connect when:

  • You need or want to procure temporary AWS credentials by using a REST-based mechanism
  • You want to call AWS services directly through their APIs

Use SAML when:

  • You need to procure temporary AWS credentials
  • You already have a SAML-based authentication process in place
  • You want to call AWS services directly through their APIs

Use Kerberos when:

  • You already have a Kerberos-based authentication process in place
  • None of the previously mentioned mechanisms can be used for your use case

Use IAMRA when:

  • The cost and effort of maintaining digital certificates is acceptable for your organization
  • Your organization already has a process in place to maintain digital certificates
  • You want to call AWS services directly through their APIs
  • You need temporary security credentials for workloads such as servers, containers, and applications that run outside of AWS

We hope this post helps you find your way among the various alternatives that AWS offers to securely connect your external applications to your AWS environment, and to select the most appropriate mechanism for your specific use case. We look forward to your feedback.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on one of the AWS Developer forums or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Patrick Sard

Patrick works as a Solutions Architect at AWS. Apart from being a cloud enthusiast, Patrick loves practicing tai-chi (preferably Chen style), enjoys an occasional wine-tasting (he trained as a Sommelier), and is an avid tennis player.

Jeremy Wave

Jeremy Ware

Jeremy is a Security Specialist Solutions Architect focused on Identity and Access Management. Jeremy and his team enable AWS customers to implement sophisticated, scalable, and secure IAM architecture and Authentication workflows to solve business challenges. With a background in Security Engineering, Jeremy has spent many years working to raise the Security Maturity gap at numerous global enterprises. Outside of work, Jeremy loves to explore the mountainous outdoors participate in sports such as Snowboarding, Wakeboarding, and Dirt bike riding.

Zero trust with Kafka

Post Syndicated from Grab Tech original https://engineering.grab.com/zero-trust-with-kafka

Introduction

Grab’s real-time data platform team, also known as Coban, has been operating large-scale Kafka clusters for all Grab verticals, with a strong focus on ensuring a best-in-class-performance and 99.99% availability.

Security has always been one of Grab’s top priorities and as fraudsters continue to evolve, there is an increased need to continue strengthening the security of our data streaming platform. One of the ways of doing this is to move from a pure network-based access control to state-of-the-art security and zero trust by default, such as:

  • Authentication: The identity of any remote systems – clients and servers – is established and ascertained first, prior to any further communications.
  • Authorisation: Access to Kafka is granted based on the principle of least privilege; no access is given by default. Kafka clients are associated with the whitelisted Kafka topics and permissions – consume or produce – they strictly need. Also, granted access is auditable.
  • Confidentiality: All in-transit traffic is encrypted.

Solution

We decided to use mutual Transport Layer Security (mTLS) for authentication and encryption. mTLS enables clients to authenticate servers, and servers to reciprocally authenticate clients.

Kafka supports other authentication mechanisms, like OAuth, or Salted Challenge Response Authentication Mechanism (SCRAM), but we chose mTLS because it is able to verify the peer’s identity offline. This verification ability means that systems do not need an active connection to an authentication server to ascertain the identity of a peer. This enables operating in disparate network environments, where all parties do not necessarily have access to such a central authority.

We opted for Hashicorp Vault and its PKI engine to dynamically generate clients and servers’ certificates. This enables us to enforce the usage of short-lived certificates for clients, which is a way to mitigate the potential impact of a client certificate being compromised or maliciously shared. We said zero trust, right?

For authorisation, we chose Policy-Based Access Control (PBAC), a more scalable solution than Role-Based Access Control (RBAC), and the Open Policy Agent (OPA) as our policy engine, for its wide community support.

To integrate mTLS and the OPA with Kafka, we leveraged Strimzi, the Kafka on Kubernetes operator. In a previous article, we have alluded to Strimzi and hinted at how it would help with scalability and cloud agnosticism. Built-in security is undoubtedly an additional driver of our adoption of Strimzi.

Server authentication

Figure 1 – Server authentication process for internal cluster communications

We first set up a single Root Certificate Authority (CA) for each environment (staging, production, etc.). This Root CA, in blue on the diagram, is securely managed by the Hashicorp Vault cluster. Note that the color of the certificates, keys, signing arrows and signatures on the diagrams are consistent throughout this article.

To secure the cluster’s internal communications, like the communications between the Kafka broker and Zookeeper pods, Strimzi sets up a Cluster CA, which is signed by the Root CA (step 1). The Cluster CA is then used to sign the individual Kafka broker and zookeeper certificates (step 2). Lastly, the Root CA’s public certificate is imported into the truststores of both the Kafka broker and Zookeeper (step 3), so that all pods can mutually verify their certificates when authenticating one with the other.

Strimzi’s embedded Cluster CA dynamically generates valid individual certificates when spinning up new Kafka and Zookeeper pods. The signing operation (step 2) is handled automatically by Strimzi.

For client access to Kafka brokers, Strimzi creates a different set of intermediate CA and server certificates, as shown in the next diagram.

Figure 2 – Server authentication process for client access to Kafka brokers

The same Root CA from Figure 1 now signs a different intermediate CA, which the Strimzi community calls the Client CA (step 1). This naming is misleading since it does not actually sign any client certificates, but only the server certificates (step 2) that are set up on the external listener of the Kafka brokers. These server certificates are for the Kafka clients to authenticate the servers. This time, the Root CA’s public certificate will be imported into the Kafka Client truststore (step 3).

Client authentication

Figure 3 – Client authentication process

For client authentication, the Kafka client first needs to authenticate to Hashicorp Vault and request an ephemeral certificate from the Vault PKI engine (step 1). Vault then issues a certificate and signs it using its Root CA (step 2). With this certificate, the client can now authenticate to Kafka brokers, who will use the Root CA’s public certificate already in their truststore, as previously described (step 3).

CA tree

Putting together the three different authentication processes we have just covered, the CA tree now looks like this. Note that this is a simplified view for a single environment, a single cluster, and two clients only.

Figure 4 – Complete certificate authority tree

As mentioned earlier, each environment (staging, production, etc.) has its own Root CA. Within an environment, each Strimzi cluster has its own pair of intermediate CAs: the Cluster CA and the Client CA. At the leaf level, the Zookeeper and Kafka broker pods each have their own individual certificates.

On the right side of the diagram, each Kafka client can get an ephemeral certificate from Hashicorp Vault whenever they need to connect to Kafka. Each team or application has a dedicated Vault PKI role in Hashicorp Vault, restricting what can be requested for its certificate (e.g., Subject, TTL, etc.).

Strimzi deployment

We heavily use Terraform to manage and provision our Kafka and Kafka-related components. This enables us to quickly and reliably spin up new clusters and perform cluster scaling operations.

Under the hood, Strimzi Kafka deployment is a Kubernetes deployment. To increase the performance and the reliability of the Kafka cluster, we create dedicated Kubernetes nodes for each Strimzi Kafka broker and each Zookeeper pod, using Kubernetes taints and tolerations. This ensures that all resources of a single node are dedicated solely to either a single Kafka broker or a single Zookeeper pod.

We also decided to go with a single Kafka cluster by Kubernetes cluster to make the management easier.

Client setup

Coban provides backend microservice teams from all Grab verticals with a popular Kafka SDK in Golang, to standardise how teams utilise Coban Kafka clusters. Adding mTLS support mostly boils down to upgrading our SDK.

Our enhanced SDK provides a default mTLS configuration that works out of the box for most teams, while still allowing customisation, e.g., for teams that have their own Hashicorp Vault Infrastructure for compliance reasons. Similarly, clients can choose among various Vault auth methods such as AWS or Kubernetes to authenticate to Hashicorp Vault, or even implement their own logic for getting a valid client certificate.

To mitigate the potential risk of a user maliciously sharing their application’s certificate with other applications or users, we limit the maximum Time-To-Live (TTL) for any given certificate. This also removes the overhead of maintaining a Certificate Revocation List (CRL). Additionally, our SDK stores the certificate and its associated private key in memory only, never on disk, hence reducing the attack surface.

In our case, Hashicorp Vault is a dependency. To prevent it from reducing the overall availability of our data streaming platform, we have added two features to our SDK – a configurable retry mechanism and automatic renewal of clients’ short-lived certificates when two thirds of their TTL is reached. The upgraded SDK also produces new metrics around this certificate renewal process, enabling better monitoring and alerting.

Authorisation

Figure 5 – Authorisation process before a client can access a Kafka record

For authorisation, we set up the Open Policy Agent (OPA) as a standalone deployment in the Kubernetes cluster, and configured Strimzi to integrate the Kafka brokers with that OPA.

OPA policies – written in the Rego language – describe the authorisation logic. They are created in a GitLab repository along with the authorisation rules, called data sources (step 1). Whenever there is a change, a GitLab CI pipeline automatically creates a bundle of the policies and data sources, and pushes it to an S3 bucket (step 2). From there, it is fetched by the OPA (step 3).

When a client – identified by its TLS certificate’s Subject – attempts to consume or produce a Kafka record (step 4), the Kafka broker pod first issues an authorisation request to the OPA (step 5) before processing the client’s request. The outcome of the authorisation request is then cached by the Kafka broker pod to improve performance.

As the core component of the authorisation process, the OPA is deployed with the same high availability as the Kafka cluster itself, i.e. spread across the same number of Availability Zones. Also, we decided to go with one dedicated OPA by Kafka cluster instead of having a unique global OPA shared between multiple clusters. This is to reduce the blast radius of any OPA incidents.

For monitoring and alerting around authorisation, we submitted an Open Source contribution in the opa-kafka-plugin project in order to enable the OPA authoriser to expose some metrics. Our contribution to the open source code allows us to monitor various aspects of the OPA, such as the number of authorised and unauthorised requests, as well as the cache hit-and-miss rates. Also, we set up alerts for suspicious activity such as unauthorised requests.

Finally, as a platform team, we need to make authorisation a scalable, self-service process. Thus, we rely on the Git repository’s permissions to let Kafka topics’ owners approve the data source changes pertaining to their topics.

Teams who need their applications to access a Kafka topic would write and submit a JSON data source as simple as this:

{
 "example_topic": {
   "read": [
     "clientA.grab",
     "clientB.grab"
   ],
   "write": [
     "clientB.grab"
   ]
 }
}

GitLab CI unit tests and business logic checks are set up in the Git repository to ensure that the submitted changes are valid. After that, the change would be submitted to the topic’s owner for review and approval.

What’s next?

The performance impact of this security design is significant compared to unauthenticated, unauthorised, plaintext Kafka. We observed a drop in throughput, mostly due to the low performance of encryption and decryption in Java, and are currently benchmarking different encryption ciphers to mitigate this.

Also, on authorisation, our current PBAC design is pretty static, with a list of applications granted access for each topic. In the future, we plan to move to Attribute-Based Access Control (ABAC), creating dynamic policies based on teams and topics’ metadata. For example, teams could be granted read and write access to all of their own topics by default. Leveraging a versatile component such as the OPA as our authorisation controller enables this evolution.

Join us

Grab is the leading superapp platform in Southeast Asia, providing everyday services that matter to consumers. More than just a ride-hailing and food delivery app, Grab offers a wide range of on-demand services in the region, including mobility, food, package and grocery delivery services, mobile payments, and financial services across 428 cities in eight countries.

Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. If this mission speaks to you, join our team today!

How to investigate and take action on security issues in Amazon EKS clusters with Amazon Detective – Part 2

Post Syndicated from Marshall Jones original https://aws.amazon.com/blogs/security/how-to-investigate-and-take-action-on-security-issues-in-amazon-eks-clusters-with-amazon-detective-part-2/

In part 1 of this of this two-part series, How to detect security issues in Amazon EKS cluster using Amazon GuardDuty, we walked through a real-world observed security issue in an Amazon Elastic Kubernetes Service (Amazon EKS) cluster and saw how Amazon GuardDuty detected each phase by following MITRE ATT&CK tactics.

In this blog post, we’ll walk you through investigative techniques to use with Amazon Detective, paired with the GuardDuty EKS and malware findings from the security issue. After we have identified impacted resources through our investigation, we’ll provide example remediation tactics and preventative controls to address and help prevent security issues in EKS clusters.

Amazon Detective can help you investigate security issues and related resources in your account. Detective provides EKS coverage that you can enable within your accounts. When this coverage is enabled, Detective can help investigate and remediate potentially unauthorized EKS activity that results from misconfiguration of the control plane nodes or application. Although GuardDuty is not a prerequisite to enable Detective, it is recommended that you enable GuardDuty to enhance the visualization capabilities in Detective with GuardDuty findings.

Prerequisites

You must have the following services enabled in your AWS account to generate and investigate findings associated with EKS security events in a similar manner as outlined in this blog. If you do not have GuardDuty enabled, you can still investigate with Detective, but in a limited capacity.

Investigate with Amazon Detective

In the five phases we walked through in part 1, we discussed GuardDuty findings and MITRE ATT&CK tactics that can help you detect and understand each phase of the unauthorized activity, from the initial misconfiguration to the impact on our application when the EKS cluster is used for crypto mining.

The next recommended step is to investigate the EKS cluster and any associated resources. Amazon Detective can help you to investigate whether there was any other related unauthorized activity in the environment. We will walk through Detective capabilities for visualizing and gathering important information to effectively respond to the security issue. If you’re interested in creating detailed incident response playbooks for your security team to follow in your own environment, refer to these sample AWS incident response playbooks.

Depending on your scenario, there are various resources you can use to start your investigation, such as Security Hub findings, GuardDuty findings, related Kubernetes subjects, or an AWS account’s AWS CloudTrail activity. For our walkthrough, we’ll start our investigation from the GuardDuty finding and use the EKS cluster resource to pivot to the Detective console, as shown in Figure 7. Although we initially focus on the EKS cluster, you could start from any entities that are supported in the Detective behavior graph structure in the Amazon Detective User Guide. For example, we could start directly with the Kubernetes subject system:anonymous and find activity associated with the anonymous user.

Figure 7: Example Detective popup from GuardDuty finding for EKS cluster

Figure 7: Example Detective popup from GuardDuty finding for EKS cluster

We’ll now go over the information that you would need to gather from Detective in order to investigate the example security issue.

To investigate EKS cluster findings with Detective

  1. In the GuardDuty console, navigate to an individual finding and hover over Investigate with Detective. Choose one of the specific resources to start. In the image below, we selected the EKS cluster resource to investigate with Detective. You will need to gather some preliminary information about the IAM roles associated with the EKS cluster.
    • Questions: When was the cluster created? What IAM role created the cluster? What IAM role is assigned to the cluster?
    • Why it matters: If you are an incident responder, these details can potentially help you identify the owner of the cluster and help you determine what IAM principals are involved.
    • What next: Start looking into each IAM principal’s activity, as seen in CloudTrail, to investigate whether the IAM entity itself is potentially compromised or what other resources may have been impacted.
    Figure 8: Detective summary page for EKS cluster metadata details

    Figure 8: Detective summary page for EKS cluster metadata details

  2. Next, on the EKS cluster overview page, you can see the container details associated with the cluster.
    • Question: What are some of the other container details for the cluster? Does anything look out of the ordinary? Is it using a public image? Is it missing a network policy?
    • Why it matters: Based on the architecture related to this cluster, you might be able to use this information to determine whether there are unauthorized containers. The contents of unauthorized containers will depend on your organization but typically consist of public images or unauthorized RBAC, pod security policies, or network policy configurations. It’s important to keep in mind that when you look at data in Detective, the scope time is very important. When you pivot from a GuardDuty finding, the scope time will be set to the first time the GuardDuty finding was seen to the last time the finding was seen. The container details reflect the containers that were running during the selected scope time. Changing the scope time might change the containers that are listed in the table shown in Figure 9.
    • What next: Information found on this page can help to highlight unauthorized resources or configurations that will need to be remediated. You will also need to look at how these resources were initially created and if there are missing guardrails that should have been created during the provisioning of the cluster.
    Figure 9: Detective summary page for EKS container metadata details

    Figure 9: Detective summary page for EKS container metadata details

  3. Finally, you will see associated security findings with this specific EKS cluster, similar to Figure 10, at the bottom of the EKS cluster overview page in Detective.
    • Question: Are there any other security findings associated with this cluster that I previously was not aware of?
    • Why it matters: In our example scenario, we walked through the findings that were initially detected and the events that unfolded from those findings. After further investigation, you might see other findings that were not part of the original investigation. This can occur if your security team is only investigating specific findings or severity values. The finding for PrivilegeEscalation:Kubernetes/PrivilegedContainer informs you that a privileged container was launched on your Kubernetes cluster by using an image that has never before been used to launch privileged containers in your cluster. A privileged container has root level access to the host. The other finding, Persistence:Kubernetes/ContainerWithSensitiveMount, informs you that a container was launched with a configuration that included a sensitive host path with write access in the volumeMounts section. This makes the sensitive host path accessible and writable from inside the container. Any finding associated to the suspicious or compromised cluster is valuable because it provides additional insight into what the unauthorized entity was trying to accomplish after the initial detection.
    • What next: With Detective, you might want to continue your investigation by selecting each of these findings and reviewing all details related to the finding. Depending on the findings, you could bring in additional team members to help investigate further. For this example, we will move on to the next step.
    Figure 10: Example Detective summary of security findings associated with the EKS cluster

    Figure 10: Example Detective summary of security findings associated with the EKS cluster

  4. Shift from the EKS cluster overview section to the Kubernetes API activity section, similar to Figure 11 below. This will give you the opportunity to dig into the API activity associated with this cluster.
    1. Question: What other Kubernetes API activity was attempted from the cluster? Which API calls were successful? Which API calls failed? What was the unauthorized user trying to do?
    2. Why it matters: It’s important to determine which actions were successfully invoked by the unauthorized user so that appropriate remediation actions can be taken. You can look at trends of successful and failed API calls, and can even search by Subject, IP address, or Kubernetes API call.
    3. What next: You might want to look at all cluster role binding from days before the first GuardDuty finding was seen to determine if there was any other suspicious activity you should be investigating regarding the cluster.
    Figure 11: Example Detective summary page for Kubernetes API activity on the EKS cluster

    Figure 11: Example Detective summary page for Kubernetes API activity on the EKS cluster

  5. Next, you will want to look at the Newly observed Kubernetes API calls section, similar to Figure 12 below.
    • Question: What are some of the more recent Kubernetes API calls? What are they trying to access right now and are they successful? Do I need to start taking action for other resources outside of EKS?
    • Why it matters: This data shows Kubernetes subjects who were observed issuing API calls to this cluster for the first time during our scope time. Detective provides you this information by keeping a baseline of the activity associated with supported AWS resources. This can help you more quickly determine whether activity might be suspicious and worth looking into. In our example, we used the search functionality to look at API calls associated with the built-in Kubernetes secrets management. A common way to start your search is to see if an unauthorized user has successfully accessed any secrets, which can help you determine what information you might want to search in the overall API call volume section discussed in step 4.
    • What next: If the unauthorized user has successfully accessed any secret, those secrets should be marked as compromised, and they should be rotated immediately.
    Figure 12: Example Detective summary for newly observed Kubernetes API calls from the EKS cluster

    Figure 12: Example Detective summary for newly observed Kubernetes API calls from the EKS cluster

  6. You can also consider the following question when you look at the Newly observed Kubernetes API calls section.
    • Question: Has the IP address associated with the finding been communicating with any other resources in our environment, and if so, what are the details of that communication?
    • Why it matters: To answer this question, you can use Detective’s search functionality and the ability to use wild cards to search for IP addresses with the same first three octets. Also note that you can use CIDR notation to search, as well. Based on the results in the example in Figure 13, you can see that there are a number of related IP addresses associated with the environment. With this information, you now can look at the traffic associated with these different IPs and what resources they were communicating with.
    Figure 13: Example Detective results page from a query against IP addresses associated with the EKS cluster

    Figure 13: Example Detective results page from a query against IP addresses associated with the EKS cluster

  7. You can select one of the IP addresses in the search results to get more information related to it, similar to Figure 14 below.
    1. Question: What was the first time an IP address was observed in the environment? When was the last time it was observed?
    2. Why it matters: You can use this information to start isolating where unauthorized activity is coming from and what actions are being taken. You can also start creating a time series of unauthorized activity and scope.
    3. What next: You can repeat some of the previous investigation steps for each IP address, like looking at the different tabs to review New behavior, Resource interaction, and Kubernetes activity.
    Figure 14: Example Detective results page for specific IP address and associated metadata details

    Figure 14: Example Detective results page for specific IP address and associated metadata details

In summary, we began our investigation with a GuardDuty finding about an anonymous API request that was successful in using system:anonymous on one of our EKS clusters. We then used Detective to investigate and visualize activity associated with that EKS cluster, such as volume of successful or unsuccessful API requests, where and when those actions were attempted and other security findings associated with the resource. Once we have completed the investigation, we can confirm scope and impact of the security event and start moving towards taking action.

Remediation techniques for Amazon EKS

In this section, we will focus on how to remediate the security issue in our example. Your actions will vary based on your organization and the resources affected. It’s important to note that these actions will impact the EKS cluster and associated workloads, and should accordingly be performed by or coordinated with the cluster operator.

Before you take action on the EKS cluster, you will need to preserve forensic artifacts and evidence for the impacted EKS resources. The order of operations for these actions matters, because you want to get all the data from forensic artifacts in order to determine the overall impact to the resources affected. If you quarantine resources before you capture forensic artifacts, there is a risk that running processes will be interrupted or that the malware attempts to destroy resources that are valuable to a forensics investigation, to cover its tracks.

To preserve forensic evidence

  1. Enable termination protection on the impacted worker node and change the shutdown behavior to Stop.
  2. Label the offending pod or node with a label indicating that it is part of an active investigation.
  3. Cordon the worker node.
  4. Capture both volatile (temporary memory) and non-volatile (Amazon EBS snapshots) artifacts on the worker node.

Now that you have the forensic evidence, you can start to quarantine your EKS resources to restrict unauthorized network communication. The main objective is to prevent the affected EKS pods from communicating with internal resources or exfiltrating data externally.

To quarantine EKS resources

  1. Isolate the pod by creating a network policy that denies ingress and egress traffic to the pod.
  2. Attach a security group to the host and remove inbound and outbound rules. Take this action if you believe the underlying host has been compromised.

    Depending on existing inbound and outbound rules on the security group, the connections will either be tracked or untracked. Applying an isolation security group will drop untracked connections. For tracked connections, new connections with the host will not be allowed from the isolation security group, but existing tracked connections will not be interrupted.

    Important: This action will affect all containers running on the host.

  3. Attach a deny rule for the EKS resources in a network access control list (network ACL). Because network ACLs are stateless firewalls, all connections will be interrupted, whether they are tracked or untracked connections.

    Important: This action will affect all subnets using the network ACL and all resources within those subnets.

At this point, the affected EKS resources are quarantined, but the cluster is still configured to allow anonymous, unauthenticated access. You will need to remove all unauthorized permissions that were created or added.

To remove unauthorized permissions

  1. Update the RBAC configuration to remove system:anonymous access.
  2. Revoke temporary security credentials that are assigned to the pod or worker node, if necessary. You can also remove the IAM role associated with the EKS resources.

    Note: Removing IAM policies or attaching IAM policies to restrict permissions will affect the resources that are using the IAM role.

  3. Remove any unauthorized ClusterRoleBinding created by the system:anonymous user.
  4. Redeploy the compromised pod or workload resource.

The actions taken so far primarily target the EKS resource, but based on our Detective investigation, there are other actions you might need to take. Because secrets were involved that could be used outside of the EKS cluster, those secrets will need to be rotated wherever they are referenced. Detective will also suggest additional areas where you can investigate and remediate additional unauthorized activity in your AWS account.

It is important that your team go through game days or run-throughs for investigating and responding to different scenarios in order to make sure the team is prepared. You can run through the EKS security workshop to get your security team more familiar with remediation for EKS.

For more information about responding to EKS cluster related security issues, refer to GuardDuty EKS remediation in the GuardDuty User Guide and the EKS Best Practices Guide.

Preventative controls for EKS

This section covers several preventative controls that you can use to protect EKS clusters.

How can I prevent external access to the EKS cluster?

To help prevent external access to your EKS clusters, limit the exposure of your API server. You can achieve that in two ways:

  1. Set the API server endpoint access to Private. This will effectively forbid anyone outside of the VPC to send Kubernetes API requests to your EKS cluster.
  2. Set an IP address allow list for the EKS cluster public access endpoint.

How can I prevent giving admin access to the EKS cluster?

To help prevent an EKS cluster user from granting any type of access to anonymous or unauthenticated users, you can set up a ValidatingAdmissionWebhook. This is a special type of Kubernetes admission controller that can be configured in the Kubernetes API. (To learn how to build serverless admission webhooks, see the blog post Building serverless admission webhooks for Kubernetes with AWS SAM.)

The ValidatingAdmissionWebhook will deny a Kubernetes API request that matches all of the following checks:

  1. The request is creating or modifying a ClusterRoleBinding or RoleBinding.
  2. The subjects section contains either of the following:
    • The user system:anonymous
    • The group system:unauthenticated

How can I prevent malicious images from being deployed?

Now that you have set controls to prevent external access to the EKS cluster and prevent granting access to anonymous users, you can focus on preventing the deployment of potentially malicious images.

Malicious container images can have different origins, including:

  1. Images stored in public or unauthorized registries
  2. Images replacing the ones that are stored in authorized registries
  3. Authorized images that contain software with existing or newly discovered vulnerabilities

You can address these sources of malicious images by doing the following:

  1. Use admission controllers to verify that images meet your organization’s requirements, including for the image origin. You can also refer to this this blog post to implement a solution with a webhook and admission controllers.
  2. Enable tag immutability in your registry, a control that prevents an actor from maliciously replacing container images without changing the image’s tags. Additionally, you can enable an AWS Config rule to check tag immutability
  3. Configure another ValidatingAdmissionWebhook that will only accept images if they meet all of the following criteria.
    1. Images that come from approved registries.
    2. Images that pass the vulnerability scan during deployment time.
    3. Images that are signed by a trusted party. Amazon Elastic Container Registry (Amazon ECR) is working on a product enhancement to store image signatures. Currently, you can use an open-source cosign tool to verify and store image signatures.

      Note: These criteria can vary based on your use case and internal security and compliance standards.

The above controls will help prevent the deployment of a vulnerable, unauthorized, or potentially malicious container image.

How can I prevent lateral movement inside the cluster?

To prevent lateral movement inside the cluster, it is recommended to use network policies, as follows:

  • Enforce Kubernetes network policies to enforce ingress and egress controls within the cluster. You can implement these policies by following the steps in the Securing your cluster with network policies EKS workshop.

It’s important to note that you could use security groups for the same purpose, but pod security groups should only be used if the cluster is compromised and when you want to control the traffic between a pod and a resource that resides in the VPC, not inter-pod traffic.

In this section, we’ve reviewed different preventative controls that could have helped mitigate our example security incident. With the first preventative control, we could have prevented external actors from connecting to the API server. The second control could have prevented granting access to anonymous users. The third control could have prevented the deployment of an unauthorized or vulnerable container image. Finally, the fourth control could have helped limit the impact of the deployed vulnerable images to only the pods where the images were deployed, making it harder to laterally move to other pods in the cluster.

Conclusion

In this post, we walked you through how to investigate an EKS cluster related security issue with Amazon Detective. We also provided some recommended remediation and preventative controls to put in place for the EKS cluster specific security issues. When pairing GuardDuty’s ability for continuous threat detection and monitoring with Detective’s organization and visualization capabilities, you enable your security team to conduct faster and more effective investigation. By providing the security team the ability quickly view an organized set of data associated with security events within your AWS account, you reduce the overall Mean Time to Respond (MTTR).

Now that you understand the investigative capabilities with Detective, it’s time to try things out! It is important that you provide a mechanism for your security team to practice detection, investigation, and remediation techniques using security incident response simulations. By periodically running simulations, your security team will be prepared to quickly respond to possible security events. You can find more detailed incident response playbooks that can assist you in preparing for events in your environment, see these sample AWS incident response playbooks.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a thread on Amazon GuardDuty re:Post.

Want more AWS Security news? Follow us on Twitter.

Author

Marshall Jones

Marshall is a worldwide senior security specialist solutions architect at AWS. His background is in AWS consulting and security architecture, focused on a variety of security domains including edge, threat detection, and compliance. Today, he helps enterprise customers adopt and operationalize AWS security services to increase security effectiveness and reduce risk.

Jonathan Nguyen

Jonathan Nguyen

Jonathan is a shared delivery team senior security consultant at AWS. His background is in AWS security, with a focus on threat detection and incident response. He helps enterprise customers develop a comprehensive AWS security strategy, deploy security solutions at scale, and train customers on AWS security best practices.

Manuel Martinez Arizmendi

Manuel Martinez Arizmendi

Manuel works a Security Engineer at Amazon Detective providing new security investigation capabilities to AWS customers. Based on Boston,MA and originally from Madrid, Spain, when he’s not at work, he enjoys playing and watching soccer, playing videogames, and hanging out with his friends.

Graph for fraud detection

Post Syndicated from Grab Tech original https://engineering.grab.com/graph-for-fraud-detection

Grab has grown rapidly in the past few years. It has expanded its business from ride hailing to food and grocery delivery, financial services, and more. Fraud detection is challenging in Grab, because new fraud patterns always arise whenever we introduce a new business product. We cannot afford to develop a new model whenever a new fraud pattern appears as it is time consuming and introduces a cold start problem, that is no protection at the early stage. We need a general fraud detection framework to better protect Grab from various unknown fraud risks.

Our key observation is that although Grab has many different business verticals, the entities within those businesses are connected to each other (Figure 1. Left), for example, two passengers may be connected by a Wi-Fi router or phone device, a merchant may be connected to a passenger by a food order, and so on. A graph provides an elegant way to capture the spatial correlation among different entities in the Grab ecosystem. A common fraud shows clear patterns on a graph, for example, a fraud syndicate tends to share physical devices, and collusion happens between a merchant and an isolated set of passengers (Figure 1. Right).

Figure 1. Left: The graph captures different correlations in the Grab ecosystem.
Right: The graph shows that common fraud has clear patterns.

We believe graphs can help us discover subtle traces and complicated fraud patterns more effectively. Graph-based solutions will be a sustainable foundation for us to fight against known and unknown fraud risks.

Why graph?

The most common fraud detection methods include the rule engine and the decision tree-based models, for example, boosted tree, random forest, and so on. Rules are a set of simple logical expressions designed by human experts to target a particular fraud problem. They are good for simple fraud detection, but they usually do not work well in complicated fraud or unknown fraud cases.

Fraud detection methods

Utilises correlations
(Higher is better)
Detects unknown fraud
(Higher is better)
Requires feature engineering
(Lower is better)
Depends on labels
(Lower is better)
Rule engine Low N/A N/A Low
Decision tree Low Low High High
Graph model High High Low Low

Table 1. Graph vs. common fraud detection methods.

Decision tree-based models have been dominating fraud detection and Kaggle competitions for structured or tabular data in the past few years. With that said, the performance of a tree-based model is highly dependent on the quality of labels and feature engineering, which is often hard to obtain in real life. In addition, it usually does not work well in unknown fraud which has not been seen in the labels.

On the other hand, a graph-based model requires little amount of feature engineering and it is applicable to unknown fraud detection with less dependence on labels, because it utilises the structural correlations on the graph.

In particular, fraudsters tend to show strong correlations on a graph, because they have to share physical properties such as personal identities, phone devices, Wi-Fi routers, delivery addresses, and so on, to reduce cost and maximise revenue as shown in Figure 2 (left). An example of such strong correlations is shown in Figure 2 (right), where the entities on the graph are densely connected, and the known fraudsters are highlighted in red. Those strong correlations on the graph are the key reasons that make the graph based approach a sustainable foundation for various fraud detection tasks.

Figure 2. Fraudsters tend to share physical properties to reduce cost (left), and they are densely connected as shown on a graph (right).

Semi-supervised graph learning

Unlike traditional decision tree-based models, the graph-based machine learning model can utilise the graph’s correlations and achieve great performance even with few labels. The semi-supervised Graph Convolutional Network model has been extremely popular in recent years 1. It has proven its success in many fraud detection tasks across industries, for example, e-commerce fraud, financial fraud, internet traffic fraud, etc.
We apply the Relational Graph Convolutional Network (RGCN) 2 for fraud detection in Grab’s ecosystem. Figure 3 shows the overall architecture of RGCN. It takes a graph as input, and the graph passes through several graph convolutional layers to get node embeddings. The final layer outputs a fraud probability for each node. At each graph convolutional layer, the information is propagated along the neighbourhood nodes within the graph, that is nodes that are close on the graph are similar to each other.

Fig 3. A semi-supervised Relational Graph Convolutional Network model.

We train the RGCN model on a graph with millions of nodes and edges, where only a few percentages of the nodes on the graph have labels. The semi-supervised graph model has little dependency on the labels, which makes it a robust model for tackling various types of unknown fraud.

Figure 4 shows the overall performance of the RGCN model. On the left is the Receiver Operating Characteristic (ROC) curve on the label dataset, in particular, the Area Under the Receiver Operating Characteristic (AUROC) value is close to 1, which means the RGCN model can fit the label data quite well. The right column shows the low dimensional projections of the node embeddings on the label dataset. It is clear that the embeddings of the genuine passenger are well separated from the embeddings of the fraud passenger. The model can distinguish between a fraud and a genuine passenger quite well.

Fig 4. Left: ROC curve of the RGCN model on the label dataset.
Right: Low dimensional projections of the graph node embeddings.

Finally, we would like to share a few tips that will make the RGCN model work well in practice.

  • Use less than three convolutional layers: The node feature will be over-smoothed if there are many convolutional layers, that is all the nodes on the graph look similar.
  • Node features are important: Domain knowledge of the node can be formulated as node features for the graph model, and rich node features are likely to boost the model performance.

Graph explainability

Unlike other deep network models, graph neural network models usually come with great explainability, that is why a user is classified as fraudulent. For example, fraudulent accounts are likely to share hardware devices and form dense clusters on the graph, and those fraud clusters can be easily spotted on a graph visualiser 3.

Figure 5 shows an example where graph visualisation helps to explain the model prediction scores. The genuine passenger with a low RGCN score does not share devices with other passengers, while the fraudulent passenger with a high RGCN score shares devices with many other passengers, that is, dense clusters.

Figure 5. Upper left: A genuine passenger with a low RGCN score has no device sharing with other passengers. Bottom right: A fraudulent user with a high RGCN score shares devices with many other passengers.

Closing thoughts

Graphs provide a sustainable foundation for combating many different types of fraud risks. Fraudsters are evolving very fast these days, and the best traditional rules or models can do is to chase after those fraudsters given that a fraud pattern has already been discovered. This is suboptimal as the damage has already been done on the platform. With the help of graph models, we can potentially detect those fraudsters before any fraudulent activity has been conducted, thus reducing the fraud cost.

The graph structural information can significantly boost the model performance without much dependence on labels, which is often hard to get and might have a large bias in fraud detection tasks. We have shown that with only a small percentage of labelled nodes on the graph, our model can already achieve great performance.

With that said, there are also many challenges to making a graph model work well in practice. We are working towards solving the following challenges we are facing.

  • Feature initialisation: Sometimes, it is hard to initialise the node feature, for example, a device node does not carry many semantic meanings. We have explored self-supervised pre-training 4 to help the feature initialisation, and the preliminary results are promising.
  • Real-time model prediction: Realtime graph model prediction is challenging because real-time graph updating is a heavy operation in most cases. One possible solution is to do batch real-time prediction to reduce the overhead.
  • Noisy connections: Some connections on the graph are inherently noisy on the graph, for example, two users sharing the same IP address does not necessarily mean they are physically connected. The IP might come from a mobile network. One possible solution is to use the attention mechanism in the graph convolutional kernel and control the message passing based on the type of connection and node profiles.

Join us

Grab is the leading superapp platform in Southeast Asia, providing everyday services that matter to consumers. More than just a ride-hailing and food delivery app, Grab offers a wide range of on-demand services in the region, including mobility, food, package and grocery delivery services, mobile payments, and financial services across 428 cities in eight countries.

Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. If this mission speaks to you, join our team today!

References

  1. T. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in ICLR, 2017 

  2. Schlichtkrull, Michael, et al. “Modeling relational data with graph convolutional networks.” European semantic web conference. Springer, Cham, 2018. 

  3. Fujiao Liu, Shuqi Wang, et al.. “Graph Networks – 10X investigation with Graph Visualisations”. Grab Tech Blog. 

  4. Wang, Chen, et al.. “Deep Fraud Detection on Non-attributed Graph.” IEEE Big Data conference, PSBD, 2021. 

2022 Canadian Centre for Cyber Security Assessment Summary report available with 12 additional services

Post Syndicated from Naranjan Goklani original https://aws.amazon.com/blogs/security/2022-canadian-centre-for-cyber-security-assessment-summary-report-available-with-12-additional-services/

We are pleased to announce the availability of the 2022 Canadian Centre for Cyber Security (CCCS) assessment summary report for Amazon Web Services (AWS). This assessment will bring the total to 132 AWS services and features assessed in the Canada (Central) AWS Region, including 12 additional AWS services. A copy of the summary assessment report is available for review and download on demand through AWS Artifact.

The full list of services in scope for the CCCS assessment is available on the AWS Services in Scope page. The 12 new services are:

The CCCS is Canada’s authoritative source of cyber security expert guidance for the Canadian government, industry, and the general public. Public and commercial sector organizations across Canada rely on CCCS’s rigorous Cloud Service Provider (CSP) IT Security (ITS) assessment in their decisions to use cloud services. In addition, CCCS’s ITS assessment process is a mandatory requirement for AWS to provide cloud services to Canadian federal government departments and agencies.

The CCCS Cloud Service Provider Information Technology Security Assessment Process determines if the Government of Canada (GC) ITS requirements for the CCCS Medium cloud security profile (previously referred to as GC’s Protected B/Medium Integrity/Medium Availability [PBMM] profile) are met as described in ITSG-33 (IT security risk management: A lifecycle approach). As of November 2022, 132 AWS services in the Canada (Central) Region have been assessed by the CCCS and meet the requirements for the CCCS Medium cloud security profile. Meeting the CCCS Medium cloud security profile is required to host workloads that are classified up to and including the medium categorization. On a periodic basis, CCCS assesses new or previously unassessed services and reassesses the AWS services that were previously assessed to verify that they continue to meet the GC’s requirements. CCCS prioritizes the assessment of new AWS services based on their availability in Canada, and on customer demand for the AWS services. The full list of AWS services that have been assessed by CCCS is available on our Services in Scope for CCCS Assessment page.

To learn more about the CCCS assessment or our other compliance and security programs, visit AWS Compliance Programs. As always, we value your feedback and questions; you can reach out to the AWS Compliance team through the Contact Us page.

If you have feedback about this post, submit comments in the Comments section below. Want more AWS Security news? Follow us on Twitter.

Naranjan Goklani

Naranjan Goklani

Naranjan is a Security Audit Manager at AWS, based in Toronto (Canada). He leads audits, attestations, certifications, and assessments across North America and Europe. Naranjan has more than 13 years of experience in risk management, security assurance, and performing technology audits. Naranjan previously worked in one of the Big 4 accounting firms and supported clients from the financial services, technology, retail, ecommerce, and utilities industries.

How to detect security issues in Amazon EKS clusters using Amazon GuardDuty – Part 1

Post Syndicated from Marshall Jones original https://aws.amazon.com/blogs/security/how-to-detect-security-issues-in-amazon-eks-clusters-using-amazon-guardduty-part-1/

In this two-part blog post, we’ll discuss how to detect and investigate security issues in an Amazon Elastic Kubernetes Service (Amazon EKS) cluster with Amazon GuardDuty and Amazon Detective.

Amazon Elastic Kubernetes Service (Amazon EKS) is a managed service that you can use to run and scale container workloads by using Kubernetes in the AWS Cloud, which can help increase the speed of deployment and portability of modern applications. Amazon EKS provides secure, managed Kubernetes clusters on the AWS control plane by default. Kubernetes configurations such as pod security policies, runtime security, and network policies and configurations are specific for your organization’s use-case and securing them adequately would be a customer’s responsibility within AWS’ shared responsibility model.

Amazon GuardDuty can help you continuously monitor and detect suspicious activity related to AWS resources in your account. GuardDuty for EKS protection is a feature that you can enable within your accounts. When this feature is enabled, GuardDuty can help detect potentially unauthorized EKS activity resulting from misconfiguration of the control plane nodes or application.

In this post, we’ll walk through the events leading up to a real-world security issue that occurred due to EKS cluster misconfiguration, discuss how those misconfigurations could be used by a malicious actor, and how Amazon GuardDuty monitors and identifies suspicious activity throughout the EKS security event. In part 2 of the post, we’ll cover Amazon Detective investigation capabilities, possible remediation techniques, and preventative controls for EKS cluster related security issues.

Prerequisites

You must have AWS GuardDuty enabled in your AWS account in order to monitor and generate findings associated with an EKS cluster related security issue in your environment.

EKS security issue walkthrough

Before jumping into the security issue, it is important to understand how the AWS shared responsibility model applies to the Amazon EKS managed service. AWS is responsible for the EKS managed Kubernetes control plane and the infrastructure to deliver EKS in a secure and reliable manner. You have the ability to configure EKS and how it interacts with other applications and services, where you are responsible for making sure that secure configurations are being used.

The following scenario is based on a real-world observed event, where a malicious actor used Kubernetes compromise tactics and techniques to expose and access an EKS cluster. We use this example to show how you can use AWS security services to identify and investigate each step of this security event. For a security event in your own environment, the order of operations and the investigative and remediation techniques used might be different. The scenario is broken down into the following phases and associated MITRE ATT&CK tactics:

  • Phase 1 – EKS cluster misconfiguration
  • Phase 2 (Discovery) – Discovery of vulnerable EKS clusters
  • Phase 3 (Initial Access) – Credential access to obtain Kubernetes secrets
  • Phase 4 (Persistence) – Impact to persist unauthorized access to the cluster
  • Phase 5 (Impact) – Impact to manipulate resources for unauthorized activity

Phase 1 – EKS cluster misconfiguration

By default, when you provision an EKS cluster, the API cluster endpoint is set to public, meaning that it can be accessed from the internet. Despite being accessible from the internet, the endpoint is still considered secure because it requires all API requests to be authenticated by AWS Identity and Access Management (IAM) and then authorized by Kubernetes role-based access control (RBAC). Also, the entity (user or role) that creates the EKS cluster is automatically granted system:masters permissions, which allows the entity to modify the EKS cluster’s RBAC configuration.

This example scenario starts with a developer who has access to administer EKS clusters in an AWS account. The developer wants to work from their home network and doesn’t want to connect to their enterprise VPN for IAM role federation. They configure an EKS cluster API without setting up the proper authentication and authorization components. Instead, the developer grants explicit access to the system:anonymous user in the cluster’s RBAC configuration. (Alternatively, an unauthorized RBAC configuration could be introduced into your environment after a developer unknowingly installs a malicious helm chart from the internet without reviewing or inspecting it first.)

In Kubernetes anonymous requests, unauthenticated and unrejected HTTP requests are treated as anonymous access and are identified as a system:anonymous user belonging to a system:unauthenticated group. This means that any entity on the internet can access the cluster and make API requests that are permitted by the role. There aren’t many legitimate use cases for this type of activity, because it’s considered a best practice to use RBAC instead. Anonymous requests are primarily used for setting up health endpoints and custom authentication.

By monitoring EKS audit logs, GuardDuty identifies this activity and generates the finding Policy:Kubernetes/AnonymousAccessGranted, as shown in Figure 1. This finding informs you that a user on your Kubernetes cluster successfully created a ClusterRoleBinding or RoleBinding to bind the user system:anonymous to a role. This action enables unauthenticated access to the API operations permitted by the role.

Figure 1: Example GuardDuty finding for Kubernetes anonymous access granted

Figure 1: Example GuardDuty finding for Kubernetes anonymous access granted

Phase 2 (Discovery) – Discovery of vulnerable EKS clusters

Port scanning is a method that malicious actors use to determine if resources are publicly exposed, with open ports and known vulnerabilities. As an increasing number of open-source tools allows users to search for endpoints connected to the internet, finding these endpoints has become even easier. Security teams can use these open-source tools to their advantage by proactively scanning for and identifying externally exposed resources in their organization.

This brings us to the discovery phase of our misconfigured EKS cluster. The discovery phase is defined by MITRE as follows: “Discovery consists of techniques an adversary may use to gain knowledge about the system and internal network. These techniques help adversaries observe the environment and orient themselves before deciding how to act.”

By granting system:anonymous access to the EKS cluster in our example, the developer allowed requests from any public unauthenticated source. This can result in external web crawlers probing the cluster API, which can often happen within seconds of the system:anonymous access being granted. GuardDuty identifies this activity and generates the finding Discovery:Kubernetes/SuccessfulAnonymousAccess, as shown in Figure 2. This finding informs you that an API operation to discover resources in a cluster was successfully invoked by the system:anonymous user. Remember, all API calls made by system:anonymous are unauthenticated, in addition to /healthz and /version calls that are always unauthenticated regardless of the user identity, and any entity can make use of this user within the EKS cluster.

In the screenshot, under the Action section in the finding details, you can see that the anonymous user made a get request to “/”. This is a generic request that is not specific to a Kubernetes cluster, which may indicate that the crawler is not specifically targeting Kubernetes clusters. You can further see that the Status code is 200, indicating that the request was successful. If this activity is malicious, then the actor is now aware that there is an exposed resource.

Figure 2: Example GuardDuty finding for Kubernetes successful anonymous access

Figure 2: Example GuardDuty finding for Kubernetes successful anonymous access

Phase 3 (Initial Access) – Credential access to obtain Kubernetes secrets

Next, in this phase, you might start observing more targeted API calls for establishing initial access from unauthorized users. MITRE defines initial access as “techniques that use various entry vectors to gain their initial foothold within a network. Techniques used to gain a foothold include targeted spearphishing and exploiting weaknesses on public-facing web servers. Footholds gained through initial access may allow for continued access, like valid accounts and use of external remote services, or may be limited-use due to changing passwords.”

In our example, the malicious actor has established initial access for the EKS cluster which is evident in the next GuardDuty finding, CredentialAccess:Kubernetes/SuccessfulAnonymousAccess, as shown in Figure 3. This finding informs you that an API call to access credentials or secrets was successfully invoked by the system:anonymous user. The observed API call is commonly associated with the credential access tactic where an adversary is attempting to collect passwords, usernames, and access keys for a Kubernetes cluster.

You can see that in this GuardDuty finding, in the Action section, the Request uri is targeted at a Kubernetes cluster, specifically /api/v1/namespaces/kube-system/secrets. This request seems to be targeting the secrets management capabilities that are built into Kubernetes. You can find more information about this secrets management capability in the Kubernetes documentation.

Figure 3: Example GuardDuty finding for Kubernetes successful credential access from anonymous user

Figure 3: Example GuardDuty finding for Kubernetes successful credential access from anonymous user

Phase 4 (Persistence) – Impact to persist unauthorized access to the cluster

The next phase of this scenario is likely to be an impact in the EKS cluster to enable persistence by the malicious actor. MITRE defines impact as “techniques that adversaries use to disrupt availability or compromise integrity by manipulating business and operational processes.” Following the MITRE definitions, “Persistence consists of techniques that adversaries use to keep access to systems across restarts, changed credentials, and other interruptions that could cut off their access. Techniques used for persistence include any access, action, or configuration changes that let them maintain their foothold on systems, such as replacing or hijacking legitimate code or adding startup code.”

In the GuardDuty finding Impact:Kubernetes/SuccessfulAnonymousAccess, shown in Figure 4, you can see the Kubernetes user details and Action sections that indicate that a successful Kubernetes API call was made to create a ClusterRoleBinding by the system:anonymous username. This finding informs you that a write API operation to tamper with resources was successfully invoked by the system:anonymous user. The observed API call is commonly associated with the impact stage of an attack, when an adversary is tampering with resources in your cluster. This activity shows that the system:anonymous user has now created their own role to enable persistent access the EKS cluster. If the user is malicious, they can now access the cluster even if access is removed in the RBAC configuration for the system:anonymous user.

Figure 4 Example GuardDuty finding for Kubernetes successful credential change by anonymous user

Figure 4 Example GuardDuty finding for Kubernetes successful credential change by anonymous user

Phase 5 (Impact) – Impact to manipulate resources for unauthorized activity

The fifth phase of this scenario is where the unauthorized user is likely to focus on impact techniques in order to use the access for malicious purpose. MITRE says of the impact phase: “Techniques used for impact can include destroying or tampering with data. In some cases, business processes can look fine, but may have been altered to benefit the adversaries’ goals. These techniques might be used by adversaries to follow through on their end goal or to provide cover for a confidentiality breach.” Typically, once a malicious actor has access into a system, they will introduce malware to the system to manipulate the compromised resource and possibly also other resources.

With the introduction of GuardDuty Malware Protection, when an Amazon Elastic Compute Cloud (Amazon EC2) or container-related GuardDuty finding that indicates potentially suspicious activity is generated, an agentless scan on the volumes will initiate and detect the presence of malware. Existing GuardDuty customers need to enable Malware Protection, and for new customers this feature is on by default when they enable GuardDuty for the first time. Malware Protection comes with a 30-day free trial for both existing and new GuardDuty customers. You can see a list of findings that initiates a malware scan in the GuardDuty User Guide.

In this example, the malicious actor now uses access to the cluster to perform unauthorized cryptocurrency mining. GuardDuty monitors the DNS requests from the EC2 instances used to host the EKS cluster. This allows GuardDuty to identify a DNS request made to a domain name associated with a cryptocurrency mining pool, and generate the finding CryptoCurrency:EC2/BitcoinTool.B!DNS, as shown in Figure 5.

Figure 5: Example GuardDuty finding for EC2 instance querying bitcoin domain name

Figure 5: Example GuardDuty finding for EC2 instance querying bitcoin domain name

Because this is an EC2 related GuardDuty finding and GuardDuty Malware Protection is enabled in the account, GuardDuty then conducts an agentless scan on the volumes of the EC2 instance to detect malware. If the scan results in a successful detection of one or more malicious files, another GuardDuty finding for Execution:EC2/MaliciousFile is generated, as shown in Figure 6.

Figure 6: Example GuardDuty finding for detection of a malicious file on EC2

Figure 6: Example GuardDuty finding for detection of a malicious file on EC2

The first GuardDuty finding detects crypto mining activity, while the proceeding malware protection finding provides context on the malware associated with this activity. This context is very valuable for the incident response process.

Conclusion

In this post, we walked you through each of the five phases where we outlined how an initial misconfiguration could result in a malicious actor gaining control of EKS resources within an AWS account and how GuardDuty is able to continually monitor and detect the progression of the security event. As previously stated, this is just one example where a misconfiguration in an EKS cluster could result in a security event.

Now that you have a good understanding of GuardDuty capabilities to continuously monitor and detect EKS security events, you will need to establish processes and procedures to enable your security team to investigate these events. You can enable Amazon Detective to help accelerate your security team’s mean time to respond (MTTR) by providing an efficient mechanism to analyze, investigate, and identify the root cause of security events. Follow along in part 2 of this series, How to investigate and take action on an Amazon EKS cluster related security issue with Amazon Detective, where we’ll cover techniques you can use with Amazon Detective to identify impacted EKS resources in your AWS account, possible remediation actions to take on the cluster, and preventative controls you can implement.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a thread on Amazon GuardDuty re:Post.

Want more AWS Security news? Follow us on Twitter.

Author

Marshall Jones

Marshall is a worldwide senior security specialist solutions architect at AWS. His background is in AWS consulting and security architecture, focused on a variety of security domains including edge, threat detection, and compliance. Today, he helps enterprise customers adopt and operationalize AWS security services to increase security effectiveness and reduce risk.

Jonathan Nguyen

Jonathan Nguyen

Jonathan is a shared delivery team senior security consultant at AWS. His background is in AWS security, with a focus on threat detection and incident response. He helps enterprise customers develop a comprehensive AWS security strategy, deploy security solutions at scale, and train customers on AWS security best practices.

Manuel Martinez Arizmendi

Manuel Martinez Arizmendi

Manuel works a Security Engineer at Amazon Detective providing new security investigation capabilities to AWS customers. Based on Boston,MA and originally from Madrid, Spain, when he’s not at work, he enjoys playing and watching soccer, playing videogames, and hanging out with his friends.

Considerations for security operations in the cloud

Post Syndicated from Stuart Gregg original https://aws.amazon.com/blogs/security/considerations-for-security-operations-in-the-cloud/

Cybersecurity teams are often made up of different functions. Typically, these can include Governance, Risk & Compliance (GRC), Security Architecture, Assurance, and Security Operations, to name a few. Each function has its own specific tasks, but works towards a common goal—to partner with the rest of the business and help teams ship and run workloads securely.

In this blog post, I’ll focus on the role of the security operations (SecOps) function, and in particular, the considerations that you should look at when choosing the most suitable operating model for your enterprise and environment. This becomes particularly important when your organization starts to adapt and operate more workloads in the cloud.

Operational teams that manage business processes are the backbone of organizations—they pave the way for efficient running of a business and provide a solid understanding of which day-to-day processes are effective. Typically, these processes are defined within standard operating procedures (SOPs), also known as runbooks or playbooks, and business functions are centralized around them—think Human Resources, Accounting, IT, and so on. This is also true for cybersecurity and SecOps, which typically has operational oversight of security for the entire organization.

Teams adopt an operating model that inherently leans toward a delegated ownership of security when scaling and developing workloads in the cloud. The emergence of this type of delegation might cause you to re-evaluate your currently supported model, and when you do this, it’s important to understand what outcome you are trying to get to. You want to be able to quickly respond to and resolve security issues. You want to help application teams own their own security decisions. You also want to have centralized visibility of the security posture of your organization. This last objective is key to being able to identify where there are opportunities for improvement in tooling or processes that can improve the operation of multiple teams.

Three ways of designing the operating model for SecOps are as follows:

  • Centralized – A more traditional model where SecOps is responsible for identifying and remediating security events across the business. This can also include reviewing general security posture findings for the business, such as patching and security configuration issues.
  • Decentralized – Responsibility for responding to and remediating security events across the business has been delegated to the application owners and individual business units, and there is no central operations function. Typically, there will still be an overarching security governance function that takes more of a policy or principles view.
  • Hybrid – A mix of both approaches, where SecOps still has a level of responsibility and ownership for identifying and orchestrating the response to security events, while the responsibility for remediation is owned by the application owners and individual business units.

As you can see from these descriptions, the main distinction between the different models is in the team that is responsible for remediation and response. I’ll discuss the benefits and considerations of each model throughout this blog post.

The strategies and operating models that I talk about throughout this blog post will focus on the role of SecOps and organizations that operate in the cloud. It’s worth noting that these operating models don’t apply to any particular technology or cloud provider. Each model has its own benefits and challenges to consider; overall, you should aim to adopt an operating model that gets to the best business outcome, while managing risk and providing a path for continuous improvement.

Background: the centralized model

As you might expect, the most familiar and well-understood operating model for SecOps is a centralized one. Traditionally, SecOps has developed gradually from internal security staff who have a very good understanding of the mostly static on-premises infrastructure and corporate assets, such as employee laptops, servers, and databases.

Centralizing in this way provides organizations with a familiar operating model and structure. Over time, operating in this model across an industry has allowed teams to develop reliable SOPs for common security events. Analysts who deal with these incidents have a good understanding of the infrastructure, the environment, and the steps that are needed to resolve incidents. Every incident gives opportunities to update the SOPs and to share this knowledge and the lessons learned with the wider industry. This continuous feedback cycle has provided benefits to SecOps teams for many years.

When security issues occur, understanding the division of responsibility between the various teams in this model is extremely important for quick resolution and remediation. The Responsibility Assignment Matrix, also known as the RACI model, has defined roles—Responsible, Accountable, Consulted, and Informed. Utilizing a model like this will help align each employee, department, and business unit so that they are aware of their role and contact points when incidents do occur, and can use defined playbooks to quickly act upon incidents.

The pressure can be high during a security event, and incidents that involve production systems carry additional weight. Typically, in a centralized model, security events flow into a central queue that a security analyst will monitor. A common approach is the Security Operations Center (SOC), where events from multiple sources are displayed on screens and also trigger activity in the queue. Security incidents are acted upon by an experienced team that is well versed in SOPs and understands the importance of time sensitivity when dealing with such incidents. Additionally, a centralized SecOps team usually operates in a 24/7 model, which might be achieved by having teams in multiple time zones or with help from an MSSP (Managed Security Service Provider). Whichever strategy is followed, having experienced security analysts deal with security incidents is a great benefit, because experience helps to ensure efficient and thorough remediation of issues.

So, with context and background set—how does a centralized SOC look and feel when it operates in the cloud, and what are its challenges?

Centralized SOC in the cloud: the advantages

Cloud providers offer many solutions and capabilities for SOCs that operate in a centralized model. For example, you can monitor your organization’s cloud security posture as a whole, which allows for key performance indicator (KPI) benchmarking, both internally and industry wide. This can then help your organization target security initiatives, training, and awareness on lower-scoring areas.

Security orchestration, automation, and response (SOAR) is a phrase commonly used across the security industry, and the cloud unlocks this capability. Combining both native and third-party security services and solutions with automation facilitates quick resolution of security incidents. The use of SOAR means that only incidents that need human intervention are actually reviewed by the analysts. After investigation, if automation can be introduced on that alert, it’s quickly applied. Having a central place for automating alerts helps the organization to have a consistent and structured approach to the response for security events and gives analysts more time to focus on activities like threat hunting.

Additionally, such threat-hunting operations require a central security data lake or similar technology. As a result, the SecOps team helps to drive the centralization of data across the business, which is a traditional cybersecurity function.

Centralized SOC in the cloud: organizational considerations

Some KPIs that a traditional SOC would typically use are time to detect (TTD), time to acknowledge (TTA), and time to resolve (TTR). These have been good metrics that SecOps managers can use to understand and benchmark how well the SecOps team is performing, both internally and against industry benchmarks. As your organization starts to take advantage of the breadth and depth available within the cloud, how does this change the KPIs that you need to track? As stated earlier, the cloud makes it easier to track KPIs through increased visibility of your cloud footprint—although you should evaluate traditional KPIs to understand whether they still make sense to use. Some additional KPIs that should be considered are metrics that show increasing automation, reduction in human access, and the overall improvement in security posture.

Organizations should consider scaling factors for operational processes and capability in the centralized SOC model. Once benefits from adopting the cloud have been realized, organizations typically expand and scale up their cloud footprint aggressively. For a centralized SecOps team, this could cause a challenging battle between the wider business, which wants to expand, and the SOC, which needs the ability to fully understand and respond to issues in the environment. For example, most organizations will put together small proof of concepts (POCs) to showcase new architectures and their benefits, and these POCs may become available as blueprints for the wider organization to consume. When new blueprints are implemented, the centralized SecOps team should implement and rely on its automation capabilities to verify that the correct alerting, monitoring, and operational processes are in place.

Decentralization: all ownership with the application teams

Moving or designing workloads in the cloud provides organizations with many benefits, such as increased speed and agility, built-in native security, and the ability to launch globally in minutes. When looking at the decentralized model, business units should incorporate practices into their development pipelines to benefit from the security capabilities of the cloud. This is sometimes referred to as a shift left or DevSecOps approach—essentially building security best practices into every part of the development process, and as early as possible.

Placing the ownership of the SecOps function on the business units and application owners can provide some benefits. One immediate benefit is that the teams that create applications and architectures have first-hand knowledge and contextual awareness of their products. This knowledge is critical when security events occur, because understanding the expected behavior and information flows of workloads helps with quick remediation and resolution of issues. Having teams work on security incidents in the ways that best fit their operational processes can also increase speed of remediation.

Decentralization: organizational considerations

When considering the decentralized approach, there are some organizational considerations that you should be aware of:

Dedicated security analysts within a central SecOps function deal with security incidents day in and day out; they study the industry, have a keen eye on upcoming threats, and are also well versed in high-pressure situations. By decentralizing, you might lose the consistent, level-headed experience they offer during a security incident. Embedding security champions who have industry experience into each business unit can help ensure that security is considered throughout the development lifecycle and that incidents are resolved as quickly as possible.

Contextual information and root cause analysis from past incidents are vital data points. Having a centralized SecOps team makes it much simpler to get a broad view of the security issues affecting the whole organization, which improves the ability to take a signal from one business unit and apply that to other parts of the organization to understand if they are also vulnerable, and to help protect the organization in the future.

Decentralizing the SecOps responsibility completely can cause you to lose these benefits. As mentioned earlier, effective communication and an environment to share data is key to verifying that lessons learned are shared across business units—one way of achieving this effective knowledge sharing could be to set up a Cloud Center of Excellence (CCoE). The CCoE helps with broad information sharing, but the minimization of team hand-offs provided by a centralized SecOps function is a strong organizational mechanism to drive consistency.

Traditionally, in the centralized model, the SOC has 24/7 coverage of applications and critical business functions, which can require a large security staff. The need for 24/7 operations still exists in a decentralized model, and having to provide that capability in each application team or business unit can increase costs while making it more difficult to share information. In a decentralized model, having greater levels of automation across organizational processes can help reduce the number of humans needed for 24/7 coverage.

Blending the models: the hybrid approach

Most organizations end up using a hybrid operating model in one way or another. This model combines the benefits of the centralized and decentralized models, with clear responsibility and division of ownership between the business units and the central SecOps function.

This best-of-both-worlds scenario can be summarized by the statement “global monitoring, local response.” This means that the SecOps team and wider cybersecurity function guides the entire organization with security best practices and guardrails while also maintaining visibility for reporting, compliance, and understanding the security posture of the organization as a whole. Meanwhile, local business units have the tools, knowledge, and expertise available to confidently own remediation of security events for their applications.

In this hybrid model, you split delegation of ownership into two parts. First, the operational capability for security is centrally owned. This centrally owned capability builds upon the partnership between the application teams and the security organization, via the CCoE. This gives the benefits of consistency, tooling expertise, and lessons learned from past security incidents. Second, the resolution of day-to-day security events and security posture findings is delegated to the business units. This empowers the people closest to the business problem to own service improvement in ways that best suit that team’s way of working, whether that’s through ChatOps and automation, or through the tools available in the cloud. Examples of the types of events you might want to delegate for resolution are items such as patching, configuration issues, and workload-specific security events. It’s important to provide these teams with a well-defined escalation route to the central security organization for issues that require specialist security knowledge, such as forensics or other investigations.

A RACI is particularly important when you operate in this hybrid model. Making sure that there is a clear set of responsibilities between the business units and the SecOps team is crucial to avoid confusion when security incidents occur.

Conclusion

The cloud has the ability to unlock new capabilities for your organization. Increased security, speed, and agility and are just some of the benefits you can gain when you move workloads to the cloud. The traditional centralized SecOps model offers a consistent approach to security detection and response for your organization. Decentralization of the response provides application teams with direct exposure to the consequences of their design decisions, which can speed up improvement. The hybrid model, where application teams are responsible for the resolution of issues, can improve the time to fix issues while freeing up SecOps to continue their works. The hybrid operating model compliments the capabilities of the cloud, and enables application owners and business units to work in ways that best suit them while maintaining a high bar for security across the organization.

Whichever operating model and strategy you decide to embark on, it’s important to remember the core principles that you should aim for:

  • Enable effective risk management across the business
  • Drive security awareness and embed security champions where possible
  • When you scale, maintain organization-wide visibility of security events
  • Help application owners and business units to work in ways that work best for them
  • Work with application owners and business units to understand the cyber landscape

The cloud offers many benefits for your organization, and your security organization is there to help teams ship and operate securely. This confidence will lead to realized productivity and continued innovation—which is good for both internal teams and your customers.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Stuart Gregg

Stuart Gregg

Stuart enjoys providing thought leadership and being a trusted advisor to customers. In his spare time Stuart can be seen either eating snacks, running marathons or dabbling in the odd Ironman.

You can now assign multiple MFA devices in IAM

Post Syndicated from Liam Wadman original https://aws.amazon.com/blogs/security/you-can-now-assign-multiple-mfa-devices-in-iam/

At Amazon Web Services (AWS), security is our top priority, and configuring multi-factor authentication (MFA) on accounts is an important step in securing your organization.

Now, you can add multiple MFA devices to AWS account root users and AWS Identity and Access Management (IAM) users in your AWS accounts. This helps you to raise the security bar in your accounts and limit access management to highly privileged principals, such as root users. Previously, you could only have one MFA device associated with root users or IAM users, but now you can associate up to eight MFA devices of the currently supported types with root users and IAM users.

In this blog post, we review the current MFA features for IAM, share use cases for multiple MFA devices, and show you how to manage and sign in with the additional MFA devices for better resiliency and flexibility.

Overview of MFA for IAM

First, let’s recap some of the benefits and available MFA configurations for IAM.

The use of MFA is an important security best practice on AWS. With MFA, you have an additional layer of protection to help prevent unauthorized individuals from gaining access to your systems and data. MFA can help protect your AWS environments if a password associated with your root user or IAM user became compromised.

As a security best practice, AWS recommends that you avoid using root users or IAM users to manage access to your accounts. Instead, you should use AWS IAM Identity Center (successor to AWS Single Sign-On) to manage access to your accounts. You should only use root users for tasks that they are required for.

To help meet different customer needs, AWS supports three types of MFA devices for IAM, including FIDO security keys, virtual authenticator applications, and time-based one-time password (TOTP) hardware tokens. You should select the device type that aligns with your security and operational requirements. You can associate different types of MFA devices with an IAM principal.

Use cases for multiple MFA devices

There are several use cases in which associating multiple MFA devices with an IAM principal is beneficial to the security and operational efficiency of your organization, such as the following:

  • In the event of a lost, stolen, or inaccessible MFA device, you can use one of the remaining MFA devices to access the account without performing the AWS account recovery procedure. If an MFA device is lost or stolen, it’s best practice to disassociate the lost or stolen device from the root users or IAM users that it’s associated with.
  • Geographically dispersed teams, or teams working remotely, can use hardware-based MFA to access AWS, without shipping a single hardware device or coordinating a physical exchange of a single hardware device between team members.
  • If the holder of an MFA device isn’t available, you can maintain access to your root users and IAM users by using a different MFA device associated with an IAM principal.
  • You can store additional MFA devices in a secure physical location, such as a vault or safe, while retaining physical access to another MFA device for redundancy.

How to manage multiple MFA devices in IAM

You can register up to eight MFA devices, in any combination of the currently supported MFA types, with your root users and IAM users.

To register an MFA device

  1. Sign in to the AWS Management Console and do the following:
    • For a root user, choose My Security Credentials.
    • For an IAM user, choose Security credentials.
  2. For Multi-factor authentication (MFA), choose Assign MFA device.
  3. Select the type of MFA device that you want to use and then choose Next.

With multiple MFA devices, you only need one MFA device to sign in to the console or to create a session through the AWS Command Line Interface (AWS CLI) as that principal.

You don’t need to make permissions changes in order for your organization to start taking advantage of multiple MFA devices. The root users and IAM users in your accounts that manage MFA devices today can use their existing IAM permissions to enable additional MFA devices.

Changes to Cloudtrail log entries

In support of this new feature, the identifier of the MFA device used will now be added to the console sign-in events of the root user and IAM user that use MFA. With these changes to AWS CloudTrail log entries, you can now view both the user and the MFA device used to authenticate to AWS. This provides better traceability and audibility for your accounts.

You can find this information in the MFAIdentifier field in CloudTrail, within additionalEventData. You don’t need to take action for this information to be logged. The following is a sample log from CloudTrail that includes the MFAIdentifier.

"additionalEventData": {
"LoginTo": "https://console.aws.amazon.com/console/home?state=hashArgs%23&isauthcode=true",
"MobileVersion": "No",
"MFAIdentifier": "arn:aws:iam::111122223333:mfa/root-account-mfa-device",
"MFAUsed": "YES"
}

The identifier of the MFA devices used for AWS CLI sessions with the sts:GetSessionToken action are logged in the requestParameters field.

    "requestParameters": {
"serialNumber": "arn:aws:iam::111122223333:mfa/root-account-mfa-device"
    }

Sign-in experience with multiple MFA devices

In this section, we’ll show you how to sign in to the console as an IAM principal with multiple MFA devices associated with it.

To authenticate as an IAM principal with multiple MFA devices

  1. Sign in to the IAM console as an IAM principal.
  2. Authenticate with the principal’s password.
  3. For Additional verification required, select the type of MFA device that you want to use to continue authenticating, and then choose Next:
    Figure 1: MFA device selection when authenticating to the console as an IAM user or root user with different types of MFA devices available

    Figure 1: MFA device selection when authenticating to the console as an IAM user or root user with different types of MFA devices available

  4. You will then be prompted to authenticate with the type of device that you selected.
    Figure 2: Prompt to authenticate with a FIDO security key

    Figure 2: Prompt to authenticate with a FIDO security key

Conclusion

In this blog post, you learned about the new multiple MFA devices feature in IAM, and how to set up and manage multiple MFA devices in IAM. Associating multiple MFA devices with your root users and IAM users can make it simpler for you to manage access to them. This feature is available now for AWS customers, except for customers operating in AWS GovCloud (US) Regions or in the AWS China Regions. For more information about how to configure multiple MFA devices on your root users and IAM users, see the documentation on MFA in IAM. There is no extra charge to use MFA devices in IAM.

AWS offers a free MFA security key to eligible AWS account owners in the United States. To determine eligibility and order a key, see the ordering portal.

If you have questions, post them in the AWS Identity and Access Management re:Post topic or reach out to AWS Support.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Liam Wadman

Liam Wadman

Liam is a Solutions Architect with the Identity Solutions team. When he’s not building exciting solutions on AWS or helping customers, he’s often found in the hills of British Columbia on his Mountain Bike. Liam points out that you cannot spell LIAM without IAM.

Khaled Zaky

Khaled Zaky

Khaled is a Sr. Product Manager – Technical at Amazon Web Services. He is responsible for AWS Identity products related to user authentication such as sign-in security and multi-factor authentication products. Khaled has deep industry experience in cloud computing and product management. He is passionate about building customer-centric products that make it easier and more secure for customers to use the cloud. Outside of work interests include teaching product management, road cycling, Taekwondo (Martial Arts) and DIY home renovations.