Tag Archives: Intermediate (200)

Top 10 security best practices for securing backups in AWS

2022-01-13 Ibukun Oyewumi

Post Syndicated from Ibukun Oyewumi original https://aws.amazon.com/blogs/security/top-10-security-best-practices-for-securing-backups-in-aws/

Security is a shared responsibility between AWS and the customer. Customers have asked for ways to secure their backups in AWS. This post will guide you through a curated list of the top ten security best practices to secure your backup data and operations in AWS. While this blog post focuses on backup data and operations in AWS Backup service, the recommended security best practices can be leveraged by organizations that utilize other backup solutions, such as backup tools from the AWS Marketplace.

Since security practices constantly evolve to mitigate new risks, it’s important that you conduct regular risk assessments to determine the applicability of security controls, and implement multiple layers of controls to mitigate risks to your data.

#1 – Implement a backup strategy

A comprehensive backup strategy is an essential part of an organization’s data protection plan to withstand, recover, and reduce any impact that might be sustained due to a security event. You should create an extensive backup strategy that defines which data must be backed up, how often data must be backed up, and monitoring of backup and recovery tasks. When you develop a comprehensive strategy for backing up and restoring data, you should first identify interruptions that may occur, and their potential business impact.

Your objective should be building a recovery strategy that brings your workload back up or avoids downtime within the acceptable Recovery Time Objective (RTO) and Recovery Point Objective (RPO). RTO is the acceptable delay between the interruption of service and restoration of service, and RPO is the acceptable amount of time since the last data recovery point. You should consider a granular backup strategy that includes all of the following: continuous backup cadence, Point-in-Time Recovery (PITR), file-level recovery, application data–level recovery, volume-level recovery, instance-level recovery, etc.

A well-designed backup strategy should include actions that can protect and recover your resources from ransomware, with detailed recovery requirements for your applications and their data dependencies. For example, while you establish preventive and detective controls to mitigate the risk of ransomware, you should also design the appropriate level of granularity for cross-region and/or cross-account copy and restore patterns, to ensure that administrators do not restore corrupt backup data in the event of a security event.

In some industries, when developing a backup strategy, you must also consider the regulations for data retention requirements. You should make sure your backup strategy is designed with the necessary retention requirements (per data classification level and/or resource type) sufficient to meet your regulatory needs.

Consult your security compliance teams to validate whether your backup resources and operations should be included or segmented from the scope of your compliance programs. In my experience as a PCI DSS Qualified Security Assessor (QSA), I’ve seen successful/more mature customers include backup and recovery as critical parts of their security program. This helps them understand where data is across their environment and appropriately define compliance scope.

Refer to Backup and Recovery Approaches Using AWS and the Reliability Pillar of the AWS Well-Architected Framework for architectural best practices for designing and operating reliable, secure, efficient, and cost-effective workloads in the cloud.

#2 – Incorporate backup in DR and BCP

Disaster recovery (DR) is the process of preparing, responding, and recovering from a disaster. It is an important part of your resiliency strategy, and concerns how your workload responds when a disaster strikes. A disaster could be a technical failure, human action, or natural event. A Business Continuity Plan (BCP) outlines how an organization intends to continue normal business operations during an unplanned disruption.

Your disaster recovery plan should be a subset of your organization’s business continuity plan (BCP) and you should incorporate AWS Backup procedures in your enterprise business continuity plan. For example, a security event that affects production data might require you to invoke a disaster recovery plan that fails over to backup data from another AWS Region. You should ensure that your employees are familiar with and have practiced using AWS Backup along with your organizational procedures, so that if disaster strikes, your organization can continue its normal operations with little or no service disruption.

#3 – Automate backup operations

Organizations should configure their backup plans and resource assignments to reflect their enterprise data protection policies. Automating and deploying backup policies or organization-wide backup plans allows you to standardize and scale your backup strategy. You can leverage AWS Organizations to centrally automate backup policies to implement, configure, manage, and govern backup activity across supported AWS resources by scheduling backup operations.

You should consider implementing infrastructure as code (IaC) and event-driven architecture as essential parts of your digital transformation and backup strategy, to improve productivity and govern infrastructure operations across multi-account environments. Automating backups allows you to reduce manual overhead from time-consuming configuration of your backups, minimizes the risk for errors, provides visibility on drift detection, and enhances backup policy compliance across multiple AWS workloads or accounts.

Implementing backup policies as code can help you meet data protection regulations, by configuring different requirements for your resource types, scaling your enterprise data protection strategy, and implementing lifecycle rules to specify how long before a recovery point either transitions to cold storage or is deleted, which can help optimize your costs.

When automating your backup operations, you can scale resource assignment options using AWS Tags and Resource IDs to automatically identify the AWS resources that store data for your business-critical applications and protect your data using immutable backups. This can help you prioritize security controls, such as access permissions and backup plans or policies.

#4 – Implement access control mechanisms

When thinking about security in the cloud, your foundational strategy should begin with a strong identity foundation to ensure a user has the right permissions to access data. Appropriate authentication and authorization can mitigate the risk of security events. The shared responsibility model requires AWS customers to implement access control policies. You can use AWS Identity and Access Management (IAM) service to create and manage access policies at scale.

When configuring access rights and permissions, you should implement the principle of least privilege by ensuring each user or system accessing your backup data or Vault is only given the permissions necessary to fulfill their job duties. Using AWS Backup, you should implement access control policies by setting access policies on backup vaults to protect your cloud workloads.

For example, implementing access control policies allows you to grant users access to create backup plans and on-demand backups, but still limit their ability to delete recovery points once they’ve been created. Using vault access policies, you can share a destination backup vault with a source AWS Account, user, or IAM role, as required by your business needs. Access policy can also allow you to share a backup vault with one or multiple accounts, or with your entire organization in AWS Organizations.

As you scale your workloads or migrate into AWS, you may need to centrally manage permissions to your backup vaults and operations. You should use service control policies (SCPs) to implement centralized control over the maximum available permissions for all accounts in your organization. This offers defense in depth, and ensures your users stay within the defined access control guidelines. To learn more, read how you can secure your AWS Backup data and operations using service control policies (SCPs).

To mitigate security risks such as unintended access to your backup resources and data, use AWS IAM Access Analyzer to identify any AWS Backup IAM role shared with an external entity such as AWS account, a root user, an IAM user or role, a federated user, an AWS service, an anonymous user, or other entity that you can use to create a filter.

#5 – Encrypt backup data and vault

Organizations increasingly need to improve their data security strategy, and may be required to meet data protection regulations as they scale in the cloud. The correct implementation of encryption methods can provide an additional layer of protection above foundational access control mechanisms providing a mitigation if your primary access control policies fail.

For example, if you configure overly permissive access control policies on your Backup data, your key management system or process can mitigate the maximum impact of a security event, since there are separate authorization mechanisms to access your data and encryption key which means that the backup data is only viewable as cipher text.

To get the most from AWS cloud encryption, you should encrypt data both in transit and at rest. To protect data in transit, AWS uses published API calls to access AWS Backup through the network using Transport Layer Security (TLS) protocol to provide encryption between you, your application and the Backup service. To protect data at rest, AWS offers cloud-native options of using AWS Key Managed System (KMS) or AWS CloudHSM which leverages Advanced Encryption Standard (AES) with 256-bit keys (AES-256), a strong industry-adopted algorithm for encrypting data. You should evaluate your data governance and regulatory requirements, and select the appropriate encryption service to encrypt your cloud data and backup vaults.

Encryption configuration differs depending on the resource type and backup operations across accounts or Regions. Certain resource types support the ability to encrypt your backups using a separate encryption key from the key used to encrypt the source resource. Since you are responsible for managing access controls to determine who can access your Backup data or vault encryption keys and under which conditions, you should use the policy language offered by AWS KMS to define access controls on keys. You can also use AWS Backup Audit Manager to confirm that your backup is properly encrypted.

To learn more, refer to the documentation on encryption for backups and backup copies.

AWS KMS multi-Region keys allows you to replicate keys from one Region into another. Multi-Region keys are designed to simplify encryption management when your encrypted data has to be copied into other Regions for disaster recovery. You should evaluate the need to implement multi-region KMS keys as part of your overall backup strategy.

#6 – Safeguard backups using immutable storage

Immutable storage allows organizations to write data in a Write Once Read Many (WORM) state. While in a WORM state, data can be written one time, read and used as often as needed after it has been committed or written to the storage medium. Immutable storage ensures data integrity is maintained and provides protection against deletes, overwrites, inadvertent and unauthorized access, ransomware compromise etc. Immutable storage offers an efficient mechanism to address potential security events with real impacts on your business operations.

Immutable storage can be used for better governance when paired with strong SCP restrictions, or can be used in a compliance WORM mode when the letter of the law (such as a legal hold) requires access to immutable data.

You can maintain data availability and integrity with AWS Backup Vault Lock to protect your backups* such that unauthorized entities cannot erase, alter or corrupt your customer or business data during the required retention period. AWS Backup Vault Lock helps you meet your organization’s data protection policies by preventing deletions by privileged users (including the AWS account root user), changes to your backup lifecycle settings, and updates that alter your defined retention period.

AWS Backup Vault Lock ensures immutability and adds an additional layer of defense that protects backups (recovery points) in your Backup Vaults, especially in highly- regulated industries with stringent integrity needs for backups and archives. AWS Backup Vault Lock makes sure your data is preserved along with a backup to recover from in case of unintended or malicious actions.

*The feature has not yet been assessed for compliance with the Securities and Exchange Commission (SEC) rule 17a-4(f) and the Commodity Futures Trading Commission (CFTC) in regulation 17 C.F.R. 1.31(b)-(c).

#7 – Implement backup monitoring and alerting

Backup jobs can fail. A failed job, such as backup, restore, or copy task, may have impact on subsequent steps in a process. When the initial backup job fails, there’s a high probability that other succeeding tasks will also fail. In such a scenario, you can best understand the course of events through monitoring and notification.

Enabling and configuring notifications to generate emails to monitor AWS Backup jobs gives you awareness of your backup activities, ensures you meet critical service-level agreements (SLAs), enhances your business-as-usual monitoring, and helps you meet compliance obligations. You can implement backup monitoring for your workloads by integrating AWS Backup with other AWS services and ticketing systems to perform automated investigation and escalation flows.

For example, use Amazon CloudWatch to track metrics, create alarms, and view dashboards, Amazon EventBridge to monitor AWS Backup processes and events, AWS CloudTrail to monitor AWS Backup API calls with detailed information on the time, source IP, users, and accounts making those calls, and Amazon Simple Notification Service (Amazon SNS) to subscribe to AWS Backup-related topics such as backup, restore, and copy events. Monitoring and alerting can provide organizational awareness for your backup jobs, which helps you respond to backup failures.

You can use AWS Backup Audit Manager to automatically generate evidence of your daily backup audit reports per account and Region. You can also scale your backup monitoring across multiple accounts by using a set of automation templates and dashboards (known as the backup observer solution) to obtain aggregated daily cross-account multi-Region AWS Backup reporting.

#8 – Audit backup configuration

Organizations should audit the compliance of AWS Backup policies against defined controls such as defined backup frequency. You should continuously and automatically track your backup activity and generate automatic reports to find and investigate backup operations or resources which are not compliant with your business requirements.

AWS Backup Audit Manager provides built-in, customizable, compliance controls that align with your business compliance and regulatory requirements. AWS Backup Audit Manager provides five backup governance control templates, including backup resources protected by backup plans, backup plan with a minimum frequency and minimum retention, etc. If you leverage infrastructure-as-code automation, you can use AWS Backup Audit Manager with AWS CloudFormation.

AWS Security Hub provides you with a comprehensive view of your security state in AWS and helps you check your environment against security best practices and industry standards such as AWS Foundational Security Best Practices controls. If you leverage AWS Security Hub within your cloud environment, we recommend you enable the AWS Foundational Security Best Practices, as it includes detective controls that can help with securing backups in AWS. The detective controls in AWS Backup Audit Manager and Security Hub are also mostly available as AWS managed rules in AWS Config.

#9 – Test data recovery capabilities

Ideally, any data stored as a backup must be able to be successfully restored when required. Your backup strategy must include testing your backups. A backup strategy is not effective if backed up data cannot be restored. You should regularly test your ability to find certain recovery points and restore them. While AWS Backup automatically copies tags from the resources it protects to the recovery points, tags are not copied from recovery points to the corresponding restored resources. To scale your inventory management and locate recovery points, you should consider retaining your tags on resources created by AWS Backup restore jobs, using AWS Backup events to trigger a tag replication process.

You can start your data recovery workflow by establishing data recovery patterns and then regularly test them. You should create a simple and repeatable process that allows you to perform continuous data recovery testing to increase confidence in your ability to recover backup data. For example, you can create a pattern to test a cross-account, cross-region restore operation from a central DR backup vault encrypted with a customer-managed KMS key to a source account backup vault encrypted with a different customer-managed KMS key.

If you don’t frequently test such restore operations, you may find that your assumptions on KMS encryption for cross-account, cross-region operations are incorrect. Oftentimes, the only backup recovery pattern that actually works is the path you test frequently. Through routine testing of supported backup resource types, you can spot early warnings that could potentially cause future disturbances and loss of critical data. If possible, maintain a limited but feasible number of recovery paths and patterns to prevent wasted storage space, optimize costs, and save time. It’s easier to fix the problem when a recovery test fails than losing valuable or critical data.

#10 – Incorporate backup in incident response plan

Security Incident Response Simulations (SIRS) are internal events that provide a structured opportunity to practice your incident response plan and procedures during a realistic scenario. It’s valuable to test your backup data and operations in creative SIRS activities to test yourself against the unexpected. This helps you validate your organizational readiness and develop comfort with the rare and unexpected. Your simulations must be realistic, and should involve cross-functional organizational teams required to respond to events.

Start with basic and easy simulation exercises, and work towards a full, complex event. For example, you can build a realistic model that consists of an Amazon Virtual Private Cloud and associated resources that simulate inadvertent overexposure of information or a potential data breach due to changes to policies and access control lists. Document lessons learned to evaluate how well your incident response plan worked, and to identify improvements that need to be made to future response procedures.

You can use AWS Backup to set up automated instance-level backups as AMIs and volume-level backups as snapshots across multiple AWS accounts. This can help your incident response team enhance their forensic process such as automated forensic disk collection, by providing a restore point that could reduce the scope and impact of potential security events such as ransomware.

Conclusion

In this blog post, I showed you the top ten security best practices and controls to protect your backup data in AWS. I encourage you to use these best practices to design and implement a backup and recovery strategy and architecture with multiple layers of controls that scales and achieves your business needs. To learn more about AWS Backup, refer to the AWS Backup documentation.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Backup forum or contact AWS Support.

2021 AWS security-focused workshops

2022-01-11 Temi Adebambo

Post Syndicated from Temi Adebambo original https://aws.amazon.com/blogs/security/2021-aws-security-focused-workshops/

Every year, Amazon Web Services (AWS) looks to help our customers gain more experience and knowledge of our services through hands-on workshops. In 2021, we unfortunately couldn’t connect with you in person as much as we would have liked, so we wanted to create and share new ways to learn and build on AWS. We built and published several security-focused workshops that help you learn how to use or configure new services and features securely. Workshops are hands-on learning modules designed to teach or introduce practical skills, techniques, or concepts you can use to solve business problems.

In this blog post, we highlight the newest AWS security-focused workshops below. There are also several other workshops that were developed before 2021; you can find them on AWS Workshops, AWS Security Workshops, and AWS Samples. Here’s the list:

Data Protection and Privacy

Workshop Title	Abstract
Data discovery and classification with Amazon Macie	In this workshop, get familiar with Amazon Macie and learn to scan and classify data in your Amazon Simple Storage Service (Amazon S3) buckets. Work with Macie (data classification) and AWS Security Hub (centralized security view) to see how data in your environment is stored, and to understand any changes in S3 bucket policies that may affect your security posture. Learn to create a custom data identifier and to create and scope data discovery and classification jobs in Macie. Finally, use Macie to filter and investigate the results from the scans you create.
Scaling your encryption at rest capabilities with AWS KMS	AWS makes it easy to protect your data with encryption. This hands-on workshop provides an opportunity to dive deep into encryption at rest options with AWS. Learn AWS server-side encryption with AWS Key Management Service (AWS KMS) for services such as Amazon S3, Amazon Elastic Block Store (Amazon EBS), and Amazon Relational Database Service (Amazon RDS). Also, learn best practices for using AWS KMS across multiple accounts and Regions and how to scale while optimizing for performance.
Store, retrieve, and manage sensitive credentials in AWS Secrets Manager	In this workshop, learn how to integrate AWS Secrets Manager in your development platform, backed by serverless applications. Work through a sample application, and use Secrets Manager to retrieve credentials as well as work with attribute-based access control using tags. Also, learn how to monitor the compliance of secrets and implement incident response workflows that will rotate the secret, restore the resource policy, alert the SOC, and deny access to the offender.
Building and operating a Private Certificate Authority on AWS	This workshop covers private certificate management on AWS, employing the concepts of least privilege, separation of duties, monitoring, and automation. Participants learn operational aspects of creating a complete certificate authority (CA) hierarchy, building a simple web application, and issuing private certificates. It also covers how job functions—including CA administrators, application developers, and security administrators—can follow the principle of least privilege to perform various functions associated with certificate management. Finally, learn about IoT certificates, code-signing, and certificate templates to enable all your use cases.
Amazon S3 security and access settings and controls	Amazon S3 provides many security and access settings to help you secure your data, controls that ensure that those settings remain in place, and features to help you audit those settings and controls. This workshop walks you through these Amazon S3 capabilities and scenarios, to help you apply them for different security requirements.
Redact data as needed using Amazon S3 Object Lambda	Amazon S3 Object Lambda works with your existing applications, and allows you to add your own code using AWS Lambda functions to automatically process and transform data from Amazon S3 before returning it to an application. This enables different views of the same object depending on user identity, such as restricting access to confidential information, or disallowing access to personally identifiable information (PII) data. In this workshop, learn how to use Amazon S3 Object Lambda to modify objects during GET requests, so you no longer need to store multiple views of the same document.
Using AWS Nitro Enclaves to process highly sensitive data	In this hands-on workshop, learn how to use AWS Nitro Enclaves to isolate highly-sensitive data from your users, applications, and third-party libraries on your Amazon Elastic Compute Cloud (Amazon EC2) instances. Explore AWS Nitro Enclaves, discuss common use cases, and build and run your own enclave. During this workshop, learn about enclave isolation, cryptographic attestation, enclave image files, local Vsock communication channels, common debugging scenarios, and the enclave lifecycle.
Ransomware prevention strategies in Amazon S3	Learn how to use the protective, detective and monitoring controls in AWS to protect your data in S3 from ransomware threats. Set up Amazon GuardDuty for S3 and AWS Identity and Access Management (IAM) Access Analyzer, and learn to read and respond to findings and create IAM invariants. Create a tiered storage approach to backup and recovery, and learn to use Amazon S3 Object Lock, versioning, and replication to provide immutable storage and protect against accidental or malicious deletion.

Governance, Risk, and Compliance

Operating securely in a multi-account environment

Operating multiple AWS accounts under an organization is how many users consume AWS Cloud services. In this workshop, learn how to build foundational security monitoring in multi-account environments. Walk through an initial setup of AWS Security Hub for centralized aggregation of findings across your AWS Organizations organization. Additionally, learn how to centralize Amazon GuardDuty findings, Amazon Detective functions, AWS Identity and Access Management (IAM) Access Analyzer findings (if available), AWS Config rule evaluations, and AWS CloudTrail logs into the central security monitoring account (security tools account). Finally, implement a service control policy (SCP) that denies the ability to disable these security controls.

Building remediation workflows to simplify compliance

Automation and simplification are key to managing compliance at scale. Remediation is one of the essential elements of simplifying and managing risk. In this workshop, see how to build a remediation workflow using AWS Config and AWS Systems Manager automation. Learn how this workflow can be deployed at scale and monitored with AWS Security Hub to oversee the entire organization and how to use AWS Audit Manager to easily access evidence of risk management.

Identity and Access Management

Integrating IAM Access Analyzer into a CI/CD pipeline

Want to analyze Identity and Access Management (IAM) policies at scale? Want to help your developers write secure IAM policies? This workshop provides you the hands-on opportunity to run IAM Access Analyzer policy validation on your AWS CloudFormation templates in a continuous integration/continuous deployment (CI/CD) pipeline.

Data perimeter workshop

In this workshop, learn how to create a data perimeter by building controls that allow access to data only from expected network locations and by trusted identities. The workshop consists of five modules, each designed to illustrate a different Identity and Access Management (IAM) or network control. Learn where and how to implement the appropriate controls based on different risk scenarios. Discover how to implement these controls as service control policies, identity- and resource-based policies, and Amazon Virtual Private Cloud (Amazon VPC) endpoint policies.

Network and Infrastructure Security

Build a Zero Trust architecture for service-to-service workloads on AWS	In this workshop, get hands-on experience implementing a Zero Trust architecture for service-to-service workloads on AWS. Learn how to use services such as Amazon API Gateway and Virtual Private Cloud (Amazon VPC) endpoints to integrate network and identity controls while using Amazon GuardDuty, Lambda, and Amazon DynamoDB to take advantage of native service controls. Learn how these services allow you to authorize specific flows between components to reduce lateral network mobility risk and improve the overall security posture of your workload.
Securing deployment of third-party ML models	Enterprise users adopting machine learning (ML) on AWS often look for prescriptive guidance on implementing security best practices, establishing governance, securing their ML models, and meeting compliance standards. Building a repeatable solution provides users with standardization and governance over what gets provisioned in their AWS account. In this workshop, learn steps you can take to secure third-party ML model deployments. We provide cloud infrastructure-as-code templates to automate the setup of a hardened Amazon SageMaker environment. These templates include private networking, VPC endpoints, end-to-end encryption, logging and monitoring, and enhanced governance and access controls through AWS Service Catalog.
Building Prowler into a QuickSight-powered AWS security dashboard	In this workshop, get hands-on experience with Prowler, AWS Security Hub, and Amazon QuickSight by building a custom security dashboard for the AWS environment. Using a multi-account deployment of Prowler integrated into Security Hub, learn to identify and analyze Prowler findings and integrate QuickSight to visualize the information. Discover how to get the most from QuickSight and Prowler with automatically created datasets.

Threat Detection and Incident Response

Integration, prioritization, and response with AWS Security Hub	This workshop is designed to get you familiar with AWS Security Hub, so you can better understand how to use it in your own AWS environment. This workshop has two sections. The first section demonstrates the features and functions of AWS Security Hub. The second section shows you how to use AWS Security Hub to import findings from different data sources, analyze findings so you can prioritize response work, and implement responses to findings to help improve your security posture.
Building an AWS incident response plan using Jupyter notebooks	This workshop guides you through building an incident response plan for your AWS environment using Jupyter notebooks. Walk through an easy-to-follow sample incident, using building blocks as a ready-to-use playbook in a Jupyter notebook. Then, follow simple steps to add additional programmatic and documented steps to your incident response plan.
Scaling threat detection and response on AWS	In this hands-on workshop, learn about several AWS services involved in threat detection and response as you walk through real-world threat scenarios. Learn about the threat detection capabilities of Amazon GuardDuty, Amazon Macie, and AWS Security Hub and the available response options. For each hands-on scenario, review methods to detect and respond to threats using the following services: AWS CloudTrail, Virtual Private Cloud (Amazon VPC) Flow Logs, Amazon CloudWatch Events, AWS Lambda, Amazon Inspector, Amazon GuardDuty, and AWS Security Hub.
Building incident response playbooks for AWS	In this workshop, learn how to develop incident response playbooks. Explore the incident response lifecycle, including preparation, detection and analysis, containment, eradication and recovery, and post-incident activity. To get the most out of this workshop, you should have advanced experience with AWS services and responsibilities aligned with incident response frameworks such as NIST SP 800-61 R2.

This list is representative of the security workshops created in 2021 to help customers on their journey in AWS. If you’d like to find more workshops, please go to AWS Workshops and select Security in the top navigation bar, or you can also check out AWS Security Workshops for a subset of workshops curated by AWS Security Specialists. We hope you enjoy these workshops!

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security news? Follow us on Twitter.

Monitor AWS resources created by Terraform in Amazon DevOps Guru using tfdevops

2022-01-05 Harish Vaswani

Post Syndicated from Harish Vaswani original https://aws.amazon.com/blogs/devops/monitor-aws-resources-created-by-terraform-in-amazon-devops-guru-using-tfdevops/

This post was written in collaboration with Kapil Thangavelu, CTO at Stacklet

Amazon DevOps Guru is a machine learning (ML) powered service that helps developers and operators automatically detect anomalies and improve application availability. DevOps Guru utilizes machine learning models, informed by years of Amazon.com and AWS operational excellence to identify anomalous application behavior (e.g., increased latency, error rates, resource constraints) and surface critical issues that could cause potential outages or service disruptions. DevOps Guru’s anomaly detectors can also proactively detect anomalous behavior even before it occurs, helping you address issues before they happen; insights provide recommendations to mitigate anomalous behavior.

When you enable DevOps Guru, you can configure its coverage to determine which AWS resources you want to analyze. As an option, you can define the coverage boundary by selecting specific AWS CloudFormation stacks. For each stack you choose, DevOps Guru analyzes operational data from the supported resources to detect anomalous behavior. See Working with AWS CloudFormation stacks in DevOps Guru for more details.

For Terraform users, Stacklet developed an open-source tool called tfdevops, which converts Terraform state to an importable CloudFormation stack, which allows DevOps Guru to start monitoring the encapsulated AWS resources. Note that tfdevops is not a tool to convert Terraform into CloudFormation. Instead, it creates the CloudFormation stack containing the imported resources that are specified in the Terraform module and enables DevOps Guru to monitor the resources in that CloudFormation stack.

In this blog post, we will explain how you can configure and use tfdevops, to easily enable DevOps Guru for your existing AWS resources created by Terraform.

Solution overview

tfdevops performs the following steps to import resources into Amazon DevOps Guru:

It translates terraform state into an AWS CloudFormation template with a retain deletion policy
It creates an AWS CloudFormation stack with imported resources
It enrolls the stack into Amazon DevOps Guru

For illustration purposes, we will use a sample serverless application that includes some of the components DevOps Guru and tfdevops supports. This application consists of an Amazon Simple Queue Service (SQS) queue, and an AWS Lambda function that processes messages in the SQS queue. It also includes an Amazon DynamoDB table that the Lambda function uses to persist or to read data, and an Amazon Simple Notification Service (SNS) topic to where the Lambda function publishes the results of its processing. The following diagram depicts our sample application:

The architecture diagram shows a sample application containing an Amazon SQS queue, an AWS Lambda function, an Amazon SNS topic and an Amazon DynamoDB table.

Prerequisites

Before getting started, make sure you have these prerequisites:

Install and authenticate the AWS CLI. You can authenticate with an AWS Identity and Access Management (IAM) user or an AWS Security Token Service (AWS STS) token.
Install Terraform.
Install pip.

Walkthrough

Follow these steps to monitor your AWS resources created with Terraform templates by using tfdevops:

Install tfdevops following the instructions on GitHub
Create a Terraform module with the resources supported by tfdevops
Deploy the Terraform to your AWS account to create the resources in your account

Below is a sample Terraform module to create a sample AWS Lambda function, an Amazon DynamoDB table, an Amazon SNS topic and an Amazon SQS queue.

# IAM role for the lambda function
resource "aws_iam_role" "lambda_role" {
 name   = "iam_role_lambda_function"
 assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
}

# IAM policy for logging from the lambda function
resource "aws_iam_policy" "lambda_logging" {

  name         = "iam_policy_lambda_logging_function"
  path         = "/"
  description  = "IAM policy for logging from a lambda"
  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*",
      "Effect": "Allow"
    }
  ]
}
EOF
}

# Policy attachment for the role
resource "aws_iam_role_policy_attachment" "policy_attach" {
  role        = aws_iam_role.lambda_role.name
  policy_arn  = aws_iam_policy.lambda_logging.arn
}

# Generates an archive from the source
data "archive_file" "default" {
  type        = "zip"
  source_dir  = "${path.module}/src/"
  output_path = "${path.module}/myzip/python.zip"
}

# Create a lambda function
resource "aws_lambda_function" "basic_lambda_function" {
  filename                       = "${path.module}/myzip/python.zip"
  function_name                  = "basic_lambda_function"
  role                           = aws_iam_role.lambda_role.arn
  handler                        = "index.lambda_handler"
  runtime                        = "python3.8"
  depends_on                     = [aws_iam_role_policy_attachment.policy_attach]
}

# Create a DynamoDB table
resource "aws_dynamodb_table" "sample_dynamodb_table" {
  name           = "sample_dynamodb_table"
  hash_key       = "sampleHashKey"
  billing_mode   = "PAY_PER_REQUEST"

  attribute {
    name = "sampleHashKey"
    type = "S"
  }
}

# Create an SQS queue
resource "aws_sqs_queue" "sample_sqs_queue" {
  name          = "sample_sqs_queue"
}

# Create an SNS topic
resource "aws_sns_topic" "sample_sns_topic" {
  name = "sample_sns_topic"
}

Run tfdevops to convert to CloudFormation template, deploy the stack and enable DevOps Guru

The following command generates a CloudFormation template locally from a Terraform state file:

tfdevops cfn -d ~/path/to/terraform/module --template mycfn.json --resources importable-ids.json

The following command deploys the CloudFormation template, creates a CloudFormation stack, imports resources, and activates DevOps Guru on the stack:

tfdevops deploy --template mycfn.json --resources importable-ids.json

After tfdevopsfinishes the deployment, you can already see the stack in the CloudFormation dashboard.

CloudFormation dashboard showing the stack, GuruStack, created by tfdevops

tfdevops imports the existing resources in the Terraform module into AWS CloudFormation. Note, that these are not new resources and would have no additional cost implications for the resources itself. See Bringing existing resources into CloudFormation management to learn more about importing resources into CloudFormation.

Resources view for GuruStack listing the imported resources in GuruStack

Your stack also appears at the DevOps Guru dashboard, indicating that DevOps Guru is monitoring your resources, and will alarm in case it detects anomalous behavior. Insights are co-related sequence of events and trails, grouped together to provide you with prescriptive guidance and recommendations to root-cause and resolve issues more quickly. See Working with insights in DevOps Guru to learn more about DevOps Guru insights.

Amazon DevOps Guru Dashboard displays the system health summary and system health overview of each CloudFormation stack. GuruStack is marked as healthy with 0 reactive insights and 0 proactive insights.

Note that when you use the tfdevops tool, it automatically enables DevOps Guru on the imported stack.

Amazon DevOps Guru Analyze resources displays the analysis coverage option selected. GuruStack is the selected stack for analysis

Clean up – delete the stack

CloudFormation Stacks menu showing GuruStack as selected. The stack can be deleted by pressing the Delete button.

Conclusion

This blog post demonstrated how to enable DevOps Guru to monitor your AWS resources created by Terraform. Using the Stacklet’s tfdevops tool, you can create a CloudFormation stack from your Terraform state, and use that to define the coverage boundary for DevOps Guru. With that, if your resources have unexpected or unusual behavior, DevOps Guru will notify you and provide prescriptive recommendations to help you quickly fix the issue.

If you want to experiment DevOps Guru, AWS offers a free tier for the first three months that includes 7,200 AWS resource hours per month for free on each resource group A and B. Also, you can Estimate Amazon DevOps Guru resource analysis costs from the AWS Management Console. This feature scans selected resources to automatically generate a monthly cost estimate. Furthermore, refer to Gaining operational insights with AIOps using Amazon DevOps Guru to learn more about how DevOps Guru helps you increase your applications’ availability, and check out this workshop for a hands-on walkthrough of DevOps Guru’s main features and capabilities. To learn more about proactive insights, see Generating DevOps Guru Proactive Insights for Amazon ECS. To learn more about anomaly detection, see Anomaly Detection in AWS Lambda using Amazon DevOps Guru’s ML-powered insights.

About the authors

How Meshify Built an Insurance-focused IoT Solution on AWS

2022-01-03 Grant Fisher

Post Syndicated from Grant Fisher original https://aws.amazon.com/blogs/architecture/how-meshify-built-an-insurance-focused-iot-solution-on-aws/

The ability to analyze your Internet of Things (IoT) data can help you prevent loss, improve safety, boost productivity, and even develop an entirely new business model. This data is even more valuable, with the ever-increasing number of connected devices. Companies use Amazon Web Services (AWS) IoT services to build innovative solutions, including secure edge device connectivity, ingestion, storage, and IoT data analytics.

This post describes Meshify’s IoT sensor solution, built on AWS, that helps businesses and organizations prevent property damage and avoid loss for the property-casualty insurance industry. The solution uses real-time data insights, which result in fewer claims, better customer experience, and innovative new insurance products.

Through low-power, long-range IoT sensors, and dedicated applications, Meshify can notify customers of potential problems like rapid temperature decreases that could result in freeze damage, or rising humidity levels that could lead to mold. These risks can then be averted, instead of leading to costly damage that can impact small businesses and the insurer’s bottom line.

Architecture building blocks

The three building blocks of this technical architecture are the edge portfolio, data ingestion, and data processing and analytics, shown in Figure 1.

Figure 1. Building blocks of Meshify’s technical architecture

I. Edge portfolio (EP)

Starting with the edge sensors, the Meshify edge portfolio covers two types of sensors:

LoRaWAN (Low power, long range WAN) sensor suite. This sensor provides the long connectivity range (> 1000 feet) and extended battery life (~ 5 years) needed for enterprise environments.
Cellular-based sensors. This sensor is a narrow band/LTE-M device that operates at LTE-M band 2/4/12 radio frequency and uses edge intelligence to conserve battery life.

II. Data ingestion (DI)

For the LoRaWAN solution, aggregated sensor data at the Meshify gateway is sent to AWS using AWS IoT Core and Meshify’s REST service endpoints. AWS IoT Core is a managed cloud platform that lets IoT devices easily and securely connect using multiple protocols like HTTP, MQTT, and WebSockets. It expands its protocol coverage through a new fully managed feature called AWS IoT Core for LoRaWAN. This gives Meshify the ability to connect LoRaWAN wireless devices with the AWS Cloud. AWS IoT Core for LoRaWAN delivers a LoRaWAN network server (LNS) that provides gateway management using the Configuration and Update Server (CUPS) and Firmware Updates Over-The-Air (FUOTA) capabilities.

III. Data processing and analytics (DPA)

Initial processing of the data is done at the ingestion layer, using Meshify REST API endpoints and the Rules Engine of AWS IoT Core. Meshify applies filtering logic to route relevant events to Amazon Managed Streaming for Apache Kafka (Amazon MSK). Amazon MSK is an AWS streaming data service that manages Apache Kafka infrastructure and operations, streamlining the process of running Apache Kafka applications on AWS.

Meshify’s applications then consume the events from Amazon MSK per the configured topic subscription. They enrich and correlate the events with the records with a managed service, Amazon Relational Database Service (RDS). These applications run as scalable containers on another managed service, Amazon Elastic Kubernetes Service (EKS), which runs container applications.

Bringing it all together – technical workflow

In Figure 2, we illustrate the technical workflow from the ingestion of field events to their processing, enrichment, and persistence. Finally, we use these events to power risk avoidance decision-making.

Figure 2. Technical workflow for Meshify IoT architecture

After installation, Meshify-designed LoRa sensors transmit information to the cloud through Meshify’s gateways. LoRaWAN capabilities create connectivity between the sensors and the gateways. They establish a low power, wide area network protocol that securely transmits data over a long distance, through walls and floors of even the largest buildings.
The Meshify Gateway is a redundant edge system, capable of sending sensor data from various sensors to the Meshify cloud environment. Once the LoRa sensor information is received by the Meshify Gateway, it converts the incoming radio frequency (RF) signals, which support faster transfer rate to Meshify’s cloud environment.
Data from the Meshify Gateway and sensors is initially processed at Meshify’s AWS IoT Core and REST service endpoints. These destinations for IoT streaming data help with the initial intake and introduce field data to the Meshify cloud environment. The initial ingestion points can scale automatically based upon the volume of sensor data received. This enables rapid scaling and ease of implementation.
After the data has entered the Meshify cloud environment, Meshify uses Amazon EKS and Amazon MSK to process the incoming data stream. Amazon MSK producer and consumer applications within the EKS systems enrich the data streams for the end users and systems to consume.
Producer applications running on EKS send processed events to the Amazon MSK service. These events include storing and retrieval of raw data, enriched data, and system-level data.
Consumer applications hosted on the EKS pods receive events per the subscribed Amazon MSK topic. Web, mobile, and analytic applications enrich and use these data streams to display data to end users, business teams, and systems operations.
Processed events are persisted in Amazon RDS. The databases are used for reporting, machine learning, and other analytics and processing services.

Building a scalable IoT solution

Meshify first began work on the Meshify sensors and hosted platform in 2012. In the ensuing decade, Meshify has successfully created a platform to auto-scale upon demand with steady, predictable performance. This gave Meshify both the ability to use only the resources needed, and still have the capacity to handle unexpected voluminous data.

As the platform scaled, so did the volume of sensor data, operations and diagnostics data, and metadata from installations and deployments. Building an end-to-end data pipeline that integrates these different data sources and delivers co-related insights at low latency was time well spent.

Conclusion

In this post, we’ve shown how Meshify is using AWS services to power their suite of IoT sensors, software, and data platforms. Meshify’s most important architectural enhancements have involved the introduction of managed services, notably AWS IoT Core for LoRaWAN and Amazon MSK. These improvements have primarily focused on the data ingestion, data processing, and analytics stages.

Meshify continues to power the data revolution at the intersection of IoT and insurance at the edge, using AWS. Looking ahead, Meshify and HSB are excited at the prospect of scaling the relationship with AWS from cloud computing to the world of edge devices.

Learn more about how emerging startups and large enterprises are using AWS IoT services to build differentiated products.

Meshify is an IoT technology company and subsidiary of HSB, based in Austin, TX. Meshify builds pioneering sensor hardware, software, and data analytics solutions that protect businesses from property and equipment damage.

Define application boundary using AWS resources tags in Amazon DevOps Guru

2021-12-30 Suneel Joshi

Post Syndicated from Suneel Joshi original https://aws.amazon.com/blogs/devops/define-application-boundary-using-aws-resources-tags-in-amazon-devops-guru/

Amazon DevOps Guru is an ML powered service that makes it easy to improve an application’s operational performance and availability. By analyzing application metrics, logs, events and traces, DevOps Guru identifies behaviors that deviate from normal operating patterns and creates insights that you can use to improve your application.

At re:Invent 2021, we announced a new tagging feature in DevOps Guru. This feature allows you to organize resources into logical applications, using AWS resources tags so that you can have more control over how applications are defined. Well-defined applications enable DevOps Guru to group related anomalies together to better identify problems and to provide more meaningful recommendations. A tag is a label consisting of a user-defined key and a value. Previously, the coverage boundary consisted of an entire AWS account or specific resources defined by AWS CloudFormation stacks.

Getting Started

Define Resources to analyze using AWS resources tags

An AWS resource tag is a label that consists of a key and a value. A key-value pair can create useful grouping of resources into different applications. For DevOps Guru, you specify one tag key across all your applications. Resources with the same tag value are grouped together into a logical application. The tag key needs to be prefixed with the string “devops-guru-”. Note that the prefix string is not case sensitive. The tag value can be any value you define. The next section describes how you can use tag values to define coverage boundary for your applications.

You can add tags to your resources using the AWS service to which each resource belongs, or use the Tag Editor. To manage tags using your resource’s service, you can use the console, AWS CLI or SDK of the service.

Define Application boundary using AWS resources tag values

For DevOps Guru, we define an application as a group of instantiated AWS resources (Amazon EC2, AWS Lambda, Amazon RDS, etc.) that your workload is running on. You assign the same tag value to all resources that make up your application. DevOps Guru will analyze each resource separately, and will also look at metrics and events across all resources in your application to detect anomalies and generate insights. For example, see the diagram below.

You can have one tag key across all your applications. For each application, assign a different tag value. All resources that make up an application should have the same tag value.

App 1 consists of 2 different resources for a database application – an EC2 instance and a database instance. Assigning the same tag value of RDS to both of the resources. I have another serverless application in App 2, which has a Lambda function and a DynamoDB instance. I assign a different tag value of serverless-app-1 to both of the App 2 resources.

Example Test Scenario

I am going to create a test scenario with an application server running in an EC2 instance. The application server is connected to an Aurora MySQL-Compatible database instance. I will instrument my application to introduce a misbehaving SQL query to create a performance anomaly.

In my example below, I tagged my EC2 instance and database instance with the tag value of RDS. I am interested in detecting performance issues in my Database instance and I want DevOps Guru to provide recommendations to fix those issues.

Manage DevOpsGuru Analysis Coverage

Next, I define the coverage boundary in DevOps Guru Console. In the Settings options in navigation pane, I select Analyzed resources and choose Edit.

To define coverage boundary in the DevOps Guru Console, select the “Settings” option in navigation pane, select “Analyzed resources“ and choose Edit.

Next, I select the “devops-guru-applications” as tag key from the dropdown menu. I am going to select RDS as the tag value, since I am interested in looking at performance issues in my Amazon Aurora database instance.

In the “Edit analyzed resources” screen, choose your tag key in the “Tag key” dropdown menu. Next, press the radio button for choosing specific tag values and then select the tag values to define the coverage boundary of your application.

Filter insights by tags

Next, I created my test scenario. Once DevOps Guru generated an insight, I am able to filter the insights by tag key or tag values. To display insights for my database instance, I select “Affected applications” from the search menu bar on insights page as shown below:

Insights generated by DevOps Guru can be filtered based on the tag values. Select “Affected applications” in the search menu bar in insights page.

Next, I select “Affected applications” as RDS in the above dropdown menu. Below is the Insight overview screen that gets displayed.

The metrics section of the Insights page provides a summary of the Anomaly detected. It also displays a graphical representation of the time window when the anomaly was detected as well as the detailed metric the anomaly was based upon.

The insights generated by DevOps Guru for my Amazon Aurora instances are enabled by Amazon DevOps Guru for RDS, a new feature we announced at re:Invent 2021. It allows developers to easily detect, diagnose, and resolve performance and operational issues in Amazon Aurora. For more information on Amazon DevOps Guru for RDS, see a related news blog written by my colleague, Marcia Villalba.

The insight summary indicates that there is high DB load, ten times above baseline. DevOps Guru for RDS uses anomaly detection on the database load (DB load) performance metric to detect issues. DB load is measured in units of Average Active Sessions (AAS). DB load measures the level of activity in your database, making it a great metric to understand the health of your database.

If you continue scrolling on the DevOps Guru for RDS analysis page, you can discover the cause for the problem and some recommendations to fix it. DevOps Guru for RDS detected there was a high load of wait events, and one SQL query was found to require further investigation. You can even see the exact SQL query if you click on the SQL digest IDs. The insight’s analysis and recommendation section is full of information on how to investigate further and fix the issue.

The easy-to-understand recommendations made by DevOps Guru for RDS means that as a DevOps engineer, you do not need to rely on a database administrator (DBA) or use any third party tools.

DevOps Guru for RDS provides specific recommendations to fix the performance issues detected. In this example, specific wait events contributing to high DB load were identified and specific SQL query ID was identified as a major contributor

Conclusion

AWS resources tags give you one more way to specify the resource analysis coverage boundary, in addition to existing methods of an entire AWS account or specific AWS CloudFormation stacks. AWS tags allows you to better isolate the applications you want DevOps Guru to analyze. In this post, we used AWS tags to define the coverage boundary for a database application. We reduced unrelated and unnecessary resource coverage from our analysis, thereby controlling our resource analysis costs. Visit the DevOps Guru documentation to learn more about how to use tags to identify resources in your DevOps Guru applications.

About the author

Automate Container Anomaly Monitoring of Amazon Elastic Kubernetes Service Clusters with Amazon DevOps Guru

2021-12-15 Rahul Sharad Gaikwad

Post Syndicated from Rahul Sharad Gaikwad original https://aws.amazon.com/blogs/devops/automate-container-anomaly-monitoring-of-amazon-elastic-kubernetes-service-clusters-with-amazon-devops-guru/

Observability in a container-centric environment presents new challenges for operators due to the increasing number of abstractions and supporting infrastructure. In many cases, organizations can have hundreds of clusters and thousands of services/tasks/pods running concurrently. This post will demonstrate new features in Amazon DevOps Guru to help simplify and expand the capabilities of the operator. The features include grouping anomalies by metric and container cluster to improve context and simplify access and support for additional Amazon CloudWatch Container Insight metrics. An example of these capabilities in action would be that Amazon DevOps Guru can now identify anomalies in CPU, memory, or networking within Amazon Elastic Kubernetes Service (EKS), notifying the operators and letting them more easily navigate to the affected cluster to examine the collected data.

Amazon DevOps Guru offers a fully managed AIOps platform service that lets developers and operators improve application availability and resolve operational issues faster. It minimizes manual effort by leveraging machine learning (ML) powered recommendations. Its ML models take advantage of the expertise of AWS in operating highly available applications for the world’s largest ecommerce business for over 20 years. DevOps Guru automatically detects operational issues, predicts impending resource exhaustion, details likely causes, and recommends remediation actions.

Solution Overview

In this post, we will demonstrate the new Amazon DevOps Guru features around cluster grouping and additionally supported Amazon EKS metrics. To demonstrate these features, we will show you how to create a Kubernetes cluster, instrument the cluster using AWS Distro for OpenTelemetry, and then configure Amazon DevOps Guru to automate anomaly detection of EKS metrics. A previous blog provides detail on the AWS Distro for OpenTelemetry collector that is employed here.

Prerequisites

Install eksctl for creating Amazon Elastic Kubernetes Service Cluster
Install kubectl for managing Amazon Elastic Kubernetes Cluster
Amazon Elastic Kubernetes Service(EKS)
AWS Distro for OpenTelemetry
Amazon DevOps Guru
Amazon Simple Notification Service(SNS)

EKS Cluster Creation

We employ the eksctl CLI tool to create an Amazon EKS. Using eksctl, you can provide details on the command line or specify a manifest file. The following manifest is used to create a single managed node using Amazon Elastic Compute Cloud (EC2), and this will be created and constrained to the specified Region via entry metadata/region and Availability Zones via the managedNodeGroups/availabilityZones entry. By default, this will create a new VPC with eight subnets.

# An example of ClusterConfig object using Managed Nodes
---
    apiVersion: eksctl.io/v1alpha5
    kind: ClusterConfig

    metadata:
      name: devopsguru-eks-cluster
      region: <SPECIFY_REGION_HERE>
      version: "1.21"

    availabilityZones: ["<FIRST_AZ>","<SECOND_AZ>"]
    managedNodeGroups:
      - name: managed-ng-private
        privateNetworking: true
        instanceType: t3.medium
        minSize: 1
        desiredCapacity: 1
        maxSize: 6
        availabilityZones: ["<SPECIFY_AVAILABILITY_ZONE(S)_HERE"]
        volumeSize: 20
        labels: {role: worker}
        tags:
          nodegroup-role: worker
    cloudWatch:
      clusterLogging:
        enableTypes:
          - "api"

To create an Amazon EKS cluster using eksctl and a manifest file, we use eksctl create as shown below. Note that this step will take 10 – 15 minutes to establish the cluster.

$ eksctl create cluster -f devopsguru-managed-node.yaml
2021-10-13 10:44:53 [i] eksctl version 0.69.0
…
2021-10-13 11:04:42 [✔] all EKS cluster resources for "devopsguru-eks-cluster" have been created
2021-10-13 11:04:44 [i] nodegroup "managed-ng-private" has 1 node(s)
2021-10-13 11:04:44 [i] node "<ip>.<region>.compute.internal" is ready
2021-10-13 11:04:44 [i] waiting for at least 1 node(s) to become ready in "managed-ng-private"
2021-10-13 11:04:44 [i] nodegroup "managed-ng-private" has 1 node(s)
2021-10-13 11:04:44 [i] node "<ip>.<region>.compute.internal" is ready
2021-10-13 11:04:47 [i] kubectl command should work with "/Users/<user>/.kube/config"

Once this is complete, you can use kubectl, the Kubernetes CLI, to access the managed nodes that are running.

$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
<ip>.<region>.compute.internal Ready <none> 76m v1.21.4-eks-033ce7e

AWS Distro for OpenTelemetry Collector Installation

We will use AWS Distro for OpenTelemetry Collector to extract metrics from a pod running in Amazon EKS. This will collect metrics within the Kubernetes cluster and surface them to Amazon CloudWatch. We start by defining a policy to allow access. The following information comes from the post here.

Attach the CloudWatchAgentServerPolicy IAM Policy to worker node

Open the Amazon EC2 console.
Select one of the worker node instances, and choose the IAM role in the description.
On the IAM role page, choose Attach policies.
In the list of policies, select the check box next to CloudWatchAgentServerPolicy. You can use the search box to find this policy.
Choose Attach policies.

Deploy AWS OpenTelemetry Collector on Amazon EKS

Next, you will deploy the AWS Distro for OpenTelemetry using a GitHub hosted manifest.

Deploy the artifact to the Amazon EKS cluster using the following command:

$ curl https://raw.githubusercontent.com/aws-observability/aws-otel-collector/main/deployment-template/eks/otel-container-insights-infra.yaml | kubectl apply -f -

View the resources in the aws-otel-eks namespace.

$ kubectl get pods -l name=aws-otel-eks-ci -n aws-otel-eks
NAME READY STATUS RESTARTS AGE
aws-otel-eks-ci-jdf2w 1/1 Running 0 107m

View Container Insight Metrics in Amazon CloudWatch

Access Amazon CloudWatch and select Metrics, All metrics to view the published metrics. Under Custom Namespaces, ContainerInsights is selectable. Under this, one can view metrics at the cluster, node, pod, namespace, and service granularity. The following example shows pod level metrics of CPU:

The AWS Console with Amazon Cloudwatch Container Insights Pod Level CPU Utilization.

Amazon Simple Notification Service

It is necessary to allow Amazon DevOps Guru access to Amazon SNS in order for Amazon SNS to publish events. During the setup process, an Amazon SNS Topic is created, and the following resource policy is applied:

{
    "Sid": "DevOpsGuru-added-SNS-topic-permissions",
    "Effect": "Allow",
    "Principal": {
        "Service": "region-id.devops-guru.amazonaws.com"
    },
    "Action": "sns:Publish",
    "Resource": "arn:aws:sns:region-id:topic-owner-account-id:my-topic-name",
    "Condition" : {
      "StringEquals" : {
        "AWS:SourceArn": "arn:aws:devops-guru:region-id:topic-owner-account-id:channel/devops-guru-channel-id",
        "AWS:SourceAccount": "topic-owner-account-id"
    }
  }
}

Amazon DevOps Guru

Amazon DevOps Guru can now be leveraged to monitor the Amazon EKS cluster and Managed Node Group. Select Amazon DevOps Guru, and select Get started as shown in the following figure to do this.

The Amazon DevOps Guru service via the AWS Console.

Once selected, the Get started console displays, letting you specify the IAM role for DevOps guru to access the appropriate resources.

The Get started dialog for Amazon DevOps Guru including instructions on how the service operates, IAM Role Permissions and Amazon DevOps Guru analysis coverage.

Under the Amazon DevOps Guru analysis coverage, Choose later is selected. This will let us specify the CloudFormation stacks to monitor. Select Create a new SNS topic, and provide a name. This will be used to collect notifications and allow for subscribers to then be notified. Select Enable when complete.

The Amazon DevOps Guru analysis coverage allowing the user to select all resources in a region or to choose later. In addition the image shows the dialog that requests the user specify an Amazon SNS topic for notification when insights occur.

On the Manage DevOps Guru analysis coverage, select Analyze all AWS resources in the specified CloudFormation stacks in this Region. Then, select the cluster and managed node group AWS CloudFormation stacks so that DevOps Guru can monitor Amazon EKS.

A dialog where the user is able to specify the AWS CloudFormation stacks in a region for analysis coverage. Two stacks are select including the eks cluster and eks cluster managed node group.

Once this is selected, the display will update indicating that two CloudFormation stacks were added.

Amazon DevOps Guru Settings including DevOps Guru analysis coverage and Amazon SNS notifications.

Amazon DevOps Guru will finally start analysis for those two stacks. This will take several hours to collect data and to identify normal operating conditions. Once this process is complete, the Dashboard will display that those resources have been analyzed, as shown in the following figure.

The completed analysis by DevOps guru of the two AWS Cloudformation stacks indicating a healthy status for both.

Enable Encryption on Amazon SNS Topic

The Amazon SNS Topic created by Amazon DevOps Guru will not enable encryption by default. It is important to enable this feature to encrypt notifications at rest. Go to Amazon SNS, select the topic that is created and then Edit topic. Open the Encryption dialog box and enable encryption as shown in the following figure, specifying an alias, or accepting the default.

The Encryption dialog for Amazon SNS topic when it is Edited.

Deploy Sample Application on Amazon EKS To Trigger Insights

You will employ a sample application that is part of the AWS Distro for OpenTelemetry Collector to simulate failure. Using the following manifest, you will deploy a sample application that has pod resource limits for memory and CPU shares. These limits are artificially low and insufficient for the pod to run. The pod will exceed memory and will be identified for eviction by Amazon EKS. When it is evicted, it will attempt to be redeployed per the manifest requirement for a replica of one. In turn, this will repeat the process and generate memory and pod restart errors in Amazon CloudWatch. For this example, the deployment was left for over an hour, thereby causing the pod failure to repeat numerous times. The following is the manifest that you will create on the filesystem.

kind: Deployment
apiVersion: apps/v1
metadata:
  name: java-sample-app
  namespace: aws-otel-eks
  labels:
    name: java-sample-app
spec:
  replicas: 1
  selector:
    matchLabels:
      name: java-sample-app
  template:
    metadata:
      labels:
        name: java-sample-app
    spec:
      containers:
        - name: aws-otel-emitter
          image: aottestbed/aws-otel-collector-java-sample-app:0.9.0
          resources:
            limits:
              memory: "128Mi"
              cpu: "200m"
          ports:
          - containerPort: 4567
          env:
          - name: OTEL_OTLP_ENDPOINT
            value: "localhost:4317"
          - name: OTEL_RESOURCE_ATTRIBUTES
            value: "service.namespace=AWSObservability,service.name=CloudWatchEKSService"
          - name: S3_REGION
            value: "us-east-1"
          imagePullPolicy: Always

To deploy the application, use the following command:

$ kubectl apply -f <manifest file name>
deployment.apps/java-sample-app created

Scenario: Improved context from DevOps Guru Container Cluster Grouping and Increased Metrics

For our scenario, Amazon DevOps Guru is monitoring additional Amazon CloudWatch Container Insight Metrics for EKS. The following figure shows the flow of information and eventual notification of the operator, so that they can examine the Amazon DevOps Guru Insight. Starting at step 1, the container agent (AWS Distro for OpenTelemetry) forwards container metrics to Amazon CloudWatch. In step 2, Amazon DevOps Guru is continually consuming those metrics and performing anomaly detection. If an anomaly is detected, then this generates an Insight, thereby triggering Amazon SNS notification as shown in step 3. In step 4, the operators access Amazon DevOps Guru console to examine the insight. Then, the operators can leverage the new user interface capability displaying which cluster, namespace, and pod/service is impacted along with correlated Amazon EKS metric(s).

New EKS Container Metrics in DevOps Guru

As part of the release, the following pod and node metrics are now tracked by DevOps Guru:

pod_number_of_container_restarts – number of times that a pod is restarted (e.g., image pull issues, container failure).
pod_memory_utilization_over_pod_limit – memory that exceeds the pod limit called out in resource memory limits.
pod_cpu_utilization_over_pod_limit – CPU shares that exceed the pod limit called out in resource CPU limits.
pod_cpu_utilization – percent CPU Utilization within an active pod.
pod_memory_utilization – percent memory utilization within an active pod.
node_network_total_bytes – total bytes over the network interface for the managed node (e.g., EC2 instance)
node_filesystem_utilization – percent file system utilization for the managed node (e.g., EC2 instance).
node_cpu_utilization – percent CPU Utilization within a managed node (e.g., EC2 instance).
node_memory_utilization – percent memory utilization within a managed node (e.g., EC2 instance).

Operator Scenario

The Kubernetes Operator in the following figure is informed of an insight via Amazon SNS. The Amazon SNS message content appears in the following code, showing the originator and information identifying the InsightDescription, InsightSeverity, name of the container metric, and the Pod / EKS Cluster:

{
 "AccountId": "XXXXXXX",
 "Region": "<REGION>",
 "MessageType": "NEW_INSIGHT",
 "InsightId": "ADFl69Pwq1Aa6M373DhU0zkAAAAAAAAABuZzSBHxeiNexxnLYD7Lhb0vuwY9hLtz",
 "InsightUrl": "https://<REGION>.console.aws.amazon.com/devops-guru/#/insight/reactive/ADFl69Pwq1Aa6M373DhU0zkAAAAAAAAABuZzSBHxeiNexxnLYD7Lhb0vuwY9hLtz",
 "InsightType": "REACTIVE",
 "InsightDescription": "ContainerInsights pod_number_of_container_restarts Anomalous In Stack eksctl-devopsguru-eks-cluster-cluster",
 "InsightSeverity": "high",
 "StartTime": 1636147920000,
 "Anomalies": [
 {
 "Id": "ALAGy5sIITl9e6i66eo6rKQAAAF88gInwEVT2WRSTV5wSTP8KWDzeCYALulFupOQ",
 "StartTime": 1636147800000,
 "SourceDetails": [
 {
 "DataSource": "CW_METRICS",
 "DataIdentifiers": {
 "name": "pod_number_of_container_restarts",
 "namespace": "ContainerInsights",
 "period": "60",
 "stat": "Average",
 "unit": "None",
 "dimensions": "{\"PodName\":\"java-sample-app\",\"ClusterName\":\"devopsguru-eks-cluster\",\"Namespace\":\"aws-otel-eks\"}"
 }
 ....
 "awsInsightSource": "aws.devopsguru"
}

Amazon DevOps Guru Console collects the insights under the Insights selection as shown in the following figure. Select Insights to view the details.

Amazon DevOps Guru Insights. An insight is displayed with a status of Ongoing and Severity of High.

Aggregated Metrics provides the identification of the EKS Container Metrics that have errored. In this case, pod_memory_utilization_over_pod_limit and pod_number_of_container_restarts.

Aggregated Metrics panel with pod_memory_utilization_over_pod_limit and pod_number_of_container_restarts for the Amazon EKS cluster names devopsguru-eks-cluster. Graphically a timeline including time and date is displayed conveying the length of the anomaly.

Further details can be identified by selecting and expanding each insight as shown in the following figure.

Displays the ability to expand the cluster metrics providing further information on the PodName, Namespace and ClusterName. Furthermore, a search bar is provided to search on name, stack or service name.

Note that the display provides information around the Cluster, PodName, and Namespace. This helps operators maintaining large numbers of EKS Clusters to quickly isolate the offending Pod, its operating Namespace, and EKS Cluster to which it belongs. A search bar provides further filtering to isolate the name, stack, or service name displayed.

Cleaning Up

Follow the steps to delete the resources to prevent additional charges being posted to your account.

Amazon EKS Cluster Cleanup

Follow these steps to detach the customer managed policy and delete the cluster.

Detach customer managed policy, AWSDistroOpenTelemetryPolicy, via IAM Console.
Delete cluster using eksctl.

$ eksctl delete cluster devopsguru-eks-cluster --region <region>
2021-10-13 14:08:28 [i] eksctl version 0.69.0
2021-10-13 14:08:28 [i] using region <region>
2021-10-13 14:08:28 [i] deleting EKS cluster "devopsguru-eks-cluster"
2021-10-13 14:08:30 [i] will drain 0 unmanaged nodegroup(s) in cluster "devopsguru-eks-cluster"
2021-10-13 14:08:32 [i] deleted 0 Fargate profile(s)
2021-10-13 14:08:33 [✔] kubeconfig has been updated
2021-10-13 14:08:33 [i] cleaning up AWS load balancers created by Kubernetes objects of Kind Service or Ingress
2021-10-13 14:09:02 [i] 2 sequential tasks: { delete nodegroup "managed-ng-private", delete cluster control plane "devopsguru-eks-cluster" [async] }
2021-10-13 14:09:02 [i] will delete stack "eksctl-devopsguru-eks-cluster-nodegroup-managed-ng-private"
2021-10-13 14:09:02 [i] waiting for stack "eksctl-devopsguru-eks-cluster-nodegroup-managed-ng-private" to get deleted
2021-10-13 14:12:30 [i] will delete stack "eksctl-devopsguru-eks-cluster-cluster"
2021-10-13 14:12:30 [✔] all cluster resources were deleted

Conclusion

In the previous scenarios, demonstration of the new cluster organization and additional container metrics was performed. Both of these features further simplify and expand the ability for an operator to more easily identify issues within a container cluster when Amazon DevOps Guru detects anomalies. You can start building your own solutions that employ Amazon CloudWatch Agent / AWS Distro for OpenTelemetry Agent and Amazon DevOps Guru by reading the documentation. This provides a conceptual overview and practical examples to help you understand the features provided by Amazon DevOps Guru and how to use them.

About the authors

Integrate Okta to Extend Active Directory Infrastructure into AWS

2021-12-03 Pavankumar Kasani

Post Syndicated from Pavankumar Kasani original https://aws.amazon.com/blogs/architecture/integrate-okta-to-extend-active-directory-infrastructure-into-aws/

Are you ready to extend your on-premises Active Directory to Amazon Web Services (AWS) to remove undifferentiated heavy lifting? Would you like to maintain a highly available Directory Service for your applications? Companies who have already set up integration with Okta Identity Cloud for external or internal applications require Active Directory objects to be synced to Okta for authentication. To achieve centralized access for on-premises and cloud applications, you can extend your on-premises Active Directory to AWS Managed Microsoft Active Directory (AD) using a trust relationship.

This blog shows an architecture pattern that you can use to synchronize your on-premises AD and AWS Managed AD objects. You can use Okta Identity Cloud using an Okta AD agent for syncing users and groups. The Okta AD agent can be installed and configured on a domain-joined on-premises server or an Amazon EC2 instance on AWS (see Figure 1).

AWS Directory Service lets you run Microsoft Active Directory (AD) as a managed service, and is powered by Windows Server 2012 R2. When you select and launch this directory type, it is created as a highly available pair of domain controllers connected to your Amazon Virtual Private Cloud (VPC). The domain controllers run in different Availability Zones in an AWS Region of your choice.

Okta is an enterprise-grade identity management service, which is compatible with many on-premises and cloud applications. The Okta AD agent enables you to integrate Okta with your on-premises AD. This way you can integrate your SaaS applications and your AD instances with Okta. You can simplify and centralize user management and share user credentials with other integrated cloud and on-premises applications.

Figure 1. Active Directory objects synchronization to Okta identity cloud

Network connectivity between corporate data center and AWS Regions

Before getting started with configuring a trust relationship with on-premises AD and AWS managed AD, be sure you’ve read and understand the prerequisites for setting up trust. For example, it is highly recommended to have a VPN or AWS Direct Connect circuit in place between your VPC and your on-premises environment. To create a resilient VPN connection, refer to the Site-to-Site VPN User Guide.

AWS Site-to-Site VPN is a fully managed service that uses IP security (IPsec) tunnels to create a secure connection between your data center or branch office, and your AWS resources. When using Site-to-Site VPN, you can connect to Amazon VPC and also AWS Transit Gateway. Two tunnels per connection are used for increased redundancy. You can also create a dedicated or a hosted connection using AWS Direct Connect.

Trust relationship between on-premises AD and AWS Managed AD

A trust relationship is a link between two different domains. For example, a one-way trust scenario allows the user accounts from the trusted domain to access resources in the trusting domain. In a two-way trust scenario, user accounts and resources can be passed between the two domains bidirectionally. A two-way trust relationship is commonly set up between on-premises AD and AWS Managed AD to extend authentication. This is used for any directory-aware workloads in the AWS Cloud, providing users and groups access to resources in either domain using single sign-on (SSO).

AWS Managed Microsoft Active Directory (AD) supports external and forest trust relationships with your existing on-premises domain in all three trust relationship directions:

One-way incoming
One-way outgoing
Two-way bidirectional

To create a trust relationship, follow these steps:

Prepare your on-premises domain for the trust relationship. This includes preparing your firewall configuration, enable Kerberos pre-authentication, and configuring conditional forwarders.
Prepare your AWS Managed Microsoft AD for the trust relationship. This includes configuring your VPC subnets, security groups, and enabling Kerberos pre-authentication.
Create the trust relationship between your on-premises AD and your AWS Managed Microsoft Active Directory (AD).

Install and configure Okta agent

Download and install Okta AD agent on your Amazon EC2 instance, which should be domain-joined with AWS Managed AD. One Okta AD agent can associate with multiple domains. Once the trust has been set up between on-premises AD and AWS Managed AD, you can associate multiple domains under the same Okta AD agent on Amazon EC2, instead of hosting and managing separate Okta AD agent servers in your own data center and AWS.

For a highly available architecture, a redundant Okta AD agent running in your corporate data center is recommended. This will help you avoid the impact of network connectivity failure between data centers and AWS Regions. Okta recommends installing multiple Okta AD agents on each domain server to achieve high availability and failover protection.

Read Okta AD integration step-by-step setup for installing and configuring Okta agent.

Validate AD objects

Once the Okta agent is installed and configured on the Amazon EC2 instance, log in to the Okta admin console. Under the provisioning to Okta tab, do a full import of users from AWS Managed AD (see Figure 2, Figure 3). The subsequent objects synchronization will be done through scheduled import with a minimum interval of one hour. After the import is done, if there are any user account overlaps between AWS Managed AD and Okta, manually assign the AD users to Okta users. You can create matching rules to automatically map the users from AD to Okta. Read Import AD users to Okta.

Figure 2. Import users under Okta admin console

Figure 3. Import users results under Okta admin console

Matching rules are used in the import of users from all apps and directories that provide importing. If there is an existing Okta account, AD allows you to import and confirm users automatically (see Figure 4).

Figure 4. User creation and matching under Okta admin console

You can import groups from any forest or domain connected to Okta. The Okta AD Agent detects all groups in the domain or the organizational units (OUs) that you select. If you register an Okta AD Agent for more than one domain and you have the root OU selected for all domains, all groups will be imported. Read Import AD Groups to Okta to synchronize groups from AD to Okta.

Synchronize passwords to Okta

When you sign in to Okta using your organization’s AD credentials, the authentication process is delegated to the connected on-premises AD. Okta does not see or store the credentials.

In some cases, the credentials must be synchronized from a directory across Okta to an application. If a user changes the password stored in Active Directory and then tries to access applications using the same single sign-on session, they will receive a password error message. This is because the new password has not been synchronized to the application, so a new sign-in process is required for password validation.

To avoid a disruptive user experience, use the Okta AD Password Sync Agent to synchronize passwords from AD to Okta and to integrated apps. The Okta AD Password Sync Agent will track password changes in AD and then synchronize to Okta.

For more details on the password synchronization and password reset workflow, you can read step-by-step instructions on Synchronize passwords from Active Directory to Okta.

Summary

In this blog post, we discussed a way for synchronizing users and credentials from on-premises Active Directory and AWS Managed AD to Okta Identity Cloud. With synchronization, you can centralize access of cloud and on-premises applications and provide seamless access to AD-aware services within AWS.

Customers can also migrate on-premises AD to AWS using Active Directory Migration Tool (ADMT) along with the Password Export Server (PES) service.

Read more:

Ingesting PI Historian data to AWS Cloud using AWS IoT Greengrass and PI Web Services

2021-11-17 Piyush Batwal

Post Syndicated from Piyush Batwal original https://aws.amazon.com/blogs/architecture/ingesting-pi-historian-data-to-aws-cloud-using-aws-iot-greengrass-and-pi-web-services/

In process manufacturing, it’s important to fetch real-time data from data historians to support decisions-based analytics. Most manufacturing use cases require real-time data for early identification and mitigation of manufacturing issues. A limited set of commercial off-the-shelf (COTS) tools integrate with OSIsoft’s PI Historian for real-time data. However, each integration requires months of development effort, can lack full data integrity, and often doesn’t address data loss issues. In addition, these tools may not provide native connectivity to the Amazon Web Services (AWS) Cloud. Leveraging legacy COTS applications can limit your agility, both in initial setup and ongoing updates. This can impact time to value (TTV) for critical analytics.

In this blog post, we’ll illustrate how you can integrate your on-premises PI Historian with AWS services for your real-time manufacturing use cases. We will highlight the key connector features and a common deployment architecture for your multiple manufacturing use cases.

Scope of OSIsoft PI data historian use

OSIsoft’s PI System is a plant process historian. It collects machine data from various sensors and operational technology (OT) systems during the manufacturing process. PI Historian is the most widely used data historian in process industries such as Healthcare & Life Sciences (HCLS), Chemicals, and Food & Beverage. Large HCLS companies use the PI system extensively in their manufacturing plants.

The PI System usually contains years of historical data ranging from terabytes to petabytes. The data from the PI system can be used in preventive maintenance, bioreactor yield improvement, golden batch analysis, and other machine learning (ML) use cases. It can be a powerful tool when paired with AWS compute, storage, and AI/ML services.

Analyzing real-time and historical data can garner many business benefits. For example, your batch yield could improve by optimizing inputs or you could reduce downtime by proactive intervention and maintenance. You could improve overall equipment effectiveness (OEE) by improving productivity and reducing waste. This could give you the ability to conduct key analysis and deliver products to your end customers in a timely manner.

PI integration options

The data from the PI System can be ingested to AWS services in a variety of ways:

With commercial off-the-shelf (COTS) tools from various business intelligence (BI) integration service providers
Using a PI Integrator for Business Analytics
Integrating Operational Technology data using PI Business Integrator
Deploying Industrial Machine Connectivity on AWS
Using an AWS Professional Services (Proserve) PI Web Connector for AWS IoT Greengrass

PI Connector for AWS IoT Greengrass

The PI Connector was developed by the AWS ProServe team as an extended AWS IoT Greengrass connector. The connector collects real-time and historical data from the PI system using PI Web Services. It publishes the data to various AWS services such as local ML models running at the edge, AWS IoT Core / AWS IoT SiteWise, and Amazon S3.

Connector requirements and design considerations

Specific requirements and design considerations were gathered in collaboration with various customers. These are essential for the most effective integration:

The connector should support reliable connectivity to the PI system for fetching real-time and historical data from the PI.
The connector should support subscription to various PI data modes like real-time, compressed/recorded, and interpolated, to support various use cases.
The initial setup and incremental updates to the PI tag configuration should be seamless without requiring any additional development effort.
The connector should support data contextualization in terms of asset/equipment hierarchy and process batch runs.
The connector should ensure full data integrity, reliable real-time data access, and support re-usability.
The connector should have support for handling data loss prevention scenarios for connectivity loss and/or maintenance/configuration updates.
The setup, deployment, and incremental updates should be fully automated.

Deployment architecture for PI Connector

The connector has been developed as part of AWS IoT Greengrass Connectivity Framework and can be deployed remotely on an edge machine. This can be running on-premises or in the AWS Cloud with access to the on-premises PI system. This machine can be run on a virtual machine (VM), a physical server, or a smaller device like a Raspberry Pi.

The connector incorporates a configuration file. You can specify connector functions such as authentication type, data access modes (polling or subscription), batch contextualization and validation on the data, or historical data access timeframe. It integrates with the PI Web APIs for subscription to real-time data for defined PI tags using secure WebSockets (wss). It can also invoke WebAPI calls for polling data with configured interval time.

The connector can be deployed as an AWS IoT Greengrass V1 AWS Lambda function or a Greengrass V2 component.

Figure 1. PI Connector architecture

Connector features and benefits

The connector supports subscription to real-time and recorded data to track tag value changes in streaming mode. This is useful in situations where process parameter changes must be closely monitored for decision support, actions, and notifications. The connector supports data subscription for individual PI event tags, PI Asset Framework (AF), and PI Event Frames (EF).
The connector supports fetching recorded/compressed or interpolated data based on recorded timestamps or defined intervals, to sample all tags associated with an asset at those intervals.
The connector helps define asset hierarchy and batch tags as part of configuration, and contextualizes all asset data with hierarchy and batch context at the event level. This offloads heavy data post-processing for real-time use cases.
The connector initiates event processing at the edge and provides configurable options to push data to the Cloud. This occurs only when a valid batch is running and/or when a reported tag data quality attribute is good.
The connector ensures availability and data integrity by doing graceful reconnects in case of session closures from the PI side. It fetches, contextualizes, and pushes any missed data due to disconnections, maintenance, or update scenarios.
The connector accelerates the TTV for business by providing a reusable no-code, configuration-only PI integration capability.

Summary

The PI Connector developed by AWS Proserve makes your real-time, data ingestion from PI historian into AWS services fast, secure, scalable, and reliable. The connector can be configured and deployed into your edge network quickly.

With this connector, you can ingest data into many AWS services such as Amazon S3, AWS IoT Core, AWS IoT SiteWise, Amazon Timestream, and more. Try the PI Connector for your manufacturing use cases, and realize the full potential of OSI PI Historian data.

Further reading:

Get started with AWS DevOps Guru Multi-Account Insight Aggregation with AWS Organizations

2021-11-17 Ifeanyi Okafor

Post Syndicated from Ifeanyi Okafor original https://aws.amazon.com/blogs/devops/get-started-with-aws-devops-guru-multi-account-insight-aggregation-with-aws-organizations/

Amazon DevOps Guru is a fully managed service that uses machine learning (ML) to continuously analyze and consolidate operational data streams from multiple sources, such as Amazon CloudWatch metrics, AWS Config, AWS CloudFormation, AWS X-Ray, and provide you with a single console dashboard. This dashboard helps customers improve operational performance and avoid expensive downtime by generating actionable insights that flag operational anomalies, identify the likely root cause, and recommend corrective actions.

As customers scale their AWS resources across multiple accounts and deploy DevOps Guru across applications and use cases on these accounts, they get a siloed view of the operational health of their applications. Now you can enable multi-account support with AWS Organizations and designate a member account to manage operational insights across your entire organization. This delegated administrator can get a holistic view of the operational health of their applications across the organization—without the need for any additional customization.

In this post, we will walk through the process of setting up a delegated administrator. We will also explore how to monitor insights across your entire organization from this account.

Overview of the multi-account environment

The multi-account environment operates based on the organizational hierarchy that your organization has defined in AWS Organizations. This AWS service helps you centrally manage and govern your cloud environment. For this reason, you must have an organization set up in AWS Organizations before you can implement multi-account insights visibility. AWS Organizations is available to all AWS customers at no additional charge, and the service user guide has instructions for creating and configuring an organization in your AWS environment.

Understanding the management account, a delegated administrator, and other member accounts is fundamental to the multi-account visibility functionality in DevOps Guru. Before proceeding further, let’s recap these terms.

A management account is the AWS account you use to create your organization. You can create other accounts in your organization, invite and manage invitations for other existing accounts to join your organization, and remove accounts from your organization. The management account has wide permissions and access to accounts within the organization. It should only be used for absolutely essential administrative tasks, such as managing accounts, OUs, and organization-level policies. You can refer to the AWS Organizations FAQ for more information on management accounts.
When the management account gives a member account service-level administrative permissions, it becomes a delegated administrator. Because the permissions are assigned at the service level, delegated administrator’s privileges are confined to the AWS service in question (DevOps Guru, in this case). The delegated administrator manages the service on behalf of the management account, leaving the management account to focus on administrative tasks, such as account and policy management. Currently, DevOps Guru supports a single delegated administrator, which operates at the root level (i.e., at the organization level). When elevating a member account to a delegated administrator, it must be in the organization. For more information about adding a member account to an organization, see inviting an AWS account to join your organization.
Member accounts are accounts without any administrative privilege. An account can be a member of only one organization at a time.

Prerequisites

You must have the following to enable multi-account visibility:

An organization already set up in AWS Organizations. AWS Organizations is available to all AWS customers at no additional charge, and the service user guide has instructions for creating and configuring an organization in your AWS environment.
A member account that is in your organization and already onboarded in DevOps Guru. This account will be registered as a delegated administrator. For more information about adding a member account to an organization, see inviting an AWS account to join your organization.

Setting up multi-account insights visibility in your organization

In line with AWS Organizations’ best practices, we recommend first assigning a delegated administrator. Although a management account can view DevOps Guru insights across the organization, management accounts should be reserved for organization-level management, while the delegated administrator manages at the service level.

Registering a Delegated Administrator

A delegated administrator must be registered by the management account. The steps below assume that you have a member account to register as a delegated administrator. If your preferred account is not yet in your organization, then invite the account to join your organization.

To register a delegated administrator

Log in to the DevOps Guru Console with the management account.
On the welcome page, under Set up type, select Monitor applications across your organizations. If you select Monitor applications in the current AWS account, then your dashboard will display insights for the management account.
Under Delegated administrator, select Register a delegated administrator (Recommended).
Select Register delegated administrator to complete the process.

To de-register a delegated administrator

Log in to the DevOps Guru Console and navigate to the Management account
On the Management account page, select De-register administrator.

De-register a delegated Administrator. Steps 1 and 2 highlighted on console.

Viewing insights as a Delegated Administrator

As the delegated administrator, you can choose to view insights from

specific accounts
specific OUs
the entire organization

To view insights from specific accounts

Log in to the DevOps Guru console, and select Accounts from the dropdown menu on the dashboard.
Select the search bar to display a list of member accounts.
Select up to five accounts, and select anywhere outside the dropdown menu to apply your selection. Simply select the delegated administrator account from the dropdown menu to view insights from the delegated administrator account.

View Insights from specific account. Steps 1-3 highlighted on dashboard screenshot.

The system health summary now will display information for the selected accounts.

To view insights from specific organizational units

Log in to the DevOps Guru console, and select Organizational Units from the dropdown menu on the dashboard.
Select the search bar to display the list of OUs.
Select up to five OUs, and select anywhere outside of the dropdown menu to apply your selection.

View insights from specific organizational units. Steps 1-3 highlighted in dashboard screenshot.

Now the system health summary will display information for the selected OUs. Nested OUs are currently not supported, so only the accounts directly under the OU are included when an OU is selected. Select the sub-OU in addition to the parent OU to include accounts in a sub-OU,.

To view insights across the entire organization

Log in to the DevOps Guru console and navigate to the Insights
On the Reactive tab, you can see a list of all the reactive insights in the organization. On the Proactive tab, you can also see a list of all the proactive insights in the organization. You will notice that the table displaying the insights now has an Account ID column (highlighted in the snapshot below). This table aggregates insights from different accounts in the organization.

View insights across the entire organization.

Use one or more of the following filters to find the insights that you are looking for
1. Choose the Reactive or Proactive
2. In the search bar, you can add an account ID, status, severity, stack, or resource name to specify a filter.

View insights across the entire organization and filter. Steps 1-3 highlighted in Insights summary page.

1. To search by account ID, select the search bar, select Account, then select an account ID in the submenu.
2. Choose or specify a time range to filter by insight creation time. For example, 12h shows insights created in the past 12 hours, and 1d shows the insights of the previous day. 1w will show the past week’s insights, and 1m will show the last month’s insights. Custom lets you specify another time range. The maximum time range you can use to filter insights is 180 days.

View insights across the entire organization and fitler by time.

Viewing insights from the Management Account

Viewing insights from the management account is similar to viewing insights from the delegated administrator, so the process listed for the delegated administrator also applies to the management account. Although the management account can view insights across the organization, it should be reserved for running administrative tasks across various AWS services at an organization level.

Important notes

Multi-account insight visibility works at a region level, meaning that you can only view insights across the organization within a single AWS region. You must change the AWS region from the region dropdown menu at the top-right corner of the console to view insights from a different AWS region.

For data security reasons, the delegated administrator can only access insights generated across the organization after the selected member account became the delegated administrator. Insights generated across the organization before the delegated administrator registration will remain inaccessible to the delegated administrator.

Conclusion

The steps detailed above show how you can quickly enable multi-account visibility to monitor application health across your entire organization.

AWS Customers are now using AWS DevOps Guru to monitor and improve application performance, and you too can start monitoring your applications by following the instructions in the product documentation. Head over to the AWS DevOps Guru console to get started today.

About the authors

Everything you wanted to know about trusts with AWS Managed Microsoft AD

2021-11-16 Jeremy Girven

Post Syndicated from Jeremy Girven original https://aws.amazon.com/blogs/security/everything-you-wanted-to-know-about-trusts-with-aws-managed-microsoft-ad/

Many Amazon Web Services (AWS) customers use Active Directory to centralize user authentication and authorization for a variety of applications and services. For these customers, Active Directory is a critical piece of their IT infrastructure. AWS offers AWS Directory Service for Microsoft Active Directory, also known as AWS Managed Microsoft AD, to provide a highly available and resilient Active Directory service.

One of the most common AWS Managed Microsoft AD use cases is for customers who need to integrate their on-premises Active Directory domain or forest with AWS services like Amazon Relational Database Service (Amazon RDS), Amazon FSx, Amazon WorkSpaces, and other AWS applications and services. This type of integration can require a trust relationship. When it comes to trusts, there are some common misconceptions about what happens and doesn’t happen when a trust is created.

In this post, I’m going to dive deep into various aspects of Active Directory trusts and debunk some common myths along the way. This post will cover the following areas:

Starting with Kerberos

The first part of understanding how trusts work is to understand how authentication flows across a trust, particularly with Kerberos. Kerberos is a subject that, on the surface, is simple enough, but can quickly become much more complex. This post isn’t going to go into detail about Kerberos in Microsoft Windows. If you wish to look further into the topic, see the Microsoft Kerberos documentation. In this post, I’m just going to give you an overview of how Kerberos authentication works across trusts.

Figure 1: Kerberos authentication across trusts

If you only remember one thing about Kerberos and trust, it should be referrals. Let’s look at the workflow in Figure 1, which shows a user from Domain A who is logged into a computer in Domain A and wants to access an Amazon FSx file share in Domain B. For simplicity’s sake, I’ll say there is a two-way trust between Domains A and B.

Note: When a trust is integrated with AWS Managed Microsoft AD, you need to enable Kerberos preauthentication for accounts that traverse the trusts. Disabling Kerberos preauthentication isn’t recommended, because a malicious user can directly send dummy requests for authentication. The key distribution center (KDC) will return an encrypted Ticket-Granting Ticket (TGT), which the malicious user can brute force offline. See Kerberos Pre-Authentication: Why It Should Not Be Disabled for more details.

The steps of the Kerberos authentication process over trusts are as follows:

1. Kerberos authentication service request (KRB_AS_REQ): The client contacts the authentication service (AS) of the KDC (which is running on a domain controller) for Domain A, which the client is a member of, for a short-lived ticket called a Ticket-Granting Ticket (TGT). The default lifetime of the TGT is 10 hours. For Windows clients this happens at logon, but Linux clients might need to run a kinit command.

2. Kerberos authentication service response (KRB_AS_REP): The AS constructs the TGT and creates a session key that the client can use to encrypt communication with the ticket-granting service (TGS). At the time that the client receives the TGT, the client has not been granted access to any resources, even to resources on the local computer.

3. Kerberos ticket-granting service request (KRB_TGS_REQ): The user’s Kerberos client sends a KRB_TGS_REQ message to a local KDC in Domain A, specifying fsx@domainb as the target. The Kerberos client compares the location with its own workstation’s domain. Because these values are different, the client sets a flag in the KDC Options field of the KRB_TGS_REQ message for NAME_CANONICALIZE, which indicates to the KDC that the server might be in another realm (domain).

4. Kerberos ticket-granting service response (KRB_TGS_REP): The user’s local KDC (for Domain A) receives the KRB_TGS_REQ and sends back a TGT referral ticket for Domain B. The TGT is issued for the next intervening domain along the shortest path to Domain B. The TGT also has a referral flag set, so that the KDC will be informed that the KRB_TGS_REQ is coming from another realm. This flag also tells the KDC to fill in the Transited Realms field. The referral ticket is encrypted with the interdomain key that is decrypted by Domain B’s TGS.

Note: When a trust is established between domains or forests, an interdomain key based on the trust password becomes available for authenticating KDC functions and is used to encrypt and decrypt Kerberos tickets.

5. Kerberos ticket-granting service request (KRB_TGS_REQ): The user’s Kerberos client sends a KRB_TGS_REQ along with the TGT it received from the Domain A KDC to a KDC in Domain B.

6. Kerberos ticket-granting service response (KRB_TGS_REP): The TGS in Domain B examines the TGT and the authenticator. If these are acceptable, the TGS creates a service ticket. The client’s identity is taken from the TGT and copied to the service ticket. Then the ticket is sent to the client.

For more details on the authenticator, see How the Kerberos Version 5 Authentication Protocol Works.

7. Application server service request (KRB_TGS_REQ): After the client has the service ticket, the client sends the ticket and a new authenticator to the target server, requesting access. The server will decrypt the ticket, validate the authenticator, and (for Windows services), create an access token for the user based on the SIDs in the ticket.

8. Application server service response (KRB_TGS_REP): Optionally, the client might request that the target server verify its own identity. This is called mutual authentication. If mutual authentication is requested, the target server takes the client computer’s timestamp from the authenticator, encrypts it with the session key the TGS provided for client-target server messages, and sends it to the client.

The basics of trust transitivity, direction, and types

Let’s start off by defining a trust. Active Directory trusts are a relationship between domains, which makes it possible for users in one domain to be authenticated by a domain controller in the other domain. Authenticated users, if given proper permissions, can access resources in the other domain.

Active Directory Domain Services supports four types of trusts: External (Domain), Forest, Realm, and Shortcut. Out of those four types of trusts, AWS Managed Microsoft AD supports the External (Domain) and Forest trust types. I’ll focus on External (Domain) and Forest trust types for this post.

Transitivity: What is it?

Before I dive into the types of trusts, it’s important to understand the concept of transitivity in trusts. A trust that is transitive allows authentication to flow through other domains (Child and Trees) in the trusted forests or domains. In contrast, a non-transitive trust is a point-to-point trust that allows authentication to flow exclusively between the trusted domains.

Figure 2: Forest trusts between the Example.local and Example.com forests

Don’t worry about the trust types at this point, because I’ll cover those shortly. The example in Figure 2 shows a Forest trust between Example.com and Example.local. The Example.local forest has a child domain named Child. With a transitive trust, users from the Example.local and Child.Example.local domain can be authenticated to resources in the Example.com domain.

If Figure 2 has an External trust, only users from Example.local can be authenticated to resources in the Example.com domain. Users from Child.Example.local cannot traverse the trust to access resources in the Example.com domain.

Trust direction

Two-way trusts are bidirectional trusts that allow authentication referrals from either side of the trust to give users access resources in either domain or forest. If you look in the Active Directory Domains and Trusts area of the Microsoft Management Console (MMC), which provides consoles to manage the hardware, software, and network components of Microsoft Windows operating system, you can see both an incoming and an outgoing trust for the trusted domain.

One-way trusts are a single-direction trust that allows authentication referrals from one side of the trust only. A one-way trust is either outgoing or incoming, but not both (that would be a two-way trust).

An outgoing trust allows users from the trusted domain (Example.com) to authenticate in this domain (Example.local).
An incoming trust allows users from this domain (Example.local) to authenticate in the trusted domain (Example.com).

Figure 3: One-way trust direction

Let’s use a diagram to further explain this concept. Figure 3 shows a one-way trust between Example.com and Example.local. This an outgoing trust from Example.com and an incoming trust on Example.local. Users from Example.local can authenticate and, if given proper permissions, access resources in Example.com. Users from Example.com cannot access or authenticate to resources in Example.local.

Trust types

In this section of the post, I’ll examine the various types of Active Directory trusts and their capabilities.

External trusts

This trust type is used to share resources between two domains. These can be individual domains within or external to a forest. Think of this as a point-to-point trust between two domains. See Understanding When to Create an External Trust for more details on this trust type.

Transitivity: Non-transitive
Direction: One-way or two-way
Authentication types: NTLM Only* (Kerberos is possible with caveats; see the Microsoft Windows Server documentation for details)
AWS Managed Microsoft AD support: Yes

Forest trusts

This trust type is used to share resources between two forests. This is the preferred trust model, because it works fully with Kerberos without any caveats. See Understanding When to Create a Forest Trust for more details.

Transitivity: Transitive
Direction: One-way or two-way
Authentication types: Kerberos and NTLM
AWS Managed Microsoft AD support: Yes

Realm trusts

This trust type is used to form a trust relationship between a non-Windows Kerberos realm and an Active Directory domain. See Understanding When to Create a Realm Trust for more details.

Transitivity: Non-transitive or transitive
Direction: One-way or two-way
Authentication types: Kerberos Only
AWS Managed Microsoft AD support: No

Shortcut trusts

This trust type is used to shorten the authentication path between domains within complex forests. See Understanding When to Create a Shortcut Trust for more details.

Transitivity: Transitive
Direction: One-way or two-way
Authentication types: Kerberos and NTLM
AWS Managed Microsoft AD support: No

User Principal Name suffixes

The default User Principal Name (UPN) suffix for a user account is the Domain Name System (DNS) domain name of the domain where the user account resides. In AWS Managed Microsoft AD and self-managed AD, alternative UPN suffixes are added to simplify administration and user logon processes by providing a single UPN suffix for all users. The UPN suffix is used within the Active Directory forest, and is not required to be a valid DNS domain name. See Adding User Principal Name Suffixes for the process to add UPN suffixes to a forest.

For example, if your domain is Example.local but you want your users to sign in with what appears to be another domain name (such as ExampleSuffix.local), you would need to add a new UPN suffix to the domain. Figure 4 shows a user being created with an alternate UPN suffix.

Figure 4: UPN selection on object creation

If you’re logged into a Windows system, you can use the whoami /upn command to see the UPN of the current user.

Forest trusts and name suffix routing

Name suffix routing manages how authentication requests are routed across forest trusts. A unique name suffix is a name suffix within a forest, such as a UPN suffix or DNS forest or domain tree name, that isn’t subordinate to any other name suffix. For example, the DNS forest name Example.com is a unique name suffix within the example.com forest.

All names that are subordinate to unique name suffixes are routed implicitly. For example, if your forest root is named Example.local, authentication requests for all child domains of Example.local (Child.Example.local) will be routed because the child domains are subordinate to the Example.local name suffix. If you want to exclude members of a child domain from authenticating in the specified forest, you can disable name suffix routing for that name. You can also disable routing for the forest name itself, if necessary. With domain trees and additional UPN suffixes, name suffix routing by default is disabled and must be enabled if those suffixes are to be able to traverse the trust.

Note: In AWS Managed Microsoft AD, customers don’t have the ability to create or modify trusts by using the native Microsoft tools. If you need a name suffix route enabled for your trust, open a support case with Premium Support.

A couple of diagrams will make it easier to digest this information. Figure 5 shows the trust configuration. There is a one-way outgoing forest trust from Example.com to Example.local. Example.local has a UPN suffix named ExampleSuffix.local added to it. Example.local also has a child domain named Child and a tree domain named ExampleTree.local. By default, users in Example.local and Child.Example.local will be able to authenticate to resources in Example.com. Users in the ExampleTree.local domain will not be able to authenticate to resources in Example.com, unless the name suffix route for ExampleTree.local is enabled on the trust object in Example.com.

Figure 5: Multi-domain and suffix forest with a trust

Figure 6 is from the trust properties dialog from the Example.com forest of a trust between Example.com and Example.local. As you can see, *.example.local is enabled. But the custom UPN suffix ExampleSuffix.local and the tree domain ExampleTree.local are disabled by default.

Figure 6: Example.local trusts details

Selective authentication

With AWS Managed Microsoft AD and self-managed AD, you have the option of configuring Selective Authentication. This option restricts authentication access over a trust to only the users in a trusted domain or forest who have been explicitly given authentication permissions to computer objects that reside in the trusting domain or forest.

When you use domain or forest-wide authentication, depending on the trust direction, users can authenticate across the trust. Authentication by itself doesn’t provide access—users have to be delegated permissions to access resources. When Selective Authentication is enabled, you must set the Allowed to Authenticate permission on each computer object the trusted user will be accessing, in addition to any other permissions that are required to access the computer object.

While Selective Authentication is a way to provide additional hardening of trusts, it requires a significant amount of planning and delegation, because you have to set the Allowed to Authenticate permission on all computer objects that are being accessed. It can also make troubleshooting permissions and trust issues more difficult.

For more details on Selective Authentication, see Selective Authentication and Configuring Selective Authentication Settings in the Microsoft documentation.

SID filtering

I won’t spend a lot of time on the subject of SID filtering, since this feature is enabled in AWS Managed Microsoft AD and can’t be disabled. SID filtering prevents malicious users who have domain or enterprise administrator level access in a trusted forest from granting elevated user rights to a trusting forest. It does this by preventing misuse of the attributes containing SIDs on security principals in the trusted forest. For example, a malicious user with administrative credentials located in a trusted forest could, through various means, obtain the SID information of a domain or enterprise admin in the trusting forest. After obtaining the SID of an administrator from the trusting forest, a malicious user with administrative credentials can add that SID to the SID history attribute of a security principal in the trusted forest and attempt to gain full access to the trusting forest and the resources within it.

Keeping SID filtering disabled on your on-premises domain can open your domain up to risks from malicious users. We understand that during a domain migration, you may need to disable it to allow an object’s SID from the original domain to be used during the migration. But in AWS Managed Microsoft AD, this filtering cannot be disabled. See SID Filtering for more details.

Network ports that are required to create trusts

The following network ports are required to be open between domain controllers on both domains or forests prior to attempting to create a trust. Note, the Security Group used by your AWS Managed Microsoft AD directory already has these inbound ports open. You will need to adjust the outbound rules of the Security Group to let it communicate with the to be trusted domains or forests. The following table is based on Microsoft’s recommendations. Depending on your use case, some of these ports might not need to be opened. For example, if LDAP over SSL isn’t configured, then TCP 636 isn’t needed.

Port	Protocol	Service
53	TCP and UDP	DNS
88	TCP and UDP	Kerberos
123	UDP	Windows Time
135	TCP	Remote Procedure Call (RPC)
389	TCP and UDP	Lightweight Directory Access Protocol (LDAP)
445	TCP	Server Message Block (SMB)
464	TCP and UDP	Kerberos Password Change
636	TCP	LDAP over SSL
3268	TCP	LDAP Global Catalog (GC)
3269	TCP	LDAP GC over SSL
49152–65535	TCP and UDP	RPC

Trust creation process overview

AWS Managed Microsoft AD is based on Windows Server Active Directory Domain Services, which means that Active Directory trusts function the same way they do with self-managed Active Directory. The only difference is how the trust is created. You use the AWS Management Console or APIs to create the trust for the AWS Managed Microsoft AD side. This process has been documented thoroughly in the AWS Directory Service Administration Guide, so I won’t go into detail on the steps.

The high-level overview of the process is:

Ensure that network and DNS name resolution is available and functional between the domains.
Create the trust on the on-premises Active Directory.
Complete the trust on the AWS Managed Microsoft AD in the AWS Directory Service console.

Common trust scenarios with AWS Managed Microsoft AD

When you create trust between an on-premises domain and AWS Managed Microsoft AD, there are some items to take into consideration that will help you decide what direction of trust you need to deploy. In this post, I’ll cover a couple of the most common scenarios.

All scenarios: Selecting a trust type

Let’s start with the choice between a Forest or External trust. We generally recommend using a Forest trust type. The reason for that is that Forest trusts fully support Kerberos without any caveats. With that said, if you have a specific requirement to implement an External trust, you can do so—just be aware of these caveats.

Scenario 1: Use AWS Managed Microsoft AD as a resource forest for Amazon RDS, Amazon FSx for Windows File Server, or Amazon EC2 instances

In this scenario, you might want to use AWS Managed Microsoft AD as a resource forest for Amazon RDS, Amazon FSx for Windows File Server, or Amazon Elastic Compute Cloud (Amazon EC2). AWS Managed Microsoft AD is going to be a resource domain, and user accounts will reside on the on-premises side of the trust and need to be able to access the resources in the AWS Managed Microsoft AD side of the trust.

In this scenario, the AWS applications (Amazon RDS, Amazon FSx for Windows File Server, or Amazon EC2) don’t require a two-way trust to function, because they are natively integrated with Active Directory. This tells you that you only need authentication to flow one way. This scenario requires a one-way incoming trust on the on-premises domain and one-way outgoing trusts on the AWS Managed Microsoft AD domain. Figure 7 demonstrates this.

Figure 7: A one-way trust

Scenario 2: Use AWS Managed Microsoft AD as a resource forest for all other supported AWS applications

In this scenario, you want to use AWS Managed Microsoft AD as a resource domain for all other supported AWS applications that aren’t included in Scenario 1. As the previous scenario stated, AWS Managed Microsoft AD will be a resource domain, and the user accounts will reside on the on-premises side of the trust and need to be able to access the resources in the AWS Managed Microsoft AD.

In this scenario, AWS applications (Amazon Chime, Amazon Connect, Amazon QuickSight, AWS Single Sign-On, Amazon WorkDocs, Amazon WorkMail, Amazon WorkSpaces, AWS Client VPN, AWS Management Console, and AWS Transfer Family) need to be able to look up objects from the on-premises domain in order for them to function. This tells you that authentication needs to flow both ways. This scenario requires a two-way trust between the on-premises and AWS Managed Microsoft AD domains. Figure 8 demonstrates this.

Figure 8: A two-way trust

Common trust myths and misconceptions

I have had many conversations with customers concerning trusts between their on-premises domain and their AWS Managed Microsoft AD domain. These are some of the common myths and misconceptions we’ve come across in our conversations.

Trusts synchronize objects between each domain.

This is false. A trust between domains or forests acts as a bridge that allows validated authentication requests, in the form of Kerberos or NTLM traffic, to travel between domains or forests. Objects are not synchronized between the domains or forests. Only the trust password is synchronized, which is used for Kerberos.

My password is passed over the trust when authenticating.

This is false. As I showed earlier in the Starting with Kerberos section, when authenticating across trusts, the user’s password is not passed between domains. The only things passed between domains are the Ticket Granting Service (TGS) requests and responses, which are generated in real time, are single use, and expire within hours.

A one-way trust allows bidirectional authentication.

This is false. One-way trusts allow authentications to traverse in one direction only. Users or objects from the trusted domain are able to authenticate and, if they are delegated, to access resources in the trusting domain. Users in the trusting domain can’t authenticate into the trusted domain, and aren’t granted permissions to access resources. Let’s say there is an Amazon FSx file system in Example.local and a one-way trust between Example.com (outgoing trust direction) and Example.local (incoming trust direction). A user in Example.com can’t be delegated permission to the Amazon FSx file system Example.local with the current trust configuration. That’s the nature of a one-way trust.

Trusts are inherently insecure by default.

This is false, although an improperly configured trust can increase your risk and exposure. Trusts by themselves do very little to increase an Active Directory’s attack surface. You should always use best practices when creating a trust to minimize risk. For example, a trust without a purpose should be removed. You should disable the SID History unless you’re in the process of migrating domains. See Security Considerations for Trusts for more guidance on securing trusts.

Users in the trusted domain are granted permissions to my domain when a trust is created.

This is false. By default, with two-way trusts, objects have read-only permission to Active Directory in both directions. Objects are not delegated permissions or access to resources or servers by default. For example, if you want a user to log into a computer in another domain, you first must delegate the user access to the resource in the other domain. Without that delegation, the user won’t be able to access the resource.

Troubleshooting trusts

Based on our experience working with many customers, the vast majority of trust configuration issues are either DNS resolution or networking connectivity errors. These are some troubleshooting steps to help you resolve any of these common issues:

Check whether you allowed outbound networking traffic on the AWS Managed Microsoft AD. See Step 1: Set up your environment for trusts to learn how to find your directory’s security group and how to modify it.
If the DNS server or the network for your on-premises domain uses a public (non-RFC 1918) IP address space, follow these steps:
1. In the AWS Directory Service console, go to the IP routing section for your directory, choose Actions, and then choose Add route.
2. Enter the IP address block of your DNS server or on-premises network using CIDR format, for example 203.0.113.0/24.
  This step isn’t necessary if both your DNS server and your on-premises network are using RFC 1918 private IP address spaces.
After you verify the security group and check whether any applicable routes are required, launch a Windows Server instance and join it to the AWS Managed Microsoft AD directory. See Step 3: Deploy an EC2 instance to manage your AWS Managed Microsoft AD to learn how to do this. Once the instance is launched, do the following:
- Run the following PowerShell command to test DNS connectivity:
  Resolve-DnsName -Name 'example.local' -DnsOnly
You should also look through the message explanations in the Trust creation status reasons guide in the AWS Directory Service documentation.

Summary of AWS Managed Microsoft AD trust considerations

In this blog post, I covered Kerberos authentication over Active Directory trusts and provided details on what Active Directory trusts are and how they function. Here’s a quick list of items that you should consider when you plan trust creation with AWS Managed Microsoft AD:

Ensure that you have a network connection and the appropriate network ports opened between both domains. Note, it is recommended all Active Directory traffic occur over private network connection like a VPN or Direct Connect.
Ensure that DNS resolution is working on both sides of the trust.
Decide whether you will implement selective authentication. If it will be used, plan your Active Directory access control list (ACL) delegation strategy before implementation.
As of this blog’s publication, keep in mind that AWS Managed Microsoft AD currently supports Forest trusts and External trusts only.
Ensure that Kerberos preauthentication is enabled for all objects that traverse trusts with AWS Managed Microsoft AD.
If you find that you need a name suffix route enabled for your trust, open a support case with AWS Support, requesting that the name suffix route be enabled.
Finally, review Security Considerations for Trusts: Domain and Forest Trusts for additional considerations for trust configuration.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Directory Service forum.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Anomaly Detection in AWS Lambda using Amazon DevOps Guru’s ML-powered insights

2021-11-04 Harish Vaswani

Post Syndicated from Harish Vaswani original https://aws.amazon.com/blogs/devops/anomaly-detection-in-aws-lambda-using-amazon-devops-gurus-ml-powered-insights/

Critical business applications are monitored in order to prevent anomalies from negatively impacting their operational performance and availability. Amazon DevOps Guru is a Machine Learning (ML) powered solution that aids operations by detecting anomalous behavior and providing insights and recommendations for how to address the root cause before it impacts the customer.

This post demonstrates how Amazon DevOps Guru can detect an anomaly following a critical AWS Lambda function deployment and its remediation recommendations to fix such behavior.

Solution Overview

Amazon DevOps Guru lets you monitor resources at the region or AWS CloudFormation level. This post will demonstrate how to deploy an AWS Serverless Application Model (AWS SAM) stack, and then enable Amazon DevOps Guru to monitor the stack.

You will utilize the following services:

AWS Lambda
Amazon EventBridge
Amazon DevOps Guru

The architecture diagram shows an AWS SAM stack containing AWS Lambda and Amazon EventBridge resources, as well as Amazon DevOps Guru monitoring the resources in the AWS SAM stack.

Figure 1: Amazon DevOps Guru monitoring the resources in an AWS SAM stack

The architecture diagram shows an AWS SAM stack containing AWS Lambda and Amazon EventBridge resources, as well as Amazon DevOps Guru monitoring the resources in the AWS SAM stack.

This post simulates a real-world scenario where an anomaly is introduced in the AWS Lambda function in the form of latency. While the AWS Lambda function execution time is within its timeout threshold, it is not at optimal performance. This anomalous execution time can result in larger compute times and costs. Furthermore, this post demonstrates how Amazon DevOps Guru identifies this anomaly and provides recommendations for remediation.

Here is an overview of the steps that we will conduct:

First, we will deploy the AWS SAM stack containing a healthy AWS Lambda function with an Amazon EventBridge rule to invoke it on a regular basis.
We will enable Amazon DevOps Guru to monitor the stack, which will show the AWS Lambda function as healthy.
After waiting for a period of time, we will make changes to the AWS Lambda function in order to introduce an anomaly and redeploy the AWS SAM stack. This anomaly will be identified by Amazon DevOps Guru, which will mark the AWS Lambda function as unhealthy, provide insights into the anomaly, and provide remediation recommendations.
After making the changes recommended by Amazon DevOps Guru, we will redeploy the stack and observe Amazon DevOps Guru marking the AWS Lambda function healthy again.

This post also explores utilizing Provisioned Concurrency for AWS Lambda functions and the best practice approach of utilizing Warm Start for variables reuse.

Pricing

Before beginning, note the costs associated with each resource. The AWS Lambda function will incur a fee based on the number of requests and duration, while Amazon EventBridge is free. With Amazon DevOps Guru, you only pay for the data analyzed. There is no upfront cost or commitment. Learn more about the pricing per resource here.

Prerequisites

To complete this post, you need the following prerequisites:

An AWS account. For this post, we utilize the account number 111111111111. We will conduct AWS Serverless Application Model (AWS SAM) stack operations and monitoring in this account.
Access to your local terminal with the AWS SAM command line interface (CLI) installed.
Access to your local terminal with the git CLI.
AWS credentials for enabling the AWS SAM CLI to make calls to AWS Services on your behalf. In this post, AWS SAM needs access to AWS CloudFormation.
An Integrated Development Environment (IDE) of choice installed on your local machine.

Getting Started

We will set up an application stack in our AWS account that contains an AWS Lambda and an Amazon EventBridge event. The event will regularly trigger the AWS Lambda function, which simulates a high-traffic application. To get started, please follow the instructions below:

In your local terminal, clone the amazon-devopsguru-samples repository.

git clone https://github.com/aws-samples/amazon-devopsguru-samples.git

In your IDE of choice, open the amazon-devopsguru-samples repository.
In your terminal, change directories into the repository’s subfolder amazon-devopsguru-samples/generate-lambda-devopsguru-insights.

cd amazon-devopsguru-samples/generate-lambda-devopsguru-insights

Utilize the SAM CLI to conduct a guided deployment of lambda-template.yaml.

sam deploy --guided --template lambda-template.yaml
    Stack Name [sam-app]: DevOpsGuru-Sample-AnomalousLambda-Stack
    AWS Region [us-east-1]: us-east-1
    #Shows you resources changes to be deployed and require a 'Y' to initiate deploy
    Confirm changes before deploy [y/N]: y
    #SAM needs permission to be able to create roles to connect to the resources in your template
    Allow SAM CLI IAM role creation [Y/n]: y
    Save arguments to configuration file [Y/n]: y
    SAM configuration file [samconfig.toml]: y
    SAM configuration environment [default]: default

You should see a success message in your terminal, such as:

Successfully created/updated stack - DevOpsGuru-Sample-AnomalousLambda-Stack in us-east-1.

Enabling Amazon DevOps Guru

Now that we have deployed our application stack, we can enable Amazon DevOps Guru.

Log in to your AWS Account.
Navigate to the Amazon DevOps Guru service page.
Click “Get started”.
In the “Amazon DevOps Guru analysis coverage” section, select “Choose later”, then click “Enable”.

Amazon DevOps Guru analysis coverage menu which asks which AWS resources to analyze. The “Choose later” option is selected.

Figure 2.1: Amazon DevOps Guru analysis coverage menu

On the left-hand menu, select “Settings”
In the “DevOps Guru analysis coverage” section, click on “Manage”.
Select the “Analyze all AWS resources in the specified CloudFormation stacks in this Region” radio button.
The stack created in the previous section should appear. Select it, click “Save”, and then “Confirm”.

Amazon DevOps Guru analysis coverage menu which asks which AWS resources to analyze. The “Analyze all AWS resources in the specified CloudFormation stacks in this Region” option is selected and CloudFormation stacks are displayed to choose from.

Figure 2.2: Amazon DevOps Guru analysis coverage resource selection

Before moving on to the next section, we must allow Amazon DevOps Guru to baseline the resources and benchmark the application’s normal behavior. For our serverless stack with two resources, we recommend waiting two hours before carrying out the next steps. When enabled in a production environment, depending upon the number of resources selected for monitoring, it can take up to 24 hours for Amazon DevOps Guru to complete baselining.

Once baselining is complete, the Amazon DevOps Guru dashboard, an overview of the health of your resources, will display the application stack, DevOpsGuru-Sample-AnomalousLambda-Stack, and mark it as healthy, shown below.

Amazon DevOps Guru Dashboard displays the system health summary and system health overview of each CloudFormation stack. The DevOpsGuru-Sample-AnomalousLambda-Stack is marked as healthy with 0 reactive insights and 0 proactive insights.

Figure 2.3: Amazon DevOps Guru Healthy Dashboard

Enabling SNS

If you would like to set up notifications upon the detection of an anomaly by Amazon DevOps Guru, then please follow these additional instructions.

Amazon DevOps Guru Specify an SNS topic menu which enables notifications for important DevOps Guru events. No SNS topics are currently configured.

Figure 3: Amazon DevOps Guru Specify an SNS topic

Invoking an Anomaly

Once Amazon DevOps Guru has identified the stack as healthy, we will update the AWS Lambda function with suboptimal code. This update will simulate an update to critical business applications which are causing the anomalous performance.

Open the amazon-devopsguru-samples repository in your IDE.
Open the file generate-lambda-devopsguru-insights/lambda-code.py
Uncomment lines 7-8 and save the file. These lines of code will produce an anomaly due to the function’s increased runtime.
Deploy these updates to your stack by running:

cd generate-lambda-devopsguru-insights 
sam deploy --template lambda-template.yaml -stack-name DevOpsGuru-Sample-AnomalousLambda-Stack

Anomaly Overview

Shortly after, Amazon DevOps Guru will generate a reactive insight from the sample stack. This insight contains recommendations, metrics, and events related to anomalous behavior. View the unhealthy stack status in the Dashboard.

Amazon DevOps Guru Dashboard displays the system health summary and system health overview of each CloudFormation stack. The DevOpsGuru-Sample-AnomalousLambda-Stack is marked as unhealthy with 1 reactive insights and 0 proactive insights.

Figure 4.1: Amazon DevOps Guru Unhealthy Dashboard

By clicking on the “Ongoing reactive insight” within the tile, you will be brought to the Insight Details page. This page contains an array of useful information to help you understand and address anomalous behavior.

Insight overview

Utilize this section to get a high-level overview of the insight. You can see that the status of the insight is ongoing, 1 AWS CloudFormation stack is affected, the insight started on Sept-08-2021, it does not have an end time, and it was last updated on Sept-08-2021.

Amazon DevOps Guru Insight Details page has multiple information sections. The Insight overview is the first section which displays the status is ongoing, there is 1 affected stack, the start time and last updated time. The end time is empty as the insight is ongoing.

Figure 4.2: Amazon DevOps Guru Ongoing Reactive Insight Overview

Aggregated metrics

The Aggregated metrics tab displays metrics related to the insight. The table is grouped by AWS CloudFormation stacks and subsequent resources that created the metrics. In this example, the insight was a product of an anomaly in the “duration p50” metric generated by the “DevOpsGuruSample-AnomalousLambda” AWS Lambda function.

AWS Lambda duration metrics derive from a percentile statistic utilized to exclude outlier values that skew average and maximum statistics. The P50 statistic is typically a great middle estimate. It is defined as 50% of estimates exceed the P50 estimate and 50% of estimates are less than the P50 estimate.

The red lines on the timeline indicate spans of time when the “duration p50” metric emitted unusual values. Click the red line in the timeline in order to view detailed information.

Choose View in CloudWatch to see how the metric looks in the CloudWatch console. For more information, see Statistics and Dimensions in the Amazon CloudWatch User Guide.
Hover over the graph in order to view details about the anomalous metric data and when it occurred.
Choose the box with the downward arrow to download a PNG image of the graph.

Amazon DevOps Guru Insight Details page contains aggregated metrics. The Duration p50 metric is selected and displayed in graph form.

Figure 4.3: Amazon DevOps Guru Ongoing Reactive Insight Aggregated Metrics

Graphed anomalies

The Graphed anomalies tab displays detailed graphs for each of the insight’s anomalies. Because our insight was comprised of a single anomaly, there is one tile with details about unusual behavior detected in related metrics.

Choose View all statistics and dimensions in order to see details about the anomaly. In the window that opens, you can:
Choose View in CloudWatch in order to see how the metric looks in the CloudWatch console.
Hover over the graph to view details about the anomalous metric data and when it occurred.
Choose Statistics or Dimension in order to customize the graph’s display. For more information, see Statistics and Dimensions in the Amazon CloudWatch User Guide.

Amazon DevOps Guru Insight Details page contains Graphed anomalies. The p50 metric of the AWS/Lambda duration in displayed in graph form.

Figure 4.4: Amazon DevOps Guru Ongoing Reactive Insight Graphed Anomaly

Related events

In Related events, view AWS CloudTrail events related to your insight. These events help understand, diagnose, and address the underlying cause of the anomalous behavior. In this example, the events are:

CreateFunction – when we created and deployed the AWS SAM template containing our AWS Lambda function.
CreateChangeSet – when we pushed updates to our stack via the AWS SAM CLI.
UpdateFunctionCode – when the AWS Lambda function code was updated.

Continuation of figure 4.4

Figure 4.5: Amazon DevOps Guru Ongoing Reactive Insight Related Events

Recommendations

The final section in the Insight Detail page is Recommendations. You can view suggestions that might help you resolve the underlying problem. When Amazon DevOps Guru detects anomalous behavior, it attempts to create recommendations. An insight might contain one, multiple, or zero recommendations.

In this example, the Amazon DevOps Guru recommendation matches the best resolution to our problem-provisioned concurrency.

Amazon DevOps Guru Insight Details page contains Recommendations. The suggested recommendation is to configure provisioned concurrency for the AWS Lambda.

Figure 4.6: Amazon DevOps Guru Ongoing Reactive Insight Recommendations

Understanding what happened

Amazon DevOps Guru recommends enabling Provisioned Concurrency for the AWS Lambda functions in order to help it scale better when responding to concurrent requests. As mentioned earlier, Provisioned Concurrency keeps functions initialized by creating the requested number of execution environments so that they can respond to invocations. This is a suggested best practice when building high-traffic applications, such as the one that this sample is mimicking.

In the anomalous AWS Lambda function, we have sample code that is causing delays. This is analogous to application initialization logic within the handler function. It is a best practice for this logic to live outside of the handler function. Because we are mimicking a high-traffic application, the expectation is to receive a large number of concurrent requests. Therefore, it may be advisable to turn on Provisioned Concurrency for the AWS Lambda function. For Provisioned Concurrency pricing, refer to the AWS Lambda Pricing page.

Resolving the Anomaly

To resolve the sample application’s anomaly, we will update the AWS Lambda function code and enable provisioned concurrency for the AWS Lambda infrastructure.

Opening the sample repository in your IDE.
Open the file generate-lambda-devopsguru-insights/lambda-code.py.
Move lines 7-8, the code forcing the AWS Lambda function to respond slowly, above the lambda_handler function definition.
Save the file.
Open the file generate-lambda-devopsguru-insights/lambda-template.yaml.
Uncomment lines 15-17, the code enabling provisioned concurrency in the sample AWS Lambda function.
Save the file.
Deploy these updates to your stack.

cd generate-lambda-devopsguru-insights 
sam deploy --template lambda-template.yaml --stack-name DevOpsGuru-Sample-AnomalousLambda-Stack

After completing these steps, the duration P50 metric will emit more typical results, thereby causing Amazon DevOps Guru to recognize the anomaly as fixed, and then close the reactive insight as shown below.

Amazon DevOps Guru Insight Summary page displays the reactive insight has been closed.

Figure 5: Amazon DevOps Guru Closed Reactive Insight

Clean Up

When you are finished walking through this post, you will have multiple test resources in your AWS account that should be cleaned up or un-provisioned in order to avoid incurring any further charges.

Opening the sample repository in your IDE.
Run the below AWS SAM CLI command to delete the sample stack.

cd generate-lambda-devopsguru-insights 
sam delete --stack-name DevOpsGuru-Sample-AnomalousLambda-Stack

Conclusion

As seen in the example above, Amazon DevOps Guru can detect anomalous behavior in an AWS Lambda function, tie it to relevant events that introduced that anomaly, and provide recommendations for remediation by using its pre-trained ML models. All of this was possible by simply enabling Amazon DevOps Guru to monitor the resources with minimal configuration changes and no previous ML expertise. Start using Amazon DevOps Guru today.

About the authors

How Cimpress Built a Self-service, API-driven Data Platform Ingestion Service

2021-10-22 Ethan Fahy

Post Syndicated from Ethan Fahy original https://aws.amazon.com/blogs/architecture/how-cimpress-built-a-self-service-api-driven-data-platform-ingestion-service/

Cimpress is a global company that specializes in mass customization, empowering individuals and businesses to design, personalize and customize their own products – such as packaging, signage, masks, and clothing – and to buy those products affordably.

Cimpress is composed of multiple businesses that have the option to use the Cimpress data platform. To provide business autonomy and keep the central data platform team lean and nimble, Cimpress designed their data platform to be scalable, self-service, and API-driven. During a design update to River, the data platform’s data ingestion service, Cimpress engaged AWS to run an AWS Data Lab. In this post, we’ll show how River provides this data ingestion interface across all Cimpress businesses. If your company has a decentralized organizational structure and challenges with a centralized data platform, read on and consider how a self-service, API-driven approach might help.

Evaluating the legacy data ingestion platform and identifying requirements for its replacement

The collaboration process between AWS and Cimpress started with a discussion of pain points with the existing Cimpress data platform’s ingestion service:

Changes were time consuming. The existing data ingestion service was built using Apache Kafka and Apache Spark. When a data stream owner needed new fields or changes in the stream payload schema, it could take weeks working with a data engineer. Spark configuration needed to be manually modified, tested, and then the Spark job had to be deployed to an operational platform.
Scaling was not automated. The existing Spark clusters required large Amazon EC2 instances that were manually scaled. This led to an inefficient use of resources that was driving unnecessary cost.

Assessing these pain points led to the design of an improved data ingestion service that was:

Fully automated and self-service via an API. This would reduce the dependency on engineering support from the central data platform team and decreased the time to production for new data streams.
Serverless with automatic scaling. This would improve performance efficiency and cost optimization while freeing up engineering time.

The rearchitected Cimpress data platform ingestion service

Figure 1. River architecture diagram, depicting the flow of data from data producers through the River data ingestion service into Snowflake.

The River data ingestion service provides data producers with a self-service mechanism to push their data into Snowflake, the Cimpress data platform’s data warehouse service. As seen in Figure 1, here are the steps involved in that process:

1. Data producers use the River API to create a new River data “stream.” Each Cimpress business is given its own Snowflake account in which they can create logically separated data ingestion “streams.” A River data stream can be configured with multiple data layers, fine-tuned permissions, entity tables, entry de-duplication, data delivery monitoring, and Snowflake data clustering.

2. Data producers get access to a River WebUI. In addition to the River API, end users are also able to access a web-based user interface for simplified configuration, monitoring, and management.

3. Data producers push data to the River API. The River RESTful API uses an Application Load Balancer (ALB) backed by AWS Lambda.

4. Data is processed by Amazon Kinesis Data Firehose.

- For payloads under 1 MB, data is passed directly through the ALB into a Lambda function, which deposits the data into Amazon Kinesis Data Firehose. Amazon Kinesis Data Firehose buffers and transforms the data before delivering it to Amazon S3.
- Data payloads over 1 MB can’t pass through the same pipeline. There is a 1-MB limit on payload size coming through an ALB into an AWS Lambda function, and a 1-MB maximum record size for Amazon Kinesis Data Firehose. To circumvent these limits, the River API generates and returns an Amazon S3 presigned URL to the data producer. They can use this Amazon S3 presigned URL to put their large data payloads directly into Amazon S3. Once the data lands in S3, Amazon Simple Notification Service (SNS) invokes a Lambda function to validate the payload format. It then moves it from the S3 landing zone to the standard S3 location with the rest of the buffered data.

5. Data is uploaded to Snowflake. Data objects that land on Amazon S3 initiate an event notification to Snowflake’s Snowpipe continuous data ingestion service. This loads the data from S3 into a Snowflake account.

All the AWS services used in the River architecture are serverless with automatic scaling. This means that the Cimpress data platform team can remain lean and agile. Engineering resources to infrastructure management tasks like capacity provisioning and patching are minimized. These AWS services also have pay-as-you-go pricing, so the cost scales predictably with the amount of data ingested.

Conclusions

The Cimpress data platform team redesigned their data ingestion service to be self-service, API-driven, and highly scalable. As usage of the Cimpress data platform has grown, Cimpress businesses operate more autonomously with speed and agility. As of today, the River data ingestion service reliably processes 2–3 billion messages per month across more than 1,000 different data streams. It drives data insights for more than 7,000 active users of the data platform across all of the Cimpress businesses.

Read more on the topics in this post and get started with building a data platform on AWS:

Enabling data classification for Amazon RDS database with Macie

2021-10-05 Bruno Silveira

Post Syndicated from Bruno Silveira original https://aws.amazon.com/blogs/security/enabling-data-classification-for-amazon-rds-database-with-amazon-macie/

Customers have been asking us about ways to use Amazon Macie data discovery on their Amazon Relational Database Service (Amazon RDS) instances. This post presents how to do so using AWS Database Migration Service (AWS DMS) to extract data from Amazon RDS, store it on Amazon Simple Storage Service (Amazon S3), and then classify the data using Macie. Macie’s resulting findings will also be made available to be queried with Amazon Athena by appropriate teams.

The challenge

Let’s suppose you need to find sensitive data in an RDS-hosted database using Macie, which currently only supports S3 as a data source. Therefore, you will need to extract and store the data from RDS in S3. In addition, you will need an interface for audit teams to audit these findings.

Solution overview

Figure 1: Solution architecture workflow

The architecture of the solution in Figure 1 can be described as:

A MySQL engine running on RDS is populated with the Sakila sample database.
A DMS task connects to the Sakila database, transforms the data into a set of Parquet compressed files, and loads them into the dcp-macie bucket.
A Macie classification job analyzes the objects in the dcp-macie bucket using a combination of techniques such as machine learning and pattern matching to determine whether the objects contain sensitive data and to generate detailed reports on the findings.
Amazon EventBridge routes the Macie findings reports events to Amazon Kinesis Data Firehose.
Kinesis Data Firehose stores these reports in the dcp-glue bucket.
S3 event notification triggers an AWS Lambda function whenever an object is created in the dcp-glue bucket.
The Lambda function named Start Glue Workflow starts a Glue Workflow.
Glue Workflow transforms the data from JSONL to Apache Parquet file format and places it in the dcp-athena bucket. This provides better performance during data query and optimized storage usage using a binary optimized columnar storage.
Athena is used to query and visualize data generated by Macie.

Note: For better readability, the S3 bucket nomenclature omits the suffix containing the AWS Region and AWS account ID used to meet the global uniqueness naming requirement (for example, dcp-athena-us-east-1-123456789012).

The Sakila database schema consists of the following tables:

actor
address
category
city
country
customer

Building the solution

Prerequisites

Before configuring the solution, the AWS Identity and Access Management (IAM) user must have appropriate access granted for the following services:

You can find an IAM policy with the required permissions here.

Step 1 – Deploying the CloudFormation template

You’ll use CloudFormation to provision quickly and consistently the AWS resources illustrated in Figure 1. Through a pre-built template file, it will create the infrastructure using an Infrastructure-as-Code (IaC) approach.

Download the CloudFormation template.
Go to the CloudFormation console.
Select the Stacks option in the left menu.
Select Create stack and choose With new resources (standard).
On Step 1 – Specify template, choose Upload a template file, select Choose file, and select the file template.yaml downloaded previously.
On Step 2 – Specify stack details, enter a name of your preference for Stack name. You might also adjust the parameters as needed, like the parameter CreateRDSServiceRole to create a service role for RDS if it does not exist in the current account.
On Step 3 – Configure stack options, select Next.
On Step 4 – Review, check the box for I acknowledge that AWS CloudFormation might create IAM resources with custom names, and then select Create Stack.
Wait for the stack to show status CREATE_COMPLETE.

Note: It is expected that provisioning will take around 10 minutes to complete.

Step 2 – Running an AWS DMS task

To extract the data from the Amazon RDS instance, you need to run an AWS DMS task. This makes the data available for Amazon Macie in an S3 bucket in Parquet format.

Go to the AWS DMS console.
In the left menu, select Database migration tasks.
Select the task Identifier named rdstos3task.
Select Actions.
Select the option Restart/Resume.

When the Status changes to Load Complete the task has finished and you will be able to see migrated data in your target bucket (dcp-macie).

Inside each folder you can see parquet file(s) with names similar to LOAD00000001.parquet. Now you can use Macie to discover if you have sensitive data in your database contents as exported to S3.

Step 3 – Running a classification job with Amazon Macie

Now you need to create a data classification job so you can assess the contents of your S3 bucket. The job you create will run once and evaluate the complete contents of your S3 bucket to determine whether it can identify PII among the data. As mentioned earlier, this job only uses the managed identifiers available with Macie – you could also add your own custom identifiers.

Go to the Macie console.
Select the S3 buckets option in the left menu.

Choose the S3 bucket dcp-macie containing the output data from the DMS task. You may need to wait a minute and select the Refresh icon if the bucket names do not display.

Select Create job.
Select Next to continue.
Create a job with the following scope.
1. Sensitive data Discovery options: One-time job
2. Sampling Depth: 100%
3. Leave all other settings with their default values
Select Next to continue.
Select Next again to skip past the Custom data identifiers section.
Give the job a name and description.
Select Next to continue.
Verify the details of the job you have created and select Submit to continue.

You will see a green banner stating that The Job was successfully created. The job can take up to 15 minutes to conclude and the Status will change from Active to Complete. To open the findings from the job, select the job’s check box, choose Show results, and select Show findings.

Figure 2: Macie Findings screen

Note: You can navigate in the findings and select each checkbox to see the details.

Step 4 – Enabling querying on classification job results with Amazon Athena

Go to the Athena console and open the Query editor.
If it’s your first-time using Athena you will see a message Before you run your first query, you need to set up a query result location in Amazon S3. Learn more. Select the link presented with this message.
In the Settings window, choose Select and then choose the bucket dcp-assets to store the Athena query results.
(Optional) To store the query results encrypted, check the box for Encrypt query results and select your preferred encryption type. To learn more about Amazon S3 encryption types, see Protecting data using encryption.
Select Save.

Step 5 – Query Amazon Macie results with Amazon Athena.

It might take a few minutes for the data to complete the flow between Amazon Macie and AWS Glue. After it’s finished, you’ll be able to see in Athena’s console the table dcp_athena within the database dcp.

Then, select the three dots next to the table dcp_athena and select the Preview table option to see a data preview, or run your own custom queries.

Figure 3: Athena table preview

As your environment grows, this blog post on Top 10 Performance Tuning Tips for Amazon Athena can help you apply partitioning of data and consolidate your data into larger files if needed.

Clean up

After you finish, to clean up the solution and avoid unnecessary expenses, complete the following steps:

Go to the Amazon S3 console.
Navigate to each of the buckets listed below and delete all its objects:
- dcp-assets
- dcp-athena
- dcp-glue
- dcp-macie
Go to the CloudFormation console.
Select the Stacks option in the left menu.
Choose the stack you created in Step 1 – Deploying the CloudFormation template.
Select Delete and then select Delete Stack in the pop-up window.

Conclusion

In this blog post, we show how you can find Personally Identifiable Information (PII), and other data defined as sensitive, in Macie’s Managed Data Identifiers in an RDS-hosted MySQL database. You can use this solution with other relational databases like PostgreSQL, SQL Server, or Oracle, whether hosted on RDS or EC2. If you’re using Amazon DynamoDB, you may also find useful the blog post Detecting sensitive data in DynamoDB with Macie.

By classifying your data, you can inform your management of appropriate data protection and handling controls for the use of that data.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Enable Security Hub PCI DSS standard across your organization and disable specific controls

2021-09-30 Pablo Pagani

Post Syndicated from Pablo Pagani original https://aws.amazon.com/blogs/security/enable-security-hub-pci-dss-standard-across-your-organization-and-disable-specific-controls/

At this time, enabling the PCI DSS standard from within AWS Security Hub enables this compliance framework only within the Amazon Web Services (AWS) account you are presently administering.

This blog post showcases a solution that can be used to customize the configuration and deployment of the PCI DSS standard compliance standard using AWS Security Hub across multiple AWS accounts and AWS Regions managed by AWS Organizations. It also demonstrates how to disable specific standards or controls that aren’t required by your organization to meet its compliance requirement. This solution can be used as a baseline for implementation when creating new AWS accounts through the use of AWS CloudFormation StackSets.

Solution overview

Figure 1 that follows shows a sample account setup using the automated solution in this blog post to enable PCI DSS monitoring and reporting across multiple AWS accounts using AWS Organizations. The hierarchy depicted is of one management account used to monitor two member accounts with infrastructure spanning across multiple Regions. Member accounts are configured to send their Security Hub findings to the designated Security Hub management account for centralized compliance management.

Figure 1: Security Hub deployment using AWS Organizations

Prerequisites

The following prerequisites must be in place in order to enable the PCI DSS standard:

A designated administrator account for Security Hub.
Security Hub enabled in all the desired accounts and Regions.
Access to the management account for the organization. The account must have the required permissions for stack set operations.
Choose which deployment targets (accounts and Regions) you want to enable the PCI DSS standard. Typically, you set this on the accounts where Security Hub is already enabled, or on the accounts where PCI workloads reside.
(Optional) If you find standards or controls that aren’t applicable to your organization, get the Amazon Resource Names (ARNs) of the desired standards or controls to disable.

Solution Resources

The CloudFormation template that you use in the following steps contains:

An AWS Lambda function—SHLambdaFunction—to configure and deploy the setup procedures within Security Hub.
An AWS Identity and Access Management (IAM) role—SHLambdaRole—with the required permissions needed to deploy the solution.
A custom resource—SHConfiguration—triggers the Lambda function to begin setup procedures.

Solution deployment

To set up this solution for automated deployment, stage the following CloudFormation StackSet template for rollout via the AWS CloudFormation service. The stack set runs across the organization at the root or organizational units (OUs) level of your choice. You can choose which Regions to run this solution against and also to run it each time a new AWS account is created.

To deploy the solution

Open the AWS Management Console.
Download the sh-pci-enabler.yaml template and save it to an Amazon Simple Storage Services (Amazon S3) bucket on the management account. Make a note of the path to use later.
Navigate to CloudFormation service on the management account. Select StackSets from the menu on the left, and then choose Create StackSet.

Figure 2: CloudFormation – Create StackSet
On the Choose a template page, go to Specify template and select Amazon S3 URL and enter the path to the sh-pci-enabler.yaml template you saved in step 2 above. Choose Next.

Figure 3: CloudFormation – Choose a template
Enter a name and (optional) description for the StackSet. Choose Next.

Figure 4: CloudFormation – enter StackSet details
(Optional) On the Configure StackSet options page, go to Tags and add tags to identify and organize your stack set.

Figure 5: CloudFormation – Configure StackSet options
Choose Next.
On the Set deployment options page, select the desired Regions, and then choose Next.

Figure 6: CloudFormation – Set deployment options
Review the definition and select I acknowledge that AWS CloudFormation might create IAM resources. Choose Submit.

Figure 7: CloudFormation – Review, acknowledge, and submit
After you choose Submit, you can monitor the creation of the StackSet from the Operations tab to ensure that deployment is successful.

Figure 8: CloudFormation – Monitor creation of the StackSet

Disable standards that don’t apply to your organization

To disable a standard that isn’t required by your organization, you can use the same template and steps as described above with a few changes as explained below.

To disable standards

Start by opening the SH-PCI-enabler.yaml template and saving a copy under a new name.
In the template, look for sh.batch_enable_standards. Change it to sh.batch_disable_standards.
Locate standardArn=f”arn:aws:securityhub:{region}::standards/pci-dss/v/3.2.1″ and change it to the desired ARN. To find the correct standard ARN, you can use the AWS Command Line Interface (AWS CLI) or AWS CloudShell to run the command aws securityhub describe-standards.

Figure 9: Describe Security Hub standards using CLI

Note: Be sure to keep the f before the quotation marks and replace any Region you might get from the command with the {region} variable. If the CIS standard doesn’t have the Region defined, remove the variable.

Disable controls that don’t apply to your organization

When you enable a standard, all of the controls for that standard are enabled by default. If necessary, you can disable specific controls within an enabled standard.

When you disable a control, the check for the control is no longer performed, no additional findings are generated for that control, and the related AWS Config rules that Security Hub created are removed.

Security Hub is a regional service. When you disable or enable a control, the change is applied in the Region that you specify in the API request. Also, when you disable an entire standard, Security Hub doesn’t track which controls were disabled. If you enable the standard again later, all of the controls in that standard will be enabled.

To disable a list of controls

Open the Security Hub console and select Security standards from the left menu. For each check you want to disable, select Finding JSON and make a note of each StandardsControlArn to add to your list.

Note: Another option is to use the DescribeStandardsControls API to create a list of StandardsControlArn to be disabled.

Figure 10: Security Hub console – finding JSON download option
Download the StackSet SH-disable-controls.yaml template to your computer.
Use a text editor to open the template file.

Locate the list of controls to disable, and edit the template to replace the provided list of StandardsControlArn with your own list of controls to disable, as shown in the following example. Use a comma as the delimiter for each ARN.

controls=f"arn:aws:securityhub:{region}:{account_id}:control/aws-foundational-security-best-practices/v/1.0.0/ACM.1, arn:aws:securityhub:{region}:{account_id}:control/aws-foundational-security-best-practices/v/1.0.0/APIGateway.1, arn:aws:securityhub:{region}:{account_id}:control/aws-foundational-security-best-practices/v/1.0.0/APIGateway.2"

Save your changes to the template.
Follow the same steps you used to deploy the PCI DSS standard, but use your edited template.

Note: The region and account_id are set as variables, so you decide in which accounts and Regions to disable the controls from the StackSet deployment options (step 8 in Deploy the solution).

Troubleshooting

The following are issues you might encounter when you deploy this solution:

StackSets deployment errors: Review the troubleshooting guide for CloudFormation StackSets.
Dependencies issues: To modify the status of any standard or control, Security Hub must be enabled first. If it’s not enabled, the operation will fail. Make sure you meet the prerequisites listed earlier in this blog post. Use CloudWatch logs to analyze possible errors from the Lambda function to help identify the cause.
StackSets race condition error: When creating new accounts, the Organizations service enables Security Hub in the account, and invokes the stack sets during account creation. If the stack set runs before the Security Hub service is enabled, the stack set can’t enable the PCI standard. If this happens, you can fix it by adding the Amazon EventBridge rule as shown in SH-EventRule-PCI-enabler.yaml. The EventBridge rule invokes the SHLambdaFunctionEB Lambda function after Security Hub is enabled.

Conclusion

The AWS Security Hub PCI DSS standard is fundamental for any company involved with storing, processing, or transmitting cardholder data. In this post, you learned how to enable or disable a standard or specific controls in all your accounts throughout the organization to proactively monitor your AWS resources. Frequently reviewing failed security checks, prioritizing their remediation, and aiming for a Security Hub score of 100 percent can help improve your security posture.

Migrate Resources Between AWS Accounts

2021-09-28 Ashok Srirama

Post Syndicated from Ashok Srirama original https://aws.amazon.com/blogs/architecture/migrate-resources-between-aws-accounts/

Have you ever wondered how to move resources between Amazon Web Services (AWS) accounts? You can really view this as a migration of resources. Migrating resources from one AWS account to another may be desired or required due to your business needs. Following are a few scenarios where this may be of benefit:

When you acquire, sell, or merge overseas operations from other businesses.
When you move regional operations from one Managed Service Provider (MSP) to another.
When you reorganize your AWS account and organizational structure.

This process may involve migrating the infrastructure either partially or completely.

In this blog, we will discuss various approaches to migrating resources based on type, configuration, and workload needs. Usually, the first consideration is infrastructure. What’s in your environment? What are the interdependencies? How will you migrate each resource?

Using this information, you can outline a plan on how you will approach migrating each of the resources in your portfolio, and in what order.

Here are some considerations to address for a typical migration:

Infrastructure – This includes resources required to run your workloads that do not have any persistent data. Examples are AWS Lambda functions, Amazon Elastic Container Service clusters, Elastic load balancers, and more.
Compute – This includes Amazon Elastic Compute Cloud (Amazon EC2) instances that store persistent data in Amazon Elastic Block Store (EBS) volumes.
Storage – This includes file and object storage services like Amazon Simple Storage Service (S3), Amazon Elastic File System (EFS), or others.
Database – This includes managed database services like Amazon Relational Database Service (RDS), Amazon DynamoDB, and Amazon Redshift.

Let’s look at each of these considerations in detail.

Migrating infrastructure

To migrate infrastructure that includes ephemeral resources, you can use one of the following Infrastructure as Code (IaC) approaches, shown in Figure 1. IaC templates are like programming scripts that automate the provisioning of IT resources.

Figure 1. Approaches to migrate infrastructure using IaC

1. If you are already using AWS CloudFormation templates, you can easily import the existing templates to the target AWS account.

AWS CloudFormation simplifies provisioning and management on AWS. You can create templates for quick and reliable provisioning of services or applications (called “stacks”).

2. You can use tools like Former2 to templatize your existing resources in the source AWS account and deploy them in the target account.

Former2 is an open-source project that allows you to generate IaC templates. For example, AWS CloudFormation or HashiCorp Terraform can be generated from the existing resources within your AWS account. Read Accelerate infrastructure as code development with open source Former2 for step-by-step guidance.

Migrating compute resources

To migrate compute resources that have a persistent state, you can use one of the following approaches, shown in Figure 2. These provide a virtual computing environment, allowing you to launch instances with a variety of operating systems.

Figure 2. Approaches to migrate compute resources

1. If you are already using AWS Backup service and AWS Organizations to centrally manage backup policies, you can enable AWS Backup cross-account management feature. This manages, monitors, restores your backup, and copies jobs across AWS accounts. Ensure you have both accounts in same AWS Organization. Once the backups are available in the target account, you can restore EC2 instances. Follow detailed instructions at Creating backup copies across AWS accounts.

AWS Backup is a fully managed data protection service that centralizes and automates data across AWS services, in the cloud, and on-premises. You can configure backup policies and monitor activity for your AWS resources. You can automate and consolidate backup tasks that were previously performed service-by-service. This removes the need to create custom scripts and use manual processes.

2. Create an Amazon Machine Image of your EC2 instances and share it with the target account. You can launch new EC2 instances using the shared AMI. Follow step-by-step instructions: How do I transfer an Amazon EC2 instance or AMI to a different AWS account?

Amazon Machine Image (AMI) provides the information required to launch an instance. Specify an AMI and then launch multiple instances from a single AMI with the same configuration. You can use different AMIs to launch instances when you need instances with different configurations.

For migrating non-persistent compute resources, refer Migrating Infrastructure section.

Migrating storage resources

AWS offers various storage services including object, file, and block storage. To migrate objects from a S3 bucket, you can take the following approaches, shown in Figure 3a.

Figure 3a. Approaches to migrate S3 buckets

1. Use Amazon S3 command line interface (CLI) commands to copy the initial load of objects from the source account to the target account. Read How can I copy S3 objects from another AWS account? After the initial copy, you can enable Amazon S3 replication feature to continuously replicate object changes across accounts. Add a bucket policy to grant source bucket permission to replicate objects in destination bucket. Read this walkthrough on how to configure replications.

2. If the S3 bucket contains large number of objects, use Amazon S3 Batch operations to copy objects across AWS accounts in bulk. Read Cross-account bulk transfer of files using Amazon S3 Batch Operations.

To migrate files from an Amazon EFS file system, you can take the following approach, shown in Figure 3b.

Figure 3b. Approach to migrate EFS file systems

Use AWS DataSync agent to transfer data from one EFS file system to another. AWS DataSync is an online transfer service that simplifies moving, copying, and synchronizing large amounts of data between on-premises storage systems and AWS storage services. Read Transferring file data across AWS Regions and accounts using AWS DataSync for step-by-step guidance.

Migrating database resources

AWS offers various purpose-built database engines. These include relational, key-value, document, in-memory, graph, time series, wide column, and ledger databases. To migrate relational databases, you can take one of the following approaches, shown in Figure 4.

Figure 4. Approaches to migrate relational database resources

1. If you want to continuously replicate data changes, use AWS Database Migration Service (AWS DMS) to replicate your data across AWS accounts with high availability. The source database remains fully operational during the migration, minimizing downtime to applications that rely on the database. You can set up a DMS task for either one-time migration or on-going replication. An on-going replication task keeps your source and target databases in sync. Once set up, the on-going replication task will continuously apply source changes to the target with minimal latency. Learn how to Set Up AWS DMS for Cross-Account Migration.

AWS DMS is a cloud service that streamlines the migration of relational databases, data warehouses, NoSQL databases, and other types of data stores. You can use AWS DMS to migrate your data into the AWS Cloud or between combinations of cloud and on-premises setups.

2. Use RDS Snapshots to create and share database backups across AWS accounts. Use the shared snapshots to launch new Amazon Relational Database Service (RDS) instances in the target account. Read step-by-step instructions: How can I share an encrypted Amazon RDS DB snapshot with another account?

3. Use AWS Backup to create backup policies that automatically back up your AWS resources. Use AWS Backup cross-account management feature to manage and monitor your backup, restore, and copy jobs across AWS accounts. Once the backups are available in the target account, you can restore RDS instances. Learn about Creating backup copies across AWS accounts.

In this section, we discussed relational databases migration. You can also use AWS DMS for migrating other databases. Read supported AWS DMS source and target databases.

Conclusion

In this blog post, we discussed various approaches you can take to migrate resources from one account to another depending upon the resource type and configuration. Additionally, you can also explore CloudEndure Migration for continuous data replication. Learn more about Migrating workloads across AWS Regions with CloudEndure Migration.

How to automate incident response to security events with AWS Systems Manager Incident Manager

2021-09-17 Sumit Patel

Post Syndicated from Sumit Patel original https://aws.amazon.com/blogs/security/how-to-automate-incident-response-to-security-events-with-aws-systems-manager-incident-manager/

Incident response is a core security capability for organizations to develop, and a core element in the AWS Cloud Adoption Framework (AWS CAF). Responding to security incidents quickly is important to minimize their impacts. Automating incident response helps you scale your capabilities, rapidly reduce the scope of compromised resources, and reduce repetitive work by your security team.

In this post, I show you how to use Incident Manager, a capability of AWS Systems Manager, to build an effective automated incident management and response solution to security events.

You’ll walk through three common security-related events and how you can use Incident Manager to automate your response.

AWS account root user activity: An Amazon Web Services (AWS) account root user has full access to all your resources for all AWS services, including billing information. It’s therefore elemental to adhere to the best practice of using the root user only to create your first IAM user and securely lock away the root user credentials and use them to perform only a few account and service management tasks. And it is critical to be aware when root user activity occurs in your AWS account.
Amazon GuardDuty high severity findings: Amazon GuardDuty is a threat detection service that continuously monitors for malicious or unauthorized behavior to help protect your AWS accounts and workloads. In this blog post, you’ll learn how to initiate an incident response plan whenever a high severity finding is discovered.
AWS Config rule change and S3 bucket allowing public access: AWS Config enables continuous monitoring of your AWS resources, making it simple to assess, audit, and record resource configurations and changes. You will use AWS Config to monitor your Amazon Simple Storage Service (S3) bucket ACLs and policies for settings that allow public read or public write access.

Prerequisites

If this is your first time using Incident Manager, follow the initial onboarding steps in Getting prepared with Incident Manager.

Incident Manager can start managing incidents automatically using Amazon CloudWatch or Amazon EventBridge. For the solution in this blog post, you will use EventBridge to capture events and start an incident.

To complete the steps in this walkthrough, you need the following:

An AWS account and AWS Identity Access and Management (IAM) permissions to access Systems Manager, GuardDuty, Config, S3, and EventBridge. Your IAM user or role should also have iam:CreateServiceLinkedRole permissions. Incident Manager uses this permission to create the service-linked role AWSServiceRoleforIncidentManager in your account. For more information, see Using service-linked roles for Incident Manager.
To enable Systems Manager, follow the steps in the Manage instances using AWS Systems Manager Quick Setup blog post. If you want to customize Systems Manager beyond the quick setup, see Setting up AWS Systems Manager.
To enable GuardDuty in your account, follow the steps in Getting started with GuardDuty.
To enable AWS Config in your account, follow the steps in Getting started with AWS Config.

Create an Incident Manager response plan

A response plan ties together the contacts, escalation plan, and runbook. When an incident occurs, a response plan defines who to engage, how to engage, which runbook to initiate, and which metrics to monitor. By creating a well-defined response plan, you can save your security team time down the road.

Add contacts

Your contacts should include everyone who might be involved in the incident. Follow these steps to add a contact.

To add contacts

Open the AWS Management Console, and then go to Systems Manager within the console, expand Operations Management, and then expand Incident Manager.
Choose Contacts, and then choose Create contact.

Figure 1: Adding contact details
On Contact information, enter names and define contact channels for your contacts.
Under Contact channel, you can select Email, SMS, or Voice. You can also add multiple contact channels.
In Engagement plan, specify how fast to engage your responders. In the example illustrated below, the incident responder will be engaged through email immediately (0 minutes) when an incident is detected and then through SMS 10 minutes into an incident. Complete the fields and then choose Create.

Figure 2: Engagement plan

Create a response plan

Once you’ve created your contacts, you can create a response plan to define how to respond to incidents. Refer to the Best Practices for Response Plans.

Note: (Optional) You can also create an escalation plan that lets you further define the escalation path for your contacts. You can learn more in Create an escalation plan.

To create a response plan

Open the Incident Manager console, and choose Response plans in the left navigation pane.
Choose Create response plan.
Enter a unique and identifiable name for your response plan.
Enter an incident title. The incident title helps to identify an incident on the incidents home page.
Select an appropriate Impact based on the potential scope of the incident.

Figure 3: Selecting your impact level
(Optional) Choose a chat channel for the incident responders to interact in during an incident. For more information about chat channels, see Chat channels.
(Optional) For Engagement, you can choose any number of contacts and escalation plans. For this solution, select the security team responder that you created earlier as one of your contacts.

Figure 4: Adding engagements
(Optional) You can also create a runbook that can drive the incident mitigation and response. For further information, refer to Runbooks and automation.
Under Execution permissions, choose Create an IAM role using a template. Under Role name, select the IAM role you created in the prerequisites that allows Incident Manager to run SSM automation documents, and then choose Create response plan.

Monitor AWS account root activity

When you first create an AWS account, you begin with a single sign-in identity that has complete access to all AWS services and resources in the account. This identity is called the root user and is accessed by signing in with the email address and password that you used to create the account.

An AWS account root user has full access to all your resources for all AWS services, including billing information. It is critical to prevent root user access from unauthorized use and to be aware whenever root user activity occurs in your AWS account. For more information about AWS recommendations, see Security best practices in IAM.

To be certain that all root user activity is authorized and expected, it’s important to monitor root API calls to a given AWS account and to be notified when root user activity is detected.

Create an EventBridge rule

Create and validate an EventBridge rule to capture AWS account root activity.

To create an EventBridge rule

Open the EventBridge console.
In the navigation pane, choose Rules, and then choose Create rule.
Enter a name and description for the rule.
For Define pattern, choose Event pattern.
Choose Custom pattern.

Enter the following event pattern:

{
  "detail-type": [
    "AWS API Call via CloudTrail",
    "AWS Console Sign In via CloudTrail"
  ],
  "detail": {
    "userIdentity": {
      "type": [
        "Root"
      ]
    }
  }
}

For Select targets, choose Incident Manager response plan.
For Response plan, choose SecurityEventResponsePlan, which you created when you set up Incident Manager.
To create an IAM role automatically, choose Create a new role for this specific resource. To use an existing IAM role, choose Use existing role.
(Optional) Enter one or more tags for the rule.
Choose Create.

To validate the rule

Sign in using root credentials.
This console login activity by a root user should invoke the Incident Manager response plan and show an open incident as illustrated below. The respective contact channels that you defined earlier in your Engagement Plan, will be engaged.

Figure 5: Incident Manager open incidents

Watch for GuardDuty high severity findings

GuardDuty is a monitoring service that analyzes AWS CloudTrail management and Amazon S3 data events, Amazon Virtual Private Cloud (Amazon VPC) flow logs, and Amazon Route 53 DNS logs to generate security findings for your account. Once GuardDuty is enabled, it immediately starts monitoring your environment.

GuardDuty integrates with EventBridge, which can be used to send findings data to other applications and services for processing. With EventBridge, you can use GuardDuty findings to invoke automatic responses to your findings by connecting finding events to targets such as Incident Manager response plan.

Create an EventBridge rule

You’ll use an EventBridge rule to capture GuardDuty high severity findings.

To create an EventBridge rule

Open the EventBridge console.
In the navigation pane, select Rules, and then choose Create rule.
Enter a name and description for the rule.
For Define pattern, choose Event pattern.
Choose Custom pattern

Enter the following event pattern which will filter on GuardDuty high severity findings

{
  "source": ["aws.guardduty"],
  "detail-type": ["GuardDuty Finding"],
  "detail": {
    "severity": [
      7.0,
      7.1,
      7.2,
      7.3,
      7.4,
      7.5,
      7.6,
      7.7,
      7.8,
      7.9,
      8,
      8.0,
      8.1,
      8.2,
      8.3,
      8.4,
      8.5,
      8.6,
      8.7,
      8.8,
      8.9
    ]
  }
}

For Select targets, choose Incident Manager response plan.
For Response plan, select SecurityEventResponsePlan, which you created when you set up Incident Manager.
To create an IAM role automatically, choose Create a new role for this specific resource. To use an IAM role that you created before, choose Use existing role.
(Optional) Enter one or more tags for the rule.
Choose Create.

To validate the rule

To test and validate whether the above rule is now functional, you can generate sample findings within the GuardDuty console.

Open the GuardDuty console.
In the navigation pane, choose Settings.
On the Settings page, under Sample findings, choose Generate sample findings.
In the navigation pane, choose Findings. The sample findings are displayed on the Current findings page with the prefix [SAMPLE].

Once you have generated sample findings, your Incident Manager response plan will be invoked almost immediately and the engagement plan with your contacts will begin.

You can select an open incident in the Incident Manager console to see additional details from the GuardDuty finding. Figure 6 shows a high severity finding.

Figure 6: Incident Manager open incident for GuardDuty high severity finding

Monitor S3 bucket settings for public access

AWS Config enables continuous monitoring of your AWS resources, making it easier to assess, audit, and record resource configurations and changes. AWS Config does this through rules that define the desired configuration state of your AWS resources. AWS Config provides a number of AWS managed rules that address a wide range of security concerns such as checking that your Amazon Elastic Block Store (Amazon EBS) volumes are encrypted, your resources are tagged appropriately, and multi-factor authentication (MFA) is enabled for root accounts.

Set up AWS Config and EventBridge

You will use AWS Config to monitor your S3 bucket ACLs and policies for violations which could allow public read or public write access. If AWS Config finds a policy violation, it will initiate an AWS EventBridge rule to invoke your Incident Manager response plan.

To create the AWS Config rule to capture S3 bucket public access

Sign in to the AWS Config console.
If this is your first time in the AWS Config console, refer to the Getting Started guide for more information.
Select Rules from the menu and choose Add Rule.
On the AWS Config rules page, enter S3 in the search box and select the s3-bucket-public-read-prohibited and s3-bucket-public-write-prohibited rules, and then choose Next.

Figure 7: AWS Config rules
Leave the Configure rules page as default and select Next.
On the Review page, select Add Rule. AWS Config is now analyzing your S3 buckets, capturing their current configurations, and evaluating the configurations against the rules you selected.

To create the EventBridge rule

Open the Amazon EventBridge console
In the navigation pane, choose Rules, and then choose Create rule.
Enter a name and description for the rule.
For Define pattern, choose Event pattern.
Choose Custom pattern

Enter the following event pattern, which will filter on AWS Config rule s3-bucket-public-write-prohibited being non-compliant.

{
  "source": ["aws.config"],
  "detail-type": ["Config Rules Compliance Change"],
  "detail": {
    "messageType": ["ComplianceChangeNotification"],
    "configRuleName": ["s3-bucket-public-write-prohibited", ""],
    "newEvaluationResult": {
      "complianceType": [
        "NON_COMPLIANT"
      ]
    }
  }
}

For Select targets, choose Incident Manager response plan.
For Response plan, choose SecurityEventResponsePlan, which you created earlier when setting up Incident Manager.
To create an IAM role automatically, choose Create a new role for this specific resource. To use an existing IAM role, choose Use existing role.
(Optional) Enter one or more tags for the rule.
Choose Create.

To validate the rule

Create a compliant test S3 bucket with no public read or write access through either an ACL or a policy.
Change the ACL of the bucket to allow public listing of objects so that the bucket is non-compliant.

Figure 8: Amazon S3 console
After a few minutes, you should see the AWS Config rule initiated which invokes the EventBridge rule and therefore your Incident Manager response plan.

Summary

In this post, I showed you how to use Incident Manager to monitor for security events and invoke a response plan via Amazon CloudWatch or Amazon EventBridge. AWS CloudTrail API activity (for a root account login), Amazon GuardDuty (for high severity findings), and AWS Config (to enforce policies like preventing public write access to an S3 bucket). I demonstrated how you can create an incident management and response plan to ensure you have used the power of cloud to create automations that respond to and mitigate security incidents in a timely manner. To learn more about Incident Manager, see What Is AWS Systems Manager Incident Manager in the AWS documentation.

If you have feedback about this post, submit comments in the comments section below. If you have questions about this post, start a new thread on the Systems Manager forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

How US federal agencies can use AWS to encrypt data at rest and in transit

2021-09-10 Robert George

Post Syndicated from Robert George original https://aws.amazon.com/blogs/security/how-us-federal-agencies-can-use-aws-to-encrypt-data-at-rest-and-in-transit/

This post is part of a series about how Amazon Web Services (AWS) can help your US federal agency meet the requirements of the President’s Executive Order on Improving the Nation’s Cybersecurity. You will learn how you can use AWS information security practices to meet the requirement to encrypt your data at rest and in transit, to the maximum extent possible.

Encrypt your data at rest in AWS

Data at rest represents any data that you persist in non-volatile storage for any duration in your workload. This includes block storage, object storage, databases, archives, IoT devices, and any other storage medium on which data is persisted. Protecting your data at rest reduces the risk of unauthorized access when encryption and appropriate access controls are implemented.

AWS KMS provides a streamlined way to manage keys used for at-rest encryption. It integrates with AWS services to simplify using your keys to encrypt data across your AWS workloads. It uses hardware security modules that have been validated under FIPS 140-2 to protect your keys. You choose the level of access control that you need, including the ability to share encrypted resources between accounts and services. AWS KMS logs key usage to AWS CloudTrail to provide an independent view of who accessed encrypted data, including AWS services that are using keys on your behalf. As of this writing, AWS KMS integrates with 81 different AWS services. Here are details on recommended encryption for workloads using some key services:

Encrypt an instance store root volume for an Amazon EC2 instance for added protection of configuration files and data stored with the operating system
Encrypt AWS Fargate ephemeral storage for added protection of data stored and processed by container applications
Enable encryption for an Amazon EBS volume to further protect data stored and processed by the attached Amazon EC2 instance
Enable at-rest encryption for Amazon EFS volumes to further protect data stored in network filesystems (NFS)
Use server-side or client-side encryption for data stored in Amazon S3 volumes
Encrypt Amazon RDS database instances
Encrypt at-rest data for Amazon DynamoDB tables
Encrypt at-rest data stored in an Amazon Redshift data warehouse
Protect configuration parameters with AWS Systems Manager Parameter Store

You can use AWS KMS to encrypt other data types including application data with client-side encryption. A client-side application or JavaScript encrypts data before uploading it to S3 or other storage resources. As a result, uploaded data is protected in transit and at rest. Customer options for client-side encryption include the AWS SDK for KMS, the AWS Encryption SDK, and use of third-party encryption tools.

You can also use AWS Secrets Manager to encrypt application passwords, connection strings, and other secrets. Database credentials, resource names, and other sensitive data used in AWS Lambda functions can be encrypted and accessed at run time. This increases the security of these secrets and allows for easier credential rotation.

KMS HSMs are FIPS 140-2 validated and accessible using FIPS validated endpoints. Agencies with additional requirements that require a FIPS 140-3 validated hardware security module (HSM) (for example, for securing third-party secrets managers) can use AWS CloudHSM.

For more information about AWS KMS and key management best practices, visit these resources:

Encrypt your data in transit in AWS

In addition to encrypting data at rest, agencies must also encrypt data in transit. AWS provides a variety of solutions to help agencies encrypt data in transit and enforce this requirement.

First, all network traffic between AWS data centers is transparently encrypted at the physical layer. This data-link layer encryption includes traffic within an AWS Region as well as between Regions. Additionally, all traffic within a virtual private cloud (VPC) and between peered VPCs is transparently encrypted at the network layer when you are using supported Amazon EC2 instance types. Customers can choose to enable Transport Layer Security (TLS) for the applications they build on AWS using a variety of services. All AWS service endpoints support TLS to create a secure HTTPS connection to make API requests.

AWS offers several options for agency-managed infrastructure within the AWS Cloud that needs to terminate TLS. These options include load balancing services (for example, Elastic Load Balancing, Network Load Balancer, and Application Load Balancer), Amazon CloudFront (a content delivery network), and Amazon API Gateway. Each of these endpoint services enable customers to upload their digital certificates for the TLS connection. Digital certificates then need to be managed appropriately to account for expiration and rotation requirements. AWS Certificate Manager (ACM) simplifies generating, distributing, and rotating digital certificates. ACM offers publicly trusted certificates that can be used in AWS services that require certificates to terminate TLS connections to the internet. ACM also provides the ability to create a private certificate authority (CA) hierarchy that can integrate with existing on-premises CAs to automatically generate, distribute, and rotate certificates to secure internal communication among customer-managed infrastructure.

Finally, you can encrypt communications between your EC2 instances and other AWS resources that are connected to your VPC, such as Amazon Relational Database Service (Amazon RDS) databases, Amazon Elastic File System (Amazon EFS) file systems, Amazon S3, Amazon DynamoDB, Amazon Redshift, Amazon EMR, Amazon OpenSearch Service, Amazon ElasticCache for Redis, Amazon FSx Windows File Server, AWS Direct Connect (DX) MACsec, and more.

Conclusion

This post has reviewed services that are used to encrypt data at rest and in transit, following the Executive Order on Improving the Nation’s Cybersecurity. I discussed the use of AWS KMS to manage encryption keys that handle the management of keys for at-rest encryption, as well as the use of ACM to manage certificates that protect data in transit.

Next steps

To learn more about how AWS can help you meet the requirements of the executive order, see the other posts in this series:

Subscribe to the AWS Public Sector Blog newsletter to get the latest in AWS tools, solutions, and innovations from the public sector delivered to your inbox, or contact us.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

How US federal agencies can authenticate to AWS with multi-factor authentication

2021-09-02 Kyle Hart

Post Syndicated from Kyle Hart original https://aws.amazon.com/blogs/security/how-us-federal-agencies-can-authenticate-to-aws-with-multi-factor-authentication/

This post is part of a series about how AWS can help your US federal agency meet the requirements of the President’s Executive Order on Improving the Nation’s Cybersecurity. We recognize that government agencies have varying degrees of identity management and cloud maturity and that the requirement to implement multi-factor, risk-based authentication across an entire enterprise is a vast undertaking. This post specifically focuses on how you can use AWS information security practices to help meet the requirement to “establish multi-factor, risk-based authentication and conditional access across the enterprise” as it applies to your AWS environment.

This post focuses on the best-practices for enterprise authentication to AWS – specifically federated access via an existing enterprise identity provider (IdP).

Many federal customers use authentication factors on their Personal Identity Verification (PIV) or Common Access Cards (CAC) to authenticate to an existing enterprise identity service which can support Security Assertion Markup Language (SAML), which is then used to grant user access to AWS. SAML is an industry-standard protocol and most IdPs support a range of authentication methods, so if you’re not using a PIV or CAC, the concepts will still work for your organization’s multi-factor authentication (MFA) requirements.

Accessing AWS with MFA

There are two categories we want to look at for authentication to AWS services:

AWS APIs, which include access through the following:
- The AWS Management Console.
- The AWS Command Line Interface (AWS CLI).
Resources you launch that are running within your AWS VPC, which can include database engines or operating system environments such as Amazon Elastic Compute Cloud (Amazon EC2) instances, Amazon WorkSpaces, or Amazon AppStream 2.0.

There is also a third category of services where authentication occurs in AWS that is beyond the scope of this post: applications that you build on AWS that authenticate internal or external end users to those applications. For this category, multi-factor authentication is still important, but will vary based on the specifics of the application architecture. Workloads that sit behind an AWS Application Load Balancer can use the ALB to authenticate users using either Open ID Connect or SAML IdP that enforce MFA upstream.

MFA for the AWS APIs

AWS recommends that you use SAML and an IdP that enforces MFA as your means of granting users access to AWS. Many government customers achieve AWS federated authentication with Active Directory Federation Services (AD FS). The IdP used by our federal government customers should enforce usage of CAC/PIV to achieve MFA and be the sole means of access to AWS.

Federated authentication uses SAML to assume an AWS Identity and Access Management (IAM) role for access to AWS resources. A role doesn’t have standard long-term credentials such as a password or access keys associated with it. Instead, when you assume a role, it provides you with temporary security credentials for your role session.

AWS accounts in all AWS Regions, including AWS GovCloud (US) Regions, have the same authentication options for IAM roles through identity federation with a SAML IdP. The AWS Single Sign-on (SSO) service is another way to implement federated authentication to the AWS APIs in regions where it is available.

MFA for AWS CLI access

In AWS Regions excluding AWS GovCloud (US), you can consider using the AWS CloudShell service, which is an interactive shell environment that runs in your web browser and uses the same authentication pipeline that you use to access the AWS Management Console—thus inheriting MFA enforcement from your SAML IdP.

If you need to use federated authentication with MFA for the CLI on your own workstation, you’ll need to retrieve and present the SAML assertion token. For information about how you can do this in Windows environments, see the blog post How to Set Up Federated API Access to AWS by Using Windows PowerShell. For information about how to do this with Python, see How to Implement a General Solution for Federated API/CLI Access Using SAML 2.0.

Conditional access

IAM permissions policies support conditional access. Common use cases include allowing certain actions only from a specified, trusted range of IP addresses; granting access only to specified AWS Regions; and granting access only to resources with specific tags. You should create your IAM policies to provide least-privilege access across a number of attributes. For example, you can grant an administrator access to launch or terminate an EC2 instance only if the request originates from a certain IP address and is tagged with an appropriate tag.

You can also implement conditional access controls using SAML session tags provided by their IdP and passed through the SAML assertion to be consumed by AWS. This means two separate users from separate departments can assume the same IAM role but have tailored, dynamic permissions. As an example, the SAML IdP can provide each individual’s cost center as a session tag on the role assertion. IAM policy statements can be written to allow the user from cost center A the ability to administer resources from cost center A, but not resources from cost center B.

Many customers ask about how to limit control plane access to certain IP addresses. AWS supports this, but there is an important caveat to highlight. Some AWS services, such as AWS CloudFormation, perform actions on behalf of an authorized user or role, and execute from within the AWS cloud’s own IP address ranges. See this document for an example of a policy statement using the aws:ViaAWSService condition key to exclude AWS services from your IP address restrictions to avoid unexpected authorization failures.

Multi-factor authentication to resources you launch

You can launch resources such as Amazon WorkSpaces, AppStream 2.0, Redshift, and EC2 instances that you configure to require MFA. The Amazon WorkSpaces Streaming Protocol (WSP) supports CAC/PIV authentication for pre-authentication, and in-session access to the smart card. For more information, see Use smart cards for authentication. To see a short video of it in action, see the blog post Amazon WorkSpaces supports CAC/PIV smart card authentication. Redshift and AppStream 2.0 support SAML 2.0 natively, so you can configure those services to work with your SAML IdP similarly to how you configure AWS Console access and inherit the MFA enforced by the upstream IdP.

MFA access to EC2 instances can occur via the existing methods and enterprise directories used in your on-premises environments. You can, of course, implement other systems that enforce MFA access to an operating system such as RADIUS or other third-party directory or MFA token solutions.

Shell access with Systems Manager Session Manager

An alternative method for MFA for shell access to EC2 instances is to use the Session Manager feature of AWS Systems Manager. Session Manager uses the Systems Manager management agent to provide role-based access to a shell (PowerShell on Windows) on an instance. Users can access Session Manager from the AWS Console or from the command line with the Session Manager AWS CLI plugin. Similar to using CloudShell for CLI access, accessing EC2 hosts via Session Manager uses the same authentication pipeline you use for accessing the AWS control plane. Your interactive session on that host can be configured for audit logging.

Security best practices in IAM

The focus of this blog is on integrating an agency’s existing MFA-enabled enterprise authentication service; but to make it easier for you to view the entire security picture, you might be interested in IAM security best practices. You can enforce these best-practice security configurations with AWS Organizations Service Control Policies.

Conclusion

This post covered methods your federal agency should consider in your efforts to apply the multi-factor authentication (MFA) requirements in the Executive Order on Improving the Nation’s Cybersecurity to your AWS environment. To learn more about how AWS can help you meet the requirements of the executive order, see the other posts in this series:

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Top 10 security best practices for securing data in Amazon S3

2021-09-02 Megan O'Neil

Post Syndicated from Megan O'Neil original https://aws.amazon.com/blogs/security/top-10-security-best-practices-for-securing-data-in-amazon-s3/

With more than 100 trillion objects in Amazon Simple Storage Service (Amazon S3) and an almost unimaginably broad set of use cases, securing data stored in Amazon S3 is important for every organization. So, we’ve curated the top 10 controls for securing your data in S3. By default, all S3 buckets are private and can be accessed only by users who are explicitly granted access through ACLs, S3 bucket policies, and identity-based policies. In this post, we review the latest S3 features and Amazon Web Services (AWS) services that you can use to help secure your data in S3, including organization-wide preventative controls such as AWS Organizations service control policies (SCPs). We also provide recommendations for S3 detective controls, such as Amazon GuardDuty for S3, AWS CloudTrail object-level logging, AWS Security Hub S3 controls, and CloudTrail configuration specific to S3 data events. In addition, we provide data protection options and considerations for encrypting data in S3. Finally, we review backup and recovery recommendations for data stored in S3. Given the broad set of use cases that S3 supports, you should determine the priority of controls applied in accordance with your specific use case and associated details.

Block public S3 buckets at the organization level

Designate AWS accounts for public S3 use and prevent all other S3 buckets from inadvertently becoming public by enabling S3 Block Public Access. Use Organizations SCPs to confirm that the S3 Block Public Access setting cannot be changed. S3 Block Public Access provides a level of protection that works at the account level and also on individual buckets, including those that you create in the future. You have the ability to block existing public access—whether it was specified by an ACL or a policy—and to establish that public access isn’t granted to newly created items. This allows only designated AWS accounts to have public S3 buckets while blocking all other AWS accounts. To learn more about Organizations SCPs, see Service control policies.

Use bucket policies to verify all access granted is restricted and specific

Check that the access granted in the Amazon S3 bucket policy is restricted to specific AWS principals, federated users, service principals, IP addresses, or VPCs that you provide. A bucket policy that allows a wildcard identity such as Principal “*” can potentially be accessed by anyone. A bucket policy that allows a wildcard action “*” can potentially allow a user to perform any action in the bucket. For more information, see Using bucket policies.

Ensure that any identity-based policies don’t use wildcard actions

Identity policies are policies assigned to AWS Identity and Access Management (IAM) users and roles and should follow the principle of least privilege to help prevent inadvertent access or changes to resources. Establishing least privilege identity policies includes defining specific actions such as S3:GetObject or S3:PutObject instead of S3:*. In addition, you can use predefined AWS-wide condition keys and S3‐specific condition keys to specify additional controls on specific actions. An example of an AWS-wide condition key commonly used for S3 is IpAddress: { aws:SourceIP: “10.10.10.10”}, where you can specify your organization’s internal IP space for specific actions in S3. See IAM.1 in Monitor S3 using Security Hub and CloudWatch Logs for detecting policies with wildcard actions and wildcard resources are present in your accounts with Security Hub.

Consider splitting read, write, and delete access. Allow only write access to users or services that generate and write data to S3 but don’t need to read or delete objects. Define an S3 lifecycle policy to remove objects on a schedule instead of through manual intervention— see Managing your storage lifecycle. This allows you to remove delete actions from your identity-based policies. Verify your policies with the IAM policy simulator. Use IAM Access Analyzer to help you identify, review, and design S3 bucket policies or IAM policies that grant access to your S3 resources from outside of your AWS account.

Enable S3 protection in GuardDuty to detect suspicious activities

In 2020, GuardDuty announced coverage for S3. Turning this on enables GuardDuty to continuously monitor and profile S3 data access events (data plane operations) and S3 configuration (control plane APIs) to detect suspicious activities. Activities such as requests coming from unusual geolocations, disabling of preventative controls, and API call patterns consistent with an attempt to discover misconfigured bucket permissions. To achieve this, GuardDuty uses a combination of anomaly detection, machine learning, and continuously updated threat intelligence. To learn more, including how to enable GuardDuty for S3, see Amazon S3 protection in Amazon GuardDuty.

Use Macie to scan for sensitive data outside of designated areas

In May of 2020, AWS re-launched Amazon Macie. Macie is a fully managed service that helps you discover and protect your sensitive data by using machine learning to automatically review and classify your data in S3. Enabling Macie organization wide is a straightforward and cost-efficient method for you to get a central, continuously updated view of your entire organization’s S3 environment and monitor your adherence to security best practices through a central console. Macie continually evaluates all buckets for encryption and access control, alerting you of buckets that are public, unencrypted, or shared or replicated outside of your organization. Macie evaluates sensitive data using a fully-managed list of common sensitive data types and custom data types you create, and then issues findings for any object where sensitive data is found.

Encrypt your data in S3

There are four options for encrypting data in S3, including client-side and server-side options. With server-side encryption, S3 encrypts your data at the object level as it writes it to disks in AWS data centers and decrypts it when you access it. As long as you authenticate your request and you have access permissions, there is no difference in the way you access encrypted or unencrypted objects.

The first two options use AWS Key Management Service (AWS KMS). AWS KMS lets you create and manage cryptographic keys and control their use across a wide range of AWS services and their applications. There are options for managing which encryption key AWS uses to encrypt your S3 data.

Server-side encryption with Amazon S3-managed encryption keys (SSE-S3). When you use SSE-S3, each object is encrypted with a unique key that’s managed by AWS. This option enables you to encrypt your data by checking a box with no additional steps. The encryption and decryption are handled for you transparently. SSE-S3 is a convenient and cost-effective option.
Server-side encryption with customer master keys (CMKs) stored in AWS KMS (SSE-KMS), is similar to SSE-S3, but with some additional benefits and costs compared to SSE-S3. There are separate permissions for the use of a CMK that provide added protection against unauthorized access of your objects in S3. SSE-KMS also provides you with an audit trail that shows when your CMK was used and by whom. SSE-KMS gives you control of the key access policy, which might provide you with more granular control depending on your use case.
In server-side encryption with customer-provided keys (SSE-C), you manage the encryption keys and S3 manages the encryption as it writes to disks and decryption when you access your objects. This option is useful if you need to provide and manage your own encryption keys. Keep in mind that you are responsible for the creation, storage, and tracking of the keys used to encrypt each object and AWS has no ability to recover customer-provided keys if they’re lost. The major thing to account for with SSE-C is that you must provide the customer-managed key every-time you PUT or GET an object.
Client-side encryption is another option to encrypt your data in S3. You can use a CMK stored in AWS KMS or use a master key that you store within your application. Client-side encryption means that you encrypt the data before you send it to AWS and that you decrypt it after you retrieve it from AWS. AWS doesn’t manage your keys and isn’t responsible for encryption or decryption. Usually, client-side encryption needs to be deeply embedded into your application to work.

Protect data in S3 from accidental deletion using S3 Versioning and S3 Object Lock

Amazon S3 is designed for durability of 99.999999999 percent of objects across multiple Availability Zones, is resilient against events that impact an entire zone, and designed for 99.99 percent availability over a given year. In many cases, when it comes to strategies to back up your data in S3, it’s about protecting buckets and objects from accidental deletion, in which case S3 Versioning can be used to preserve, retrieve, and restore every version of every object stored in your buckets. S3 Versioning lets you keep multiple versions of an object in the same bucket and can help you recover objects from accidental deletion or overwrite. Keep in mind this feature has costs associated. You may consider S3 Versioning in selective scenarios such as S3 buckets that store critical backup data or sensitive data.

With S3 Versioning enabled on your S3 buckets, you can optionally add another layer of security by configuring a bucket to enable multi-factor authentication (MFA) delete. With this configuration, the bucket owner must include two forms of authentication in any request to delete a version or to change the versioning state of the bucket.

S3 Object Lock is a feature that helps you mitigate data loss by storing objects using a write-once-read-many (WORM) model. By using Object Lock, you can prevent an object from being overwritten or deleted for a fixed time or indefinitely. Keep in mind that there are specific use cases for Object Lock, including scenarios where it is imperative that data is not changed or deleted after it has been written.

Enable logging for S3 using CloudTrail and S3 server access logging

Amazon S3 is integrated with CloudTrail. CloudTrail captures a subset of API calls, including calls from the S3 console and code calls to the S3 APIs. In addition, you can enable CloudTrail data events for all your buckets or for a list of specific buckets. Keep in mind that a very active S3 bucket can generate a large amount of log data and increase CloudTrail costs. If this is concern around cost then consider enabling this additional logging only for S3 buckets with critical data.

Server access logging provides detailed records of the requests that are made to a bucket. Server access logs can assist you in security and access audits.

Backup your data in S3

Although S3 stores your data across multiple geographically diverse Availability Zones by default, your compliance requirements might dictate that you store data at even greater distances. Cross-region replication (CRR) allows you to replicate data between distant AWS Regions to help satisfy these requirements. CRR enables automatic, asynchronous copying of objects across buckets in different AWS Regions. For more information on object replication, see Replicating objects. Keep in mind that this feature has costs associated, you might consider CCR in selective scenarios such as S3 buckets that store critical backup data or sensitive data.

Monitor S3 using Security Hub and CloudWatch Logs

Security Hub provides you with a comprehensive view of your security state in AWS and helps you check your environment against security industry standards and best practices. Security Hub collects security data from across AWS accounts, services, and supported third-party partner products and helps you analyze your security trends and identify the highest priority security issues.

The AWS Foundational Security Best Practices standard is a set of controls that detect when your deployed accounts and resources deviate from security best practices, and provides clear remediation steps. The controls contain best practices from across multiple AWS services, including S3. We recommend you enable the AWS Foundational Security Best Practices as it includes the following detective controls for S3 and IAM:

IAM.1: IAM policies should not allow full “*” administrative privileges.
S3.1: Block Public Access setting should be enabled
S3.2: S3 buckets should prohibit public read access
S3.3: S3 buckets should prohibit public write access
S3.4: S3 buckets should have server-side encryption enabled
S3.5: S3 buckets should require requests to use Secure Socket layer
S3.6: Amazon S3 permissions granted to other AWS accounts in bucket policies should be restricted
S3.8: S3 Block Public Access setting should be enabled at the bucket level

For details of each control, including remediation steps, please review the AWS Foundational Security Best Practices controls.

If there is a specific S3 API activity not covered above that you’d like to be alerted on, you can use CloudTrail Logs together with Amazon CloudWatch for S3 to do so. CloudTrail integration with CloudWatch Logs delivers S3 bucket-level API activity captured by CloudTrail to a CloudWatch log stream in the CloudWatch log group that you specify. You create CloudWatch alarms for monitoring specific API activity and receive email notifications when the specific API activity occurs.

Conclusion

By using the ten practices described in this blog post, you can build strong protection mechanisms for your data in Amazon S3, including least privilege access, encryption of data at rest, blocking public access, logging, monitoring, and configuration checks.

Depending on your use case, you should consider additional protection mechanisms. For example, there are security-related controls available for large shared datasets in S3 such as Access Points, which you can use to decompose one large bucket policy into separate, discrete access point policies for each application that needs to access the shared data set. To learn more about S3 security, see Amazon S3 Security documentation.

Now that you’ve reviewed the top 10 security best practices to make your data in S3 more secure, make sure you have these controls set up in your AWS accounts—and go build securely!

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon S3 forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

How to improve visibility into AWS WAF with anomaly detection

2021-09-02 Cyril Soler

Post Syndicated from Cyril Soler original https://aws.amazon.com/blogs/security/how-to-improve-visibility-into-aws-waf-with-anomaly-detection/

When your APIs are exposed on the internet, they naturally face unpredictable traffic. AWS WAF helps protect your application’s API against common web exploits, such as SQL injection and cross-site scripting. In this blog post, you’ll learn how to automatically detect anomalies in the AWS WAF metrics to improve your visibility into AWS WAF activity, identify malicious activity, and simplify your investigations. The service that this solution uses to detect anomalies is Amazon Lookout for Metrics.

Lookout for Metrics is a service you can use to monitor business or operational metrics such as successful or failed HTTP requests and detect anomalies by using machine learning (ML). You can configure Lookout for Metrics to monitor different data sources that contain AWS WAF metrics, including Amazon CloudWatch. Lookout for Metrics can also take actions such as publishing findings in AWS Security Hub.

Solution overview

The solution in this blog post uses Amazon API Gateway to serve a simple REST API. AWS WAF protects API Gateway with AWS Managed Rules for AWS WAF. Amazon Lookout for Metrics actively detects unusual patterns in AWS WAF rule actions and sends a finding to Security Hub when suspicious activity is detected. Figure 1 shows the solution architecture.

Because AWS WAF integrates with Application Load Balancer, Amazon CloudFront distributions, or AWS AppSync GraphQL APIs, this solution also applies to these services.

Figure 1: Solution architecture

The workflow of the solution is as follows:

An HTTP request reaches the API Gateway endpoint.
AWS WAF analyzes the HTTP request using the configured rules.
Amazon CloudWatch collects action metrics for each rule that is configured in AWS WAF.
Amazon Lookout for Metrics monitors CloudWatch metrics, selects the best ML algorithm, and trains the ML model.
Lookout for Metrics detects outliers and provides a severity score to diagnose the issue.
Lookout for Metrics invokes an AWS Lambda function when an anomaly is detected.
The Lambda function sends a finding to Security Hub for further analysis.

Let’s take a detailed look at the AWS services that you will use in this solution.

Amazon API Gateway

Amazon API Gateway is a serverless API management service that supports mock integrations for API methods. This is the easiest and the most cost-effective way to implement this solution. But you can also use Amazon CloudFront, AWS AppSync GraphQL API, and Application Load Balancer to implement this solution in your workload.

AWS WAF

AWS WAF is a web application firewall you can associate with API Gateway for REST APIs, Amazon CloudFront, AWS AppSync for GraphQL API, or Application Load Balancer. AWS WAF is integrated with other AWS services such as CloudWatch. AWS WAF uses rules to detect common web exploits in the incoming HTTP requests. You can configure your own rules, or use managed rulesets from AWS or from a third-party vendor. In this solution, you use AWS Managed Rules, which contains the CrossSiteScripting_QUERYARGUMENTS rule.

Amazon CloudWatch

Amazon CloudWatch is a monitoring and observability service. CloudWatch receives specific metrics from AWS WAF every 5 minutes. In particular, for each AWS WAF rule, CloudWatch provides PassedRequests, BlockedRequests, and CountedRequests metrics.

Amazon Lookout for Metrics

Amazon Lookout for Metrics uses machine learning (ML) algorithms to automatically detect and diagnose anomalies in your metrics. By using CloudWatch metrics as a data source for Lookout for Metrics, you can apply one of the Lookout for Metrics ML models to detect anomalies in a faster way. In addition, you can provide feedback on detected anomalies to help improve the model accuracy over time. Lookout for Metrics is available in the US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm) AWS Regions.

AWS Lambda

In this solution, you use an AWS Lambda function as an alert mechanism for Lookout for Metrics. When the machine learning model detects an outlier, it invokes the Lambda function, which implements a custom code. The Lambda function then imports the anomaly as a finding to Security Hub.

AWS Security Hub

In this solution, you use AWS Security Hub as a centralized way to manage security findings. This integration has the advantage of providing a common place for the security team to diagnose security findings from various sources, and uniformly integrates with your existing Security Information and Event Management (SIEM) system.

Prerequisites

This solution uses Security Hub to collect anomaly detection findings. Before you deploy the solution, you need to enable Security Hub in your AWS account by following the instructions provided in to enable Security Hub manually. After you enable Security Hub, you can optionally select the security standards that are relevant for your workload, as shown in Figure 2.

Figure 2: Manually enabling Security Hub in the AWS Management Console

Deploy the solution

A ready-to-use solution is provided as an AWS Cloud Development Kit (AWS CDK) application in the AWS WAF Anomaly Detection CDK project GitHub code repository. You can clone the GitHub repository and deploy the application by using the AWS CDK for Python.

Important: After you successfully deploy the solution, you should activate the Lookout for Metrics detector. This is not done as part of the CDK deployment. To activate the detector, in the AWS Management Console navigate to Amazon Lookout for Metrics, select the detector the solution created (WAFBlockingRequestDetector), and choose Activate. Alternatively, you can use the following AWS command to activate your detector.
aws lookoutmetrics activate-anomaly-detector --anomaly-detector-arn arn:aws:lookoutmetrics:<REGION_ID>:<ACCOUNT_ID>:AnomalyDetector:WAFBlockingRequestDetector

If you don’t want to run the CDK application, you can implement the same solution by using the AWS Management Console. In the following sections, I’ll go through the manual steps you can follow to achieve this.

Create an API to demonstrate the solution

First, you need an HTTP endpoint to protect. AWS WAF is integrated with CloudFront, Application Load Balancer, API Gateway, and AWS AppSync GraphQL API. In this blog post, I recommend a REST API Gateway because it’s a fully managed service to create and manage APIs. In addition, API Gateway provides a mechanism to implement mock APIs.

To build a REST API, follow the instructions for creating a REST API in Amazon API Gateway. After you create the API, create a GET method at the API root level and associate it to a mock endpoint, as shown in Figure 3. This is just enough to return an HTTP 200 status code to any GET requests.

Figure 3: Creating an API with mock integration

Finally, deploy the API under the “prod” stage and keep all the default settings.

Create an AWS WAF web ACL to deploy the managed rules

Now that you’ve created an API in API Gateway, you need to create an AWS WAF web access control list (web ACL) by following the instructions in Creating a web ACL. A web ACL is the top-level configuration object of AWS WAF. This is the collection of AWS WAF rules that you will apply to your API. API Gateway is a regional service, so make sure to create a web ACL in the same AWS Region as the API. After you create the web ACL, add the Core rule set (CRS) rule group from AWS Managed Rules, also called AWSManagedRulesCommonRuleSet, as shown in Figure 4. This rule group contains the CrossSiteScripting_QUERYARGUMENTS rule, which you will use later to demonstrate the anomaly detection.

Figure 4: Adding AWSManagedRulesCommonRuleSet to the AWS WAF web ACL

By observing Web ACL rule capacity units used, you can see that the Core rule set is consuming 700 web ACL capacity units (WCUs). The maximum capacity for a web ACL is 1,500, which is sufficient for most use cases. If you need more capacity, contact the AWS Support Center.

Associate the web ACL with the API deployment

After you create the web ACL, you associate it with the API. To do this, in the AWS WAF console, navigate to the web ACL you just created. On the Associated AWS resources tab, choose Add AWS resources. When prompted, choose the API you created earlier, and then choose Add.

Figure 5: Associating the web ACL with the API

Create a Lambda function to forward the anomaly to Security Hub

It’s useful to get visibility into the anomalies that are detected by the solution, and there are various ways to do that. In this solution, you provide such visibility as findings to Security Hub. Security Hub provides a centralized place to manage different findings from your AWS solutions. It also provides graphical tools to help with diagnostics.

You use a Lambda function that receives each anomaly and imports them into Security Hub. You can find the lookout_alarm Lambda function on GitHub, or follow the instructions to build a Lambda function with Python. You will use this Lambda function to provide additional context enrichment in the finding.

import boto3

securityHub = boto3.client('securityhub')

def lambda_handler(event, context):
    # submit the finding to Security Hub
    result = securityHub.batch_import_findings(Findings = [...])

Before you use this Lambda function, make sure you enable Security Hub.

Create the Lookout for Metrics detector, dataset, and alarm

Now you have an API that is protected by an AWS WAF web ACL. You also have configured a way to integrate with Security Hub through a Lambda function. The next step is to create a Lookout for Metrics detector and connect all these elements together. The key concepts and terminology of Lookout for Metrics are:

Detector – A Lookout for Metrics resource that monitors a dataset and identifies anomalies.
Dataset – The detector’s copy of the data that Lookout for Metrics is analyzing.
Alert – A mechanism to send a notification or initiate a processing workflow when the detector finds an anomaly.

First, follow the instructions to create a detector. The only information you need to provide is a name and an interval. The interval is the amount of time between two analyses. Your choice of the interval depends upon criteria such as the metrics you are processing, or the retention time of your data. For more information on the detector interval, see Lookout for Metrics quotas. In the example in Figure 6, I chose an interval of 5 minutes, which is the minimum.

Figure 6: Creating an Amazon Lookout for Metrics detector

After you create the detector, follow the instructions to configure a dataset that uses CloudWatch as a data source. Select Create a role in the service role, choose Next, and enter the following parameters:

For the CloudWatch namespace, choose AWS/WAFV2.
For Dimensions, choose Region, Rule, and WebACL.
For Measure, choose BlockedRequests.
For Aggregation function, choose SUM.

Figure 7 shows the data source fields that the detector will check for anomalies.

Figure 7: Creating an Amazon Lookout for Metrics dataset

Next, create a Lookout for Metrics alert to invoke the Lambda function. To do so, follow the instructions for working with alerts. You provide a name, a channel (the Lambda function), and a severity threshold. One of the main advantages of Lookout for Metrics is the scoring of the detected anomaly, which indicates the severity. Anomalies have a score from 0 to 100. You can set up different alerts with different thresholds that are associated to the same detector. This way, you can provide alerts for different severity levels. In the example in Figure 8, I created a single alert with a severity threshold of 10.

Figure 8: Creating an Amazon Lookout for Metrics alert

The last steps are to activate the detector and configure Lookout for Metrics to select a ML model and train it. To do so, choose Activate on the detector details page.

Figure 9: Activating the Amazon Lookout for Metrics detector

Why does this solution use Lookout for Metrics anomaly detection?

Amazon CloudWatch offers native anomaly detection on a given metric. This function is useful to apply statistical and ML algorithms that continuously analyze metrics, determine normal baselines, and identify anomalies with minimal user intervention.

Lookout for Metrics provides a more sophisticated version of anomaly detection, which makes it the better choice for this solution. Lookout for Metrics automatically supports a collection of ML algorithms. For example, no one algorithm works for all kinds of data, so Lookout for Metrics inspects the data and applies the right ML algorithm to the right data to accurately detect anomalies. In addition, Lookout for Metrics groups concurrent anomalies into logical groups, and sends a single alert for the anomaly group rather than separate alerts, so you can see the full picture. Finally, Lookout for Metrics allows you to provide feedback on the detected anomalies, which AWS uses to continuously improve the accuracy and performance of the models.

Publish the value zero in CloudWatch metrics

The reporting criteria for AWS WAF metrics is a nonzero value. This means that the BlockedRequests metric isn’t updated if AWS WAF isn’t blocking any requests. In the absence of real HTTP traffic, typically in a testing environment, the value zero must be published. In production, because AWS WAF is actively blocking illegitimate requests, this publication is not required. To train the ML model in the absence of blocked requests, you need to publish the value zero by calling the PutMetricData CloudWatch API method every 5 minutes.

In my example, I selected a 5-minute period to be aligned with the Lookout for Metrics interval. It’s possible to publish a zero value every five minutes by using the CloudWatch metrics API, as shown following. The zero value doesn’t impact the SUM and ensures that at least one value is published every five minutes. You can use the cloudwatch_zero Lambda function on GitHub to publish the value zero by using the AWS SDK for Python.

import boto3

cloudwatch = boto3.client('cloudwatch')

def lambda_handler(event, context):

    result = cloudwatch.put_metric_data(
        Namespace='AWS/WAFV2',
        MetricData=[{
                'MetricName': 'BlockedRequests',
                'Dimensions': [...],
                'Value': 0
        }]
    )

To create a CloudWatch Events rule to schedule the call every 5 minutes

Navigate to the CloudWatch Event console and choose Create Rule.
Choose Schedule, keep the 5-minute default interval, and choose Add target.
Select the name of the function you previously created, expand the Configure input section.
Choose Constant (JSON text), as shown in Figure 10. In the text field, paste the following configuration:
```
{"WebACLId":"WebACLForWAFDemo","RuleId":"AWS-AWSManagedRulesCommonRuleSet"}
```
Choose Configure details.
Enter a name for your rule, and then choose Create rule.

Figure 10: Creating a CloudWatch Events rule scheduled every 5 minutes

Training time

Before the activated detector attempts to find anomalies, it uses data from several intervals to learn. If no historical data is available, the training process takes approximately one day for a five-minute interval. When you first deploy this solution, you have no historical data in CloudWatch for your AWS WAF resources, and you’re facing a cold start of Lookout for Metrics anomaly detection. Because the Lookout for Metrics detector interval is set to 5 minutes, you have to wait for 25 hours before being able to detect an anomaly. If you deploy the solution against an AWS WAF resource that’s been in production for days, you’ll have a reduced training time.

Test the anomaly detection

After 25 hours, Lookout for Metrics correctly selects an ML model that fits your metrics behavior, and correctly trains it based on your actual data. You can then start to test the anomaly detection. You can use a simple curl command, injecting a JavaScript alert() call in a query parameter as described in the AWS WAF documentation, to invoke the CrossSiteScripting_QUERYARGUMENTS managed rule. Make sure to inject a significant number of requests to ensure detection of blocked requests anomalies.

for i in {1..150}
do
  curl https://<api_gateway_endpoint>?test=%3Cscript%3Ealert%28%22hello%22%29%3C%2Fscript%3E
done

After you run the injection script, wait for the system to detect the anomaly. The CloudWatch BlockedRequests metric takes up to 5 minutes to update, and Lookout for Metrics is configured to detect anomalies in the CloudWatch data every 5 minutes. For those reasons, it can take 10 minutes to detect the simulated anomaly.

After detection and processing time, the finding is visible in Security Hub. To view the finding, go to the AWS Management Console, choose Services, choose Security Hub, and then choose Findings.

Figure 11: AWS Security Hub findings

In Figure 11, you can see the new finding, coming from Lookout for Metrics, with a Low severity and an anomaly score of 100. You can use the remediation field to open the Lookout for Metrics console, where you can give feedback on the anomaly detection to improve the model for future detections.

Figure 12: Lookout for Metrics console, Finding view

Figure 12 shows the Lookout for Metrics graphical interface, where you can see the metrics related to the finding. The previous injection script impacted only one metric, but the same setup works to observe anomalies that arise between two or more metrics together. This feature makes diagnosis of issues easier.

For each of the impacted metrics, to confirm that the anomaly is relevant, choose the Yes button next to Is this relevant? above the graph.

Extend the solution

The solution in this post detects anomalies in the AWS WAF blocked request behavior. But you can also configure AWS WAF rule actions to count your requests instead of blocking them. This is usually done on legacy systems or for some particular rules of a managed ruleset that present an incompatibility with your workload. When you configure the rule action as a count, you increase the need for a comprehensive observability approach. By implementing anomaly detection against counted requests, this solution will help you to achieve better observability for your system.

Concerning the remediation, it’s possible to modify this solution by integrating it with different AWS services. As an example, you can integrate the anomaly detection with your own SIEM system, or simply notify your security team distribution list by using Amazon Simple Notification Service (Amazon SNS).

AWS WAF provides additional information in its logs, such as the IP address for the client. To detect anomalies in AWS WAF logs, you can ingest the AWS WAF logs to Amazon Simple Storage Service (Amazon S3), and then use Lookout for Metrics with Amazon S3 as a data source.

Conclusion

AWS WAF is integrated with CloudWatch and provides metrics for passed requests, blocked requests, or counted requests. With Lookout for Metrics, you can detect unexpected behavior in CloudWatch metrics by using a machine learning (ML) model. In this blog, I showed you how to integrate both services to provide AWS WAF with an ML-based anomaly detection mechanism. ML is a way to gain more visibility into your AWS WAF behavior. In addition, you can easily be notified when the system detects abnormal levels of blocked (or counted) requests, in order to take the right remediation action.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS WAF forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.