Automated security and compliance remediation at HDI

Post Syndicated from Uladzimir Palkhouski original https://aws.amazon.com/blogs/devops/automated-security-and-compliance-remediation-at-hdi/

with Dr. Malte Polley (HDI Systeme AG – Cloud Solutions Architect)

At HDI, one of the biggest European insurance groups, we use AWS to build new services and capabilities and delight our customers. Working in the financial services industry, the company has to comply with numerous regulatory requirements in the areas of data protection and FSI regulations such as GDPR, German Supervisory Requirements for IT (VAIT) and Supervision of Insurance Undertakings (VAG). A consistent security and compliance assessment process in the cloud supports development productivity and organizational agility, and helps our teams innovate at a high pace and meet the growing demands of our internal and external customers.

In this post, we explore how HDI adopted AWS security and compliance best practices. We describe implementation of automated security and compliance monitoring of AWS resources using a combination of AWS and open-source solutions. We also go through the steps to implement automated security findings remediation and address continuous deployment of new security controls.

Background

Data analytics is the key capability for understanding our customers’ needs, driving business operations improvement, and developing new services, products, and capabilities for our customers. We needed a cloud-native data platform of virtually unlimited scale that offers descriptive and prescriptive analytics capabilities to internal teams with a high innovation pace and short experimentation cycles. One of the success metrics in our mission is time to market, therefore it’s important to provide flexibility to internal teams to quickly experiment with new use cases. At the same time, we’re vigilant about data privacy. Having a secure and compliant cloud environment is a prerequisite for every new experiment and use case on our data platform.

Security and compliance implementation in the cloud is a shared effort between the Cloud Center of Competence team (C3), the Network Operation Center (NoC), and the product and platform teams. The C3 team is responsible for new AWS account provisioning, account security, and compliance baseline setup. Cross-account networking configuration is established and managed by the NoC team. Product teams are responsible for AWS services configuration to meet their requirements in the most efficient way. Typically, they deploy and configure infrastructure and application stacks, including the following:

Network configuration – Amazon Virtual Private Cloud (Amazon VPC) subnets and routing
Object storage setup – Amazon Simple Storage Service (Amazon S3) buckets and bucket policies
Data encryption at rest configuration – Management of AWS Key Management Service (AWS KMS) customer master keys (CMKs) and key policies
Managed services configuration – AWS Glue jobs, AWS Cloud9 environments, and others

We were looking for a security controls model that would allow us to continuously monitor infrastructure and application components set up by all the teams. The model also needed to support guardrails that let product teams focus on new use case implementation while inheriting the security and compliance best practices promoted and ensured within our company.

Security and compliance baseline definition

We started with the AWS Well-Architected Framework Security Pillar whitepaper, which provides implementation guidance on the essential areas of security and compliance in the cloud, including identity and access management, infrastructure security, data protection, detection, and incident response. Although all five elements are equally important for implementing enterprise-grade security and compliance in the cloud, we saw an opportunity to improve on the controls of on-premises environments by automating the detection and incident response elements. Continuous monitoring of AWS infrastructure and application changes, complemented by automated incident response for the security baseline, helps us foster security best practices and allows for a high innovation pace. Manual security reviews are no longer required to assess security posture.

Our security and compliance controls framework is based on GDPR and several standards and programs, including ISO 27001 and C5. Translation of the controls framework into the security and compliance baseline definition in the cloud isn’t always straightforward, so we use a number of guidelines. As a starting point, we use the CIS Amazon Web Services benchmarks, because they are prescriptive recommendations and their controls cover multiple AWS security areas, including identity and access management, logging and monitoring configuration, and network configuration. CIS benchmarks are industry-recognized cyber security best practices and recommendations that cover a wide range of technology families, and are used by enterprise organizations around the world. We also apply GDPR compliance on AWS recommendations and AWS Foundational Security Best Practices, extending controls recommended by CIS AWS Foundations Benchmarks in multiple control areas: inventory, logging, data protection, access management, and more.

Security controls implementation

AWS provides multiple services that help implement security and compliance controls:

AWS CloudTrail provides a history of events in an AWS account, including those originating from command line tools, AWS SDKs, AWS APIs, or the AWS Management Console. In addition, it allows exporting event history for further analysis and subscribing to specific events to implement automated remediation.
AWS Config allows you to monitor AWS resource configuration, and automatically evaluate and remediate incidents related to unexpected resources configuration. AWS Config comes with pre-built conformance pack sample templates designed to help you meet operational best practices and compliance standards.
Amazon GuardDuty provides threat detection capabilities that continuously monitor network activity, data access patterns, and account behavior.

With multiple AWS services to use as building blocks for continuous monitoring and automation, there is a strong need for a consolidated findings overview and unified remediation framework. This is where AWS Security Hub comes into play. Security Hub provides built-in security standards and controls that make it easy to enable foundational security controls. In addition, Security Hub integrates with CloudTrail, AWS Config, GuardDuty, and other AWS services out of the box, which eliminates the need to develop and maintain integration code. Security Hub also accepts findings from third-party partner products and provides APIs for custom product integration. Security Hub significantly reduces the effort to consolidate audit information coming from multiple AWS-native and third-party channels. Its API and supported partner products ecosystem gave us confidence that we can adapt to changes in security and compliance standards with low effort.

While AWS provides a rich set of services to manage risk across the Three Lines Model, we were looking for wider community support in maintaining and extending security controls beyond those defined by CIS benchmarks and compliance and best practices recommendations on AWS. We came across Prowler, an open-source tool focusing on AWS security assessment, auditing, and infrastructure hardening. Prowler implements CIS AWS benchmark controls and has over 100 additional checks. We specifically appreciated that Prowler provides checks that helped us meet GDPR and ISO 27001 requirements. Prowler delivers assessment reports in multiple formats, which makes it easy to archive reports for future auditing needs. In addition, Prowler integrates well with Security Hub, which allows us to use a single service for consolidating security and compliance incidents across a number of channels.

We came up with the solution architecture depicted in the following diagram.

Automated remediation solution architecture HDI

Let’s look closely into the most critical components of this solution.

Prowler is a command line tool that uses the AWS Command Line Interface (AWS CLI) and a bash script. Individual Prowler checks are bash scripts organized into groups by compliance standard or AWS service. By supplying corresponding command line arguments, we can run Prowler against a specific AWS Region or multiple Regions at the same time. We can run Prowler in multiple ways; we chose to run it as an AWS Fargate task for Amazon Elastic Container Service (Amazon ECS). Fargate is a serverless compute engine that runs Docker-compatible containers. Running Prowler as a scheduled Fargate task makes it easy to perform periodic assessments of an AWS account and export findings. We configured Prowler to run every 7 days in every account and Region it’s deployed into.

Security Hub acts as a single place for consolidating security findings from multiple sources. When Security Hub is enabled in a given Region, CIS AWS Foundations Benchmark and Foundational Security Best Practices standards are enabled as well. Enabling these standards also configures integration with AWS Config and GuardDuty. Integration with Prowler requires enabling product integration on the Security Hub side by calling the EnableImportFindingsForProduct API action for a given product. Because Prowler supports integration with Security Hub out of the box, posting security findings is a matter of passing the right command line arguments: -M json-asff to format reports as AWS Security Findings Format and -S to ship findings to Security Hub.
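The two integration steps described above can be sketched in Python. This is a minimal sketch under stated assumptions, not our deployment code: the function names are hypothetical, the product ARN path (prowler/prowler) and the -f Region filter flag are assumed from Prowler 2.x conventions, and the Security Hub client is passed in so that the sketch can be exercised with a stub.

```python
def prowler_product_arn(region):
    """Build the Security Hub product ARN for Prowler in one Region.

    Assumption: Prowler registers under the product path 'prowler/prowler'.
    """
    return f"arn:aws:securityhub:{region}::product/prowler/prowler"


def enable_prowler_import(securityhub, region):
    """Allow Prowler to import findings into Security Hub.

    `securityhub` is a boto3 Security Hub client (or a stub in tests).
    Returns the product subscription ARN reported by the API.
    """
    response = securityhub.enable_import_findings_for_product(
        ProductArn=prowler_product_arn(region)
    )
    return response["ProductSubscriptionArn"]


def prowler_command(region):
    """Return the Prowler invocation as an argument list.

    -M json-asff and -S come from the text above; -f (Region filter)
    is an assumption about the Prowler 2.x command line.
    """
    return ["./prowler", "-f", region, "-M", "json-asff", "-S"]
```

In a real account, the client would be created with boto3.client("securityhub", region_name=region), and the integration would need to be enabled once per Region before the first Prowler run.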

Automated security findings remediation is implemented using AWS Lambda functions and the AWS SDK for Python (Boto3). The remediation function can be triggered in two ways: automatically in response to a new security finding, or by a security engineer from the Security Hub findings page. In both cases, the same Lambda function is used. Remediation functions implement security standards in accordance with recommendations, whether they’re CIS AWS Foundations Benchmark and Foundational Security Best Practices standards, or others.

The exact activities performed depend on the security findings type and its severity. Examples of activities performed include deleting non-rotated AWS Identity and Access Management (IAM) access keys, enabling server-side encryption for S3 buckets, and deleting unencrypted Amazon Elastic Block Store (Amazon EBS) volumes.

To trigger the Lambda function, we use Amazon EventBridge, which makes it easy to build an event-driven remediation engine and allows us to define Lambda functions as targets for Security Hub findings and custom actions. EventBridge allows us to define filters for security findings and therefore map finding types to specific remediation functions. Upon successfully performing security remediation, each function updates one or more Security Hub findings by calling the BatchUpdateFindings API and passing the corresponding finding ID.
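To illustrate how EventBridge filters can map findings to remediation functions, the following sketch builds two event patterns as plain Python dictionaries. It rests on assumptions: we assume the filter ID is matched against the GeneratorId field of the finding, and the function names are hypothetical. The detail-type values are the ones Security Hub uses for imported findings and for custom actions.

```python
import json


def finding_event_pattern(filter_ids):
    """Pattern for automatically triggered remediation of failed findings.

    Assumption: the filter IDs correspond to the finding's GeneratorId.
    """
    return {
        "source": ["aws.securityhub"],
        "detail-type": ["Security Hub Findings - Imported"],
        "detail": {
            "findings": {
                "GeneratorId": list(filter_ids),
                "Compliance": {"Status": ["FAILED"]},
            }
        },
    }


def custom_action_event_pattern(custom_action_arn):
    """Pattern for remediation triggered manually from the findings page."""
    return {
        "source": ["aws.securityhub"],
        "detail-type": ["Security Hub Findings - Custom Action"],
        "resources": [custom_action_arn],
    }


if __name__ == "__main__":
    # Print the pattern in the JSON form an EventBridge rule expects.
    print(json.dumps(finding_event_pattern(["prowler-extra729"]), indent=2))
```

Each remediation function gets its own rule with a pattern like these, so a finding type that has no matching rule triggers no function.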

The following example code shows a function enforcing an IAM password policy:

import boto3
import os
import logging
from botocore.exceptions import ClientError

iam = boto3.client("iam")
securityhub = boto3.client("securityhub")

log_level = os.environ.get("LOG_LEVEL", "INFO")
logging.root.setLevel(logging.getLevelName(log_level))
logger = logging.getLogger(__name__)


def lambda_handler(event, context, iam=iam, securityhub=securityhub):
    """Remediate findings related to cis15 and cis11.

    Params:
        event: Lambda event object
        context: Lambda context object
        iam: iam boto3 client
        securityhub: securityhub boto3 client
    Returns:
        No returns
    """
    finding_id = event["detail"]["findings"][0]["Id"]
    product_arn = event["detail"]["findings"][0]["ProductArn"]
    lambda_name = os.environ["AWS_LAMBDA_FUNCTION_NAME"]
    try:
        iam.update_account_password_policy(
            MinimumPasswordLength=14,
            RequireSymbols=True,
            RequireNumbers=True,
            RequireUppercaseCharacters=True,
            RequireLowercaseCharacters=True,
            AllowUsersToChangePassword=True,
            MaxPasswordAge=90,
            PasswordReusePrevention=24,
            HardExpiry=True,
        )
        logger.info("IAM Password Policy Updated")
    except ClientError as e:
        logger.exception(e)
        raise e
    try:
        securityhub.batch_update_findings(
            FindingIdentifiers=[{"Id": finding_id, "ProductArn": product_arn},],
            Note={
                "Text": "Changed non compliant password policy",
                "UpdatedBy": lambda_name,
            },
            Workflow={"Status": "RESOLVED"},
        )
    except ClientError as e:
        logger.exception(e)
        raise e

A key aspect in developing remediation Lambda functions is testability. To quickly iterate through testing cycles, we cover each remediation function with unit tests, in which necessary dependencies are mocked and replaced with stub objects. Because no Lambda deployment is required to check remediation logic, we can test newly developed functions and ensure reliability of existing ones in seconds.

Each Lambda function developed is accompanied by an event.json document containing an example of an EventBridge event for a given security finding. A security finding event allows us to verify remediation logic precisely, including deletion or suspension of non-compliant resources, the finding status update in Security Hub, and the response returned. Unit tests cover both successful and erroneous remediation logic. We use pytest to develop unit tests, and botocore.stub and moto to replace runtime dependencies with mocks and stubs.
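The testing approach can be sketched as follows. To keep the example free of dependencies, this sketch uses unittest.mock from the standard library in place of botocore.stub and moto, and it tests a simplified, hypothetical remediate function that mirrors the shape of the handler shown earlier. The sample event and its values are illustrative, not taken from a real account.

```python
from unittest.mock import MagicMock

# Illustrative stand-in for the contents of an event.json document.
SAMPLE_EVENT = {
    "detail": {
        "findings": [
            {
                "Id": "example-finding-id",
                "ProductArn": "arn:aws:securityhub:eu-central-1::product/aws/securityhub",
            }
        ]
    }
}


def remediate(event, iam, securityhub, updated_by="remediation-under-test"):
    """Simplified remediation: fix the password policy, then resolve the finding."""
    finding = event["detail"]["findings"][0]
    iam.update_account_password_policy(MinimumPasswordLength=14)
    securityhub.batch_update_findings(
        FindingIdentifiers=[
            {"Id": finding["Id"], "ProductArn": finding["ProductArn"]}
        ],
        Note={"Text": "Changed non compliant password policy", "UpdatedBy": updated_by},
        Workflow={"Status": "RESOLVED"},
    )


def test_success_marks_finding_resolved():
    iam, securityhub = MagicMock(), MagicMock()
    remediate(SAMPLE_EVENT, iam, securityhub)
    iam.update_account_password_policy.assert_called_once()
    kwargs = securityhub.batch_update_findings.call_args[1]
    assert kwargs["Workflow"] == {"Status": "RESOLVED"}
    assert kwargs["FindingIdentifiers"][0]["Id"] == "example-finding-id"


def test_failure_leaves_finding_untouched():
    iam, securityhub = MagicMock(), MagicMock()
    iam.update_account_password_policy.side_effect = RuntimeError("denied")
    try:
        remediate(SAMPLE_EVENT, iam, securityhub)
    except RuntimeError:
        pass
    else:
        raise AssertionError("expected the IAM error to propagate")
    # A failed remediation must not mark the finding as resolved.
    securityhub.batch_update_findings.assert_not_called()
```

Both tests run under pytest without AWS credentials or a deployed function, because the clients are injected rather than created inside the function.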

Automated security findings remediation

The following diagram illustrates our security assessment and automated remediation process.

Automated remediation flow HDI

The workflow includes the following steps:

1. An existing Security Hub integration performs periodic resource audits. The integration posts new security findings to Security Hub.
2. Security Hub reports the security incident to the company’s centralized ServiceNow instance by using the ServiceNow ITSM Security Hub integration.
3. Security Hub triggers automated remediation:
    1. Security Hub triggers the remediation function by sending an event to EventBridge. The event has a source field equal to aws.securityhub, with the filter ID corresponding to the specific finding type and compliance status as FAILED. The combination of these fields allows us to map the event to a particular remediation function.
    2. The remediation function starts processing the security finding event.
    3. The function calls the BatchUpdateFindings Security Hub API to update the security finding status upon completing remediation.
    4. Security Hub updates the corresponding security incident status in ServiceNow (see step 2).
4. Alternatively, the security operations engineer resolves the security incident in ServiceNow:
    1. The engineer reviews the current security incident in ServiceNow.
    2. The engineer manually resolves the security incident in ServiceNow.
    3. ServiceNow updates the finding status by calling the UpdateFindings Security Hub API. ServiceNow uses the AWS Service Management Connector.
5. Alternatively, the platform security engineer triggers remediation:
    1. The engineer reviews the currently active security findings on the Security Hub findings page.
    2. The engineer triggers remediation from the security findings page by selecting the appropriate action.
    3. Security Hub triggers the remediation function by sending an event with the source aws.securityhub to EventBridge. The automated remediation flow continues as described in step 3.
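Because the same function serves both the automated and the manually triggered path, it can help to make the trigger explicit and to handle events that carry more than one finding. The following is a minimal sketch with hypothetical helper names; it assumes the standard Security Hub detail-type values and is not part of the handler shown earlier, which processes only the first finding.

```python
def trigger_kind(event):
    """Classify a Security Hub event as 'automatic' or 'manual'."""
    if event.get("source") != "aws.securityhub":
        raise ValueError("not a Security Hub event")
    detail_type = event.get("detail-type", "")
    if detail_type == "Security Hub Findings - Imported":
        return "automatic"
    if detail_type == "Security Hub Findings - Custom Action":
        return "manual"
    raise ValueError(f"unsupported detail-type: {detail_type!r}")


def finding_identifiers(event):
    """Collect the identifiers BatchUpdateFindings needs for every finding."""
    return [
        {"Id": finding["Id"], "ProductArn": finding["ProductArn"]}
        for finding in event["detail"]["findings"]
    ]
```

The list returned by finding_identifiers can be passed directly as the FindingIdentifiers argument of batch_update_findings.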

Deployment automation

Due to legal requirements, HDI uses the infrastructure as code (IaC) principle while defining and deploying AWS infrastructure. We started with AWS CloudFormation templates defined in YAML or JSON format. The templates are static by nature and define resources in a declarative way. We found that as our solution complexity grows, the CloudFormation templates also grow in size and complexity, because all the resources deployed have to be explicitly defined. We wanted a solution to increase our development productivity and simplify infrastructure definition.

The AWS Cloud Development Kit (AWS CDK) helped us in two ways:

The AWS CDK provides ready-to-use building blocks called constructs. These constructs include pre-configured AWS services following best practices. For example, a Lambda function always gets an IAM role with an IAM policy to be able to write logs to CloudWatch Logs.
The AWS CDK allows us to use high-level programming languages to define configuration of all AWS services. Imperative definition allows us to build our own abstractions and reuse them to achieve concise resource definition.

We found that implementing IaC with the AWS CDK is faster and less error-prone. At HDI, we use Python to build application logic and define AWS infrastructure. The imperative nature of the AWS CDK is truly a turning point in fulfilling legal requirements and achieving high developer productivity at the same time.

One of the AWS CDK constructs we use is the AWS CDK pipeline. This construct creates a customizable continuous integration and continuous delivery (CI/CD) pipeline implemented with AWS CodePipeline. The source action is based on AWS CodeCommit. The synth action is responsible for creating a CloudFormation template from the AWS CDK project. The synth action also runs unit tests on remediation functions. The pipeline actions are connected via artifacts. Lastly, the AWS CDK pipeline construct offers a self-mutating feature, which allows us to maintain the AWS CDK project as well as the pipeline in a single code repository. Changes to the pipeline definition as well as to the automated remediation solution are deployed seamlessly. The actual solution deployment is also implemented as a CI/CD stage. Stages can also be deployed in cross-Region and cross-account patterns. To use cross-account deployments, the AWS CDK provides a bootstrap functionality to create a trust relationship between AWS accounts.

The AWS CDK project is broken down into multiple stacks. To deploy the CI/CD pipeline, we run the cdk deploy cicd-4-securityhub command. To add a new Lambda remediation function, we must add remediation code, optional unit tests, and finally the Lambda remediation configuration object. This configuration object defines the Lambda function’s environment variables, necessary IAM policies, and external dependencies. See the following example code of this configuration:

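from aws_cdk import aws_iam as _iam, core  # AWS CDK v1 imports used by this snippet

# Configuration object describing one remediation Lambda function
prowler_729_lambda = {
    "name": "Prowler 7.29",
    "id": "prowler729",
    "description": "Remediates Prowler 7.29 by deleting/terminating unencrypted EC2 instances/EBS volumes",
    # IAM permissions the remediation function needs at runtime
    "policies": [
        _iam.PolicyStatement(
            effect=_iam.Effect.ALLOW,
            actions=["ec2:TerminateInstances", "ec2:DeleteVolume"],
            resources=["*"],
        )
    ],
    # Location of the remediation function code
    "path": "delete_unencrypted_ebs_volumes",
    "environment_variables": [
        {"key": "ACCOUNT_ID", "value": core.Aws.ACCOUNT_ID}
    ],
    # Finding filter IDs that map to this remediation function
    "filter_id": ["prowler-extra729"],
}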

Remediation functions are organized in accordance with the security and compliance frameworks they belong to. The AWS CDK code iterates over remediation definition lists and synthesizes corresponding policies and Lambda functions to be deployed later. Committing Git changes and pushing them triggers the CI/CD pipeline, which deploys the newly defined remediation function and adjusts the configuration of Prowler.
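The iteration over remediation definitions can be sketched without the AWS CDK itself. The following hypothetical plan_remediations helper shows only the pure-Python part, assuming definitions shaped like the configuration object above: it validates the list and normalizes each entry into the values a Lambda function construct and an EventBridge rule would need. The actual construct calls are omitted, and the output keys are illustrative names, not AWS CDK parameters.

```python
def plan_remediations(definitions):
    """Normalize remediation definitions before constructs are created."""
    plans = []
    seen_ids = set()
    for definition in definitions:
        construct_id = definition["id"]
        # Construct IDs must be unique within a stack.
        if construct_id in seen_ids:
            raise ValueError(f"duplicate remediation id: {construct_id}")
        seen_ids.add(construct_id)
        plans.append(
            {
                "construct_id": construct_id,
                "description": definition.get("description", ""),
                "code_path": definition["path"],
                # Flatten [{"key": ..., "value": ...}] into a mapping.
                "environment": {
                    item["key"]: item["value"]
                    for item in definition.get("environment_variables", [])
                },
                "policies": list(definition.get("policies", [])),
                "filter_ids": list(definition["filter_id"]),
            }
        )
    return plans
```

Failing on duplicate IDs at synthesis time surfaces a copy-and-paste mistake in a definition list before anything is deployed.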

We are working on publishing the source code discussed in this blog post.

Looking forward

As we keep introducing new use cases in the cloud, we plan to improve our solution in the following ways:

Continuously add new controls based on our own experience and improving industry standards
Introduce cross-account security and compliance assessment by consolidating findings in a central security account
Improve automated remediation resiliency by introducing remediation failure notifications and retry queues
Run a Well-Architected review to identify and address possible areas of improvement

Conclusion

Working on the solution described in this post helped us improve our security posture and meet compliance requirements in the cloud. Specifically, we were able to achieve the following:

Gain a shared understanding of security and compliance controls implementation as well as shared responsibilities in the cloud between multiple teams
Speed up security reviews of cloud environments by implementing continuous assessment and minimizing manual reviews
Provide product and platform teams with secure and compliant environments
Lay a foundation for future requirements and improvement of security posture in the cloud

The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.
