Tag Archives: Best practices

How to use tokenization to improve data security and reduce audit scope

Post Syndicated from Tim Winston original https://aws.amazon.com/blogs/security/how-to-use-tokenization-to-improve-data-security-and-reduce-audit-scope/

Tokenization of sensitive data elements is a hot topic, but you may not know what to tokenize, or even how to determine if tokenization is right for your organization’s business needs. Industries subject to financial, data security, regulatory, or privacy compliance standards are increasingly looking for tokenization solutions to minimize distribution of sensitive data, reduce risk of exposure, improve security posture, and alleviate compliance obligations. This post provides guidance to determine your requirements for tokenization, with an emphasis on the compliance lens given our experience as PCI Qualified Security Assessors (PCI QSA).

What is tokenization?

Tokenization is the process of replacing actual sensitive data elements with non-sensitive data elements that have no exploitable value for data security purposes. Security-sensitive applications use tokenization to replace sensitive data, such as personally identifiable information (PII) or protected health information (PHI), with tokens to reduce security risks.

De-tokenization returns the original data element for a provided token. Applications may require access to the original data, or an element of the original data, for decisions, analysis, or personalized messaging. To minimize the need to de-tokenize data and to reduce security exposure, tokens can retain attributes of the original data to enable processing and analysis using token values instead of the original data. Common characteristics tokens may retain from the original data are:

Format attributes

Length for compatibility with storage and reports of applications written for the original data
Character set for compatibility with display and data validation of existing applications
Preserved character positions such as first 6 and last 4 for credit card PAN

Analytics attributes

Mapping consistency where the same data always results in the same token
Sort order

Retaining functional attributes in tokens must be implemented in ways that do not defeat the security of the tokenization process, because attribute preservation can reduce the security of a specific tokenization implementation. Limiting the scope of, and access to, tokens helps address the limitations introduced by attribute retention.
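
To make these ideas concrete, here is a minimal Python sketch (an illustration only, not a production tokenization service) of a deterministic, format-preserving token for a 16-digit PAN that keeps the first six and last four digits; the in-memory dictionaries stand in for a hardened, access-controlled token vault.

import secrets

# Toy token vault: maps original PAN to token and token back to PAN.
# A real solution would use a hardened, access-controlled data store.
_pan_to_token = {}
_token_to_pan = {}

def tokenize(pan: str) -> str:
    """Return a format-preserving token for a 16-digit PAN.

    Keeps the first 6 and last 4 digits; the middle digits are random.
    The same PAN always maps to the same token (mapping consistency),
    which supports analysis such as counts and joins on token values.
    """
    if pan in _pan_to_token:
        return _pan_to_token[pan]
    while True:
        middle = "".join(secrets.choice("0123456789") for _ in range(len(pan) - 10))
        token = pan[:6] + middle + pan[-4:]
        if token not in _token_to_pan and token != pan:
            break
    _pan_to_token[pan] = token
    _token_to_pan[token] = pan
    return token

def detokenize(token: str) -> str:
    """Return the original PAN for a token (the optional de-tokenization API)."""
    return _token_to_pan[token]

token = tokenize("4111111111111111")
print(token)              # first 6 and last 4 digits preserved, middle digits random
print(detokenize(token))  # 4111111111111111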

Why tokenize? Common use cases

I need to reduce my compliance scope

Tokens are generally not subject to compliance requirements if there is sufficient separation of the tokenization implementation and the applications using the tokens. Encrypting sensitive data, by contrast, may not reduce compliance obligations or scope: industry regulatory standards such as PCI DSS 3.2.1 still consider systems that store, process, or transmit encrypted cardholder data to be in scope for assessment, whereas tokenization may remove those systems from assessment scope. A common use case for PCI DSS compliance is replacing PAN with tokens in data sent to a service provider, which keeps the service provider from being subject to PCI DSS.

I need to restrict sensitive data to only those with a “need-to-know”

Tokenization can be used to add a layer of explicit access controls to de-tokenization of individual data items, which can be used to implement and demonstrate least-privileged access to sensitive data. For instances where data may be co-mingled in a common repository such as a data lake, tokenization can help ensure that only those with the appropriate access can perform the de-tokenization process and reveal sensitive data.

I need to avoid sharing sensitive data with my service providers

Replacing sensitive data with tokens before providing it to service providers who have no ability to de-tokenize that data can eliminate the risk of having sensitive data within the service providers’ control, and avoid having compliance requirements apply to their environments. This is common in the payments industry, where payment processors provide tokenization services to merchants: the processor tokenizes the cardholder data and returns to the merchant a token that can be used to complete card purchase transactions.

I need to simplify data lake security and compliance

A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale, to be used later for not-yet-determined analysis. Having multiple sources, and data stored in multiple structured and unstructured formats, creates complications for demonstrating data protection controls for regulatory compliance. Ideally, sensitive data should not be ingested at all; however, that is not always feasible. Where ingestion of such data is necessary, tokenization at each data source can keep compliance-subject data out of data lakes, and help avoid compliance implications. Using tokens that retain data attributes, such as data-to-token consistency (the same data always producing the same token), can support many of the analytical capabilities that make it useful to store data in the data lake.

I want to allow sensitive data to be used for other purposes, such as analytics

Your organization may want to perform analytics on the sensitive data for other business purposes, such as marketing metrics and reporting. By tokenizing the data, you can minimize the locations where sensitive data is allowed, and provide tokens to the users and applications that need to conduct data analysis. This allows numerous applications and processes to access the token data while maintaining the security of the original sensitive data.

I want to use tokenization for threat mitigation

Using tokenization can help you mitigate threats identified in your workload threat model, depending on where and how tokenization is implemented. At the point where the sensitive data is tokenized, the sensitive data element is replaced with a non-sensitive equivalent throughout the data lifecycle, and across the data flow. Some important questions to ask are:

  • What are the in-scope compliance, regulatory, privacy, or security requirements for the data that will be tokenized?
  • When does the sensitive data need to be tokenized in order to meet security and scope reduction objectives?
  • What attack vector is being addressed for the sensitive data by tokenizing it?
  • Where is the tokenized data being hosted? Is it in a trusted environment or an untrusted environment?

For additional information on threat modeling, see the AWS security blog post How to approach threat modeling.

Tokenization or encryption considerations

Tokens can provide the ability to retain processing value of the data while still managing the data exposure risk and compliance scope. Encryption is the foundational mechanism for providing data confidentiality.

Encryption rarely results in cipher text with a similar format to the original data, and may prevent data analysis, or require consuming applications to adapt.

Your decision to use tokenization instead of encryption should be based on the following:

Reduction of compliance scope: As discussed above, by properly using tokenization to obfuscate sensitive data, you may be able to reduce the scope of certain framework assessments such as PCI DSS 3.2.1.
Format attributes: Used for compatibility with existing software and processes.
Analytics attributes: Used to support planned data analysis and reporting.
Elimination of encryption key management: A tokenization solution has one essential API (create token) and one optional API (retrieve value from token). Managing access controls for these can be simpler than managing some general-purpose cryptographic key use policies outside of AWS native services. In addition, the compromise of an encryption key compromises all data encrypted by that key, both past and future, whereas the compromise of the token database compromises only existing tokens.

Where encryption may make more sense

Although scope reduction, data analytics, threat mitigation, and data masking for the protection of sensitive data make very powerful arguments for tokenization, we acknowledge there may be instances where encryption is the more appropriate solution. Ask yourself these questions to gain better clarity on which solution is right for your company’s use case.

Scalability: If you require a solution that scales to large data volumes, and you can leverage encryption solutions that require minimal key management overhead, such as AWS Key Management Service (AWS KMS), then encryption may be right for you.
Data format: If you need to secure unstructured data, then encryption may be the better option, given the flexibility of encryption at various layers and formats.
Data sharing with 3rd parties: If you need to share sensitive data in its original format and value with a 3rd party, then encryption may be the appropriate solution, because it avoids granting external access to your token vault for de-tokenization.

What type of tokenization solution is right for your business?

When trying to decide which tokenization solution to use, your organization should first define your business requirements and use cases.

  1. What are your own specific use cases for tokenized data, and what is your business goal? Identifying which use cases apply to your business and what the end state should be is important when determining the correct solution for your needs.
  2. What type of data does your organization want to tokenize? Understanding what data elements you want to tokenize, and what that tokenized data will be used for may impact your decision about which type of solution to use.
  3. Do the tokens need to be deterministic, the same data always producing the same token? Knowing how the data will be ingested or used by other applications and processes may rule out certain tokenization solutions.
  4. Will tokens be used internally only, or will the tokens be shared across other business units and applications? Identifying a need for shared tokens may increase the risk of token exposure and, therefore, impact your decisions about which tokenization solution to use.
  5. How long does a token need to be valid? You will need to identify a solution that can meet your use cases, internal security policies, and regulatory framework requirements.

Choosing between self-managed tokenization or tokenization as a service

Do you want to manage the tokenization within your organization, or use Tokenization as a Service (TaaS) offered by a third-party service provider? Some advantages to managing the tokenization solution with your company employees and resources are the ability to direct and prioritize the work needed to implement and maintain the solution, customizing the solution to the application’s exact needs, and building the subject matter expertise to remove a dependency on a third party. The primary advantages of a TaaS solution are that it is already complete, and the security of both tokenization and access controls are well tested. Additionally, TaaS inherently demonstrates separation of duties, because privileged access to the tokenization environment is owned by the tokenization provider.

Choosing a reversible tokenization solution

Do you have a business need to retrieve the original data from the token value? Reversible tokens can be valuable to avoid sharing sensitive data with internal or third-party service providers in payments and other financial services. Because the service providers are passed only tokens, they can avoid accepting additional security risk and compliance scope. If your company implements or allows de-tokenization, you will need to be able to demonstrate strict controls on the management and use of de-tokenization privilege. Eliminating the implementation of de-tokenization is the clearest way to demonstrate that downstream applications cannot have sensitive data. Given the security and compliance risks of converting tokenized data back into its original data format, this process should be highly monitored, and you should have appropriate alerting in place to detect each time this activity is performed.

Operational considerations when deciding on a tokenization solution

While operational considerations are outside the scope of this post, they are important factors for choosing a solution. Throughput, latency, deployment architecture, resiliency, batch capability, and multi-regional support can impact the tokenization solution of choice. Integration mechanisms with identity and access control and logging architectures, for example, are important for compliance controls and evidence creation.

No matter which deployment model you choose, the tokenization solution needs to meet security standards, similar to encryption standards, and must prevent determining what the original data is from the token values.

Conclusion

Using tokenization solutions to replace sensitive data offers many security and compliance benefits. These benefits include lowered security risk and smaller audit scope, resulting in lower compliance costs and a reduction in regulatory data handling requirements.

Your company may want to use sensitive data in new and innovative ways, such as developing personalized offerings that use predictive analysis and consumer usage trends and patterns, fraud monitoring and minimizing financial risk based on suspicious activity analysis, or developing business intelligence to improve strategic planning and business performance. If you implement a tokenization solution, your organization can alleviate some of the regulatory burden of protecting sensitive data while implementing solutions that use obfuscated data for analytics.

On the other hand, tokenization may also add complexity to your systems and applications, as well as adding additional costs to maintain those systems and applications. If you use a third-party tokenization solution, there is a possibility of being locked into that service provider due to the specific token schema they may use, and switching between providers may be costly. It can also be challenging to integrate tokenization into all applications that use the subject data.

In this post, we have described some considerations to help you determine if tokenization is right for you, what to consider when deciding which type of tokenization solution to use, and the benefits, disadvantages, and comparison of tokenization and encryption. When choosing a tokenization solution, it’s important for you to identify and understand all of your organizational requirements. This post is intended to generate questions your organization should answer to make the right decisions concerning tokenization.

You have many options available to tokenize your AWS workloads. After your organization has determined the type of tokenization solution to implement based on your own business requirements, explore the tokenization solution options available in AWS Marketplace. You can also build your own solution using AWS guides and blog posts. For further reading, see this blog post: Building a serverless tokenization solution to mask sensitive data.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Security Assurance Services forum or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Author

Tim Winston

Tim is a Senior Assurance Consultant with AWS Security Assurance Services. He leverages more than 20 years’ experience as a security consultant and assessor to provide AWS customers with guidance on payment security and compliance. He is a co-author of the “Payment Card Industry Data Security Standard (PCI DSS) 3.2.1 on AWS”.

Author

Kristine Harper

Kristine is a Senior Assurance Consultant and PCI DSS Qualified Security Assessor (QSA) with AWS Security Assurance Services. Her professional background includes security and compliance consulting with large fintech enterprises and government entities. In her free time, Kristine enjoys traveling, outdoor activities, spending time with family, and spoiling her pets.

Author

Michael Guzman

Michael is an Assurance Consultant with AWS Security Assurance Services. Michael is a PCI QSA and HITRUST CCSFP, along with holding several AWS certifications. His background is in Financial Services IT Operations and Administration, with over 20 years’ experience in that industry. In his spare time, Michael enjoys spending time with his family, continuing to improve his golf skills, and perfecting his Tri-Tip recipe.

Migrating AWS Lambda functions to Arm-based AWS Graviton2 processors

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/migrating-aws-lambda-functions-to-arm-based-aws-graviton2-processors/

AWS Lambda now allows you to configure new and existing functions to run on Arm-based AWS Graviton2 processors, in addition to x86-based functions. Using this processor architecture option allows you to get up to 34% better price performance. This blog post highlights some considerations when moving from x86 to arm64, because the migration process is code and workload dependent.

Functions using the Arm architecture benefit from the performance and security built into the Graviton2 processor, which is designed to deliver up to 19% better performance for compute-intensive workloads. Workloads using multithreading and multiprocessing, or performing many I/O operations, can experience lower invocation time, which reduces costs.

Duration charges, billed with millisecond granularity, are 20 percent lower when compared to current x86 pricing. This also applies to duration charges when using Provisioned Concurrency. Compute Savings Plans supports Lambda functions powered by Graviton2.

The architecture change does not affect the way your functions are invoked or how they communicate their responses back. Integrations with APIs, services, applications, or tools are not affected by the new architecture and continue to work as before.

The following runtimes, which use Amazon Linux 2, are supported on Arm:

  • Node.js 12 and 14
  • Python 3.8 and 3.9
  • Java 8 (java8.al2) and 11
  • .NET Core 3.1
  • Ruby 2.7
  • Custom runtime (provided.al2)

Lambda@Edge does not support Arm as an architecture option.

You can create and manage Lambda functions powered by Graviton2 processor using the AWS Management Console, AWS Command Line Interface (AWS CLI), AWS CloudFormation, AWS Serverless Application Model (AWS SAM), and AWS Cloud Development Kit (AWS CDK). Support is also available through many AWS Lambda Partners.

Understanding Graviton2 processors

AWS Graviton processors are custom built by AWS. Generally, you don’t need to know about the specific Graviton processor architecture, unless your applications can benefit from specific features.

The Graviton2 processor uses the Neoverse-N1 core and supports Arm V8.2 (including the CRC and crypto extensions) plus several other architectural extensions. In particular, Graviton2 supports the Large System Extensions (LSE), which improve locking and synchronization performance across large systems.

Migrating x86 Lambda functions to arm64

Many Lambda functions may only need a configuration change to take advantage of the price/performance of Graviton2. Other functions may require repackaging the Lambda function using Arm-specific dependencies, or rebuilding the function binary or container image.

You do not need an Arm processor on your development machine to create Arm-based functions. You can build, test, package, compile, and deploy Arm Lambda functions on x86 machines using AWS SAM and Docker Desktop. If you have an Arm-based system, such as an Apple M1 Mac, you can natively compile binaries.

Functions without architecture-specific dependencies or binaries

If your functions don’t use architecture-specific dependencies or binaries, you can switch from one architecture to the other with a single configuration change. Many functions using interpreted languages such as Node.js and Python, or functions compiled to Java bytecode, can switch without any changes. Ensure you check binaries in dependencies, Lambda layers, and Lambda extensions.

To switch functions from x86 to arm64, you can change the Architecture within the function runtime settings using the Lambda console.

Edit AWS Lambda function Architecture

If you want to display or log the processor architecture from within a Lambda function, you can use OS-specific calls, for example, Node.js process.arch or Python platform.machine().
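
For example, a minimal Python handler (a sketch you could adapt) logs the architecture an invocation runs on:

import platform

def lambda_handler(event, context):
    # platform.machine() returns 'aarch64' on arm64 (Graviton2)
    # and 'x86_64' on x86-based Lambda functions.
    arch = platform.machine()
    print(f"Running on architecture: {arch}")
    return {"architecture": arch}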

When using the AWS CLI to create a Lambda function, specify the --architectures option. If you do not specify the architecture, the default value is x86_64. For example, to create an arm64 function, specify --architectures arm64.

aws lambda create-function \
    --function-name MyArmFunction \
    --runtime nodejs14.x \
    --architectures arm64 \
    --memory-size 512 \
    --zip-file fileb://MyArmFunction.zip \
    --handler lambda.handler \
    --role arn:aws:iam::123456789012:role/service-role/MyArmFunction-role

When using AWS SAM or CloudFormation, add or amend the Architectures property within the function configuration.

MyArmFunction:
  Type: AWS::Lambda::Function
  Properties:
    Runtime: nodejs14.x
    Code: src/
    Architectures:
      - arm64
    Handler: lambda.handler
    MemorySize: 512

When initializing an AWS SAM application, you can specify the architecture:

sam init --architecture arm64

When building Lambda layers, you can specify CompatibleArchitectures.

MyArmLayer:
  Type: AWS::Lambda::LayerVersion
  Properties:
    ContentUri: layersrc/
    CompatibleArchitectures:
      - arm64

Building function code for Graviton2

If you have dependencies or binaries in your function packages, you must rebuild the function code for the architecture you want to use. Many packages and dependencies have arm64 equivalent versions. Test your own workloads against arm64 packages to see if they are good migration candidates. Not all workloads show improved performance, due to the different processor architecture features.

For compiled languages like Rust and Go, you can use the provided.al2 custom runtime, which supports Arm. You provide a binary that communicates with the Lambda Runtime API.

When compiling for Go, set GOARCH to arm64.

GOOS=linux GOARCH=arm64 go build

When compiling for Rust, set the target CPU.

RUSTFLAGS="-C target-cpu=neoverse-n1" cargo build --release

The default installation of Python pip on some Linux distributions is out of date (<19.3). To install binary wheel packages released for Graviton, upgrade the pip installation using:

sudo python3 -m pip install --upgrade pip

The Arm software ecosystem is continually improving. As a general rule, use later versions of compilers and language runtimes whenever possible. The AWS Graviton Getting Started GitHub repository includes known recent changes to popular packages that improve performance, including ffmpeg, PHP, .Net, PyTorch, and zlib.

You can use https://pkgs.org/ as a package repository search tool.

Sometimes code includes architecture specific optimizations. These can include code optimized in assembly using specific instructions for CRC, or enabling a feature that works well on particular architectures. One way to see if any optimizations are missing for arm64 is to search the code for __x86_64__ ifdefs and see if there is corresponding arm64 code included. If not, consider alternative solutions.

For additional language-specific considerations, see the links within the GitHub repository.

The Graviton performance runbook is a performance profiling reference from the Graviton team that helps you benchmark, debug, and optimize application code.

Building functions packages as container images

Functions packaged as container images must be built for the architecture (x86 or arm64) they are going to use. There are arm64 architecture versions of the AWS provided base images for Lambda. To specify a container image for arm64, use the arm64 specific image tag, for example, for Node.js 14:

  • public.ecr.aws/lambda/nodejs:14-arm64
  • public.ecr.aws/lambda/nodejs:latest-arm64
  • public.ecr.aws/lambda/nodejs:14.2021.10.01.16-arm64

Arm64 images are also available from Docker Hub.

You can also use arbitrary Linux base images in addition to the AWS provided Amazon Linux 2 images. Images that support arm64 include Alpine Linux 3.12.7 or later, Debian 10 and 11, Ubuntu 18.04 and 20.04. For more information and details of other supported Linux versions, see Operating systems available for Graviton based instances.

Migrating a function

Here is an example of how to migrate a Lambda function from x86 to arm64 and take advantage of newer software versions to improve price and performance. You can follow a similar approach to test your own code.

I have an existing Lambda function as part of an AWS SAM template configured without an Architectures property, which defaults to x86_64.

  Imagex86Function:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/
      Handler: app.lambda_handler
      Runtime: python3.9

The Lambda function code performs some compute intensive image manipulation. The code uses a dependency configured with the following version:

{
  "dependencies": {
    "imagechange": "^1.1.1"
  }
}

I duplicate the Lambda function within the AWS SAM template using the same source code and specify arm64 as the Architectures.

  ImageArm64Function:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/
      Handler: app.lambda_handler
      Runtime: python3.9
      Architectures:
        - arm64

I use AWS SAM to build both Lambda functions. I specify the --use-container flag to build each function within its architecture-specific build container.

sam build --use-container

I can use sam local invoke to test the arm64 function locally even on an x86 system.

AWS SAM local invoke

I then use sam deploy to deploy the functions to the AWS Cloud.

The AWS Lambda Power Tuning open-source project runs your functions using different settings to suggest a configuration to minimize costs and maximize performance. The tool allows you to compare two results on the same chart and incorporate arm64-based pricing. This is useful to compare two versions of the same function, one using x86 and the other arm64.

I compare the performance of the x86 and arm64 Lambda functions and see that the arm64 Lambda function is 12% cheaper to run:

Compare x86 and arm64 with dependency version 1.1.1

I then upgrade the package dependency to use version 1.2.1, which has been optimized for arm64 processors.

{
  "dependencies": {
    "imagechange": "^1.2.1"
  }
}

I use sam build and sam deploy to redeploy the updated Lambda functions with the updated dependencies.

I compare the original x86 function with the updated arm64 function. Using arm64 with a newer dependency code version increases the performance by 30% and reduces the cost by 43%.

Compare x86 and arm64 with dependency version 1.2.1

You can use Amazon CloudWatch to view performance metrics, such as duration, using statistics. You can then compare average and p99 duration between the two architectures. Due to the Graviton2 architecture, functions may be able to use less memory. This could allow you to right-size the function memory configuration, which also reduces costs.
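
As a sketch, you could retrieve the average duration of both functions with boto3. The function names below are assumptions based on the logical IDs in this example's AWS SAM template; the names of your deployed functions may differ.

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")
end = datetime.utcnow()
start = end - timedelta(days=1)

# Function names are placeholders; use the names of your deployed functions.
for function_name in ["Imagex86Function", "ImageArm64Function"]:
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName="Duration",
        Dimensions=[{"Name": "FunctionName", "Value": function_name}],
        StartTime=start,
        EndTime=end,
        Period=86400,
        Statistics=["Average"],  # request p99 via ExtendedStatistics in a separate call
        Unit="Milliseconds",
    )
    for point in stats["Datapoints"]:
        print(function_name, round(point["Average"], 2), "ms average duration")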

Deploying arm64 functions in production

Once you have confirmed your Lambda function performs successfully on arm64, you can migrate your workloads. You can use function versions and aliases with weighted aliases to control the rollout. Traffic gradually shifts to the arm64 version or rolls back automatically if any specified CloudWatch alarms trigger.
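
For example, a hedged boto3 sketch of shifting 10 percent of an alias's traffic to a newer (arm64) version might look like the following; the function name, alias name, and version numbers are placeholders.

import boto3

lambda_client = boto3.client("lambda")

# Send 10 percent of the 'live' alias traffic to version 2 (the arm64 build);
# version 1 (x86_64) continues to serve the remaining 90 percent.
lambda_client.update_alias(
    FunctionName="MyArmFunction",
    Name="live",
    FunctionVersion="1",
    RoutingConfig={"AdditionalVersionWeights": {"2": 0.1}},
)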

AWS SAM supports gradual Lambda deployments with a feature called Safe Lambda deployments using AWS CodeDeploy. You can compile package binaries for arm64 using a number of CI/CD systems. AWS CodeBuild supports building Arm based applications natively. CircleCI also has Arm compute resource classes for deployment. GitHub Actions allows you to use self-hosted runners. You can also use AWS SAM within GitHub Actions and other CI/CD pipelines to create arm64 artifacts.

Conclusion

Lambda functions using the Arm/Graviton2 architecture provide up to 34 percent price performance improvement. This blog discusses a number of considerations to help you migrate functions to arm64.

Many functions can migrate seamlessly with a configuration change; others need to be rebuilt to use arm64 packages. I show how to migrate a function, and how updating software to newer versions may improve your function performance on arm64. You can test your own functions using the AWS Lambda Power Tuning tool.

Start migrating your Lambda functions to Arm/Graviton2 today.

For more serverless learning resources, visit Serverless Land.

Top 10 security best practices for securing backups in AWS

Post Syndicated from Ibukun Oyewumi original https://aws.amazon.com/blogs/security/top-10-security-best-practices-for-securing-backups-in-aws/

Security is a shared responsibility between AWS and the customer. Customers have asked for ways to secure their backups in AWS. This post will guide you through a curated list of the top ten security best practices to secure your backup data and operations in AWS. While this blog post focuses on backup data and operations in AWS Backup service, the recommended security best practices can be leveraged by organizations that utilize other backup solutions, such as backup tools from the AWS Marketplace.

Since security practices constantly evolve to mitigate new risks, it’s important that you conduct regular risk assessments to determine the applicability of security controls, and implement multiple layers of controls to mitigate risks to your data.

#1 – Implement a backup strategy

A comprehensive backup strategy is an essential part of an organization’s data protection plan to withstand, recover from, and reduce the impact of a security event. You should create an extensive backup strategy that defines which data must be backed up, how often it must be backed up, and how backup and recovery tasks are monitored. When you develop a comprehensive strategy for backing up and restoring data, you should first identify the interruptions that may occur, and their potential business impact.

Your objective should be building a recovery strategy that brings your workload back up or avoids downtime within the acceptable Recovery Time Objective (RTO) and Recovery Point Objective (RPO). RTO is the acceptable delay between the interruption of service and restoration of service, and RPO is the acceptable amount of time since the last data recovery point. You should consider a granular backup strategy that includes all of the following: continuous backup cadence, Point-in-Time Recovery (PITR), file-level recovery, application data–level recovery, volume-level recovery, instance-level recovery, etc.

A well-designed backup strategy should include actions that can protect and recover your resources from ransomware, with detailed recovery requirements for your applications and their data dependencies. For example, while you establish preventive and detective controls to mitigate the risk of ransomware, you should also design the appropriate level of granularity for cross-region and/or cross-account copy and restore patterns, to ensure that administrators do not restore corrupt backup data in the event of a security event.

In some industries, when developing a backup strategy, you must also consider the regulations for data retention requirements. You should make sure your backup strategy is designed with the necessary retention requirements (per data classification level and/or resource type) sufficient to meet your regulatory needs.

Consult your security compliance teams to validate whether your backup resources and operations should be included in, or segmented from, the scope of your compliance programs. In my experience as a PCI DSS Qualified Security Assessor (QSA), I’ve seen successful, more mature customers include backup and recovery as critical parts of their security program. This helps them understand where data is across their environment and appropriately define compliance scope.

Refer to Backup and Recovery Approaches Using AWS and the Reliability Pillar of the AWS Well-Architected Framework for architectural best practices for designing and operating reliable, secure, efficient, and cost-effective workloads in the cloud.

#2 – Incorporate backup in DR and BCP

Disaster recovery (DR) is the process of preparing, responding, and recovering from a disaster. It is an important part of your resiliency strategy, and concerns how your workload responds when a disaster strikes. A disaster could be a technical failure, human action, or natural event. A Business Continuity Plan (BCP) outlines how an organization intends to continue normal business operations during an unplanned disruption.

Your disaster recovery plan should be a subset of your organization’s business continuity plan (BCP) and you should incorporate AWS Backup procedures in your enterprise business continuity plan. For example, a security event that affects production data might require you to invoke a disaster recovery plan that fails over to backup data from another AWS Region. You should ensure that your employees are familiar with and have practiced using AWS Backup along with your organizational procedures, so that if disaster strikes, your organization can continue its normal operations with little or no service disruption.

#3 – Automate backup operations

Organizations should configure their backup plans and resource assignments to reflect their enterprise data protection policies. Automating and deploying backup policies or organization-wide backup plans allows you to standardize and scale your backup strategy. You can leverage AWS Organizations to centrally automate backup policies to implement, configure, manage, and govern backup activity across supported AWS resources by scheduling backup operations.

You should consider implementing infrastructure as code (IaC) and event-driven architecture as essential parts of your digital transformation and backup strategy, to improve productivity and govern infrastructure operations across multi-account environments. Automating backups allows you to reduce manual overhead from time-consuming configuration of your backups, minimizes the risk for errors, provides visibility on drift detection, and enhances backup policy compliance across multiple AWS workloads or accounts.

Implementing backup policies as code can help you meet data protection regulations, by configuring different requirements for your resource types, scaling your enterprise data protection strategy, and implementing lifecycle rules to specify how long before a recovery point either transitions to cold storage or is deleted, which can help optimize your costs.

When automating your backup operations, you can scale resource assignment options using AWS Tags and Resource IDs to automatically identify the AWS resources that store data for your business-critical applications and protect your data using immutable backups. This can help you prioritize security controls, such as access permissions and backup plans or policies.
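
As an illustration of automating backups with tags, the following boto3 sketch creates a backup plan and a tag-based resource assignment; the vault name, schedule, lifecycle values, IAM role, and tag key/value are placeholder assumptions you would replace with your own.

import boto3

backup = boto3.client("backup")

# Create a daily backup plan whose recovery points move to cold storage
# after 30 days and are deleted after 120 days.
plan = backup.create_backup_plan(
    BackupPlan={
        "BackupPlanName": "daily-backup-plan",
        "Rules": [
            {
                "RuleName": "daily-0500-utc",
                "TargetBackupVaultName": "my-backup-vault",
                "ScheduleExpression": "cron(0 5 * * ? *)",
                "Lifecycle": {
                    "MoveToColdStorageAfterDays": 30,
                    "DeleteAfterDays": 120,
                },
            }
        ],
    }
)

# Assign resources to the plan by tag, so any resource carrying the tag
# is protected automatically.
backup.create_backup_selection(
    BackupPlanId=plan["BackupPlanId"],
    BackupSelection={
        "SelectionName": "tagged-resources",
        "IamRoleArn": "arn:aws:iam::123456789012:role/service-role/AWSBackupDefaultServiceRole",
        "ListOfTags": [
            {
                "ConditionType": "STRINGEQUALS",
                "ConditionKey": "backup-plan",
                "ConditionValue": "daily",
            }
        ],
    },
)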

#4 – Implement access control mechanisms

When thinking about security in the cloud, your foundational strategy should begin with a strong identity foundation to ensure a user has the right permissions to access data. Appropriate authentication and authorization can mitigate the risk of security events. The shared responsibility model requires AWS customers to implement access control policies. You can use AWS Identity and Access Management (IAM) service to create and manage access policies at scale.

When configuring access rights and permissions, you should implement the principle of least privilege by ensuring each user or system accessing your backup data or Vault is only given the permissions necessary to fulfill their job duties. Using AWS Backup, you should implement access control policies by setting access policies on backup vaults to protect your cloud workloads.

For example, implementing access control policies allows you to grant users access to create backup plans and on-demand backups, but still limit their ability to delete recovery points once they’ve been created. Using vault access policies, you can share a destination backup vault with a source AWS account, user, or IAM role, as required by your business needs. Access policies can also allow you to share a backup vault with one or multiple accounts, or with your entire organization in AWS Organizations.

As you scale your workloads or migrate into AWS, you may need to centrally manage permissions to your backup vaults and operations. You should use service control policies (SCPs) to implement centralized control over the maximum available permissions for all accounts in your organization. This offers defense in depth, and ensures your users stay within the defined access control guidelines. To learn more, read how you can secure your AWS Backup data and operations using service control policies (SCPs).

To mitigate security risks, such as unintended access to your backup resources and data, use AWS IAM Access Analyzer to identify any AWS Backup IAM role that is shared with an external entity, such as another AWS account, a root user, an IAM user or role, a federated user, an AWS service, or an anonymous user.
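
For example, a minimal sketch of a vault access policy that denies deletion of recovery points (the vault name is a placeholder, and you would typically add a condition to exempt a break-glass administrator role) could be applied with boto3 as follows:

import json
import boto3

backup = boto3.client("backup")

# Deny deletion of recovery points in this vault for all principals.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Principal": {"AWS": "*"},
            "Action": "backup:DeleteRecoveryPoint",
            "Resource": "*",
        }
    ],
}

backup.put_backup_vault_access_policy(
    BackupVaultName="my-backup-vault",
    Policy=json.dumps(policy),
)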

#5 – Encrypt backup data and vault

Organizations increasingly need to improve their data security strategy, and may be required to meet data protection regulations as they scale in the cloud. The correct implementation of encryption methods can provide an additional layer of protection above foundational access control mechanisms, and offers a mitigation if your primary access control policies fail.

For example, if you configure overly permissive access control policies on your backup data, your key management system or process can limit the maximum impact of a security event, because there are separate authorization mechanisms for accessing your data and your encryption key, which means that the backup data is only viewable as ciphertext.

To get the most from AWS cloud encryption, you should encrypt data both in transit and at rest. To protect data in transit, AWS uses published API calls to access AWS Backup through the network, using the Transport Layer Security (TLS) protocol to provide encryption between you, your application, and the AWS Backup service. To protect data at rest, AWS offers cloud-native options of using AWS Key Management Service (AWS KMS) or AWS CloudHSM, which leverage the Advanced Encryption Standard (AES) with 256-bit keys (AES-256), a strong industry-adopted algorithm for encrypting data. You should evaluate your data governance and regulatory requirements, and select the appropriate encryption service to encrypt your cloud data and backup vaults.

Encryption configuration differs depending on the resource type and backup operations across accounts or Regions. Certain resource types support the ability to encrypt your backups using a separate encryption key from the key used to encrypt the source resource. Since you are responsible for managing access controls to determine who can access your Backup data or vault encryption keys and under which conditions, you should use the policy language offered by AWS KMS to define access controls on keys. You can also use AWS Backup Audit Manager to confirm that your backup is properly encrypted.

To learn more, refer to the documentation on encryption for backups and backup copies.

AWS KMS multi-Region keys allow you to replicate keys from one Region into another. Multi-Region keys are designed to simplify encryption management when your encrypted data has to be copied into other Regions for disaster recovery. You should evaluate the need to implement multi-Region KMS keys as part of your overall backup strategy.
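
As a brief sketch (the vault name and KMS key ARN are placeholders), you can specify a customer managed KMS key when you create a backup vault:

import boto3

backup = boto3.client("backup")

# Create a backup vault encrypted with a customer managed KMS key, so that
# access to backups is also governed by the key policy.
backup.create_backup_vault(
    BackupVaultName="my-encrypted-backup-vault",
    EncryptionKeyArn="arn:aws:kms:us-east-1:123456789012:key/11111111-2222-3333-4444-555555555555",
)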

#6 – Safeguard backups using immutable storage

Immutable storage allows organizations to write data in a Write Once Read Many (WORM) state. While in a WORM state, data can be written one time, then read and used as often as needed after it has been committed to the storage medium. Immutable storage helps maintain data integrity and provides protection against deletion, overwrites, inadvertent or unauthorized access, ransomware compromise, and similar threats. Immutable storage offers an efficient mechanism to address potential security events with real impacts on your business operations.

Immutable storage can be used for better governance when paired with strong SCP restrictions, or can be used in a compliance WORM mode when the letter of the law (such as a legal hold) requires access to immutable data.

You can maintain data availability and integrity with AWS Backup Vault Lock to protect your backups* such that unauthorized entities cannot erase, alter or corrupt your customer or business data during the required retention period. AWS Backup Vault Lock helps you meet your organization’s data protection policies by preventing deletions by privileged users (including the AWS account root user), changes to your backup lifecycle settings, and updates that alter your defined retention period.

AWS Backup Vault Lock ensures immutability and adds an additional layer of defense that protects backups (recovery points) in your backup vaults, which is especially valuable in highly regulated industries with stringent integrity needs for backups and archives. AWS Backup Vault Lock makes sure your data is preserved, along with a backup you can recover from in case of unintended or malicious actions.

*The feature has not yet been assessed for compliance with the Securities and Exchange Commission (SEC) rule 17a-4(f) and the Commodity Futures Trading Commission (CFTC) in regulation 17 C.F.R. 1.31(b)-(c).
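
A minimal boto3 sketch of enabling Vault Lock on an existing vault is shown below; the retention values are assumptions, and after the cool-off window the lock can no longer be changed or removed.

import boto3

backup = boto3.client("backup")

# Require recovery points in this vault to be retained between 7 and 365 days.
# After the 3-day cool-off window, the lock configuration becomes immutable,
# even for the account root user.
backup.put_backup_vault_lock_configuration(
    BackupVaultName="my-backup-vault",
    MinRetentionDays=7,
    MaxRetentionDays=365,
    ChangeableForDays=3,
)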

#7 – Implement backup monitoring and alerting

Backup jobs can fail. A failed job, such as a backup, restore, or copy task, may have an impact on subsequent steps in a process. When the initial backup job fails, there’s a high probability that subsequent tasks will also fail. In such a scenario, you can best understand the course of events through monitoring and notification.

Enabling and configuring notifications to generate emails to monitor AWS Backup jobs gives you awareness of your backup activities, ensures you meet critical service-level agreements (SLAs), enhances your business-as-usual monitoring, and helps you meet compliance obligations. You can implement backup monitoring for your workloads by integrating AWS Backup with other AWS services and ticketing systems to perform automated investigation and escalation flows.

For example, use Amazon CloudWatch to track metrics, create alarms, and view dashboards, Amazon EventBridge to monitor AWS Backup processes and events, AWS CloudTrail to monitor AWS Backup API calls with detailed information on the time, source IP, users, and accounts making those calls, and Amazon Simple Notification Service (Amazon SNS) to subscribe to AWS Backup-related topics such as backup, restore, and copy events. Monitoring and alerting can provide organizational awareness for your backup jobs, which helps you respond to backup failures.
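
For example, a sketch of turning on vault notifications to an SNS topic with boto3 (the vault name and topic ARN are placeholders) could look like this:

import boto3

backup = boto3.client("backup")

# Publish backup, copy, and restore events from this vault to an SNS topic,
# so failed or completed jobs can trigger alerts or automated escalation.
backup.put_backup_vault_notifications(
    BackupVaultName="my-backup-vault",
    SNSTopicArn="arn:aws:sns:us-east-1:123456789012:backup-notifications",
    BackupVaultEvents=[
        "BACKUP_JOB_STARTED",
        "BACKUP_JOB_COMPLETED",
        "COPY_JOB_FAILED",
        "RESTORE_JOB_COMPLETED",
    ],
)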

You can use AWS Backup Audit Manager to automatically generate evidence of your daily backup audit reports per account and Region. You can also scale your backup monitoring across multiple accounts by using a set of automation templates and dashboards (known as the backup observer solution) to obtain aggregated daily cross-account multi-Region AWS Backup reporting.

#8 – Audit backup configuration

Organizations should audit the compliance of AWS Backup policies against defined controls such as defined backup frequency. You should continuously and automatically track your backup activity and generate automatic reports to find and investigate backup operations or resources which are not compliant with your business requirements.

AWS Backup Audit Manager provides built-in, customizable compliance controls that align with your business compliance and regulatory requirements. AWS Backup Audit Manager provides five backup governance control templates, including backup resources protected by backup plans, and backup plans with a minimum frequency and minimum retention. If you leverage infrastructure-as-code automation, you can use AWS Backup Audit Manager with AWS CloudFormation.

AWS Security Hub provides you with a comprehensive view of your security state in AWS and helps you check your environment against security best practices and industry standards such as AWS Foundational Security Best Practices controls. If you leverage AWS Security Hub within your cloud environment, we recommend you enable the AWS Foundational Security Best Practices, as it includes detective controls that can help with securing backups in AWS. The detective controls in AWS Backup Audit Manager and Security Hub are also mostly available as AWS managed rules in AWS Config.

#9 – Test data recovery capabilities

Ideally, any data stored as a backup should be successfully restorable when required. Your backup strategy must include testing your backups; a backup strategy is not effective if backed-up data cannot be restored. You should regularly test your ability to find certain recovery points and restore them. While AWS Backup automatically copies tags from the resources it protects to the recovery points, tags are not copied from recovery points to the corresponding restored resources. To scale your inventory management and locate recovery points, you should consider retaining your tags on resources created by AWS Backup restore jobs, using AWS Backup events to trigger a tag replication process.

You can start your data recovery workflow by establishing data recovery patterns and then regularly test them. You should create a simple and repeatable process that allows you to perform continuous data recovery testing to increase confidence in your ability to recover backup data. For example, you can create a pattern to test a cross-account, cross-region restore operation from a central DR backup vault encrypted with a customer-managed KMS key to a source account backup vault encrypted with a different customer-managed KMS key.

If you don’t frequently test such restore operations, you may find that your assumptions on KMS encryption for cross-account, cross-region operations are incorrect. Oftentimes, the only backup recovery pattern that actually works is the path you test frequently. Through routine testing of supported backup resource types, you can spot early warnings that could potentially cause future disturbances and loss of critical data. If possible, maintain a limited but feasible number of recovery paths and patterns to prevent wasted storage space, optimize costs, and save time. It’s easier to fix the problem when a recovery test fails than to lose valuable or critical data.
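
The following boto3 sketch outlines one simple, repeatable restore test; the vault name and IAM role are placeholders, and some resource types require you to adjust the restore metadata before starting the job.

import boto3

backup = boto3.client("backup")
vault = "my-backup-vault"

# Find the most recent recovery point in the vault.
points = backup.list_recovery_points_by_backup_vault(
    BackupVaultName=vault, MaxResults=10
)["RecoveryPoints"]
latest = max(points, key=lambda p: p["CreationDate"])

# Reuse the metadata captured at backup time as the basis for the restore.
# Some resource types need additional or modified metadata values here.
metadata = backup.get_recovery_point_restore_metadata(
    BackupVaultName=vault, RecoveryPointArn=latest["RecoveryPointArn"]
)["RestoreMetadata"]

# Start the restore job; monitor it with describe_restore_job, validate the
# restored resource, and then clean it up to control costs.
job = backup.start_restore_job(
    RecoveryPointArn=latest["RecoveryPointArn"],
    Metadata=metadata,
    IamRoleArn="arn:aws:iam::123456789012:role/service-role/AWSBackupDefaultServiceRole",
)
print("Restore job started:", job["RestoreJobId"])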

#10 – Incorporate backup in incident response plan

Security Incident Response Simulations (SIRS) are internal events that provide a structured opportunity to practice your incident response plan and procedures during a realistic scenario. It’s valuable to exercise your backup data and operations in creative SIRS activities to prepare yourself for the unexpected. This helps you validate your organizational readiness and develop comfort with the rare and unexpected. Your simulations must be realistic, and should involve the cross-functional organizational teams required to respond to events.

Start with basic and easy simulation exercises, and work towards a full, complex event. For example, you can build a realistic model that consists of an Amazon Virtual Private Cloud and associated resources that simulate inadvertent overexposure of information or a potential data breach due to changes to policies and access control lists. Document lessons learned to evaluate how well your incident response plan worked, and to identify improvements that need to be made to future response procedures.

You can use AWS Backup to set up automated instance-level backups as AMIs and volume-level backups as snapshots across multiple AWS accounts. This can help your incident response team enhance their forensic process such as automated forensic disk collection, by providing a restore point that could reduce the scope and impact of potential security events such as ransomware.

Conclusion

In this blog post, I showed you the top ten security best practices and controls to protect your backup data in AWS. I encourage you to use these best practices to design and implement a backup and recovery strategy and architecture with multiple layers of controls that scales and achieves your business needs. To learn more about AWS Backup, refer to the AWS Backup documentation.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Backup forum or contact AWS Support.

Further reading

Additional resources to consider:

Prescriptive Guidance: Backup and recovery approaches on AWS

Blog: Automate centralized backup at scale across AWS services using AWS Backup

Blog: Disaster Recovery (DR) Architecture on AWS, Part I: Strategies for Recovery in the Cloud

Blog: The importance of encryption and how AWS can help

Blog: Enhance the security posture of your backups with AWS Backup Vault Lock

Blog: Monitor, Evaluate, and Demonstrate Backup Compliance with AWS Backup Audit Manager

Blog: Create and share encrypted backups across accounts and Regions using AWS Backup

Blog: Simplify auditing your data protection policies with AWS Backup Audit Manager

Blog: Managing access to backups using service control policies with AWS Backup

Blog: Obtain aggregated daily cross-account multi-Region AWS Backup reporting

Want more AWS Security news? Follow us on Twitter.

Author

Ibukun Oyewumi

Ibukun is a Security Assurance Consultant at AWS. He focuses on helping customers architect, build, scale, and optimize security controls, risk management, and compliance.

Configure AWS SSO ABAC for EC2 instances and Systems Manager Session Manager

Post Syndicated from Rodrigo Ferroni original https://aws.amazon.com/blogs/security/configure-aws-sso-abac-for-ec2-instances-and-systems-manager-session-manager/

In this blog post, I show you how to configure AWS Single Sign-On (AWS SSO) to define attribute-based access control (ABAC) permissions to manage Amazon Elastic Compute Cloud (Amazon EC2) instances and AWS Systems Manager Session Manager for federated users. This combination allows you to control access to specific Amazon EC2 instances based on users’ attributes. I show you how defined AWS SSO identity source attributes, like login and department, can be used, and how custom attributes, like SSMSessionRunAs, can be used to pass these attributes into Amazon Web Services (AWS) from an external identity provider (IdP) using a SAML 2.0 assertion.

AWS SSO added support for ABAC to enable you to create fine-grained permissions for your workforce in AWS using user attributes. Using user attributes as tags in AWS helps you simplify the process of creating fine-grained permissions in AWS and enables you to ensure that your workforce has access only to the AWS resources with matching tags.

The new feature works with any supported AWS SSO identity source. This post walks you through the steps to enable attributes for access control, create permission sets and manage assignments when using a supported external IdP as your identity source.

Solution overview

The following architecture diagram—Figure 1—presents an overview of the solution.

Figure 1: Solution architecture diagram

In the example in Figure 1, Alice and Bob are users who each have the attributes login, department, and SSMSessionRunAs. These attributes are created and updated in the external directory (Okta in this example) under those users’ profiles. The first two attributes are automatically synchronized between AWS SSO and Okta by using the System for Cross-domain Identity Management (SCIM) protocol, and configured within AWS SSO settings. The third, custom attribute is passed directly from Okta into the AWS accounts as a new SAML assertion.

Both users use the same AWS SSO custom permission set, which allows them to launch a new Amazon EC2 instance with proper tag enforcement. Based on those tags, they can start, stop, and restart the EC2 instance if they are in the same department, and terminate it if they are the owner. They can also connect using Session Manager if they’re in the same department. Users can sign in to those instances using the Linux OS user defined in the attribute SSMSessionRunAs.

Prerequisites

To perform the steps to use AWS SSO attributes for ABAC, you must already have deployed AWS SSO for your AWS Organizations and have connected with an external identity source using SAML and SCIM protocols. For more information, see Checklist: Configuring ABAC in AWS using AWS SSO.

You need two test users for implementing and testing the solution. You can use two existing users, or create new users named Alice and Bob to match the solution and testing described in the following sections.

Implement the solution

The basic steps to implement the solution are:

  1. Confirm in AWS SSO settings that you have defined an external IdP, authentication via SAML 2.0, and provisioning via SCIM protocol.
  2. Enable attributes for access control and define the two supported attributes: login and department.
  3. Create a new user attribute in the Okta Directory.
  4. Edit and confirm the users’ attributes defined in the Okta Directory profile.
  5. Configure the SAML attribute statement in the Okta AWS SSO application.
  6. Create a new permission set using an ABAC policy.
  7. Create an AWS account assignment to the users using the permission set created in the previous step.

Confirm AWS SSO configuration

In this first step, you confirm that AWS SSO has been properly configured. Go to the AWS SSO console Settings page to check that the configuration of your identity source, authentication, and provisioning is as follows:

Identity source: External Identity Provider
Authentication: SAML 2.0
Provisioning: SCIM

  1. Confirm authentication is working as expected, by going to your user portal URL in a new browser instance (to ensure your user authentication doesn’t overwrite your existing authentication). The user portal offers a single place to access all the assigned AWS accounts, roles, and applications. For example, it should look like https://exampledomain.awsapps.com/start. Once you access it, the process automatically redirects the request to your external provider for authentication, and then returns the user to the AWS SSO user portal.
  2. To confirm provisioning, go to the AWS SSO console and choose Users from the right panel. You should see your Okta users assigned to the AWS SSO application being synchronized by SCIM protocol. Select any user to see the Created by SCIM and Updated by SCIM information for that user.

Enable AWS SSO attributes for access control

In this step, you enable ABAC and then configure AWS SSO attributes. This solution uses the Attributes for access control page in the AWS Management Console to enter the key and value pairs.

To enable attributes for access control

  1. Open the AWS SSO console.
  2. Choose Settings.
  3. On the Settings page, under Identity source, next to Attributes for access control, select Enable, as shown in Figure 2.
Figure 2: Attributes for access control settings (enable ABAC)

Once ABAC is enabled, you can select the attributes to be synchronized. For this use case, select login and department.

To select your attributes using the AWS SSO console

  1. Open the AWS SSO console.
  2. Choose Settings.
  3. On the Settings page, under Identity source, next to Attributes for access control, choose View details.
  4. On the Attributes for access control page, notice the Key and Value columns. This is where you will be mapping the attribute from your identity source to an attribute that AWS SSO passes as a session tag. Set the first key and value pair by entering login as the key and ${path:userName} as the value. Set the second key and value pair to department and ${path:enterprise.department}. The settings are shown in Figure 3 below.

    Figure 3: Map attributes using the Attributes for access control page

  5. Choose Save changes.
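
The same mapping can also be created programmatically. The sketch below is an assumption-laden example: the instance ARN is a placeholder returned by aws sso-admin list-instances, and it uses the CreateInstanceAccessControlAttributeConfiguration API that backs this console page.

# Enable ABAC attributes for the AWS SSO instance (instance ARN is a placeholder)
aws sso-admin create-instance-access-control-attribute-configuration \
  --instance-arn arn:aws:sso:::instance/ssoins-1234567890abcdef \
  --instance-access-control-attribute-configuration '{
    "AccessControlAttributes": [
      {"Key": "login", "Value": {"Source": ["${path:userName}"]}},
      {"Key": "department", "Value": {"Source": ["${path:enterprise.department}"]}}
    ]
  }'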

Create a new attribute in Okta Directory

In this third step, you create the new custom attribute SSMSessionRunAs.

To create a new user attribute

  1. Open the Okta console.
  2. Under Directory, choose Profile Editor.
  3. Choose Edit Profile for Okta User (default).
  4. Under Attributes, choose Add Attribute as follows:
    Data type: Select String
    Display Name: Enter SSMSessionRunAs
    Variable Name: Enter SSMSessionRunAs
    Attribute Length: Select Less than and enter 10 (max).
  5. Choose Save.

Edit and confirm users’ attributes defined in Okta Directory profile

Now that you have the new attribute SSMSessionRunAs created, go to the users’ profiles to enter the Department and SSMSessionRunAs values for both users.

To edit and confirm users’ attributes

  1. Open the Okta console.
  2. Under Directory, choose People.
  3. Select user Bob.
  4. On the Profile tab, choose Edit, and set the following values:

    For the key Department, enter blue as the value.

    For the key SSMSessionRunAs, enter bob as the value.

  5. Choose Save.
  6. Repeat steps 1 through 5 for Alice. For the key Department, enter amber as the value and for SSMSessionRunAs, enter alice as the value.
  7. Confirm that the attributes of both users are defined in the external directory as follows:
    Username (login): [email protected]
    First name (firstName): Bob
    Last name (lastName): Rodriguez
    Display name (displayName): Bob
    Department (department): blue
    SSMSessionRunAs (SSMSessionRunAs): bob

    Username (login): [email protected]
    First name (firstName): Alice
    Last name (lastName): Rosalez
    Display name (displayName): Alice
    Department (department): amber
    SSMSessionRunAs (SSMSessionRunAs): alice

Configure SAML attribute statement in Okta AWS SSO application

The attribute SSMSessionRunAs isn’t available as an attribute within AWS SSO. However, you can include it by defining SAML attribute statements, which are inserted into the SAML assertions.

To create a new SAML attribute

  1. Open the Okta Application console.
  2. Choose AWS Single Sign-on application.
  3. On the Sign On tab, choose Edit Settings.
  4. Under SAML 2.0 Attribute Statements, enter the following:
    • For Name, enter https://aws.amazon.com/SAML/Attributes/AccessControl:SSMSessionRunAs
    • For Name format, select URI Reference
    • For Value, enter user.SSMSessionRunAs
  5. Choose Save.

Create a new permission set using an ABAC policy

In this step, you create a permissions policy that determines who can access your AWS resources based on the configured attribute value. When you enable ABAC and specify attributes, AWS SSO passes the attribute value of the authenticated user into AWS Identity and Access Management (IAM) for use in policy evaluation.

To create a permission set

  1. Open the AWS SSO console.
  2. Choose AWS accounts.
  3. Select the Permission sets tab.
  4. Choose Create permission set.
  5. On the Create new permission set page, choose Create a custom permission set.
    1. Choose Next: Details.
    2. Under Create a custom permission set, enter a name that will identify this permission set in AWS SSO. This name will also appear as an IAM role in the user portal for any users who have access to it. For this solution, name it myCustomPermissionSetEC2SSM.
    3. Choose Create a custom permissions policy and paste in the following ABAC policy document:
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "AllowDescribeList",
            "Action": [
              "ec2:Describe*",
              "ssm:Describe*",
              "ssm:Get*",
              "ssm:List*",
              "iam:ListInstanceProfiles",
              "cloudwatch:DescribeAlarms"
            ],
            "Effect": "Allow",
            "Resource": "*"
          },
          {
            "Sid": "AllowRunInstancesResources",
            "Effect": "Allow",
            "Action": "ec2:RunInstances",
            "Resource": [
              "arn:aws:ec2:*::image/*",
              "arn:aws:ec2:*::snapshot/*",
              "arn:aws:ec2:*:*:subnet/*",
              "arn:aws:ec2:*:*:key-pair/*",
              "arn:aws:ec2:*:*:security-group/*",
              "arn:aws:ec2:*:*:network-interface/*"
            ]
          },
          {
            "Sid": "AllowRunInstancesConditions",
            "Effect": "Allow",
            "Action": "ec2:RunInstances",
            "Resource": [
              "arn:aws:ec2:*:*:instance/*",
              "arn:aws:ec2:*:*:volume/*",
              "arn:aws:ec2:*:*:network-interface/*"
            ],
            "Condition": {
              "StringLike": {
                "aws:RequestTag/Name": "*"
              },
              "StringEquals": {
                "aws:RequestTag/Owner": "${aws:PrincipalTag/login}",
                "aws:RequestTag/Department": "${aws:PrincipalTag/department}"
              },
              "ForAllValues:StringEquals": {
                "aws:TagKeys": [
                  "Name",
                  "Owner",
                  "Department"
                ]
              }
            }
          },
          {
            "Sid": "AllowCreateTagsOnRunInstance",
            "Effect": "Allow",
            "Action": "ec2:CreateTags",
            "Resource": [
              "arn:aws:ec2:*:*:volume/*",
              "arn:aws:ec2:*:*:instance/*",
              "arn:aws:ec2:*:*:network-interface/*"
            ],
            "Condition": {
              "StringEquals": {
                "ec2:CreateAction": "RunInstances"
              }
            }
          },
          {
            "Sid": "AllowPassRoleSpecificRole",
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::*:role/EC2UbuntuSSMRole"
          },
          {
            "Sid": "AllowEC2ActionsConditions",
            "Effect": "Allow",
            "Action": [
              "ec2:StartInstances",
              "ec2:StopInstances",
              "ec2:RebootInstances"
            ],
            "Resource": "*",
            "Condition": {
              "StringEquals": {
                "ec2:ResourceTag/Department": "${aws:PrincipalTag/department}"
              }
            }
          },
          {
            "Sid": "AllowTerminateConditions",
            "Effect": "Allow",
            "Action": [
              "ec2:TerminateInstances"
            ],
            "Resource": "*",
            "Condition": {
              "StringEquals": {
                "ec2:ResourceTag/Owner": "${aws:PrincipalTag/login}"
              }
            }
          },
          {
            "Sid": "AllowStartSessionConditions",
            "Effect": "Allow",
            "Action": [
              "ssm:StartSession"
            ],
            "Resource": "*",
            "Condition": {
              "StringEquals": {
                "ssm:resourceTag/Department": "${aws:PrincipalTag/department}"
              }
            }
          },
          {
            "Sid": "AllowTerminateSessionConditions",
            "Effect": "Allow",
            "Action": [
              "ssm:TerminateSession"
            ],
            "Resource": [
              "arn:aws:ssm:*:*:session/${aws:PrincipalTag/login}-*"
            ]
          }
        ]
      }
      

    4. Choose Next: Tags.
    5. Review the selections you made, and then choose Create.

The policy described above uses SAML session tags for ABAC to define permissions based on attributes. These attributes are the tags passed in the AssumeRoleWithSAML operation when SAML-based federation occurs.

A combination of global (aws:TagKeys, aws:PrincipalTag, aws:RequestTag) and service (ec2:ResourceTag, ec2:CreateAction, ssm:resourceTag) condition keys is used to assign the permissions.

To learn more about AWS global and service conditions keys, see AWS global condition context keys and The condition keys table for AWS services.

Assign users to an AWS account

In this step, you use the permission set created in the previous step to assign access to the users for a specified AWS account.

To assign access to users

  1. Open the AWS SSO console.
  2. Choose AWS accounts.
  3. Under the AWS organization tab, in the list of AWS accounts, select one or more accounts to which you want to assign access.
  4. Choose Assign users.
  5. On the Select users or groups page, select both test users from the list of users as shown in Figure 4.

    Note: You can use the search box to look for specific users.

    Figure 4: Select users to assign to AWS accounts

  6. Choose Next: Permission sets.
  7. On the Select permission sets page, select the permission set that you created earlier (myCustomPermissionSetEC2SSM) to apply to the users, as shown in Figure 5.

    Figure 5: Select permission sets

  8. Choose Finish to start the configuration of your AWS account. When configuration is complete, a message is displayed stating that you have successfully configured your AWS account as shown in Figure 6.

    Figure 6: Confirmation that configuration is complete

Test the solution

Now that you have everything in place, let’s test the solution. To test the solution, you’ll log in to AWS SSO, access the AWS account and check the event logs, and test the Amazon EC2 operations.

Log in to AWS SSO as Bob through your external IdP

Enter the user portal URL in a browser window and log in to AWS SSO as Bob. AWS SSO redirects to the external provider for the log in process. After successful authentication, the external provider redirects to the AWS SSO portal, which shows you a list of the AWS accounts that you have access to. In this case, Bob has access to one AWS account as shown in Figure 7.

Figure 7: AWS SSO showing AWS accounts that the user has access to

Access the AWS account using the permission set and confirm the event logs

Select the Management console link for the AWS account that has the myCustomPermissionSetEC2SSM permission set that you created earlier. This action federates into the AWS account and is logged in AWS CloudTrail as an AssumeRoleWithSAML API call. To confirm that the SAML session tags are being passed in the session, look at the API event log in the CloudTrail Event history console. In the following example, you can check the principalTags keys and their values under requestParameters.

{
     "eventVersion": "1.08",
     "userIdentity": {
          "type": "SAMLUser",
          "principalId": "d/UbWH0ijLBmlakaboZwi5CA/30=:[email protected]",
          "userName": "[email protected]",
          "identityProvider": "d/UbWH0ijLBmlakaboZwi5CA/30="
     },
     "eventTime": "2021-05-13T16:08:48Z",
     "eventSource": "sts.amazonaws.com",
     "eventName": "AssumeRoleWithSAML",
     ...
     "requestParameters": {
        "sAMLAssertionID": "_5072d119-64f5-4341-aeed-30d9b7c24b5b",
        "roleSessionName": "[email protected]",
        "principalTags": {
            "SSMSessionRunAs": "bob",
            "department": "blue",
            "login": "[email protected]"
        },
        "durationSeconds": 3600,
        "roleArn": "arn:aws:iam::555555555555:role/aws-reserved/sso.amazonaws.com/AWSReservedSSO_myCustomPermissionSetEC2SSM_9e80ec498218bbea",
        "principalArn": "arn:aws:iam::555555555555:saml-provider/AWSSSO_5f872b6782a0507a_DO_NOT_DELETE"
    },
     "responseElements": {
     ...

Test EC2 operations

  1. Open the Amazon EC2 console:
    For this example, when you open the Amazon EC2 console there are already three running EC2 instances for testing the ABAC policy, created in advance with the proper tags described in the following step. From the top menu, you can also confirm the federated login AWSReservedSSO_myCustomPermissionSetEC2SSM_9e80ec498218bbea/[email protected], which represents the AWS SSO managed role and the user, as shown in Figure 8.

    Figure 8: EC2 instances and user information

  2. Launch a new EC2 instance:
    Start testing the ABAC policy by launching a new EC2 instance. This action is authorized only when you fill in the three required tags: Name, Owner, and Department.

    1. From the Amazon EC2 console, choose Launch Instances.
    2. Set the AMI, for this example select an Ubuntu-based OS.
    3. Set the Instance Type, a t2.micro will work.
    4. Configure the EC2 instance. Choose an IAM role that allows Systems Manager to manage the new EC2 instance. In this case, you must create the IAM role EC2UbuntuSSMRole with the AWS managed policy AmazonEC2RoleforSSM attached in advance, using a principal that has the necessary IAM permissions, because the user Bob is not allowed to create roles. Then, use the user data to create the Ubuntu OS user bob, which you need in order to log in to the EC2 instance by using Session Manager. You can copy and paste the following user data to create the user:
      #!/bin/bash
      sudo useradd -m bob
    5. Add storage using the default settings.
    6. Add tags. From the ABAC policy previously created, you can confirm that tag key Name can be anything as the condition StringLike is indicated with a wildcard (*). The tag keys Owner and Department have to match the principal session tags passed through federation. In this case, enter [email protected] as the key Owner, and enter blue as the Department, as shown in Figure 9.

      Figure 9: EC2 tags describing key value pairs

    7. Configure security groups. You can choose an existing security group that doesn't allow any inbound traffic to the SSH port. When you use Session Manager, you connect to the EC2 instance through an API call, and the instance reaches Systems Manager over an outbound connection, so you can safely leave the security group's inbound rules closed.
    8. Review and launch. It will ask you about selecting or creating a key pair. You don’t need one, because you’re using Session Manager. Proceed without selecting or creating a new SSH key pair. When launching the EC2 instance with the correct tag keys and values, you get the success message shown in Figure 10.
      Figure 10: EC2 success message launching an instance with the correct tags

      If there are any missing tag keys or the values aren’t correct, the action will be denied as shown in Figure 11. For more information, you can decode the authorization error message using the API DecodeAuthorizationMessage.

      Figure 11: EC2 failed message launching an instance with incorrect tags

  3. Stop, reboot, and terminate EC2 instances.
    The next tests are to stop, reboot, and terminate EC2 instances. In the ABAC policy, you defined that only users whose department value matches the resource's Department tag can perform the first two actions. You can terminate an EC2 instance only if you are its owner. To stop, reboot, or terminate instances, open the EC2 console, choose Instances, and select the instance you want to affect. Choose Instance state, and then choose the action you want to test: Stop instance, Reboot instance, or Terminate instance. (If you prefer the AWS CLI, a sketch of equivalent commands follows this list.)

    Trying to stop the EC2 instance amber-instance where Department is amber is shown in Figure 12.

    Figure 12: EC2 console showing how to stop an instance

    The action should fail as shown in Figure 13.

    Figure 13: EC2 instance failure message stopping an instance with wrong tags

    Only when the department value of the EC2 instance is blue is it possible to stop or reboot the instance as shown in Figure 14.

    Figure 14: EC2 success message stopping an instance with correct tags

    Only when the owner who launched the EC2 instance matches with the federated login is it possible to terminate the instance. Trying to terminate an EC2 instance that was launched by anyone other than the owner will lead to a failed action as shown in Figure 15.

    Figure 15: EC2 failed message terminating an instance with incorrect tags

  4. Try to modify tags. Because the ABAC policy relies on tags, you cannot modify tags after the resources have been created. This restriction comes from the AllowCreateTagsOnRunInstance statement in the policy from Create a new permission set using an ABAC policy, which allows ec2:CreateTags only during RunInstances. If you try to modify any tag keys or values on existing resources, the changes will be denied. For example, if you try to modify the Owner tag on an existing EC2 instance, you get the "Failed to update tags" error message as shown in Figure 16.

    Figure 16: Failed message when attempting to modify tags

  5. Connect to the EC2 instance using Session Manager.
    1. Test logging in to the EC2 instance by choosing the new instance and choosing Connect as shown in Figure 17.

      Figure 17: EC2 console selecting an instance to connect

    2.  Then choose the Session Manager tab and choose Connect as shown in Figure 18.
      Figure 18: EC2 console selecting Session Manager to connect

      This will open a new tab in the browser redirecting to a Systems Manager session where you can confirm that the Ubuntu OS user is Bob as shown in Figure 19.

      Figure 19: Systems Manager session started confirming the Ubuntu OS user

      Note: By default, sessions are launched using the credentials of a system-generated account named ssm-user that is created on a managed instance. However, you can instead launch sessions using any OS user by enabling the run as feature in SSM. To learn more about this, see Enable run as support for Linux and macOS instances in the Systems Manager Session Manager user guide.

    3. Performing the same action in an EC2 instance with a different Department tag will lead to a denied action as shown in Figure 20. This is because the ABAC policy allows the StartSession action only when the Department key matches the Department value in the EC2 instance.

      Figure 20: Systems Manager StartSession failed message
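
If you prefer to run the same tests from the AWS CLI, the following sketch mirrors the console steps above. The AMI ID, subnet ID, instance IDs, and the Owner value are placeholders; it assumes your CLI credentials are federated through AWS SSO with the myCustomPermissionSetEC2SSM permission set and that the Session Manager plugin for the AWS CLI is installed.

# Launch an instance with the three required tags; Owner must equal your login session tag,
# and volumes need the same tags to satisfy the RunInstances conditions in the policy
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type t2.micro \
  --subnet-id subnet-0123456789abcdef0 \
  --iam-instance-profile Name=EC2UbuntuSSMRole \
  --user-data '#!/bin/bash
sudo useradd -m bob' \
  --tag-specifications \
    'ResourceType=instance,Tags=[{Key=Name,Value=blue-instance},{Key=Owner,Value=bob@example.org},{Key=Department,Value=blue}]' \
    'ResourceType=volume,Tags=[{Key=Name,Value=blue-instance},{Key=Owner,Value=bob@example.org},{Key=Department,Value=blue}]'

# Stopping or rebooting succeeds only when the instance Department tag matches yours
aws ec2 stop-instances --instance-ids i-0123456789abcdef0

# Terminating succeeds only when the instance Owner tag matches your login
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0

# Start a Session Manager session on the instance
aws ssm start-session --target i-0123456789abcdef0

# Decode a denied action's encoded authorization message for troubleshooting
aws sts decode-authorization-message --encoded-message <encoded-message-from-the-error>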

Conclusion

In this blog post, you learned how to use AWS SSO with two methods of passing attributes to AWS accounts as session tags for ABAC. You also learned how to build policies with tags as conditions to simplify and reuse custom permission sets. You saw working examples with services such as EC2 and Systems Manager Session Manager. To learn more about ABAC policies, SAML session tags, and how to pass session tags in federation, see IAM tutorial: Use SAML session tags for ABAC and Passing session tags using AssumeRoleWithSAML.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security news? Follow us on Twitter.

Author

Rodrigo Ferroni

Rodrigo Ferroni is a senior Security Specialist at AWS Enterprise Support. He is certified in CISSP, AWS Security Specialist, and AWS Solutions Architect Associate. He enjoys helping customers to continue adopting AWS security services to improve their security posture in the cloud. Outside of work, he loves to travel as much as he can. In every winter he enjoys snowboarding with his friends.

Disabling Security Hub controls in a multi account environment

Post Syndicated from Priyank Ghedia original https://aws.amazon.com/blogs/security/disabling-security-hub-controls-in-a-multi-account-environment/

In this blog post, you’ll learn about an automated process for disabling or enabling selected AWS Security Hub controls across multiple accounts and multiple regions. You may already know how to disable Security Hub controls through the Security Hub console, or using the Security Hub update-standards-control API. However, these methods work on a per account and per region basis. If you want to tune your posture by disabling some controls across multiple accounts, this automated approach makes it easier.

The update Security Hub controls script provided in this post uses cross-account access to disable or enable several controls across multiple accounts and multiple Regions on an ad hoc basis, without having to maintain resources in each Region. These accounts can be individual accounts where Security Hub is managed independently per account, part of a Security Hub administrator setup, or a combination of these. You will need to run this script again for newly added AWS accounts to keep the control status consistent across accounts.

This blog also describes an alternative solution which automates updating Security Hub controls on a schedule for any new AWS accounts that are added, and keeps the control status in member accounts consistent with the controls status in the Security Hub administrator account. The alternate solution will only work if you use Security Hub administrator setup, and must be deployed on a per Region basis.

Why disable controls?

Security Hub generates findings from continuous checks against a set of rules from supported security standards. These checks provide a readiness score and identify specific accounts and resources that require attention.

It can be useful to turn off security checks for controls that are not relevant to your environment. Disabling irrelevant controls makes it easier to identify the important findings that you can take action on.

Reasons to disable controls

There are several reasons you may choose to disable controls:

  • Controls for unused services
  • Controls using global resources
  • Controls with compensating controls

Unused services

If you are not using the specific AWS service for which a control is enabled, you may want to disable controls that have periodic checks associated with that service.

Note: Periodic checks run automatically every 12 hours, whereas change-triggered checks run when the associated resource changes state. Change-triggered checks do not run if the associated in scope resource does not exist, so you do not have to disable controls that explicitly use change-triggered checks.

For example, SageMaker.1 is a control that has a periodic check and is part of the AWS Foundational Security Best Practices standard. If you do not use Amazon SageMaker in your environment, you can disable this check temporarily. If you disable an unused check and later start using the service, you must enable the check again. You should perform regular analysis of the services in use across your AWS accounts.
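
For a single account and Region, the same change that the script automates can be made with the update-standards-control API. The control ARN below is illustrative; you can retrieve the exact ARN for a control with describe-standards-controls.

# Disable the SageMaker.1 check in one account and Region (ARN shown is illustrative)
aws securityhub update-standards-control \
  --standards-control-arn arn:aws:securityhub:us-east-1:111122223333:control/aws-foundational-security-best-practices/v/1.0.0/SageMaker.1 \
  --control-status DISABLED \
  --disabled-reason "Amazon SageMaker is not used in this account"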

Global resources

Global resources, such as AWS Identity and Access Management (IAM), generate findings in each Region where you have Security Hub enabled. To focus only on meaningful findings, you can disable controls for global resources in all but one Region. The Region you keep enabled should be the same Region where global resource recording is enabled in AWS Config. Each standard includes controls that check global resources; see the Security Hub controls documentation for the full list for each standard.

Compensating controls

You may want to disable a Security Hub control for which a compensating control is in place. For example, the AWS Foundational Security Best Practices control CloudTrail.5 checks whether AWS CloudTrail is configured to send logs to Amazon CloudWatch Logs. You can disable this check if you are sending CloudTrail logs to another destination, such as Amazon Simple Storage Service (Amazon S3) or partner tools such as a SIEM solution. Another example is the CIS 3.x controls: GuardDuty can be considered a compensating control instead of CloudWatch alarms for CIS 3.1 through 3.14.

Solution overview:

The update Security Hub controls script shown below is written in Python. It uses AWS SDK for Python (Boto3) to assume a cross account IAM role in the accounts passed in the script, and uses the update-standards-control Security Hub API to enable or disable controls. You must deploy this solution in a single AWS Region.

We will review the prerequisites in the next section to understand the requirements for successfully using this solution. We will then walk through the instructions for deploying this automation through an Amazon Elastic Compute Cloud (Amazon EC2) instance along with examples of how to use the script.

Note: You can also use your local machine (customer device) or AWS Cloudshell if you meet the prerequisites described below.

Prerequisites:

The update Security Hub controls script shown below requires:

  • An IAM role for cross-account access. You should have an IAM principal that can assume an IAM role in each account where Security Hub controls need to be disabled or enabled. This role should have the securityhub:UpdateStandardsControl permission.

    Note: If you used the Security Hub multiaccount scripts from awslabs to enable Security Hub for multiple accounts, you can use the role provided in that repository for the purposes of this blog, as it has the necessary permissions.

  • AWS CLI
  • Python
  • Boto3

Update security hub controls script

Let’s review the arguments the script uses to run successfully. There are both required arguments and optional arguments:

Required arguments

–input-file
Path to a CSV file containing a list of the 12-digit AWS account IDs whose controls you want to disable or enable. If you are using a Security Hub administrator account, use the list-members API to generate a list of all accounts where Security Hub is enabled. Store this list in a CSV file, one account ID per line.
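
For example, on a Security Hub administrator account, a sketch like the following can generate the file; it assumes your default credentials point at the administrator account in a Region where Security Hub is enabled.

# Write one member account ID per line to accounts.csv
aws securityhub list-members \
  --query 'Members[].AccountId' \
  --output text | tr '\t' '\n' > accounts.csv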

–assume-role
Role name of the execution role in each account. This role should have the securityhub:UpdateStandardsControl permission. If you use the sample execution role CloudFormation template provided in Part B–Execution role, the name of this role is ManageSecurityHubControlsExecutionRole.

–regions
Comma-separated list of Regions in which to update Security Hub controls. Do not add any spaces after each comma. Specify ALL to include all Regions where Security Hub is available. If you list a Region where you have not enabled Security Hub, the script will skip that Region and log the failure.

–standard
Enter the standard code (AFSBP, CIS, PCIDSS).
AFSBP for AWS Foundational Security Best Practices
CIS for CIS AWS Foundations
PCIDSS for Payment Card Industry Data Security Standard (PCI DSS)

The script works with one Security Hub Standard at a time. For example, you can only disable AFSBP controls in the first run of the script. If you want to disable multiple controls across AFSBP and CIS standards you must run the script twice, once for AFSBP and once for CIS controls.

–control-id-list
Comma-separated list of control IDs. For example, for CIS enter 1.1,1.2; for PCIDSS enter PCI.AutoScaling.1,PCI.CloudTrail.4.

Do not add any spaces after each comma. Control IDs can be found from the Security Hub console: Security Standards > View results > ID column in the enabled controls table. You can also find the control IDs in the Security Hub controls documentation for each standard.

Figure 1: Sample Control IDs of PCI security standards shown in console
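
You can also list control IDs from the CLI. This sketch assumes the standard you are interested in is enabled in the account and Region you query.

# Find the standards subscription ARN, then list the control IDs under it
aws securityhub get-enabled-standards \
  --query 'StandardsSubscriptions[].StandardsSubscriptionArn'

aws securityhub describe-standards-controls \
  --standards-subscription-arn <standards-subscription-arn-from-previous-command> \
  --query 'Controls[].ControlId'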

–control-action
For enabling controls use ENABLED; for disabling controls use DISABLED.

–disabled-reason
Reason for disabling the controls. This text appears in the Security Hub console for the disabled controls and helps reviewers understand why a control was disabled. If you want to use different reasons for different controls, you need to run the script multiple times, once for each disable reason. This argument is NOT required (or used) when enabling controls.

Optional arguments

-h, –help
Show the help message and exit.

–profile
If this argument is not given, the default profile is used. Use this argument to pass a named profile. The credentials in this profile should have permission to assume the execution role in each account.

Solution deployment

We are using an Amazon EC2 instance with Amazon Linux 2 (AL2) to run this automation. We will create the instance role for the EC2 in Part A, create the execution role in Part B, and launch the EC2 instance to run the Update Security Hub controls script in Part C.

Note: For the purpose of this blog, we will be creating the EC2 instance and the instance role in the Security Hub administrator account, however this is not a requirement. If you have existing cross account roles with the necessary permissions, you can use these existing roles as well.

Part A–Security Hub controls manager EC2 role

This role should have permission to assume (sts:AssumeRole) the execution role described in Part B.

Figure 2: Amazon EC2 Instance role assumes cross account IAM role

A sample policy for this role is provided below. This policy limits the Security Hub controls manager EC2 role to assume only a specific execution role in the downstream accounts:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::*:role/ManageSecurityHubControlsExecutionRole"
        }
    ]
}

To create the SHControlsManagerEC2 role

  1. Navigate to the IAM console of the Security Hub administrator account. You must have permission to launch an EC2 instance in this account.
  2. Create a new policy using JSON policy editor, and name it SHAssumeRolePolicy.
  3. Copy the sample policy shown above and paste it into your new policy in the JSON editor. The resource in the policy is the execution role, as will be described in Part B—Execution role below.

    Note: If you are using an existing role as the execution role, rather than creating a new role as shown in Steps 1-4, then you will need to change the Resource section of the sample policy above to use the ARN of the existing execution role.

  4. In the IAM console, create an IAM role for EC2 and name it SHControlsManagerEC2. Assign the following policies:
    1. The SHAssumeRolePolicy policy you created in step 2.
    2. The AmazonSSMManagedInstanceCore managed policy for managing the instance using AWS Systems Manager.
    3. The AWSSecurityHubReadOnlyAccess managed policy for allowing read only access to Security Hub.

    Figure 3: IAM policies attached to Amazon EC2 Instance role

  5. When you are finished creating the role, note the Amazon Resource Name (ARN); you will use it in Part B. The ARN will look like the example in Figure 4: arn:aws:iam::111111222222:role/SHControlsManagerEC2

    Figure 4: ARN of the Amazon EC2 instance role
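
If you prefer to create this role with the CLI, the following is a sketch. The account ID is a placeholder, the assume-role policy document is assumed to be saved locally as sh-assume-role-policy.json with the JSON shown earlier, and the trust policy simply lets EC2 assume the role.

# Create the custom policy and the instance role described in the steps above
aws iam create-policy \
  --policy-name SHAssumeRolePolicy \
  --policy-document file://sh-assume-role-policy.json

aws iam create-role \
  --role-name SHControlsManagerEC2 \
  --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"ec2.amazonaws.com"},"Action":"sts:AssumeRole"}]}'

# Attach the custom policy plus the two AWS managed policies from step 4
aws iam attach-role-policy --role-name SHControlsManagerEC2 \
  --policy-arn arn:aws:iam::111111222222:policy/SHAssumeRolePolicy
aws iam attach-role-policy --role-name SHControlsManagerEC2 \
  --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
aws iam attach-role-policy --role-name SHControlsManagerEC2 \
  --policy-arn arn:aws:iam::aws:policy/AWSSecurityHubReadOnlyAccess

# EC2 attaches the role through an instance profile
aws iam create-instance-profile --instance-profile-name SHControlsManagerEC2
aws iam add-role-to-instance-profile \
  --instance-profile-name SHControlsManagerEC2 --role-name SHControlsManagerEC2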

Part B–Execution role

The execution role must be created in all accounts where you will be disabling or enabling controls. This role should have securityhub:UpdateStandardsControl permission. The execution role trust relationship should allow the Security Hub controls manager role to assume the execution role.

Important: The name of the execution role must be identical in each account (including the Security Hub administrator account), otherwise the script will NOT work.

If you are using AWS Organizations, you can use CloudFormation StackSets to deploy this execution role across all of your accounts. The stack set runs across the organization at the root or at the organizational unit (OU) level of your choice. The CloudFormation template takes the ARN you noted from the Security Hub controls manager EC2 role as a parameter.

Sample execution role policy

A sample policy for the execution role is provided below:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "securityhub:UpdateStandardsControl",
            "Resource": "arn:aws:securityhub:*:*:hub/default"
        }
    ]
}

Sample execution role trust policy

Note: the ARN in the principal section below should be the ARN you noted in step 5 of Part A: Security Hub controls manager EC2 role.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111111222222:role/SHControlsManagerEC2"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

To launch the stack set for creating cross account IAM Roles

  1. Log in to the console of the Organizations management account or the CloudFormation delegated administrator account.
  2. Select StackSets and choose Create StackSet.
  3. Choose Template is ready, and select Amazon S3 URL as the template source.
  4. Enter the following URL, then choose Next: https://awsiammedia.s3.amazonaws.com/public/sample/1083-disabling-security-hub-controls-multi-account/aws-securityhub-updatecontrols-executionrole.yml

    Figure 5: Create an AWS CloudFormation StackSet

  5. Enter a stack set name, update the description as needed, and then under Parameters enter the ARN of the Security Hub controls manager EC2 role that you noted in step 5 of Part A–Security Hub controls manager EC2 role. Then choose Next.

    Figure 6: AWS CloudFormation StackSet parameters for creating cross account IAM role

  6. (Optional) On the Configure StackSet options page, go to Tags and add tags to identify and organize your stack set.

    Figure 7: Optionally define tags for the stacks created using AWS CloudFormation StackSet

  7. On the Set deployment options page, select the desired Region. Because the resource being created is an IAM role, which is a global resource, you only need to specify one Region. Choose Next.

    Figure 8: Specify target regions where the CloudFormation stacks need to be created

  8. Review the definition and select I acknowledge that AWS CloudFormation might create IAM resources. Choose Submit.
  9. You can monitor the creation of the stack set from the Operations tab to ensure that deployment is successful.

Note: StackSets does not deploy stack instances to the organization’s management account, even if the management account is in your organization or in an OU in your organization. You will need to create the execution role manually in the organization management account.
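
For reference, a CLI sketch of the same StackSet deployment follows. The stack set name and OU ID are placeholders, and the parameter key shown is an assumption; check the template for the actual parameter name before using it.

# Create a service-managed stack set from the sample template (parameter key is assumed)
aws cloudformation create-stack-set \
  --stack-set-name SecurityHubControlsExecutionRole \
  --template-url https://awsiammedia.s3.amazonaws.com/public/sample/1083-disabling-security-hub-controls-multi-account/aws-securityhub-updatecontrols-executionrole.yml \
  --parameters ParameterKey=SHControlsManagerRoleArn,ParameterValue=arn:aws:iam::111111222222:role/SHControlsManagerEC2 \
  --capabilities CAPABILITY_NAMED_IAM \
  --permission-model SERVICE_MANAGED \
  --auto-deployment Enabled=true,RetainStacksOnAccountRemoval=false

# Deploy stack instances to an OU in a single Region (the role is a global IAM resource)
aws cloudformation create-stack-instances \
  --stack-set-name SecurityHubControlsExecutionRole \
  --deployment-targets OrganizationalUnitIds=ou-examplerootid111-exampleouid111 \
  --regions us-east-1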

Part C–Launch EC2

We will be using a t2.micro EC2 instance with the Amazon Linux 2 (AL2) image, which comes with the AWS CLI and Python preinstalled. This instance size and image are free tier eligible.

To launch the EC2 instance for executing securityhub-updatecontrols-script

  1. Launch EC2 instance in the Security Hub administrator account in a single Region. You can use any Region.
  2. Attach the IAM role created in Part A to the instance.
  3. Confirm instance is in a running state.
  4. Log in to the instance CLI using AWS Systems Manager Session Manager. Go to the EC2 console, select the instance, and choose Connect.

    Figure 9: Access newly launched Amazon EC2 instance using session manager

    Note: You will need to allow outbound internet access to the Systems Manager endpoints for Session Manager to work. See the Session Manager documentation for more information.

  5. On the next page, choose Connect again.

    Figure 10: Connect to Amazon EC2 instance

  6. A new window with session manager will open.
  7. Set the working directory to home directory.
     cd /home/ssm-user
  8. Install Git:
     sudo yum install git
  9. Clone the UpdateSecurityHubControls code repository.
     git clone https://github.com/aws-samples/SecurityHub-Multiaccount-UpdateControls.git
  10. Start a virtual environment.
     python3 -m venv SecurityHub-Multiaccount-UpdateControls/env
     source SecurityHub-Multiaccount-UpdateControls/env/bin/activate
  11. Upgrade pip and Install boto3
     pip install pip --upgrade
     pip install boto3

You are now ready to run the script.

Example 1–Disabling Controls

Let's assume you have AWS Config global resource recording enabled in us-east-1 and want to disable the controls associated with global resources (IAM.1, IAM.2, IAM.3, IAM.4, IAM.5, IAM.6, IAM.7) for the AWS Foundational Security Best Practices standard in Regions us-west-1 and us-east-2, in all accounts. The name of the execution role in all accounts is ManageSecurityHubControlsExecutionRole.

  1. Change to the correct directory
    $ cd SecurityHub-Multiaccount-UpdateControls
  2. Create a file with the list of account IDs. This is passed with the input-file argument.
    $ nano accounts.csv
  3. Enter account IDs one account per line, then save the file.

    Figure 11: CSV file with AWS account IDs

  4. Run the script:
     python SH-UpdateControls.py \
    --input-file accounts.csv \
    --assume-role ManageSecurityHubControlsExecutionRole \
    --regions us-east-2,us-west-1 \
    --standard AFSBP \
    --control-id-list IAM.1,IAM.2,IAM.3,IAM.4,IAM.5,IAM.6,IAM.7 \
    --control-action DISABLED \
    --disabled-reason 'Disabling IAM checks in all regions except for us-east-1, as global recording is enabled in us-east-1'
    
  5. As the script runs, it generates a summary of disabled controls. If there are failures, the reason for failures will be listed in the summary. A log file is also saved in the directory with execution logs.

    Figure 12: Execution Summary showing successful updates and failure information

Example 2–Enabling Controls

Let's say you have disabled the control IDs PCI.CodeBuild.1 and PCI.CodeBuild.2 from the PCI DSS standard in your accounts because, in the past, you did not use AWS CodeBuild in your PCI environment.

After a recent architecture review, you decided to begin using CodeBuild in your accounts in us-west-1, eu-west-1, ap-southeast-1. The name of our execution role in all accounts is ManageSecurityHubControlsExecutionRole.

  1. Confirm you have completed all steps in part C.
  2. Change to the correct directory.
    $ cd SecurityHub-Multiaccount-UpdateControls
  3. Create a file with the list of account IDs. This will be passed with the input-file argument.
    $ nano accounts.csv
    Enter account IDs one account per line, then save the file.

    Figure 13: CSV file with AWS account IDs

  4. Run the script to enable controls
     python SH-UpdateControls.py \
    --input-file accounts.csv \
    --assume-role ManageSecurityHubControlsExecutionRole \
    --regions us-west-1,eu-west-1,ap-southeast-1 \
    --standard PCIDSS \
    --control-id-list PCI.CodeBuild.1,PCI.CodeBuild.2 \
    --control-action ENABLED

  5. As the script runs, it generates a summary of enabled controls. If there are failures, the reasons for the failures will be listed in the summary. A log file is also saved in the directory with execution logs.

    Figure 14: Execution Summary showing controls enabled

Alternative Solution

Consider a case where new member accounts are added to the Security Hub administrator account of the organization. In that scenario, you would have to disable the controls manually, or re-run the update Security Hub controls script for each new account.

Figure 15: High level architecture

This alternative solution addresses this problem by automating the process with AWS Step Functions and a scheduled EventBridge event. A state machine is triggered every 24 hours, which updates the status of the standard controls in all AWS accounts registered as members in the Security Hub administrator account. The solution maintains a list of account IDs in a DynamoDB table so that exceptions can be made for those accounts. For details on how to deploy and use this solution, refer to this GitHub repository. This alternative solution and the update Security Hub controls script are independent of each other.

Conclusion

In this blog, you learned some of the reasons you might disable Security Hub controls. We showed how the controls can quickly be disabled or enabled using the update Security Hub controls script provided. This script will save you time if you want to update controls across multiple accounts and multiple Regions on an as needed basis.

We also introduced an alternative solution that uses AWS Step Functions and a scheduled EventBridge event to update Security Hub controls in new accounts on a regular basis. You can use either of the two solutions depending on the scenario.

If you have feedback about this post, submit comments in the Comments section below. If you have trouble with the scripts, please open an issue in GitHub.

 

Want more AWS Security news? Follow us on Twitter.

Priyank Ghedia

Priyank is a Solutions Architect focussed on Threat Detection and Incident Response. Priyank helps customers meet their security visibility and response objectives by building architectures using AWS security services and tools. Before AWS, he spent 8 years advising customers on global networking and security operations.

Praveen Haranahalli

Praveen Haranahalli is a Solutions Architect at AWS, based in Orlando, Florida. Praveen has helped a diverse set of customers to design and operate a broad variety of workloads using AWS Services and has a keen interest in Security and Governance. Outside of work, he loves being outdoor with his family.

Ahmed Bakry

Ahmed Bakry is a Security Consultant at AWS Professional Services and based in Amsterdam. He obtained his master’s degree in Computer Science at the University of Twente and specialized in Cyber Security. And he did his bachelor degree in Networks Engineering at the German University in Cairo. His passion is developing secure and robust applications that drive success for his customers.

Michael Fuellbier

Michael Fuellbier is a Security Consultant at AWS Professional Services. His passion is designing and building secure and robust applications together with his customers. He studied mathematics and has a background in penetration testing.

How to configure an incoming email security gateway with Amazon WorkMail

Post Syndicated from Jesse Thompson original https://aws.amazon.com/blogs/security/how-to-configure-an-incoming-email-security-gateway-with-amazon-workmail/

This blog post will walk you through the steps needed to integrate Amazon WorkMail with an email security gateway. Configuring WorkMail this way can provide a versatile defense strategy for inbound email threats.

Amazon WorkMail is a secure, managed business email and calendar service. WorkMail leverages the email receiving capabilities of Amazon Simple Email Service (Amazon SES) to scan all incoming and outgoing email for spam, malware, and viruses to help protect your users from harmful email. AWS Lambda for Amazon WorkMail functions can tap into the capabilities of other AWS services to accomplish additional business objectives, such as controlling message delivery or message modification.

For many organizations, existing features and integrations with Amazon SES are sufficient for their spam, malware, and virus detection. Other organizations may need a dedicated on-premises security solution, or may have other reasons to use an additional inspection point in the overall mail flow. A number of commercial and community-supported tools include features like special encryption capabilities, data loss prevention (DLP) content inspection engines, and embedded-hyperlink transformation features to protect end-user mailboxes.

Prerequisites

To implement this solution, you need:

  • A domain name and permission to alter domain name system (DNS) records in Amazon Route 53 or your existing DNS provider. This could be your organization’s existing domain (such as example.org), a new domain (such as example.net), or a subdomain (such as sub.example.org).
  • Access to an AWS account so you can configure WorkMail and Amazon SES. Optionally, you may also need the ability to create AWS Lambda functions to integrate with WorkMail.
  • Access to configure the email security gateway of your choosing.

How email flows with an email security gateway

Email security gateways function by handling the initial ingress of email via the Simple Mail Transport Protocol (SMTP). When email servers send messages to your domain’s email addresses, they look at your domain’s mail exchange (MX) record in the DNS. After processing an email message, the email security gateway delivers it to the downstream mailbox hosting service, such as WorkMail, by means of Amazon SES via SMTP. You can also optionally configure an AWS Lambda for Amazon WorkMail function to synchronously deliver messages into end-user junk email folders, or to take other actions. 

Figure 1. Interaction points while architecting an email security gateway

The interaction points are as follows:

  1. The email sender looks up the mail exchange (MX) record for the domain hosted by WorkMail. The domain name system (DNS) domain may be hosted in Route 53, or by another DNS hosting provider. The value of the MX record contains the internet protocol (IP) address of the email security gateway.
  2. The email sender connects to the email security gateway, and sends the message using the Simple Mail Transfer Protocol (SMTP).
  3. The email security gateway accepts, processes, and then delivers the message to the ingress SMTP endpoint for WorkMail. Amazon Simple Email Service (Amazon SES) handles inbound email receiving for WorkMail.
  4. Optionally, an AWS Lambda for Amazon WorkMail function can synchronously process messages before delivery to WorkMail.
  5. WorkMail receives the message for final delivery to the end-user.

The gateway assumes responsibility for inspecting incoming email, because the initial point of ingress is an important component of a multi-layer defense strategy against email-borne threats. The gateway could refuse or quarantine risky messages, it could modify the email subjects and body to add warnings visible to recipients, or it could append metadata to the email headers for downstream processing by an AWS Lambda function.

Why point of ingress email authentication is important

What is email authentication

SMTP was built at a time when networking was less reliable than it is today, and consequently it was designed to allow any server to store and later forward messages on behalf of other domains to mitigate connection problems. While that helped at the time, today it presents a real problem in authenticating who truly sent a message: the owner of the domain, or just someone else claiming to be the owner? To overcome this issue, the messaging industry has adopted three protocols to help verify the authenticity of a message: SPF, DKIM, and DMARC. These protocols aren't perfect, but understanding how to use them is important when adding new steps to your message processing workflow, because they can affect how you receive inbound mail.

Sender Policy Framework

Sender Policy Framework (SPF) permits domain owners to declare which SMTP servers are allowed to send email messages claiming to be from their domain. This establishes an identity relationship between the owner of the domain and the authorized party that controls the SMTP server. When SPF is used, a message can only be handed off directly from an authorized SMTP server; it cannot be relayed through a second, unauthorized server without changing the originating address.

DomainKeys Identified Mail (DKIM)

DomainKeys Identified Mail (DKIM) permits domain owners to advertise a public key that a mail recipient’s system can use to verify the sender’s digital signature. This allows SMTP servers and other downstream applications to check the validity of the digital signature against the public key of the domain which had the matching private key to create the signature. DKIM signatures attached to messages can remain intact through intermediary SMTP servers, but if message contents (email body or email headers) are modified by intermediary servers, the final destination will find that the signature is no longer valid.

Domain-based Message Authentication, Reporting and Conformance (DMARC)

Domain-based Message Authentication, Reporting and Conformance (DMARC) permits domain owners to publish a policy telling receiving servers what to do when SPF or DKIM are not valid, such as if a message originated from an unauthorized server, or if it was tampered with after being sent. DMARC checks if a message matches what it knows about the sender via SPF and DKIM, a process known as alignment. The recipient’s server can then enforce the DMARC policy, for example by rejecting or quarantining non-aligned messages.

Tying it all together

Amazon WorkMail normally performs DMARC enforcement for inbound messages, based on their alignment when they were received by Amazon SES. But when an email security gateway acts as an intermediary SMTP server between the original sender and WorkMail, that breaks the relationship with the SMTP servers authorized by SPF, and if the gateway modifies the message, that invalidates the DKIM signature. This is why it’s necessary for the SMTP server at the point of ingress to perform the evaluation of SPF, DKIM, and DMARC. The email security gateway at the border should be made responsible for enforcing DMARC on messages that don’t align, and WorkMail DMARC enforcement should be disabled.

Figure 2. Diagram of SPF policy enforcement process. The full details of the interaction points are outlined below.

The interaction points for SPF policy enforcement are as follows:

  1. The email sender delivers the message to the email security gateway with a MAIL FROM address within a domain the sender owns.
  2. The email security gateway looks up the sender's domain's Sender Policy Framework (SPF) policy in DNS to determine whether the sending mail server's IP address is authorized.
  3. The email security gateway delivers the message to Amazon SES with the same MAIL FROM address. The email security gateway has a different IP address than the original sending email server.
  4. When Amazon SES looks up the MAIL FROM domain's SPF policy, it will not find the email security gateway's IP address as authorized. From the perspective of Amazon SES, and the resulting logs in Amazon CloudWatch, the message will appear to be unauthorized by the SPF policy. This result is ignored by disabling DMARC checks in the WorkMail organization configuration.
  5. The message continues delivery to WorkMail with an optional integration with AWS Lambda for Amazon WorkMail synchronous run configuration, which can analyze message headers to get a more complete picture of the message’s authenticity.

Choosing an email security gateway

Many email security vendors offer software as a service (SaaS) solutions. This offloads all management responsibilities to the software vendor’s platform. These solutions work as long as they support the email gateway features necessary for this solution as depicted in Figure 2 and described in the Why point of ingress email authentication is important section above.

If you wish to build and maintain your own email security gateway, you can deploy one available from the AWS Marketplace or add an open source application into your Amazon Virtual Private Cloud (Amazon VPC). You will need to remove the port 25 restriction from the EC2 instance that runs the email security gateway within your Amazon VPC so that it can send email to Amazon WorkMail.

How to configure Amazon WorkMail

Follow this procedure to configure your WorkMail organization and Amazon SES IP address filters to allow the email security gateway to process inbound email receiving.

To configure Amazon WorkMail

  1. From the WorkMail console, select your organization, navigate to Organization settings, and select Advanced. Edit the Inbound DMARC Settings and set Enforcement enabled to Off. This ensures that WorkMail does not re-evaluate DMARC.

    Figure 3. Picture of the AWS WorkMail console showing the advanced configuration depicting Inbound DMARC enforcement disabled

  2. From the Amazon SES console, navigate to Email receiving and create IP address filters to allow the IP address or IP address range of the gateway(s).
  3. Add another rule to block 0.0.0.0/0. This prevents malicious actors from bypassing your email security gateway.

    Figure 4. Picture of the Amazon SES console showing example IP address filters to allow the email security gateway IP addresses and blocking every other IP Address

  4. From the Route 53 console, navigate to Hosted zones, select the domain, and edit the MX record so that it points to the hostname of the gateway. This causes all email senders to deliver to your gateway, instead of delivering directly to WorkMail. (A CLI sketch of steps 2 through 4 follows this procedure.)
    Follow the instructions of your DNS provider if the domain’s DNS is hosted elsewhere.

    Figure 5. Picture of the Amazon Route 53 console showing that the DNS MX record for the domain needs to be configured with the IP address of the email security gateway

  5. From the WorkMail console, navigate to Domains, select your domain to show the Domain verification status page.
  6. Copy the host name from the value of the MX record type as depicted in Figure 6.

    Figure 6. Picture showing the section of the MX record value to copy from the WorkMail console.

  7. Configure your email security gateway with the value that you just copied (e.g. inbound-smtp.us-east-1.amazonaws.com) to send inbound messages to your WorkMail organization. Instructions for doing this will vary depending on which email security gateway you are using.
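
The SES IP address filters from steps 2 and 3, and the MX check from step 4, can also be done from the CLI. In this sketch the filter names and CIDR range are placeholders for your gateway's addresses, and example.org stands in for your domain.

# Allow the gateway's IP range, and block all other senders from reaching SES directly
aws ses create-receipt-filter \
  --filter "Name=allow-gateway,IpFilter={Policy=Allow,Cidr=203.0.113.0/24}"
aws ses create-receipt-filter \
  --filter "Name=block-everything-else,IpFilter={Policy=Block,Cidr=0.0.0.0/0}"

# Confirm that the domain's MX record now resolves to the gateway
dig +short MX example.org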

Some specifics of this configuration depend on which gateway you are using, how it is designed for high availability, and the type of features configured. Test your WorkMail integration with a non-production domain before changing your production domain.

It is normal for Amazon CloudWatch logs for WorkMail, as well as the individual message headers, to show that SPF fails for all messages which traverse the gateway. Configure the email security gateway to record its SPF evaluation in the message headers so that it remains available for troubleshooting and further processing.

Junk E-Mail folder integration

WorkMail normally moves spam messages into the Junk E-mail folder within each recipient’s mailbox. To replicate this behavior for spam messages identified by the email security gateway, use AWS Lambda for Amazon WorkMail with a synchronous run configuration to run a function for every inbound message to a group of recipients.

To configure an AWS Lambda function for every inbound message (optional)

  1. Configure the email security gateway to include a spam verdict in the message headers for all incoming mail.
  2. Create a synchronous run configuration using AWS Lambda for Amazon WorkMail to interpret the message headers and return a response to WorkMail with the type BYPASS_SPAM_CHECK or MOVE_TO_JUNK. A sample Amazon WorkMail Upstream Gateway Filter solution is available in the AWS Serverless Application Repository.

Conclusion

By integrating an email security gateway and leveraging AWS Lambda for Amazon WorkMail, you will gain additional security and management control over your organization’s inbound email. To learn more, read the Amazon WorkMail FAQs and Amazon WorkMail documentation.

 
If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security news? Follow us on Twitter.

Jesse Thompson

Jesse is a worldwide public sector senior solutions architect at AWS. His background is in public sector enterprise IT development and operations, with a focus on abuse mitigation and encouragement of authenticity practices with open standard protocols.

Simplify setup of Amazon Detective with AWS Organizations

Post Syndicated from Karthik Ram original https://aws.amazon.com/blogs/security/simplify-setup-of-amazon-detective-with-aws-organizations/

Amazon Detective makes it easy to analyze, investigate, and quickly identify the root cause of potential security issues or suspicious activities by collecting log data from your AWS resources. Amazon Detective simplifies the process of a deep dive into a security finding from other AWS security services, such as Amazon GuardDuty and AWS Security Hub. Detective uses machine learning, statistical analysis, and graph theory to build a linked set of data that enables customers to easily conduct faster and more efficient security investigations.

In this post you will learn about the new AWS Organizations integration with Amazon Detective, where new and existing Detective customers can designate any account in their organization as the delegated Detective administrator account, and can centrally manage the Detective behavior graph database for an organization with up to 1,200 accounts.

Customers tell us that they want to manage security findings and investigations across multiple AWS accounts; depending on the customer, this can be hundreds or thousands of accounts. AWS Organizations integration with security services, including GuardDuty, Security Hub, and AWS IAM Access Analyzer, helps customers centralize management and governance of their environments as they scale and grow their AWS accounts and resources. Adding to that list, Detective is now integrated with AWS Organizations to simplify security posture management across all existing and future AWS accounts in an organization. Organizations integration is available in all AWS Regions that Detective supports.

Detective is aware of your existing delegated administrator accounts for other AWS security services, such as GuardDuty or Security Hub. Using this awareness, Detective recommends that you choose the same account as the administrator account for Detective, as shown in Figure 1. For a more complete walkthrough of how to enable your accounts, see the Amazon Detective documentation.

Figure 1. Setting delegated administrator

You can then use the same account to manage all of your security services. AWS recommends that you align your Detective administrator account with your GuardDuty and Security Hub administrator accounts, to enable seamless integration between Detective and those services.

  • In GuardDuty or Security Hub, when viewing details for a GuardDuty finding, you can pivot from the finding details to the Detective finding profile.
  • In Detective, when investigating a GuardDuty finding, you can choose the option to archive that finding.

Once designated, the chosen account becomes the administrator account for the organization behavior graph. It can enable any organization account as a member account in the organization behavior graph, and can configure Detective to automatically enable organization accounts when they join the organization.

Figure 2. Auto-enabling Organization accounts

The Detective administrator account can also manually invite other accounts to join the organization behavior graph.

Figure 3. Inviting accounts to join the Organization behavior graph

From Detective, the administrator account can centrally conduct security investigations across the organization.
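
If you prefer to script this setup instead of using the console, the following sketch shows the equivalent calls with the AWS SDK for Python (boto3). The account ID and Region are placeholders; the first call is made with credentials for the AWS Organizations management account, and the remaining calls with credentials for the newly delegated Detective administrator account.

import boto3

# From the Organizations management account: delegate the Detective administrator.
detective = boto3.client("detective", region_name="us-east-1")
detective.enable_organization_admin_account(AccountId="111122223333")  # placeholder account ID

# From the delegated administrator account: automatically enable new organization
# accounts as member accounts in the organization behavior graph.
admin = boto3.client("detective", region_name="us-east-1")
graph_arn = admin.list_graphs()["GraphList"][0]["Arn"]
admin.update_organization_configuration(GraphArn=graph_arn, AutoEnable=True)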

Considerations for AWS Organizations support

Some considerations and recommendations around Organizations support for Detective:

  1. Detective allows up to 1,200 member accounts in each behavior graph.
  2. The Detective administrator account becomes the administrator account for the organization’s behavior graph.
  3. An account can be a member account of multiple behavior graphs in the same Region. An account can accept multiple invitations. An organization account can be enabled as a member account in the organization behavior graph, and can then also accept invitations to other behavior graphs.
  4. An account can only be the administrator account of one behavior graph per Region, but can be an administrator account in different Regions.
  5. Changes to an organization are not immediately reflected in Detective. For most changes, such as new and removed organization accounts, it can take up to an hour for Detective to be notified.

Other recent updates from Amazon Detective

Additional support for all GuardDuty findings

With the recent expansion of security investigation support for Amazon Simple Storage Service (Amazon S3) and DNS-related findings on Amazon GuardDuty, Amazon Detective now provides full coverage of all detections from GuardDuty. Security analysts can now easily investigate and analyze the root cause of any GuardDuty finding by choosing the Investigate in Detective option in GuardDuty and Security Hub.

New resource focused view

In addition to these integrations with AWS Organizations and GuardDuty, Detective now makes it even easier for a security analyst to investigate entities and behaviors using a revamped user experience, as seen in Figure 4. Amazon Detective presents a unified view of user and resource interactions over time, with all the context and details in one place, to help you quickly analyze the root cause of a security finding.

Figure 4. New resource focused view

New finding overview

The new finding overview provides an expanded set of details for each finding, and provides links to the profiles for each involved entity as seen in the right panel in Figure 4. With this unified view, you can visualize all of the details and context in one place, while identifying the underlying reasons for the findings. This resource-focused view helps you understand the connections between resources affected by a security finding, and further helps you drill down into relevant historical activity to quickly determine the root cause.

Integration with Splunk

Amazon Detective, in coordination with the Splunk Trumpet project, has released the ability to pivot from an Amazon GuardDuty finding in Splunk directly to an Amazon Detective entity profile. Customers can now quickly identify the root cause of potential security issues or suspicious activities. This setting can be enabled on the Splunk Trumpet project installation page by selecting Detective GuardDuty URLs from the AWS CloudWatch Events dropdown.

Amazon Detective’s interactive visualizations make it easy to investigate and analyze issues more thoroughly and effectively, with minimal effort. Using these visualizations, customers can easily filter large sets of event data into specific timelines, with all the details, context, and guidance needed to investigate quickly. For example, Amazon Detective enables you to view login attempts on a geolocation map, drill down into relevant historical activity, quickly determine a root cause, and, if necessary, take action to resolve the issue.

Amazon Detective makes it easy to analyze, investigate, and quickly identify the root cause of potential security issues. To get started, enable a 30-day free trial of Amazon Detective with just a few clicks in the AWS Management console. See the AWS Regions page for a list of all Regions where Detective is available. To learn more, visit the Amazon Detective product page.

 
If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security news? Follow us on Twitter.

Karthik Ram

Karthik is a Senior Solutions Architect with Amazon Web Services based in Columbus, Ohio. He has a background in Networking and Infrastructure architecture. At AWS, Karthik helps customers build secure and innovative cloud solutions, solving their business problems using data driven approaches. Karthik’s Area of Depth is Cloud Security with a focus on Threat Detection and Incident Response (TDIR).

Ross Warren

Ross Warren is a Senior Solution Architect at AWS based in Northern Virginia. Prior to his work at AWS, Ross’ areas of focus included cyber threat hunting and security operations. Ross has worked at a handful of startups and has enjoyed the transition to AWS because he can continue to build solutions for customers on today’s most innovative platform.

Continuous runtime security monitoring with AWS Security Hub and Falco

Post Syndicated from Rajarshi Das original https://aws.amazon.com/blogs/security/continuous-runtime-security-monitoring-with-aws-security-hub-and-falco/

Customers want a single and comprehensive view of the security posture of their workloads. Runtime security event monitoring is important to building secure, operationally excellent, and reliable workloads, especially in environments that run containers and container orchestration platforms. In this blog post, we show you how to use services such as AWS Security Hub and Falco, a Cloud Native Computing Foundation project, to build a continuous runtime security monitoring solution.

With the solution in place, you can collect runtime security findings from multiple AWS accounts running one or more workloads on AWS container orchestration platforms, such as Amazon Elastic Kubernetes Service (Amazon EKS) or Amazon Elastic Container Service (Amazon ECS). The solution collates the findings across those accounts into a designated account where you can view the security posture across accounts and workloads.

 

Solution overview

Security Hub collects security findings from other AWS services using the standardized AWS Security Finding Format (ASFF). Falco provides the ability to detect security events at runtime for containers. Partner integrations like Falco are also available on Security Hub and use ASFF. Security Hub provides a custom integrations feature using ASFF to enable collection and aggregation of findings that are generated by custom security products.

The solution in this blog post uses AWS FireLens, Amazon CloudWatch Logs, and AWS Lambda to enrich logs from Falco and populate Security Hub.

Figure 1: Architecture diagram of continuous runtime security monitoring

Here’s how the solution works, as shown in Figure 1:

  1. An AWS account is running a workload on Amazon EKS.
    1. Runtime security events detected by Falco for that workload are sent to CloudWatch logs using AWS FireLens.
    2. CloudWatch Logs acts as the destination for FireLens and a trigger for the Lambda function in the next step.
    3. The Lambda function transforms the logs into the ASFF. These findings can now be imported into Security Hub.
    4. The Security Hub instance that is running in the same account as the workload running on Amazon EKS stores and processes the findings provided by Lambda and provides the security posture to users of the account. This instance also acts as a member account for Security Hub.
  2. Another AWS account is running a workload on Amazon ECS.
    1. Runtime security events detected by Falco for that workload are sent to CloudWatch logs using AWS FireLens.
    2. CloudWatch Logs acts as the destination for FireLens and a trigger for the Lambda function in the next step.
    3. The Lambda function transforms the logs into the ASFF. These findings can now be imported into Security Hub.
    4. The Security Hub instance that is running in the same account as the workload running on Amazon ECS stores and processes the findings provided by Lambda and provides the security posture to users of the account. This instance also acts as another member account for Security Hub.
  3. The designated Security Hub administrator account combines the findings generated by the two member accounts, and then provides a comprehensive view of security alerts and security posture across AWS accounts. If your workloads span multiple Regions, Security Hub supports aggregating findings across Regions.
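
To make the transformation step concrete, the following simplified Python sketch maps a Falco alert to an ASFF finding and imports it into Security Hub. It is not the code in the sample repository used later in this post: the account ID, severity mapping, and resource fields are illustrative, and the CloudWatch Logs subscription payload is assumed to be already decompressed and decoded.

import json
import uuid
from datetime import datetime, timezone

import boto3

securityhub = boto3.client("securityhub")

def falco_alert_to_asff(alert, account_id, region):
    # Map a single Falco JSON alert to a minimal ASFF finding.
    now = datetime.now(timezone.utc).isoformat()
    return {
        "SchemaVersion": "2018-10-08",
        "Id": str(uuid.uuid4()),
        # Custom integrations import findings under the account's default product ARN.
        "ProductArn": f"arn:aws:securityhub:{region}:{account_id}:product/{account_id}/default",
        "GeneratorId": alert.get("rule", "falco"),
        "AwsAccountId": account_id,
        "Types": ["Software and Configuration Checks"],
        "CreatedAt": now,
        "UpdatedAt": now,
        "Severity": {"Label": "HIGH" if alert.get("priority") in ("Critical", "Error") else "MEDIUM"},
        "Title": alert.get("rule", "Falco runtime alert"),
        "Description": alert.get("output", "")[:1024],
        "Resources": [{"Type": "Other", "Id": alert.get("hostname", "unknown"), "Region": region}],
    }

def handler(event, context):
    # 'event' stands in for the already-decoded CloudWatch Logs payload;
    # gunzip and base64 handling are omitted for brevity.
    findings = [
        falco_alert_to_asff(json.loads(e["message"]), "111122223333", "us-east-1")
        for e in event.get("logEvents", [])
    ]
    if findings:
        securityhub.batch_import_findings(Findings=findings)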

 

Prerequisites

For this walkthrough, you should have the following in place:

  1. Three AWS accounts.

    Note: We recommend three accounts so you can experience Security Hub’s support for a multi-account setup. However, you can use a single AWS account instead to host the Amazon ECS and Amazon EKS workloads, and send findings to Security Hub in the same account. If you are using a single account, skip the following account-specific guidance. If you are integrated with AWS Organizations, the designated Security Hub administrator account will automatically have access to the member accounts.

  2. Security Hub set up with an administrator account on one account.
  3. Security Hub set up with member accounts on two accounts: one account to host the Amazon EKS workload, and one account to host the Amazon ECS workload.
  4. Falco set up on the Amazon EKS and Amazon ECS clusters, with logs routed to CloudWatch Logs using FireLens. For instructions on how to do this, see:

    Important: Take note of the names of the CloudWatch Logs groups, as you will need them in the next section.

  5. AWS Cloud Development Kit (CDK) installed on the member accounts to deploy the solution that provides the custom integration between Falco and Security Hub.

 

Deploying the solution

In this section, you will learn how to deploy the solution and enable the CloudWatch Logs group. Enabling the CloudWatch Logs group is the trigger for running the Lambda function in both member accounts.

To deploy this solution in your own account

  1. Clone the aws-securityhub-falco-ecs-eks-integration GitHub repository by running the following command.
    git clone https://github.com/aws-samples/aws-securityhub-falco-ecs-eks-integration
  2. Follow the instructions in the README file provided on GitHub to build and deploy the solution. Make sure that you deploy the solution to the accounts hosting the Amazon EKS and Amazon ECS clusters.
  3. Navigate to the AWS Lambda console and confirm that you see the newly created Lambda function. You will use this function in the next section.

Figure 2: Lambda function for Falco integration with Security Hub

To enable the CloudWatch Logs group

  1. In the AWS Management Console, select the Lambda function shown in Figure 2—AwsSecurityhubFalcoEcsEksln-lambdafunction—and then, on the Function overview screen, select + Add trigger.
  2. On the Add trigger screen, provide the following information and then select Add, as shown in Figure 3.
    • Trigger configuration – From the drop-down, select CloudWatch logs.
    • Log group – Choose the Log group you noted in Step 4 of the Prerequisites. In our setup, the log group for the Amazon ECS and Amazon EKS clusters, deployed in separate AWS accounts, was set with the same value (falco).
    • Filter name – Provide a name for the filter. In our example, we used the name falco.
    • Filter pattern – optional – Leave this field blank.
    Figure 3: Lambda function trigger - CloudWatch Log group

  3. Repeat these steps (as applicable) to set up the trigger for the Lambda function deployed in other accounts.

 

Testing the deployment

Now that you’ve deployed the solution, you will verify that it’s working.

With the default rules, Falco generates alerts for activities such as:

  • An attempt to write to a file below the /etc folder. The /etc folder contains important system configuration files.
  • An attempt to open a sensitive file (such as /etc/shadow) for reading.

To test your deployment, you will attempt to perform these activities to generate Falco alerts that are reported as Security Hub findings in the same account. Then you will review the findings.

To test the deployment in member account 1

  1. Run the following commands to trigger an alert in member account 1, which is running an Amazon EKS cluster. Replace <container_name> with your own value.
    kubectl exec -it <container_name> -- /bin/bash
    touch /etc/5
    cat /etc/shadow > /dev/null
  2. To see the list of findings, log in to your Security Hub admin account and navigate to Security Hub > Findings. As shown in Figure 4, you will see the alerts generated by Falco, including the Falco-generated title, and the instance where the alert was triggered.

    Figure 4: Findings in Security Hub

  3. To see more detail about a finding, check the box next to the finding. Figure 5 shows some of the details for the finding Read sensitive file untrusted.
    Figure 5: Sensitive file read finding - detail view

    Figure 6 shows the Resources section of this finding, that includes the instance ID of the Amazon EKS cluster node. In our example this is the Amazon Elastic Compute Cloud (Amazon EC2) instance.

    Figure 6: Resource Detail in Security Hub finding

To test the deployment in member account 2

  1. Run the following commands to trigger a Falco alert in member account 2, which is running an Amazon ECS cluster. Replace <container_id> with your own value.
    docker exec -it <container_id> bash
    touch /etc/5
    cat /etc/shadow > /dev/null
  2. As in the preceding example with member account 1, to view the findings related to this alert, navigate to your Security Hub admin account and select Findings.

To view the collated findings from both member accounts in Security Hub

  1. In the designated Security Hub administrator account, navigate to Security Hub > Findings. The findings from both member accounts are collated in the designated Security Hub administrator account. You can use this centralized account to view the security posture across accounts and workloads. Figure 7 shows two findings, one from each member account, viewable in this single-pane-of-glass administrator account.

    Figure 7: Write below /etc findings in a single view

  2. To see more information and a link to the corresponding member account where the finding was generated, check the box next to the finding. Figure 8 shows the account detail associated with a specific finding in member account 1.
    Figure 8: Write under /etc detail view in Security Hub admin account

    By centralizing and enriching the findings from Falco, you can take action more quickly or perform automated remediation on the impacted resources.

 

Cleaning up

To clean up this demo:

  1. Delete the CloudWatch Logs trigger from the Lambda functions that were created in the section To enable the CloudWatch Logs group.
  2. Delete the Lambda functions by deleting the CloudFormation stack, created in the section To deploy this solution in your own account.
  3. Delete the Amazon EKS and Amazon ECS clusters created as part of the Prerequisites.

 

Conclusion

In this post, you learned how to achieve multi-account continuous runtime security monitoring for container-based workloads running on Amazon EKS and Amazon ECS. This is achieved by creating a custom integration between Falco and Security Hub.

You can extend this solution in a number of ways. For example:

  • You can forward findings from the administrator account, as a single source, to security information and event management (SIEM) tools such as Splunk.
  • You can perform automated remediation activities based on the findings generated, using Lambda.

To learn more about managing a centralized Security Hub administrator account, see Managing administrator and member accounts. To learn more about working with ASFF, see AWS Security Finding Format (ASFF) in the documentation. To learn more about the Falco engine and rule structure, see the Falco documentation.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security news? Follow us on Twitter.

Rajarshi Das

Rajarshi Das

Rajarshi is a Solutions Architect at Amazon Web Services. He focuses on helping Public Sector customers accelerate their security and compliance certifications and authorizations by architecting secure and scalable solutions. Rajarshi holds 4 AWS certifications including AWS Certified Solutions Architect – Professional and AWS Certified Security – Specialty.

Adam Cerini

Adam is a Senior Solutions Architect with Amazon Web Services. He focuses on helping Public Sector customers architect scalable, secure, and cost effective systems. Adam holds 5 AWS certifications including AWS Certified Solutions Architect – Professional and AWS Certified Security – Specialty.

Unify log aggregation and analytics across compute platforms

Post Syndicated from Hari Ohm Prasath original https://aws.amazon.com/blogs/big-data/unify-log-aggregation-and-analytics-across-compute-platforms/

Our customers want to make sure their users have the best experience running their application on AWS. To make this happen, you need to monitor and fix software problems as quickly as possible. Doing this becomes challenging as the volume of log data that needs to be quickly detected, analyzed, and stored grows. In this post, we walk you through an automated process to aggregate and monitor application log data in near-real time, so you can remediate application issues faster.

This post shows how to unify and centralize logs across different computing platforms. With this solution, you can unify logs from Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Kinesis Data Firehose, and AWS Lambda using agents, log routers, and extensions. We use Amazon OpenSearch Service (successor to Amazon Elasticsearch Service) with OpenSearch Dashboards to visualize and analyze the logs, collected across different computing platforms to get application insights. You can deploy the solution using the AWS Cloud Development Kit (AWS CDK) scripts provided as part of the solution.

Customer benefits

A unified aggregated log system provides the following benefits:

  • A single point of access to all the logs across different computing platforms
  • Help defining and standardizing the transformations of logs before they get delivered to downstream systems like Amazon Simple Storage Service (Amazon S3), Amazon OpenSearch Service, Amazon Redshift, and other services
  • The ability to use Amazon OpenSearch Service to quickly index logs, and OpenSearch Dashboards to search and visualize them across routers, applications, and other devices

Solution overview

In this post, we use the following services to demonstrate log aggregation across different compute platforms:

  • Amazon EC2 – A web service that provides secure, resizable compute capacity in the cloud. It’s designed to make web-scale cloud computing easier for developers.
  • Amazon ECS – A web service that makes it easy to run, scale, and manage Docker containers on AWS, designed to make the Docker experience easier for developers.
  • Amazon EKS – A managed service that makes it easy to run Kubernetes on AWS without needing to install and operate your own Kubernetes control plane or nodes.
  • Kinesis Data Firehose – A fully managed service that makes it easy to stream data to Amazon S3, Amazon Redshift, or Amazon OpenSearch Service.
  • Lambda – A compute service that lets you run code without provisioning or managing servers. It’s designed to make web-scale cloud computing easier for developers.
  • Amazon OpenSearch Service – A fully managed service that makes it easy for you to perform interactive log analytics, real-time application monitoring, website search, and more.

The following diagram shows the architecture of our solution.

The architecture uses various log aggregation tools such as log agents, log routers, and Lambda extensions to collect logs from multiple compute platforms and deliver them to Kinesis Data Firehose. Kinesis Data Firehose streams the logs to Amazon OpenSearch Service. Log records that fail to be persisted in Amazon OpenSearch Service are written to Amazon S3. To scale this architecture, each of these compute platforms streams its logs to a different Firehose delivery stream, which is added as a separate index and rotated every 24 hours.

The following sections demonstrate how the solution is implemented on each of these computing platforms.

Amazon EC2

The Kinesis agent collects and streams logs from the applications running on EC2 instances to Kinesis Data Firehose. The agent is a standalone Java software application that offers an easy way to collect and send data to Kinesis Data Firehose. The agent continuously monitors files and sends logs to the Firehose delivery stream.


The AWS CDK script provided as part of this solution deploys a simple PHP application that generates logs under the /etc/httpd/logs directory on the EC2 instance. The Kinesis agent is configured via /etc/aws-kinesis/agent.json to collect data from access_logs and error_logs, and stream them periodically to Kinesis Data Firehose (ec2-logs-delivery-stream).

Because Amazon OpenSearch Service expects data in JSON format, you can add a call to a Lambda function to transform the log data to JSON format within Kinesis Data Firehose before streaming to Amazon OpenSearch Service. The following is a sample input for the data transformer:

46.99.153.40 - - [29/Jul/2021:15:32:33 +0000] "GET / HTTP/1.1" 200 173 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"

The following is our output:

{
    "logs" : "46.99.153.40 - - [29/Jul/2021:15:32:33 +0000] \"GET / HTTP/1.1\" 200 173 \"-\" \"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36\"",
}

We can enhance the Lambda function to extract the timestamp, HTTP, and browser information from the log data, and store them as separate attributes in the JSON document.
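
As a rough sketch of that transformation (not the exact function deployed by the AWS CDK script), a Firehose data transformation Lambda that produces the JSON shown above can be as small as the following; error handling and the attribute extraction just mentioned are omitted.

import base64
import json

def handler(event, context):
    # Kinesis Data Firehose transformation: wrap each Apache log line in a JSON document.
    output = []
    for record in event["records"]:
        line = base64.b64decode(record["data"]).decode("utf-8").rstrip("\n")
        payload = json.dumps({"logs": line}) + "\n"
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(payload.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}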

Amazon ECS

In the case of Amazon ECS, we use FireLens to send logs directly to Kinesis Data Firehose. FireLens is a container log router for Amazon ECS and AWS Fargate that gives you the extensibility to use the breadth of services at AWS or partner solutions for log analytics and storage.


The architecture hosts FireLens as a sidecar, which collects logs from the main container running an httpd application and sends them to Kinesis Data Firehose, which in turn streams them to Amazon OpenSearch Service. The AWS CDK script provided as part of this solution deploys an httpd container hosted behind an Application Load Balancer. The httpd logs are pushed to Kinesis Data Firehose (ecs-logs-delivery-stream) through the FireLens log router.

Amazon EKS

With the recent announcement of Fluent Bit support for Amazon EKS, you no longer need to run a sidecar to route container logs from Amazon EKS pods running on Fargate. With the new built-in logging support, you can select a destination of your choice to send the records to. Amazon EKS on Fargate uses a version of Fluent Bit for AWS, an upstream conformant distribution of Fluent Bit managed by AWS.


The AWS CDK script provided as part of this solution deploys an NGINX container hosted behind an internal Application Load Balancer. The NGINX container logs are pushed to Kinesis Data Firehose (eks-logs-delivery-stream) through the Fluent Bit plugin.

Lambda

For Lambda functions, you can send logs directly to Kinesis Data Firehose using the Lambda extension. You can also prevent these records from being written to Amazon CloudWatch Logs, for example by removing the CloudWatch Logs permissions from the function’s execution role.


After deployment, the workflow is as follows:

  1. On startup, the extension subscribes to receive logs for the platform and function events. A local HTTP server is started inside the external extension, which receives the logs.
  2. The extension buffers the log events in a synchronized queue and writes them to Kinesis Data Firehose via PUT records.
  3. The logs are sent to downstream systems.
  4. The logs are sent to Amazon OpenSearch Service.

The Firehose delivery stream name gets specified as an environment variable (AWS_KINESIS_STREAM_NAME).
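
The delivery step above boils down to a PutRecordBatch call against that stream. The following Python sketch is illustrative only; the actual extension also manages the Logs API subscription, batch size limits, and retries.

import json
import os

import boto3

firehose = boto3.client("firehose")
DELIVERY_STREAM = os.environ["AWS_KINESIS_STREAM_NAME"]

def flush(buffered_log_events):
    # PutRecordBatch accepts up to 500 records per call, so a production extension
    # chunks larger buffers and retries any records reported as failed.
    records = [{"Data": (json.dumps(e) + "\n").encode("utf-8")} for e in buffered_log_events]
    if records:
        firehose.put_record_batch(DeliveryStreamName=DELIVERY_STREAM, Records=records)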

For this solution, because we’re only focusing on collecting the run logs of the Lambda function, the data transformer of the Kinesis Data Firehose delivery stream filters out the records of type function ("type":"function") before sending them to Amazon OpenSearch Service.

The following is a sample input for the data transformer:

[
   {
      "time":"2021-07-29T19:54:08.949Z",
      "type":"platform.start",
      "record":{
         "requestId":"024ae572-72c7-44e0-90f5-3f002a1df3f2",
         "version":"$LATEST"
      }
   },
   {
      "time":"2021-07-29T19:54:09.094Z",
      "type":"platform.logsSubscription",
      "record":{
         "name":"kinesisfirehose-logs-extension-demo",
         "state":"Subscribed",
         "types":[
            "platform",
            "function"
         ]
      }
   },
   {
      "time":"2021-07-29T19:54:09.096Z",
      "type":"function",
      "record":"2021-07-29T19:54:09.094Z\tundefined\tINFO\tLoading function\n"
   },
   {
      "time":"2021-07-29T19:54:09.096Z",
      "type":"platform.extension",
      "record":{
         "name":"kinesisfirehose-logs-extension-demo",
         "state":"Ready",
         "events":[
            "INVOKE",
            "SHUTDOWN"
         ]
      }
   },
   {
      "time":"2021-07-29T19:54:09.097Z",
      "type":"function",
      "record":"2021-07-29T19:54:09.097Z\t024ae572-72c7-44e0-90f5-3f002a1df3f2\tINFO\tvalue1 = value1\n"
   },   
   {
      "time":"2021-07-29T19:54:09.098Z",
      "type":"platform.runtimeDone",
      "record":{
         "requestId":"024ae572-72c7-44e0-90f5-3f002a1df3f2",
         "status":"success"
      }
   }
]

Prerequisites

To implement this solution, you need the following prerequisites:

Build the code

Check out the AWS CDK code by running the following command:

mkdir unified-logs && cd unified-logs
git clone https://github.com/aws-samples/unified-log-aggregation-and-analytics .

Build the lambda extension by running the following command:

cd lib/computes/lambda/extensions
chmod +x extension.sh
./extension.sh
cd ../../../../

Make sure to replace the default AWS Region specified as the value of the firehose.endpoint attribute inside lib/computes/ec2/ec2-startup.sh.

Build the code by running the following command:

yarn install && npm run build

Deploy the code

If you’re running AWS CDK for the first time, run the following command to bootstrap the AWS CDK environment (provide your AWS account ID and AWS Region):

cdk bootstrap \
    --cloudformation-execution-policies arn:aws:iam::aws:policy/AdministratorAccess \
    aws://<AWS Account Id>/<AWS_REGION>

You only need to bootstrap the AWS CDK one time (skip this step if you have already done this).

Run the following command to deploy the code:

cdk deploy --require-approval never

You get the following output:

 ✅  CdkUnifiedLogStack

Outputs:
CdkUnifiedLogStack.ec2ipaddress = xx.xx.xx.xx
CdkUnifiedLogStack.ecsloadbalancerurl = CdkUn-ecsse-PY4D8DVQLK5H-xxxxx.us-east-1.elb.amazonaws.com
CdkUnifiedLogStack.ecsserviceLoadBalancerDNS570CB744 = CdkUn-ecsse-PY4D8DVQLK5H-xxxx.us-east-1.elb.amazonaws.com
CdkUnifiedLogStack.ecsserviceServiceURL88A7B1EE = http://CdkUn-ecsse-PY4D8DVQLK5H-xxxx.us-east-1.elb.amazonaws.com
CdkUnifiedLogStack.eksclusterClusterNameCE21A0DB = ekscluster92983EFB-d29892f99efc4419bc08534a3d253160
CdkUnifiedLogStack.eksclusterConfigCommand515C0544 = aws eks update-kubeconfig --name ekscluster92983EFB-d29892f99efc4419bc08534a3d253160 --region us-east-1 --role-arn arn:aws:iam::xxx:role/CdkUnifiedLogStack-clustermasterroleCD184EDB-12U2TZHS28DW4
CdkUnifiedLogStack.eksclusterGetTokenCommand3C33A2A5 = aws eks get-token --cluster-name ekscluster92983EFB-d29892f99efc4419bc08534a3d253160 --region us-east-1 --role-arn arn:aws:iam::xxx:role/CdkUnifiedLogStack-clustermasterroleCD184EDB-12U2TZHS28DW4
CdkUnifiedLogStack.elasticdomainarn = arn:aws:es:us-east-1:xxx:domain/cdkunif-elasti-rkiuv6bc52rp
CdkUnifiedLogStack.s3bucketname = cdkunifiedlogstack-logsfailederrcapturebucket0bcc-xxxxx
CdkUnifiedLogStack.samplelambdafunction = CdkUnifiedLogStack-LambdatransformerfunctionFA3659-c8u392491FrW

Stack ARN:
arn:aws:cloudformation:us-east-1:xxxx:stack/CdkUnifiedLogStack/6d53ef40-efd2-11eb-9a9d-1230a5204572

AWS CDK takes care of building the required infrastructure, deploying the sample application, and collecting logs from different sources to Amazon OpenSearch Service.

The following is some of the key information about the stack:

  • ec2ipaddress – The public IP address of the EC2 instance, deployed with the sample PHP application
  • ecsloadbalancerurl – The URL of the Amazon ECS Load Balancer, deployed with the httpd application
  • eksclusterClusterNameCE21A0DB – The Amazon EKS cluster name, deployed with the NGINX application
  • samplelambdafunction – The sample Lambda function using the Lambda extension to send logs to Kinesis Data Firehose
  • elasticdomainarn – The ARN of the Amazon OpenSearch Service domain

Generate logs

To visualize the logs, you first need to generate some sample logs.

  1. To generate Lambda logs, invoke the function using the following AWS CLI command (run it a few times):
aws lambda invoke \
--function-name "<<samplelambdafunction>>" \
--payload '{"payload": "hello"}' /tmp/invoke-result \
--cli-binary-format raw-in-base64-out \
--log-type Tail

Make sure to replace samplelambdafunction with the actual Lambda function name. The file path needs to be updated based on the underlying operating system.

The function should return "StatusCode": 200, with the following output:

{
    "StatusCode": 200,
    "LogResult": "<<Encoded>>",
    "ExecutedVersion": "$LATEST"
}
  2. Run the following command a couple of times to generate Amazon EC2 logs:
curl http://ec2ipaddress:80

Make sure to replace ec2ipaddress with the public IP address of the EC2 instance.

  3. Run the following command a couple of times to generate Amazon ECS logs:
curl http://ecsloadbalancerurl:80

Make sure to replace ecsloadbalancerurl with the public DNS name of the Application Load Balancer (the ecsloadbalancerurl value from the stack outputs).

We deployed the NGINX application with an internal load balancer, so the load balancer hits the health check endpoint of the application, which is sufficient to generate the Amazon EKS access logs.

Visualize the logs

To visualize the logs, complete the following steps:

  1. On the Amazon OpenSearch Service console, choose the hyperlink provided for the OpenSearch Dashboards URL.
  2. Configure access to OpenSearch Dashboards.
  3. In OpenSearch Dashboards, on the Discover menu, start creating a new index pattern for each compute log.

We can see separate indexes for each compute log partitioned by date, as in the following screenshot.


The following screenshot shows the process to create index patterns for Amazon EC2 logs.


After you create the index pattern, we can start analyzing the logs using the Discover menu under OpenSearch Dashboards in the navigation pane. This tool provides a single searchable and unified interface for all the records across the various compute platforms. We can switch between different logs using the Change index pattern submenu.


Clean up

Run the following command from the root directory to delete the stack:

cdk destroy

Conclusion

In this post, we showed how to unify and centralize logs across different compute platforms using Kinesis Data Firehose and Amazon OpenSearch Service. This approach allows you to analyze logs quickly and identify the root cause of failures, using a single platform rather than a different tool for each service.

If you have feedback about this post, submit your comments in the comments section.

Resources

For more information, see the following resources:


About the authors

Hari Ohm Prasath is a Senior Modernization Architect at AWS, helping customers with their modernization journey to become cloud native. Hari loves to code and actively contributes to open source initiatives. You can find him on Medium, GitHub, and Twitter @hariohmprasath.

Ballu Singh is a Principal Solutions Architect at AWS. He lives in the San Francisco Bay Area and helps customers architect and optimize applications on AWS. In his spare time, he enjoys reading and spending time with his family.

How to customize behavior of AWS Managed Rules for AWS WAF

Post Syndicated from Madhu Kondur original https://aws.amazon.com/blogs/security/how-to-customize-behavior-of-aws-managed-rules-for-aws-waf/

AWS Managed Rules for AWS WAF provides a group of rules created by AWS that can be used to help protect you against common application vulnerabilities and other unwanted access to your systems, without having to write your own rules. The AWS Threat Research Team updates AWS Managed Rules to respond to an ever-changing threat landscape in order to protect your applications.

Recently, AWS WAF launched four new features that are centered on rule customization:

  • Labels – Metadata that can be added to web requests when a rule is matched. Labels can be used to alter the behavior or default action of managed rules.
  • Version management – You can select a specific version of a managed rule group. Versioning can be used to return to previously tested versions.
  • Scope-down statements – Use to narrow the scope of the requests that a rule group evaluates.
  • Custom responses – Send a custom HTTP response back to the client from AWS WAF when a rule blocks a connection request.

In this blog, we go through four use cases to demonstrate how you can use these features to improve your security posture by customizing managed rules.

Case 1: Control automatic updates for a managed rule group by selecting a specific version

By default, managed rule groups are updated automatically as updates become available. This ensures you have the latest protection as soon as it’s available. With the version management feature, you can choose to stay on a specific version, meaning that it won’t update until you explicitly move to a newer version. This allows you to test a new version and promote it to your web ACL when you’re ready, and to return to a previously tested version if necessary.

Note: It’s recommended that you use a version as close as possible to the latest.

To select a managed rule group version

  1. In your AWS WAF console, navigate to the web ACL where you’ve added a managed rule group.
  2. Select the managed rule group whose version you want to set, and choose Edit.
  3. In the Version selection drop down, select the version you want to use. You’ll remain on this version until the version expires or you select another version—you’ll learn how to manage version expiration later in this post.

Note: If you want to receive updates automatically, select Default as the version.

  4. Choose Save Rule to save the configuration.

Figure 1: Console screenshot showing the AWS Managed Rules version drop down

Set up notifications

You can use Amazon Simple Notification Service (Amazon SNS) to get notifications of updates to a managed rule group. You can subscribe to the SNS topic using the ARN of the managed rule group. Every SNS notification for AWS Managed Rules updates uses the same message format, which enables you to consume these updates programmatically. For more details on the SNS notification message format, see Getting notified of new versions and updates to a managed rule group.

To set up email notifications on new rule updates through Amazon SNS

  1. In your AWS WAF console, navigate to the web ACL where you added the managed rule group.
  2. Select the managed rule group that you want to receive notifications for, and choose Edit.
  3. On the Core rule set page, look for the Amazon SNS topic ARN. Select the link to go to the Amazon SNS console. Make a note of the topic ARN to use in step 4.

Figure 2: Console screenshot highlighting the SNS topic ARN

  4. On the Create subscription page, enter the following information:
    Topic ARN: Enter the SNS topic ARN from step 3.
    Protocol: Select Email.
    Endpoint: Enter the email address where you want notifications sent.

Figure 3: SNS Create subscription console screenshot

  5. Choose Create subscription.
  6. Watch for a confirmation email from Amazon SNS. Choose the confirm subscription link in the email to complete the subscription.

Set up a version expiration alert using a CloudWatch alarm

When you stay on a specific version of a managed rule group for a long time, there is a risk that you may miss important updates. To ensure that you do not stay on a stale version for a long time, you should set up an alarm to alert you when a version is close to expiring. When a version expires, the managed rule group automatically switches to the default version. To be notified when a version is about to expire, set up an alert using an Amazon CloudWatch alarm based on DaysToExpiry. You can use the following procedure to set up a notification 60 days before a specific version of the rule set you’re using expires.

To set up a CloudWatch alarm

This procedure notifies you 60 days before a specific version of a rule set expires.

  1. Navigate to the CloudWatch console.
  2. Select All metrics from the left navigation pane, and then select WAFV2 from the list of namespaces.
  3. Choose ManagedRuleGroup, Region, Vendor, Version.
  4. Select the managed rule group whose expiration you want to monitor. This example uses AWSManagedRulesCommonRuleSet and Version_1.0.
  5. Select Graphed metric and select the bell alarm icon on the lower right, under Actions. Selecting this icon will take you to the CloudWatch alarms console.

Figure 4: CloudWatch Graphed metrics tab

  6. Configure the CloudWatch alarm with the following details, and then choose Next:
    Statistic: Select Minimum
    Period: Select 5 minutes
    Threshold Type: Select Static
    Operator: Select Lower/Equal (<=threshold)
    Threshold: Enter the value as 60
    Datapoints to alarm: Enter the lower value as 1 and higher value as 1
    Missing data treatment: Select Treat missing data as good (not breaching threshold)
  7. Select the SNS topic that you want to notify when the configured alarm goes to the ALARM state, and then choose Next.
  8. Enter a name and description for the alarm. Choose Next to preview the configuration, and then choose Create alarm to complete the CloudWatch alarm creation process.
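
If you prefer to manage alarms as code, the same alarm can be created with the AWS SDK for Python (boto3) roughly as follows. The rule group, version, Region, and SNS topic ARN are illustrative; substitute the values that match your configuration.

import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="waf-managed-rule-version-expiry",
    Namespace="AWS/WAFV2",
    MetricName="DaysToExpiry",
    Dimensions=[
        {"Name": "ManagedRuleGroup", "Value": "AWSManagedRulesCommonRuleSet"},
        {"Name": "Vendor", "Value": "AWS"},
        {"Name": "Version", "Value": "Version_1.0"},
        {"Name": "Region", "Value": "us-east-1"},
    ],
    Statistic="Minimum",
    Period=300,                 # 5 minutes
    EvaluationPeriods=1,
    DatapointsToAlarm=1,
    Threshold=60,
    ComparisonOperator="LessThanOrEqualToThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:waf-version-alerts"],  # placeholder SNS topic ARN
)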

Additional tips

  • If the version of a managed rule group that you’re using has expired, AWS WAF will prevent any configuration change to the web ACL until you select a valid version. You should move to the newest version as soon as possible so you are covered against the latest threats.
  • You will only receive the DaysToExpiry metric when there is traffic flowing through your web ACL.
  • You can use two different versions of a managed rule group in a web ACL. This can be useful if you want to test two different versions simultaneously to see how they will affect your traffic once deployed—for example, have one version in count mode and the other in block mode.

Note: This workflow is supported through the JSON rule editor and API, but not through the console.

Case 2: Use labels to mitigate false positives caused by a rule in a managed rule group

A label is metadata that a rule can add to matching web requests, regardless of the action associated with the rule. The latest version of AWS Managed Rules supports labels. By creating custom rules that match requests that have labels, you can change the behavior or default action of rules inside a managed rule group.

For example, if you have a rule that’s causing a false positive in a managed rule group, you can mitigate it by overriding the managed rule to Count and writing a custom rule with logic similar to the following:

IF (Statement 1) AND NOT (Statement 2) THEN Block
Statement 1 matches on the label generated from the rule causing a false positive.
Statement 2 contains exception conditions for when you don’t want the rule to evaluate because it’s causing false positives.

Consider a scenario where redirection requests to your application are blocked due to the rule GenericRFI_QUERYARGUMENTS in the managed rule group you’re using. This rule inspects the value of all query parameters and blocks requests that attempt to exploit remote file inclusion (RFI) in web applications, such as :// embedded midway through a URL. An example of a legitimate redirection request that could be blocked due to the characters :// present in the query argument scope could be as follows:

https://ourdomain.com/sso/complete?scope=email profile https://www.redirect-domain.com/auth/email https://www.redirect-domain.com/auth/profile

To prevent similar legitimate requests from being blocked, you can write a custom rule to match based on the label.

Step 1: Set the specific managed rule group to count mode

The first step is to set the specific managed rule to count mode, so that labels are added to the matching requests. Next, the priority of the managed rule must be set higher than the priority of the custom rule.

To set the specific managed rule group to count mode

  1. In your AWS WAF console, navigate to your web ACL and select the Rules tab. Choose Add Rule, and then select Add managed rule groups.
  2. Select AWS managed rule groups.
  3. Under Free rule groups, look for Core rule set and add it to your web ACL by selecting the toggle Add to web ACL.
  4. Choose Edit.
  5. From the list of rules, set the rule generating false positives to the Count action, by selecting the Count toggle beside the rule. This example changes the action for the rule GenericRFI_QUERYARGUMENTS to Count. This ensures that all the matching requests are sent to the subsequent WAF rules in order of priority and adds the label awswaf:managed:aws:commonruleset:GenericRFI_QueryArguments whenever there’s a matching request.
  6. Choose Save rule.
  7. Choose Add rules again to go to the next window where you can set the rule priority. The managed rule must have a higher priority than the custom rule that you will create in the next steps.
  8. Choose Save to save the configuration.

Step 2: Add a custom rule to the web ACL with lower priority than the managed rule

Create a custom rule in the web ACL that blocks requests if it has the label that you are looking for and doesn’t have the exception condition that caused the false positive. The priority of this custom rule should be set lower than the managed rule.

To add a custom rule with lower priority than the managed rule

  1. In your AWS WAF console, navigate to your web ACL Rules tab and choose Add Rule and select Add my own rules and rule groups.
  2. Select Rule Builder for the rule type.
  3. Enter a Rule Name and select Regular Rule as the Type.
  4. Use the If a request drop down to select matches all the statements (AND).
  5. Statement 1 checks if the request has the label that you’re looking for. In this example it is configured with the following details:
    Inspect: Select Has a label
    Labels: Select Label
    Match key: Select awswaf:managed:aws:commonruleset:GenericRFI_QueryArguments
  6. All subsequent statements must be negated so that the requests don’t match the statement criteria and will be treated as legitimate requests. In this example, we set NOT Statement 2, that checks if the request contains https://www.redirect-domain.com/ in its query string:
    Enable: Select Negate statement results
    Inspect: Select All query parameters
    Match type: Select Contains string
    String to Match: Enter https://www.redirect-domain.com/
    Text transformation: Select None
  7. Under Action, select Block and choose Add rule.
  8. In the Set rule Priority window, set the rule priority of your custom rule to lower than the AWS Managed Rules rule.
  9. Choose Save.
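
If you maintain your web ACL as code, the same custom rule expressed in the shape the AWS WAFV2 API expects looks roughly like the following sketch (shown as a Python dictionary, for example for use with the AWS SDK; the name, priority value, and metric name are illustrative).

custom_rule = {
    "Name": "BlockRFILabelExceptRedirects",  # illustrative name
    "Priority": 10,  # numerically higher than the managed rule group so it is evaluated afterwards
    "Statement": {
        "AndStatement": {
            "Statements": [
                {
                    # Statement 1: the label added by the rule causing the false positive.
                    "LabelMatchStatement": {
                        "Scope": "LABEL",
                        "Key": "awswaf:managed:aws:commonruleset:GenericRFI_QueryArguments",
                    }
                },
                {
                    # NOT Statement 2: exclude the known legitimate redirect domain.
                    "NotStatement": {
                        "Statement": {
                            "ByteMatchStatement": {
                                "SearchString": b"https://www.redirect-domain.com/",
                                "FieldToMatch": {"AllQueryArguments": {}},
                                "TextTransformations": [{"Priority": 0, "Type": "NONE"}],
                                "PositionalConstraint": "CONTAINS",
                            }
                        }
                    }
                },
            ]
        }
    },
    "Action": {"Block": {}},
    "VisibilityConfig": {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "BlockRFILabelExceptRedirects",
    },
}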

Case 3: Use a scope-down statement to narrow the scope of traffic matching a managed rule group

A scope-down statement can be added to any rule group to narrow the scope of the requests that a rule group evaluates. This allows you to either filter in the requests that you want the rule group to inspect or filter out any requests that don’t meet the criteria.

Consider a case where you have a list of trusted IP addresses that you don’t want to be evaluated against AmazonIPReputationList. You can avoid blocking these trusted IP addresses by using a scope-down statement to exclude the traffic from evaluation.

Step 1: Create the IP set for allowed list of IPs

The first step is to create an IP set that contains the allowed list of IPs. The IP set can be created for a particular AWS Region, or can be global if the web ACL is associated with an Amazon CloudFront distribution.

To create an IP set

  1. Choose IP sets in the AWS WAF console and then choose Create IP set.
  2. In IP set name, enter Allowed-IPs. Enter the IPs that you want to allow in IP addresses. Choose Create IP set when done.

Figure 5: Console screenshot creating an IP set

Step 2: Add a scope-down statement to the managed rule group

Once you have created the IP set, you can add a scope-down statement to your managed rule group so that traffic originating from the IPs in the IP set isn’t evaluated against the rules in the managed rule group.

To add a scope-down statement

  1. On the Rules tab of your web ACL, choose Add Rule and select Add managed rule groups.
  2. Select AWS managed rule groups.
  3. Under Free rule groups, turn on Amazon IP reputation list to add it to the web ACL and choose Edit.
  4. Select Enable scope-down statement.

Figure 6: Console screenshot showing enabling the scope-down statement

  5. Add the condition so that only the requests that don’t originate from the allowed IPs list created earlier are evaluated for this rule group. Use the If a request drop down to select doesn’t match the statements (NOT).
    Inspect: Select Originates from an IP address in
    IP set: Select Allowed-IPs
    IP address to use as the originating address: Select Source IP address

Figure 7: Scope down statement configuration console screenshot

  6. Choose Save rule to add this rule to your web ACL.
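
In the JSON rule editor or the WAFV2 API, the resulting managed rule group statement with its scope-down statement has roughly the following shape (shown here as a Python dictionary; the IP set ARN is a placeholder for the Allowed-IPs set you created).

ip_reputation_with_scope_down = {
    "ManagedRuleGroupStatement": {
        "VendorName": "AWS",
        "Name": "AWSManagedRulesAmazonIpReputationList",
        # Evaluate this rule group only for requests that do NOT come from the allowed IP set.
        "ScopeDownStatement": {
            "NotStatement": {
                "Statement": {
                    "IPSetReferenceStatement": {
                        "ARN": "arn:aws:wafv2:us-east-1:111122223333:regional/ipset/Allowed-IPs/EXAMPLE-ID"
                    }
                }
            }
        },
    }
}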

Case 4: Use custom responses to change the default block action for a managed rule group

AWS WAF sends back response code 403 (forbidden) when it blocks an incoming request. You can use the custom response feature to instead send a custom HTTP response back to the client when the rule blocks access. Using the custom response, you can customize the status code, response headers, and response body.

Let’s say you want to respond back to a client who might be connecting to your application over VPN. You want to use a custom response to inform the user that this behavior is discouraged, by sending error code 400 (Bad Request) and a static body message (“Please don’t try to connect over a VPN”). To do this, you can use the AWS Managed Rule group AWSManagedRulesAnonymousIpList and then set up custom rules using the label awswaf:managed:aws:anonymous-ip-list:AnonymousIPList.

Step 1: Create a custom response body

The first step in creating a custom response is to create a custom response body. This is the message that will be shown when the custom response is sent.

To create a custom response body

  1. In the AWS WAF console, open your web ACL and select the Custom response bodies tab.
  2. Choose Create custom response body.
  3. In Response body object name, enter a name for this response—for example, Custom-body-IP-list.
  4. Choose a Content type for the response body.
  5. In Response body, enter the response that you want to send back to the client.
  6. Choose Save.

Figure 8: Custom response body creation on the AWS WAF console

Step 2: Override the actions of the managed rule group

The rule you use to send your custom response should be in count mode. This will ensure that all the matching requests are sent to the subsequent WAF rules in priority order. In the following example, the rule AnonymousIPList in the managed rule group AWSManagedRulesAnonymousIpList is set to count mode. For more details on how to override the action of a managed rule group, see Overriding the actions of a rule group or its rules.

Figure 9: console screenshot overriding an AWS Managed Rules rule

Step 3: Create a rule to block the request and send a custom response back to the client

You’ll use the AWS WAF labels feature for this step. As explained in Case 2 above, you need to create a custom rule that matches the label generated by the managed rule. In this case, the custom rule should be configured to send your custom response.

To create a custom rule

  1. Expand the Custom response section and select Enable.
  2. Under Response code, enter the custom HTTP status code you want to send back to the client.
  3. (Optional) Use the Response headers section if you wish to add a custom response header.
  4. Under Choose how you would like to specify the response body, select the custom response body you created in Step 1.
  5. (Optional) If you wish to generate additional labels to track activity in logs, you can use Add label.
  6. Choose Add rule.
  7. Set the rule priority of your custom rule to lower than the AWS Managed Rules rule.
  8. Choose Save.

Figure 10: Console screenshot configuring a custom response body for a rule
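
For reference, in the JSON rule editor or the WAFV2 API, the custom response body is defined once at the web ACL level and then referenced from the rule’s block action, roughly as follows (shown as Python dictionaries; the body key matches the name created in Step 1).

# Defined at the web ACL level, keyed by the response body object name.
custom_response_bodies = {
    "Custom-body-IP-list": {
        "ContentType": "TEXT_PLAIN",
        "Content": "Please don't try to connect over a VPN",
    }
}

# Block action on the custom rule that matches the AnonymousIPList label.
block_with_custom_response = {
    "Block": {
        "CustomResponse": {
            "ResponseCode": 400,
            "CustomResponseBodyKey": "Custom-body-IP-list",
        }
    }
}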

Summary

In this post, we demonstrated how the new AWS WAF features such as labels, version management, scope-down statements, and custom responses can help you customize the behavior of AWS Managed Rules to protect your web applications and minimize risk. You can use these features in various ways, such as customizing AWS Managed Rules by combining labels and request properties to allow or block requests, and using labels to help filter logs.

You can learn more about AWS WAF in other AWS WAF–related Security Blog posts.

 
If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security news? Follow us on Twitter.

Madhu Kondur

Madhu Kondur

Madhu is a cloud support engineer at AWS. He’s passionate about helping customers solve their AWS issues. He specializes in network security and enjoys helping customers get the best cloud experience possible through AWS.

Venugopal Pai

Venugopal Pai

Venugopal is a solutions architect at AWS. He lives in Bengaluru, India, and helps customers scale and optimize their applications in AWS.

Hardening the security of your AWS Elastic Beanstalk Application the Well-Architected way

Post Syndicated from Laurens Brinker original https://aws.amazon.com/blogs/security/hardening-the-security-of-your-aws-elastic-beanstalk-application-the-well-architected-way/

Launching an application in AWS Elastic Beanstalk is straightforward. You define a name for your application, select the platform you want to run it on (for example, Ruby), and upload the source code. The default Elastic Beanstalk configuration is intended to be a starting point which prioritizes simplicity and ease of setup. This allows you to quickly deploy a web application on the AWS Cloud. For increased security of production applications, we recommend additional steps you can take to complement the default configuration.

In this post we will describe our recommendations, which are aligned with the AWS Well-Architected Framework, to help you harden the security posture of your Elastic Beanstalk applications. The Well-Architected Framework provides best practices to help you build secure, high-performing, resilient, and efficient infrastructure for your applications and workloads. Focusing on the Security pillar of the framework, we will walk you through additional configurations for increased network protection and protection of data at rest and in transit.

Introduction to Elastic Beanstalk

Elastic Beanstalk is an orchestration service that provisions and operates infrastructure in the AWS Cloud. You can use Elastic Beanstalk to deploy and manage applications in the cloud. Elastic Beanstalk supports many programming languages and frameworks, such as Java, .NET, PHP, Node.js, Python, Ruby, Go, and Docker. Elastic Beanstalk can help you decrease overhead by handling tasks such as resource provisioning, load balancing, auto scaling, and health monitoring. You only need to upload the application code. Elastic Beanstalk automatically integrates with other AWS services such as Amazon CloudWatch for logging and monitoring.

Target scenario for this post

This post shows you how to achieve the following things:

  • Launch a highly available Ruby application on Elastic Beanstalk.
  • Attach a MySQL database to the application using Amazon RDS.
  • Protect your sensitive data.
  • Align your application’s security configuration to the Security pillar of the Well-Architected Framework.

Figure 1: Target architecture for the two-tier web application deployed using Elastic Beanstalk

Figure 1 depicts the target architecture, which is a two-tier web application. Clients resolve the website’s domain name using the Domain Name System (DNS) service Amazon Route 53. An Application Load Balancer (ALB) is used to direct traffic to and from the Amazon EC2 instances that run the web servers. The EC2 instances are deployed in an Auto Scaling group in private subnets. To ensure that clients can always access the application, the infrastructure is set up so that it can automatically deal with system failures and scale up when there’s an increase in demand. This is done by placing the EC2 instances in the Auto Scaling group across two Availability Zones for high availability. There is also an RDS MySQL database deployed in a private subnet, which is replicated to a stand-by instance in another Availability Zone for disaster recovery. Logs and metrics are sent to CloudWatch, and Amazon Simple Storage Service (Amazon S3) is used to store logs and source code. Finally, a Network Address Translation (NAT) gateway and Internet gateway manage inbound and outbound traffic to subnets.

The following sections focus on the four main security configurations numbered in Figure 1:

  1. Deploying the EC2 and RDS instances from the web and database layer in private subnets.
  2. Encrypting the logging and source code S3 bucket.
  3. Encrypting the RDS instance and its stand-by replica.
  4. Encrypting data in transit by using the HTTPS protocol.

Strengthening your Elastic Beanstalk application based on the Security pillar of the Well-Architected Framework

To harden the security of your Elastic Beanstalk application, you can build on top of the default setup to incorporate the following security best practices:

  1. Protect networks – In the default Elastic Beanstalk setup, the EC2 instances are deployed together with an Application Load Balancer (ALB) in a public subnet. In most cases, EC2 instances do not need to be directly accessible from the internet and therefore should be placed in private subnets. The ALB should be left in the public subnet to provide a single entry point for inbound traffic from external clients and forward this traffic to the instances over a private network. If these instances need to make a direct outbound connection to the internet, for example to call third-party APIs, we recommend creating a Network Address Translation (NAT) gateway in a public subnet, and adding a route from the private subnet where your instances are running to the NAT gateway. Your instances can then send requests to the internet and receive corresponding responses through the NAT gateway, but the instances themselves will not be directly accessible from the internet. For more options on interactively accessing instances, see AWS Systems Manager.
  2. Protect data at rest – We recommend encrypting data at rest. Elastic Beanstalk does not encrypt data stored in Amazon S3 buckets by default, so you should modify the default setup to encrypt the bucket. Similarly, when you set up an RDS database directly through Elastic Beanstalk, you don’t have the option to encrypt the database, so you need to set up your database independently and enable encryption.
  3. Protect data in transit – Web traffic sent between your clients and the ALB over the internet should use HTTPS rather than HTTP. The HTTPS protocol creates an encrypted connection through TLS (Transport Layer Security) between client and server before sending any web traffic. The default setup in Elastic Beanstalk uses HTTP, so the choice to use HTTPS and how to enable it sits with the user. Setting up HTTPS can be done with SSL / TLS server certificates (X.509 certificates) which you manage inside AWS using AWS Certificate Manager or through an external provider. ALB supports TLS-termination, which means that it takes care of the encryption and decryption of the traffic communicated with clients, and then forwards the traffic to the instances over the AWS private network.

Implementing the recommended best practices for your application

To implement the best practices from the section above, you will take the following steps to launch your application, protect networks, and protect data at rest and in transit:

  1. Create your own VPC with public and private subnets.
  2. Create a highly-available Elastic Beanstalk application.
  3. Modify the configuration to deploy instances in private subnets.
  4. Encrypt the log and source code bucket.
  5. Launch an encrypted RDS instance.
  6. Set up encryption in-transit by using the HTTPS protocol.

Create your VPC with public and private subnets

  1. In the AWS Management Console, go to VPC, and select Launch VPC wizard.
  2. Select the VPC with Public and Private Subnets option on the left-hand side, as shown in Figure 2.

Figure 2: Launch VPC wizard

  3. Click Select.
  4. Adjust the VPC specifications as needed. Specify a CIDR range and a name for the VPC. For the private and public subnets, you additionally need to specify each subnet’s CIDR range and the Availability Zone it should be created in. In order for instances in the private subnet to access the internet, the setup creates a NAT gateway that resides in the public subnet. To do that, you need to specify an Elastic IP ID. If you don’t have an Elastic IP yet, in the VPC console go to Elastic IP addresses, choose Allocate Elastic IP address, and then choose Allocate. Use the Allocation ID in the VPC wizard.
  5. Select Create VPC.
  6. Because the target architecture is highly available, another set of public and private subnets needs to be created in a different Availability Zone from the subnets you configured in step 4. To do this, go to the Subnets section in the VPC console, choose Create subnet, select the VPC you just created, and add a new subnet, making sure to assign it to a different Availability Zone. Choose Add new subnet to add a second subnet on the same configuration page. When done, choose Create subnet. (A CLI sketch of these subnet calls follows this procedure.)
  7. By default, the subnets will use the main route table, which treats them as private subnets. To make one of the newly created subnets public, it needs to be associated with the route table that has a route to the internet gateway. Go to the Route Tables section in the VPC console and find the route table associated with your newly created VPC that has the route to the internet gateway; this should be the route table with one explicit subnet association. Click the route table’s ID and verify that there’s a route to a target with the igw- prefix. Then, under the Subnet associations tab, edit the explicit subnet associations to include the newly created subnet.

After this is done, you should have two public and two private subnets across two Availability Zones for your new VPC.
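
If you prefer the command line, the following AWS CLI sketch shows equivalent calls for creating the second pair of subnets and associating the new public subnet with the public route table. The CIDR blocks and IDs are placeholders; adjust them to your VPC’s addressing.

aws ec2 create-subnet \
  --vpc-id <YOUR-VPC-ID> \
  --cidr-block 10.0.2.0/24 \
  --availability-zone <YOUR-SECOND-AZ>

aws ec2 create-subnet \
  --vpc-id <YOUR-VPC-ID> \
  --cidr-block 10.0.3.0/24 \
  --availability-zone <YOUR-SECOND-AZ>

aws ec2 associate-route-table \
  --route-table-id <YOUR-PUBLIC-ROUTE-TABLE-ID> \
  --subnet-id <YOUR-NEW-PUBLIC-SUBNET-ID>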

Create a highly available Elastic Beanstalk application

The following steps will show you how to create a highly available Elastic Beanstalk application.

  1. In the AWS Management Console, choose Elastic Beanstalk, and then, in the Get Started section, select Create Application.
  2. Provide a name for the application and define the platform it should run on. In our example, the platform is Ruby.
  3. Provide the source code for your web application or use the sample code provided in the Elastic Beanstalk setup console.
    • To use the sample code, select Sample Application.
    • To upload your own source code, in the Source code origin section, for Version label, enter the name of your application code, and then for Source code origin, choose Local file, select Choose File, and navigate to the file that you want to use, as shown in Figure 3.

Figure 3: Source code origin section of the Elastic Beanstalk console

  4. Select Configure more options.
  5. Depending on your application’s needs, you can select a configuration preset that includes recommended values for several configurations. Select High Availability to include a load balancer and auto scaling across multiple Availability Zones.

Deploy your instances in private subnets

In this step, you will set up Elastic Beanstalk to deploy the Application Load Balancer in the public subnets, providing a point of access for inbound traffic from the internet, and to deploy the EC2 instances in the private subnets.

While still in the Configure more options settings:

  1. In the Network section, select Edit, and then, from the dropdown list, select the VPC that you just created.
  2. To deploy your instances in private subnets, in the Load balancer settings section, for Load balancer subnets, check the box next to each public subnet, and in the Instance settings section, for Instance subnets, check the box next to each private subnet, as shown in Figure 4.

Figure 4: Elastic Beanstalk subnet settings for Load Balancer and instances

  3. Select Save.

Encrypt the log and source code bucket and block public access

After Elastic Beanstalk has created the application, you can encrypt the S3 bucket.

  1. Open the S3 console and choose the bucket that was created automatically as part of the Elastic Beanstalk setup. The bucket name will have the following structure: elasticbeanstalk-region-account-id.
  2. To encrypt the bucket, choose Properties, and then, for Default Encryption, select Edit, and for Server-side encryption, select Enable.
  3. For Encryption key type, you can use an S3-managed encryption key by selecting Amazon S3 key (SSE-S3). If you want more control over the keys used for encryption, select AWS Key Management Service key (SSE-KMS), which is an encryption key protected by AWS Key Management Service (KMS). Here, you can specify to use an AWS managed key or one of your own Customer managed keys from KMS. For more information on SSE-KMS, visit Protecting Data Using Server-Side Encryption with KMS keys Stored in AWS Key Management Service (SSE-KMS).
  4. Select Save changes.

Even though the bucket that was created by Elastic Beanstalk is non-public by default, we recommend enabling S3 Block Public Access at the account level, or at least at the bucket level, to prevent this setting from being tampered with or changed accidentally in the future. A CLI sketch covering both the encryption and the public access block follows the steps below.

  1. In your S3 console, click on Block Public Access settings for this account.
  2. If Block all public access is not yet enabled, click on Edit, check the box next to Block all public access and click Save.
  3. Apart from that, you can also block public access at the bucket level. For this, click on the respective bucket, open the Permissions section and edit Block public access (bucket settings) similarly to how you did in step 2.
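
If you manage the bucket settings from the command line instead, the following AWS CLI sketch applies the same default encryption (SSE-S3 in this example) and the bucket-level public access block. The bucket name follows the elasticbeanstalk-region-account-id pattern mentioned earlier; substitute your own Region and account ID.

aws s3api put-bucket-encryption \
  --bucket elasticbeanstalk-<YOUR-REGION>-<YOUR-ACCOUNT-ID> \
  --server-side-encryption-configuration \
    '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'

aws s3api put-public-access-block \
  --bucket elasticbeanstalk-<YOUR-REGION>-<YOUR-ACCOUNT-ID> \
  --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true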

Launch an encrypted RDS instance

Elastic Beanstalk allows you to set up and run RDS instances in your Elastic Beanstalk environment. Until recently, the database was tied to the lifecycle of the Elastic Beanstalk environment, and its use was recommended to be limited to development and testing environments only. For example, if you previously launched an RDS instance using Elastic Beanstalk, and the Elastic Beanstalk environment was terminated, the RDS instance would also be deleted. As of October 6, 2021, Elastic Beanstalk now supports Database Decoupling, so that the database will persist when the environment is deleted.

However, Elastic Beanstalk currently does not allow you to set up encryption for your database. For this reason, this post shows you how to set up your Elastic Beanstalk application with a decoupled database, by creating the database directly in the RDS service, separate from your Elastic Beanstalk application. RDS allows you to encrypt your database.

Decoupling your database and setting it up directly through the RDS service in the AWS console will require additional steps for integration with your Elastic Beanstalk application, which this post will walk you through.

Note: If you are using the Elastic Beanstalk service to create your RDS instance, you can select one of three options:

  • The first option, enabled by selecting the Create snapshot retention option in the database settings in the Elastic Beanstalk console, makes sure that Elastic Beanstalk creates a snapshot of your database prior to termination. You can restore an existing snapshot of your database through the Elastic Beanstalk console. Bear in mind that there will be downtime of your database between snapshot creation and snapshot restore.
  • The second option, Retain, creates a decoupled database, which persists if the Elastic Beanstalk environment has been terminated.
  • The third option, Delete, removes the database upon termination.

In this step, you will create an encrypted RDS database, allow access to the database from your application’s instances only, and add the required environment variables to your application so that it can use the database. (A CLI sketch of the equivalent database creation call follows the console steps.)

  1. On the RDS service page in the console, select Create database.
  2. For the database creation method, select Standard create.
  3. For Engine options, choose MySQL and select the latest version.
  4. For Templates choose either the Dev/Test or Production template according to your use case.
  5. In the Settings section, provide a name to use as the database identifier and set a username and password.
  6. Select the appropriate DB instance class that meets your processing power and memory requirements.
  7. For Storage, select your storage type and allocate storage.
  8. If you need Multi-AZ deployment, in the Availability & durability section, choose Create a standby instance.
  9. In the Connectivity section, select the VPC that you created in the Create your VPC with public and private subnets section earlier in this blog post, and verify that Public access has been set to No. For VPC security group, choose Create new and provide a name to identify the group later on.
  10. In the Additional configuration section, under Encryption, choose Enable Encryption. You can choose the default AWS KMS key if you’re happy with AWS managing the keys, or provide a custom key if you want more control. Bear in mind that the encryption option cannot be changed after the database has been created.
  11. Leave the defaults for the remaining settings and select Create database.
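
As a reference, the following AWS CLI sketch creates a comparable encrypted, Multi-AZ MySQL instance. It assumes you have already created a DB subnet group containing your private subnets and a security group for the database; the identifiers, instance class, and storage size are placeholders to adjust for your workload.

aws rds create-db-instance \
  --db-instance-identifier <YOUR-DB-IDENTIFIER> \
  --engine mysql \
  --db-instance-class db.t3.medium \
  --allocated-storage 20 \
  --master-username <YOUR-DB-USERNAME> \
  --master-user-password <YOUR-DB-PASSWORD> \
  --db-subnet-group-name <YOUR-PRIVATE-DB-SUBNET-GROUP> \
  --vpc-security-group-ids <YOUR-DB-SECURITY-GROUP-ID> \
  --no-publicly-accessible \
  --multi-az \
  --storage-encrypted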

After you set up the RDS database and your new Elastic Beanstalk application, you can add the database to your application.

  1. In the RDS console, go to your newly created RDS database and scroll down to Security group rules.
  2. Select the security group that has the CIDR/IP – Inbound type.
  3. Under Inbound rules, select the rule that is listed, and then select Edit inbound rules.
  4. Under the Source column, make sure Custom is selected, and in the search-box next to it, select the security group associated with your Elastic Beanstalk Auto Scaling group.

Important: As a security best practice, you should allow traffic to your RDS database from your instances only. Therefore, make sure the security group allows traffic only from the Auto Scaling group’s security group, and that it has no additional entries.

  1. To add the RDS details to the Elastic Beanstalk environment properties, go to your application’s environment in the Elastic Beanstalk service and navigate to Configuration > Software > Edit > Environment Properties. Add RDS_HOSTNAME, RDS_PORT, RDS_DB_NAME, RDS_USERNAME and RDS_PASSWORD as properties and provide the values that you used to set up the database.
  2. Restart the application by going back to your Elastic Beanstalk environment, and then under Environment actions, choose Restart app server(s).

After the server has restarted, you can access the RDS database in your web application by using the environment properties you set in the console, just as you would if you attached the database directly through the Elastic Beanstalk setup. For more information on using environment properties, visit Environment properties and other software settings.
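
The same environment properties can also be set from the command line. The following sketch uses the aws:elasticbeanstalk:application:environment namespace; the environment name and values are placeholders for your own settings.

aws elasticbeanstalk update-environment \
  --environment-name <YOUR-ENVIRONMENT-NAME> \
  --option-settings \
    Namespace=aws:elasticbeanstalk:application:environment,OptionName=RDS_HOSTNAME,Value=<YOUR-DB-ENDPOINT> \
    Namespace=aws:elasticbeanstalk:application:environment,OptionName=RDS_PORT,Value=3306 \
    Namespace=aws:elasticbeanstalk:application:environment,OptionName=RDS_DB_NAME,Value=<YOUR-DB-NAME> \
    Namespace=aws:elasticbeanstalk:application:environment,OptionName=RDS_USERNAME,Value=<YOUR-DB-USERNAME> \
    Namespace=aws:elasticbeanstalk:application:environment,OptionName=RDS_PASSWORD,Value=<YOUR-DB-PASSWORD>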

The new database is now separate from your application and it is encrypted to provide data protection at rest.

Important: The environment properties, including the database username and password, are visible and stored in plain text in the Environment Properties in Elastic Beanstalk.

Depending on your security requirements, you can choose to use AWS Secrets Manager to protect your database credentials, which you can then fetch programmatically in your Elastic Beanstalk instance or through Elastic Beanstalk’s custom environment configuration files (.ebextensions). To learn more about using Secrets Manager to protect and rotate database credentials, see Rotate Amazon RDS database credentials automatically with AWS Secrets Manager. However, this will require additional configuration for Elastic Beanstalk and is beyond the scope of this post.

A second possibility is to use IAM database authentication, which allows you to use your Elastic Beanstalk EC2 instance’s IAM role to connect to your database. This method uses short-lived authentication tokens rather than a static database password. To set this up, you need to enable IAM database authentication, create an IAM policy that allows IAM database access, and create a database account for IAM authentication using the AWSAuthenticationPlugin (for MySQL). Authentication tokens are valid for 15 minutes; if your web instances need to create a new database connection, or reconnect, after a token has expired, they must request a fresh token, otherwise the connection will be rejected.

For an implementation guide, check out How do I allow users to authenticate to an Amazon RDS MySQL DB instance using their IAM credentials. For Ruby applications, you can get the authentication token in your application by leveraging the auth_token_generator method in the Ruby aws-sdk.
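
For reference, a token can also be generated outside your application code with the AWS CLI, which can be handy when testing the IAM authentication setup. The hostname, Region, and username below are placeholders.

aws rds generate-db-auth-token \
  --hostname <YOUR-DB-ENDPOINT> \
  --port 3306 \
  --region <YOUR-REGION> \
  --username <YOUR-IAM-DB-USERNAME>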

Set up encryption in transit using the HTTPS protocol

In the Elastic Beanstalk architecture, you can encrypt data in transit at three connection points: from your clients to the load balancer, from the load balancer to the EC2 instances, and from the EC2 instances to the RDS database.

Securing the connection from clients to the ALB

You can use a custom domain name to use HTTPS for your Elastic Beanstalk environment so that your clients can connect securely to your environment. If you don’t have a domain name, you can assign a self-signed server certificate to your ALB to use HTTPS for development and testing purposes.

To secure the connection to your ALB, add an HTTPS listener for the inbound traffic port (typically 443) and attach a TLS/SSL server certificate (X.509 certificate). To generate certificates, use AWS Certificate Manager or a third-party provider such as Let’s Encrypt. For a walkthrough on how to set up an HTTPS listener through the console or through .ebextensions configuration files, see Configuring your Elastic Beanstalk environment’s load balancer to terminate HTTPS.
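
As one possible sketch, an HTTPS listener can also be enabled on an existing environment with the option settings shown below; the aws:elbv2:listener:443 namespace applies to environments that use an Application Load Balancer. The environment name and certificate ARN are placeholders, and you should verify the option names against the Elastic Beanstalk configuration options documentation for your platform.

aws elasticbeanstalk update-environment \
  --environment-name <YOUR-ENVIRONMENT-NAME> \
  --option-settings \
    Namespace=aws:elbv2:listener:443,OptionName=ListenerEnabled,Value=true \
    Namespace=aws:elbv2:listener:443,OptionName=Protocol,Value=HTTPS \
    Namespace=aws:elbv2:listener:443,OptionName=SSLCertificateArns,Value=<YOUR-ACM-CERTIFICATE-ARN>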

Securing the connection from the ALB to the EC2 instances

While securing the connection between clients and the ALB is enough for most applications, in some cases complete end-to-end encryption may be required; for example, to comply with (external) regulations. To secure the connection from your ALB to your application running on an EC2 instance, you must use the .ebextensions configuration files to modify the software running on the instance. You then need to allow the HTTPS traffic to pass through from the ALB to your EC2 instance by allowing inbound traffic on port 443 on the instance’s security group. For a Ruby-specific example, see Terminating HTTPS on EC2 instances running Ruby.

For a complete end-to-end encryption walkthrough, see How can I configure HTTPS for my Elastic Beanstalk environment?

Securing the RDS connection

To securely connect from your application to your RDS database, you can use SSL or TLS to encrypt the connection. You will need to download an RDS root certificate and require your application to use this certificate when connecting to the RDS instance to verify the RDS server certificate. For more information on how to download and use the root certificate to setup a secure RDS connection, see the Using SSL with a MySQL DB instance documentation page.
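
For example, assuming a recent MySQL client, you could download the RDS certificate bundle and verify the server certificate when connecting as follows; the endpoint and username are placeholders.

curl -O https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem

mysql -h <YOUR-DB-ENDPOINT> -u <YOUR-DB-USERNAME> -p \
  --ssl-ca=global-bundle.pem \
  --ssl-mode=VERIFY_IDENTITY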

This post has shown you how to align your application with some of the security best practices of the Well-Architected Framework. After completing these steps, your architecture includes four key modifications to improve security:

  1. The EC2 and RDS instances are deployed in a private subnet.
  2. The logging and source code S3 bucket is encrypted.
  3. An encrypted RDS instance is attached.
  4. Encryption occurs in transit by using the HTTPS protocol.

Conclusion

In this post, we’ve covered the additional configuration you should be aware of to harden the security posture of your Elastic Beanstalk applications, aligning to the Security pillar of the Well-Architected Framework. The final setup you created uses a VPC and private subnets to allow internet access only to resources that require it, and provides encryption at rest and in transit using AWS Cloud Security services and secure protocols. The Well-Architected Framework describes additional concepts, design principles, and architectural best practices for designing and running workloads in the cloud. To learn more, see AWS Well-Architected.

 
If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security news? Follow us on Twitter.

Laurens Brinker

Laurens Brinker is an Associate Solutions Architect based in London who is part of the Security Community at AWS. Laurens joined AWS as part of the TechU Graduate program in 2020 and now helps customers running their workloads securely in the AWS Cloud. Outside of work, Laurens enjoys cycling, a casual game of chess, and building small web applications.

Katja Philipp

Katja Philipp is an Associate Solutions Architect based in Germany. With a background in M.Sc. Information Systems, she joined AWS in September 2020 with the TechU Graduate program. She enables her customers in the Power & Utilities vertical with best practices around their cloud journey. Katja is passionate about sustainability and how technology can be leveraged to solve current challenges for a better future.

Laura Verghote

Laura Verghote is an Associate Technical Trainer based in London, UK. With a background in electrical engineering, she joined AWS in September 2020 with the TechU Graduate program. She delivers a variety of technical trainings to AWS customers across EMEA.

Kimessha Paupamah

Kimessha Paupamah is a ProServe Consultant based in South Africa. With a background in Computer Science, she joined AWS in September 2020 with the TechU Graduate program. She accelerates customer business outcomes through guidance on how to architect, design, develop and implement the AWS platform. Kimessha is passionate about enabling customers to build innovative solutions in the cloud.

Benjamin Richer

Benjamin Richer is an Associate Solutions Architect based in Paris. With a background in Network & Telecom, he joined AWS in 2020 through the TechU Graduate Program. Currently working in the Digital Native Business segment, he helps grown-up startups optimize their workloads in the cloud.

Choose the right storage tier for your needs in Amazon OpenSearch Service

Post Syndicated from Changbin Gong original https://aws.amazon.com/blogs/big-data/choose-the-right-storage-tier-for-your-needs-in-amazon-opensearch-service/

Amazon OpenSearch Service (successor to Amazon Elasticsearch Service) enables organizations to perform interactive log analytics, real-time application monitoring, website search, and more. OpenSearch is an open-source, distributed search and analytics suite derived from Elasticsearch. Amazon OpenSearch Service offers the latest versions of OpenSearch, support for 19 versions of Elasticsearch (versions 1.5 to 7.10), and visualization capabilities powered by OpenSearch Dashboards and Kibana (versions 1.5 to 7.10).

In this post, we present the three storage tiers of Amazon OpenSearch Service—hot, UltraWarm, and cold storage—and discuss how to effectively choose the right storage tier for your needs. This post can help you understand how these storage tiers integrate together and what the trade-offs are for each. To choose the right storage tier for your use case, you need to weigh the performance, latency, and cost of each tier.

Amazon OpenSearch Service storage tiers overview

There are three different storage tiers for Amazon OpenSearch Service: hot, UltraWarm, and cold. The following diagram illustrates these three storage tiers.

Hot storage

Hot storage for Amazon OpenSearch Service is used for indexing and updating, while providing fast access to data. Standard data nodes use hot storage, which takes the form of instance store or Amazon Elastic Block Store (Amazon EBS) volumes attached to each node. Hot storage provides the fastest possible performance for indexing and searching new data.

You get the lowest latency for reading data in the hot tier, so you should use the hot tier to store frequently accessed data driving real-time analysis and dashboards. As your data ages, you access it less frequently and can tolerate higher latency, so keeping data in the hot tier is no longer cost-efficient.

If you want to have low latency and fast access to the data, hot storage is a good choice for you.

UltraWarm storage

UltraWarm nodes use Amazon Simple Storage Service (Amazon S3) with related caching solutions to improve performance. UltraWarm offers significantly lower costs per GiB for read-only data that you query less frequently and don’t need the same performance as hot storage. Although you can’t modify the data while in UltraWarm, you can move the data to the hot storage tier for edits before moving it back.

When calculating UltraWarm storage requirements, you consider only the size of the primary shards. When you query for the list of shards in UltraWarm, you still see the primary and replicas listed. Both shards are stubs for the same, single copy of the data, which is in Amazon S3. The durability of data in Amazon S3 removes the need for replicas, and Amazon S3 abstracts away any operating system or service considerations. In the hot tier, accounting for one replica, 20 GB of index uses 40 GB of storage. In the UltraWarm tier, it’s billed at 20 GB.

The UltraWarm tier acts like a caching layer on top of the data in Amazon S3. UltraWarm moves data from Amazon S3 onto the UltraWarm nodes on demand, which speeds up access for subsequent queries on that data. For that reason, UltraWarm works best for use cases that access the same, small slice of data multiple times. You can add or remove UltraWarm nodes to increase or decrease the amount of cache against your data in Amazon S3 to optimize your cost per GB. To dial in your cost, be sure to test using a representative dataset. To monitor performance, use the WarmCPUUtilization and WarmJVMMemoryPressure metrics. See UltraWarm metrics for a complete list of metrics.

The combined CPU cores and RAM allocated to UltraWarm nodes affects performance for simultaneous searches across shards. We recommend deploying enough UltraWarm instances so that you store no more than 400 shards per ultrawarm1.medium.search node and 1,000 shards per ultrawarm1.large.search node (including both primaries and replicas). We recommend a maximum shard size of 50 GB for both hot and warm tiers. When you query UltraWarm, each shard uses a CPU and moves data from Amazon S3 to local storage. Running single or concurrent queries that access many indexes can overwhelm the CPU and local disk resources. This can cause longer latencies through inefficient use of local storage, and even cause cluster failures.

UltraWarm storage requires OpenSearch 1.0 or later, or Elasticsearch version 6.8 or later.

If you have large amounts of read-only data and want to balance the cost and performance, use UltraWarm for your infrequently accessed, older data.

Cold storage

Cold storage is optimized to store infrequently accessed or historical data at $0.024 per GB per month. When you use cold storage, you detach your indexes from the UltraWarm tier, making them inaccessible. You can reattach these indexes in a few seconds when you need to query that data. Cold storage is a great fit for scenarios in which a low ROI necessitates an archive or delete action on historical data, or if you need to conduct research or perform forensic analysis on older data with Amazon OpenSearch Service.

Cold storage doesn’t have specific instance types because it doesn’t have any compute capacity attached to it. You can store any amount of data in cold storage.

Cold storage requires OpenSearch 1.0 or later, or Elasticsearch version 7.9 or later and UltraWarm.

Manage storage tiers in OpenSearch Dashboards

OpenSearch Dashboards installed on your Amazon OpenSearch Service domain provides a useful UI for managing indexes in different storage tiers on your domain. From the OpenSearch Dashboards main menu, you can view all indexes in hot, UltraWarm, and cold storage. You can also see the indexes managed by Index State Management (ISM) policies. OpenSearch Dashboards enables you to migrate indexes between UltraWarm and cold storage, and monitor index migration status, without using the AWS Command Line Interface (AWS CLI) or configuration API. For more information on OpenSearch Dashboards, see Using OpenSearch Dashboards with Amazon OpenSearch Service.
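
As a sketch of what such lifecycle automation can look like, the following ISM policy keeps indexes hot for 7 days, moves them to UltraWarm, and then moves them to cold storage after 90 days. The action names follow the ISM actions documented for Amazon OpenSearch Service (warm_migration and cold_migration); treat the ages and the timestamp field as placeholders and confirm the action parameters against your domain’s OpenSearch version.

{
    "policy": {
        "description": "Hot for 7 days, UltraWarm until day 90, then cold storage",
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [],
                "transitions": [
                    { "state_name": "warm", "conditions": { "min_index_age": "7d" } }
                ]
            },
            {
                "name": "warm",
                "actions": [ { "warm_migration": {} } ],
                "transitions": [
                    { "state_name": "cold", "conditions": { "min_index_age": "90d" } }
                ]
            },
            {
                "name": "cold",
                "actions": [ { "cold_migration": { "timestamp_field": "@timestamp" } } ],
                "transitions": []
            }
        ]
    }
}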

Cost considerations

The hot tier requires you to pay for what is provisioned, which includes the hourly rate for the instance type. Storage is either Amazon EBS or a local SSD instance store. For Amazon EBS-only instance types, additional EBS volume pricing applies. You pay for the amount of storage you deploy.

UltraWarm nodes charge per hour just like other node types, but you only pay for the storage actually stored in Amazon S3. For example, although the instance type ultrawarm1.large.elasticsearch provides up to 20 TiB addressable storage on Amazon S3, if you only store 2 TiB of data, you’re only billed for 2 TiB. Like the standard data node types, you also pay an hourly rate for each UltraWarm node. For more information, see Pricing for Amazon OpenSearch Service.

Cold storage doesn’t incur compute costs, and like UltraWarm, you’re only billed for the amount of data stored in Amazon S3. There are no additional transfer charges when moving data between cold and UltraWarm storage.

Example use case

Let’s look at an example with 1 TB of source data per day, 7 days hot, 83 days warm, 365 days cold. For more information on sizing the cluster, see Sizing Amazon OpenSearch Service domains.

For hot storage, you can estimate a baseline with the following calculation: storage needed = (daily source data in bytes * 1.25) * (number_of_replicas + 1) * number of days of retention. Following the best practice of two replicas, the minimum storage required to retain 7 TB of data on the hot tier is (7 TB * 1.25) * (2 + 1) = 26.25 TB. Given the Amazon EBS size limit per node, this amount of storage needs six R6g.4xlarge.search instances.

We also need to verify the CPU requirements. Each daily index needs 25 primary shards (1 TB * 1.25 / 50 GB = 25). With two replicas, that gives 75 active shards in total, and at 1.5 vCPU per active shard the requirement is 75 * 1.5 = 112.5 vCPUs. This means eight R6g.4xlarge.search instances. The domain also requires three dedicated c6g.xlarge.search leader nodes.

When calculating UltraWarm storage requirements, you consider only the size of the primary shards, because that’s the amount of data stored in Amazon S3. For this example, the total primary shard size for warm storage is 83 TB * 1.25 = 103.75 TB. Each ultrawarm1.large.search instance has 16 CPU cores and can address up to 20 TiB of storage on Amazon S3. A minimum of six ultrawarm1.large.search nodes is recommended. You’re charged for the actual storage, which is 103.75 TB.

For cold storage, you only pay for the cost of storing 365 TB * 1.25 = 456.25 TB on Amazon S3. The following table contains a breakdown of the monthly costs (USD) you’re likely to incur. This assumes a 1-year reserved instance for the cluster instances with no upfront payment in the US East (N. Virginia) Region.

Cost Type | Pricing | Usage | Cost per month
Instance usage | R6g.4xlarge.search = $0.924 per hour | 8 instances * 730 hours in a month = 5,840 hours | 5,840 hours * $0.924 = $5,396.16
Instance usage | c6g.xlarge.search = $0.156 per hour | 3 instances (leader nodes) * 730 hours in a month = 2,190 hours | 2,190 hours * $0.156 = $341.64
Instance usage | ultrawarm1.large.search = $2.68 per hour | 6 instances * 730 hours = 4,380 hours | 4,380 hours * $2.68 = $11,738.40
Storage cost | Hot storage on Amazon EBS, general purpose SSD (gp3) = $0.08 per GB per month | 7 days hot = 26.25 TB | 26,880 GB * $0.08 = $2,150.40
Storage cost | UltraWarm managed storage = $0.024 per GB per month | 83 days warm = 103.75 TB | 106,240 GB * $0.024 = $2,549.76
Storage cost | Cold storage on Amazon S3 = $0.022 per GB per month | 365 days cold = 456.25 TB | 467,200 GB * $0.022 = $10,278.40

The total monthly cost is $32,454.76. The hot tier costs $7,888.20, UltraWarm costs $14,288.16, and cold storage is $10,278.40. UltraWarm allows 83 days of additional retention for slightly more cost than the hot tier, which only provides 7 days. For nearly the same cost as the hot tier, the cold tier stores the primary shards for up to 1 year.

Conclusion

Amazon OpenSearch Service supports three integrated storage tiers: hot, UltraWarm, and cold storage. Based on your data retention, query latency, and budgeting requirements, you can choose the best strategy to balance cost and performance. You can also migrate data between different storage tiers. To start using these storage tiers, sign in to the AWS Management Console, use the AWS SDK, or AWS CLI, and enable the corresponding storage tier.


About the Author

Changbin Gong is a Senior Solutions Architect at Amazon Web Services (AWS). He engages with customers to create innovative solutions that address customer business problems and accelerate the adoption of AWS services. In his spare time, Changbin enjoys reading, running, and traveling.

Rich Giuli is a Principal Solutions Architect at Amazon Web Service (AWS). He works within a specialized group helping ISVs accelerate adoption of cloud services. Outside of work Rich enjoys running and playing guitar.

Insulating AWS Outposts Workloads from Amazon EC2 Instance Size, Family, and Generation Dependencies

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/insulating-aws-outposts-workloads-from-amazon-ec2-instance-size-family-and-generation-dependencies/

This post is written by Garry Galinsky, Senior Solutions Architect.

AWS Outposts is a fully managed service that offers the same AWS infrastructure, AWS services, APIs, and tools to virtually any datacenter, co-location space, or on-premises facility for a truly consistent hybrid experience. AWS Outposts is ideal for workloads that require low-latency access to on-premises systems, local data processing, data residency, and application migration with local system interdependencies.

Unlike AWS Regions, which offer near-infinite scale, Outposts are limited by their provisioned capacity, EC2 family and generations, configured instance sizes, and availability of compute capacity that is not already consumed by other workloads. This post explains how Amazon EC2 Fleet can be used to insulate workloads running on Outposts from EC2 instance size, family, and generation dependencies, reducing the likelihood of encountering an error when launching new workloads or scaling existing ones.

Product Overview

Outposts is available as a 42U rack that can scale to create pools of on-premises compute and storage capacity. When you order an Outposts rack, you specify the quantity, family, and generation of Amazon EC2 instances to be provisioned. As of this writing, five EC2 families, each of a single generation, are available on Outposts (m5, c5, r5, g4dn, and i3en). However, in the future, more families and generations may be available, and a given Outposts rack may include a mix of families and generations. EC2 servers on Outposts are partitioned into instances of homogeneous or heterogeneous sizes (e.g., large, 2xlarge, 12xlarge) based on your workload requirements.

Workloads deployed through AWS CloudFormation or scaled through Amazon EC2 Auto Scaling generally assume that the required EC2 instance type will be available when the deployment or scaling event occurs. Although in the Region this is a reasonable assumption, the same is not true for Outposts. Whether as a result of competing workloads consuming the capacity, the Outpost having been configured with limited capacity for a given instance size, or an Outpost update resulting in instances being replaced with a newer generation, a deployment or scaling event tied to a specific instance size, family, and generation may encounter an InsufficientInstanceCapacity error (ICE). And this may occur even though sufficient unused capacity of a different size, family, or generation is available.

EC2 Fleet

Amazon EC2 Fleet simplifies the provisioning of Amazon EC2 capacity across different Amazon EC2 instance types and Availability Zones, as well as across On-Demand, Amazon EC2 Reserved Instances (RI), and Amazon EC2 Spot purchase models. A single API call lets you provision capacity across EC2 instance types and purchase models in order to achieve the desired scale, performance, and cost.

An EC2 Fleet contains a configuration to launch a fleet, or group, of EC2 instances. The LaunchTemplateConfigs parameter lets multiple instance size, family, and generation combinations be specified in a priority order.

This feature is commonly used in AWS Regions to optimize fleet costs and allocations across multiple deployment strategies (reserved, on-demand, and spot), while on Outposts it can be used to eliminate the tight coupling of a workload to specific EC2 instances by specifying multiple instance families, generations, and sizes.

Launch Template Overrides

The EC2 Fleet LaunchTemplateConfigs definition describes the EC2 instances required for the fleet. A specific parameter of this definition, the Overrides, can include prioritized and/or weighted options of EC2 instances that can be launched to satisfy the workload. Let’s investigate how you can use Overrides to decouple the EC2 size, family, and generation dependencies.

Overriding EC2 Instance Size

Let’s assume our Outpost was provisioned with an m5 server. The server is the equivalent of an m5.24xlarge, which can be configured into multiple instances. For example, the server can be homogeneously provisioned into 12 x m5.2xlarge, or heterogeneously into 1 x m5.8xlarge, 3 x m5.2xlarge, 8 x m5.xlarge, and 4 x m5.large. Let’s assume the heterogeneous configuration has been applied.

Our workload requires compute capacity equivalent to an m5.4xlarge (16 vCPUs, 64 GiB memory), but that instance size is not available on the Outpost. Attempting to launch this instance would result in an InsufficientInstanceCapacity error. Instead, the following LaunchTemplateConfigs override could be used:

"Overrides": [
    {
        "InstanceType": "m5.4xlarge",
        "WeightedCapacity": 1.0,
        "Priority": 1.0
    },
    {
        "InstanceType": "m5.2xlarge",
        "WeightedCapacity": 0.5,
        "Priority": 2.0
    },
    {
        "InstanceType": "m5.8xlarge",
        "WeightedCapacity": 2.0,
        "Priority": 3.0
    }
]

The Priority describes our order of preference. Ideally, we launch a single m5.4xlarge instance, but that’s not an option. Therefore, in this case, the EC2 Fleet would move to the next priority option, an m5.2xlarge. Given that an m5.2xlarge (8 vCPUs, 32 GiB memory) offers only half of the resource of the m5.4xlarge, the override includes the WeightedCapacity parameter of 0.5, resulting in two m5.2xlarge instances launching instead of one.

Our overrides include a third, over-provisioned and less preferable option, should the Outpost lack two m5.2xlarge capacity: launch one m5.8xlarge. Operating within finite resources requires tradeoffs, and priority lets us optimize them. Note that had the launch required 2 x m5.4xlarge, only one instance of m5.8xlarge would have been launched.

Overriding EC2 Instance Family

Let’s assume our Outpost was provisioned with an m5 and a c5 server, homogeneously partitioned into 12 x m5.2xlarge and 12 x c5.2xlarge instances. Our workload requires compute capacity equivalent to a c5.2xlarge instance (8 vCPUs, 16 GiB memory). As our workload scales, more instances must be launched to meet demand. If we couple our workload to c5.2xlarge, then our scaling will be blocked as soon as all 12 instances are consumed. Instead, we use the following LaunchTemplateConfigs override:

"Overrides": [
    {
        "InstanceType": "c5.2xlarge",
        "WeightedCapacity": 1.0,
        "Priority": 1.0
    },
    {
        "InstanceType": "m5.2xlarge",
        "WeightedCapacity": 1.0,
        "Priority": 2.0
    }
]

The Priority describes our order of preference. Ideally, we scale more c5.2xlarge instances, but when those are not an option EC2 Fleet would launch the next priority option, an m5.2xlarge. Here again the outcome may result in over-provisioned memory capacity (32 vs 16 GiB memory), but it’s a reasonable tradeoff in a finite resource environment.

Overriding EC2 Instance Generation

Let’s assume our Outpost was provisioned two years ago with an m5 server. Since then, m6 servers have become available, and there’s an expectation that m7 servers will be available soon. Our single-generation Outpost may unexpectedly become multi-generation if the Outpost is expanded, or if a hardware failure results in a newer generation replacement.

Coupling our workload to a specific generation could result in future scaling challenges. Instead, we use the following LaunchTemplateConfigs override:

"Overrides": [
    {
        "InstanceType": "m6.2xlarge",
        "WeightedCapacity": 1.0,
        "Priority": 1.0
    },
    {
        "InstanceType": "m5.2xlarge",
        "WeightedCapacity": 1.0,
        "Priority": 2.0
    },
    {
        "InstanceType": "m7.2xlarge",
        "WeightedCapacity": 1.0,
        "Priority": 3.0
    }

]

Note the Priority here, our preference is for the current generation m6, even though it’s not yet provisioned in our Outpost. The m5 is what would be launched now, given that it’s the only provisioned generation. However, we’ve also future-proofed our workload by including the yet unreleased m7.

Deploying an EC2 Fleet

To deploy an EC2 Fleet, you must:

  1. Create a launch template, which streamlines and standardizes EC2 instance provisioning by simplifying permission policies and enforcing best practices across your organization.
  2. Create a fleet configuration, where you set the number of instances required and specify the prioritized instance family/generation combinations.
  3. Launch your fleet (or a single EC2 instance).

These steps can be codified through AWS CloudFormation or executed through AWS Command Line Interface (CLI) commands. However, fleet definitions cannot be implemented by using the AWS Console. This example will use CLI commands to conduct these steps.

Prerequisites

To follow along with this tutorial, you should have the following prerequisites:

Create a Launch Template

Launch templates let you store launch parameters so that you do not have to specify them every time you launch an EC2 instance. A launch template can contain the Amazon Machine Image (AMI) ID, instance type, and network settings that you typically use to launch instances. For more details about launch templates, see Launch an instance from a launch template.

For this example, we will focus on these specifications:

  • AMI image ImageId
  • Subnet (the SubnetId associated with your Outpost)
  • Availability zone (the AvailabilityZone associated with your Outpost)
  • Tags

Create a launch template configuration (launch-template.json) with the following content:

{
    "ImageId": "<YOUR-AMI>",
    "NetworkInterfaces": [
        {
            "DeviceIndex": 0,
            "SubnetId": "<YOUR-OUTPOST-SUBNET>"
        }
    ],
    "Placement": {
        "AvailabilityZone": "<YOUR-OUTPOST-AZ>"
    },
    "TagSpecifications": [
        {
            "ResourceType": "instance",
            "Tags": [
                {
                    "Key": "<YOUR-TAG-KEY>",
                    "Value": "<YOUR-TAG-VALUE>"
                }
            ]
        }
    ]
}

Create your launch template using the following CLI command:

aws ec2 create-launch-template \
  --launch-template-name <YOUR-LAUNCH-TEMPLATE-NAME> \
  --launch-template-data file://launch-template.json

You should see a response like this:

{
    "LaunchTemplate": {
        "LaunchTemplateId": "lt-010654c96462292e8",
        "LaunchTemplateName": "<YOUR-LAUNCH-TEMPLATE-NAME>",
        "CreateTime": "2021-07-12T15:55:00+00:00",
        "CreatedBy": "arn:aws:sts::<YOUR-AWS-ACCOUNT>:assumed-role/<YOUR-AWS-ROLE>",
        "DefaultVersionNumber": 1,
        "LatestVersionNumber": 1
    }
}

The value for LaunchTemplateId is the identifier for your newly created launch template. You will need this value lt-010654c96462292e8 in the subsequent step.

Create a Fleet Configuration

Refer to Generate an EC2 Fleet JSON configuration file for full documentation on the EC2 Fleet configuration.

For this example, we will use this configuration to override a mix of instance sizes, families, and generations. The Overrides section includes five EC2 instance types:

  • m6.2xlarge, a forthcoming family and generation not yet available for Outposts.
  • c5.2xlarge and r5.2xlarge, additional families that may or may not be provisioned on a given Outpost.
  • m5.large and m5.2xlarge, sizes of the instance family and generation provisioned on this Outpost.

Create an EC2 fleet configuration (ec2-fleet.json) with the following content (note that the LaunchTemplateId was the value returned in the prior step):

{
    "TargetCapacitySpecification": {
        "TotalTargetCapacity": 1,
        "OnDemandTargetCapacity": 1,
        "SpotTargetCapacity": 0,
        "DefaultTargetCapacityType": "on-demand"
    },
    "OnDemandOptions": {
        "AllocationStrategy": "prioritized",
        "SingleInstanceType": true,
        "SingleAvailabilityZone": true,
        "MinTargetCapacity": 1
    },
    "LaunchTemplateConfigs": [
        {
            "LaunchTemplateSpecification": {
                "LaunchTemplateId": "lt-010654c96462292e8",
                "Version": "1"
            },
            "Overrides": [
                {
                    "InstanceType": "m6.2xlarge",
                    "WeightedCapacity": 1.0,
                    "Priority": 1.0
                },
                {
                    "InstanceType": "c5.2xlarge",
                    "WeightedCapacity": 1.0,
                    "Priority": 2.0
                },
                {
                    "InstanceType": "m5.large",
                    "WeightedCapacity": 0.25,
                    "Priority": 3.0
                },
                {
                    "InstanceType": "m5.2xlarge",
                    "WeightedCapacity": 1.0,
                    "Priority": 4.0
                },
                {
                    "InstanceType": "r5.2xlarge",
                    "WeightedCapacity": 1.0,
                    "Priority": 5.0
                }


            ]
        }
    ],
    "Type": "instant"
}

Launch the Single Instance Fleet

To launch the fleet, execute the following CLI command (this will launch a single instance, but a similar process can be used to launch multiple):

aws ec2 create-fleet \
  --cli-input-json file://ec2-fleet.json

You should see a response like this:

{
    "FleetId": "fleet-dc630649-5d77-60b3-2c30-09808ef8aa90",
    "Errors": [
        {
            "LaunchTemplateAndOverrides": {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateId": "lt-010654c96462292e8",
                    "Version": "1"
                },
                "Overrides": {
                    "InstanceType": "m6.2xlarge",
                    "WeightedCapacity": 1.0,
                    "Priority": 1.0
                }
            },
            "Lifecycle": "on-demand",
            "ErrorCode": "InvalidParameterValue",
            "ErrorMessage": "The instance type 'm6.2xlarge' is not supported in Outpost 'arn:aws:outposts:us-west-2:111111111111:outpost/op-0000ffff0000fffff'."
        },
        {
            "LaunchTemplateAndOverrides": {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateId": "lt-010654c96462292e8",
                    "Version": "1"
                },
                "Overrides": {
                    "InstanceType": "c5.2xlarge",
                    "WeightedCapacity": 1.0,
                    "Priority": 2.0
                }
            },
            "Lifecycle": "on-demand",
            "ErrorCode": "InsufficientCapacityOnOutpost",
            "ErrorMessage": "There is not enough capacity on the Outpost to launch or start the instance."
        }
    ],
    "Instances": [
        {
            "LaunchTemplateAndOverrides": {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateId": "lt-010654c96462292e8",
                    "Version": "1"
                },
                "Overrides": {
                    "InstanceType": "m5.large",
                    "WeightedCapacity": 0.25,
                    "Priority": 3.0
                }
            },
            "Lifecycle": "on-demand",
            "InstanceIds": [
                "i-03d6323c8a1df8008",
                "i-0f62593c8d228dba5",
                "i-0ae25baae1f621c15",
                "i-0af7e688d0460a60a"
            ],
            "InstanceType": "m5.large"
        }
    ]
}

Results

Navigate to the EC2 Console where you will find new instances running on your Outpost. An example is shown in the following screenshot:

EC2 running instances, AWS console, network view, filtered by tag

Although multiple instance size, family, and generation combinations were included in the Overrides, only the m5.large was available on the Outpost. Instead of launching one m6.2xlarge, four m5.large instances were launched in order to compensate for their lower WeightedCapacity. From the create-fleet response, the overrides were clearly evaluated in priority order, with the error messages explaining why the top two overrides were skipped.

Clean up

AWS CLI EC2 commands can be used to create fleets but can also be used to delete them.

To clean up the resources created in this tutorial:

  1. Note the FleetId values returned in the create-fleet command.
  2. Run the following command for each fleet created:

aws ec2 delete-fleets \
  --fleet-ids <YOUR-FLEET-ID> \
  --terminate-instances

You should see a response like this:

{
    "SuccessfulFleetDeletions": [
        {
            "CurrentFleetState": "deleted_terminating",
            "PreviousFleetState": "active",
            "FleetId": "fleet-dc630649-5d77-60b3-2c30-09808ef8aa90"
        }
    ],
    "UnsuccessfulFleetDeletions": []
}

  3. Note the launch-template-name used in the create-launch-template command.
  4. Run the following command for each launch template created:

aws ec2 delete-launch-template \
  --launch-template-name <YOUR-LAUNCH-TEMPLATE-NAME>

  5. Clean up any resources you created for the prerequisites.

Conclusion

This post discussed how EC2 Fleet can be used to decouple the availability of specific EC2 instance sizes, families, and generation from the ability to launch or scale workloads. On an Outpost provisioned with multiple families of EC2 instances (say m5 and c5) and different sizes (say m5.large and m5.2xlarge), EC2 Fleet can be used to satisfy a workload launch request even if the capacity of the preferred instance size, family, or generation is unavailable.

To learn more about AWS Outposts, check out the Outposts product page. To see a full list of pre-defined Outposts configurations, visit the Outposts pricing page.

Building Resilient Well-Architected Workloads Using AWS Resilience Hub

Post Syndicated from Seth Eliot original https://aws.amazon.com/blogs/architecture/building-resilient-well-architected-workloads-using-aws-resilience-hub/

AWS Resilience Hub is a new service that helps you understand and improve the resiliency of your workloads using AWS Well-Architected best practices. As the lead for the Reliability Pillar of AWS Well-Architected, I am eager to share with you how you can use Resilience Hub to ensure your workload architecture is as reliable as you need.

In this blog post, I’ll show you how to use Resilience Hub to assess and improve the resiliency of your architecture based on its recommendations. I’ll start with a single Availability Zone (AZ) architecture, and evolve the architecture using the resiliency recommendations.

Single AZ architecture

Figure 1 shows the single AZ architecture I’m going to start with and assess using Resilience Hub. This simple web server runs on Amazon Elastic Compute Cloud (Amazon EC2). It serves a static web page stored in an Amazon Simple Storage Service (Amazon S3) bucket, and then records web site statistics in a MySQL Amazon Relational Database Service (Amazon RDS) database. A NAT gateway is also deployed so the EC2 servers can make calls out to the internet. When I add my application to Resilience Hub, it will discover my application structure. Then I can use it to assess my application’s resiliency per the instructions in the Measure and Improve Your Application Resilience with AWS Resilience Hub blog post.

Figure 1. Single AZ architecture

Even with only a single Amazon EC2 instance, it is still useful to include an Elastic Load Balancer. This lets you configure health checks performed against the EC2 instance. It also makes it easier to add more EC2 instances later. The Amazon EC2 Auto Scaling group helps improve resiliency—if the EC2 instance fails its health check, the Amazon EC2 Auto Scaling group will replace it.

Resilience Hub assessment results for the single AZ architecture

Figure 2 shows the results from Resilience Hub; it’s showing a lot of red flags! This architecture does not meet my required RTO (Recovery Time Objective) and RPO (Recovery Point Objective) goals for resiliency, and it is unrecoverable for all failure types. Figure 2 shows that Resilience Hub assesses for several failure types, including failures in the workload application (bad code or data), infrastructure (component failure), or individual AZ availability. AZs within a Region are independent of each other; even if one AZ experiences issues, the other AZs remain available. The single AZ architecture does not take advantage of that.

Figure 2. Resilience Hub assessment of the single AZ architecture

Figure 3 shows the component-level assessment of resiliency where each component corresponds to a part of the single AZ architecture. The results show that the S3 bucket does well. S3 is resilient to AZ failures and stores data across multiple AZs, which results in high data durability.

However, having a single RDS instance means that if the instance fails (infrastructure), or the AZ containing the instance fails (AZ), then it cannot operate (unrecoverable RTO), and the data will be lost (unrecoverable RPO). Similarly, deploying only one NAT gateway leaves the architecture vulnerable if the AZ experiences issues, so it shows as unrecoverable for AZ disruptions. But, because the NAT gateway is a fully managed service, there is no hardware to manage, and therefore it shows as resilient (0s RTO) for infrastructure issues.


Figure 3. Resilience Hub component-level assessment of the single AZ architecture against cloud infrastructure failures

To improve the single AZ architecture’s resiliency, Resilience Hub recommends enabling multi-AZ for the RDS instance. This will set up a standby database instance in another AZ. For the NAT gateway, it suggests “Add NAT Gateways in multiple AZs. (i.e., every AZ you have resources in).” I’ll implement these suggestions in the next section.
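For example, a minimal AWS CLI sketch of these two changes might look like the following (the database identifier, subnet ID, and Elastic IP allocation ID are placeholders, not values from this architecture):

# Convert the existing RDS instance to a Multi-AZ deployment
aws rds modify-db-instance \
    --db-instance-identifier my-webstats-db \
    --multi-az \
    --apply-immediately

# Add a NAT gateway in another AZ (repeat for every AZ that has resources)
aws ec2 create-nat-gateway \
    --subnet-id subnet-0bbb222ccc333ddd4 \
    --allocation-id eipalloc-0aaa111bbb222ccc3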

Multi-AZ architecture

Figure 4 shows the multi-AZ architecture that I built based on Resilience Hub’s recommendations, which use the following Well-Architected Reliability best practices:

  • Deploy the workload to multiple locations: I set up RDS instances, NAT gateways, and EC2 instances distributed across AZs.
  • Fail over to healthy resources: If one EC2 instance fails, the Elastic Load Balancer will fail over and send traffic to the remaining healthy ones. If the primary RDS instance fails, the standby will be promoted to be the new primary.
  • Automate healing on all layers: The Amazon EC2 Auto Scaling group will replace faulty EC2 instances, and the RDS failover is automatic.

Figure 4. Multi-AZ architecture

In the next section, I’ll send this new architecture through Resilience Hub to check how much it has improved.

Resilience Hub assessment results for the multi-AZ architecture

As shown in Figure 5, the architecture still has some problems, but it’s looking much better! Resilience Hub has highlighted application failure as the possible source of resilience issues, so let’s dive into application RTO and RPO.


Figure 5. Resilience Hub assessment of the multi-AZ architecture


Figure 6. Resilience Hub component-level assessment of the multi-AZ architecture against customer application failures

By making RDS multi-AZ, data is replicated to a standby instance. If your infrastructure or AZ fails, the system will fail over to the standby instance.

Application failures are different. If the data is corrupted or deleted due to a bug, accident, or unauthorized action, then that deletion or corruption will be replicated to the standby. The standby does not protect you in this case. Similarly, with S3, I need to protect against unwanted deletion or corruption by the application, so let’s see what Resilience Hub recommends.

As shown in Figure 7, for RDS, I can enable instance backup, with a suggested retention period of 7 days. By doing this, I can achieve an RPO of 5 minutes because using automatic backup in RDS allows me to restore a DB instance to any specific time during the backup retention period. The latest restorable time for a DB instance is typically within 5 minutes of the current time. The RTO is longer because it takes time to restore a new RDS instance from backup.
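A hedged CLI sketch of enabling automated backups with the suggested 7-day retention (the database identifier is a placeholder):

# Turn on automated backups and keep them for 7 days
aws rds modify-db-instance \
    --db-instance-identifier my-webstats-db \
    --backup-retention-period 7 \
    --apply-immediately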


Figure 7. Resilience Hub suggestions to improve RDS resiliency in the architecture

As shown in Figure 8, for S3, it recommends I enable versioning. This feature allows me to roll back any change to an object (including deletion) to a last known good state. This means zero data loss and a 0s RPO.
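Enabling versioning is a one-line change; a minimal sketch with a placeholder bucket name:

aws s3api put-bucket-versioning \
    --bucket my-static-site-bucket \
    --versioning-configuration Status=Enabled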


Figure 8. Resilience Hub suggestions to improve S3 resiliency in the architecture

Let’s implement these suggestions!

Multi-AZ architecture with data backup

Figure 9 shows the multi-AZ architecture that incorporates data backup features.


Figure 9. Multi-AZ architecture incorporating data backup features

Resilience Hub has recommended more revisions to this architecture based on AWS Well-Architected Reliability best practices:

  • Identify and back up all data that needs to be backed up: The data in my RDS database and S3 bucket are backed up.
  • Perform data backup automatically: RDS backups are automatic. S3 object versioning is also automatic.

Resilience Hub final assessment results

And…that’s it! You’ll see in Figure 10 that I’ve achieved the goals I set for RTO and RPO and my architecture is more resilient and reliable.


Figure 10. Resilience Hub assessment of the multi-AZ architecture after incorporating data backup features

Resilience Hub has even more recommendations for things I could improve. For example, if I switch from MySQL RDS to Amazon Aurora I can use backtracking to reduce the 1 hour 40 minute RTO that it takes to restore an RDS database backup. Backtracking “rewinds” the DB cluster to the time you specify, so you can restore a last known good state without needing to recreate the entire database from backup, which saves time and reduces RTO.
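As an illustration only (not part of this walkthrough), an Aurora MySQL cluster that was created with a backtrack window can be rewound with a single CLI call; the cluster identifier and timestamp are placeholders:

aws rds backtrack-db-cluster \
    --db-cluster-identifier my-webstats-aurora-cluster \
    --backtrack-to 2021-11-01T12:00:00Z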

Conclusion

To improve the resiliency of your workloads, you need to apply the right best practices. This blog post shows you how Resilience Hub can help you assess and improve a workload with poor resiliency, identify areas for improvement, implement best practices, and evaluate how those practices meet your resiliency goals.

Optimizing Apache Flink on Amazon EKS using Amazon EC2 Spot Instances

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/optimizing-apache-flink-on-amazon-eks-using-amazon-ec2-spot-instances/

This post is written by Kinnar Sen, Senior EC2 Spot Specialist Solutions Architect

Apache Flink is a distributed data processing engine for stateful computations over both batch and stream data sources. Flink supports event time semantics for out-of-order events, exactly-once semantics, backpressure control, and optimized APIs. Flink has connectors for third-party data sources and AWS services, such as Apache Kafka, Apache NiFi, Amazon Kinesis, and Amazon MSK. Flink can be used for Event Driven (fraud detection), Data Analytics (ad-hoc analysis), and Data Pipeline (continuous ETL) applications. Amazon Elastic Kubernetes Service (Amazon EKS) is the deployment option chosen by many AWS customers for Big Data frameworks such as Apache Spark and Apache Flink. Flink has native integration with Kubernetes, allowing direct deployment and dynamic resource allocation.

In this post, I illustrate the deployment of a scalable, highly available (HA), resilient, and cost-optimized Flink application using Kubernetes via Amazon EKS and Amazon EC2 Spot Instances (Spot). Learn how to save money on big data streaming workloads by implementing this solution.

Overview

Amazon EC2 Spot Instances

Amazon EC2 Spot Instances let you take advantage of spare EC2 capacity in the AWS Cloud and are available at up to a 90% discount compared to On-Demand Instances. Spot Instances receive a two-minute warning when they are about to be reclaimed by Amazon EC2. There are many graceful ways to handle the interruption. Recently, the EC2 Instance rebalance recommendation was added to send proactive notifications when a Spot Instance is at elevated risk of interruption. Spot Instances are a great way to scale up and increase the throughput of Big Data workloads, and they have been adopted by many customers.
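For example, a workload (or a node agent) can poll the EC2 instance metadata service to detect an interruption notice or a rebalance recommendation; a minimal sketch using IMDSv1-style paths:

# Returns a JSON document (HTTP 200) only when an interruption is scheduled
curl -s http://169.254.169.254/latest/meta-data/spot/instance-action

# Returns a notice time when the instance is at elevated risk of interruption
curl -s http://169.254.169.254/latest/meta-data/events/recommendations/rebalance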

Apache Flink and Kubernetes

Apache Flink is an adaptable framework that supports multiple deployment options, one of which is Kubernetes. The Flink framework has a few key building blocks.

  • The Job Client submits the job, in the form of a JobGraph, to the Job Manager.
  • The Job Manager plays the role of central work coordinator and distributes the job to the Task Managers.
  • Task Managers are the worker components, which run the operators for sources, transformations, and sinks.
  • Optional external components, such as the Resource Provider, HA Service Provider, Application Data Sources, and Sinks, vary with the deployment mode and options.

Image shows Flink application deployment architecture with Job Manager, Task Manager, Scheduler, Data Flow Graph, and client.

Flink supports different deployment (Resource Provider) modes when running on Kubernetes. In this blog we will use the Standalone Deployment mode, as we just want to showcase the functionality. However, we recommend that first-time users deploy Flink on Kubernetes using the Native Kubernetes Deployment.

Flink can be run in different modes such as Session, Application, and Per-Job. The modes differ in cluster lifecycle, resource isolation and execution of the main() method. Flink can run jobs on Kubernetes via Application and Session Modes only.

  • Application Mode: This is a lightweight and scalable way to submit an application to Flink, and it is the preferred way to launch an application because it provides better resource isolation. Resource isolation is achieved by running a cluster per job. Once the application shuts down, all of its Flink components are cleaned up. A minimal launch sketch follows this list.
  • Session Mode: This is a long-running Kubernetes deployment of Flink. Multiple applications can be launched on one cluster and the applications compete for its resources. Multiple jobs may run on a Task Manager in parallel. Its main advantage is that it saves the time of spinning up a new Flink cluster for each new job; however, if one of the Task Managers fails, it may impact all the jobs running on that Task Manager.
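For context only, if you choose the Native Kubernetes deployment instead of the Standalone deployment used later in this post, an Application Mode launch looks roughly like this (the cluster ID, image URI, and job JAR path are placeholders):

./bin/flink run-application \
    --target kubernetes-application \
    -Dkubernetes.cluster-id=my-flink-app \
    -Dkubernetes.container.image=${ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/flink-demo:latest \
    local:///opt/flink/usrlib/my-job.jar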

Amazon EKS

Amazon EKS is a fully managed Kubernetes service. EKS supports creating and managing Spot Instances using Amazon EKS managed node groups following Spot best practices. This enables you to take advantage of the steep savings and scale that Spot Instances provide for interruptible workloads. EKS-managed node groups require less operational effort compared to using self-managed nodes. You can learn more in the blog “Amazon EKS now supports provisioning and managing EC2 Spot Instances in managed node groups.”

Apache Flink and Spot

Big Data frameworks like Spark and Flink are distributed systems designed to manage and process high volumes of data. Because they are designed for failure, they can run on machines with different configurations and are inherently resilient and flexible. Spot Instances can optimize runtimes by increasing throughput, while spending the same (or less). Flink can tolerate interruptions using restart and failover strategies.

Fault Tolerance

Fault tolerance is implemented in Flink by checkpointing the state. Checkpoints allow Flink to recover state and positions in the streams. There are two prerequisites for checkpointing: a persistent data source that can replay data (such as Apache Kafka or Amazon Kinesis) and persistent distributed storage to store state (such as Amazon Simple Storage Service (Amazon S3) or HDFS).
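A minimal sketch of the corresponding Flink configuration, assuming an S3 bucket you own for checkpoint storage (the bucket name and interval are placeholders):

# Append checkpointing settings to the Flink configuration
cat >> conf/flink-conf.yaml <<'EOF'
execution.checkpointing.interval: 60s
state.checkpoints.dir: s3://my-flink-checkpoints/checkpoints
EOF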

Cost Optimization

Job Manager and Task Manager are key building blocks of Flink. The Task Manager is the compute-intensive part and the Job Manager is the orchestrator. We run the Task Managers on Spot Instances and the Job Manager on On-Demand Instances.

Scaling

Flink supports elastic scaling via Reactive Mode: Task Managers can be added or removed based on metrics monitored by an external service monitor like the Horizontal Pod Autoscaler (HPA). When scaling up, new pods are added; if the cluster has available resources they are scheduled, and if not they go into a pending state. The Cluster Autoscaler (CA) detects pods in a pending state, and EC2 Auto Scaling adds new nodes. This is ideal with Spot Instances, as it implements elastic scaling with higher throughput in a cost-optimized way.
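As a sketch, a CPU-based Horizontal Pod Autoscaler for the TaskManager deployment can be created with kubectl (the deployment name and thresholds are assumptions):

# Scale the TaskManager deployment between 1 and 10 replicas at 70% CPU
kubectl autoscale deployment flink-taskmanager \
    --cpu-percent=70 --min=1 --max=10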

Tutorial: Running Flink applications in a cost optimized way

In this tutorial, I review the steps that help you launch cost-optimized and resilient Flink workloads running on EKS via Application Mode. The streaming application reads dummy stock ticker prices sent to an Amazon Kinesis data stream by the Amazon Kinesis Data Generator, determines the highest price within a pre-defined window, and writes the output to files in Amazon S3.

Image shows Flink application pipeline with data flowing from Amazon Kinesis Data Generator to Kinesis Data Stream, processed in Apache Flink and output being written in Amazon S3

The configuration files can be found in this github location. To run the workload on Kubernetes, make sure you have eksctl and kubectl command line utilities installed on your computer or on an AWS Cloud9 environment. You can run this by using an AWS IAM user or role that has the Administrator Access policy attached to it, or check the minimum required permissions for using eksctl. The Spot node groups in the Amazon EKS cluster can be launched both in a managed or a self-managed way, in this post I use the EKS Managed node group for Spot Instances.

Steps

When we deploy Flink in Application Mode, it runs as a single application and the cluster is exclusive to the job. For that purpose, we bundle the user code in the Flink image and upload it to Amazon Elastic Container Registry (Amazon ECR). Amazon ECR is a fully managed container registry that makes it easy to store, manage, share, and deploy your container images and artifacts anywhere.

1. Build the Amazon ECR Image

  • Log in using the following command, and don’t forget to replace AWS_REGION and ACCOUNT_ID with your details.

aws ecr get-login-password --region ${AWS_REGION} | docker login --username AWS --password-stdin ${ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com

  • Create a repository

aws ecr create-repository --repository-name flink-demo --image-scanning-configuration scanOnPush=true --region ${AWS_REGION}

  • Build the Docker image:

Download the Dockerfile. I am using a multistage Docker build here. The sample code is from the Amazon Kinesis Data Analytics Java examples on GitHub. I modified the code to allow checkpointing and to change the sliding window interval. Build and push the Docker image using the following instructions.

docker build --tag flink-demo .

  • Tag and Push your image to Amazon ECR

docker tag flink-demo:latest ${ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/flink-demo:latest
docker push ${ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/flink-demo:latest

2. Create Amazon S3/Amazon Kinesis Access Policy

First, I must create an access policy to allow the Flink application to read/write from Amazon S3 and read Kinesis data streams. Download the Amazon S3 policy file from here and modify the <<output folder>> to an Amazon S3 bucket which you have to create.
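If you have not created the output bucket yet, a minimal sketch (the bucket name is a placeholder and must be globally unique):

aws s3 mb s3://my-flink-demo-output --region ${AWS_REGION}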

  • Run the following to create the policy. Note the ARN.

aws iam create-policy --policy-name flink-demo-policy --policy-document file://flink-demo-policy.json

3. Cluster and node groups deployment

  • Create an EKS cluster using the following command:

eksctl create cluster --name=flink-demo --node-private-networking --without-nodegroup --asg-access --region=<<AWS Region>>

The cluster takes approximately 15 minutes to launch.

  • Create the node group using the nodeGroup config file. I am using multiple nodeGroups of different sizes to adopt the Spot best practice of diversification. Replace the <<Policy ARN>> string with the ARN string from the previous step.

eksctl create nodegroup -f managedNodeGroups.yml

  • Download the Cluster Autoscaler and edit it to add the cluster-name (flink-demo)

curl -LO https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml

4. Install the Cluster AutoScaler using the following command:

kubectl apply -f cluster-autoscaler-autodiscover.yaml

  • Using EKS managed node groups requires significantly less operational effort compared to using self-managed node groups and enables:
    • Automatic enforcement of Spot best practices.
    • Spot Instance lifecycle management.
    • Automatic labeling of Pods.
  • eksctl has integrated amazon-ec2-instance-selector to enable automatic selection of instances based on the criteria you pass. This has multiple benefits:
    • Instance diversification is implemented by allowing multiple instance types in the node group, which works well with the Cluster Autoscaler.
    • It reduces the manual effort of selecting instances.
  • We can create node group manifests using a dry run and then create the node groups from those manifests.

eksctl create cluster --name flink-demo --instance-selector-vcpus=2 --instance-selector-memory=4 --dry-run

eksctl create nodegroup -f managedNodeGroups.yml

5. Create service accounts for Flink

$ kubectl create serviceaccount flink-service-account
$ kubectl create clusterrolebinding flink-role-binding-flink --clusterrole=edit --serviceaccount=default:flink-service-account

6. Deploy Flink

The install folder here has all the YAML files required to deploy a standalone Flink cluster. Run the install.sh file. This deploys the cluster with a JobManager, a pool of TaskManagers, and a Service exposing the JobManager’s ports.

  • This is a High Availability (HA) deployment of Flink using the Kubernetes high-availability service.
  • The JobManager runs on On-Demand Instances and the TaskManagers run on Spot Instances. Because the cluster is launched in Application Mode, if a node is interrupted only one job is restarted.
  • Autoscaling is enabled through Reactive Mode. The Horizontal Pod Autoscaler monitors the CPU load and scales accordingly.
  • Checkpointing is enabled, which allows Flink to save state and be fault tolerant.

Image shows the Flink dashboard highlighting checkpoints for a job

7. Create Amazon Kinesis data stream and send dummy data      

Log in to the AWS Management Console and create a Kinesis data stream named ‘ExampleInputStream’. The Kinesis Data Generator is used to send data to the data stream. The template for the dummy data can be found here. Once data is flowing, the Flink application starts processing.
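Alternatively, you can create the data stream from the CLI; a minimal sketch (the shard count is an assumption sized for this demo):

aws kinesis create-stream \
    --stream-name ExampleInputStream \
    --shard-count 1 \
    --region ${AWS_REGION}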

Image shows Amazon Kinesis Data Generator console sending data to Kinesis Data Stream

Observations

Spot Interruptions

If there is an interruption, the Flink application will be restarted using checkpointed data. The JobManager restores the job, as highlighted in the following log. The node is replaced automatically by the managed node group.

Image shows logs from a Flink job highlighting job restart using checkpoints.

You can observe the graceful restart in the Flink UI.

Image shows the Flink dashboard highlighting job restart after failure.

AutoScaling

You can observe the elastic scaling in the logs. The number of TaskManagers in the Flink UI also reflects the scaling state.
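A few hedged commands you can use to watch the scaling activity (the capacity-type label applies to EKS managed node groups; other names are assumptions based on this deployment):

# Watch the HPA adjust the desired TaskManager replica count
kubectl get hpa --watch

# Confirm that the Cluster Autoscaler added Spot nodes for pending pods
kubectl get nodes -l eks.amazonaws.com/capacityType=SPOT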

Image shows kubectl output showing status of HPA during scale-out

Cleanup

If you are trying out the tutorial, run the following steps to make sure that you don’t encounter unwanted costs.

  • Run the delete.sh file.
  • Delete the EKS cluster and the node groups:
    • eksctl delete cluster --name flink-demo
  • Delete the Amazon S3 Access Policy:
    • aws iam delete-policy --policy-arn <<POLICY ARN>>
  • Delete the Amazon S3 Bucket:
    • aws s3 rb --force s3://<<S3_BUCKET>>
  • Delete the CloudFormation stack related to Kinesis Data Generator named ‘Kinesis-Data-Generator-Cognito-User’
  • Delete the Kinesis Data Stream.

Conclusion

In this blog, I demonstrated how you can run Flink workloads on a Kubernetes cluster using Spot Instances, achieving scalability, resilience, and cost optimization. To cost-optimize your Flink-based big data workloads, consider using Amazon EKS and Spot Instances.

Deep Dive on Amazon EC2 VT1 Instances

Post Syndicated from Rick Armstrong original https://aws.amazon.com/blogs/compute/deep-dive-on-amazon-ec2-vt1-instances/

This post is written by:  Amr Ragab, Senior Solutions Architect; Bryan Samis, Principal Elemental SSA; Leif Reinert, Senior Product Manager

Introduction

We at AWS are excited to announce that new Amazon Elastic Compute Cloud (Amazon EC2) VT1 instances are now generally available in the US-East (N. Virginia), US-West (Oregon), Europe (Ireland), and Asia Pacific (Tokyo) Regions. This instance family provides dedicated video transcoding hardware in Amazon EC2 and offers up to 30% lower cost per stream as compared to G4dn GPU based instances or 60% lower cost per stream as compared to C5 CPU based instances. These instances are powered by Xilinx Alveo U30 media accelerators with up to eight U30 media accelerators per instance in the vt1.24xlarge. Each U30 accelerator comes with two XCU30 Zynq UltraScale+ SoCs, totaling 16 addressable devices in the vt1.24xlarge instance with H.264/H.265 Video Codec Units (VCU) cores.

Each U30 accelerator card comes with two XCU30 Zynq UltraScale+ SoCs

Currently, the VT1 family consists of three sizes, as summarized in the following:

Instance Type    vCPUs    RAM (GiB)    U30 accelerator cards    Addressable XCU30 SoCs
vt1.3xlarge      12       24           1                        2
vt1.6xlarge      24       48           2                        4
vt1.24xlarge     96       192          8                        16

Each addressable XCU30 SoC device supports:

  • Codec: MPEG4 Part 10 H.264, MPEG-H Part 2 HEVC H.265
  • Resolutions: 128×128 to 3840×2160
  • Flexible rate control: Constant Bitrate (CBR), Variable Bitrate (VBR), and Constant Quantization Parameter (QP)
  • Frame Scan Types: Progressive H.264/H.265
  • Input Color Space: YCbCr 4:2:0, 8-bit per color channel.

The following table outlines the number of transcoding streams per addressable device and instance type:

Transcoding       Each XCU30 SoC   vt1.3xlarge   vt1.6xlarge   vt1.24xlarge
3840x2160p60      1                2             4             16
3840x2160p30      2                4             8             32
1920x1080p60      4                8             16            64
1920x1080p30      8                16            32            128
1280x720p30       16               32            64            256
960x540p30        24               48            96            384

Each XCU30 SoC can support the following stream densities: 1x 4kp60, 2x 4kp30, 4x 1080p60, 8x 1080p30, 16x 720p30

Customers with applications such as live broadcast, video conferencing and just-in-time transcoding can now benefit from a dedicated instance family devoted to video encoding and decoding with rescaling optimizations at the lowest cost per stream. This dedicated instance family lets customers run batch, real-time, and faster than real-time transcoding workloads.

Deployment and Quick Start

To get started, you can launch a VT1 instance using the prebuilt VT1 Amazon Machine Images (AMIs) available on the AWS Marketplace. However, if you have AMI hardening requirements or other requirements that require you to install the Xilinx software stack yourself, you can reference the Xilinx Video SDK documentation for VT1.
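As an illustration, launching the smallest size from the CLI looks roughly like this (the AMI, key pair, subnet, and security group IDs are placeholders for your own values):

aws ec2 run-instances \
    --instance-type vt1.3xlarge \
    --image-id ami-0123456789abcdef0 \
    --key-name my-key-pair \
    --subnet-id subnet-0123456789abcdef0 \
    --security-group-ids sg-0123456789abcdef0 \
    --count 1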

The software stack utilizes a driver suite that combines the driver stack with management and client tools. The following terminology is used for this instance family:

  • XRT – Xilinx Runtime Library
  • XRM – Xilinx Runtime Management Library
  • XCDR – Xilinx Video Transcoding SDK
  • XMA – Xilinx Media Accelerator API and Samples
  • XOCL – Xilinx driver (xocl)

To run workloads directly on Amazon EC2 instances, you must load both the XRT and XRM stack. These are conveniently provided by loading the XCDR environment. To load the devices, run the following:

source /opt/xilinx/xcdr/setup.sh

With the output:

-----Source Xilinx U30 setup files-----
XILINX_XRT        : /opt/xilinx/xrt
PATH              : /opt/xilinx/xrt/bin:/usr/local/sbin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
LD_LIBRARY_PATH   : /opt/xilinx/xrt/lib:
PYTHONPATH        : /opt/xilinx/xrt/python:
XILINX_XRM      : /opt/xilinx/xrm
PATH            : /opt/xilinx/xrm/bin:/opt/xilinx/xrt/bin:/usr/local/sbin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
LD_LIBRARY_PATH : /opt/xilinx/xrm/lib:/opt/xilinx/xrt/lib:
Number of U30 devices found : 16  

Running Containerized Workloads on Amazon ECS and Amazon EKS

To help you build AMIs for Amazon Linux 2, Ubuntu 18/20, Amazon ECS, and Amazon Elastic Kubernetes Service (Amazon EKS), we have provided a GitHub project that simplifies the build process using Packer:

https://github.com/aws-samples/aws-vt-baseami-pipeline

At the time of writing, Xilinx does not have an officially supported container runtime. However, it is possible to pass the specific devices in the docker run ... stanza; to set up this environment, download this specific script. The following example shows the output for vt1.24xlarge:

$ source xilinx_aws_docker_setup.sh
XILINX_XRT : /opt/xilinx/xrt
PATH : /opt/xilinx/xrt/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/ec2-user/.local/bin:/home/ec2-user/bin
LD_LIBRARY_PATH  : /opt/xilinx/xrt/lib:
PYTHONPATH : /opt/xilinx/xrt/python:
XILINX_AWS_DOCKER_DEVICES : --device=/dev/dri/renderD128:/dev/dri/renderD128
--device=/dev/dri/renderD129:/dev/dri/renderD129
--device=/dev/dri/renderD130:/dev/dri/renderD130
--device=/dev/dri/renderD131:/dev/dri/renderD131
--device=/dev/dri/renderD132:/dev/dri/renderD132
--device=/dev/dri/renderD133:/dev/dri/renderD133
--device=/dev/dri/renderD134:/dev/dri/renderD134
--device=/dev/dri/renderD135:/dev/dri/renderD135
--device=/dev/dri/renderD136:/dev/dri/renderD136
--device=/dev/dri/renderD137:/dev/dri/renderD137
--device=/dev/dri/renderD138:/dev/dri/renderD138
--device=/dev/dri/renderD139:/dev/dri/renderD139
--device=/dev/dri/renderD140:/dev/dri/renderD140
--device=/dev/dri/renderD141:/dev/dri/renderD141
--device=/dev/dri/renderD142:/dev/dri/renderD142
--device=/dev/dri/renderD143:/dev/dri/renderD143
--mount type=bind,source=/sys/bus/pci/devices/0000:00:1d.0,target=/sys/bus/pci/devices/0000:00:1d.0 --mount type=bind,source=/sys/bus/pci/devices/0000:00:1e.0,target=/sys/bus/pci/devices/0000:00:1e.0

Once the devices have been enumerated, start the workload by running:

docker run -it $XILINX_AWS_DOCKER_DEVICES <image:tag>

Amazon EKS Setup

To launch an EKS cluster with VT1 instances, create the AMI from the scripts provided in the repo earlier.

https://github.com/aws-samples/aws-vt-baseami-pipeline

Once the AMI is created, launch an EKS cluster:

eksctl create cluster --region us-east-1 --without-nodegroup --version 1.19 \
       --zones us-east-1c,us-east-1d

Once the cluster is created, substitute the values for the cluster name, subnets, and AMI IDs in the following template.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: <cluster-name>
  region: us-east-1

vpc:
  id: vpc-285eb355
  subnets:
    public:
      endpoint-one:
        id: subnet-5163b237
      endpoint-two:
        id: subnet-baff22e5

managedNodeGroups:
  - name: vt1-ng-1d
    instanceType: vt1.3xlarge
    volumeSize: 200
    instancePrefix: vt1-ng-1d-worker
    ami: <ami-id>
    iam:
      withAddonPolicies:
        imageBuilder: true
        autoScaler: true
        ebs: true
        fsx: true
        cloudWatch: true
    ssh:
      allow: true
      publicKeyName: amrragab-aws
    subnets:
    - endpoint-one
    minSize: 1
    desiredCapacity: 1
    maxSize: 4
    overrideBootstrapCommand: |
      #!/bin/bash
      /etc/eks/bootstrap.sh <cluster-name>

Save this file, and then deploy the nodegroup.

eksctl create nodegroup -f vt1-managed-ng.yaml

Once deployed, apply the FPGA U30 device plugin. The daemonset container is available on the Amazon Elastic Container Registry (ECR) public gallery. You can also access the daemonset deployment file.

kubectl apply -f xilinx-device-plugin.yml

Confirm that the Xilinx U30 device(s) are seen by the Kubernetes API server and are allocatable for your jobs.
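For example, you can inspect a node’s capacity and allocatable resources with kubectl (the node name is a placeholder); the relevant fields resemble the following:

kubectl describe node ip-10-0-12-34.ec2.internal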

Capacity:
  attachable-volumes-aws-ebs:                  39
  cpu:                                         12
  ephemeral-storage:                           209702892Ki
  hugepages-1Gi:                               0
  hugepages-2Mi:                               0
  memory:                                      23079768Ki
  pods:                                        15
  xilinx.com/fpga-xilinx_u30_gen3x4_base_1-0:  1
Allocatable:
  attachable-volumes-aws-ebs:                  39
  cpu:                                         11900m
  ephemeral-storage:                           192188443124
  hugepages-1Gi:                               0
  hugepages-2Mi:                               0
  memory:                                      22547288Ki
  pods:                                        15
  xilinx.com/fpga-xilinx_u30_gen3x4_base_1-0:  1

Video Quality Analysis

The video quality produced by the U30 is roughly equivalent to the “faster” profile in the x264 and x265 codecs, or the “p4” preset using the nvenc codec on G4dn. For example, in the following test we encoded the same UHD (4K) video at multiple bitrates into H264, and then compared Video Multimethod Assessment Fusion (VMAF) scores:

Plotting VMAF and bitrate we see comparable quality across x264 faster, h264_nvenc p4 and u30
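If you want to run a similar comparison yourself, VMAF can be computed with an ffmpeg build that includes libvmaf; a minimal sketch with placeholder file names:

# The first input is the encoded (distorted) file, the second is the reference
ffmpeg -i encoded.mp4 -i reference.mp4 -lavfi libvmaf -f null -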

Stream Density and Encoding Performance

To illustrate the VT1 instance family stream density and encoding performance, let’s look at the smallest instance, the vt1.3xlarge, which can encode up to eight simultaneous 1080p60 streams into H.264. We chose a set of similar instances at a close price point, and then compared how many 1080p60 H264 streams they could encode simultaneously to an equivalent quality:

Instance       Codec   us-east-1 Hourly Price*   1080p60 Streams / Instance   Hourly Cost / Stream
c5.4xlarge     x264    $0.680                    2                            $0.340
c6g.4xlarge    x264    $0.544                    2                            $0.272
c5a.4xlarge    x264    $0.616                    3                            $0.205
g4dn.xlarge    nvenc   $0.526                    4                            $0.132
vt1.3xlarge    xma     $0.650                    8                            $0.081

* Prices accurate as of the publishing date of this article.

As you can see, the vt1.3xlarge instance can encode four times as many streams as the c5.4xlarge, and at a lower hourly cost. It can also encode twice the number of streams of a g4dn.xlarge instance. In this example, that yields a cost-per-stream reduction of up to 76% compared to c5.4xlarge, and up to 39% compared to g4dn.xlarge.

Faster than Real-time Transcoding

In addition to encoding multiple live streams in parallel, VT1 instances can also be utilized to encode file-based content at faster-than-real-time performance. This can be done by over-provisioning resources on a single XCU30 device so that more resources are dedicated to transcoding than are necessary to maintain real-time.

For example, running the following command (specifying -cores 4) will utilize all resources on a single XCU30 device, and yield an encode speed of approximately 177 FPS, or 2.95 times faster than real-time for a 60 FPS source:

$ ffmpeg -c:v mpsoc_vcu_h264 -i input_1920x1080p60_H264_8Mbps_AAC_Stereo.mp4 -f mp4 -b:v 5M -c:v mpsoc_vcu_h264 -cores 4 -slices 4 -y /tmp/out.mp4
frame=43092 fps=177 q=-0.0 Lsize= 402721kB time=00:11:58.92 bitrate=4588.9kbits/s speed=2.95x

To maximize FPS further, utilize the “split and stitch” operation to break the input file into segments, and then transcode those in parallel across multiple XCU30 chips or even multiple U30 cards in an instance. Then, recombine the file at the output. For more information, see the Xilinx Video SDK documentation on Faster than Real-time transcoding.

Using the provided example script on the same 12-minute source file as the preceding example on a vt1.3xlarge, we can utilize both addressable devices on the U30 card at once in order to yield an effective encode speed of 512 fps, or 8.5 times faster than real-time.

$ python3 13_ffmpeg_transcode_only_split_stitch.py -s input_1920x1080p60_H264_8Mbps_AAC_Stereo.mp4 -d /tmp/out.mp4 -i h264 -o h264 -b 5.0

There are 1 cards, 2 chips in the system
...
Time from start to completion : 84 seconds (1 minute and 24 seconds)
This clip was processed 8.5 times faster than realtime

This clip was effectively processed at 512.34 FPS

Conclusion

We are excited to launch VT1, our first EC2 instance with dedicated hardware acceleration for video transcoding, which provides up to 30% lower cost per stream compared to G4dn or 60% lower cost per stream compared to C5. With up to eight Xilinx Alveo U30 media accelerators, you can parallelize up to 16 4K UHD streams for batch, real-time, and faster than real-time transcoding. If you have any questions, reach out to your account team. Now, go power up your video transcoding workloads with Amazon EC2 VT1 instances.

Cybersecurity Awareness Month: Learn about the job zero of securing your data using Amazon Redshift

Post Syndicated from Thiyagarajan Arumugam original https://aws.amazon.com/blogs/big-data/cybersecurity-awareness-month-learn-about-the-job-zero-of-securing-your-data-using-amazon-redshift/

Amazon Redshift is the most widely used cloud data warehouse. It allows you to run complex analytic queries against terabytes to petabytes of structured and semi-structured data, using sophisticated query optimization, columnar storage on high-performance disk, and massively parallel query execution.

At AWS, we embrace the culture that security is job zero, by which we mean it’s even more important than any number one priority. AWS provides comprehensive security capabilities to satisfy the most demanding requirements, and Amazon Redshift provides data security out of the box at no extra cost. Amazon Redshift uses the AWS security frameworks to implement industry-leading security in the areas of authentication, access control, auditing, logging, compliance, data protection, and network security.

Cybersecurity Awareness Month raises awareness about the importance of cybersecurity, ensuring everyone has the resources they need to be safer and more secure online. This post highlights some of the key out-of-the-box capabilities available in Amazon Redshift to manage your data securely.

Authentication

Amazon Redshift supports industry-leading security with built-in AWS Identity and Access Management (IAM) integration, identity federation for single sign-on (SSO), and multi-factor authentication. You can easily federate database user authentication with Amazon Redshift using IAM and a third-party SAML 2.0 identity provider (IdP), such as AD FS, PingFederate, or Okta.
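For example, temporary database credentials can be issued through IAM instead of static passwords; a minimal sketch with placeholder identifiers:

aws redshift get-cluster-credentials \
    --cluster-identifier my-redshift-cluster \
    --db-user analyst_user \
    --db-name dev \
    --duration-seconds 900 \
    --auto-create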

To get started, see the following posts:

Access control

Granular row- and column-level security controls ensure users see only the data they should have access to. You can achieve column-level access control for data in Amazon Redshift by using the column-level grant and revoke statements, without having to implement view-based access control or use another system. Amazon Redshift is also integrated with AWS Lake Formation, which ensures that the column- and row-level access defined in Lake Formation is enforced for Amazon Redshift queries on the data in the data lake.
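As a sketch of column-level access control (the table, column, group, and host names are hypothetical), you can run the grant from any SQL client, for example psql:

# Allow the analysts group to read only two columns of the customers table
psql "host=my-redshift-cluster.example.us-east-1.redshift.amazonaws.com port=5439 dbname=dev user=admin" \
    -c "GRANT SELECT (customer_id, customer_name) ON customers TO GROUP analysts;"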

The following guides can help you implement fine-grained access control on Amazon Redshift:

Auditing and logging

To monitor Amazon Redshift for any suspicious activities, you can take advantage of the auditing and logging features. Amazon Redshift logs information about connections and user activities, which can be uploaded to Amazon Simple Storage Service (Amazon S3) if you enable the audit logging feature. The API calls to Amazon Redshift are logged to AWS CloudTrail, and you can create a log trail by configuring CloudTrail to upload to Amazon S3. For more details, see Database audit logging, Analyze logs using Amazon Redshift Spectrum, Querying AWS CloudTrail Logs, and System object persistence utility.
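A hedged CLI sketch of turning on audit logging to an S3 bucket you own (the bucket and cluster names are placeholders):

aws redshift enable-logging \
    --cluster-identifier my-redshift-cluster \
    --bucket-name my-redshift-audit-logs \
    --s3-key-prefix auditlogs/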

Compliance

Amazon Redshift is assessed by third-party auditors for compliance with multiple programs. If your use of Amazon Redshift is subject to compliance with standards like HIPAA, PCI, or FedRAMP, you can find more details at Compliance validation for Amazon Redshift.

Data protection

To protect data both at rest and while in transit, Amazon Redshift provides options to encrypt the data. Although the encryption settings are optional, we highly recommend enabling them. When you enable encryption at rest for your cluster, it encrypts both the data blocks as well as the metadata, and there are multiple ways to manage the encryption key (see Amazon Redshift database encryption). To safeguard your data while it’s in transit from your SQL clients to the Amazon Redshift cluster, we highly recommend configuring the security options as described in Configuring security options for connections.
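For example, encryption at rest with a customer managed KMS key can be requested when the cluster is created; a minimal sketch with placeholder values:

aws redshift create-cluster \
    --cluster-identifier my-secure-cluster \
    --node-type ra3.xlplus \
    --number-of-nodes 2 \
    --master-username admin \
    --master-user-password 'Choose-A-Strong-Password1' \
    --encrypted \
    --kms-key-id arn:aws:kms:us-east-1:111122223333:key/aaaabbbb-cccc-dddd-eeee-ffff00001111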

For additional data protection options, see the following resources:

Amazon Redshift enables you to use an AWS Lambda function as a user-defined function (UDF). You can write Lambda UDFs to enable external tokenization of data and dynamic data masking, as illustrated in Amazon Redshift – Dynamic Data Masking.

Network security

Amazon Redshift is a service that runs within your VPC. There are multiple configurations to ensure that access to your Amazon Redshift cluster is secured, whether the connection comes from an application within your VPC or from an on-premises system. For more information, see VPCs and subnets. Amazon Redshift support for AWS PrivateLink ensures that all API calls from your VPC to Amazon Redshift stay within the AWS network. For more information, see Connecting to Amazon Redshift using an interface VPC endpoint.

Customer success stories

You can run your security-demanding analytical workload using out-of-the-box features. For example, SoePay, a Hong Kong–based payments solutions provider, uses AWS Fargate and Amazon Elastic Container Service (Amazon ECS) to scale its infrastructure, AWS Key Management Service (AWS KMS) to manage cryptographic keys, and Amazon Redshift to store data from merchants’ smart devices.

With AWS services, GE Renewable Energy has created a data lake where it collects and analyzes machine data captured at GE wind turbines around the world. GE relies on Amazon S3 to store and protect its ever-expanding collection of wind turbine data and on Amazon Redshift to gain new insights from the data it collects.

For more customer stories, see Amazon Redshift customers.

Conclusion and Next Steps

In this post, we discussed some of the key out-of-the-box capabilities at no extra cost available in Amazon Redshift to manage your data securely, such as authentication, access control, auditing, logging, compliance, data protection, and network security.

You should periodically review your AWS workloads to ensure security best practices have been implemented. The AWS Well-Architected Framework helps you understand the pros and cons of decisions you make while building systems on AWS. This framework can help you learn architectural best practices for designing and operating reliable, secure, efficient, and cost-effective systems in the cloud. Review the Security Pillar of this framework.

In addition, AWS Security Hub provides a comprehensive view of your security state within AWS and helps you check your compliance with security industry standards and best practices.

To adhere to the security needs of your organization, you can automate the deployment of an Amazon Redshift cluster in an AWS account using AWS CloudFormation and AWS Service Catalog. For more information, see Automate Amazon Redshift cluster creation using AWS CloudFormation and Automate Amazon Redshift Cluster management operations using AWS CloudFormation.


About the Authors

Kunal Deep Singh is a Software Development Manager at Amazon Web Services (AWS) and leads development of security features for Amazon Redshift. Prior to AWS he has worked at Amazon Ads and Microsoft Azure. He is passionate about building customer solutions for cloud, data and security.

Thiyagarajan Arumugam is a Principal Solutions Architect at Amazon Web Services and designs customer architectures to process data at scale. Prior to AWS, he built data warehouse solutions at Amazon.com. In his free time, he enjoys all outdoor sports and practices the Indian classical drum mridangam.

Building Modern Applications with Amazon EKS on Amazon Outposts

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/building-modern-applications-with-amazon-eks-on-amazon-outposts/

This post is written by Brad Kirby, Principal Outposts Specialist, and Chris Lunsford, Senior Outposts SA. 

Customers are modernizing applications by deconstructing monolithic architectures and migrating application components into container–based, service-oriented, and microservices architectures. Modern applications improve scalability, reliability, and development efficiency by allowing services to be owned by smaller, more focused teams.

This post shows how you can combine Amazon Elastic Kubernetes Service (Amazon EKS) with AWS Outposts to deploy managed Kubernetes application environments on-premises alongside your existing data and applications.

For a brief introduction to the subject matter, watch this video, which demonstrates:

  • The benefits of application modernization
  • How containers are an ideal enabling technology for microservices architectures
  • How AWS Outposts combined with Amazon container services enables you to unwind complex service interdependencies and modernize on-premises applications with low latency, local data processing, and data residency requirements

Understanding the Amazon EKS on AWS Outposts architecture

Amazon EKS

Many organizations chose Kubernetes as their container orchestration platform because of its openness, flexibility, and a growing pool of Kubernetes literate IT professionals. Amazon EKS enables you to run Kubernetes clusters on AWS without needing to install and manage the Kubernetes control plane. The control plane, including the API servers, scheduler, and cluster store services, runs within a managed environment in the AWS Region. Kubernetes clients (like kubectl) and cluster worker nodes communicate with the managed control plane via public and private EKS service endpoints.

AWS Outposts

The AWS Outposts service delivers AWS infrastructure and services to on-premises locations from the AWS Global Cloud Infrastructure. An Outpost functions as an extension of the Availability Zone (AZ) where it is anchored. AWS operates, monitors, and manages AWS Outposts infrastructure as part of its parent Region. Each Outpost connects back to its anchor AZ via a Service Link connection (a set of encrypted VPN tunnels). AWS Outposts extend Virtual Private Cloud (VPC) environments from the Region to on-premises locations and enables you to deploy VPC subnets to Outposts in your data center and co-location spaces. The Outposts Local Gateway (LGW) routes traffic between Outpost VPC subnets and the on-premises network.

Amazon EKS on AWS Outposts

You deploy Amazon EKS worker nodes to Outposts using self-managed node groups. The worker nodes run on Outposts and register with the Kubernetes control plane in the AWS Region. The worker nodes, and containers running on the nodes, can communicate with AWS services and resources running on the Outpost and in the region (via the Service Link) and with on-premises networks (via the Local Gateway).

 

You use the same AWS and Kubernetes tools and APIs to work with EKS on Outposts nodes that you use to work with EKS nodes in the Region. You can use eksctl, the AWS Management Console, AWS CLI, or infrastructure as code (IaC) tools like AWS CloudFormation or HashiCorp Terraform to create self-managed node groups on AWS Outposts.

Amazon EKS self-managed node groups on AWS Outposts

Like many customers, you might use managed node groups when you deploy EKS worker nodes in your Region, and you may be wondering, “what are self-managed node groups?”

Self-managed node groups, like managed node groups, use standard Amazon Elastic Compute Cloud (Amazon EC2) instances in EC2 Auto Scaling groups to deploy, scale-up, and scale-down EKS worker nodes using Amazon EKS optimized Amazon Machine Images (AMIs). Amazon configures the EKS optimized AMIs to work with the EKS service. The images include Docker, kubelet, and the AWS IAM Authenticator. The AMIs also contain a specialized bootstrap script /etc/eks/bootstrap.sh that allows worker nodes to discover and connect to your cluster control plane and add Kubernetes labels that identify the nodes in the cluster.
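For illustration, the bootstrap call typically appears in the instance user data; a minimal sketch with a placeholder cluster name and label:

#!/bin/bash
# Join the node to the EKS cluster and tag it with a Kubernetes label
/etc/eks/bootstrap.sh my-eks-cluster \
    --kubelet-extra-args '--node-labels=node.kubernetes.io/node-group=outpost-node-group'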

What makes the node groups self-managed? Managed node groups have additional features that simplify updating nodes in the node group. With self-managed nodes, you must implement a process to update or replace your node group when you want to update the nodes to use a new Amazon EKS optimized AMI.

You create self-managed node groups on AWS Outposts using the same process and resources that you use to deploy EC2 instances using EC2 Auto Scaling groups. All instances in a self-managed node group must:

  • Be the same Instance type
  • Be running the same Amazon Machine Image (AMI)
  • Use the same Amazon EKS node IAM role
  • Be tagged with the kubernetes.io/cluster/<cluster-name>=owned tag

Additionally, to deploy on AWS Outposts, instances must:

  • Use encrypted EBS volumes
  • Launch in Outposts subnets

Kubernetes authenticates all API requests to a cluster. You must configure an EKS cluster to allow nodes from a self-managed node group to join the cluster. Self-managed nodes use the node IAM role to authenticate with an EKS cluster. Amazon EKS clusters use the AWS IAM Authenticator for Kubernetes to authenticate requests and Kubernetes native Role Based Access Control (RBAC) to authorize requests. To enable self-managed nodes to register with a cluster, configure the AWS IAM Authenticator to recognize the node IAM role and assign the role to the system:bootstrappers and system:nodes RBAC groups.

In the following tutorial, we take you through the steps required to deploy EKS worker nodes on an Outpost and register those nodes with an Amazon EKS cluster running in the Region. We created a sample Terraform module aws-eks-self-managed-node-group to help you get started quickly. If you are interested, you can dive into the module sample code to see the detailed configurations for self-managed node groups.

Deploying Amazon EKS on AWS Outposts (Terraform)

To deploy Amazon EKS nodes on an AWS Outposts deployment, you must complete two high-level steps:

Step 1: Create a self-managed node group

Step 2: Configure Kubernetes to allow the nodes to register

We use Terraform in this tutorial to provide visibility (through the sample code) into the details of the configurations required to deploy self-managed node groups on AWS Outposts.

Prerequisites

To follow along with this tutorial, you should have the following prerequisites:

  • An AWS account.
  • An operational AWS Outposts deployment (Outpost) associated with your AWS account.
  • AWS Command Line Interface (CLI) version 1.25 or later installed and configured on your workstation.
  • HashiCorp Terraform version 0.14.6 or later installed on your workstation.
  • Familiarity with Terraform HashiCorp Configuration Language (HCL) syntax and experience using Terraform to deploy AWS resources.
  • An existing VPC that meets the requirements for an Amazon EKS cluster. For more information, see Cluster VPC considerations. You can use the Getting started with Amazon EKS guide to walk you through creating a VPC that meets the requirements.
  • An existing Amazon EKS cluster. You can use the Creating an Amazon EKS cluster guide to walk you through creating the cluster.
  • Tip: We recommend creating and using a new VPC and Amazon EKS cluster to complete this tutorial. Procedures like modifying the aws-auth ConfigMap on the cluster may impact other nodes, users, and workloads on the cluster. Using a new VPC and Amazon EKS cluster will help minimize the risk of impacting other applications as you complete the tutorial.
  • Note: Do not reference subnets deployed on AWS Outposts when creating Amazon EKS clusters in the Region. The Amazon EKS cluster control plane runs in the Region and attaches to VPC subnets in the Region availability zones.
  • A symmetric KMS key to encrypt the EBS volumes of the EKS worker nodes. You can use the alias/aws/ebs AWS managed key for this prerequisite.

Using the sample code

The source code for the amazon-eks-self-managed-node-group Terraform module is available in AWS-Samples on GitHub.

Setup

To follow along with this tutorial, complete the following steps to setup your workstation:

  1. Open your terminal.
  2. Make a new directory to hold your EKS on Outposts Terraform configurations.
  3. Change to the new directory.
  4. Create a new file named providers.tf with the following contents:
    terraform {
      required_providers {
        aws = {
          source  = "hashicorp/aws"
          version = "~> 3.27"
        }
    
        kubernetes = {
          source  = "hashicorp/kubernetes"
          version = "~> 1.13.3"
        }
      }
    }

  5. Keep your terminal open and do all the work in this tutorial from this directory.

Step 1: Create a self-managed node group

To create a self-managed node group on your Outpost

  1. Create a new Terraform configuration file named self-managed-node-group.tf.
  2. Configure the aws Terraform provider with your AWS CLI profile and Outpost parent Region:
    provider "aws" {
      region  = "us-west-2"
      profile = "default"
    }
    

  3. Configure the aws-eks-self-managed-node-group module with the following (minimum) arguments:
    • eks_cluster_name the name of your EKS cluster
    • instance_type an instance type supported on your AWS Outposts deployment
    • desired_capacity, min_size, and max_size as desired to control the number of nodes in your node group (ensure that your Outpost has sufficient resources available to run the desired number of nodes of the specified instance type)
    • subnets the subnet IDs of the Outpost subnets where you want the nodes deployed
    • (Optional) Kubernetes node_labels to apply to the nodes
    • Enable ebs_encrypted and configure ebs_kms_key_arn with the KMS key you want to use to encrypt the nodes’ EBS volumes (required for Outposts deployments)

Example:

module "eks_self_managed_node_group" {
  source = "github.com/aws-samples/amazon-eks-self-managed-node-group"

  eks_cluster_name = "cmluns-eks-cluster"
  instance_type    = "m5.2xlarge"
  desired_capacity = 1
  min_size         = 1
  max_size         = 1
  subnets          = ["subnet-0afb721a5cc5bd01f"]

  node_labels = {
    "node.kubernetes.io/outpost"    = "op-0d4579457ff2dc345"
    "node.kubernetes.io/node-group" = "node-group-a"
  }

  ebs_encrypted   = true
  ebs_kms_key_arn = "arn:aws:kms:us-west-2:799838960553:key/0e8f15cc-d3fc-4da4-ae03-5fadf45cc0fb"
}

You may configure other optional arguments as appropriate for your use case – see the module README for details.

Step 2: Configure Kubernetes to allow the nodes to register

Use the following procedures to configure Terraform to manage the AWS IAM Authenticator (aws-auth) ConfigMap on the cluster. This adds the node-group IAM role to the IAM authenticator and Kubernetes RBAC groups.

Configure the Terraform Kubernetes Provider to allow Terraform to interact with the Kubernetes control plane.

Note: If you add a node group to a cluster with existing node groups, mapped IAM roles, or mapped IAM users, the aws-auth ConfigMap may already be configured on your cluster. If the ConfigMap exists, you must download, edit, and replace the ConfigMap on the cluster using kubectl. We do not provide a procedure for this operation as it may affect workloads running on your cluster. Please see the section Managing users or IAM roles for your cluster in the Amazon EKS User Guide for more information.

To check if your cluster has the aws-auth ConfigMap configured

  1. Run the aws eks --region <region> update-kubeconfig --name <cluster-name> command to update your workstation’s ~/.kube/config with the information needed to connect to your cluster, substituting your <region> and <cluster-name>.
❯ aws eks --region us-west-2 update-kubeconfig --name cmluns-eks-cluster
Updated context arn:aws:eks:us-west-2:799838960553:cluster/cmluns-eks-cluster in ~/.kube/config
  2. Run the kubectl describe configmap -n kube-system aws-auth command.
  • If you receive an error stating, Error from server (NotFound): configmaps "aws-auth" not found, then proceed with the following procedures to use Terraform to apply the ConfigMap.
❯ kubectl describe configmap -n kube-system aws-auth
Error from server (NotFound): configmaps "aws-auth" not found
  • If you do not receive the preceding error, and kubectl returns an aws-auth ConfigMap, then you should not use Terraform to manage the ConfigMap.

To configure the Terraform Kubernetes provider

  1. Create a new Terraform configuration file named aws-auth-config-map.tf.
  2. Add the aws_eks_cluster Terraform data source, and configure it to look up your cluster by name.
data "aws_eks_cluster" "selected" {
  name = "cmluns-eks-cluster"
}
  3. Add the aws_eks_cluster_auth Terraform data source, and configure it to look up your cluster by name.
data "aws_eks_cluster_auth" "selected" {
  name = "cmluns-eks-cluster"
}
  4. Configure the kubernetes provider with your cluster host (endpoint address), cluster_ca_certificate, and the token from the aws_eks_cluster and aws_eks_cluster_auth data sources.
provider "kubernetes" {
  load_config_file       = false
  host                   = data.aws_eks_cluster.selected.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.selected.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.selected.token
}

To configure the AWS IAM Authenticator on the cluster

  1. Open the aws-auth-config-map.tf Terraform configuration file you created in the last step.
  2. Add a kubernetes_config_map Terraform resource to add the aws-auth ConfigMap to the kube-system namespace.
  3. Configure the data argument with a YAML-format heredoc string that adds the Amazon Resource Name (ARN) of your node group IAM role to the mapRoles list.
resource "kubernetes_config_map" "aws_auth" {
  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  data = {
    mapRoles = <<-EOT
      - rolearn: ${module.eks_self_managed_node_group.role_arn}
        username: system:node:{{EC2PrivateDNSName}}
        groups:
          - system:bootstrappers
          - system:nodes
    EOT
  }
}

Apply the configuration and verify node registration

You have created the Terraform configurations to deploy an EKS self-managed node group to your Outpost and configure Kubernetes to authenticate the nodes. Now you apply the configurations and verify that the nodes register with your Kubernetes cluster.

To apply the Terraform configurations

  1. Run the terraform init command to download the providers, self-managed node group module, and prepare the directory for use.
  2. Run the terraform apply command.
  3. Review the resources that will be created.
  4. Enter yes to confirm that you want to create the resources.
    ❯ terraform apply
    
    Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
      + create
    
    Terraform will perform the following actions:
    
    <-- Output omitted for brevity -->
    
    Plan: 9 to add, 0 to change, 0 to destroy.
    
    Do you want to perform these actions?
      Terraform will perform the actions described above.
      Only 'yes' will be accepted to approve.
    
      Enter a value: yes
    

  5. Press Enter.
    <-- Output omitted for brevity -->
    
    Apply complete! Resources: 9 added, 0 changed, 0 destroyed.
    

  6. Run the aws eks --region <region> update-kubeconfig --name <cluster-name> command to update your workstation’s ~/.kube/config with the information required to connect to your cluster – substituting your <region> and <cluster-name>.
    ❯ aws eks --region us-west-2 update-kubeconfig --name cmluns-eks-cluster
    Updated context arn:aws:eks:us-west-2:799838960553:cluster/cmluns-eks-cluster in ~/.kube/config
    

  7. Run the kubectl get nodes command to view the status of your cluster nodes.
  8. Verify your new nodes show up in the list with a STATUS of Ready.
    ❯ kubectl get nodes
    NAME                                          STATUS   ROLES    AGE   VERSION
    ip-10-253-46-119.us-west-2.compute.internal   Ready    <none>   37s   v1.18.9-eks-d1db3c
    

  9. (Optional) If your nodes are registering, and their STATUS does not show Ready, run the kubectl get nodes --watch command to watch them come online.
  10. (Optional) Run the kubectl get nodes --show-labels command to view the node list with the labels assigned to each node. The nodes created by your AWS Outposts node group will have the labels you assigned in Step 1.

To verify the Kubernetes system pods deploy on the worker nodes

  1. Run the kubectl get pods --namespace kube-system command.
  2. Verify that each pod shows READY 1/1 with a STATUS of Running.
❯ kubectl get pods --namespace kube-system
NAME                       READY   STATUS    RESTARTS   AGE
aws-node-84xlc             1/1     Running   0          2m16s
coredns-559b5db75d-jxqbp   1/1     Running   0          5m36s
coredns-559b5db75d-vdccc   1/1     Running   0          5m36s
kube-proxy-fspw4           1/1     Running   0          2m16s

The nodes in your AWS Outposts node group should be registered with the EKS control plane in the Region and the Kubernetes system pods should successfully deploy on the new nodes.
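If you want to confirm exactly which node each system pod landed on, you can add the wide output option to the same command; the NODE column should list the ip-…compute.internal names of your node group instances. This is an optional check using standard kubectl output flags:

❯ kubectl get pods --namespace kube-system -o wide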

Clean up

One of the nice things about using infrastructure as code tools like Terraform is that they automate stack creation and deletion. Use the following procedure to remove the resources you created in this tutorial.

To clean up the resources created in this tutorial

  1. Run the terraform destroy command.
  2. Review the resources that will be destroyed.
  3. Enter yes to confirm that you want to destroy the resources.
❯ terraform destroy

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  - destroy

Terraform will perform the following actions:

Plan: 0 to add, 0 to change, 9 to destroy.

Do you really want to destroy all resources?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value: yes
  4. Press Enter.

Destroy complete! Resources: 9 destroyed.

  5. Clean up any resources you created for the prerequisites.

Conclusion

In this post, we discussed how containers sit at the heart of the application modernization process, making it easy to adopt microservices architectures that improve application scalability, availability, and performance. We also outlined the challenges associated with modernizing on-premises applications with low latency, local data processing, and data residency requirements. In these cases, AWS Outposts brings cloud services, like Amazon EKS, close to the workloads being refactored. We also walked you through deploying Amazon EKS worker nodes on-premises on AWS Outposts.

Now that you have deployed Amazon EKS worker nodes in a test VPC using Terraform, you should adapt the Terraform module(s) and resources to prepare and deploy your production Kubernetes clusters on-premises on Outposts. If you don't have an Outpost and you want to get started modernizing your on-premises applications with Amazon EKS and AWS Outposts, contact your local AWS account Solutions Architect (SA) to help you get started.

Apple Mail’s iOS15 Privacy Protection Impact to Senders

Post Syndicated from Matt Strzelecki original https://aws.amazon.com/blogs/messaging-and-targeting/apple-mails-ios15-privacy-protection-impact-to-senders-2/

On June 7th at Apple's Worldwide Developers Conference (WWDC 2021), Apple announced that Apple Mail users can now choose to use Apple Mail Privacy Protection. Apple Mail Privacy Protection allows iOS to privately load remote message content, which hides recipients' mail activity information such as IP address and user agent details, including geolocation and the device(s) used to engage with the message. Apple Mail Privacy Protection eliminates the open as a reliable metric for evaluating user engagement on the sender's side, because all tracking pixels and images are cached and fired as the message hits Apple Mail. Apple is doing this to protect user information and increase privacy, while also facilitating a richer user experience: Apple Mail users can confidently open, read, and engage with messages without all of their email interactions being tracked through remote images and tracking pixels. As a result, every message with Apple Mail Privacy Protection enabled will register an open regardless of whether the recipient has actually read it. The end user will also have more confidence in the security of the message, including its links.

When a user starts Apple Mail on their iOS device, emails to that user are queued for download to their device but are first cached by Apple, including all images and pixels, on a proxy server that does not expose individual recipient IP addresses but rather a generic IP address from the Apple Cache. This happens regardless of whether the user actually opens the mail at that time. If the user opens the email, the message is pulled from the Apple Cache rather than from the original sending source, typically an email service provider (ESP). As a result, senders will not have open tracking insight, because all tracking images and pixels fire as the messages are downloaded to the Apple Cache.

Apple Mail Privacy Protection applies to email opened in the Apple Mail app. If a user engages with messages through another mail application, such as the Gmail app, Apple Mail Privacy Protection is not applied. The feature is not enabled by default, but when users first launch the Apple Mail app on iOS 15 they are prompted to enable privacy protection, and most users are expected to turn it on.

Impact to Marketers

There will be a major impact on marketers who rely heavily on open rates as a conversion metric for user engagement, because open data will be skewed: tracking pixels will fire whether or not a recipient actually engages with the message. However, other data points and user activity will still be available, such as click-through rates, onsite activity, and conversion history, and these metrics will need to supplement open tracking data. Additionally, email deliverability best practices will be more important than ever to help maintain healthy lists and a responsive user base. Best practices such as confirmed opt-in list building, list maintenance and hygiene, consistent sending patterns and cadence, and honoring opt-outs and complaints will be even more important for marketers to adhere to as they adjust to the new Mail Privacy Protection feature.

While Mail Privacy Protection reduces visibility into open rates, it benefits the user experience because user trust in messages received through Apple Mail increases. For example, users who previously chose text-only messages to protect their privacy will now receive the richer content of the full message, providing a better experience while engaging with it. Full images and content will be delivered to recipients, who will have a much higher sense of security in reading and acting on the email and its content. Prior to Apple Mail Privacy Protection, skepticism of URLs and links within messages could lead to more deletes or false positives, and potentially to more complaints and unsubscribes.

There are other benefits of Apple's Mail Privacy Protection to marketers, such as validation of email addresses. Because messages are downloaded to the Apple Cache, and the tracking image or pixel fires when they are queued for delivery to a device, a registered open confirms that the email address exists. This does not mean you should use this behavior as a validation tool; mailbox providers such as Gmail will still evaluate senders in part on list hygiene, and a high rate of invalid addresses will still harm sender reputation with those providers. Confirmed opt-in practices will be even more crucial for building healthy, long-term lists than they were before Apple Mail Privacy Protection. If you are unsure about a recipient's opt-in status, consider a re-confirmation campaign and only add back recipients who re-confirm by clicking a confirmation link in the message.

Conclusion

Email is still the most widely used communication tool, whether business-to-business, business-to-consumer, or peer-to-peer, especially when it comes to marketing. Marketers need to continue to evolve and be creative when sending messages, because email privacy and security will keep evolving and will leave behind marketers who don't keep pace. While Apple's Mail Privacy Protection reduces open-rate visibility, it provides its user base with more security and confidence in the messages delivered to their devices. That confidence allows marketers to focus on developing richer content for a better user experience and on driving conversions rather than just opens.

Developing and managing a list with proper confirmed opt-in methods is crucial to building long-term email lists and earning the trust of your recipients. The implementation of Apple Mail Privacy Protection reinforces this principle.

Lastly, email privacy and security will continue to advance, and marketers, along with email service providers, should not try to "get around" these privacy features; rather, they need to understand that these features are intended to help the end user and your customers. Work from the principle of providing customers what they want to receive, nothing more and nothing less, and you can help your emails thrive. Stay tuned for more updates as they become available.

Enabling parallel file systems in the cloud with Amazon EC2 (Part I: BeeGFS)

Post Syndicated from Ben Peven original https://aws.amazon.com/blogs/compute/enabling-parallel-file-systems-in-the-cloud-with-amazon-ec2-part-i-beegfs/

This post was authored by AWS Solutions Architects Ray Zaman, David Desroches, and Ameer Hakme.

In this blog series, you will discover how to build and manage your own Parallel Virtual File System (PVFS) on AWS. In this post you will learn how to deploy the popular open source parallel file system, BeeGFS, using AWS D3en and I3en EC2 instances. We will also provide a CloudFormation template to automate this BeeGFS deployment.

A PVFS is a type of distributed file system that distributes file data across multiple servers and provides concurrent data access to multiple execution tasks of an application. PVFS focuses on high-performance access to large datasets. It consists of a server process and a client library, which allows the file system to be mounted and used with standard utilities. PVFS on the Linux OS originated in the 1990s, and today several projects are available, including Lustre, GlusterFS, and BeeGFS. Workloads such as shared storage for video transcoding and export, batch processing jobs, high frequency online transaction processing (OLTP) systems, and scratch storage for high performance computing (HPC) benefit from the high throughput and performance provided by PVFS.

Implementation of a PVFS can be complex and expensive. There are many variables you will want to take into account when designing a PVFS cluster including the number of nodes, node size (CPU, memory), cluster size, storage characteristics (size, performance), and network bandwidth. Due to the difficulty in estimating the correct configuration, systems procured for on-premises data centers are typically oversized, resulting in additional costs, and underutilized resources. In addition, the hardware procurement process is lengthy and the installation and maintenance of the hardware adds additional overhead.

AWS makes it easy to run and fully manage your parallel file systems by allowing you to choose from a variety of Amazon Elastic Compute Cloud (EC2) instances. EC2 instances are available on-demand and allow you to scale your workload as needed. AWS storage-optimized EC2 instances offer up to 60 TB of NVMe SSD storage per instance and up to 336 TB of local HDD storage per instance. With storage-optimized instances, you can easily deploy PVFS to support workloads requiring high-performance access to large datasets. You can test and iterate on different instances to find the optimal size for your workloads.

D3en instances leverage 2nd-generation Intel Xeon Scalable Processors (Cascade Lake) and provide a sustained all core frequency up to 3.1 GHz. These instances provide up to 336 TB of local HDD storage (which is the highest local storage capacity in EC2), up to 6.2 GiBps of disk throughput, and up to 75 Gbps of network bandwidth.

I3en instances are powered by 1st or 2nd generation Intel® Xeon® Scalable (Skylake or Cascade Lake) processors with 3.1 GHz sustained all-core turbo performance. These instances provide up to 60 TB of NVMe storage, up to 16 GB/s of sequential disk throughput, and up to 100 Gbps of network bandwidth.

BeeGFS, originally released by ThinkParQ in 2014, is an open source, software defined PVFS that runs on Linux. You can scale the size and performance of the BeeGFS file-system by configuring the number of servers and disks in the clusters up to thousands of nodes.

BeeGFS architecture

D3en instances offer HDD storage while I3en instances offer NVMe SSD storage. This diversity allows you to create tiers of storage based on performance requirements. In the example presented in this post, you will use four d3en.8xlarge (32 vCPU, 128 GB memory, 16 x 14-TB HDD, 50-Gbps network) and two i3en.12xlarge (48 vCPU, 384 GB memory, 4 x 7.5-TB NVMe) instances to create two storage tiers. You may choose different sizes and quantities to meet your needs. The i3en instances, with SSD, will be configured as tier 1, and the d3en instances, with HDD, will be configured as tier 2. One disk from each instance will be formatted as ext4 and used for metadata, while the remaining disks will be formatted as XFS and used for storage. You may choose to separate metadata and storage on different hosts for workloads where these must scale independently. The disks will be configured as RAID 0, since this provides maximum performance. Software replication or other RAID types can be employed for higher durability.


Figure 1: BeeGFS architecture

You will deploy all instances within a single VPC in the same Availability Zone and subnet to minimize latency. Security groups must be configured to allow the following ports:

  • Management service (beegfs-mgmtd): 8008
  • Metadata service (beegfs-meta): 8005
  • Storage service (beegfs-storage): 8003
  • Client service (beegfs-client): 8004

You will use the Debian Quick Start Amazon Machine Image (AMI) as it supports BeeGFS. You can enable Amazon CloudWatch to capture metrics.

How to deploy the BeeGFS architecture

Follow the steps below to create the PVFS described above. For automated deployment, use the CloudFormation template located at AWS Samples.

  1. Use the AWS Management Console or CLI to deploy one D3en.8xlarge instance into a VPC as described above.
  2. Log in to the instance and update the system:
    • sudo apt update
    • sudo apt upgrade
  3. Install the XFS utilities and load the kernel module:
    • sudo apt-get -y install xfsprogs
    • sudo modprobe -v xfs

Format the first disk as ext4, since it is used for metadata; format the rest as XFS. The disks appear as "nvme???" device names, which on the D3en instances actually represent the HDD drives.

4. View a listing of available disks:

    • sudo lsblk

5. Format hard disks:

    • sudo mkfs -t ext4 /dev/nvme0n1
    • sudo mkfs -t xfs /dev/nvme1n1
    • Repeat this command for disks nvme2n1 through nvme15n1

6. Create file system mount points:

    • sudo mkdir /disk00
    • sudo mkdir /disk01
    • Repeat this command for disks disk02 through disk15

7. Mount the filesystems:

    • sudo mount /dev/nvme0n1 /disk00
    • sudo mount /dev/nvme1n1 /disk01
    • Repeat this command for disks disk02 through disk15

Repeat steps 1 through 7 on the remaining nodes. Remember to account for fewer disks for i3en.12xlarge instances or if you decide to use different instance sizes.
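To cut down on the repetition in steps 5 through 7, you can script the format and mount commands. The following bash sketch assumes the device naming shown above, where nvme0n1 is the ext4 metadata disk mounted at /disk00 and nvme1n1 through nvme15n1 are the XFS data disks; verify the device names with lsblk first and lower the disk count on i3en.12xlarge nodes.

# Metadata disk (ext4) mounted at /disk00
sudo mkfs -t ext4 /dev/nvme0n1
sudo mkdir -p /disk00
sudo mount /dev/nvme0n1 /disk00

# Data disks (XFS); 15 data disks on d3en.8xlarge, 3 on i3en.12xlarge
for i in $(seq 1 15); do
  disk=$(printf '%02d' "$i")
  sudo mkfs -t xfs "/dev/nvme${i}n1"
  sudo mkdir -p "/disk${disk}"
  sudo mount "/dev/nvme${i}n1" "/disk${disk}"
done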

8. Add the BeeGFS Repo to each node:

    • sudo apt-get -y install gnupg
    • wget https://www.beegfs.io/release/beegfs_7.2.3/dists/beegfs-deb10.list
    • sudo cp beegfs-deb10.list /etc/apt/sources.list.d/
    • sudo wget -q https://www.beegfs.io/release/latest-stable/gpg/DEB-GPG-KEY-beegfs -O- | sudo apt-key add -
    • sudo apt update

9. Install BeeGFS management (node 1 only):

    • sudo apt-get -y install beegfs-mgmtd
    • sudo mkdir /beegfs-mgmt
    • sudo /opt/beegfs/sbin/beegfs-setup-mgmtd -p /beegfs-mgmt/beegfs/beegfs_mgmtd

10. Install BeeGFS metadata and storage (all nodes):

    • sudo apt-get -y install beegfs-meta beegfs-storage beegfs-client beegfs-helperd beegfs-utils
    • # -s is unique ID based on node - change this!, -m is hostname of management server
    • sudo /opt/beegfs/sbin/beegfs-setup-meta -p /disk00/beegfs/beegfs_meta -s 1 -m ip-XXX-XXX-XXX-XXX
    • # Change -s to nodeID and -i to (nodeid)0(disk), -m is hostname of management server
    • sudo /opt/beegfs/sbin/beegfs-setup-storage -p /disk01/beegfs_storage -s 1 -i 101 -m ip-XXX-XXX-XXX-XXX
    • sudo /opt/beegfs/sbin/beegfs-setup-storage -p /disk02/beegfs_storage -s 1 -i 102 -m ip-XXX-XXX-XXX-XXX
    • Repeat this last command for the remaining disks disk03 through disk15
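Because the -s node ID and -i target ID change for every node and disk, a small loop helps avoid typos. The sketch below is illustrative: NODE_ID, the management hostname, and the data disk count are placeholders you set per node, and the target ID follows the (nodeid)(two-digit disk) pattern used in the commands above.

NODE_ID=1                      # unique ID for this node; change on every node
MGMT_HOST=ip-XXX-XXX-XXX-XXX   # hostname of the management server
DATA_DISKS=15                  # 15 on d3en.8xlarge, 3 on i3en.12xlarge

for i in $(seq 1 "$DATA_DISKS"); do
  disk=$(printf '%02d' "$i")
  sudo /opt/beegfs/sbin/beegfs-setup-storage \
    -p "/disk${disk}/beegfs_storage" \
    -s "$NODE_ID" \
    -i "${NODE_ID}${disk}" \
    -m "$MGMT_HOST"
done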

11. Start the services:

    • #Only on node1
    • sudo systemctl start beegfs-mgmtd
    • #All servers
    • sudo systemctl start beegfs-meta
    • sudo systemctl start beegfs-storage

At this point, your BeeGFS cluster is running and ready for use by a client system. The client system requires BeeGFS client software in order to mount the cluster.
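Before you set up the client, you can optionally confirm that the services started cleanly on each node using systemd; beegfs-mgmtd exists only on node 1.

sudo systemctl status beegfs-meta beegfs-storage
# On node 1 only
sudo systemctl status beegfs-mgmtd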

12. Deploy an m5n.2xlarge instance into the same subnet as the PVFS cluster.

13. Log in to the instance, install, and configure the client:

    • sudo apt update
    • sudo apt upgrade
    • sudo apt-get -y install gnupg
    • #Need linux sources for client compilation
    • sudo apt-get -y install linux-source
    • sudo apt-get -y install linux-headers-4.19.0-14-all
    • wget https://www.beegfs.io/release/beegfs_7.2.3/dists/beegfs-deb10.list
    • sudo cp beegfs-deb10.list /etc/apt/sources.list.d/
    • sudo wget -q https://www.beegfs.io/release/latest-stable/gpg/DEB-GPG-KEY-beegfs -O- | sudo apt-key add -
    • sudo apt update
    • sudo apt-get -y install beegfs-client beegfs-helperd beegfs-utils
    • sudo /opt/beegfs/sbin/beegfs-setup-client -m ip-XXX-XXX-XXX-XX # use the ip address of the management node
    • sudo systemctl start beegfs-helperd
    • sudo systemctl start beegfs-client

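Once the client services are running, you can optionally verify the mount and confirm that the client sees all of the servers. This quick check assumes the default /mnt/beegfs client mountpoint and uses the beegfs-ctl utility installed with beegfs-utils (check beegfs-ctl --help on your version for the exact node type names):

df -h /mnt/beegfs
sudo beegfs-ctl --listnodes --nodetype=meta
sudo beegfs-ctl --listnodes --nodetype=storage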
14. Create the storage pools:

    • sudo beegfs-ctl --addstoragepool --desc="tier1" --targets=501,502,503,601,602,603
    • sudo beegfs-ctl --addstoragepool --desc="tier2" --targets=101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415
    • sudo beegfs-ctl --liststoragepools
    Pool ID   Pool Description                      Targets                 Buddy Groups
    ======= ================== ============================ ============================
          1            Default
          2              tier1     501,502,503,601,602,603
          3              tier2 101,102,103,104,105,106,107,
                               108,109,110,111,112,113,114,
                               115,201,202,203,204,205,206,
                               207,208,209,210,211,212,213,
                               214,215,301,302,303,304,305,
                               306,307,308,309,310,311,312,
                               313,314,315,401,402,403,404,
                               405,406,407,408,409,410,411,
                               412,413,414,415

15. Assign the storage pools to directories in the file system:

    • sudo beegfs-ctl --setpattern --storagepoolid=2 /mnt/beegfs/tier1
    • sudo beegfs-ctl --setpattern --storagepoolid=3 /mnt/beegfs/tier2

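Note that beegfs-ctl --setpattern operates on directories that already exist under the client mountpoint. If the setpattern commands above report that tier1 or tier2 does not exist, create the directories on the client first; a minimal sketch assuming the default /mnt/beegfs mountpoint:

sudo mkdir -p /mnt/beegfs/tier1 /mnt/beegfs/tier2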
The BeeGFS PVFS is now ready to be used by the client system.

How to test your new BeeGFS PVFS

BeeGFS provides StorageBench to evaluate the performance of BeeGFS on the storage targets. This benchmark measures the streaming throughput of the underlying file system and devices independent of the network performance. To simulate client I/O, this benchmark generates read/write workloads locally on the servers without any client communication.

It is possible to benchmark specific targets or all targets together using the "servers" parameter. A "read" or "write" parameter sets the type of test to perform. The "threads" parameter is set to the number of storage devices.

Try the following commands to test performance:

Write test (1x d3en):

sudo beegfs-ctl --storagebench --servers=1 --write --blocksize=512K --size=20G --threads=15

Write test (4x d3en):

sudo beegfs-ctl --storagebench --alltargets --write --blocksize=512K --size=20G --threads=15

Read test (4x d3en):

sudo beegfs-ctl --storagebench --servers=1,2,3,4 --read --blocksize=512K --size=20G --threads=15

Write test (1x i3en):

sudo beegfs-ctl --storagebench --servers=5 --write --blocksize=512K --size=20G --threads=3

Read test (2x i3en):

sudo beegfs-ctl --storagebench --servers=5,6 --read --blocksize=512K --size=20G --threads=3
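StorageBench runs asynchronously: the commands above start a benchmark and return immediately. To check progress and read the per-target results, and to remove the generated benchmark files afterwards, you can use the status and cleanup arguments of the storagebench mode (verify the exact flags against your installed BeeGFS version):

sudo beegfs-ctl --storagebench --alltargets --status
sudo beegfs-ctl --storagebench --alltargets --cleanup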

StorageBench is a great way to test what the potential performance of a given environment looks like by reducing variables like network throughput and latency, but you may want to test in a more real-world fashion. For this, tools like ‘fio’ can generate mixed read/write workloads against files on the client BeeGFS mountpoint.

First, we need to define which directory goes to which Storage Pool (tier) by setting a pattern:

sudo beegfs-ctl --setpattern --storagepoolid=2 /mnt/beegfs/tier1
sudo beegfs-ctl --setpattern --storagepoolid=3 /mnt/beegfs/tier2

You can see how a file gets striped across the various disks in a pool by adding a file and running the command:

sudo beegfs-ctl --getentryinfo /mnt/beegfs/tier1/myfile.bin

Install fio:

sudo apt-get install -y fio

Now you can run a fio test against one of the tiers. This example command runs eight jobs with a 75/25 read/write workload against a 10-GB file:

sudo fio --numjobs=8 --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=/mnt/beegfs/tier1/test --bs=512k --iodepth=64 --size=10G --readwrite=randrw --rwmixread=75

Cleaning up

To avoid ongoing charges for the resources you created, you should:

  • Terminate the four D3en and two I3en storage server instances and the m5n.2xlarge client instance, or delete the CloudFormation stack if you used the automated deployment.
  • Remove any other resources you created for this walkthrough, such as the VPC and security groups, if you no longer need them.
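If you used the CloudFormation template, deleting the stack removes the instances it created; otherwise, terminate the instances directly. A minimal AWS CLI sketch with placeholder identifiers:

# Placeholder stack name; use the name you gave the CloudFormation stack
aws cloudformation delete-stack --stack-name beegfs-pvfs

# Placeholder instance IDs; list the IDs of the server and client instances you launched
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0 i-0fedcba9876543210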

Conclusion

In this blog post we demonstrated how to build and manage your own BeeGFS Parallel Virtual File System on AWS. In this example, you created two storage tiers using the I3en and D3en. The I3en was used as the first tier for SSD storage and the D3en was used as a second tier for HDD storage. By using two different tiers, you can optimize performance to meet your application requirements.

Amazon EC2 storage-optimized instances make it easy to deploy the BeeGFS Parallel Virtual File System. Using combinations of SSD and HDD storage available on the I3en and D3en instance types, you can achieve the capacity and performance needed to run the most demanding workloads. Read more about the D3en and I3en instances.