Tag Archives: Amazon EC2

Automate Amazon EC2 instance isolation by using tags

Post Syndicated from Jose Obando original https://aws.amazon.com/blogs/security/automate-amazon-ec2-instance-isolation-by-using-tags/

Containment is a crucial part of an overall incident response strategy, because it gives responders time to perform forensics, eradication, and recovery during an incident. There are many different approaches to containment. In this post, we focus on isolation—the ability to keep multiple targets separated so that each target only sees and affects itself—as a containment strategy.

I’ll show you how to automate isolation of an Amazon Elastic Compute Cloud (Amazon EC2) instance by using an AWS Lambda function that’s triggered by tag changes on the instance, as reported by Amazon CloudWatch Events.

CloudWatch Event Rules deliver a near real-time stream of system events that describe changes in Amazon Web Services (AWS) resources. See also Amazon EventBridge.

Preparing for an incident is important, as outlined in the Security Pillar of the AWS Well-Architected Framework.

Of the seven design principles for security in the cloud defined in the Well-Architected Framework, this solution covers the following:

  • Enable traceability: Monitor, alert, and audit actions and changes to your environment in real time. Integrate log and metric collection with systems to automatically investigate and take action.
  • Automate security best practices: Automated software-based security mechanisms can improve your ability to securely scale more rapidly and cost-effectively. Create secure architectures, including through the implementation of controls that can be defined and managed by AWS as code in version-controlled templates.
  • Prepare for security events: Prepare for an incident by implementing incident management and investigation policy and processes that align to your organizational requirements. Run incident response simulations and use tools with automation to help increase your speed for detection, investigation, and recovery.

After detecting an event in the Detection phase and analyzing in the Analysis phase, you can automate the process of logically isolating an instance from a Virtual Private Cloud (VPC) in Amazon Web Services (AWS).

In this blog post, I describe how to automate EC2 instance isolation by using the tagging feature that a responder can use to identify instances that need to be isolated. A Lambda function then uses AWS API calls to isolate the instances by performing the actions described in the following sections.

Use cases

Your organization can use automated EC2 instance isolation for scenarios like these:

  • A security analyst wants to automate EC2 instance isolation in order to respond to security events in a timely manner.
  • A security manager wants to provide their first responders with a way to quickly react to security incidents without providing too much access to higher-privilege security features.

High-level architecture and design

The example solution in this blog post uses a CloudWatch Events rule to trigger a Lambda function. The CloudWatch Events rule is triggered when a tag is applied to an EC2 instance. The Lambda code triggers further actions based on the contents of the event. Note that the CloudFormation template includes the appropriate permissions to run the function.
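The rule's event pattern matches tag change events on EC2 instances. For illustration, the following Python (boto3) sketch creates an equivalent rule and target; the CloudFormation template provisions the real rule for you, and the tag key and Lambda ARN below are placeholders.

import json
import boto3

eventsClient = boto3.client('events')

# Placeholder tag key: use the key you define during CloudFormation stack creation
tagKey = 'SecurityIsolation'

# Match "Tag Change on Resource" events for EC2 instances where that key changed
eventPattern = {
    'source': ['aws.tag'],
    'detail-type': ['Tag Change on Resource'],
    'detail': {
        'service': ['ec2'],
        'resource-type': ['instance'],
        'changed-tag-keys': [tagKey]
    }
}

eventsClient.put_rule(
    Name='EC2TagChangeEvent',
    EventPattern=json.dumps(eventPattern),
    State='ENABLED')

eventsClient.put_targets(
    Rule='EC2TagChangeEvent',
    Targets=[{'Id': 'IsolationLambdaFunction', 'Arn': '<isolation-lambda-arn>'}])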

The event flow is shown in Figure 1 and works as follows:

  1. The EC2 instance is tagged.
  2. The CloudWatch Events rule filters the event.
  3. The Lambda function is invoked.
  4. If the criteria are met, the isolation workflow begins.

When the Lambda function is invoked and the criteria are met, these actions are performed:

  1. Checks for IAM instance profile associations.
  2. If the instance is associated with a role, the Lambda function disassociates that role.
  3. Applies the isolation role that you defined during CloudFormation stack creation.
  4. Checks the VPC where the EC2 instance resides.
    • If there is no isolation security group in the VPC (if the VPC is new, for example), the function creates one.
  5. Applies the isolation security group.

Note: If you had a security group with an open (0.0.0.0/0) outbound rule, and you apply this isolation security group, your existing SSH connections to the instance are immediately dropped. On the other hand, if you have a narrower inbound rule that initially allowed the SSH connection, the existing connection is not broken by changing the group. This behavior is known as connection tracking.
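As an illustration of steps 1 through 3, the instance profile swap can be performed with boto3 calls along the following lines. This is a sketch only: the isolation profile name is a placeholder for the instance profile created by the stack, and error handling is omitted.

import boto3

ec2Client = boto3.client('ec2')

def replaceInstanceProfile(instanceId, isolationProfileName):
    # Step 1: check for an existing IAM instance profile association
    associations = ec2Client.describe_iam_instance_profile_associations(
        Filters=[{'Name': 'instance-id', 'Values': [instanceId]}]
    )['IamInstanceProfileAssociations']

    # Step 2: disassociate the current profile, if any
    for association in associations:
        ec2Client.disassociate_iam_instance_profile(
            AssociationId=association['AssociationId'])

    # Step 3: attach the isolation instance profile defined during stack creation
    ec2Client.associate_iam_instance_profile(
        IamInstanceProfile={'Name': isolationProfileName},
        InstanceId=instanceId)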

Figure 1: High-level diagram showing event flow

For the deployment method, we use an AWS CloudFormation template. AWS CloudFormation gives you an easy way to model a collection of related AWS and third-party resources, provision them quickly and consistently, and manage them throughout their lifecycles by treating infrastructure as code.

The AWS CloudFormation template that I provide here deploys the following resources:

  • An EC2 instance role for isolation – this is attached to the EC2 instance to prevent further communication with other AWS services, thus limiting the attack surface of your overall infrastructure.
  • An Amazon CloudWatch Events rule – this is used to detect changes to an EC2 resource, in this case a tag change event. It is the trigger for the Lambda function.
  • An AWS Identity and Access Management (IAM) role for the Lambda function – this is the role that the Lambda function assumes to run the isolation workflow.
  • A Lambda function for automation – this function contains all of the decision logic; once triggered, it follows the steps described in the following section.
  • Lambda function permissions – these allow the CloudWatch Events rule to invoke the Lambda function.
  • An IAM instance profile – this is a container for an IAM role that you can use to pass role information to an EC2 instance.

Supporting functions within the Lambda function

Let’s dive deeper into each supporting function inside the Lambda code.

The following function identifies the virtual private cloud (VPC) ID for a given instance. This is needed to identify which security groups are present in the VPC.

import boto3

ec2Client = boto3.client('ec2')

def identifyInstanceVpcId(instanceId):
    # Describe the instance and return the ID of the VPC it runs in
    instanceReservations = ec2Client.describe_instances(InstanceIds=[instanceId])['Reservations']
    for instanceReservation in instanceReservations:
        instancesDescription = instanceReservation['Instances']
        for instance in instancesDescription:
            return instance['VpcId']

The following function modifies the security group of an EC2 instance.

def modifyInstanceAttribute(instanceId,securityGroupId):
    # Replace all security groups currently attached to the instance
    # with the single isolation security group
    response = ec2Client.modify_instance_attribute(
        Groups=[securityGroupId],
        InstanceId=instanceId)

The following function creates a security group on a VPC that blocks all egress access to the security group.

def createSecurityGroup(groupName, descriptionString, vpcId):
    # Create a new security group and remove its default egress rule;
    # with no inbound or outbound rules, it isolates any instance it's attached to
    resource = boto3.resource('ec2')
    securityGroupId = resource.create_security_group(GroupName=groupName, Description=descriptionString, VpcId=vpcId)
    securityGroupId.revoke_egress(IpPermissions= [{'IpProtocol': '-1','IpRanges': [{'CidrIp': '0.0.0.0/0'}],'Ipv6Ranges': [],'PrefixListIds': [],'UserIdGroupPairs': []}])
    return securityGroupId
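For illustration, the helpers above can be wired together inside the Lambda handler roughly as follows. This is a simplified sketch rather than the exact function from the template, and the group name and description are placeholders.

ISOLATION_GROUP_NAME = 'IsolationSecurityGroup'  # placeholder name

def isolateInstance(instanceId):
    vpcId = identifyInstanceVpcId(instanceId)

    # Reuse the isolation security group if one already exists in this VPC
    existingGroups = ec2Client.describe_security_groups(
        Filters=[{'Name': 'vpc-id', 'Values': [vpcId]},
                 {'Name': 'group-name', 'Values': [ISOLATION_GROUP_NAME]}]
    )['SecurityGroups']

    if existingGroups:
        securityGroupId = existingGroups[0]['GroupId']
    else:
        # createSecurityGroup returns a boto3 SecurityGroup resource object
        securityGroupId = createSecurityGroup(
            ISOLATION_GROUP_NAME, 'Isolation security group', vpcId).id

    # Replace every security group on the instance with the isolation group
    modifyInstanceAttribute(instanceId, securityGroupId)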

Deploy the solution

To deploy the solution provided in this blog post, first download the CloudFormation template, and then set up a CloudFormation stack that specifies the tags that are used to trigger the automated process.

Download the CloudFormation template

To get started, download the CloudFormation template from Amazon S3. Alternatively, you can launch the CloudFormation template by selecting the following Launch Stack button:


Deploy the CloudFormation stack

Start by uploading the CloudFormation template to your AWS account.

To upload the template

  1. From the AWS Management Console, open the CloudFormation console.
  2. Choose Create Stack.
  3. Choose With new resources (standard).
  4. Choose Upload a template file.
  5. Select Choose File, and then select the YAML file that you just downloaded.
Figure 2: CloudFormation stack creation

Specify stack details

You can leave the default values for the stack as long as there aren’t already resources provisioned with the same names, such as an IAM role. For example, if you keep the default values, an IAM role named “SecurityIsolation-IAMRole” will be created. The naming convention is fully customizable from this screen: you can enter your choice of name for the CloudFormation stack and modify the parameters as you see fit. Figure 3 shows the parameters that you can set.

The Evaluation Parameters section defines the tag key and value that will initiate the automated response. Keep in mind that these values are case-sensitive.

Figure 3: CloudFormation stack parameters

Choose Next until you reach the final screen, shown in Figure 4, where you acknowledge that an IAM role is created and you trust the source of this template. Select the check box next to the statement I acknowledge that AWS CloudFormation might create IAM resources with custom names, and then choose Create Stack.

Figure 4: CloudFormation IAM notification

After you complete these steps, the following resources will be provisioned, as shown in Figure 5:

  • EC2IsolationRole
  • EC2TagChangeEvent
  • IAMRoleForLambdaFunction
  • IsolationLambdaFunction
  • IsolationLambdaFunctionInvokePermissions
  • RootInstanceProfile
Figure 5: CloudFormation created resources

Testing

To start your automation, tag an EC2 instance using the tag defined during the CloudFormation setup. If you’re using the Amazon EC2 console, you can apply tags to resources by using the Tags tab on the relevant resource screen, or you can use the Tags screen, the AWS CLI, or an AWS SDK. A detailed walkthrough for each approach can be found in the Amazon EC2 documentation.
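If you prefer to apply the tag programmatically, a call such as the following sketch starts the workflow. The instance ID, tag key, and tag value are placeholders and must match the Evaluation Parameters (case-sensitive) that you chose during stack creation.

import boto3

ec2Client = boto3.client('ec2')

# Placeholder instance ID, tag key, and tag value
ec2Client.create_tags(
    Resources=['i-0123456789abcdef0'],
    Tags=[{'Key': 'SecurityIsolation', 'Value': 'true'}])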

Reverting Changes

If you need to remove the restrictions applied by this workflow, complete the following steps.

  1. From the EC2 dashboard, in the Instances section, check the box next to the instance you want to modify.

    Figure 6: Select the instance to modify

  2. In the top right, select Actions, choose Instance settings, and then choose Modify IAM role.

    Figure 7: Choose Actions > Instance settings > Modify IAM role

  3. Under IAM role, choose the IAM role to attach to your instance, and then select Save.

    Figure 8: Choose the IAM role to attach

  4. Select Actions, choose Networking, and then choose Change security groups.

    Figure 9: Choose Actions > Networking > Change security groups

  5. Under Associated security groups, select Remove and add a different security group with the access you want to grant to this instance.

Summary

Using the CloudFormation template provided in this blog post, a Security Operations Center analyst who has only tagging privileges can isolate an EC2 instance based on this tag. Alternatively, a security service such as Amazon GuardDuty could trigger a Lambda function to apply the tag as part of a workflow. This means the Security Operations Center analyst wouldn’t need administrative privileges over the EC2 service.

This solution creates an isolation security group for any new VPCs that don’t have one already. The security group would still follow the naming convention defined during the CloudFormation stack launch, but won’t be part of the provisioned resources. If you decide to delete the stack, manual cleanup would be required to remove these security groups.

This solution terminates established inbound Secure Shell (SSH) sessions that are associated to the instance, and isolates the instance from new connections either inbound or outbound. For outbound connections that are already established (for example, reverse shell), you either need to shut down the network interface card (NIC) at the operating system (OS) level, restart the instance network stack at the OS level, terminate the established connections, or apply a network access control list (network ACL).




Author

Jose Obando

Jose is a Security Consultant on the Global Financial Services team. He helps the world’s top financial institutions improve their security posture in the cloud. He has a background in network security and cloud architecture. In his free time, you can find him playing guitar or training in Muay Thai.

Field Notes: Streaming VR to Wireless Headsets Using NVIDIA CloudXR

Post Syndicated from William Cannady original https://aws.amazon.com/blogs/architecture/field-notes-streaming-vr-to-wireless-headsets-using-nvidia-cloudxr/

It’s exciting to see many consumer-grade virtual reality (VR) hardware options, but setting up hardware can be cumbersome, expensive and complicated. Wired headsets require high-powered graphics workstations, and a solution to prevent you from tripping over the wires. Many room-scale headsets require two external peripherals (or ‘light towers’) to be installed so the headset can position itself in a room. These setups can take days to tune, and need resetting if the light towers are moved.

With the release of the Oculus Quest, users of virtual reality were delighted with a wireless, room-scale headset with dual hand tracking. They could enjoy VR without worrying about light towers or a high-powered graphics workstation. However, because the Quest was battery powered, it used an inherently low-powered central processing unit (CPU) and graphics processing unit (GPU). As a result, VR content had to be simplified to run on the Quest. This prevented customers from using the Quest for the most demanding graphics experiences, such as Computer Aided Design (CAD) review, or playing games such as Half-Life: Alyx.

Customers were faced with a difficult choice: expensive, complicated setups, or reduced-fidelity experiences.

In this blog post, we show you how to stream a full-fidelity VR experience from a computer on AWS to a wireless headset such as the Quest.

Overview of architecture

NVIDIA CloudXR takes NVIDIA’s experience in GPU encoding and decoding, and streams pixels to a remotely connected VR headset. By doing this, the rendering and compute requirements of visually intensive applications take place on a remote server instead of a local headset. This makes mobile headsets work with any application, regardless of their visual complexity and density.

 

Figure 1: architecture for streaming VR experiences from the AWS Cloud to a VR headset using NVIDIA’s CloudXR server running on EC2.

To provide global scalability, NVIDIA announced that the CloudXR platform will be available on G4 and P3 EC2 instances. It provides the following benefits:

  • At a global scale, customers can stream remote AR/VR experiences from AWS Regions that are close to them.
  • It enables centrally managed and deployed software experiences on Amazon Elastic Compute Cloud (Amazon EC2) instances. Previously these required physical transportation and implementation of devices and server hardware.
  • Lastly, IT administrators can now centrally manage content that may be sensitive or require frequent changes.

Walkthrough

Using CloudXR on AWS requires EC2 instances with NVIDIA GPUs (that is, the P3 or G4 instance types) running within your virtual private cloud (VPC). The instance must be network accessible to a remote CloudXR client running on a VR headset. Connections are 1:1, meaning that each CloudXR client connects to a dedicated EC2 instance. If you need multiple CloudXR clients, you can deploy multiple EC2 instances.

Note that the process outlined here is accurate as of January 2021. CloudXR, and X Reality (XR) overall, is rapidly changing, so consult the latest information about CloudXR from NVIDIA. Using CloudXR within your AWS account requires you to set up P3 or G4 EC2 instances within an Amazon VPC, as you normally would. You must also add a security group that allows the ports required for CloudXR communication. These specific ports can be found in the CloudXR documentation, available from NVIDIA.

For reference, we have created a CloudFormation template that deploys an EC2 instance with CloudXR configured; it is linked in the prerequisites. Because the template references a private AMI, the AMI must be shared with your account in order for the stack to deploy successfully. If you’re interested in using this template, contact your AWS Account team.

Prerequisites

The following steps describe how to configure the EC2 instance manually. CloudXR streaming requires using a connection other than Windows RDP to connect to the remote EC2 instance. We use NICE DCV, which is provided at no cost to EC2 instances for remote connectivity.

For this walkthrough, you should have the following prerequisites:

Deploy CloudXR Server onto Amazon EC2

It’s important to note the steps outlined are for configuring a G4 instance. If you’d prefer to use a P3 instance, manually deploy your P3 instance and install NICE DCV as described in the documentation.

  1. Log into your AWS Account and navigate to the AWS Marketplace to install an EC2 instance with NICE DCV configured.
  2. Create a new security group during deployment that matches the CloudXR port settings and apply it to your instance. Consult the CloudXR documentation for the latest port settings.
  3. Wait 5 minutes for everything to initialize properly. Make note of the instance’s public IP address (or attach an Elastic IP address to the instance).
  4. Navigate to https://<IP-OF-INSTANCE>:8443 to connect to the NICE DCV web browser client, and then use the credentials created during EC2 initialization to log in.

NICE DCV login screen on Web Browser Client

5. Once logged into your EC2 instance, install SteamVR and CloudXR onto the remote EC2 instance. SteamVR is used as an OpenVR/XR proxy between your VR application and CloudXR. CloudXR is used to stream the SteamVR experience to a remote CloudXR Client.

6. Verify installation of the CloudXR plugin into SteamVR by navigating to the Manage Add-ons page within the Advanced Settings option. Make sure it lists CloudXRRemoteHMD as an addon and is set to ON.

Verification of CloudXR Installation

7. Add an allow entry to the Windows Firewall for vrserver.exe. This allows SteamVR to stream properly through CloudXR. By default, this file is located at %ProgramFiles(x86)%\Steam\steamapps\common\SteamVR\bin\win64\vrserver.exe

Enabling the VRSERVER.EXE application through the Windows Application Firewall.

 

8. Install a CloudXR client onto your VR headset. If you’re using an Android-powered headset (that is, the Oculus Quest), you can use the sample APK within the CloudXR SDK.

9. Select Finish.

Connect to your CloudXR Server and start streaming

1. Launch SteamVR on your remote EC2 instance by logging into your Steam account or configuring a no-login link following the Installation/use of SteamVR in an environment without internet access instructions.

2. When loaded, it will report a headset cannot be detected. This is OK.

SteamVR will display Headset not Detected—this is OK.

3. Within your Client headset, load the CloudXR Client application you recently installed.

4. Once connected, the headset will start to display the SteamVR “void”. You should also see a view from your headset if SteamVR mirroring is enabled. The status box on the SteamVR server application will show a headset and two controllers attached as well.

SteamVR “Void” and Headset Connected icons

5. Congratulations. You’re now connected to an AWS EC2 instance using NVIDIA CloudXR! Any VR application you now run on the EC2 server that uses OpenVR will be streamed to your VR headset!

Cleaning up

EC2 instances are billed only while they’re running, so make sure to stop your instance or shut it down when you’re finished with your session. Terminating your instance is not necessary.

Conclusion

In this blog post, we showed how to stream a full-fidelity VR experience from a computer on AWS to a wireless headset. Having the ability to remotely connect to GPU-powered servers to run graphic workloads is not necessarily new, but connecting to a remote server with a VR headset and having full interactivity certainly is. With this architecture, you realize the benefits of CloudXR combined with the agility and scalability available on AWS. It becomes less challenging to manage content played on VR headsets because content doesn’t reside on the VR headset—it lives on the EC2 server.

Deploying to any AWS Region where GPU instances are available allows you to offer CloudXR to your users at global scale. As networks get faster and closer through services like AWS Outposts and AWS Wavelength, remote VR work will become possible for more customers. We’re excited to see what new workloads come next as this way of working grows.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

 

Leo Chan

Leo Chan is the Worldwide Tech Lead for Spatial Computing at AWS. He loves working at the intersection of Art and Technology and lives with his family in a rain forest just off the coast of Vancouver, Canada.

Secure and automated domain membership management for EC2 instances with no internet access

Post Syndicated from Rakesh Singh original https://aws.amazon.com/blogs/security/secure-and-automated-domain-membership-management-for-ec2-instances-with-no-internet-access/

In this blog post, I show you how to deploy an automated solution that helps you fully automate the Active Directory join and unjoin process for Amazon Elastic Compute Cloud (Amazon EC2) instances that don’t have internet access.

Managing Active Directory domain membership for EC2 instances in Amazon Web Services (AWS) Cloud is a typical use case for many organizations. In a dynamic environment that can grow and shrink multiple times in a day, adding and removing computer objects from an Active Directory domain is a critical task and is difficult to manage without automation.

AWS seamless domain join provides a secure and reliable option to join an EC2 instance to your AWS Directory Service for Microsoft Active Directory. It’s a recommended approach for automating joining a Windows or Linux EC2 instance to the AWS Managed Microsoft AD or to an existing on-premises Active Directory using AD Connector, or a standalone Simple AD directory running in the AWS Cloud. This method requires your EC2 instances to have connectivity to the public AWS Directory Service endpoints. At the time of writing, Directory Service doesn’t have PrivateLink endpoint support. This means you must allow traffic from your instances to the public Directory Service endpoints via an internet gateway, network address translation (NAT) device, virtual private network (VPN) connection, or AWS Direct Connect connection.

At times, your organization might require that any traffic between your VPC and Directory Service—or any other AWS service—not leave the Amazon network. That means launching EC2 instances in an Amazon Virtual Private Cloud (Amazon VPC) with no internet access and still needing to join and unjoin the instances from the Active Directory domain. Provided your instances have network connectivity to the directory DNS addresses, the simplest solution in this scenario is to run the domain join commands manually on the EC2 instances and enter the domain credentials directly. Though this process can be secure—as you don’t need to store or hardcode the credentials—it’s time consuming and becomes difficult to manage in a dynamic environment where EC2 instances are launched and terminated frequently.

VPC endpoints enable private connections between your VPC and supported AWS services. Private connections enable you to privately access services by using private IP addresses. Traffic between your VPC and other AWS services doesn’t leave the Amazon network. Instances in your VPC don’t need public IP addresses to communicate with resources in the service.
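For example, an interface endpoint that lets instances in a VPC reach Secrets Manager privately can be created with a call like the following sketch. All IDs and the Region are placeholders; check the CloudFormation template for the endpoints that this solution actually uses.

import boto3

ec2Client = boto3.client('ec2', region_name='us-east-1')  # assumption: use your Region

# Interface endpoint so traffic to Secrets Manager stays on the Amazon network
ec2Client.create_vpc_endpoint(
    VpcEndpointType='Interface',
    VpcId='vpc-0123456789abcdef0',                       # placeholder
    ServiceName='com.amazonaws.us-east-1.secretsmanager',
    SubnetIds=['subnet-0123456789abcdef0'],              # placeholder
    SecurityGroupIds=['sg-0123456789abcdef0'],           # placeholder
    PrivateDnsEnabled=True)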

The solution in this blog post uses AWS Secrets Manager to store the domain credentials and VPC endpoints to enable private connection between your VPC and other AWS services. The solution described here can be used in the following scenarios:

  1. Manage domain join and unjoin for EC2 instances that don’t have internet access.
  2. Manage only domain unjoin if you’re already using seamless domain join provided by AWS, or any other method for domain joining.
  3. Manage only domain join for EC2 instances that don’t have internet access.

This solution uses AWS CloudFormation to deploy the required resources in your AWS account based on your choice from the preceding scenarios.

Note: If your EC2 instances can access the internet, then we recommend using the seamless domain join feature and using scenario 2 to remove computers from the Active Directory domain upon instance termination.

The solution described in this blog post is designed to provide a secure, automated method for joining and unjoining EC2 instances to an on-premises or AWS Managed Microsoft AD domain. The solution is best suited for use cases where the EC2 instances don’t have internet connectivity and the seamless domain join option cannot be used.

How this solution works

This blog post includes a CloudFormation template that you can use to deploy this solution. The CloudFormation stack provisions an EC2 Windows instance running in an Amazon EC2 Auto Scaling group that acts as a worker and is responsible for joining and unjoining other EC2 instances from the Active Directory domain. The worker instance communicates with other required AWS services such as Amazon Simple Storage Service (Amazon S3), Secrets Manager, and Amazon Simple Queue Service (Amazon SQS) using VPC endpoints. The stack also creates all of the other resources needed for this solution to work.

Figure 1 shows the domain join and unjoin workflow for EC2 instances in an AWS account.

Figure 1: Workflow for joining and unjoining an EC2 instance from a domain with full protection of Active Directory credentials

The event flow in Figure 1 is as follows:

  1. An EC2 instance is launched or terminated in an account.
  2. An Amazon CloudWatch Events rule detects if the EC2 instance is in running or terminated state.
  3. The CloudWatch event triggers an AWS Lambda function that looks for the tag JoinAD: true to check if the instance needs to join or unjoin the Active Directory domain.
  4. If the tag value is true, the Lambda function writes the instance details to an Amazon Simple Queue Service (Amazon SQS) queue (a simplified sketch of steps 3 and 4 follows this list).
  5. A standalone, highly secured EC2 instance acts as a worker and polls the Amazon SQS queue for new messages.
  6. Whenever there’s a new message in the queue, the worker EC2 instance invokes scripts on the remote EC2 instance to add or remove the instance from the domain based on the instance operating system and state.
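A minimal sketch of the Lambda logic in steps 3 and 4 could look like the following. The queue URL and message format are placeholders, and the actual function in the solution handles additional cases.

import json
import boto3

ec2Client = boto3.client('ec2')
sqsClient = boto3.client('sqs')

QUEUE_URL = '<worker-queue-url>'  # placeholder

def handler(event, context):
    # The EC2 state-change event carries the instance ID and its new state
    instanceId = event['detail']['instance-id']
    state = event['detail']['state']  # 'running' or 'terminated'

    # Look for the JoinAD tag on the instance
    tags = ec2Client.describe_tags(
        Filters=[{'Name': 'resource-id', 'Values': [instanceId]},
                 {'Name': 'key', 'Values': ['JoinAD']}]
    )['Tags']

    if tags and tags[0]['Value'].lower() == 'true':
        # Hand the instance off to the worker via SQS
        sqsClient.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({'instance_id': instanceId, 'state': state}))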

In this solution, the security of the Active Directory credentials is enhanced by storing them in Secrets Manager. To secure the stored credentials, the solution uses resource-based policies to restrict the access to only intended users and roles.

The credentials can only be fetched dynamically from the EC2 instance that’s performing the domain join and unjoin operations. Any access to that instance is further restricted by a custom AWS Identity and Access Management (IAM) policy created by the CloudFormation stack. The following policies are created by the stack to enhance security of the solution components.

  1. Resource-based policies for Secrets Manager to restrict all access to the stored secret to only specific IAM entities (such as the EC2 IAM role). A simplified example of such a policy follows this list.
  2. An S3 bucket policy to prevent unauthorized access to the Active Directory join and remove scripts that are stored in the S3 bucket.
  3. The IAM role that’s used to fetch the credentials from Secrets Manager is restricted by a custom IAM policy and can only be assumed by the worker EC2 instance. This prevents every entity other than the worker instance from using that IAM role.
  4. All API and console access to the worker EC2 instance is restricted by a custom IAM policy with an explicit deny.
  5. A policy to deny all but the worker EC2 instance access to the credentials in Secrets Manager. With the worker EC2 instance doing the work, the EC2 instances that need to join the domain don’t need access to the credentials in Secrets Manager or to scripts in the S3 bucket.
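As an illustration of the first item in the preceding list, a deny-by-default resource policy could be attached roughly as follows. The role ARN is a placeholder, and the policy that the stack actually creates may differ in its details.

import json
import boto3

secretsClient = boto3.client('secretsmanager')

workerRoleArn = 'arn:aws:iam::111122223333:role/WorkerEC2Role'  # placeholder

# Deny GetSecretValue to every principal except the worker instance role
resourcePolicy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Effect': 'Deny',
        'Principal': '*',
        'Action': 'secretsmanager:GetSecretValue',
        'Resource': '*',
        'Condition': {'StringNotEquals': {'aws:PrincipalArn': workerRoleArn}}
    }]
}

secretsClient.put_resource_policy(
    SecretId='myadcredV1',  # default secret name created by the stack
    ResourcePolicy=json.dumps(resourcePolicy))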

Prerequisites and setup

Before you deploy the solution, you must complete the following in the AWS account and Region where you want to deploy the CloudFormation stack.

  1. AWS Managed Microsoft AD with an appropriate DNS name (for example, test.com). You can also use your on-premises Active Directory, provided it’s reachable from the Amazon VPC over Direct Connect or AWS VPN.
  2. Create a DHCP option set with on-premises DNS servers or with the DNS servers pointing to the IP addresses of directories provided by AWS.
  3. Associate the DHCP option set with the Amazon VPC that you’re going to use with this solution.
  4. Any other Amazon VPCs that are hosting EC2 instances to be domain joined must be peered with the VPC that hosts the relevant AWS Managed Microsoft AD. Alternatively, AWS Transit Gateway can be used to establish this connectivity.
  5. Make sure to have the latest AWS Command Line Interface (AWS CLI) installed and configured on your local machine.
  6. Create a new SSH key pair and store it in Secrets Manager using the following commands. Replace <Region> with the Region of your deployment. Replace <MyKeyPair> with any custom name or leave it default.

Bash:

aws ec2 create-key-pair --region <Region> --key-name <MyKeyPair> --query 'KeyMaterial' --output text > adsshkey
aws secretsmanager create-secret --region <Region> --name "adsshkey" --description "my ssh key pair" --secret-string file://adsshkey

PowerShell:

aws ec2 create-key-pair --region <Region> --key-name <MyKeyPair>  --query 'KeyMaterial' --output text | out-file -encoding ascii -filepath adsshkey
aws secretsmanager create-secret --region <Region> --name "adsshkey" --description "my ssh key pair" --secret-string file://adsshkey

Note: Don’t change the name of the secret, because other scripts in the solution reference it. The worker EC2 instance will fetch the SSH key by using the GetSecretValue API to SSH or RDP into other EC2 instances during the domain join process.
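For illustration, fetching that key on the worker could look like the following boto3 sketch. The Region is an assumption, and the solution’s own scripts may retrieve the secret differently.

import boto3

secretsClient = boto3.client('secretsmanager', region_name='us-east-1')  # assumption: your Region

# Retrieve the private key stored in the prerequisites; the secret name must stay 'adsshkey'
privateKey = secretsClient.get_secret_value(SecretId='adsshkey')['SecretString']

with open('adsshkey.pem', 'w') as keyFile:
    keyFile.write(privateKey)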

Deploy the solution

With the prerequisites in place, your next step is to download or clone the GitHub repo and store the files on your local machine. Go to the location where you cloned or downloaded the repo and review the contents of the config/OS_User_Mapping.json file to validate the instance user name and operating system mapping. Update the file if you’re using a user name other than the one used to log in to the EC2 instances. The default user name used in this solution is ec2-user for Linux instances and Administrator for Windows.

The solution requires installation of some software on the worker EC2 instance. Because the EC2 instance doesn’t have internet access, you must download the latest Windows 64-bit version of the following software to your local machine and upload it into the solution deployment S3 bucket in subsequent steps.

Note: This step isn’t required if your EC2 instances have internet access.

Once done, use the following steps to deploy the solution in your AWS account:

Steps to deploy the solution:

  1. Create a private Amazon Simple Storage Service (Amazon S3) bucket using this documentation to store the Lambda functions and the domain join and unjoin scripts.
  2. Once created, enable versioning on this bucket using the following documentation. Versioning lets you keep multiple versions of your objects in one bucket and helps you easily retrieve and restore previous versions of your scripts.
  3. Upload the software you downloaded to the S3 bucket. This is only required if your instance doesn’t have internet access.
  4. Upload the cloned or downloaded GitHub repo files to the S3 bucket.
  5. Go to the S3 bucket and select the template name secret-active-dir-solution.json, and copy the object URL.
  6. Open the CloudFormation console. Choose the appropriate AWS Region, and then choose Create Stack. Select With new resources.
  7. Select Amazon S3 URL as the template source, paste the object URL that you copied in Step 5, and then choose Next.
  8. On the Specify stack details page, enter a name for the stack and provide the following input parameters. You can modify the default values to customize the solution for your environment.
    • ADUSECASE – From the dropdown menu, select your required use case. There is no default value.
    • AdminUserId – The canonical user ID of the IAM user who manages the Active Directory credentials stored in Secrets Manager. To learn how to find the canonical user ID for your IAM user, scroll down to Finding the canonical user ID for your AWS account in AWS account identifiers.
    • DenyPolicyName – The name of the IAM policy that restricts access to the worker EC2 instance and the IAM role used by the worker to fetch credentials from Secrets Manager. You can keep the default value or provide another name.
    • InstanceType – Instance type to be used when launching the worker EC2 instance. You can keep the default value or use another instance type if necessary.
    • Placeholder – This is a dummy parameter that’s used as a placeholder in IAM policies for the EC2 instance ID. Keep the default value.
    • S3Bucket – The name of the S3 bucket that you created in the first step of the solution deployment. Replace the default value with your S3 bucket name.
    • S3prefix – Amazon S3 object key where the source scripts are stored. Leave the default value as long as the cloned GitHub directory structure hasn’t been changed.
    • SSHKeyRequired – Select true or false based on whether an SSH key pair is required to RDP into the EC2 worker instance. If you select false, the worker EC2 instance will not have an SSH key pair.
    • SecurityGroupId – Security group IDs to be associated with the worker instance to control traffic to and from the instance.
    • Subnet – Select the VPC subnet where you want to launch the worker EC2 instance.
    • VPC – Select the VPC where you want to launch the worker EC2 instance. Use the VPC where you have created the AWS Managed Microsoft AD.
    • WorkerSSHKeyName – An existing SSH key pair name that can be used to get the password for RDP access into the EC2 worker instance. This isn’t mandatory if you’re using user name and password based login or AWS Systems Manager Session Manager. This is required only if you have selected true for the SSHKeyRequired parameter.
    Figure 2: Defining the stack name and input parameters for the CloudFormation stack

  9. Enter values for all of the input parameters, and then choose Next.
  10. On the Options page, keep the default values and then choose Next.
  11. On the Review page, confirm the details, acknowledge that CloudFormation might create IAM resources with custom names, and choose Create Stack.
  12. Once the stack creation is marked as CREATE_COMPLETE, the following resources are created:
    • An EC2 instance that acts as a worker and runs Active Directory join scripts on the remote EC2 instances. It also unjoins instances from the domain upon instance termination.
    • A secret with a default Active Directory domain name, user name, and a dummy password. The name of the default secret is myadcredV1.
    • A Secrets Manager resource-based policy to deny all access to the secret except to the intended IAM users and roles.
    • An EC2 IAM profile and IAM role to be used only by the worker EC2 instance.
    • A managed IAM policy called DENYPOLICY that can be assigned to an IAM user, group, or role to restrict access to the solution resources such as the worker EC2 instance.
    • A CloudWatch Events rule to detect running and terminated states for EC2 instances and trigger a Lambda function that posts instance details to an SQS queue.
    • A Lambda function that reads instance tags and writes to an SQS queue based on the instance tag value, which can be true or false.
    • An SQS queue for storing the EC2 instance state—running or terminated.
    • A dead-letter queue for storing unprocessed messages.
    • An S3 bucket policy to restrict access to the source S3 bucket from unauthorized users or roles.
    • A CloudWatch log group to stream the logs of the worker EC2 instance.

Test the solution

Now that the solution is deployed, you can test it to check if it’s working as expected. Before you test the solution, you must navigate to the secret created in Secrets Manager by CloudFormation and update the Active Directory credentials—domain name, user name, and password.

To test the solution

  1. In the CloudFormation console, choose Services, and then CloudFormation. Select your stack name. On the stack Outputs tab, look for the ADSecret entry.
  2. Choose the ADSecret link to go to the configuration for the secret in the Secrets Manager console. Scroll down to the section titled Secret value, and then choose Retrieve secret value to display the default Secret Key and Secret Value as shown in Figure 3.

    Figure 3: Retrieve value in Secrets Manager

  3. Choose the Edit button and update the default dummy credentials with your Active Directory domain credentials. (Optional) Directory_ou is used to store the organizational unit (OU) and directory components (DC) for the directory; for example, OU=test,DC=example,DC=com.

Note: instance_password is an optional secret key and is used only when you’re using user name and password based login to access the EC2 instances in your account.

Now that the secret is updated with the correct credentials, you can launch a test EC2 instance and determine if the instance has successfully joined the Active Directory domain.

Create an Amazon Machine Image

Note: This is only required for Linux-based operating systems other than Amazon Linux. You can skip these steps if your instances have internet access.

As your VPC doesn’t have internet access, for Linux-based systems other than Amazon Linux 1 or Amazon Linux 2, the required packages must be available on the instances that need to join the Active Directory domain. For that, you must create a custom Amazon Machine Image (AMI) from an EC2 instance with the required packages. If you already have a process to build your own AMIs, you can add these packages as part of that existing process.

To install the package into your AMI

  1. Create a new EC2 Linux instance for the required operating system in a public subnet or a private subnet with access to the internet via a NAT gateway.
  2. Connect to the instance using any SSH client.
  3. Install the required software by running the following command that is appropriate for the operating system:
    • For CentOS:
      yum -y install realmd adcli oddjob-mkhomedir oddjob samba-winbind-clients samba-winbind samba-common-tools samba-winbind-krb5-locator krb5-workstation unzip
      

    • For RHEL:
      yum -y  install realmd adcli oddjob-mkhomedir oddjob samba-winbind-clients samba-winbind samba-common-tools samba-winbind-krb5-locator krb5-workstation python3 vim unzip
      

    • For Ubuntu:
      apt-get -yq install realmd adcli winbind samba libnss-winbind libpam-winbind libpam-krb5 krb5-config krb5-locales krb5-user packagekit  ntp unzip python
      

    • For SUSE:
      sudo zypper -n install realmd adcli sssd sssd-tools sssd-ad samba-client krb5-client samba-winbind krb5-client python
      

    • For Debian:
      apt-get -yq install realmd adcli winbind samba libnss-winbind libpam-winbind libpam-krb5 krb5-config krb5-locales krb5-user packagekit  ntp unzip
      

  4. Follow Manually join a Linux instance to install the AWS CLI on Linux.
  5. Create a new AMI based on this instance by following the instructions in Create a Linux AMI from an instance.

You now have a new AMI that can be used in the next steps and in future to launch similar instances.

For Amazon Linux-based EC2 instances, the solution will use the mechanism described in How can I update yum or install packages without internet access on my EC2 instances to install the required packages and you don’t need to create a custom AMI. No additional packages are required if you are using Windows-based EC2 instances.

To launch a test EC2 instance

  1. Navigate to the Amazon EC2 console and launch an Amazon Linux or Windows EC2 instance in the same Region and VPC that you used when creating the CloudFormation stack. For any other operating system, make sure you are using the custom AMI created before.
  2. In the Add Tags section, add a tag named JoinAD and set the value as true. Add another tag named Operating_System and set the appropriate operating system value from:
    • AMAZON_LINUX
    • FEDORA
    • RHEL
    • CENTOS
    • UBUNTU
    • DEBIAN
    • SUSE
    • WINDOWS
  3. Make sure that the security group associated with this instance is set to allow all inbound traffic from the security group of the worker EC2 instance.
  4. Use the SSH key pair name from the prerequisites (Step 6) when launching the instance.
  5. Wait for the instance to launch and join the Active Directory domain. You can now navigate to the CloudWatch log group named /ad-domain-join-solution/ created by the CloudFormation stack to determine whether the instance has joined the domain. On a successful join, you can connect to the instance by using an RDP or SSH client and entering your login credentials.
  6. To test the domain unjoin workflow, you can terminate the EC2 instance launched in Step 1 and log in to the Active Directory tools instance to validate that the Active Directory computer object that represents the instance is deleted.

Solution review

Let’s review the details of the solution components and what happens during the domain join and unjoin process:

1) The worker EC2 instance:

The worker EC2 instance used in this solution is a Windows instance with all configurations required to add and remove machines to and from an Active Directory domain. It can also be used as an Active Directory administration tools instance. This instance is continuously running a bash script that is polling the SQS queue for new messages. Upon arrival of a new message, the script performs the following tasks:

  1. Check if the instance is in running or terminated state to determine if it needs to be added or removed from the Active Directory domain.
  2. If the message is from a newly launched EC2 instance, then this means that this instance needs to join the Active Directory domain.
  3. The script identifies the instance operating system and runs the appropriate PowerShell or bash script on the remote EC2 instance.
  4. Similarly, if the instance is in terminated state, then the worker will run the domain unjoin command locally to remove the computer object from the Active Directory domain.
  5. If the worker fails to process a message in the SQS queue, it sends the unprocessed message to a backup queue for debugging.
  6. The worker writes logs related to the success or failure of the domain join to a CloudWatch log group. Use /ad-domain-join-solution to filter for all other logs created by the worker instance in CloudWatch.

2) The worker bash script running on the instance:

This script polls the SQS queue every 5 seconds for new messages and is responsible for the following activities (a simplified sketch follows this list):

  • Fetching Active Directory join credentials (user name and password) from Secrets Manager.
  • If the remote EC2 instance is running Windows, running the Invoke-Command PowerShell cmdlet on the instance to perform the Active Directory join operation.
  • If the remote EC2 instance is running Linux, running the realm join command on the instance to perform the Active Directory join operation.
  • Running the Remove-ADComputer command to remove the computer object from the Active Directory domain for terminated EC2 instances.
  • Storing domain-joined EC2 instance details—computer name and IP address—in an Amazon DynamoDB table. These details are used to check if an instance is already part of the domain and when removing the instance from the Active Directory domain.
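A much simplified Python sketch of this polling loop is shown below. The actual worker runs a bash script, and the queue URL and message format shown here are assumptions.

import json
import time
import boto3

sqsClient = boto3.client('sqs')
QUEUE_URL = '<worker-queue-url>'  # placeholder

while True:
    messages = sqsClient.receive_message(
        QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=5
    ).get('Messages', [])

    for message in messages:
        body = json.loads(message['Body'])
        if body['state'] == 'running':
            pass  # fetch credentials and run the join script on the remote instance
        elif body['state'] == 'terminated':
            pass  # remove the computer object from the Active Directory domain
        sqsClient.delete_message(
            QueueUrl=QUEUE_URL, ReceiptHandle=message['ReceiptHandle'])

    time.sleep(5)  # poll roughly every 5 seconds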

More information

Now that you have tested the solution, here are some additional points to be noted:

  • The Active Directory join and unjoin scripts provided with this solution can be replaced with your existing custom scripts.
  • To update the scripts on the worker instance, you must upload the modified scripts to the S3 bucket and the changes will automatically synchronize on the instance.
  • This solution works with a single account, Region, and VPC combination. It can be modified to work across multiple Regions and VPC combinations.
  • For VPCs in a different account or Region, you must share your AWS Managed Microsoft AD with another AWS account when the networking prerequisites have been completed.
  • The instance user name and operating system mapping used in the solution is based on the default user name used by AWS.
  • You can use AWS Systems Manager with VPC endpoints to log in to EC2 instances that don’t have internet access.

The solution protects your Active Directory credentials and makes sure that:

  • Active Directory credentials can be accessed only from the worker EC2 instance.
  • The IAM role used by the worker EC2 instance to fetch the secret cannot be assumed by other IAM entities.
  • Only authorized users can read the credentials from the Secrets Manager console, through AWS CLI, or by using any other AWS Tool—such as an AWS SDK.

The focus of this solution is to demonstrate a method you can use to secure Active Directory credentials and automate the process of EC2 instances joining and unjoining from an Active Directory domain.

  • You can associate the IAM policy named DENYPOLICY with any IAM group or user in the account to block that user or group from accessing or modifying the worker EC2 instance and the IAM role used by the worker.
  • If your account belongs to an organization, you can use an organization-level service control policy instead of an IAM-managed policy—such as DENYPOLICY—to protect the underlying resources from unauthorized users.

Conclusion

In this blog post, you learned how to deploy an automated and secure solution through CloudFormation to help secure the Active Directory credentials and also manage adding and removing Amazon EC2 instances to and from an Active Directory domain. When using this solution, you incur Amazon EC2 charges along with charges associated with Secrets Manager pricing and AWS PrivateLink.

You can use the following references to help diagnose or troubleshoot common errors during the domain join or unjoin process.



Author

Rakesh Singh

Rakesh is a Technical Account Manager with AWS. He loves automation and enjoys working directly with customers to solve complex technical issues and provide architectural guidance. Outside of work, he enjoys playing soccer, singing karaoke, and watching thriller movies.

Running cost optimized Spark workloads on Kubernetes using EC2 Spot Instances

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/running-cost-optimized-spark-workloads-on-kubernetes-using-ec2-spot-instances/

This post is written by Kinnar Sen, Senior Solutions Architect, EC2 Spot 

Apache Spark is an open-source, distributed processing system used for big data workloads. It provides API operations to perform multiple tasks such as streaming, extract transform load (ETL), query, machine learning (ML), and graph processing. Spark supports four different types of cluster managers (Spark standalone, Apache Mesos, Hadoop YARN, and Kubernetes), which are responsible for scheduling and allocation of resources in the cluster. Spark can run with native Kubernetes support since 2018 (Spark 2.3). AWS customers that have already chosen Kubernetes as their container orchestration tool can also choose to run Spark applications in Kubernetes, increasing the effectiveness of their operations and compute resources.

In this post, I illustrate the deployment of scalable, resilient, and cost optimized Spark application using Kubernetes via Amazon Elastic Kubernetes Service (Amazon EKS) and Amazon EC2 Spot Instances. Learn how to save money on big data workloads by implementing this solution.

Overview

Amazon EC2 Spot Instances

Amazon EC2 Spot Instances let you take advantage of unused EC2 capacity in the AWS Cloud. Spot Instances are available at up to a 90% discount compared to On-Demand Instance prices. Capacity pools are a group of EC2 instances that belong to particular instance family, size, and Availability Zone (AZ). If EC2 needs capacity back for On-Demand Instance usage, Spot Instances can be interrupted by EC2 with a two-minute notification. There are many graceful ways to handle the interruption to ensure that the application is well architected for resilience and fault tolerance. This can be automated via the application and/or infrastructure deployments. Spot Instances are ideal for stateless, fault tolerant, loosely coupled and flexible workloads that can handle interruptions.

Amazon Elastic Kubernetes Service

Amazon EKS is a fully managed Kubernetes service that makes it easy for you to run Kubernetes on AWS without needing to install, operate, and maintain your own Kubernetes control plane. It provides a highly available and scalable managed control plane. It also provides managed worker nodes, which let you create, update, or terminate worker nodes for your cluster with a single command. It is a great choice for deploying flexible and fault tolerant containerized applications. Amazon EKS supports creating and managing Amazon EC2 Spot Instances using Amazon EKS-managed node groups following Spot best practices. This enables you to take advantage of the steep savings and scale that Spot Instances provide for interruptible workloads running in your Kubernetes cluster. Using EKS-managed node groups with Spot Instances requires less operational effort compared to using self-managed nodes. In addition to launching Spot Instances in managed node groups, it is possible to specify multiple instance types in EKS managed node groups. You can find more in this blog.

Apache Spark and Kubernetes

When a Spark application is submitted to the Kubernetes cluster, the following happens:

  • A Spark driver is created.
  • The driver and the executors run within pods.
  • The Spark driver then requests executors, which are scheduled to run within pods. The executors are managed by the driver.
  • The application is launched and once it completes, the executor pods are cleaned up. The driver pod persists the logs and remains in a completed state until the pod is cleared by garbage collection or manually removed. The driver in a completed stage does not consume any memory or compute resources.

Spark Deployment on Kubernetes Cluster

When a Spark application runs on clusters managed by Kubernetes, the native Kubernetes scheduler is used. It is possible to schedule the driver/executor pods on a subset of available nodes. The applications can be launched either by a vanilla ‘spark submit’, a workflow orchestrator like Apache Airflow, or the Spark operator. I use vanilla ‘spark submit’ in this blog. Amazon EMR on EKS is also able to schedule Spark applications on EKS clusters as described in this launch blog, but it is out of scope for this post.

Cost optimization

For any organization running big data workloads there are three key requirements: scalability, performance, and low cost. As the size of data increases, there is demand for more compute capacity and the total cost of ownership increases. It is critical to optimize the cost of big data applications. Big Data frameworks (in this case, Spark) are distributed to manage and process high volumes of data. These frameworks are designed for failure, can run on machines with different configurations, and are inherently resilient and flexible.

If Spark is deployed on Kubernetes, the executor pods can be scheduled on EC2 Spot Instances and driver pods on On-Demand Instances. This reduces the overall cost of deployment – Spot Instances can save up to 90% over On-Demand Instance prices. This also enables faster results by scaling out executors running on Spot Instances. Spot Instances, by design, can be interrupted when EC2 needs the capacity back. If a driver pod is running on a Spot Instance that is interrupted, the application fails and must be re-submitted. To avoid this situation, the driver pod can be scheduled on On-Demand Instances only. This adds a layer of resiliency to the Spark application running on Kubernetes. To cost optimize the deployment, all the executor pods are scheduled on Spot Instances, as that’s where the bulk of compute happens. Spark’s inherent resiliency has the driver launch new executors to replace the ones that fail due to Spot interruptions.

There are a couple of key points to note here.

  • The idea is to start with a minimum number of nodes for both On-Demand and Spot Instances (one each) and then auto-scale using the Cluster Autoscaler and EC2 Auto Scaling. The Cluster Autoscaler for AWS provides integration with Auto Scaling groups. If there are not sufficient resources, the driver and executor pods go into a pending state. The Cluster Autoscaler detects pods in a pending state and scales worker nodes within the identified Auto Scaling group in the cluster using EC2 Auto Scaling.
  • The scaling for On-Demand and Spot nodes is exclusive of one another. So, if multiple applications are launched the driver and executor pods can be scheduled in different node groups independently per the resource requirements. This helps reduce job failures due to lack of resources for the driver, thus adding to the overall resiliency of the system.
  • Using EKS Managed node groups
    • This requires significantly less operational effort compared to using self-managed node groups and enables:
      • Automatic enforcement of Spot best practices like the capacity-optimized allocation strategy, Capacity Rebalancing, and the use of multiple instance types.
      • Proactive replacement of Spot nodes using rebalance notifications.
      • Managed draining of Spot nodes via re-balance recommendations.
    • The nodes are auto-labeled so that the pods can be scheduled with NodeAffinity.
      • eks.amazonaws.com/capacityType: SPOT
      • eks.amazonaws.com/capacityType: ON_DEMAND

Now that you understand the products and best practices used in this tutorial, let’s get started.

Tutorial: running Spark in EKS managed node groups with Spot Instances

In this tutorial, I review the steps that help you launch cost optimized and resilient Spark jobs inside Kubernetes clusters running on EKS. I launch a word-count application that counts the words from an Amazon Customer Review dataset and writes the output to an Amazon S3 folder. To run the Spark workload on Kubernetes, make sure you have eksctl and kubectl installed on your computer or on an AWS Cloud9 environment. You can run this by using an AWS IAM user or role that has the AdministratorAccess policy attached to it, or check the minimum required permissions for using eksctl. The Spot node groups in the Amazon EKS cluster can be launched either in a managed or a self-managed way; in this post, I use the former. The config files for this tutorial can be found here. The job is finally launched in cluster mode.
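For orientation, a minimal PySpark word count along the lines of the application used here could look like the following. The actual script.py in the repo may differ, and the S3 paths and the review_body column name are assumptions.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode, split

spark = SparkSession.builder.appName('amazon-review-word-count').getOrCreate()

# Placeholder paths: point these at the review dataset and your output bucket
reviews = spark.read.parquet('s3a://<REVIEW DATASET>/')
words = reviews.select(explode(split(col('review_body'), r'\s+')).alias('word'))

counts = words.groupBy('word').count().orderBy(col('count').desc())
counts.write.mode('overwrite').csv('s3a://<S3 BUCKET>/output/')

spark.stop()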

Create Amazon S3 Access Policy

First, I create an Amazon S3 access policy to allow the Spark application to read from and write to Amazon S3. Amazon S3 access is provisioned by attaching the policy, by ARN, to the node groups. This associates Amazon S3 access with the NodeInstanceRole, so the node groups then have access to Amazon S3. Download the Amazon S3 policy file from here and modify the <<output folder>> to an Amazon S3 bucket you created. Run the following to create the policy. Note the ARN.

aws iam create-policy --policy-name spark-s3-policy --policy-document file://spark-s3.json
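
For reference, the downloaded policy is expected to grant read and write access to your bucket; a minimal sketch of spark-s3.json is shown below (the exact actions in the downloaded file may differ, and the bucket name is a placeholder).

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::<your-output-bucket>",
        "arn:aws:s3:::<your-output-bucket>/*"
      ]
    }
  ]
}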

Cluster and node groups deployment

Create an EKS cluster using the following command:

eksctl create cluster --name=sparkonk8 --node-private-networking --without-nodegroup --asg-access --region=<<AWS Region>>

The cluster takes approximately 15 minutes to launch.

Create the nodegroup using the nodeGroup config file. Replace the <<Policy ARN>> string using the ARN string from the previous step.

eksctl create nodegroup -f managedNodeGroups.yml
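
For reference, the nodegroup config file has roughly the following structure: one small On-Demand node group for drivers and one Spot node group with multiple instance types for executors, each starting at one node. This is a trimmed sketch to illustrate the shape only; the group names, instance types, sizes, and any Cluster Autoscaler discovery tags in the repository file may differ, so use the downloaded file for the tutorial.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: sparkonk8
  region: <<AWS Region>>
managedNodeGroups:
  - name: od-ng                  # On-Demand node group for Spark driver pods
    instanceTypes: ["r5.xlarge"]
    minSize: 1
    desiredCapacity: 1
    maxSize: 10
    privateNetworking: true
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - <<Policy ARN>>         # the spark-s3-policy ARN from the previous step
  - name: spot-ng                # Spot node group for Spark executor pods
    spot: true
    instanceTypes: ["r5.xlarge", "r5a.xlarge", "r4.xlarge"]
    minSize: 1
    desiredCapacity: 1
    maxSize: 20
    privateNetworking: true
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - <<Policy ARN>>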

Scheduling driver/executor pods

The driver and executor pods can be assigned to nodes using affinity. Pod templates can be used to configure details that are not supported by the Spark launch configuration by default. This feature is available from Spark 3.0.0 onward; requiredDuringScheduling node affinity is used to schedule the driver and executor pods. Sample pod templates have been uploaded here, and an executor template sketch follows.
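
For illustration, an executor pod template that targets Spot capacity with requiredDuringScheduling node affinity could look like the following minimal sketch; the repository samples may include additional settings.

apiVersion: v1
kind: Pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: eks.amazonaws.com/capacityType
                operator: In
                values:
                  - SPOT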

Launching a Spark application

Create a service account. The Spark driver pod uses the service account to create and watch executor pods via the Kubernetes API server.

kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole='edit'  --serviceaccount=default:spark --namespace=default

Download the Cluster Autoscaler manifest and edit it to add the cluster name.

curl -LO https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
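
Assuming the downloaded manifest uses the upstream <YOUR CLUSTER NAME> placeholder in its --node-group-auto-discovery flag, one way to set the cluster name is:

sed -i 's/<YOUR CLUSTER NAME>/sparkonk8/' cluster-autoscaler-autodiscover.yaml
# on macOS, use: sed -i '' 's/<YOUR CLUSTER NAME>/sparkonk8/' cluster-autoscaler-autodiscover.yaml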

Install the Cluster AutoScaler using the following command:

kubectl apply -f cluster-autoscaler-autodiscover.yaml

Get the details of the Kubernetes control plane to obtain the master URL.

kubectl cluster-info 

command output

Use the following instructions to build the docker image.

Download the application file (script.py) from here and upload into the Amazon S3 bucket created.

Download the pod template files from here. Submit the application.

bin/spark-submit \
--master k8s://<<MASTER URL>> \
--deploy-mode cluster \
--name 'Job Name' \
--conf spark.eventLog.dir=s3a:// <<S3 BUCKET>>/logs \
--conf spark.eventLog.enabled=true \
--conf spark.history.fs.inProgressOptimization.enabled=true \
--conf spark.history.fs.update.interval=5s \
--conf spark.kubernetes.container.image=<<ECR Spark Docker Image>> \
--conf spark.kubernetes.container.image.pullPolicy=IfNotPresent \
--conf spark.kubernetes.driver.podTemplateFile='../driver_pod_template.yml' \
--conf spark.kubernetes.executor.podTemplateFile='../executor_pod_template.yml' \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.dynamicAllocation.shuffleTracking.enabled=true \
--conf spark.dynamicAllocation.maxExecutors=100 \
--conf spark.dynamicAllocation.executorAllocationRatio=0.33 \
--conf spark.dynamicAllocation.sustainedSchedulerBacklogTimeout=30 \
--conf spark.dynamicAllocation.executorIdleTimeout=60s \
--conf spark.driver.memory=8g \
--conf spark.kubernetes.driver.request.cores=2 \
--conf spark.kubernetes.driver.limit.cores=4 \
--conf spark.executor.memory=8g \
--conf spark.kubernetes.executor.request.cores=2 \
--conf spark.kubernetes.executor.limit.cores=4 \
--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
--conf spark.hadoop.fs.s3a.connection.ssl.enabled=false \
--conf spark.hadoop.fs.s3a.fast.upload=true \
s3a://<<S3 BUCKET>>/script.py \
s3a://<<S3 BUCKET>>/output 

A couple of key points to note here:

  • podTemplateFile is used here, which enables scheduling of the driver pods on On-Demand Instances and the executor pods on Spot Instances.
  • Spark provides a mechanism to allocate resources dynamically based on workloads. In the latest release of Spark (3.0.0), dynamicAllocation can be used with the Kubernetes cluster manager. Executors that no longer store active shuffle files can be removed to free up resources. Dynamic allocation works well in tandem with the Cluster Autoscaler for resource allocation and optimizes resources for jobs. We are using dynamicAllocation here to enable optimized resource sharing.
  • The application file and output are both in Amazon S3.

Output Files in S3

  • Spark event logs are redirected to Amazon S3. Spark on Kubernetes creates local temporary files for logs and removes them once the application completes. Because the event logs are redirected to Amazon S3, the Spark History Server can be used to analyze them. Note that you can add more instrumentation using tools like Prometheus and Grafana to monitor and manage the cluster.

Spark History Server + Dynamic Allocation

Observations

EC2 Spot Interruptions

The following diagram and log screenshot details from the Spark History Server showcase the behavior of a Spark application in the case of an EC2 Spot interruption.

Four Spark applications were launched in parallel in a cluster, and one of the Spot nodes was interrupted. A couple of executor pods were shut down in three of the four applications, but due to the resilient nature of Spark, new executors were launched and the applications finished at almost the same time.
Spark jobs

The Spark Driver identified the shut down executors, which handled the shuffle files and relaunched the tasks running on those executors.

Dynamic Allocation

Dynamic Allocation works with the caveat that it is an experimental feature.

dynamic allocation

Cost Optimization

Cost optimization is achieved in several different ways in this tutorial.

  • Use of 100% Spot Instances for the Spark executors
  • Use of dynamicAllocation along with the Cluster Autoscaler makes optimized use of resources and hence saves cost
  • Starting with a single node each for On-Demand and Spot and then scaling up only on demand reduces the waste of a continuously running cluster

Cluster Autoscaling

Cluster autoscaling is triggered, as designed, when there are pending (Spark executor) pods.

The Cluster Autoscaler logs can be fetched by:

kubectl logs -f deployment/cluster-autoscaler -n kube-system --tail=10

Cluster Autoscaler Logs 

Cleanup

If you are trying out the tutorial, run the following steps to make sure that you don’t encounter unwanted costs.

Delete the EKS cluster and the nodegroups with the following command:

eksctl delete cluster --name sparkonk8

Delete the Amazon S3 Access Policy with the following command:

aws iam delete-policy --policy-arn <<POLICY ARN>>

Delete the Amazon S3 Output Bucket with the following command:

aws s3 rb --force s3://<<S3_BUCKET>>

Conclusion

In this blog, I demonstrated how you can run Spark workloads on a Kubernetes cluster using Spot Instances, achieving scalability, resilience, and cost optimization. To cost optimize your Spark-based big data workloads, consider running Spark applications on Kubernetes with EC2 Spot Instances.


How to monitor Windows and Linux servers and get internal performance metrics

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/how-to-monitor-windows-and-linux-servers-and-get-internal-performance-metrics/

This post was written by Dean Suzuki, Senior Solutions Architect.

Customers who run Windows or Linux instances on AWS frequently ask, “How do I know if my disks are almost full?” or “How do I know if my application is using all the available memory and is paging to disk?” This blog helps answer these questions by walking you through how to set up monitoring to capture these internal performance metrics.

Solution overview

If you open the Amazon EC2 console, select a running Amazon EC2 instance, and select the Monitoring tab, you can see Amazon CloudWatch metrics for that instance. Amazon CloudWatch is an AWS monitoring service. The Monitoring tab (shown in the following image) shows the metrics that can be measured externally to the instance (for example, CPU utilization, network bytes in/out). However, to understand what percentage of the disk is being used or what percentage of the memory is being used, these metrics require an internal operating system view of the instance. AWS places an extra safeguard on gathering data inside a customer’s instance, so this capability is not enabled by default.

EC2 console showing Monitoring tab

To capture the server’s internal performance metrics, a CloudWatch agent must be installed on the instance. For Windows, the CloudWatch agent can capture any of the Windows performance monitor counters. For Linux, the CloudWatch agent can capture system-level metrics. For more details, please see Metrics Collected by the CloudWatch Agent. The agent can also capture logs from the server. The agent then sends this information to Amazon CloudWatch, where rules can be created to alert on certain conditions (for example, low free disk space) and automated responses can be set up (for example, perform backup to clear transaction logs). Also, dashboards can be created to view the health of your Windows servers.

There are four steps to implement internal monitoring:

  1. Install the CloudWatch agent onto your servers. AWS provides a service called AWS Systems Manager Run Command, which enables you to do this agent installation across all your servers.
  2. Run the CloudWatch agent configuration wizard, which captures what you want to monitor. These items could be performance counters and logs on the server. This configuration is then stored in AWS Systems Manager Parameter Store.
  3. Configure CloudWatch agents to use agent configuration stored in Parameter Store using the Run Command.
  4. Validate that the CloudWatch agents are sending their monitoring data to CloudWatch.

The following image shows the flow of these four steps.

Process to install and configure the CloudWatch agent

In this blog, I walk through these steps so that you can follow along. Note that you are responsible for the cost of running the environment outlined in this blog. So, once you are finished with the steps in the blog, I recommend deleting the resources if you no longer need them. For the cost of running these servers, see Amazon EC2 On-Demand Pricing. For CloudWatch pricing, see Amazon CloudWatch pricing.

If you want a video overview of this process, please see this Monitoring Amazon EC2 Windows Instances using Unified CloudWatch Agent video.

Deploy the CloudWatch agent

The first step is to deploy the Amazon CloudWatch agent. There are multiple ways to deploy the CloudWatch agent (see this documentation on Installing the CloudWatch Agent). In this blog, I walk through how to use the AWS Systems Manager Run Command to deploy the agent. AWS Systems Manager uses the Systems Manager agent, which is installed by default on Amazon-provided Windows Server and Amazon Linux AMIs. This AWS Systems Manager agent must be given the appropriate permissions to connect to AWS Systems Manager, and to write the configuration data to the AWS Systems Manager Parameter Store. These access rights are controlled through the use of IAM roles.

Create two IAM roles

IAM roles are identity objects to which you attach IAM policies. IAM policies define what access is allowed to AWS services. You can have users, services, or applications assume the IAM roles and get the assigned rights defined in the permissions policies.

To use System Manager, you typically create two IAM roles. The first role has permissions to write the CloudWatch agent configuration information to System Manager Parameter Store. This role is called CloudWatchAgentAdminRole.

The second role only has permissions to read the CloudWatch agent configuration from the System Manager Parameter Store. This role is called CloudWatchAgentServerRole.

For more details on creating these roles, please see the documentation on Create IAM Roles and Users for Use with the CloudWatch Agent.
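
If you prefer the CLI over the console, the following sketch shows one way to create the server role and attach the relevant AWS managed policies. The role name matches the one above; the trust policy file name is an assumption, and the admin role can be created the same way with the CloudWatchAgentAdminPolicy managed policy instead.

# Trust policy that lets EC2 instances assume the role
cat <<'EOF' > ec2-trust-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

aws iam create-role --role-name CloudWatchAgentServerRole \
  --assume-role-policy-document file://ec2-trust-policy.json

# Permissions to publish metrics/logs and read the agent configuration,
# plus Systems Manager access for Run Command and Session Manager
aws iam attach-role-policy --role-name CloudWatchAgentServerRole \
  --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
aws iam attach-role-policy --role-name CloudWatchAgentServerRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore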

Attach the IAM roles to the EC2 instances

Once you create the roles, you attach them to your Amazon EC2 instances. By attaching the IAM roles to the EC2 instances, you provide the processes running on the EC2 instance the permissions defined in the IAM role. In this blog, you create two Amazon EC2 instances. Attach the CloudWatchAgentAdminRole to the first instance that is used to create the CloudWatch agent configuration. Attach CloudWatchAgentServerRole to the second instance and any other instances that you want to monitor. For details on how to attach or assign roles to EC2 instances, please see the documentation on How do I assign an existing IAM role to an EC2 instance?.
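
As a CLI alternative to the console steps in the linked documentation, attaching a role to a running instance goes through an instance profile; the instance ID below is a placeholder.

# An instance profile wraps the role so that EC2 can use it
aws iam create-instance-profile --instance-profile-name CloudWatchAgentServerRole
aws iam add-role-to-instance-profile \
  --instance-profile-name CloudWatchAgentServerRole \
  --role-name CloudWatchAgentServerRole

# Attach the instance profile to an instance you want to monitor
aws ec2 associate-iam-instance-profile \
  --instance-id i-0123456789abcdef0 \
  --iam-instance-profile Name=CloudWatchAgentServerRole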

Install the CloudWatch agent

Now that you have set up the permissions, you can install the CloudWatch agent onto the servers that you want to monitor. For details on installing the CloudWatch agent using Systems Manager, please see the documentation on Download and Configure the CloudWatch Agent.
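
As a sketch of what the linked documentation walks through, the agent can be installed fleet-wide with the AWS-ConfigureAWSPackage Run Command document; the tag key and value used for targeting are assumptions.

aws ssm send-command \
  --document-name "AWS-ConfigureAWSPackage" \
  --parameters '{"action":["Install"],"name":["AmazonCloudWatchAgent"]}' \
  --targets "Key=tag:Monitor,Values=true" \
  --comment "Install the CloudWatch agent"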

Create the CloudWatch agent configuration

Now that you have installed the CloudWatch agent on your server, run the CloudWatch agent configuration wizard to create the agent configuration. For instructions on how to run the CloudWatch agent configuration wizard, please see this documentation on Create the CloudWatch Agent Configuration File with the Wizard. To establish a command shell on the server, you can use AWS Systems Manager Session Manager to establish a session to the server and then run the CloudWatch agent configuration wizard. If you want to monitor both Linux and Windows servers, you must run the CloudWatch agent configuration wizard on a Linux instance and on a Windows instance to create a configuration file per OS type. The configuration is unique to the OS type.

To run the Agent configuration wizard on Linux instances, run the following command:

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard

To run the Agent configuration wizard on Windows instances, run the following commands:

cd "C:\Program Files\Amazon\AmazonCloudWatchAgent"

amazon-cloudwatch-agent-config-wizard.exe

Note for Linux instances: do not choose to collect collectd metrics in the agent configuration wizard unless you have collectd installed on your Linux servers. Otherwise, you may encounter an error.

Review the Agent configuration

The CloudWatch agent configuration generated from the wizard is stored in Systems Manager Parameter Store. You can review and modify this configuration if you need to capture extra metrics. To review the agent configuration, perform the following steps:

  1. Go to the console for the System Manager service.
  2. Click Parameter store on the left hand navigation.
  3. You should see the parameter that was created by the CloudWatch agent configuration program. For Linux servers, the configuration is stored in AmazonCloudWatch-linux, and for Windows servers, it is stored in AmazonCloudWatch-windows.

System Manager Parameter Store: Parameters created by CloudWatch agent configuration wizard

  4. Click on the parameter’s hyperlink (for example, AmazonCloudWatch-linux) to see all the configuration parameters that you specified in the configuration program.

In the following steps, I walk through an example of modifying the Windows configuration parameter (AmazonCloudWatch-windows) to add an additional metric (“Available Mbytes”) to monitor.

  1. Click the AmazonCloudWatch-windows parameter.
  2. In the parameter overview, scroll down to the “metrics” section and under “metrics_collected”, you can see the Windows performance monitor counters that will be gathered by the CloudWatch agent. If you want to add an additional perfmon counter, then you can edit and add the counter here.
  3. Press Edit at the top right of the AmazonCloudWatch-windows Parameter Store page.
  4. Scroll down in the Value section and look for “Memory.”
  5. After the “% Committed Bytes In Use” entry, put a comma “,” and then press Enter to add a blank line. Then, on that line, put “Available Mbytes”. The following screenshot demonstrates what this configuration should look like.

AmazonCloudWatch-windows parameter contents and how to add a new metric to monitor

  6. Press Save Changes.

To modify the Linux configuration parameter (AmazonCloudWatch-linux), you perform similar steps except you click on the AmazonCloudWatch-linux parameter. Here is additional documentation on creating the CloudWatch agent configuration and modifying the configuration file.

Start the CloudWatch agent and use the configuration

In this step, start the CloudWatch agent and instruct it to use your agent configuration stored in System Manager Parameter Store.

  1. Open another tab in your web browser and go to System Manager console.
  2. Select Run Command in the left hand navigation of the System Manager console.
  3. Press Run Command
  4. In the search bar,
    • Select Document name prefix
    • Select Equal
    • Specify AmazonCloudWatch (Note the field is case sensitive)
    • Press enter

System Manager Run Command's command document entry field

  5. Select AmazonCloudWatch-ManageAgent. This is the command that configures the CloudWatch agent.
  6. In the command parameters section,
    • For Action, select Configure
    • For Mode, select ec2
    • For Optional Configuration Source, select ssm
    • For Optional Configuration Location, specify the Parameter Store parameter name: AmazonCloudWatch-windows for Windows instances or AmazonCloudWatch-linux for Linux instances. Note that the field is case sensitive. This tells the command to read the agent configuration from the parameter specified here.
    • For Optional Restart, leave yes
  7. For Targets, choose the target servers that you wish to monitor.
  8. Scroll down and press Run. The Run Command may take a couple of minutes to complete. Press the refresh button. The Run Command configures the CloudWatch agent by reading the configuration from Parameter Store and applying those settings. An equivalent CLI invocation is sketched after this list.
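
For reference, a roughly equivalent Run Command invocation from the CLI for a Linux instance looks like the following sketch; the instance ID is a placeholder, and the parameter names should be checked against the AmazonCloudWatch-ManageAgent document in your account.

aws ssm send-command \
  --document-name "AmazonCloudWatch-ManageAgent" \
  --parameters '{"action":["configure"],"mode":["ec2"],"optionalConfigurationSource":["ssm"],"optionalConfigurationLocation":["AmazonCloudWatch-linux"],"optionalRestart":["yes"]}' \
  --targets "Key=instanceids,Values=i-0123456789abcdef0"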

For more details on installing the CloudWatch agent using your agent configuration, please see this Installing the CloudWatch Agent on EC2 Instances Using Your Agent Configuration.

Review the data collected by the CloudWatch agents

In this step, I walk through how to review the data collected by the CloudWatch agents.

  1. In the AWS Management console, go to CloudWatch.
  2. Click Metrics on the left-hand navigation.
  3. You should see a custom namespace for CWAgent. Click on the CWAgent namespace. Please note that this might take a couple of minutes to appear. Refresh the page periodically until it appears.
  4. Then click the ImageId, InstanceId hyperlinks to see the counters under that section.

CloudWatch Metrics: Showing counters under CWAgent

  5. Review the metrics captured by the CloudWatch agent. Notice the metrics that are only observable from inside the instance (for example, LogicalDisk % Free Space). These types of metrics would not be observable without installing the CloudWatch agent on the instance. From these metrics, you could create a CloudWatch Alarm to alert you if they go beyond a certain threshold. You can also add them to a CloudWatch Dashboard to review. To learn more about the metrics collected by the CloudWatch agent, see the documentation Metrics Collected by the CloudWatch Agent.
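
You can also confirm from the CLI that the agent is publishing under the CWAgent namespace:

aws cloudwatch list-metrics --namespace CWAgent --max-items 10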

Conclusion

In this blog, you learned how to deploy and configure the CloudWatch agent to capture the metrics on either Linux or Windows instances. If you are done with this blog, we recommend deleting the Systems Manager Parameter Store entries, the CloudWatch data, and then the EC2 instances to avoid further charges. If you would like a video tutorial of this process, please see this Monitoring Amazon EC2 Windows Instances using Unified CloudWatch Agent video.


Deploying CIS Level 1 hardened AMIs with Amazon EC2 Image Builder

Post Syndicated from Joseph Keating original https://aws.amazon.com/blogs/devops/deploying-cis-level-1-hardened-amis-with-amazon-ec2-image-builder/

The NFL, an AWS Professional Services partner, is collaborating with NFL’s Player Health and Safety team to build the Digital Athlete Program. The Digital Athlete Program is working to drive progress in the prevention, diagnosis, and treatment of injuries; enhance medical protocols; and further improve the way football is taught and played. The NFL, in conjunction with AWS Professional Services, delivered an Amazon EC2 Image Builder pipeline for automating the production of Amazon Machine Images (AMIs). Following similar practices from the Digital Athlete Program, this post demonstrates how to deploy an automated Image Builder pipeline.

“AWS Professional Services faced unique environment constraints, but was able to deliver a modular pipeline solution leveraging EC2 Image Builder. The framework serves as a foundation to create hardened images for future use cases. The team also provided documentation and knowledge transfer sessions to ensure our team was set up to successfully manage the solution.”

—Joseph Steinke, Director, Data Solutions Architect, National Football League

A common scenario AWS customers face is how to build processes that configure secure AWS resources that can be leveraged throughout the organization. You need to move fast in the cloud without compromising security best practices. Amazon Elastic Compute Cloud (Amazon EC2) allows you to deploy virtual machines in the AWS Cloud. EC2 AMIs provide the configuration utilized to launch an EC2 instance. You can use AMIs for several use cases, such as configuring applications, applying security policies, and configuring development environments. Developers and system administrators can deploy configuration AMIs to bring up EC2 resources that require little-to-no setup. Often times, multiple patterns are adopted for building and deploying AMIs. Because of this, you need the ability to create a centralized, automated pattern that can output secure, customizable AMIs.

In this post, we demonstrate how to create an automated process that builds and deploys Center for Internet Security (CIS) Level 1 hardened AMIs. The pattern that we deploy includes Image Builder, a CIS Level 1 hardened AMI, an application running on EC2 instances, and Amazon Inspector for security analysis. You deploy the AMI configured with the Image Builder pipeline to an application stack. The application stack consists of EC2 instances running Nginx. Lastly, we show you how to re-hydrate your application stack with a new AMI utilizing AWS CloudFormation and Amazon EC2 launch templates. You use Amazon Inspector to scan the EC2 instances launched from the Image Builder-generated AMI against the CIS Level 1 Benchmark.

After going through this exercise, you should understand how to build, manage, and deploy AMIs to an application stack. The infrastructure deployed with this pipeline includes a basic web application, but you can use this pattern to fit many needs. After running through this post, you should feel comfortable using this pattern to configure an AMI pipeline for your organization.

The project we create in this post addresses the following use case: you need a process for building and deploying CIS Level 1 hardened AMIs to an application stack running on Amazon EC2. In addition to demonstrating how to deploy the AMI pipeline, we also illustrate how to refresh a running application stack with a new AMI. You learn how to deploy this configuration with the AWS Command Line Interface (AWS CLI) and AWS CloudFormation.

AWS services used
Image Builder allows you to develop an automated workflow for creating AMIs to fit your organization’s needs. You can streamline the creation and distribution of secure images, automate your patching process, and define security and application configuration into custom AWS AMIs. In this post, you use the following AWS services to implement this solution:

  • AWS CloudFormation – AWS CloudFormation allows you to use domain-specific languages or simple text files to model and provision, in an automated and secure manner, all the resources needed for your applications across all Regions and accounts. You can deploy AWS resources in a safe, repeatable manner, and automate the provisioning of infrastructure.
  • AWS KMS – AWS Key Management Service (AWS KMS) is a fully managed service for creating and managing cryptographic keys. These keys are natively integrated with most AWS services. You use a KMS key in this post to encrypt resources.
  • Amazon S3 – Amazon Simple Storage Service (Amazon S3) is an object storage service utilized for storing and encrypting data. We use Amazon S3 to store our configuration files.
  • AWS Auto Scaling – AWS Auto Scaling allows you to build scaling plans that automate how groups of different resources respond to changes in demand. You can optimize availability, costs, or a balance of both. We use Auto Scaling to manage Nginx on Amazon EC2.
  • Launch templates – Launch templates contain configurations such as the AMI ID, instance type, and security group. Launch templates enable you to store launch parameters so that they don’t have to be specified every time instances are launched.
  • Amazon Inspector – This automated security assessment service improves the security and compliance of applications deployed on AWS. Amazon Inspector automatically assesses applications for exposures, vulnerabilities, and deviations from best practices.

Architecture overview
We use Ansible as a configuration management component alongside Image Builder. The CIS Ansible playbook applies a Level 1 set of rules to the local host on which the AMI is provisioned. For more information about the Ansible playbook, see the GitHub repo. Image Builder also offers AMIs hardened to Security Technical Implementation Guide (STIG) levels low through high as part of its pipeline build.

The following diagram depicts the phases of the Image Builder pipeline for building a Nginx web server. The numbers 1–6 represent the order of when each phase runs in the build process:

  1. Source
  2. Build components
  3. Validate
  4. Test
  5. Distribute
  6. AMI

Figure: Shows the EC2 Image Builder steps

The workflow includes the following steps:

  1. Deploy the CloudFormation templates.
  2. The template creates an Image Builder pipeline.
  3. AWS Systems Manager completes the AMI build process.
  4. Amazon EC2 starts an instance to build the AMI.
  5. Systems Manager starts a test instance build after the first build is successful.
  6. The AMI starts provisioning.
  7. The Amazon Inspector CIS benchmark starts.

CloudFormation templates
You deploy the following CloudFormation templates. These CloudFormation templates have a great deal of configurations. They deploy the following resources:

  • vpc.yml – Contains all the core networking configuration. It deploys the VPC, two private subnets, two public subnets, and the route tables. The private subnets utilize a NAT gateway to communicate to the internet. The public subnets have full outbound access to the IGW.
  • kms.yml – Contains the AWS KMS configuration that we use for encrypting resources. The KMS key policy is also configured in this template.
  • s3-iam-config.yml – Contains the Amazon S3 bucket and IAM role configuration used by the pipeline, including the EC2 instance and Image Builder roles.
  • infrastructure-ssm-params.yml – Contains the Systems Manager parameter store configuration. The parameters are populated by using outputs from other CloudFormation templates.
  • nginx-config.yml – Contains the configuration for Nginx. Additionally, this template contains the network load balancer, target groups, security groups, and EC2 instance AWS Identity and Access Management (IAM) roles.
  • nginx-image-builder.yml – Contains the configuration for the Image Builder pipeline that we use to build AMIs.

Prerequisites
To follow the steps to provision the pipeline deployment, you must have the following prerequisites:

Deploying the CloudFormation templates
To deploy your templates, complete the following steps:

1. Clone the source code repository found in the following location:

git clone https://github.com/aws-samples/deploy-cis-level-1-hardened-ami-with-ec2-image-builder-pipeline.git

You now use the AWS CLI to deploy the CloudFormation templates. Make sure to leave the CloudFormation stack names as we have written in this post, because later stacks reference them by name.

2. Deploy the VPC CloudFormation template:

aws cloudformation create-stack \
--stack-name vpc-config \
--template-body file://Templates/vpc.yml \
--parameters file://Parameters/vpc-params.json  \
--capabilities CAPABILITY_IAM \
--region us-east-1

The output should look like the following code:

{
    "StackId": "arn:aws:cloudformation:us-east-1:123456789012:stack/vpc-config/7faaab30-247f-11eb-8712-0e65b6fb18f9"
}

 

3. Open the Parameters/kms-params.json file and update the UserARN parameter with your account ID:

[
  {
      "ParameterKey": "KeyName",
      "ParameterValue": "DemoKey"
  },
  {
    "ParameterKey": "UserARN",
    "ParameterValue": "arn:aws:iam::<input_your_account_id>:root"
  }
]

 

4. Deploy the KMS key CloudFormation template:

aws cloudformation create-stack \
--stack-name kms-config \
--template-body file://Templates/kms.yml \
--parameters file://Parameters/kms-params.json \
--capabilities CAPABILITY_IAM \
--region us-east-1

The output should look like the following:

{
"StackId": "arn:aws:cloudformation:us-east-1:123456789012:stack/kms-config/f65aca80-08ff-11eb-8795-12275bc6e1ef"
}

 

5. Open the Parameters/s3-iam-config.json file and update the DemoConfigS3BucketName parameter to a unique name of your choosing:

[
  {
    "ParameterKey" : "Environment",
    "ParameterValue" : "dev"
  },
  {
    "ParameterKey": "NetworkStackName",
    "ParameterValue" : "vpc-config"
  },
  {
    "ParameterKey" : "KMSStackName",
    "ParameterValue" : "kms-config"
  },
  {
    "ParameterKey": "DemoConfigS3BucketName",
    "ParameterValue" : "<input_your_unique_bucket_name>"
  },
  {
    "ParameterKey" : "EC2InstanceRoleName",
    "ParameterValue" : "EC2InstanceRole"
  }
]

 

6. Deploy the IAM role configuration template:

aws cloudformation create-stack \
--stack-name s3-iam-config \
--template-body file://Templates/s3-iam-config.yml \
--parameters file://Parameters/s3-iam-config.json \
--capabilities CAPABILITY_NAMED_IAM \
--region us-east-1

The output should look like the following:

{
"StackId": "arn:aws:cloudformation:us-east-1:123456789012:stack/s3-iam-config/9be9f990-0909-11eb-811c-0a78092beb51"
}

 

Configuring IAM roles and policies

This solution uses a couple of service-linked roles. Let’s generate these roles using the AWS CLI.

 

1. Run the following commands:

aws iam create-service-linked-role --aws-service-name autoscaling.amazonaws.com
aws iam create-service-linked-role --aws-service-name imagebuilder.amazonaws.com

If you see a message similar to the following code, it means that you already have the service-linked role created in your account and you can move on to the next step:

An error occurred (InvalidInput) when calling the CreateServiceLinkedRole operation: Service role name AWSServiceRoleForImageBuilder has been taken in this account, please try a different suffix.

Now that you have generated the IAM roles used in this post, you add them to the KMS key policy. This allows the roles to encrypt and decrypt the KMS key.

 

2. Open the Parameters/kms-params.json file:

[
  {
      "ParameterKey": "KeyName",
      "ParameterValue": "DemoKey"
  },
  {
    "ParameterKey": "UserARN",
    "ParameterValue": "arn:aws:iam::12345678910:root"
  }
]

 

3. Add the following values as a comma-separated list to the UserARN parameter key:

arn:aws:iam::<input_your_aws_account_id>:role/EC2InstanceRole
arn:aws:iam::<input_your_aws_account_id>:role/EC2ImageBuilderRole
arn:aws:iam::<input_your_aws_account_id>:role/NginxS3PutLambdaRole
arn:aws:iam::<input_your_aws_account_id>:role/aws-service-role/imagebuilder.amazonaws.com/AWSServiceRoleForImageBuilder
arn:aws:iam::<input_your_aws_account_id>:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling

 

When finished, the file should look similar to the following:

[
  {
      "ParameterKey": "KeyName",
      "ParameterValue": "DemoKey"
  },
  {
    "ParameterKey": "UserARN",
    "ParameterValue": "arn:aws:iam::123456789012:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling,arn:aws:iam::<input_your_aws_account_id>:role/NginxS3PutLambdaRole,arn:aws:iam::123456789012:role/aws-service-role/imagebuilder.amazonaws.com/AWSServiceRoleForImageBuilder,arn:aws:iam::12345678910:role/EC2InstanceRole,arn:aws:iam::12345678910:role/EC2ImageBuilderRole,arn:aws:iam::12345678910:root"
  }
]

Updating the CloudFormation stack

Now that the AWS KMS parameter file has been updated, you update the AWS KMS CloudFormation stack.

1. Run the following command to update the kms-config stack:

aws cloudformation update-stack \
--stack-name kms-config \
--template-body file://Templates/kms.yml \
--parameters file://Parameters/kms-params.json \
--capabilities CAPABILITY_IAM \
--region us-east-1

 

The output should look like the following:

{
"StackId": "arn:aws:cloudformation:us-east-1:123456789012:stack/kms-config/6e84b750-0905-11eb-b543-0e4dccb471bf"
}

 

2. Open the AnsibleConfig/component-nginx.yml file and update the <input_s3_bucket_name> value with the bucket name you generated from the s3-iam-config stack:

# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0
name: 'Ansible Playbook Execution on Amazon Linux 2'
description: 'This is a sample component that demonstrates how to download and execute an Ansible playbook against Amazon Linux 2.'
schemaVersion: 1.0
constants:
  - s3bucket:
      type: string
      value: <input_s3_bucket_name>
phases:
  - name: build
    steps:
      - name: InstallAnsible
        action: ExecuteBash
        inputs:
          commands:
           - sudo amazon-linux-extras install -y ansible2
      - name: CreateDirectory
        action: ExecuteBash
        inputs:
          commands:
            - sudo mkdir -p /ansibleloc/roles
      - name: DownloadLinuxCis
        action: S3Download
        inputs:
          - source: 's3://{{ s3bucket }}/components/linux-cis.zip'
            destination: '/ansibleloc/linux-cis.zip'
      - name: UzipLinuxCis
        action: ExecuteBash
        inputs:
          commands:
            - unzip /ansibleloc/linux-cis.zip -d /ansibleloc/roles
            - echo "unzip linux-cis file"
      - name: DownloadCisPlaybook
        action: S3Download
        inputs:
          - source: 's3://{{ s3bucket }}/components/cis_playbook.yml'
            destination: '/ansibleloc/cis_playbook.yml'
      - name: InvokeCisAnsible
        action: ExecuteBinary
        inputs:
          path: ansible-playbook
          arguments:
            - '{{ build.DownloadCisPlaybook.inputs[0].destination }}'
            - '--tags=level1'
      - name: DeleteCisPlaybook
        action: ExecuteBash
        inputs:
          commands:
            - rm '{{ build.DownloadCisPlaybook.inputs[0].destination }}'
      - name: DownloadNginx
        action: S3Download
        inputs:
          - source: 's3://{{ s3bucket }}/components/nginx.zip'
            destination: '/ansibleloc/nginx.zip'
      - name: UzipNginx
        action: ExecuteBash
        inputs:
          commands:
            - unzip /ansibleloc/nginx.zip -d /ansibleloc/roles
            - echo "unzip Nginx file"
      - name: DownloadNginxPlaybook
        action: S3Download
        inputs:
          - source: 's3://{{ s3bucket }}/components/nginx_playbook.yml'
            destination: '/ansibleloc/nginx_playbook.yml'
      - name: InvokeNginxAnsible
        action: ExecuteBinary
        inputs:
          path: ansible-playbook
          arguments:
            - '{{ build.DownloadNginxPlaybook.inputs[0].destination }}'
      - name: DeleteNginxPlaybook
        action: ExecuteBash
        inputs:
          commands:
            - rm '{{ build.DownloadNginxPlaybook.inputs[0].destination }}'

  - name: validate
    steps:
      - name: ValidateDebug
        action: ExecuteBash
        inputs:
          commands:
            - sudo echo "ValidateDebug section"

  - name: test
    steps:
      - name: TestDebug
        action: ExecuteBash
        inputs:
          commands:
            - sudo echo "TestDebug section"
      - name: Download_Inspector_Test
        action: S3Download
        inputs:
          - source: 's3://ec2imagebuilder-managed-resources-us-east-1-prod/components/inspector-test-linux/1.0.1/InspectorTest'
            destination: '/workdir/InspectorTest'
      - name: Set_Executable_Permissions
        action: ExecuteBash
        inputs:
          commands:
            - sudo chmod +x /workdir/InspectorTest
      - name: ExecuteInspectorAssessment
        action: ExecuteBinary
        inputs:
          path: '/workdir/InspectorTest'

 

Adding files to your S3 buckets

Now you assume a role you generated from one of the previous CloudFormation stacks. This allows you to add files to the encrypted S3 bucket.

1. Run the following command and make sure to update the role to use your AWS account ID number:

aws sts assume-role --role-arn "arn:aws:iam::<input_your_aws_account_id>:role/EC2ImageBuilderRole" --role-session-name AWSCLI-Session

You see an output similar to the following:

{
    "Credentials": {
        "AccessKeyId": "<AWS_ACCESS_KEY_ID>",
        "SecretAccessKey": "<AWS_SECRET_ACCESS_KEY_ID>",
        "SessionToken": "<AWS_SESSION_TOKEN>",
        "Expiration": "2020-11-20T02:54:17Z"
    },
    "AssumedRoleUser": {
        "AssumedRoleId": "ACPATGCCLSNJCNSJCEWZ:AWSCLI-Session",
        "Arn": "arn:aws:sts::123456789012:assumed-role/EC2ImageBuilderRole/AWSCLI-Session"
    }
}

You now assume the EC2ImageBuilderRole IAM role from the command line. This role allows you to create objects in the S3 bucket generated from the s3-iam-config stack. Because this bucket is encrypted with AWS KMS, any user or IAM role requires specific permissions to decrypt the key. You have already accounted for this in a previous step by adding the EC2ImageBuilderRole IAM role to the KMS key policy.

 

2. Create the following environment variables to use the EC2ImageBuilderRole role. Update the values with the output from the previous step:

export AWS_ACCESS_KEY_ID=AccessKeyId
export AWS_SECRET_ACCESS_KEY=SecretAccessKey
export AWS_SESSION_TOKEN=SessionToken

 

3. Check to make sure that you have actually assumed the role EC2ImageBuilderRole:

aws sts get-caller-identity

You should see an output similar to the following:

{
    "UserId": "AROATG5CKLSWENUYOF6A4:AWSCLI-Session",
    "Account": "123456789012",
    "Arn": "arn:aws:sts::123456789012:assumed-role/EC2ImageBuilderRole/AWSCLI-Session"
}

 

4. Create a folder inside of the encrypted S3 bucket generated in the s3-iam-config stack:

aws s3api put-object --bucket <input_your_bucket_name> --key components

 

5. Zip the configuration files that you use in the Image Builder pipeline process:

zip -r linux-cis.zip LinuxCis/
zip -r nginx.zip Nginx/

 

6. Upload the configuration files to the S3 bucket. Update the bucket name with the S3 bucket name you generated in the s3-iam-config stack.

aws s3 cp linux-cis.zip s3://<input_your_bucket_name>/components/

aws s3 cp nginx.zip s3://<input_your_bucket_name>/components/

aws s3 cp AnsibleConfig/cis_playbook.yml s3://<input_your_bucket_name>/components/

aws s3 cp AnsibleConfig/nginx_playbook.yml s3://<input_your_bucket_name>/components/

aws s3 cp AnsibleConfig/component-nginx.yml s3://<input_your_bucket_name>/components/
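
You can optionally verify that all five objects landed in the bucket before moving on:

aws s3 ls s3://<input_your_bucket_name>/components/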

Deploying your pipeline

You’re now ready to deploy your pipeline.

1. Switch back to the original IAM user profile you used before assuming the EC2ImageBuilderRole. For instructions, see How do I assume an IAM role using the AWS CLI?
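
If you exported the temporary credentials as environment variables in the previous section, one simple way to switch back is to clear them and confirm your identity (adjust if you use named profiles):

unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN

# Confirm that you are back to your original identity
aws sts get-caller-identity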

 

2. Open the Parameters/nginx-image-builder-params.json file and update the ImageBuilderBucketName parameter with the S3 bucket name generated in the s3-iam-config stack:

[
  {
    "ParameterKey": "Environment",
    "ParameterValue": "dev"
  },
  {
    "ParameterKey": "ImageBuilderBucketName",
    "ParameterValue": "<input_your_bucket_name>"
  },
  {
    "ParameterKey": "NetworkStackName",
    "ParameterValue": "vpc-config"
  },
  {
    "ParameterKey": "KMSStackName",
    "ParameterValue": "kms-config"
  },
  {
    "ParameterKey": "S3ConfigStackName",
    "ParameterValue": "s3-iam-config"
  }
]

 

3. Deploy the nginx-image-builder.yml template:

aws cloudformation create-stack \
--stack-name cis-image-builder \
--template-body file://Templates/nginx-image-builder.yml \
--parameters file://Parameters/nginx-image-builder-params.json \
--capabilities CAPABILITY_NAMED_IAM \
--region us-east-1

The template takes around 35 minutes to complete. Deploying this template starts the Image Builder pipeline.
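
If you want the CLI to block until the stack finishes creating, you can wait on it; the default waiter times out after about an hour, which is enough for this stack.

aws cloudformation wait stack-create-complete --stack-name cis-image-builder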

 

Monitoring the pipeline

You can get more details about the pipeline on the AWS Management Console.

1. On the Image Builder console, choose Image pipelines to see the status of the pipeline.

Figure: Shows the EC2 Image Builder Pipeline status

 

2. Choose the pipeline (for this post, cis-image-builder-LinuxCis-Pipeline).

On the pipeline details page, you can view more information and make updates to its configuration.

Figure: Shows the Image Builder Pipeline metadata

At this point, the Image Builder pipeline has started running the automation document in Systems Manager. Here you can monitor the progress of the AMI build.

 

3. On the Systems Manager console, choose Automation.

 

4. Choose the execution ID of the arn:aws:ssm:us-east-1:123456789012:document/ImageBuilderBuildImageDocument document.

Figure: Shows the Image Builder Pipeline Systems Manager Automation steps

 

5. Choose the step ID to see what is happening in each step.

At this point, the Image Builder pipeline is bringing up an Amazon Linux 2 EC2 instance. From there, we run Ansible playbooks that configure the security and application settings. The automation is pulling its configuration from the S3 bucket you deployed in a previous step. When the Ansible run is complete, the instance stops and an AMI is generated from this instance. When this is complete, a cleanup is initiated that ends the EC2 instance. The final result is a CIS Level 1 hardened Amazon Linux 2 AMI running Nginx.

 

Updating parameters

When the stack is complete, you retrieve some new parameter values.

1. On the Systems Manager console, choose Automation.

 

2. Choose the execution ID of the arn:aws:ssm:us-east-1:123456789012:document/ImageBuilderBuildImageDocument document.

 

3. Choose step 21.

The following screenshot shows the output of this step.

Figure: Shows step of EC2 Image Builder Pipeline
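
As an alternative to reading the AMI ID from the automation output, you can list the images produced by the pipeline from the CLI. The pipeline ARN below is a placeholder, and the query expression is an assumption about the shape of the output.

aws imagebuilder list-image-pipeline-images \
  --image-pipeline-arn arn:aws:imagebuilder:us-east-1:<input_your_aws_account_id>:image-pipeline/<pipeline-name> \
  --query 'imageSummaryList[].outputResources.amis[].image'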

 

4. Open the Parameters/nginx-config.json file and update the AmiId parameter with the AMI ID generated from the previous step:

[
  {
    "ParameterKey" : "Environment",
    "ParameterValue" : "dev"
  },
  {
    "ParameterKey": "NetworkStackName",
    "ParameterValue" : "vpc-config"
  },
  {
    "ParameterKey" : "S3ConfigStackName",
    "ParameterValue" : "s3-iam-config"
  },
  {
    "ParameterKey": "AmiId",
    "ParameterValue" : "<input_the_cis_hardened_ami_id>"
  },
  {
    "ParameterKey": "ApplicationName",
    "ParameterValue" : "Nginx"
  },
  {
    "ParameterKey": "NLBName",
    "ParameterValue" : "DemoALB"
  },
  {
    "ParameterKey": "TargetGroupName",
    "ParameterValue" : "DemoTG"
  }
]

 

5. Deploy the nginx-config.yml template:

aws cloudformation create-stack \
--stack-name nginx-config \
--template-body file://Templates/nginx-config.yml \
--parameters file://Parameters/nginx-config.json \
--capabilities CAPABILITY_NAMED_IAM \
--region us-east-1

The output should look like the following:

{
    "StackId": "arn:aws:cloudformation:us-east-1:123456789012:stack/nginx-config/fb2b0f30-24f6-11eb-ad7c-0a3238f55eb3"
}

 

6. Deploy the infrastructure-ssm-params.yml template:

aws cloudformation create-stack \
--stack-name ssm-params-config \
--template-body file://Templates/infrastructure-ssm-params.yml \
--parameters file://Parameters/infrastructure-ssm-parameters.json \
--capabilities CAPABILITY_NAMED_IAM \
--region us-east-1

 

Verifying Nginx is running

Let’s verify that our Nginx service is up and running properly. You use Session Manager to connect to a testing instance.

1. On the Amazon EC2 console, choose Instances.

You should see three instances, as in the following screenshot.

Figure: Shows the Nginx EC2 instances

You can connect to either one of the Nginx instances.

 

2. Select the testing instance.

 

3. On the Actions menu, choose Connect.

 

4. Choose Session Manager.

 

5. Choose Connect.

A terminal on the EC2 instance opens, similar to the following screenshot.

Figure: Shows the Session Manager terminal

6. Run the following command to ensure that Nginx is running properly:

curl localhost:8080

You should see an output similar to the following screenshot.

Figure: Shows Nginx output from terminal

Reviewing resources and configurations

Now that you have deployed the core services for the solution, take some time to review what you have just deployed.

 

IAM roles

This project creates several IAM roles that are used to manage AWS resources. For example, EC2ImageBuilderRole is used to configure new AMIs with the Image Builder pipeline. This role contains only the permissions required to manage the Image Builder process. Adopting this pattern enforces the practice of least privilege. Additionally, many of the IAM policies attached to the IAM roles are restricted down to specific AWS resources. Let’s look at a couple of examples of managing IAM permissions with this project.

 

The following policy restricts Amazon S3 access to a specific S3 bucket. This makes sure that the role this policy is attached to can only access this specific S3 bucket. If this role needs to access any additional S3 buckets, the resource has to be explicitly added.

Policies:
  - PolicyName: GrantS3Read
    PolicyDocument:
      Statement:
        - Sid: GrantS3Read
          Effect: Allow
          Action:
            - s3:List*
            - s3:Get*
            - s3:Put*
          Resource: !Sub 'arn:aws:s3:::${S3Bucket}*'

Let’s look at the EC2ImageBuilderRole. A common scenario that occurs is when you need to assume a role locally in order to perform an action on a resource. In this case, because you’re using AWS KMS to encrypt the S3 bucket, you need to assume a role that has access to decrypt the KMS key so that artifacts can be uploaded to the S3 bucket. In the following AssumeRolePolicyDocument, we allow the Amazon EC2, Systems Manager, and Image Builder services to assume this role. Additionally, we allow IAM users in the account to assume this role as well.

AssumeRolePolicyDocument:
  Version: 2012-10-17
  Statement:
    - Effect: Allow
      Principal:
        Service:
          - ec2.amazonaws.com
          - ssm.amazonaws.com
          - imagebuilder.amazonaws.com
        AWS: !Sub 'arn:aws:iam::${AWS::AccountId}:root'
      Action:
        - sts:AssumeRole

The principal !Sub 'arn:aws:iam::${AWS::AccountId}:root' allows any IAM user in this account to assume this role locally. Normally, this role should be scoped down to specific IAM users or roles. For the purpose of this post, we grant permissions to all users of the account.

 

Nginx configuration

The AMI built from the Image Builder pipeline contains all of the application and security configurations required to run Nginx as a web application. When an instance is launched from this AMI, no additional configuration is required.

We use Amazon EC2 launch templates to configure the application stack. The launch templates contain information such as the AMI ID, instance type, and security group. When a new AMI is provisioned, you simply update the launch template CloudFormation parameter with the new AMI and update the CloudFormation stack. From here, you can start an Auto Scaling group instance refresh to update the application stack to use the new AMI. The Auto Scaling group is updated with instances running on the updated AMI by bringing down one instance at a time and replacing it.
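
For reference, once the launch template has been updated, an instance refresh can also be started from the CLI; the Auto Scaling group name is a placeholder.

aws autoscaling start-instance-refresh \
  --auto-scaling-group-name <your-asg-name> \
  --preferences '{"MinHealthyPercentage":90,"InstanceWarmup":120}'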

 

Amazon Inspector configuration

Amazon Inspector is an automated security assessment service that helps improve the security and compliance of applications deployed on AWS. With Amazon Inspector, assessments are generated for exposure, vulnerabilities, and deviations from best practices.

After performing an assessment, Amazon Inspector produces a detailed list of security findings prioritized by level of severity. These findings can be reviewed directly or as part of detailed assessment reports that are available via the Amazon Inspector console or API. We can use Amazon Inspector to assess our security posture against the CIS Level 1 standard that we use our Image Builder pipeline to provision. Let’s look at how we configure Amazon Inspector.

A resource group defines a set of tags that, when queried, identify the AWS resources that make up the assessment target. Any EC2 instance that is launched with the tag specified in the resource group is in scope for Amazon Inspector assessment runs. The following code shows our configuration:

ResourceGroup:
  Type: "AWS::Inspector::ResourceGroup"
  Properties:
    ResourceGroupTags:
      - Key: "ResourceGroup"
        Value: "Nginx"

AssessmentTarget:
  Type: AWS::Inspector::AssessmentTarget
  Properties:
    AssessmentTargetName : "NginxAssessmentTarget"
    ResourceGroupArn : !Ref ResourceGroup

In the following code, we specify the tag set in the resource group, which makes sure that when an instance is launched from this AMI, it’s under the scope of Amazon Inspector:

IBImage:
  Type: AWS::ImageBuilder::Image
  Properties:
    ImageRecipeArn: !Ref Recipe
    InfrastructureConfigurationArn: !Ref Infrastructure
    DistributionConfigurationArn: !Ref Distribution
    ImageTestsConfiguration:
      ImageTestsEnabled: false
      TimeoutMinutes: 60
    Tags:
      ResourceGroup: 'Nginx'

 

Building and deploying a new image with Amazon Inspector tests enabled

For this final portion of this post, we build and deploy a new AMI with an Amazon Inspector evaluation.

1. In your text editor, open Templates/nginx-image-builder.yml and update the pipeline and IBImage resource property ImageTestsEnabled to true.

The updated configuration should look like the following:

IBImage:
  Type: AWS::ImageBuilder::Image
  Properties:
    ImageRecipeArn: !Ref Recipe
    InfrastructureConfigurationArn: !Ref Infrastructure
    DistributionConfigurationArn: !Ref Distribution
    ImageTestsConfiguration:
      ImageTestsEnabled: true
      TimeoutMinutes: 60
    Tags:
      ResourceGroup: 'Nginx'

 

2. Update the stack with the new configuration:

aws cloudformation update-stack \
--stack-name cis-image-builder \
--template-body file://Templates/nginx-image-builder.yml \
--parameters file://Parameters/nginx-image-builder-params.json \
--capabilities CAPABILITY_NAMED_IAM \
--region us-east-1

This starts a new AMI build with an Amazon Inspector evaluation. The process can take up to 2 hours to complete.

3. On the Amazon Inspector console, choose Assessment Runs.

Figure: Shows Amazon Inspector Assessment Run

4. Under Reports, choose Download report.

5. For Select report type, select Findings report.

6. For Select report format, select PDF.

7. Choose Generate report.

The following screenshot shows the findings report from the Amazon Inspector run.

This report generates an assessment against the CIS Level 1 standard. Any policies that don’t comply with the CIS Level 1 standard are explicitly called out in this report.

Section 3.1 lists any failed policies.

 

Figure: Shows Inspector findings

These failures are detailed later in the report, along with suggestions for remediation.

In section 4.1, locate the entry 1.3.2 Ensure filesystem integrity is regularly checked. This section shows the details of a failure from the Amazon Inspector findings report. You can also see suggestions on how to remediate the issue. Under Recommendation, the findings report suggests a specific command that you can use to remediate the issue.

 

Figure: Shows Inspector findings issue

To resolve it, you can simply update the Ansible playbooks with this setting, run the Image Builder pipeline to build a new AMI, deploy the new AMI to an EC2 instance, and run the Amazon Inspector assessment again to ensure that the issue has been resolved. Finally, the report lists the specific instances that were assessed and found to have this issue.

Organizations often customize security settings based off of a given use case. Your organization may choose CIS Level 1 as a standard but elect not to apply all the recommendations. For example, you might choose not to use the FirewallD service on your Linux instances, because you feel that Amazon EC2 security groups provide enough network security that you don’t need an additional firewall. Disabling FirewallD causes a high severity failure in the Amazon Inspector report. This is expected and can be ignored when evaluating the report.

 

Conclusion
In this post, we showed you how to use Image Builder to automate the creation of AMIs. Additionally, we also showed you how to use the AWS CLI to deploy CloudFormation stacks. Finally, we walked through how to evaluate resources against CIS Level 1 Standard using Amazon Inspector.

 

About the Authors

 

Joe Keating is a Modernization Architect in Professional Services at Amazon Web Services. He works with AWS customers to design and implement a variety of solutions in the AWS Cloud. Joe enjoys cooking with a glass or two of wine and achieving mediocrity on the golf course.


Virginia Chu is a Sr. Cloud Infrastructure Architect in Professional Services at Amazon Web Services. She works with enterprise-scale customers around the globe to design and implement a variety of solutions in the AWS Cloud.

 

Creating a cross-region Active Directory domain with AWS Launch Wizard for Microsoft Active Directory

Post Syndicated from AWS Admin original https://aws.amazon.com/blogs/compute/creating-a-cross-region-active-directory-domain-with-aws-launch-wizard-for-microsoft-active-directory/

AWS Launch Wizard is a console-based service to quickly and easily size, configure, and deploy third-party applications, such as Microsoft SQL Server Always On and HANA-based SAP systems, on AWS without the need to identify and provision individual AWS resources. AWS Launch Wizard offers an easy way to deploy enterprise applications and optimize costs. Instead of selecting and configuring separate infrastructure services, you go through a few steps in AWS Launch Wizard and it deploys a ready-to-use application on your behalf. It reduces the time you need to spend investigating how to provision, cost, and configure your application on AWS.

You can now use AWS Launch Wizard to deploy and configure self-managed Microsoft Windows Server Active Directory Domain Services running on Amazon Elastic Compute Cloud (EC2) instances. With Launch Wizard, you can have fully-functioning, production-ready domain controllers within a few hours—all without having to manually deploy and configure your resources.

You can use AWS Directory Service to run Microsoft Active Directory (AD) as a managed service, without the hassle of managing your own infrastructure. If you need to run your own AD infrastructure, you can use AWS Launch Wizard to simplify the deployment and configuration process.

In this post, I walk through creation of a cross-region Active Directory domain using Launch Wizard. First, I deploy a single Active Directory domain spanning two regions. Then, I configure Active Directory Sites and Services to match the network topology. Finally, I create a user account to verify replication of the Active Directory domain.

Diagram of Resources deployed in this post

Figure 1: Diagram of resources deployed in this post

Prerequisites

  1. You must have a VPC in your home. Additionally, you must have remote regions that have CIDRs that do not overlap with each other. If you need to create VPCs and subnets that do not overlap, please refer here.
  2. Each subnet used must have outbound internet connectivity. You can use either a NAT gateway or an internet gateway.
  3. The VPCs must be peered in order to complete the steps in this post. For information on creating a VPC peering connection between Regions, please refer here, or see the CLI sketch after this list.
  4. If you choose to deploy your Domain Controllers to a private subnet, you must have an RDP jump / bastion instance set up so that you can RDP to your instances.
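If you prefer to script the peering prerequisite, the following is a minimal AWS CLI sketch of creating and accepting an inter-Region peering connection and adding the return routes. The VPC, route table, and peering connection IDs, as well as the CIDRs, are placeholders for your own values.

# Request peering from the home region VPC to the remote region VPC (IDs are placeholders)
aws ec2 create-vpc-peering-connection --region us-east-1 \
  --vpc-id vpc-1111aaaa --peer-vpc-id vpc-2222bbbb --peer-region us-west-2

# Accept the request in the remote region
aws ec2 accept-vpc-peering-connection --region us-west-2 \
  --vpc-peering-connection-id pcx-3333cccc

# Add a route in each VPC's route table that points the other VPC's CIDR at the peering connection
aws ec2 create-route --region us-east-1 --route-table-id rtb-aaaa1111 \
  --destination-cidr-block 10.0.0.0/24 --vpc-peering-connection-id pcx-3333cccc
aws ec2 create-route --region us-west-2 --route-table-id rtb-bbbb2222 \
  --destination-cidr-block 10.1.0.0/24 --vpc-peering-connection-id pcx-3333cccc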

Deploy Your Domain Controllers in the Home Region using Launch Wizard

In this section, I deploy the first set of domain controllers into us-east-1, the home region, using Launch Wizard. I refer to us-east-1 as the home region and us-west-2 as the remote region.

  1. In the AWS Launch Wizard Console, select Active Directory in the navigation pane on the left.
  2. Select Create deployment.
  3. In the Review Permissions page, select Next.
  4. In the Configure application settings page set the following:
    • General:
      • Deployment name: UsEast1AD
    • Active Directory (AD) installation
      • Installation type: Active Directory on EC2
    • Domain Settings:
      • Number of domain controllers: 2
      • AMI installation type: License-included AMI
    • License-included AMI: ami-################# | Windows_Server-2019-English-Full-Base-202#-##-##
    • Connection type: Create new Active Directory
    • Domain DNS name: corp.example.com
    • Domain NetBIOS Name: CORP
    • Connectivity:
      • Key Pair Name: Choose an existing key pair or create a new one.
      • Virtual Private Cloud (VPC): Select Virtual Private Cloud (VPC)
    • VPC: Select your home region VPC
    • Availability Zone (AZ) and private subnets:
      • Select 2 Availability Zones
      • Choose the proper subnet in each Availability Zone
      • Assign a Controller IP address for each domain controller
    • Remote Desktop Gateway preferences: Disregard for now, this is set up later.
    • Check the I confirm that a public subnet has been set up. Each of the selected private subnets have outbound connectivity enabled check box.
  5. Select Next.
  6. In the Define infrastructure requirements page, set the following inputs.
    • Storage and compute: Based on infrastructure requirements
    • Number of AD users: Up to 5000 users
  7. Select Next.
  8. In the Review and deploy page, review your selections. Then, select Deploy.

Note that it may take up to 2 hours for your domain to be deployed. Once the status has changed to Completed, you can proceed to the next section, where I prepare Active Directory Sites and Services for the second set of domain controllers in the other region.

Configure Active Directory Sites and Services

In this section, I configure the Active Directory Sites and Services topology to match my network topology. This step ensures proper Active Directory replication routing so that domain clients can find the closest domain controller. For more information on Active Directory Sites and Services, please refer here.

Retrieve your Administrator Credentials from Secrets Manager

  1. From the AWS Secrets Manager Console in us-east-1, select the Secret that begins with LaunchWizard-UsEast1AD.
  2. In the middle of the Secret page, select Retrieve secret value.
    1. This displays the username and password keys with their values.
    2. You need these credentials when you RDP into one of the domain controllers in the next steps. You can also retrieve them from the command line, as shown in the sketch after this list.
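If you prefer the AWS CLI, the following sketch retrieves the same credentials; the secret name shown is an example and should match the LaunchWizard-UsEast1AD secret in your account.

# Find the secret created by Launch Wizard (name prefix from the deployment)
aws secretsmanager list-secrets --region us-east-1 \
  --query "SecretList[?starts_with(Name, 'LaunchWizard-UsEast1AD')].Name" --output text

# Retrieve the username and password JSON (handle this output carefully)
aws secretsmanager get-secret-value --region us-east-1 \
  --secret-id <your-LaunchWizard-UsEast1AD-secret-name> --query SecretString --output text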

Rename the Default First Site

  1. Log in to one of the domain controllers in us-east-1.
  2. Select Start, type dssite and hit Enter on your keyboard.
  3. The Active Directory Sites and Services MMC should appear.
    1. Expand Sites. There is a site named Default-First-Site-Name.
    2. Right-click Default-First-Site-Name and select Rename.
    3. Enter us-east-1 as the name.
  4. Leave the Active Directory Sites and Services MMC open for the next set of steps.

Create a New Site and Subnet Definition for US-West-2

  1. Using the Active Directory Sites and Services MMC from the previous steps, right click on Sites.
  2. Select New Site… and enter the following inputs:
    • Name: us-west-2
    • Select DEFAULTIPSITELINK.
  3.  Select OK.
  4. A pop-up appears telling you that additional configuration is needed. Select OK.
  5. Expand Sites and right click on Subnets and select New Subnet.
  6. Enter the following information:
    • Prefix: the CIDR of your us-west-2 VPC (for example, 10.0.0.0/24)
    • Site: select us-west-2
  7. Select OK.
  8. Leave the Active Directory Sites and Services MMC open for the following set of steps.

Configure Site Replication Settings

Using the Active Directory Sites and Services MMC from the previous steps, expand Sites, Inter-Site Transports, and select IP. You should see an object named DEFAULTIPSITELINK.

  1. Right click on DEFAULTIPSITELINK.
  2. Select Properties. Set or verify the following inputs on the General tab:
  3. Select Apply.
  4. In the DEFAULTIPSITELINK Properties, select the Attribute Editor tab and modify the following:
    • Scroll down and double-click the options attribute. Enter 1 for the Value, then select OK twice.
      • For more information on these settings, please refer here.
  5. Close the Active Directory Sites and Services MMC, as it is no longer needed.

Prepare Your Home Region Domain Controllers Security Group

In this section, I modify the Domain Controllers Security Group in us-east-1. This allows the domain controllers deployed in us-west-2 to communicate with the domain controllers in us-east-1. An equivalent AWS CLI command is shown after the following steps.

  1. From the Amazon Elastic Compute Cloud (Amazon EC2) console, select Security Groups under the Network & Security navigation section.
  2. Select the Domain Controllers Security Group that was created by the Launch Wizard Active Directory deployment. The Security Group name should start with LaunchWizard-UsEast1AD-.
  3. Select Edit inbound rules.
  4. Choose Add rule and enter the following:
    • Type: Select All traffic
    • Protocol: All
    • Port range: All
    • Source: Select Custom
    • Enter the CIDR of your remote (us-west-2) VPC (for example, 10.0.0.0/24)
  5. Select Save rules.
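The equivalent AWS CLI call is shown below; the security group ID and CIDR are placeholders for the LaunchWizard-UsEast1AD security group and your us-west-2 VPC CIDR.

# Allow all traffic from the remote (us-west-2) VPC CIDR into the home region domain controllers
aws ec2 authorize-security-group-ingress --region us-east-1 \
  --group-id sg-0123456789abcdef0 \
  --ip-permissions 'IpProtocol=-1,IpRanges=[{CidrIp=10.0.0.0/24}]'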

Create a Copy of Your Administrator Secret in Your Remote Region

In this section, I create a Secret in Secrets Manager in the remote region that contains the Administrator credentials from the home region deployment. A CLI alternative is shown after the following steps.

  1. Find the Secret that begins with LaunchWizard-UsEast1AD in the AWS Secrets Manager Console in us-east-1.
  2. In the middle of the Secret page, select Retrieve secret value.
    • This displays the username and password key with their values. Make note of these keys and values, as we need them for the next steps.
  3. From the AWS Secrets Manager Console, change the region to us-west-2.
  4. Select Store a new secret. Then, enter the following inputs:
    • Select secret type: Other type of secrets
    • Add your first key/value pair (username)
    • Select Add row to add the second key/value pair (password)
  5. Select Next, then enter the following inputs.
    • Secret name: UsWest2AD
    • Select Next twice
    • Select Store
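As an alternative to the console, the following hedged CLI sketch stores the same key/value pairs in us-west-2; the username and password values are placeholders for the values you noted from the home region secret.

# Create the copy of the Administrator credentials in the remote region
aws secretsmanager create-secret --region us-west-2 \
  --name UsWest2AD \
  --secret-string '{"username":"<username-from-us-east-1-secret>","password":"<password-from-us-east-1-secret>"}'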

Deploy Your Domain Controllers in the Remote Region using Launch Wizard

In this section, I deploy the second set of domain controllers into the us-west-2 region using Launch Wizard.

  1. In the AWS Launch Wizard Console, select Active Directory in the navigation pane on the left.
  2. Select Create deployment.
  3. In the Review Permissions page, select Next.
  4. In the Configure application settings page, set the following inputs.
    • General
      • Deployment name: UsWest2AD
    • Active Directory (AD) installation
      • Installation type: Active Directory on EC2
    • Domain Settings:
      • Number of domain controllers: 2
      • AMI installation type: License-included AMI
      • License-included AMI: ami-################# | Windows_Server-2019-English-Full-Base-202#-##-##
    • Connection type: Add domain controllers to existing Active Directory
    • Domain DNS name: corp.example.com
    • Domain NetBIOS Name: CORP
    • Domain Administrator secret name: Select the secret you created above.
    • Add permission to secret
      • After you verify that the secret you created above has the policy listed, check the checkbox confirming that the secret has the required policy.
    • Domain DNS IP address for resolution: The private IP of either domain controller in your home region
    • Connectivity:
      • Key Pair Name: Choose an existing Key pair
      • Virtual Private Cloud (VPC): Select Virtual Private Cloud (VPC)
    • VPC: Select your remote region VPC
    • Availability Zone (AZ) and private subnets:
      • Select 2 Availability Zones
      • Choose the proper subnet in each Availability Zone
      • Assign a Controller IP address for each domain controller
    • Remote Desktop Gateway preferences: disregard for now, as I set this later.
    • Check the I confirm that a public subnet has been set up. Each of the selected private subnets have outbound connectivity enabled check box
  5. In the Define infrastructure requirements page, set the following inputs.
    • Storage and compute: Based on infrastructure requirements
    • Number of AD users: Up to 5000 users
  6. In the Review and deploy page, review your selections. Then, select Deploy.

Note that it may take up to 2 hours to deploy the domain controllers. Once the status has changed to Completed, proceed to the next section, where I prepare the remote region Domain Controllers Security Group.

Prepare Your Remote Region Domain Controllers Security Group

In this section, I modify the Domain Controllers Security Group in us-west-2. This allows the domain controllers deployed in us-east-1 to communicate with the domain controllers in us-west-2.

  1. From the Amazon Elastic Compute Cloud (Amazon EC2) console, select Security Groups under the Network & Security navigation section.
  2. Select the Domain Controllers Security Group that was created by your Launch Wizard Active Directory deployment. The Security Group name should start with LaunchWizard-UsWest2AD-EC2ADStackExistingVPC-.
  3. Select Edit inbound rules.
  4. Choose Add rule and enter the following:
    • Type: Select All traffic
    • Protocol: All
    • Port range: All
    • Source: Select Custom
    • Enter the CIDR of your home region (us-east-1) VPC (for example, 10.1.0.0/24)
  5. Choose Save rules.

Create an AD User and Verify Replication

In this section, I create a user in one region and verify that it replicated to the other region. I also use AD replication diagnostics tools to verify that replication is working properly.

Create a Test User Account

  1. Log in to one of the domain controllers in us-east-1.
  2. Select Start, type dsa and press Enter on your keyboard. The Active Directory Users and Computers MMC should appear.
  3. Right click on the Users container and select New > User.
  4. Enter the following inputs:
    • First name: John
    • Last name: Doe
    • User logon name: jdoe and select Next
    • Password and Confirm password: Your choice of complex password
    • Uncheck User must change password at next logon
  5. Select Next.
  6. Select Finish.

Verify Test User Account Has Replicated

  1. Log in to one of the domain controllers in us-west-2.
  2. Select Start and type dsa.
  3. Then, press Enter on your keyboard. The Active Directory Users and Computers MMC should appear.
  4. Select Users. You should see a user object named John Doe.

Note that if the user is not present, it may not have been replicated yet. Replication should not take longer than 60 seconds from when the item was created.

Summary

Congratulations, you have created a cross-region Active Directory! In this post you:

  1. Launched a new Active Directory forest in us-east-1 using AWS Launch Wizard.
  2. Configured Active Directory Sites and Services for a multi-region configuration.
  3. Launched a set of new domain controllers in the us-west-2 region using AWS Launch Wizard.
  4. Created a test user and verified replication.

This post only touches on a couple of the features that are available in an AWS Launch Wizard Active Directory deployment. AWS Launch Wizard also automates the creation of a single-tier PKI infrastructure and trust creation. One of the prime benefits of this solution is the simplicity of deploying a fully functional Active Directory environment in just a few clicks. You no longer need to do the undifferentiated heavy lifting required to deploy Active Directory. For more information, please refer to the AWS Launch Wizard documentation.

Automate domain join for Amazon EC2 instances from multiple AWS accounts and Regions

Post Syndicated from Sanjay Patel original https://aws.amazon.com/blogs/security/automate-domain-join-for-amazon-ec2-instances-multiple-aws-accounts-regions/

As organizations scale up their Amazon Web Services (AWS) presence, they are faced with the challenge of administering user identities and controlling access across multiple accounts and Regions. As this presence grows, managing user access to cloud resources such as Amazon Elastic Compute Cloud (Amazon EC2) becomes increasingly complex. AWS Directory Service for Microsoft Active Directory (also known as an AWS Managed Microsoft AD) makes it easier and more cost-effective for you to manage this complexity. AWS Managed Microsoft AD is built on highly available, AWS managed infrastructure. Each directory is deployed across multiple Availability Zones, and monitoring automatically detects and replaces domain controllers that fail. In addition, data replication and automated daily snapshots are configured for you. You don’t have to install software, and AWS handles all patching and software updates. AWS Managed Microsoft AD enables you to leverage your existing on-premises user credentials to access cloud resources such as the AWS Management Console and EC2 instances.

This blog post describes how EC2 resources launched across multiple AWS accounts and Regions can automatically domain-join a centralized AWS Managed Microsoft AD. The solution we describe in this post is implemented for both Windows and Linux instances. Removal of Computer objects from Active Directory upon instance termination is also implemented. The solution uses Amazon DynamoDB to centrally store account and directory information in a central security account. We also provide AWS CloudFormation templates and platform-specific domain join scripts for you to use with AWS Lambda as a quick start solution.

Architecture

The following diagram shows the domain-join process for EC2 instances across multiple accounts and Regions using AWS Managed Microsoft AD.

Figure 1: EC2 domain join architecture

Figure 1: EC2 domain join architecture

The event flow works as follows:

  1. An EC2 instance is launched in a peered virtual private cloud (VPC) of a workload or security account. VPCs that are hosting EC2 instances need to be peered with the VPC that contains AWS Managed Microsoft AD to enable network connectivity with Active Directory.
  2. An Amazon CloudWatch Events rule detects an EC2 instance in the “running” state.
  3. The CloudWatch event is forwarded to a regional CloudWatch event bus in the security account.
  4. If the CloudWatch event bus is in the same Region as AWS Managed Microsoft AD, it delivers the event to an Amazon Simple Queue Service (Amazon SQS) queue, referred to as the domain-join queue in this post.
  5. If the CloudWatch event bus is in a different Region from AWS Managed Microsoft AD, it delivers the event to an Amazon Simple Notification Service (Amazon SNS) topic. The event is then delivered to the domain-join queue described in step 4, through the Amazon SNS topic subscription.
  6. Messages in the domain-join queue are held for five minutes to allow EC2 instances to stabilize after they reach the “running” state. This delay allows time for installation of additional software components and agents through the use of EC2 user data and AWS Systems Manager Distributor. (A delay-queue configuration sketch appears after this list.)
  7. After the holding period is over, messages in the domain-join queue invoke the AWS AD Join/Leave Lambda function. The Lambda function does the following:
    1. Retrieves the AWS account ID that originated the event from the message and retrieves account-specific configurations from a DynamoDB table. This configuration identifies AWS Managed Microsoft AD domain controller IPs, credentials required to perform EC2 domain join, and an AWS Identity and Access Management (IAM) role that can be assumed by the Lambda function to invoke AWS Systems Manager Run Command.
    2. If needed, uses AWS Security Token Service (AWS STS) and prepares a cross-account access session.
    3. Retrieves EC2 instance information, such as the instance state, platform, and tags, and validates the instance state.
    4. Retrieves platform-specific domain-join scripts that are deployed with the Lambda function’s code bundle, and configures invocation of those scripts by using data read from the DynamoDB table (bash script for Linux instances and PowerShell script for Windows instances).
    5. Uses AWS Systems Manager Run Command to invoke the domain-join script on the instance. Run Command enables you to remotely and securely manage the configuration of your managed instances.
    6. The domain-join script runs on the instance. It uses script parameters and instance attributes to configure the instance and perform the domain join. The adGroupName tag value is used to configure the Active Directory user group that will have permissions to log in to the instance. The instance is rebooted to complete the domain join process. Various software components are installed on the instance when the script runs. For the Linux instance, sssd, realmd, krb5, samba-common, adcli, unzip, and packageit are installed. For the Windows instance, the RDS-RD-Server feature is installed.
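As noted in step 6, the solution relies on an SQS delay queue to buffer instance-state events. The CloudFormation template creates that queue for you; the following is only an illustrative sketch of how a five-minute delivery delay is configured on a queue, with a hypothetical queue name.

# Illustrative only: messages sent to this queue become visible 300 seconds after they are sent
aws sqs create-queue --queue-name ec2-domain-join-queue \
  --attributes DelaySeconds=300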

Removal of EC2 instances from AWS Managed Microsoft AD upon instance termination follows a similar sequence of steps. Each instance that is domain joined creates an Active Directory domain object under the “Computer” hierarchy. This domain object needs to be removed upon instance termination so that a new instance that uses the same private IP address in the subnet (at a future time) can successfully domain join and enable instance access with Active Directory credentials. Removal of the Active Directory Computer object is done by running the leaveDomaini.ps1 script (included with this blog) through Run Command on the Active Directory Tools instance identified in Figure 1.

Prerequisites and setup

To build the solution outlined in this post, you need:

  • AWS Managed Microsoft AD with an appropriate DNS name (for example, example.com). For more information about getting started with AWS Managed Microsoft AD, see Create Your AWS Managed Microsoft AD directory.
  • AD Tools. To install AD Tools and use it to create the required users:
    1. Launch a Windows EC2 instance in the same account and Region, and domain-join it with the directory you created in the previous step. Log in to the instance through Remote Desktop Protocol (RDP) and install AD Tools as described in Installing the Active Directory Administration Tools.
    2. After the AD Tools are installed, launch the AD Users & Computers application to create domain users, and assign those users to an Active Directory security group (for example, my_UserGroup) that has permission to access domain-joined instances.
    3. Create a least-privileged user for performing domain joins as described in Delegate Directory Join Privileges for AWS Managed Microsoft AD. The identity of this user is stored in the DynamoDB table and read by the AD Join Lambda function to invoke Active Directory join scripts.
    4. Store the password for the least-privileged user in an encrypted Systems Manager parameter. The password is stored as a SecureString Systems Manager parameter and read by the AD Join Lambda function at runtime while processing Amazon SQS messages (see the CLI sketch after this list).
    5. Assign a unique tag key and value to identify the AD Tools instance. This instance will be invoked by the Lambda function to delete Computer objects from Active Directory upon termination of domain-joined instances.
  • All VPCs that are hosting EC2 instances to be domain joined must be peered with the VPC that hosts the relevant AWS Managed Microsoft AD. Alternatively, AWS Transit Gateway could be used to establish this connectivity.
  • In addition to having network connectivity to the AWS Managed Microsoft AD domain controllers, domain join scripts that run on EC2 instances must be able to resolve relevant Active Directory resource records. In this solution, we leverage Amazon Route 53 Outbound Resolver to forward DNS queries to the AWS Managed Microsoft AD DNS servers, while still preserving the default DNS capabilities that are available to the VPC. Learn more about deploying Route 53 Outbound Resolver and resolver rules to resolve your directory DNS name to DNS IPs.
  • Each domain-join EC2 instance must have a Systems Manager Agent (SSM Agent) installed and an IAM role that provides equivalent permissions as provided by the AmazonEC2RoleforSSM built-in policy. The SSM Agent is used to allow domain-join scripts to run automatically. See Working with SSM Agent for more information on installing and configuring SSM Agents on EC2 instances.
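For the SecureString parameter mentioned in the AD Tools steps, a minimal CLI sketch follows. The parameter name matches the ADJoinUser-PwParam example used later in this post; the value is a placeholder.

# Store the domain-join user's password as an encrypted (SecureString) parameter
aws ssm put-parameter \
  --name ADJoinUser-PwParam \
  --type SecureString \
  --value '<least-privileged-ad-join-user-password>' \
  --description "Password for the least-privileged AD domain-join user"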

Solution deployment

The steps in this section deploy AD Join solution components by using the AWS CloudFormation service.

The CloudFormation template provided with this solution (mad_auto_join_leave.json) deploys resources into the security account’s AWS Region that hosts AWS Managed Microsoft AD (the top left quadrant highlighted in Figure 1). The template deploys a DynamoDB table with 5 read and 5 write capacity units; adjust these to match your usage, or use DynamoDB auto scaling for these capacities. You will need to create and deploy additional CloudFormation stacks for cross-account, cross-Region scenarios.
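If you prefer to deploy the stack from the command line instead of the console steps that follow, a hedged sketch is shown below. The parameter keys come from the template walkthrough later in this section; the stack name, bucket, table name, and tag values are placeholders, and the IAM capability flag is an assumption based on the IAM role the template creates.

aws cloudformation create-stack \
  --stack-name ad-join-solution \
  --template-body file://mad_auto_join_leave.json \
  --capabilities CAPABILITY_IAM \
  --parameters \
    ParameterKey=S3CodeBucket,ParameterValue=<your-code-bucket> \
    ParameterKey=adJoinLambdaCodeFileName,ParameterValue=adJoinCode.zip \
    ParameterKey=adJoinLambdaCodeVersion,ParameterValue=<s3-object-version-id> \
    ParameterKey=DynamoDBTableName,ParameterValue=<table-name> \
    ParameterKey=CreateDynamoDBTable,ParameterValue=true \
    ParameterKey=ADToolsHostTagKey,ParameterValue=<ad-tools-tag-key> \
    ParameterKey=ADToolsHostTagValue,ParameterValue=<ad-tools-tag-value>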

To deploy the solution

  1. Create a versioned Amazon Simple Storage Service (Amazon S3) bucket to store a zip file (for example, adJoinCode.zip) that contains Python Lambda code and domain join/leave bash and PowerShell scripts. Upload the source code zip file to an S3 bucket and find the version associated with the object.
  2. Navigate to the AWS CloudFormation console. Choose the appropriate AWS Region, and then choose Create Stack. Select With new resources.
  3. Choose Upload a template file (for this solution, mad_auto_join_leave.json), select the CloudFormation stack file, and then choose Next.
  4. Enter the stack name and values for the other parameters, and then choose Next.
    Figure 2: Defining the stack name and parameters

    Figure 2: Defining the stack name and parameters

    The parameters are defined as follows:

  • S3CodeBucket: The name of the S3 bucket that holds the Lambda code zip file object.
  • adJoinLambdaCodeFileName: The name of the Lambda code zip file that includes Lambda Python code, bash, and Powershell scripts.
  • adJoinLambdaCodeVersion: The S3 Version ID of the uploaded Lambda code zip file.
  • DynamoDBTableName: The name of the DynamoDB table that will hold account configuration information.
  • CreateDynamoDBTable: The flag that indicates whether to create a new DynamoDB table or use an existing table.
  • ADToolsHostTagKey: The tag key of the Windows EC2 instance that has AD Tools installed and that will be used for removal of Active Directory Computer objects upon instance termination.
  • ADToolsHostTagValue: The tag value for the key identified by the ADToolsHostTagKey parameter.
  5. Acknowledge the creation of AWS resources and choose to continue to deploy AWS resources through AWS CloudFormation. The CloudFormation stack creation process is initiated and, after a few minutes, the stack status is marked as CREATE_COMPLETE. The following resources are created when the CloudFormation stack deploys successfully:
    • An AD Join Lambda function with associated scripts and IAM role.
    • A CloudWatch Events rule to detect the “running” and “terminated” states for EC2 instances.
    • An SQS event queue to hold the EC2 instance “running” and “terminated” events.
    • CloudWatch event mapping to the SQS event queue and further to the Lambda function.
    • A DynamoDB table to hold the account configuration (if you chose this option).

The DynamoDB table hosts account-level configurations. Account-specific configuration is required for an instance from a given account to join the Active Directory domain. Each DynamoDB item contains the account-specific configuration shown in the following table. Storing account-level information in the DynamoDB table provides the ability to use multiple AWS Managed Microsoft AD directories and group various accounts accordingly. Additional account configurations can also be stored in this table for implementation of various centralized security services (instance inspection, patch management, and so on).

Attribute          Description
accountId          AWS account number
adJoinUserName     User ID with AD Join permissions
adJoinUserPwParam  Encrypted Systems Manager parameter containing the AD Join user’s password
dnsIP1             Domain controller 1 IP address
dnsIP2             Domain controller 2 IP address
assumeRoleARN      Amazon Resource Name (ARN) of the role assumed by the AD Join Lambda function

Following is an example of how you could insert an item (row) in a DynamoDB table for an account.

aws dynamodb put-item --table-name <DynamoDB-Table-Name> --item file://itemData.json

where itemData.json is as follows.

{
    "accountId": { "S": "123412341234" },
    "adJoinUserName": { "S": "ADJoinUser" },
    "adJoinUserPwParam": { "S": "ADJoinUser-PwParam" },
    "dnsName": { "S": "example.com" },
    "dnsIP1": { "S": "192.0.2.1" },
    "dnsIP2": { "S": "192.0.2.2" },
    "assumeRoleARN": { "S": "arn:aws:iam::111122223333:role/adJoinLambdaRole" }
}

(Update with your own values as appropriate for your environment.)

In the preceding example, adJoinLambdaRole is assumed by the AD Join Lambda function (if needed) to establish cross-account access using AWS Security Token Service (AWS STS). The role needs to provide sufficient privileges for the AD Join Lambda function to retrieve instance information and run cross-account Systems Manager commands.
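For reference, the cross-account call the Lambda function makes is conceptually equivalent to the following CLI sketch; the role ARN is the example value from itemData.json and the session name is arbitrary.

# Obtain temporary credentials in the workload account before calling Run Command there
aws sts assume-role \
  --role-arn arn:aws:iam::111122223333:role/adJoinLambdaRole \
  --role-session-name ad-join-run-command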

adJoinUserName identifies a user with the minimum privileges to do the domain join; you created this user in the prerequisite steps.

adJoinUserPwParam identifies the name of the encrypted Systems Manager parameter that stores the password for the AD Join user. You created this parameter in the prerequisite steps.

Solution test

After you successfully deploy the solution using the steps in the previous section, the next step is to test the deployed solution.

To test the solution

  1. Navigate to the AWS EC2 console and launch a Linux instance. Launch the instance in a public subnet of the available VPC.
  2. Choose an IAM role that gives at least AmazonEC2RoleforSSM permissions to the instance.
  3. Add an adGroupName tag with the value that identifies the name of the Active Directory security group whose members should have access to the instance.
  4. Make sure that the security group associated with your instance has permissions for your IP address to log in to the instance by using the Secure Shell (SSH) protocol.
  5. Wait for the instance to launch and perform the Active Directory domain join. You can navigate to the AWS SQS console and observe a delayed message that represents the CloudWatch instance “running” event. This message is processed after five minutes; after that you can observe the Lambda function’s message processing log in CloudWatch logs.
  6. Log in to the instance with Active Directory user credentials. This user must be the member of the Active Directory security group identified by the adGroupName tag value. Following is an example login command.
    ssh '<AD-user-name>@<AD-domain-name>'@<public-dns-name|public-ip-address>
    

  7. Similarly, launch a Windows EC2 instance to validate the Active Directory domain join by using Remote Desktop Protocol (RDP).
  8. Terminate domain-joined instances. Log in to the AD Tools instance to validate that the Active Directory Computer object that represents the instance is deleted.

The AD Join Lambda function invokes Systems Manager commands to deliver and run domain join scripts on the EC2 instances. The AWS-RunPowerShellScript command is used for Microsoft Windows instances, and the AWS-RunShellScript command is used for Linux instances. Systems Manager command parameters and execution status can be observed in the Systems Manager Run Command console.
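If you want to reproduce or verify a Run Command invocation manually, the following sketch sends a simple shell command to a Linux instance and then checks its status; the instance ID is a placeholder, and the commands shown are illustrative rather than the solution's actual domain-join script.

# Send an illustrative command through the same document the solution uses for Linux
aws ssm send-command \
  --document-name "AWS-RunShellScript" \
  --targets "Key=InstanceIds,Values=i-0123456789abcdef0" \
  --parameters 'commands=["hostname","realm list"]' \
  --comment "manual domain-join check"

# Check the invocation status and output (use the CommandId returned above)
aws ssm list-command-invocations --command-id <command-id> --details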

The AD user used to perform the domain join is a least-privileged user, as described in Delegate Directory Join Privileges for AWS Managed Microsoft AD. The password for this user is passed to instances by way of SSM Run Commands, as described above. The password is visible in the SSM Command history log and in the domain join scripts run on the instance. Alternatively, all script parameters can be read locally on the instance through the “adjoin” encrypted SSM parameter. Refer to the domain join scripts for details of the “adjoin” SSM parameter.

Additional information

Directory sharing

AWS Managed Microsoft AD can be shared with other AWS accounts in the same Region. Learn how to use this feature and seamlessly domain join Microsoft Windows EC2 instances and Linux instances.

autoadjoin tag

Launching EC2 instances with an autoadjoin tag key with a “false” value excludes the instance from the automated Active Directory join process. You might want to do this in scenarios where you want to install additional agent software before or after the Active Directory join process. You can invoke domain join scripts (bash or PowerShell) by using user data or other means. However, you’ll need to reboot the instance and re-run scripts to complete the domain join process.
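A hedged launch example follows; the AMI, subnet, instance profile, and instance type values are placeholders, and the adGroupName value reuses the my_UserGroup example from the prerequisites.

# Launch an instance that is excluded from the automated domain join
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type t3.medium \
  --subnet-id subnet-0123456789abcdef0 \
  --iam-instance-profile Name=<ssm-enabled-instance-profile> \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=autoadjoin,Value=false},{Key=adGroupName,Value=my_UserGroup}]'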

Summary

In this blog post, we demonstrated how you could automate the Active Directory domain join process for EC2 instances to AWS Managed Microsoft AD across multiple accounts and Regions, and also centrally manage this configuration by using Amazon DynamoDB. By adopting this model, administrators can centrally manage Active Directory–aware applications and resources across their accounts.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Sanjay Patel

Sanjay is a Senior Cloud Application Architect with AWS Professional Services. He has a diverse background in software design, enterprise architecture, and API integrations. He has helped AWS customers automate infrastructure security. He enjoys working with AWS customers to identify and implement the best fit solution.

Author

Vaibhawa Kumar

Vaibhawa is a Senior Cloud Infrastructure Architect with AWS Professional Services. He helps customers with the architecture, design, and automation to build innovative, secured, and highly available solutions using various AWS services. In his free time, you can find him spending time with family, sports, and cooking.

Author

Kevin Higgins

Kevin is a Senior Cloud Infrastructure Architect with AWS Professional Services. He helps customers with the architecture, design, and development of cloud-optimized infrastructure solutions. As a member of the Microsoft Global Specialty Practice, he collaborates with AWS field sales, training, support, and consultants to help drive AWS product feature roadmap and go-to-market strategies.

Now in Preview – Larger & Faster io2 Block Express EBS Volumes with Higher Throughput

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/now-in-preview-larger-faster-io2-ebs-volumes-with-higher-throughput/

Amazon Elastic Block Store (EBS) volumes have been an essential EC2 component since they were launched in 2008. Today, you can choose between six types of HDD and SSD volumes, each designed to serve a particular use case and to deliver a specified amount of performance.

Earlier this year we launched io2 volumes with 100x higher durability and 10x more IOPS/GiB than the earlier io1 volumes. The io2 volumes are a great fit for your most I/O-hungry and latency-sensitive applications, including high-performance, business-critical workloads.

Even More
Today we are opening up a preview of io2 Block Express volumes that are designed to deliver even higher performance!

Built on our new EBS Block Express architecture that takes advantage of some advanced communication protocols implemented as part of the AWS Nitro System, the volumes will give you up to 256K IOPS & 4000 MBps of throughput and a maximum volume size of 64 TiB, all with sub-millisecond, low-variance I/O latency. Throughput scales proportionally at 0.256 MB/second per provisioned IOPS, up to a maximum of 4000 MBps per volume. You can provision 1000 IOPS per GiB of storage, twice as many as before. The increased volume size & higher throughput means that you will no longer need to stripe multiple EBS volumes together, reducing complexity and management overhead.
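As a worked example of the scaling math: a volume provisioned with 10,000 IOPS can drive roughly 10,000 × 0.256 = 2,560 MBps, and throughput tops out at the 4,000 MBps maximum at about 15,625 provisioned IOPS. Once your account and Region are opted in to the preview, creating such a volume is an ordinary io2 create-volume call; the size, IOPS, and Availability Zone below are placeholders.

# Create an io2 volume; with Block Express enabled, 10,000 IOPS yields roughly 2,560 MBps of throughput
aws ec2 create-volume \
  --volume-type io2 \
  --size 500 \
  --iops 10000 \
  --availability-zone us-east-1a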

Block Express is a modular storage system that is designed to increase performance and scale. Scalable Reliable Datagrams (as described in A Cloud-Optimized Transport Protocol for Elastic and Scalable HPC) are implemented using custom-built, dedicated hardware, making communication between Block Express volumes and Nitro-powered EC2 instances fast and efficient. This is, in fact, the same technology that the Elastic Fabric Adapter (EFA) uses to support high-end HPC and Machine Learning workloads on AWS.

Putting it all together, these volumes are going to deliver amazing performance for your SAP HANA, Microsoft SQL Server, Oracle, and Apache Cassandra workloads, and for your mission-critical transaction processing applications such as airline reservation systems and banking that once mandated the use of an expensive and inflexible SAN (Storage Area Network).

Join the Preview
The preview is currently available in the US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Tokyo), and Europe (Frankfurt) Regions. During the preview, we support the use of R5b instances, with support for other Nitro-powered instances in the works.

You can opt-in to the preview on a per-account, per-region basis, create new io2 Block Express volumes, and then attach them to R5b instances. All newly created io2 volumes in that account/region will then make use of Block Express, and will perform as described above.

This is still a work in progress. We’re still adding support for a couple of features (Multi-Attach, Elastic Volumes, and Fast Snapshot Restore) and we’re building a new I/O fencing feature so that you can attach the same volume to multiple instances while ensuring consistent access and protecting shared data.

The volumes support encryption, but you can’t create encrypted volumes from unencrypted AMIs or snapshots, or from encrypted AMIs or snapshots that were shared from another AWS account. We expect to take care of all of these items during the preview. To learn more, visit the io2 page and read the io2 documentation.

To get started, opt-in to the io2 Block Express Preview today!

Jeff;


Coming Soon – Amazon EC2 G4ad Instances Featuring AMD GPUs for Graphics Workloads

Post Syndicated from Steve Roberts original https://aws.amazon.com/blogs/aws/new-amazon-ec2-g4ad-instances-featuring-amd-gpus-for-graphics-workloads/

Customers with high-performance graphics workloads, such as game streaming, animation, and video rendering, are always looking for higher performance at lower cost. Today, I’m happy to announce that new Amazon Elastic Compute Cloud (EC2) instances in the G4 instance family are in the works and will be available soon to improve performance and reduce cost for graphics-intensive workloads. The new G4ad instances feature AMD’s latest Radeon Pro V520 GPUs and 2nd generation EPYC processors, and are the first in EC2 to feature AMD GPUs.

G4dn instances, released in 2019 and featuring NVIDIA T4 GPUs, were previously the most cost-effective GPU-based instances in EC2. G4dn instances are ideal for deploying machine learning models in production and also for graphics-intensive applications. However, when compared to G4dn, the new G4ad instances enable up to 45% better price performance for graphics-intensive workloads, including the aforementioned game streaming, remote graphics workstation, and rendering scenarios. Compared to an equally sized G4dn instance, G4ad instances offer up to 40% improvement in performance.

G4dn instances will continue to be the best option for small-scale machine learning (ML) training and GPU-based ML inference due to included hardware optimizations like Tensor Cores. Additionally, G4dn instances are still best suited for graphics applications that need access to NVIDIA libraries such as CUDA, CuDNN, and NVENC. However, when there is no dependency on NVIDIA’s libraries, we recommend customers try the G4ad instances to benefit from the improved price and performance.

AMD Radeon Pro V520 GPUs in G4ad instances support DirectX 11/12, Vulkan 1.1, and OpenGL 4.5 APIs. For operating systems, customers can choose from Windows Server 2016/2019, Amazon Linux 2, Ubuntu 18.04.3, and CentOS 7.7. G4ad instances can be purchased as On-Demand, Savings Plan, Reserved Instances, or Spot Instances. Three instance sizes are available, from g4ad.4xlarge with 1 GPU to g4ad.16xlarge with 4 GPUs, as shown below.

Instance Size GPUs GPU Memory (GB) vCPUs Memory (GB) Storage EBS Bandwidth (Gbps) Network Bandwidth (Gbps)
g4ad.4xlarge 1 8 16 64 600 Up to 3 Up to 10
g4ad.8xlarge 2 16 32 128 1200 3 15
g4ad.16xlarge 4 32 64 256 2400 6 25

The new G4ad instances will be available soon in US East (N. Virginia), US West (Oregon), and Europe (Ireland).

Learn more about G4ad instances.

— Steve

New EC2 M5zn Instances – Fastest Intel Xeon Scalable CPU in the Cloud

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-ec2-m5zn-instances-fastest-intel-xeon-scalable-cpu-in-the-cloud/

We launched the compute-intensive z1d instances in mid-2018 for customers who asked us for extremely high per-core performance and a high memory-to-core ratio to power their front-end Electronic Design Automation (EDA), actuarial, and CPU-bound relational database workloads.

In order to address a complementary set of use cases, customers have asked us for an EC2 instance that will give them high per-core performance like z1d, with no local NVMe storage, higher networking throughput, and a reduced memory-to-vCPU ratio. They have indicated that if we built an instance with this set of attributes, it would be an excellent fit for workloads such as gaming, financial applications, simulation modeling applications (such as those used in the automobile, aerospace, energy, and telecommunication industries), and High Performance Computing (HPC).

Introducing M5zn
Building on the success of the z1d instances, we are launching M5zn instances in seven sizes today. These instances use 2nd generation custom Intel® Xeon® Scalable (Cascade Lake) processors with a sustained all-core turbo clock frequency of up to 4.5 GHz. M5zn instances feature high frequency processing, are a variant of the general-purpose M5 instances, and are built on the AWS Nitro System. These instances also feature low latency 100 Gbps networking and the Elastic Fabric Adapter (EFA), in order to improve performance for HPC and communication-intensive applications.

Here are the M5zn instances (all VPC-only, HVM-only, and EBS-Optimized, with support for Optimize vCPU). As you can see, the memory-to-vCPU ratio on these instances is half that of the existing z1d instances:

Instance Name vCPUs RAM Network Bandwidth EBS-Optimized Bandwidth
m5zn.large 2 8 GiB Up to 25 Gbps Up to 3.170 Gbps
m5zn.xlarge 4 16 GiB Up to 25 Gbps Up to 3.170 Gbps
m5zn.2xlarge 8 32 GiB Up to 25 Gbps 3.170 Gbps
m5zn.3xlarge 12 48 GiB Up to 25 Gbps 4.750 Gbps
m5zn.6xlarge 24 96 GiB 50 Gbps 9.500 Gbps
m5zn.12xlarge 48 192 GiB 100 Gbps 19 Gbps
m5zn.metal 48 192 GiB 100 Gbps 19 Gbps

The Nitro Hypervisor allows M5zn instances to deliver performance that is just about indistinguishable from bare metal. Other AWS Nitro System components such as the Nitro Security Chip and hardware-based processing for EBS increase performance, while VPC encryption provides greater security.

Things To Know
Here are a couple of “fun facts” about the M5zn instances:

Placement Groups – M5zn instances can be used in Cluster (for low latency and high network throughput), Spread (to keep critical instances separate from each other), and Partition (to reduce correlated failures) placement groups.

Networking – M5zn instances support the Elastic Network Adapter (ENA) with dedicated 100 Gbps network connections and a dedicated 19 Gbps connection to EBS. If you are building distributed ML or HPC applications for use on a cluster of M5zn instances, be sure to take a look at the Elastic Fabric Adapter (EFA). Your HPC applications can use the Message Passing Interface (MPI) to communicate efficiently at high speed while scaling to thousands of nodes.

C-State Control – You can configure CPU Power Management on m5zn.6xlarge and m5zn.12xlarge instances. This is definitely an advanced feature, but one worth exploring in those situations where you need to squeeze every possible cycle of available performance from the instance.

NUMA – You can make use of Non-Uniform Memory Access on m5zn.12xlarge instances. This is also an advanced feature, but worth exploring in situations where you have an in-depth understanding of your application’s memory access patterns.

To learn more about these and other features, visit the EC2 M5 Instances page.

Available Now
As you can see, the M5zn instances are a great fit for gaming, HPC and simulation modeling workloads such as those used by the financial, automobile, aerospace, energy, and telecommunications industries.

You can launch M5zn instances today in the US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Europe (Ireland), Europe (Frankfurt), and Asia Pacific (Tokyo) Regions in On-Demand, Reserved Instance, Savings Plan, and Spot form. Dedicated Instances and Dedicated Hosts are also available.

Support is available in the EC2 Forum or via your usual AWS Support contact. The EC2 team is interested in your feedback and you can contact them at [email protected].

Jeff;


Coming Soon – EC2 C6gn Instances – 100 Gbps Networking with AWS Graviton2 Processors

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/coming-soon-ec2-c6gn-instances-100-gbps-networking-with-aws-graviton2-processors/

Based on the amazing feedback from customers such as Snap, NextRoll, Intuit, SmugMug, and Honeycomb who are running their workloads on Amazon Elastic Compute Cloud (EC2) instances powered by AWS Graviton2, today we are announcing an addition to our broad Arm-based Graviton2 portfolio with C6gn instances that deliver up to 100 Gbps network bandwidth, up to 38 Gbps Amazon Elastic Block Store (EBS) bandwidth, up to 40% higher packet processing performance, and up to 40% better price/performance versus comparable current generation x86-based network optimized instances.

Compared to C6g instances, this new instance type provides 4x higher network bandwidth, 4x higher packet processing performance, and 2x higher EBS bandwidth. This means that customers with workloads that need high networking bandwidth such as high performance computing (HPC), network appliance, real-time video communications, and data analytics, will be able to bring their biggest and most challenging applications to Arm and take advantage of the performance and cost-optimization.

C6gn instances will be available in 8 sizes:

Name vCPUs Memory (GiB) Network Bandwidth (Gbps) EBS Throughput (Gbps)
c6gn.medium 1 2 Up to 25 Up to 9.5
c6gn.large 2 4 Up to 25 Up to 9.5
c6gn.xlarge 4 8 Up to 25 Up to 9.5
c6gn.2xlarge 8 16 Up to 25 Up to 9.5
c6gn.4xlarge 16 32 25 9.5
c6gn.8xlarge 32 64 50 19
c6gn.12xlarge 48 96 75 28.5
c6gn.16xlarge 64 128 100 38

The new instances are built on the AWS Nitro System, a collection of AWS-designed hardware and software innovations that maximize resource efficiency. C6gn instances support Elastic Fabric Adapter (EFA) on the c6gn.16xlarge sizes for workloads that can take advantage of lower network latency (such as HPC and video processing) and use Message Passing Interface (MPI) for highly scalable clusters. These new instances also fully support network frameworks like Data Plane Development Kit (DPDK), making it easier to migrate network appliance workloads.

Coming Soon
EC2 C6gn instances will be available later this month and make it easier to optimize costs for HPC and workloads that require high network bandwidth and low latency. Let me know what you are going to build with them!

To get practice with the AWS Graviton2 architecture, you can try t4g.micro instances for free for up to 750 hours per month until March 31st, 2021.

Learn more about EC2 C6gn instances today.

Danilo

New – Amazon EC2 R5b Instances Provide 3x Higher EBS Performance

Post Syndicated from Harunobu Kameda original https://aws.amazon.com/blogs/aws/new-amazon-ec2-r5b-instances-providing-3x-higher-ebs-performance/

In July 2018, we announced memory-optimized R5 instances for the Amazon Elastic Compute Cloud (Amazon EC2). R5 instances are designed for memory-intensive applications such as high-performance databases, distributed web scale in-memory caches, in-memory databases, real time big data analytics, and other enterprise applications.

R5 instances offer two different block storage options. R5d instances offer up to 3.6TB of NVMe instance storage for applications that need access to high-speed, low latency local storage. In addition, all R5 instances work with Amazon Elastic Block Store. Amazon EBS is an easy-to-use, high-performance and highly available block storage service designed for use with Amazon EC2 for both throughput- and transaction-intensive workloads at any scale. A broad range of workloads, such as relational and non-relational databases, enterprise applications, containerized applications, big data analytics engines, file systems, and media workflows are widely deployed on Amazon EBS.

Today, we are happy to announce the availability of R5b, a new addition to the R5 instance family. The new R5b instance is powered by the AWS Nitro System to provide the best network-attached storage performance available on EC2. This new instance offers up to 60Gbps of EBS bandwidth and 260,000 I/O operations per second (IOPS).

Amazon EC2 R5b Instance
Many customers use R5 instances with EBS for large relational database workloads such as commerce platforms, ERP systems, and health record systems, and they rely on EBS to provide scalable, durable, and high availability block storage. These instances provide sufficient storage performance for many use cases, but some customers require higher EBS performance on EC2.

R5 instances provide bandwidth up to 19Gbps and maximum EBS performance of 80K IOPS, while the new R5b instances support bandwidth up to 60Gbps and EBS performance of 260K IOPS, providing 3x higher EBS-Optimized performance compared to R5 instances and enabling customers to lift and shift large relational database applications to AWS. The vCPU-to-memory ratio and network performance of R5b and R5 instances are the same.

Instance Name vCPUs Memory EBS-Optimized Bandwidth (Mbps) EBS-Optimized IOPS @ 16K (IO/s)
r5b.large 2 16 GiB Up to 10,000 Up to 43,333
r5b.xlarge 4 32 GiB Up to 10,000 Up to 43,333
r5b.2xlarge 8 64 GiB Up to 10,000 Up to 43,333
r5b.4xlarge 16 128 GiB 10,000 43,333
r5b.8xlarge 32 256 GiB 20,000 86,667
r5b.12xlarge 48 384 GiB 30,000 130,000
r5b.16xlarge 64 512 GiB 40,000 173,333
r5b.24xlarge 96 768 GiB 60,000 260,000
r5b.metal 96 768 GiB 60,000 260,000

Customers operating storage performance sensitive workloads can migrate from R5 to R5b to consolidate their existing workloads into fewer or smaller instances. This can reduce the cost of both infrastructure and licensed commercial software working on those instances. R5b instances are supported by Amazon RDS for Oracle and Amazon RDS for SQL Server, simplifying the migration path for large commercial database applications and improving storage performance for current RDS customers by up to 3x.

All Nitro-compatible AMIs support R5b instances, and the EBS-backed HVM AMI must have NVMe 1.0e and ENA drivers installed at R5b instance launch. R5b supports io1, io2 Block Express (in preview), gp2, gp3, sc1, st1, and standard volumes. R5b does not yet support io2 volumes or io1 volumes that have Multi-Attach enabled; support for these is coming soon.
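You can quickly check whether an AMI advertises ENA support before launching it on R5b; the AMI ID below is a placeholder. ENA support is exposed as an image attribute, while NVMe driver presence is not, so verify the NVMe driver within the image itself for custom AMIs.

# Returns true if the AMI was registered with ENA support enabled
aws ec2 describe-images \
  --image-ids ami-0123456789abcdef0 \
  --query 'Images[0].EnaSupport'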

Available Today

R5b instances are available in the following regions: US West (Oregon), Asia Pacific (Tokyo), US East (N. Virginia), US East (Ohio), Asia Pacific (Singapore), and Europe (Frankfurt). RDS on r5b is available in US East (Ohio), Asia Pacific (Singapore), and Europe (Frankfurt), and support in other regions is coming soon.

Learn more about EC2 R5 instances and get started with Amazon EC2 today.

– Kame;

EC2 Update – D3 / D3en Dense Storage Instances

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/ec2-update-d3-d3en-dense-storage-instances/

We have launched several generations of EC2 instances with dense storage including the HS1 in 2012 and the D2 in 2015. As you can guess from the name, our customers use these instances when they need massive amounts of very economical on-instance storage for their data warehouses, data lakes, network file systems, Hadoop clusters, and the like. These workloads demand plenty of I/O and network throughput, but work fine with a high ratio of storage to compute power.

New D3 and D3en Instances
Today we are launching the D3 and D3en instances. Like their predecessors, they give you access to massive amounts of low-cost on-instance HDD storage. The D3 instances are available in four sizes, with up to 32 vCPUs and 48 TB of storage. Here are the specs:

Instance Name vCPUs RAM HDD Storage Aggregate Disk Throughput (128 KiB Blocks) Network Bandwidth EBS-Optimized Bandwidth
d3.xlarge 4 32 GiB 6 TB (3 x 2 TB)  580 MiBps Up to 15 Gbps 850 Mbps
d3.2xlarge 8 64 GiB 12 TB (6 x 2 TB) 1,100 MiBps Up to 15 Gbps 1,700 Mbps
d3.4xlarge 16 128 GiB 24 TB (12 x 2 TB) 2,300 MiBps Up to 15 Gbps 2,800 Mbps
d3.8xlarge 32 256 GiB 48 TB (24 x 2 TB) 4,600 MiBps 25 Gbps 5,000 Mbps

As you can see from the table above, the D3 instances are available in the same configurations as the D2 instances for easy migration. You’ll get 5% more memory per vCPU, a 30% boost in compute power, and 2.5x higher network performance if you migrate from D2 to D3. The instances provide low-cost dense storage that delivers high performance sequential access to large data sets. They are perfect for distributed file systems such as HDFS and MapR FS, big data analytical workloads, data warehouses, log processing, and data processing.

The D3en instances are available in six sizes, with up to 48 vCPUs and 336 TB of storage. Here are the specs:

Instance Name vCPUs RAM HDD Storage Aggregate Disk Throughput (128 KiB Blocks) Network Bandwidth EBS-Optimized Bandwidth
d3en.xlarge 4 16 GiB 28 TB (2 x 14 TB) 500 MiBps Up to 25 Gbps 850 Mbps
d3en.2xlarge 8 32 GiB 56 TB (4 x 14 TB) 1,000 MiBps Up to 25 Gbps 1,700 Mbps
d3en.4xlarge 16 64 GiB 112 TB (8 x 14 TB) 2,000 MiBps 25 Gbps 2,800 Mbps
d3en.6xlarge 24 96 GiB 168 TB (12 x 14 TB) 3,100 MiBps 40 Gbps 4,000 Mbps
d3en.8xlarge 32 128 GiB  224 TB (16 x 14 TB) 4,100 MiBps 50 Gbps 5,000 Mbps
d3en.12xlarge 48 192 GiB 336 TB (24 x 14 TB) 6,200 MiBps 75 Gbps 7,000 Mbps

The D3en instances have a high ratio of storage to vCPU, and are optimized for high throughput and high sequential I/O to very large data sets, with a cost-per-TB that is 80% lower than on D2 instances. D3en instances can host Lustre, BeeGFS, GPFS, and other distributed file systems, they can store your data lakes, and they can run your Amazon EMR, Spark, and Hadoop analytical workloads.

Both instance types are built on the AWS Nitro System and are powered by custom second generation Intel® Xeon® Scalable (Cascade Lake) processors that can deliver all-core turbo performance of up to 3.1 GHz. The HDD storage is encrypted at rest using AES-256-XTS; traffic between D3 or D3en instances in the same VPC or within peered VPCs is encrypted using a 256-bit key.

Things to Know
Here are a couple of things that you should keep in mind regarding the D3 and D3en instances:

Regions – D3en instances are available in the US East (N. Virginia), US West (Oregon), and Europe (Ireland) Regions; D3 instances are available in all of those Regions and also in the US East (Ohio) Region, with more Regions coming soon.

Purchase Options – You can purchase D3 and D3en instances in On-Demand, Savings Plan, Reserved Instance, Spot, and Dedicated Instance form.

AMIs – You must use AMIs that include the Elastic Network Adapter (ENA) and NVMe drivers.

Now Available
D3 and D3en instances are available now and you can start using them today!

Jeff;

New – Use Amazon EC2 Mac Instances to Build & Test macOS, iOS, ipadOS, tvOS, and watchOS Apps

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-use-mac-instances-to-build-test-macos-ios-ipados-tvos-and-watchos-apps/

Throughout the course of my career I have done my best to stay on top of new hardware and software. As a teenager I owned an Altair 8800 and an Apple II. In my first year of college someone gave me a phone number and said “call this with modem.” I did, it answered “PENTAGON TIP,” and I had access to ARPANET!

I followed the emerging PC industry with great interest, voraciously reading every new issue of Byte, InfoWorld, and several other long-gone publications. In early 1983, rumor had it that Apple Computer would soon introduce a new system that was affordable, compact, self-contained, and very easy to use. Steve Jobs unveiled the Macintosh in January 1984 and my employer ordered several right away, along with a pair of the Apple Lisa systems that were used as cross-development hosts. As a developer, I was attracted to the Mac’s rich collection of built-in APIs and services, and still treasure my phone book edition of the Inside Macintosh documentation!

New Mac Instance
Over the last couple of years, AWS users have told us that they want to be able to run macOS on Amazon Elastic Compute Cloud (EC2). We’ve asked a lot of questions to learn more about their needs, and today I am pleased to introduce you to the new Mac instance!


The original (128 KB) Mac

Powered by Mac mini hardware and the AWS Nitro System, you can use Amazon EC2 Mac instances to build, test, package, and sign Xcode applications for the Apple platform including macOS, iOS, iPadOS, tvOS, watchOS, and Safari. The instances feature an 8th generation, 6-core Intel Core i7 (Coffee Lake) processor running at 3.2 GHz, with Turbo Boost up to 4.6 GHz. There’s 32 GiB of memory and access to other AWS services including Amazon Elastic Block Store (EBS), Amazon Elastic File System (EFS), Amazon FSx for Windows File Server, Amazon Simple Storage Service (S3), AWS Systems Manager, and so forth.

On the networking side, the instances run in a Virtual Private Cloud (VPC) and include ENA networking with up to 10 Gbps of throughput. With EBS-Optimization, and the ability to deliver up to 55,000 IOPS (16KB block size) and 8 Gbps of throughput for data transfer, EBS volumes attached to the instances can deliver the performance needed to support I/O-intensive build operations.

Mac instances run macOS 10.14 (Mojave) and 10.15 (Catalina) and can be accessed via command line (SSH) or remote desktop (VNC). The AMIs (Amazon Machine Images) for EC2 Mac instances are EC2-optimized and include the AWS goodies that you would find on other AWS AMIs: An ENA driver, the AWS Command Line Interface (CLI), the CloudWatch Agent, CloudFormation Helper Scripts, support for AWS Systems Manager, and the ec2-user account. You can use these AMIs as-is, or you can install your own packages and create custom AMIs (the homebrew-aws repo contains the additional packages and documentation on how to do this).

You can use these instances to create build farms, render farms, and CI/CD farms that target all of the Apple environments that I mentioned earlier. You can provision new instances in minutes, giving you the ability to quickly & cost-effectively build code for multiple targets without having to own & operate your own hardware. You pay only for what you use, and you get to benefit from the elasticity, scalability, security, and reliability provided by EC2.

EC2 Mac Instances in Action
As always, I asked the EC2 team for access to an instance in order to put it through its paces. The instances are available in Dedicated Host form, so I started by allocating a host:

$ aws ec2 allocate-hosts --instance-type mac1.metal \
  --availability-zone us-east-1a --auto-placement on \
  --quantity 1 --region us-east-1

Then I launched my Mac instance from the command line (console, API, and CloudFormation can also be used):

$ aws ec2 run-instances --region us-east-1 \
  --instance-type mac1.metal \
  --image-id  ami-023f74f1accd0b25b \
  --key-name keys-jbarr-us-east  --associate-public-ip-address

I took Luna for a very quick walk, and returned to find that my instance was ready to go. I used the console to give it an appropriate name:

Then I connected to my instance:

From here I can install my development tools, clone my code onto the instance, and initiate my builds.

I can also start a VNC server on the instance and use a VNC client to connect to it:

Note that the VNC protocol is not considered secure, and this feature should be used with care. I used a security group that allowed access only from my desktop’s IP address:

I can also tunnel the VNC traffic over SSH; this is more secure and would not require me to open up port 5900.
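
For example, a single ssh command sets up the tunnel; the key file and public DNS name below are placeholders for your own values, and the VNC server must already be running on the instance:

$ ssh -i keys-jbarr-us-east.pem \
    -L 5900:localhost:5900 \
    ec2-user@ec2-203-0-113-25.compute-1.amazonaws.com
# Point the VNC client at localhost:5900; the VNC session now travels
# inside the encrypted SSH connection, so port 5900 can stay closed in
# the security group.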

Things to Know
Here are a few fast facts about the Mac instances:

AMI Updates – We expect to make new AMIs available each time Apple releases major or minor versions of each supported OS. We also plan to produce AMIs with updated Amazon packages every quarter.

Dedicated Hosts – The instances are launched as EC2 Dedicated Hosts with a minimum tenancy of 24 hours. This is largely transparent to you, but it does mean that the instances cannot be used as part of an Auto Scaling Group.

Purchase Models – You can run Mac instances On-Demand and you can also purchase a Savings Plan.

Apple M1 Chip – EC2 Mac instances with the Apple M1 chip are already in the works, and planned for 2021.

Launch one Today
You can start using Mac instances in the US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), and Asia Pacific (Singapore) Regions today, and check out this video for more information!

Jeff;

 

Automatically update security groups for Amazon CloudFront IP ranges using AWS Lambda

Post Syndicated from Yeshwanth Kottu original https://aws.amazon.com/blogs/security/automatically-update-security-groups-for-amazon-cloudfront-ip-ranges-using-aws-lambda/

Amazon CloudFront is a content delivery network that can help you increase the performance of your web applications and significantly lower the latency of delivering content to your customers. For CloudFront to access an origin (the source of the content behind CloudFront), the origin has to be publicly available and reachable. Anyone with the origin domain name or IP address could request content directly and bypass CloudFront. In this blog post, I describe an automated solution that uses security groups to permit only CloudFront to access the origin.

Amazon Simple Storage Service (Amazon S3) origins provide a feature called Origin Access Identity, which blocks public access to selected buckets, making them accessible only through CloudFront. When you use CloudFront to secure your web applications, it’s important to ensure that only CloudFront can access your origin (such as Amazon Elastic Compute Cloud (Amazon EC2) or Application Load Balancer (ALB)) and that any direct access to the origin is restricted. This blog post shows you how to create an AWS Lambda function to automatically update Amazon Virtual Private Cloud (Amazon VPC) security groups with CloudFront service IP ranges to permit only CloudFront to access the origin.

AWS publishes the IP ranges in JSON format for CloudFront and other AWS services. If your origin is an Elastic Load Balancer or an Amazon EC2 instance, you can use VPC security groups to allow only CloudFront IP ranges to access your applications. The IP ranges in the list are separated by service and Region, and you must specify only the IP ranges that correspond to CloudFront.
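
If you want to see what the function will be working with, you can pull the published document and filter it yourself. The following example assumes you have curl and jq available locally:

# List the current CloudFront IPv4 prefixes from the published document
curl -s https://ip-ranges.amazonaws.com/ip-ranges.json \
  | jq -r '.prefixes[] | select(.service=="CLOUDFRONT") | .ip_prefix'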

The IP ranges that AWS publishes change frequently, and without an automated solution you would need to retrieve this document regularly to learn the current IP ranges for CloudFront. Frequent polling is inefficient because there is no notice of when the IP ranges change, and if these IP ranges aren’t updated promptly, your clients might see 504 errors when they access CloudFront. Additionally, there are numerous IP ranges for each service, so performing the change manually isn’t an efficient way of updating these ranges. This means you need infrastructure to support the task. However, in that case you end up with another host to manage—complete with the typical patching, deployment, and monitoring. As you can see, a small task could quickly become more complicated than the problem you intended to solve.

An Amazon Simple Notification Service (Amazon SNS) message is sent to a topic whenever the AWS IP ranges change, enabling you to build an event-driven, serverless solution that updates the IP ranges for your security groups as needed, by using a Lambda function that is triggered in response to the SNS notification.

Here are the steps we are going to take to implement the solution:

  1. Create your resources
    1. Create an IAM policy and execution role for the Lambda function
    2. Create your Lambda function
  2. Test your Lambda function
  3. Configure your Lambda function’s trigger

Create your resources

The first thing you need to do is create a Lambda function execution role and policy. The Lambda function uses its execution role to access or create AWS resources. This Lambda function is triggered by an SNS notification whenever there’s a change in the IP ranges document. Based on the number of IP ranges present for CloudFront and the number of ports (for example, 80 and 443) that you want to allow on the origin, this Lambda function creates the required security groups. These security groups will allow only traffic from CloudFront to your ELB load balancers or EC2 instances.

Create an IAM policy and execution role for the Lambda function

When you create a Lambda function, it’s important to understand and properly define the security context for the Lambda function. Using AWS Identity and Access Management (IAM), you can create the Lambda execution role that determines the AWS service calls that the function is authorized to complete. (Learn more about the Lambda permissions model.)

To create the IAM policy for your role

  1. Log in to the IAM console with the user account that you will use to manage the Lambda function. This account must have administrator permissions.
  2. In the navigation pane, choose Policies.
  3. In the content pane, choose Create policy.
  4. Choose the JSON tab and copy the text from the following JSON policy document. Paste this text into the JSON text box.
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "CloudWatchPermissions",
          "Effect": "Allow",
          "Action": [
            "logs:CreateLogGroup",
            "logs:CreateLogStream",
            "logs:PutLogEvents"
          ],
          "Resource": "arn:aws:logs:*:*:*"
        },
        {
          "Sid": "EC2Permissions",
          "Effect": "Allow",
          "Action": [
            "ec2:DescribeSecurityGroups",
            "ec2:AuthorizeSecurityGroupIngress",
            "ec2:RevokeSecurityGroupIngress",
            "ec2:CreateSecurityGroup",
            "ec2:DescribeVpcs",
            "ec2:CreateTags",
            "ec2:ModifyNetworkInterfaceAttribute",
            "ec2:DescribeNetworkInterfaces"
          ],
          "Resource": "*"
        }
      ]
    }

  5. When you’re finished, choose Review policy.
  6. On the Review page, enter a name for the policy (for example, LambdaExecRolePolicy-UpdateSecurityGroupsForCloudFront). Review the policy Summary to see the permissions granted by your policy, and then choose Create policy to save your work.

To understand what this policy allows, let’s look closely at both statements in the policy. The first statement allows the Lambda function to create and write to CloudWatch Logs, which is vital for debugging and monitoring our function. The second statement allows the function to get information about existing security groups, get existing VPC information, create security groups, and authorize and revoke ingress permissions. It’s an important best practice that your IAM policies be as granular as possible, to support the principle of least privilege.

Now that you’ve created your policy, you can create the Lambda execution role that will use the policy.

To create the Lambda execution role

  1. In the navigation pane of the IAM console, choose Roles, and then choose Create role.
  2. For Select type of trusted entity, choose AWS service.
  3. Choose the service that you want to allow to assume this role. In this case, choose Lambda.
  4. Choose Next: Permissions.
  5. Search for the policy name that you created earlier and select the check box next to the policy.
  6. Choose Next: Tags.
  7. (Optional) Add metadata to the role by attaching tags as key-value pairs. For more information about using tags in IAM, see Tagging IAM Users and Roles.
  8. Choose Next: Review.
  9. For Role name, enter a name for your role (for example, LambdaExecRole-UpdateSecurityGroupsForCloudFront).
  10. (Optional) For Role description, enter a description for the new role.
  11. Review the role, and then choose Create role.

Create your Lambda function

Now, create your Lambda function and configure the role that you created earlier as the execution role for this function.

To create the Lambda function

  1. Go to the Lambda console in the US East (N. Virginia) Region and choose Create function. On the next page, choose Author from scratch. (I’ll be providing the code for your Lambda function, but for other functions, the Use a blueprint option can be a great way to get started.)
  2. Give your Lambda function a name (for example, UpdateSecurityGroupsForCloudFront) and description, and select Python 3.8 from the Runtime menu.
  3. For the execution role, choose Use an existing role and select the execution role you created earlier.
  4. After confirming that your settings are correct, choose Create function.
  5. Paste the Lambda function code from here.
  6. Select Save.

Additionally, in the Basic Settings of the Lambda function, increase the timeout to 10 seconds.

To set the timeout value in the Lambda console

  1. In the Lambda console, choose the function you just created.
  2. Under Basic settings, choose Edit.
  3. For Timeout, select 10s.
  4. Choose Save.

By default, the Lambda function has these settings:

  • The Lambda function is configured to create security groups in the default VPC.
  • CloudFront IP ranges are updated as inbound rules on port 80.
  • The created security groups are tagged with the name prefix AUTOUPDATE.
  • Debug logging is turned off.
  • The service for which IP ranges are extracted is set to CloudFront.
  • The SDK client in the Lambda function is set to us-east-1 (N. Virginia).

If you want to customize these settings, set the following environment variables for the Lambda function. For more details, see Using AWS Lambda environment variables.

  • To create security groups in a specific VPC:
    Key: VPC_ID
    Value: vpc-id
  • To create security group rules for a different port or multiple ports (the solution in this example supports a total of two ports; one can be used for HTTP and another for HTTPS):
    Key: PORTS
    Value: portnumber or portnumber,portnumber
  • To customize the prefix name tag of your security groups:
    Key: PREFIX_NAME
    Value: custom-name
  • To enable debug logging to CloudWatch:
    Key: DEBUG
    Value: true
  • To extract IP ranges for a service other than CloudFront:
    Key: SERVICE
    Value: servicename
  • To configure the Region for the SDK client used in the Lambda function (if the CloudFront origin is in a different Region than N. Virginia, the security groups must be created in that Region):
    Key: REGION
    Value: regionname

To set environment variables in the Lambda console

  1. In the Lambda console, choose the function you created.
  2. Under Environment variables, choose Edit.
  3. Choose Add environment variable.
  4. Enter a key and value.
  5. Choose Save.
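
If you’d rather script this step than use the console, the same variables can be set with the AWS CLI. The values below are examples only; adjust them to match your configuration:

# Set example environment variables on the function created earlier.
# Note: this call replaces the function's entire environment variable set,
# so include every variable you need in one call.
aws lambda update-function-configuration \
  --function-name UpdateSecurityGroupsForCloudFront \
  --environment '{"Variables":{"PORTS":"80,443","DEBUG":"true"}}' \
  --region us-east-1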

Test your Lambda function

Now that you’ve created your function, it’s time to test it and initialize your security group.

To create your test event for the Lambda function

  1. In the Lambda console, on the Functions page, choose your function. In the drop-down menu next to Actions, choose Configure test events.
  2. Enter an Event name (for example, TriggerSNS).
  3. Use the following as your sample event, which represents an SNS notification, and then select Create.
    {
        "Records": [
            {
                "EventVersion": "1.0",
                "EventSubscriptionArn": "arn:aws:sns:EXAMPLE",
                "EventSource": "aws:sns",
                "Sns": {
                    "SignatureVersion": "1",
                    "Timestamp": "1970-01-01T00:00:00.000Z",
                    "Signature": "EXAMPLE",
                    "SigningCertUrl": "EXAMPLE",
                    "MessageId": "95df01b4-ee98-5cb9-9903-4c221d41eb5e",
                    "Message": "{\"create-time\": \"yyyy-mm-ddThh:mm:ss+00:00\", \"synctoken\": \"0123456789\", \"md5\": \"7fd59f5c7f5cf643036cbd4443ad3e4b\", \"url\": \"https://ip-ranges.amazonaws.com/ip-ranges.json\"}",
                    "Type": "Notification",
                    "UnsubscribeUrl": "EXAMPLE",
                    "TopicArn": "arn:aws:sns:EXAMPLE",
                    "Subject": "TestInvoke"
                }
            }
        ]
    }
    

  4. After you’ve added the test event, select Save and then select Test. Your Lambda function is then invoked, and you should see log output at the bottom of the console in the Execution results section, similar to the following.
    Updating from https://ip-ranges.amazonaws.com/ip-ranges.json
    MD5 Mismatch: got 2e967e943cf98ae998efeec05d4f351c expected 7fd59f5c7f5cf643036cbd4443ad3e4b: Exception
    Traceback (most recent call last):
      File "/var/task/lambda_function.py", line 29, in lambda_handler
        ip_ranges = json.loads(get_ip_groups_json(message['url'], message['md5']))
      File "/var/task/lambda_function.py", line 50, in get_ip_groups_json
        raise Exception('MD5 Missmatch: got ' + hash + ' expected ' + expected_hash)
    Exception: MD5 Mismatch: got 2e967e943cf98ae998efeec05d4f351c expected 7fd59f5c7f5cf643036cbd4443ad3e4b
    

  5. Edit the sample event again, and this time change the md5 value to the first MD5 hash shown in the log output. In this example, you would replace the md5 value configured earlier with the hash from the error message, 2e967e943cf98ae998efeec05d4f351c. The Lambda code executes successfully only when the hash of the downloaded IP ranges document matches the hash received from the event trigger, so updating the test event with the hash from the error message makes them match.
  6. Select Save and test. This invokes your Lambda function.

After the function is invoked a second time with the updated md5 hash, the Lambda function should execute without any errors. You should be able to see the new security groups created, with the CloudFront IP ranges added as rules, in the EC2 console, as shown in Figure 1.
 

Figure 1: EC2 console showing the security groups created


In its first successful run, the function creates the total number of security groups required to cover all of the CloudFront IP ranges for the ports specified. The function creates security groups based on the maximum number of rules that can be added to an individual security group. The new security groups can be identified in the EC2 console by the name AUTOUPDATE_random if you used the default configuration, or by a custom name if you provided a PREFIX_NAME.

You can now attach these security groups to your Elastic Load Balancer or EC2 instances. If your log output is different from what is described here, the output should help you identify the issue.
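
For example, attaching the generated groups to an EC2 instance from the CLI might look like the following; the IDs are placeholders, and keep in mind that modify-instance-attribute replaces the instance’s full list of security groups, so include any groups that should remain attached:

# Attach the existing application group plus the auto-generated
# CloudFront groups to the instance (all IDs shown are placeholders)
aws ec2 modify-instance-attribute \
  --instance-id i-0123456789abcdef0 \
  --groups sg-0aaa1111bbb22222c sg-0ccc3333ddd44444e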

Configure your Lambda function’s trigger

After you’ve validated that your function is executing properly, it’s time to connect it to the SNS topic for IP changes. To do this, use the AWS Command Line Interface (CLI). Enter the following command, making sure to replace Lambda ARN with the Amazon Resource Name (ARN) of your Lambda function. You can find this ARN at the top right when viewing the configuration of your Lambda function.

aws sns subscribe --topic-arn "arn:aws:sns:us-east-1:806199016981:AmazonIpSpaceChanged" --region us-east-1 --protocol lambda --notification-endpoint "Lambda ARN"

You should receive the ARN of your Lambda function’s SNS subscription.

Now add a permission that allows the Lambda function to be invoked by the SNS topic. The following command also adds the Lambda trigger.

aws lambda add-permission --function-name "Lambda ARN" --statement-id lambda-sns-trigger --region us-east-1 --action lambda:InvokeFunction --principal sns.amazonaws.com --source-arn "arn:aws:sns:us-east-1:806199016981:AmazonIpSpaceChanged"

When AWS changes any of the IP ranges in the document, an SNS notification is sent and your Lambda function is triggered. The Lambda function verifies the modified ranges in the document and efficiently updates the IP ranges on the existing security groups. Additionally, the function dynamically scales and creates additional security groups if the number of IP ranges for CloudFront increases in the future. Any newly created security groups are automatically attached to the network interface where the previous security groups are attached, in order to avoid service interruption.

Summary

By following this blog post, you created a Lambda function that creates security groups and updates their rules dynamically whenever AWS publishes new service IP ranges. This solution has several advantages:

  • The solution isn’t designed as a periodic poll, so it only runs when it needs to.
  • It’s automatic, so you don’t need to update security groups manually, which lowers the operational cost.
  • It’s simple, because you have no extra infrastructure to maintain; the solution is completely serverless.
  • It’s cost-effective: because the Lambda function runs only when triggered by the AmazonIpSpaceChanged SNS topic and runs for only a few seconds, this solution costs only pennies to operate.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon CloudFront forum. If you have any other use cases for using Lambda functions to dynamically update security groups, or even other networking configurations such as VPC route tables or ACLs, we’d love to hear about them!

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Yeshwanth Kottu

Yeshwanth is a Systems Development Engineer at AWS in Cupertino, CA. With a focus on CloudFront and Lambda@Edge, he enjoys helping customers tackle challenges through cloud-scale architectures. Yeshwanth has an MS in Computer Systems Networking and Telecommunications from Northeastern University. Outside of work, he enjoys travelling, visiting national parks, and playing cricket.

Tracking the latest server images in Amazon EC2 Image Builder pipelines

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/tracking-the-latest-server-images-in-amazon-ec2-image-builder-pipelines/

This post courtesy of Anoop Rachamadugu, Cloud Architect at AWS

The Amazon EC2 Image Builder service helps users to build and maintain server images. The images created by EC2 Image Builder can be used with Amazon Elastic Compute Cloud (EC2) and on-premises. Image Builder reduces the effort of keeping images up-to-date and secure by providing a graphical interface, built-in automation, and AWS-provided security settings. Customers have told us that they manage multiple server images and are looking for ways to track the latest server images created by the pipelines.

In this blog post, I walk through a solution that uses AWS Lambda and AWS Systems Manager (SSM) Parameter Store. It tracks and updates the latest Amazon Machine Image (AMI) IDs every time an Image Builder pipeline is run. With Lambda, you pay only for what you use. You are charged based on the number of requests for your functions and the time it takes for your code to run. In this case, the Lambda function is invoked upon the completion of the image builder pipeline. Standard SSM parameters are available at no additional charge.

Users can reference the SSM parameters in automation scripts and AWS CloudFormation templates providing access to the latest AMI ID for your EC2 infrastructure. Consider the use case of updating Amazon Machine Image (AMI) IDs for the EC2 instances in your CloudFormation templates. Normally, you might map AMI IDs to specific instance types and Regions. Then to update these, you would manually change them in each of your templates. With the SSM parameter integration, your code remains untouched and a CloudFormation stack update operation automatically fetches the latest Parameter Store value.

Overview

This solution uses a Lambda function written in Python that subscribes to an Amazon Simple Notification Service (SNS) topic. The Lambda function and the SNS topic are deployed using AWS SAM CLI. Once deployed, the SNS topic must be configured in an existing Image Builder pipeline. This results in the Lambda function being invoked at the completion of the Image Builder pipeline.

When a Lambda function subscribes to an SNS topic, it is invoked with the payload of the published messages. The Lambda function receives the message payload as an input parameter. The Lambda function first checks the message payload to see if the image status is available. If the image state is available, it retrieves the AMI ID from the message payload and updates the SSM parameter.

EC2 Image builder architecture diagram


Prerequisites

To get started with this solution, the following is required:

Deploying the solution

The solution consists of two files, which can be downloaded from the amazon-ec2-image-builder GitHub repository.

  1. The Python file image-builder-lambda-update-ssm.py contains the code for the Lambda function. It first checks the SNS message payload to determine if the image is available. If it’s available, it extracts the AMI ID from the SNS message payload and updates the SSM parameter specified. The ssm_parameter_name variable specifies the SSM parameter path where the AMI ID should be stored and updated. The Lambda function finishes by adding tags to the SSM parameter.
  2. The template.yaml file is an AWS SAM template. It deploys the Lambda function, SNS topic, and IAM role required for the Lambda function. I use Python 3.7 as the runtime and assign a memory of 256 MB for the Lambda function. The IAM policy gives the Lambda function permissions to retrieve and update SSM parameters. Deploy this application using the AWS SAM CLI guided deploy:
    sam deploy --guided

After deploying the application, note the ARN of the created SNS topic. Next, update the infrastructure settings of an existing Image Builder pipeline with this newly created SNS topic. This results in the Lambda function being invoked upon the completion of the image builder pipeline.
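
To make the behavior concrete, the update that the Lambda function performs is roughly equivalent to the following CLI calls. The AMI ID and tag are placeholders shown only for illustration; the parameter path matches the default used in this post:

# Store the AMI ID reported by the pipeline in the SSM parameter
aws ssm put-parameter \
  --name /ec2-imagebuilder/latest \
  --type String \
  --value ami-0123456789abcdef0 \
  --overwrite

# Tag the parameter, as the function does after updating it
aws ssm add-tags-to-resource \
  --resource-type Parameter \
  --resource-id /ec2-imagebuilder/latest \
  --tags Key=Source,Value=EC2ImageBuilder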

Configuration details


Verifying the solution

After the completion of the image builder pipeline, use the AWS CLI or check the AWS Management Console to verify the updated SSM parameter. To verify via AWS CLI, run the following commands to retrieve and list the tags attached to the SSM parameter:

aws ssm get-parameter --name '/ec2-imagebuilder/latest'
aws ssm list-tags-for-resource --resource-type "Parameter" --resource-id '/ec2-imagebuilder/latest'

To verify via the AWS Management Console, navigate to the Parameter Store under AWS Systems Manager. Search for the parameter /ec2-imagebuilder/latest:

AWS Systems Manager: Parameter Store


Select the Tags tab to view the tags attached to the SSM parameter:

Image builder tags list


Referencing the SSM Parameter in CloudFormation templates

Users can reference the SSM parameters in automation scripts and AWS CloudFormation templates providing access to the latest AMI ID for your EC2 infrastructure. This sample code shows how to reference the SSM parameter in a CloudFormation template.

Parameters:
  LatestAmiId:
    Type: 'AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>'
    Default: '/ec2-imagebuilder/latest'

Resources:
  Instance:
    Type: 'AWS::EC2::Instance'
    Properties:
      ImageId: !Ref LatestAmiId
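
Because SSM parameter types are resolved each time the stack is created or updated, a later stack update picks up the newest AMI ID without touching the template. As a rough example (the stack name is a placeholder), a stack update that reuses the previous template and parameter key re-resolves /ec2-imagebuilder/latest:

# Re-run the stack against the unchanged template; the SSM-backed parameter
# is resolved again at update time. If the AMI has not changed, the call
# returns "No updates are to be performed."
aws cloudformation update-stack \
  --stack-name my-imagebuilder-stack \
  --use-previous-template \
  --parameters ParameterKey=LatestAmiId,UsePreviousValue=true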

Conclusion

In this blog post, I demonstrate a solution that allows users to track and update the latest AMI ID created by Image Builder pipelines. The Lambda function retrieves the AMI ID of the image created by a pipeline and updates an AWS Systems Manager parameter. This Lambda function is triggered via an SNS topic configured in an Image Builder pipeline.

The solution is deployed using AWS SAM CLI. I also note how users can reference Systems Manager parameters in AWS CloudFormation templates providing access to the latest AMI ID for your EC2 infrastructure.

The amazon-ec2-image-builder-samples GitHub repository provides a number of examples for getting started with EC2 Image Builder. Image Builder can make it easier for you to build virtual machine (VM) images.

Snowflake: Running Millions of Simulation Tests with Amazon EKS

Post Syndicated from Keith Joelner original https://aws.amazon.com/blogs/architecture/snowflake-running-millions-of-simulation-tests-with-amazon-eks/

This post was co-written with Brian Nutt, Senior Software Engineer and Kao Makino, Principal Performance Engineer, both at Snowflake.

Transactional databases are a key component of any production system. Maintaining data integrity while rows are read and written at a massive scale is a major technical challenge for these types of databases. To ensure their stability, it’s necessary to test many different scenarios and configurations. Simulating as many of these as possible allows engineers to quickly catch defects and build resilience. But the Holy Grail is to accomplish this at scale and within a timeframe that allows your developers to iterate quickly.

Snowflake has been using and advancing FoundationDB (FDB), an open-source, ACID-compliant, distributed key-value store, since 2014. FDB, running on Amazon Elastic Compute Cloud (EC2) and Amazon Elastic Block Store (EBS), has proven to be extremely reliable and is a key part of Snowflake’s cloud services layer architecture. To support its development process of creating high quality and stable software, Snowflake developed Project Joshua, an internal system that leverages Amazon Elastic Kubernetes Service (EKS), Amazon Elastic Container Registry (ECR), Amazon EC2 Spot Instances, and AWS PrivateLink to run over one hundred thousand validation and regression tests an hour.

About Snowflake

Snowflake is a single, integrated data platform delivered as a service. Built from the ground up for the cloud, Snowflake’s unique multi-cluster shared data architecture delivers the performance, scale, elasticity, and concurrency that today’s organizations require. It features storage, compute, and global services layers that are physically separated but logically integrated. Data workloads scale independently from one another, making it an ideal platform for data warehousing, data lakes, data engineering, data science, modern data sharing, and developing data applications.

Snowflake architecture

Developing a simulation-based testing and validation framework

Snowflake’s cloud services layer is composed of a collection of services that manage virtual warehouses, query optimization, and transactions. This layer relies on rich metadata stored in FDB.

Prior to the creation of the simulation framework, Project Joshua, FDB developers ran tests on their laptops and were limited by the number they could run. Additionally, there was a scheduled nightly job for running further tests.

Joshua at Snowflake

Amazon EKS as the foundation

Snowflake’s platform team decided to use Kubernetes to build Project Joshua. Their focus was on helping engineers run their workloads instead of spending cycles on the management of the control plane. They turned to Amazon EKS to achieve their scalability needs. This was a crucial success criterion for Project Joshua since at any point in time there could be hundreds of nodes running in the cluster. Snowflake utilizes the Kubernetes Cluster Autoscaler to dynamically scale worker nodes in minutes to support a tests-based queue of Joshua’s requests.

With the integration of Amazon EKS and Amazon Virtual Private Cloud (Amazon VPC), Snowflake is able to control access to the required resources. For example: the database that serves Joshua’s test queues is external to the EKS cluster. By using the Amazon VPC CNI plugin, each pod receives an IP address in the VPC and Snowflake can control access to the test queue via security groups.

To achieve its desired performance, Snowflake created its own custom pod scaler, which responds quicker to changes than using a custom metric for pod scheduling.

  • The agent scaler is responsible for monitoring a test queue in the coordination database (which, coincidentally, is also FDB) to schedule Joshua agents. The agent scaler communicates directly with Amazon EKS using the Kubernetes API to schedule tests in parallel.
  • Joshua agents (one agent per pod) are responsible for pulling tests from the test queue, executing, and reporting results. Tests are run one at a time within the EKS Cluster until the test queue is drained.

Achieving scale and cost savings with Amazon EC2 Spot

A Spot Fleet is a collection—or fleet—of Amazon EC2 Spot Instances that Joshua uses to make the infrastructure more reliable and cost-effective. Spot Fleet is used to reduce the cost of worker nodes by running a variety of instance types.

With Spot Fleet, Snowflake requests a combination of different instance types to help ensure that demand gets fulfilled. These options make Fleet more tolerant of surges in demand for instance types. If a surge occurs it will not significantly affect tasks since Joshua is agnostic to the type of instance and can fall back to a different instance type and still be available.

For reservations, Snowflake uses the capacity-optimized allocation strategy to automatically launch Spot Instances into the most available pools by looking at real-time capacity data and predicting which are the most available. This helps Snowflake quickly shift to whichever instance pools are most available in the Spot market, instead of spending time contending for the cheapest instances, at the cost of a potentially higher price.
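
For readers unfamiliar with the setting, the allocation strategy is simply a field on the Spot Fleet request. The fragment below is an illustrative sketch, not Snowflake’s actual configuration; the role ARN, launch template ID, and capacity are placeholders:

# A minimal Spot Fleet request using the capacity-optimized strategy
# across several interchangeable instance types (placeholder values)
aws ec2 request-spot-fleet --spot-fleet-request-config '{
  "IamFleetRole": "arn:aws:iam::123456789012:role/aws-ec2-spot-fleet-tagging-role",
  "AllocationStrategy": "capacityOptimized",
  "TargetCapacity": 100,
  "LaunchTemplateConfigs": [{
    "LaunchTemplateSpecification": {"LaunchTemplateId": "lt-0123456789abcdef0", "Version": "$Latest"},
    "Overrides": [
      {"InstanceType": "c5.4xlarge"},
      {"InstanceType": "c5d.4xlarge"},
      {"InstanceType": "m5.4xlarge"}
    ]
  }]
}'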

Overcoming hurdles

Snowflake’s usage of a public container registry posed a scalability challenge. When starting hundreds of worker nodes, each node needs to pull images from the public registry. This can lead to a potential rate limiting issue when all outbound traffic goes through a NAT gateway.

For example, consider 1,000 nodes pulling a 10 GB image. Each pull request requires each node to download the image across the public internet. Some issues that need to be addressed are latency, reliability, and increased costs due to the additional time to download an image for each test. Also, container registries can become unavailable or may rate-limit download requests. Lastly, images are pulled through public internet and other services in the cluster can experience pulling issues.

For anything more than a minimal workload, a local container registry is needed. If an image is first pulled from the public registry and then pushed to a local registry (cache), it only needs to be pulled once from the public registry; all worker nodes then benefit from a local pull. That’s why Snowflake decided to replicate images to ECR, a fully managed Docker container registry, providing a reliable local registry to store images. An additional benefit of the local registry is that it isn’t exclusive to Joshua; all platform components required for Snowflake clusters can be cached in the local ECR registry. For additional security and performance, Snowflake uses AWS PrivateLink to keep all network traffic from ECR to the worker nodes within the AWS network. It also resolved rate-limiting issues from pulling images from a public registry with unauthenticated requests, unblocking other cluster nodes from pulling critical images for operation.
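
As a rough sketch of the caching pattern (the repository name, account ID, Region, and source image are placeholders, not Snowflake’s actual tooling), mirroring a public image into a private ECR registry looks like this:

# Create a private repository to act as the local cache
aws ecr create-repository --repository-name joshua/agent --region us-west-2

# Authenticate the Docker client against the private registry
aws ecr get-login-password --region us-west-2 \
  | docker login --username AWS --password-stdin \
      123456789012.dkr.ecr.us-west-2.amazonaws.com

# Pull once from the public registry, then push into the ECR cache;
# worker nodes then pull locally instead of over the public internet
docker pull public.example.com/joshua/agent:latest
docker tag public.example.com/joshua/agent:latest \
    123456789012.dkr.ecr.us-west-2.amazonaws.com/joshua/agent:latest
docker push 123456789012.dkr.ecr.us-west-2.amazonaws.com/joshua/agent:latest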

Conclusion

Project Joshua lets Snowflake’s developers test more scenarios without having to worry about managing the infrastructure. Snowflake’s engineers can schedule thousands of test simulations and configurations to catch bugs faster. FDB is a key component of the Snowflake stack and Project Joshua helps make FDB more stable and resilient. Additionally, Amazon EC2 Spot has provided non-trivial cost savings to Snowflake vs. running on-demand or buying reserved instances.

If you want to know more about how Snowflake built its high performance data warehouse as a Service on AWS, watch the This is My Architecture video below.

Majority of Alexa Now Running on Faster, More Cost-Effective Amazon EC2 Inf1 Instances

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/majority-of-alexa-now-running-on-faster-more-cost-effective-amazon-ec2-inf1-instances/

Today, we are announcing that the Amazon Alexa team has migrated the vast majority of their GPU-based machine learning inference workloads to Amazon Elastic Compute Cloud (EC2) Inf1 instances, powered by AWS Inferentia. This resulted in 25% lower end-to-end latency, and 30% lower cost compared to GPU-based instances for Alexa’s text-to-speech workloads. The lower latency allows Alexa engineers to innovate with more complex algorithms and to improve the overall Alexa experience for our customers.

AWS built AWS Inferentia chips from the ground up to provide the lowest-cost machine learning (ML) inference in the cloud. They power the Inf1 instances that we launched at AWS re:Invent 2019. Inf1 instances provide up to 30% higher throughput and up to 45% lower cost per inference compared to GPU-based G4 instances, which were, before Inf1, the lowest-cost instances in the cloud for ML inference.

Alexa is Amazon’s cloud-based voice service that powers Amazon Echo devices and more than 140,000 models of smart speakers, lights, plugs, smart TVs, and cameras. Today customers have connected more than 100 million devices to Alexa. And every month, tens of millions of customers interact with Alexa to control their home devices (“Alexa, increase temperature in living room,” “Alexa, turn off bedroom”), to listen to radios and music (“Alexa, start Maxi 80 on bathroom,” “Alexa, play Van Halen from Spotify”), to be informed (“Alexa, what is the news?” “Alexa, is it going to rain today?”), or to be educated or entertained with 100,000+ Alexa Skills.

If you ask Alexa where she lives, she’ll tell you she is right here, but her head is in the cloud. Indeed, Alexa’s brain is deployed on AWS, where she benefits from the same agility, large-scale infrastructure, and global network we built for our customers.

How Alexa Works
When I’m in my living room and ask Alexa about the weather, I trigger a complex system. First, the on-device chip detects the wake word (Alexa). Once detected, the microphones record what I’m saying and stream the sound for analysis in the cloud. At a high level, there are two phases to analyze the sound of my voice. First, Alexa converts the sound to text. This is known as Automatic Speech Recognition (ASR). Once the text is known, the second phase is to understand what I mean. This is Natural Language Understanding (NLU). The output of NLU is an Intent (what does the customer want) and associated parameters. In this example (“Alexa, what’s the weather today ?”), the intent might be “GetWeatherForecast” and the parameter can be my postcode, inferred from my profile.

This whole process uses Artificial Intelligence heavily to transform the sound of my voice to phonemes, phonemes to words, words to phrases, phrases to intents. Based on the NLU output, Alexa routes the intent to a service to fulfill it. The service might be internal to Alexa or external, like one of the skills activated on my Alexa account. The fulfillment service processes the intent and returns a response as a JSON document. The document contains the text of the response Alexa must say.

The last step of the process is to generate the voice of Alexa from the text. This is known as Text-To-Speech (TTS). As soon as the TTS starts to produce sound data, it is streamed back to my Amazon Echo device: “The weather today will be partly cloudy with highs of 16 degrees and lows of 8 degrees.” (I live in Europe, these are Celsius degrees 🙂 ). This Text-To-Speech process also heavily involves machine learning models to build a phrase that sounds natural in terms of pronunciations, rhythm, connection between words, intonation etc.

Alexa is one of the most popular hyperscale machine learning services in the world, with billions of inference requests every week. Of Alexa’s three main inference workloads (ASR, NLU, and TTS), TTS workloads initially ran on GPU-based instances. But the Alexa team decided to move to the Inf1 instances as fast as possible to improve the customer experience and reduce the service compute cost.

What is AWS Inferentia?
AWS Inferentia is a custom chip, built by AWS, to accelerate machine learning inference workloads and optimize their cost. Each AWS Inferentia chip contains four NeuronCores. Each NeuronCore implements a high-performance systolic array matrix multiply engine, which massively speeds up typical deep learning operations such as convolution and transformers. NeuronCores are also equipped with a large on-chip cache, which helps cut down on external memory accesses, dramatically reducing latency and increasing throughput.

AWS Inferentia can be used natively from popular machine-learning frameworks like TensorFlow, PyTorch, and MXNet, with AWS Neuron. AWS Neuron is a software development kit (SDK) for running machine learning inference using AWS Inferentia chips. It consists of a compiler, run-time, and profiling tools that enable you to run high-performance and low latency inference.

Who Else is Using Amazon EC2 Inf1?
In addition to Alexa, Amazon Rekognition is also adopting AWS Inferentia. Running models such as object classification on Inf1 instances resulted in 8x lower latency and doubled throughput compared to running these models on GPU instances.

Customers, from Fortune 500 companies to startups, are using Inf1 instances for machine learning inference. For example, Snap Inc. incorporates machine learning (ML) into many aspects of Snapchat, and exploring innovation in this field is a key priority for them. Once they heard about AWS Inferentia, they collaborated with AWS to adopt Inf1 instances to help with ML deployment, including around performance and cost. They started with their recommendation models inference, and are now looking forward to deploying more models on Inf1 instances in the future.

Conde Nast, one of the world’s leading media companies, saw a 72% reduction in cost of inference compared to GPU-based instances for its recommendation engine. And Anthem, one of the leading healthcare companies in the US, observed 2x higher throughput compared to GPU-based instances for its customer sentiment machine learning workload.

How to Get Started with Amazon EC2 Inf1
You can start using Inf1 instances today.

If you prefer to manage your own machine learning application development platforms, you can get started by either launching Inf1 instances with AWS Deep Learning AMIs, which include the Neuron SDK, or you can use Inf1 instances via Amazon Elastic Kubernetes Service or Amazon ECS for containerized machine learning applications. To learn more about running containers on Inf1 instances, read this blog to get started on ECS and this blog to get started on EKS.
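
If you take the self-managed route, launching an Inf1 instance is the same run-instances call you would use for any other instance type; only the instance type changes. The AMI, key pair, and security group IDs below are placeholders for your own Deep Learning AMI and resources:

# Launch a single inf1.xlarge instance from a Deep Learning AMI
aws ec2 run-instances \
  --instance-type inf1.xlarge \
  --image-id ami-0123456789abcdef0 \
  --key-name my-key-pair \
  --security-group-ids sg-0123456789abcdef0 \
  --count 1 \
  --region us-east-1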

The easiest and quickest way to get started with Inf1 instances is via Amazon SageMaker, a fully managed service that enables developers to build, train, and deploy machine learning models quickly.

Get started with Inf1 on Amazon SageMaker today.

— seb

PS: The team just released this video, check it out!

Integrating AWS Outposts with existing security zones

Post Syndicated from Shubha Kumbadakone original https://aws.amazon.com/blogs/compute/integrating-aws-outposts-with-existing-security-zones/

This post is contributed by Santiago Freitas and Matt Lehwess.

AWS Outposts is a fully managed service that extends AWS infrastructure, services, APIs, and tools to your on-premises facility. This blog post explains how the resources created on an Outpost can be integrated with security zones of an existing infrastructure. Integrating Outposts with existing security zones is important for customers that segment their environment into multiple domains based on their security policy.

Background

It’s common for customers to have different security zones within their infrastructure. For example, a customer might have a DMZ security zone where internet facing systems are located, an extra-net security zone where systems used to communicate with business partners are hosted, and an internal security zone where the systems accessible only by employees operate.

As illustrated in the following figure, customers often use firewalls and network ACLs to perform packet inspection and filtering for traffic that flows between security zones. Customers also usually use similar security controls for traffic flowing between different application tiers.

 

 

Example of firewall being used for security zone segmentation

Enabling connectivity from an Outposts VPC to your on-premises network

During your Outpost deployment, AWS works with you to establish connectivity to your local network. During this installation process, AWS creates an address pool based on an IP range you assign to the Outpost out of your local network’s IP ranges. This address pool is known as a customer-owned IP address pool or CoIP pool.

When resources running on Outposts (like an EC2 instance) need to communicate with existing on-premises systems, you must assign an Elastic IP address to them. This Elastic IP address comes from the CoIP pool. The resources in your local area network (LAN), including firewalls and network ACLs, see the Elastic IP address as the source IP of the packets sent from the resources running on Outposts. This Elastic IP address is also used as the destination IP for traffic from your on-premises systems to the instance on the Outpost.

How to enable connectivity from an Outposts VPC to your on-premises network

With a typical Outpost deployment, as shown in the following diagram, the following items are required to enable connectivity from the Outpost’s VPC to your on-premises network:

  1. VPC routing: Routes to your on-premises network in the route table of the VPC’s Outpost subnet.
  2. CoIP pool: Assigned by you, the customer, and created by AWS at time of install, for example, 192.168.1.0/26 in the diagram. It’s used to allocate Elastic IP addresses that can then be associated with instances or other resources on the Outpost.
  3. Elastic IP address association: Customer configured, association of Elastic IP addresses from the CoIP pool with EC2 instances and other resources on the Outpost. For example, VPC IP 10.0.3.112 from instance Y has Elastic IP address 192.168.1.15 associated with it.
  4. Routing advertisement: The Outpost advertises the assigned CoIP range to the local devices within your on-premises network via Border Gateway Protocol (BGP). This is configured by AWS during installation and based on the CoIP pool IP range you provide.

 

AWS Outposts – typical network topology

Integrating Outposts with existing security zones

CoIP pools, and AWS Resource Access Manager (RAM), are used to facilitate the integration of Outposts with your existing security zones. AWS RAM lets you share your resources with any AWS account or through AWS Organizations.

When AWS deploys your Outposts, you can ask AWS to create multiple CoIP pools. Each CoIP pool is configured with an IP range assigned by you during initial installation. If after the initial installation you need AWS to create additional CoIP pools for an existing Outpost, you can open a case with AWS Support and request the creation of additional pools.

CoIP pools are owned initially by the AWS account that owns the Outpost. In a multi-account Outpost deployment, you can share your customer-owned IP pool(s) with other AWS accounts in your AWS Organization using AWS RAM.

VPCs that have subnets on the Outpost must be created in the same account that owns the Outpost. After creation of these subnets within the Outpost account, AWS RAM can then be used to share these subnets with other accounts or organizational units that are in the same organization from AWS Organizations.

After you’ve shared the CoIP pool with the account where you’d like to run your workloads, and you’ve shared an Outpost subnet with that same account, users of that account can deploy resources in the shared Outpost subnet. For the Outposts resources that must communicate with resources in your existing infrastructure, users can assign Elastic IP addresses from one or more of the shared CoIP pools to those Outposts resources.

Multiple CoIP pools, and the ability to share them and Outpost subnets with a particular account, give you granular control over the IP range used by Outpost resources that require connectivity to your on-premises LAN.

The following figure illustrates the sharing of a subnet and CoIP pool from Account 1, which is the account where the Outpost was initially deployed, with Account 2. As the CoIP pool was shared with Account 2, the users in Account 2 are able to assign an Elastic IP address to the EC2 instance running within Account 2.

AWS Outposts and VPC Sharing

Creating an AWS RAM resource share

The following screenshots demonstrate how to use AWS RAM to share a CoIP pool and a subnet with another account in the same organization from AWS Organizations.

Step 1. Navigate to the AWS RAM service page within the AWS Management Console, and click Create Resource Share. This step is done within the AWS account that owns the Outpost.

Step 2. Next, give the share an appropriate name.

Step 3. Select one or more subnets you’d like to share with your application owner’s account. In this case, select a subnet that exists in your VPC on the Outpost.

Step 4. In this case, share the CoIP range. You can find that in the resources type list.

Step 5. Select the Customer-owned Ipv4Pool resource type, and then select the CoIP to be shared. Selecting the CoIP adds it to the list of shared resources along with the subnet selected earlier.

Step 6. Add the account number for the account you’d like to share the Outposts subnet and CoIP pool with.

Step 7. Click Create resource share. After you click the button, you should see the resource share listed, in the state “active.”

Now that you’ve successfully shared the subnet and CoIP pool with your application account (that’s in the same organization), you can go into that account and allocate an Elastic IP address from the CoIP pool. You can then assign the Elastic IP address to an instance launched in the previously shared subnet, which is on the Outpost.
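
The same share can also be created from the CLI in a single call. The ARNs and account ID below are placeholders; copy the real subnet and CoIP pool ARNs from the console rather than constructing them by hand:

# Share the Outpost subnet and the CoIP pool with the application account
aws ram create-resource-share \
  --name outpost-web-tier-share \
  --resource-arns \
      arn:aws:ec2:us-west-2:111111111111:subnet/subnet-0123456789abcdef0 \
      arn:aws:ec2:us-west-2:111111111111:coip-pool/ipv4pool-coip-0123456789abcdef0 \
  --principals 222222222222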

Allocating and associating the Elastic IP address from your CoIP pool

Step 1. From within the application account, allocate an Elastic IP address from the CoIP pool. This is done by navigating to the VPC console, then to Elastic IP addresses, and then click Allocate Elastic IP address.

Step 2. Select Customer owned pool of IPv4 addresses, select the CoIP pool that’s been shared with this account previously, then click Allocate. You can see this step in the following image.

Step 3. You can now see the new Elastic IP address that has been allocated from the shared CoIP pool. You can now associate that Elastic IP address with an instance in our shared subnet.

Note: When using the AWS Management Console to allocate an Elastic IP address, the Elastic IP address is automatically allocated from the CoIP pool. If you’re using the AWS CLI, you have the option to select a specific Elastic IP address from the CoIP pool by using the --customer-owned-ipv4-pool <value> option.
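
For example, from the application account the allocation and association might look like the following; the pool, allocation, and instance IDs are placeholders:

# Allocate an Elastic IP address from the shared customer-owned pool
aws ec2 allocate-address \
  --domain vpc \
  --customer-owned-ipv4-pool ipv4pool-coip-0123456789abcdef0

# Associate the returned allocation with the instance on the Outpost
aws ec2 associate-address \
  --allocation-id eipalloc-0123456789abcdef0 \
  --instance-id i-0123456789abcdef0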

Step 4. You can now associate the Elastic IP address of type CoIP with an instance. Click Associate after selecting the instance that is running on your Outpost in the shared subnet. This step is represented in the following image.

Step 5. Once the operation is complete, you can see the CoIP Elastic IP address in the Elastic IP address console, and that it’s assigned to the instance that’s running in your shared subnet.

The preceding steps demonstrated how to share the Outpost with other accounts through the use of AWS Resource Access Manager, subnet sharing, and CoIP sharing.

Use case

Let’s apply the capabilities described prior into a design. A customer is running a 3-tier web application and would like to host the web and application tiers on Outposts. The database tier is hosted on the existing infrastructure. The application’s user traffic comes from the internet, through an external firewall towards the web tier hosted on the Outpost. Traffic between web and application tiers is secured with security groups and remains within the Outpost. Application servers connect to the database servers via the existing internal firewalls. The customer uses independent external and internal firewalls.

During the Outposts installation, the customer asks AWS to create two CoIP pools. The first CoIP pool, referenced in the following image as CoIP Web, is assigned the IP range 172.0.1.0/24. The second CoIP pool, referenced in the following image as CoIP App, is assigned the IP range 172.0.2.0/24. The customer then creates two additional AWS accounts, one for the web tier and another for the app tier. Within the account that owns the Outpost, the customer creates two VPCs and within each of those VPCs a subnet is created.

The subnet with IP address 10.0.1.0/24 is shared with the web tier account and the subnet with IP address 10.1.2.0/24 is shared with the app tier account. The customer then configures VPC peering between the VPCs to enable communication between the web and app tiers. VPC peering configuration is performed within the account that owns the Outposts.

Note: With VPC peering traffic between the VPCs remains within the Outpost. However, data transfer charges for VPC peering applies. This use-case could also be built with the web tier and app tier using different subnets within the same VPC to save on data transfer costs over VPC peering.

The following image shows initial setup with two CoIP pools, two accounts, and two VPCs.

 

Initial setup with two CoIP pools, two accounts and two VPCs

After the initial setup the customer then shares each CoIP pool with the appropriate account using AWS RAM. The customer can then create their web and application servers within the respective accounts. As shown in the prior image, the web server is assigned IP address 10.0.1.5 and the application server is assigned IP address 10.1.2.10. Each IP address is part of their respective VPC IP range and are reachable only within the Outpost at this point.

To enable the integration of the web and application servers with the existing infrastructure, the customer assigns an Elastic IP address from the CoIP pool shared with each account to their respective Amazon EC2 instances.

With this configuration, the customer can integrate the web and application servers into the existing security zones. To integrate the web servers, path “A” in the following image, the customer creates a rule in their external firewall that allows communication from any source (internet) to Elastic IP address 172.0.1.5 on port 443 (HTTPS) which has been assigned to the web server.

For the communication between the web and application servers, path “B” in the following image, the customer has configured VPC peering between the two VPCs and created the required security groups.

To integrate the application servers into the Internal Security Zone, path “C” in the following image, the customer has assigned Elastic IP address 172.0.2.10 to the application server and has configured a rule on the Internal Firewall allowing the IP range 172.0.2.0/24 that is assigned to the app CoIP pool to communicate with the database server.

 

End to end traffic flow from users to web tier and from app tier to database tier

In addition to setting up your Outposts as covered in the preceding sections, to enable communication between the Outposts-hosted resources and the existing infrastructure you must create a custom route table and associate it with an Outpost subnet.

Conclusion

AWS Outposts extends AWS infrastructure, services, APIs, and tools to your on-premises facility. During deployment of your Outposts, AWS works with you to establish connectivity to your local network. This blog post builds upon this initial deployment of an Outpost to allow the Outpost’s resources to integrate into your existing security zones. The design is applicable for customers that segment their environment into multiple security zones.

 

Santiago Freitas is AWS Head of Technology for Central Eastern Europe (CEE), Middle East, North Africa (MENA), Sub Saharan Africa (SSA), Turkey (TUR), and Russia and Commonwealth of Independent States (RUS-CIS). Previously he was an AWS Global Solutions Architect for financial services. Before joining AWS, Santiago was a Distinguished Engineer at Cisco. He is based in Dubai, United Arab Emirates.

Matt is a Principal Developer Advocate for Amazon Web Services where he has spent the last 7 years working with AWS customers and partners, helping them to build best-in-class AWS networking solutions. He is co-author of the AWS Certified Advanced Networking Official Study Guide, as well as an Amazon Re:Invent speaker extraordinaire (5 years and counting!). His primary focus at AWS is to help evangelize AWS networking solutions and more recently, AWS Outposts.