Tag Archives: Amazon EC2

Introducing queued purchases for Savings Plans

Post Syndicated from Roshni Pary original https://aws.amazon.com/blogs/compute/introducing-queued-purchases-for-savings-plans/

This blog post is contributed by Idan Maizlits, Sr. Product Manager, Savings Plans

AWS now provides the ability for you to queue purchases of Savings Plans by specifying a time, up to 3 years in the future, to carry out those purchases. This blog reviews how you can queue purchases of Savings Plans.

In November 2019, AWS launched Savings Plans, a flexible pricing model that allows you to save up to 72% on Amazon EC2, AWS Fargate, and AWS Lambda in exchange for committing to a consistent amount of compute usage, measured in dollars per hour (for example, $10/hour), for a 1- or 3-year term. Savings Plans is the easiest way to save money on compute usage while providing you the flexibility to use the compute options that best fit your needs as they change.

Queueing Savings Plans allows you to plan ahead for future events. Say you want to purchase a Savings Plan three months in the future to cover a new workload. With the ability to queue plans in advance, you can easily schedule the purchase to be carried out at the exact time you expect your workload to go live. This helps you plan in advance by eliminating the need to make “just-in-time” purchases, and benefit from low prices on your future workloads from the get-go. With the ability to queue purchases, you can also enjoy uninterrupted Savings Plans coverage by scheduling renewals of your plans ahead of their expiry. This makes it even easier to save money on your overall AWS bill.

So how do queued purchases for Savings Plans work? Queued purchases are similar to regular purchases in all aspects but one: the start date. With a regular purchase, a plan becomes active immediately, whereas with a queued purchase you select a date in the future for the plan to start. Until that future date, the Savings Plan remains in a queued state; on that date, any upfront payments are charged and the plan becomes active.

Now, let’s look at this in more detail with a few examples. I walk through three scenarios: a) queuing Savings Plans to cover future usage, b) renewing expiring Savings Plans, and c) deleting a queued Savings Plan.

How do I queue a Savings Plan?

If you are planning ahead and would like to queue a Savings Plan to support future needs such as new workloads or expiring Reserved Instances, head to the Purchase Savings Plans page on the AWS Cost Management Console. Then, select the type of Savings Plan you would like to queue, including the term length, purchase commitment, and payment option.

Select the type of Savings Plan

Now, indicate the start date and time for this plan (the date and time at which your Savings Plan becomes active). The time you indicate is in UTC, but it is also shown in your browser’s local time zone. If you are looking to replace an existing Reserved Instance, you can set the start date and time to align with the expiration of your existing Reserved Instances. You can find the expiration time of your Reserved Instances on the EC2 Reserved Instances console (this is shown in your local time zone; convert it to UTC when you queue a Savings Plan).

After you have selected the start date and time for the Savings Plan, click “Add to cart.” When you are ready, click “Submit order” to complete the purchase.

Once you have submitted the order, the Savings Plans Inventory page lists the queued Savings Plan with a “Queued” status, and the purchase is carried out on the date and time you provided.
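
If you prefer to script this, the Savings Plans API can queue a purchase as well. The following is a minimal AWS CLI sketch, not a verbatim walkthrough from this post: the offering ID is a placeholder you would first look up with aws savingsplans describe-savings-plans-offerings, and the exact parameter names (in particular --purchase-time) and the “queued” state value are assumptions to confirm against aws savingsplans create-savings-plan help.

# Queue a purchase by setting a future purchase time in UTC (offering ID is a placeholder).
aws savingsplans create-savings-plan \
  --savings-plan-offering-id <offering-id> \
  --commitment 10 \
  --purchase-time 2021-01-01T00:00:00Z

# List plans that are still queued (state name assumed; verify with the CLI help).
aws savingsplans describe-savings-plans --states queued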

How can I replace an expiring plan?

If you have already purchased a Savings Plan, queuing purchases allows you to renew that Savings Plan upon expiry for continuous coverage. All you have to do is head to the AWS Cost Management Console, go to the Savings Plans Inventory page, and select the Savings Plan you would like to renew. Then, click Actions and select “Renew Savings Plan,” as seen in the following image.

This action automatically queues a Savings Plan in the cart with the same configuration as your original plan to replace the expiring one. The start time for the new plan is automatically set to one second after the old Savings Plan expires. All you have to do now is submit the order and you are good to go.

If you would like to renew multiple Savings Plans, select each one and click “Renew Savings Plan,” which adds them to the cart. When you are done adding Savings Plans, your cart lists all of the Savings Plans that you added to the order. When you are ready to submit the order, click “Submit order.”

How can I delete a queued Savings Plan?

If you have queued Savings Plans that you no longer need to purchase, or that you need to modify, you can do so from the console. Head to the AWS Cost Management Console, select the Savings Plans Inventory page, and then select the Savings Plan you would like to delete. Click Actions, as seen in the following image, and delete the queued purchase if you need to make changes or no longer need the plan. If you need the Savings Plan at a different commitment value, you can make a new queued purchase.
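
The same cleanup can be scripted. A minimal sketch with the AWS CLI, assuming the DeleteQueuedSavingsPlan API and a placeholder plan ID:

# Delete a queued (not yet active) Savings Plan; replace the ID with your own.
aws savingsplans delete-queued-savings-plan \
  --savings-plan-id <queued-savings-plan-id>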

Conclusion

AWS Savings Plans allow you to save up to 72% compared to On-Demand prices by committing to a 1- or 3-year term. Starting today, with the ability to queue purchases of Savings Plans, you can easily plan for your future needs or renew expiring Savings Plans ahead of time, all with just a few clicks. In this blog, I walked through various scenarios. As you can see, it’s even easier to save money with AWS Savings Plans by queuing your purchases to meet your future needs and continue benefiting from uninterrupted coverage.

Click here to learn more about queuing purchases of Savings Plans and visit the AWS Cost Management Console to get started.

Creating an EC2 instance in the AWS Wavelength Zone

Post Syndicated from Bala Thekkedath original https://aws.amazon.com/blogs/compute/creating-an-ec2-instance-in-the-aws-wavelength-zone/

Creating an EC2 instance in the AWS Wavelength Zone

This blog post is contributed by Saravanan Shanmugam, Lead Solution Architect, AWS Wavelength

AWS announced Wavelength at re:Invent 2019 in partnership with Verizon in the US, SK Telecom in South Korea, KDDI in Japan, and Vodafone in the UK and Europe. Following the re:Invent 2019 announcement, on August 6, 2020, AWS announced general availability of one Wavelength Zone with Verizon in Boston, connected to the US East (N. Virginia) Region, and one in San Francisco, connected to the US West (Oregon) Region.

In this blog, I walk you through the steps required to create an Amazon EC2 instance in an AWS Wavelength Zone from the AWS Management Console. We also address questions our customers have asked about the different protocol traffic allowed into and out of AWS Wavelength Zones.

Customers who want to access AWS Wavelength Zones and deploy their applications to a Wavelength Zone can sign up using this link. Customers who have opted in to access the AWS Wavelength Zone can confirm the status in the Account Attributes section of the EC2 console, as shown in the following image.

Services and features

AWS Wavelength Zones are Availability Zones inside the carrier service provider’s network, closer to the edge of the mobile network. Wavelength Zones bring core AWS compute and storage services, such as Amazon EC2 and Amazon EBS, which can be used by other services like Amazon EKS and Amazon ECS. We look at Wavelength Zones as a hub-and-spoke model, where developers can deploy latency-sensitive, high-bandwidth applications at the edge and non-latency-sensitive, data-persistent applications in the Region.

Wavelength Zones support three Nitro-based Amazon EC2 instance types: t3 (t3.medium, t3.xlarge), r5 (r5.2xlarge), and g4 (g4dn.2xlarge), with the gp2 EBS volume type. Customers can also use Amazon ECS and Amazon EKS to deploy container applications at the edge. Other AWS services, like AWS CloudFormation templates, CloudWatch, IAM resources, and Organizations, continue to work as expected, providing you a consistent experience. You can also leverage the full suite of services, like Amazon S3, in the parent Region over AWS’s private network backbone. Now that we have reviewed AWS Wavelength and its services and features, let us talk about the steps to launch an EC2 instance in an AWS Wavelength Zone.

Creating a Subnet in the Wavelength Zone

Once the Wavelength Zone is enabled for your AWS account, you can extend your existing VPC from the parent Region to a Wavelength Zone by creating a new VPC subnet assigned to the AWS Wavelength Zone. Customers can also create a new VPC and then a subnet to deploy their applications in the Wavelength Zone. The following image shows the subnet creation step, where you pick the Wavelength Zone as the Availability Zone for the subnet.
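
If you prefer the AWS CLI over the console, the following is a minimal sketch of the same step. The VPC ID, CIDR block, and the Wavelength Zone name (shown here as the San Francisco zone) are example placeholders to adjust for your own account and Region.

# Create a subnet in the Wavelength Zone by passing the zone name as the Availability Zone.
aws ec2 create-subnet \
  --vpc-id <vpc-id> \
  --cidr-block 10.0.10.0/24 \
  --availability-zone us-west-2-wl1-sfo-wlz-1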

Carrier Gateway

We have introduced a new gateway type called Carrier Gateway, which allows you to route traffic from the Wavelength Zone subnet to the CSP network and to the Internet. Carrier Gateways are similar to the Internet gateway in the Region. The Carrier Gateway is also responsible for NATing traffic to and from the Wavelength Zone subnets, mapping it to the carrier IP addresses assigned to the instances.

Creating a Carrier Gateway

In the VPC console, you can now create a Carrier Gateway and attach it to your VPC.

You select the VPC to which the Carrier Gateway must be attached. There is also an option to select “Route subnet traffic to the Carrier Gateway” in the Carrier Gateway creation step. By selecting this option, you can pick the Wavelength subnets whose default route you want to point to the Carrier Gateway. This option automatically deletes the existing route table association for the subnets, creates a new route table, adds a default route entry, and attaches the new route table to the subnets you selected. The following picture captures the input required to create a Carrier Gateway.
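
The same setup can also be scripted. Below is a minimal AWS CLI sketch that creates the Carrier Gateway and points a Wavelength subnet’s route table at it; the VPC, route table, and gateway IDs are placeholders, and you should verify the flag names against your CLI version.

# Create the Carrier Gateway in the VPC that contains the Wavelength subnet.
aws ec2 create-carrier-gateway --vpc-id <vpc-id>

# Add a default route to the Carrier Gateway in the route table
# associated with the Wavelength subnet.
aws ec2 create-route \
  --route-table-id <rtb-id> \
  --destination-cidr-block 0.0.0.0/0 \
  --carrier-gateway-id <cagw-id>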

 

Creating an EC2 instance in a Wavelength Zone with Private IP Address

Once a VPC subnet is created for the AWS Wavelength Zone, you can launch an EC2 instance with a private IP address using the EC2 launch wizard. In the Configure Instance Details step, select the Wavelength Zone subnet that you created in the “Creating a Subnet” section.

Attach an IAM instance profile that includes the SSM role, which allows you to connect to the instance through AWS Systems Manager Session Manager. This is a recommended practice for Wavelength Zone instances because direct SSH access from the public internet is not allowed.
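
The following is a minimal sketch of attaching an SSM-enabled instance profile with the AWS CLI. It assumes you already have an IAM role (here named WavelengthSSMRole, a hypothetical name) whose trust policy allows ec2.amazonaws.com to assume it, and a running instance ID to attach the profile to.

# Grant the role the managed policy required by Systems Manager.
aws iam attach-role-policy \
  --role-name WavelengthSSMRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore

# Wrap the role in an instance profile and attach it to the instance.
aws iam create-instance-profile --instance-profile-name WavelengthSSMProfile
aws iam add-role-to-instance-profile \
  --instance-profile-name WavelengthSSMProfile \
  --role-name WavelengthSSMRole
aws ec2 associate-iam-instance-profile \
  --instance-id <instance-id> \
  --iam-instance-profile Name=WavelengthSSMProfile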

Creating an EC2 instance in a Wavelength Zone with Carrier IP Address

The instances running in Wavelength Zone subnets can obtain a Carrier IP address, which is allocated from a pool of IP addresses called a Network Border Group (NBG). To create an EC2 instance in the Wavelength Zone with a carrier-routable IP address, you can use the AWS CLI. You can use the following command to create an EC2 instance in a Wavelength Zone subnet. Note the additional network interface (NIC) option “AssociateCarrierIpAddress” as part of the EC2 run-instances command, as shown in the following example.

aws ec2 --region us-west-2 run-instances --network-interfaces '[{"DeviceIndex":0, "AssociateCarrierIpAddress": true, "SubnetId": "<subnet-0d3c2c317ac4a262a>"}]' --image-id <ami-0a07be880014c7b8e> --instance-type t3.medium --key-name <san-francisco-wavelength-sample-key>

*To use the “AssociateCarrierIpAddress” option in the ec2 run-instances command, use the latest AWS CLI v2.

The carrier IP assigned to the EC2 instance can be obtained by running the following command.

 aws ec2 describe-instances --instance-ids <replace-with-your-instance-id> --region us-west-2

After running the run-instances command, make the necessary changes to the default security group attached to the EC2 instance to allow the required protocol traffic. If you allow ICMP traffic to your EC2 instance, you can test ICMP connectivity to your instance from the public internet.
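
For example, the following AWS CLI sketch opens ICMP from anywhere on the instance’s security group so you can test reachability. The group ID is a placeholder, and in practice you would typically scope the source CIDR more tightly than 0.0.0.0/0.

# Allow all ICMP types (echo request/reply included) for connectivity testing.
aws ec2 authorize-security-group-ingress \
  --group-id <sg-id> \
  --ip-permissions 'IpProtocol=icmp,FromPort=-1,ToPort=-1,IpRanges=[{CidrIp=0.0.0.0/0}]'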

The different protocols allowed in and out of the Wavelength Zone are captured in the following table.

 

TCP Connection FROM | TCP Connection TO | Result*
Region Zones | WL Zones | Allowed
Wavelength Zones | Region | Allowed
Wavelength Zones | Internet | Allowed
Internet (TCP SYN) | WL Zones | Blocked
Internet (TCP EST) | WL Zones | Allowed
Wavelength Zones | UE (Radio) | Allowed
UE (Radio) | WL Zones | Allowed

 

UDP Packets FROM | UDP Packets TO | Result*
Wavelength Zones | WL Zones | Allowed
Wavelength Zones | Region | Allowed
Wavelength Zones | Internet | Allowed
Internet | WL Zones | Blocked
Wavelength Zones | UE (Radio) | Allowed
UE (Radio) | WL Zones | Allowed

 

ICMP FROM | ICMP TO | Result*
Wavelength Zones | WL Zones | Allowed
Wavelength Zones | Region | Allowed
Wavelength Zones | Internet | Allowed
Internet | WL Zones | Allowed
Wavelength Zones | UE (Radio) | Allowed
UE (Radio) | WL Zones | Allowed

Conclusion

We have covered how to create and run an EC2 instance in an AWS Wavelength Zone, the core foundation for application deployments. We will continue to publish blogs that help customers create Amazon ECS and Amazon EKS clusters in AWS Wavelength Zones and deploy container applications at the mobile carrier’s edge. We are looking forward to seeing what you can do with them. AWS would love to get your advice on additional local services/features or other interesting use cases, so feel free to leave us your comments!

 

EFA-enabled C5n instances to scale Simcenter STAR-CCM+

Post Syndicated from Ben Peven original https://aws.amazon.com/blogs/compute/efa-enabled-c5n-instances-to-scale-simcenter-star-ccm/

This post was contributed by Dnyanesh Digraskar, Senior Partner SA, High Performance Computing; Linda Hedges, Principal SA, High Performance Computing

In this blog, we define and demonstrate the scalability metrics for a typical real-world application using Computational Fluid Dynamics (CFD) software from Siemens, Simcenter STAR-CCM+, running on a High Performance Computing (HPC) cluster on Amazon Web Services (AWS). This scenario demonstrates the scaling of an external aerodynamics CFD case with 97 million cells to over 4,000 cores of Amazon EC2 C5n.18xlarge instances using the Simcenter STAR-CCM+ software. We also discuss the effects of scaling on efficiency, simulation turn-around time, and total simulation costs. TLG Aerospace, a Seattle-based aerospace engineering services company, contributed the data used in this blog. For a detailed case study describing TLG Aerospace’s experience and the results they achieved, see the TLG Aerospace case study.

For HPC workloads that use multiple nodes, the cluster setup, including the network, is at the heart of scalability concerns. Some of the most common questions from CFD or HPC engineers are: “How well will my application scale on AWS?”, “How do I optimize the associated costs for best performance of my application on AWS?”, and “What are the best practices in setting up an HPC cluster on AWS to reduce the simulation turn-around time and maintain high efficiency?” This post aims to answer these questions by defining and explaining important scalability-related parameters and illustrating them with results from the CFD case. For detailed HPC-specific information, visit the High Performance Computing page and download the CFD whitepaper, Computational Fluid Dynamics on AWS.

CFD scaling on AWS

Scale-up

HPC applications, such as CFD, depend heavily on the applications’ ability to scale compute tasks efficiently in parallel across multiple compute resources. We often evaluate parallel performance by determining an application’s scale-up. Scale-up – a function of the number of processors used – is the time to complete a run on one processor, divided by the time to complete the same run on the number of processors used for the parallel run.

Scale-up formula: scale-up(N) = T(1) / T(N), where N is the number of processors and T(N) is the time to complete the run on N processors.

In addition to characterizing the scale-up of an application, scalability can be further characterized as “strong” or “weak”. Strong scaling offers a traditional view of application scaling, where a problem size is fixed and spread over an increasing number of processors. As more processors are added to the calculation, good strong scaling means that the time to complete the calculation decreases proportionally with increasing processor count. In comparison, weak scaling does not fix the problem size used in the evaluation, but purposely increases the problem size as the number of processors also increases. An application demonstrates good weak scaling when the time to complete the calculation remains constant as the ratio of compute effort to the number of processors is held constant. Weak scaling offers insight into how an application behaves with varying case size.

Figure 1, the following image, shows scale-up as a function of increasing processor count for the Simcenter STAR-CCM+ case data provided by TLG Aerospace. This is a demonstration of “strong” scalability. The blue line shows what ideal or perfect scalability looks like. The purple triangles show the actual scale-up for the case as a function of increasing processor count. The closeness of these two curves demonstrates excellent scaling to well over 3,000 processors for this mid-to-large-sized 97M cell case. This example was run on Amazon EC2 C5n.18xlarge Intel Skylake instances, 3.0 GHz, each providing 36 cores with Hyper-Threading disabled.

Figure 1. Strong scaling demonstrated for a 97M cell Simcenter STAR-CCM+ CFD calculation

Efficiency

Now that you understand how scale-up varies with the number of processors, we discuss how scale-up relates to the number of grid cells per processor, which determines the efficiency of the parallel simulation. Efficiency is the scale-up divided by the number of processors used in the calculation. By plotting grid cells per processor, as in Figure 2, scaling estimates can be made for simulations with different grid sizes with Simcenter STAR-CCM+. The purple line in Figure 2 shows scale-up as a function of grid cells per processor. The vertical axis for scale-up is on the left-hand side of the graph, as indicated by the purple arrow. The green line in Figure 2 shows efficiency as a function of grid cells per processor. The vertical axis for efficiency is on the right side of the graph and is indicated by the green arrow.

Figure 2. Scale-up and efficiency as a function of cells per processor.

Fewer grid cells per processor means reduced computational effort per processor. Maintaining efficiency while reducing cells per processor demonstrates the strong scalability of Simcenter STAR-CCM+ on AWS.

Efficiency remains at about 100% between approximately 700,000 cells per processor core and 60,000 cells per processor core. Efficiency starts to fall off at about 60,000 cells per core. An efficiency of at least 80% is maintained until 25,000 cells per core. Decreasing cells per core leads to decreased efficiency because the total computational effort per processor core is reduced. Exceeding 100% efficiency (here, at about 250,000 cells per core) is common in scaling studies, is case-specific, and is often related to smaller effects such as timing variation and memory caching.

Turn-around time and cost

Case turn-around time and cost are what really matter to most HPC users. A plot of turn-around time versus CPU cost for this case is shown in Figure 3. As the number of cores increases, the total turn-around time decreases. But as the number of cores increases, the inefficiency also increases, which leads to increased costs. The cost, represented by the solid blue curve, is based on the On-Demand price for the C5n.18xlarge and only includes the computational costs. Small costs are also incurred for data storage. Minimum cost and turn-around time are achieved with approximately 60,000 cells per core.

Figure 3. Cost per run for: On-Demand pricing ($3.888 per hour for C5n.18xlarge in US-East-1) with and without the Simcenter STAR-CCM+ POD license cost as a function of turn-around time [Blue]; 3-yr all-upfront pricing ($1.475 per hour for C5n.18xlarge in US-East-1) [Green]

Many users choose a cell count per core to achieve the lowest possible cost. Others may choose a cell count per core to achieve the fastest turn-around time. If a run is desired in one-third the time of the lowest price point, it can be achieved with approximately 25,000 cells per core.

Additional information about the test scenario

TLG Aerospace used the Simcenter STAR-CCM+ Power-On-Demand (POD) license for running the simulations for this case. The POD license enables flexible, On-Demand usage of the software on unlimited cores for a fixed price of $22 per hour. The total cost per run, which includes the computational cost plus the POD license cost, is represented in Figure 3 by the dashed blue curve. Because the POD license is charged per hour, the total cost per run increases for higher turn-around times. Note that many users run Simcenter STAR-CCM+ with fewer cells per core than this case. While this increases the compute cost, other concerns, such as license costs or schedules, can be overriding factors. However, many find the reduced turn-around time well worth the price of the additional instances.
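
As a back-of-the-envelope illustration of how these cost curves are built, the sketch below combines the On-Demand rate from Figure 3 ($3.888 per instance-hour) with the $22 per hour POD license fee. The instance count and turn-around time are made-up example values, not figures from the TLG Aerospace case.

# Example only: 8 instances running for 4 hours at On-Demand rates plus the POD license.
awk -v instances=8 -v hours=4 'BEGIN {
  compute = instances * 3.888 * hours   # EC2 On-Demand compute cost
  license = 22 * hours                  # Power-On-Demand license cost
  printf "compute: $%.2f  license: $%.2f  total: $%.2f\n", compute, license, compute + license
}'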

AWS also offers Savings Plans, a flexible pricing model that offers substantially lower prices on EC2 instances compared to On-Demand pricing in exchange for a committed usage of a 1- or 3-year term. For example, the 3-year all-upfront pricing of the C5n.18xlarge instance is 62% cheaper than the On-Demand pricing. The total cost per run using the 3-year all-upfront pricing model is illustrated in Figure 3 by the solid green line. The 3-year all-upfront pricing plan offers a substantial reduction in the price of running the simulations.

Amazon Linux is optimized to run on AWS and offers excellent performance for running HPC applications. For the case presented here, the operating system used was Amazon Linux 2. While other Linux distributions are also performant, we strongly recommend that for Linux HPC applications, you use a current Linux kernel.

Amazon Elastic Block Store (Amazon EBS) is a persistent, block-level storage service that is often used for cluster storage on AWS. A standard EBS General Purpose SSD (gp2) volume was used for this scenario. For other HPC applications that may require faster I/O to prevent data writes from becoming a bottleneck to turn-around speed, we recommend Amazon FSx for Lustre. FSx for Lustre seamlessly integrates with Amazon S3, allowing users to interact efficiently with data in Amazon S3.

AWS customers can choose to run their applications on either threads or cores. With hyper-threading, a single CPU physical core appears as two logical CPUs to the operating system. For an application like Simcenter STAR-CCM+, excellent linear scaling can be seen when using either threads or cores, though we generally recommend disabling hyper-threading. Most HPC applications benefit from disabling hyper-threading, and therefore, it tends to be the preferred environment for running HPC workloads. For more information, see Well-Architected Framework HPC Lens.

Elastic Fabric Adapter (EFA)

Elastic Fabric Adapter (EFA) is a network device that can be attached to Amazon EC2 instances to accelerate HPC applications by providing lower and more consistent latency and higher throughput than the Transmission Control Protocol (TCP) transport. The C5n.18xlarge instances used to run Simcenter STAR-CCM+ for this case support EFA, which is generally recommended for best scaling.

Summary

This post demonstrates the scalability of the commercial CFD software Simcenter STAR-CCM+ for an external aerodynamics simulation performed on Amazon EC2 C5n.18xlarge instances. The availability of EFA, a high-performance network device, on these instances results in excellent scalability of the application. The case turn-around time and associated costs of running Simcenter STAR-CCM+ on AWS hardware are discussed. In general, excellent performance can be achieved on AWS for most HPC applications. In addition to low cost and quick turn-around time, important considerations for HPC also include throughput and availability. AWS offers high throughput, scalability, security, cost savings, and high availability, decreasing long queue times and reducing case turn-around time.

How to delete user data in an AWS data lake

Post Syndicated from George Komninos original https://aws.amazon.com/blogs/big-data/how-to-delete-user-data-in-an-aws-data-lake/

General Data Protection Regulation (GDPR) is an important aspect of today’s technology world, and processing data in compliance with GDPR is a necessity for those who implement solutions within the AWS public cloud. One article of GDPR is the “right to erasure,” or “right to be forgotten,” which may require you to implement a solution to delete specific users’ personal data.

In the context of the AWS big data and analytics ecosystem, every architecture, regardless of the problem it targets, uses Amazon Simple Storage Service (Amazon S3) as the core storage service. Despite its versatility and feature completeness, Amazon S3 doesn’t come with an out-of-the-box way to map a user identifier to the S3 keys of objects that contain that user’s data.

This post walks you through a framework that helps you purge individual user data within your organization’s AWS hosted data lake, and an analytics solution that uses different AWS storage layers, along with sample code targeting Amazon S3.

Reference architecture

To address the challenge of implementing a data purge framework, we reduced the problem to the straightforward use case of deleting a user’s data from a platform that uses AWS for its data pipeline. The following diagram illustrates this use case.

We’re introducing the idea of building and maintaining an index metastore that keeps track of the location of each user’s records and allows us to locate them efficiently, reducing the search space.

You can use the following architecture diagram to delete a specific user’s data within your organization’s AWS data lake.

For this initial version, we created three user flows that map each task to a fitting AWS service:

Flow 1: Real-time metastore update

The S3 ObjectCreated or ObjectRemoved events trigger an AWS Lambda function that parses the object and performs an add/update/delete operation to keep the metadata index up to date. You can implement a simple workflow for any other storage layer, such as Amazon Relational Database Service (RDS), Amazon Aurora, or Amazon Elasticsearch Service (ES). We use Amazon DynamoDB and Amazon RDS for PostgreSQL as the index metadata storage options, but our approach is flexible to any other technology.

Flow 2: Purge data

When a user asks for their data to be deleted, we trigger an AWS Step Functions state machine through Amazon CloudWatch to orchestrate the workflow. Its first step triggers a Lambda function that queries the metadata index to identify the storage layers that contain user records and generates a report that’s saved to an S3 report bucket. A Step Functions activity is created and picked up by a Node.js-based Lambda worker that sends an email to the approver through Amazon Simple Email Service (SES) with approve and reject links.

The following diagram shows a graphical representation of the Step Functions state machine as seen on the AWS Management Console.

The approver selects one of the two links, which then calls an Amazon API Gateway endpoint that invokes Step Functions to resume the workflow. If you choose the approve link, Step Functions triggers a Lambda function that takes the report stored in the bucket as input, deletes the objects or records from the storage layer, and updates the index metastore. When the purging job is complete, Amazon Simple Notification Service (SNS) sends a success or fail email to the user.

The following diagram represents the Step Functions flow on the console if the purge flow completed successfully.

For the complete code base, see step-function-definition.json in the GitHub repo.

Flow 3: Batch metastore update

This flow refers to the use case of an existing data lake for which an index metastore needs to be created. You can orchestrate the flow through AWS Step Functions, which takes historical data as input and updates the metastore through a batch job. Our current implementation doesn’t include a sample script for this user flow.

Our framework

We now walk you through the two use cases we followed for our implementation:

  • You have multiple user records stored in each Amazon S3 file
  • A user has records stored in heterogeneous AWS storage layers

Within these two approaches, we demonstrate alternatives that you can use to store your index metastore.

Indexing by S3 URI and row number

For this use case, we use a Free Tier Amazon RDS for PostgreSQL instance to store our index. We created a simple table with the following code:

CREATE UNLOGGED TABLE IF NOT EXISTS user_objects (
				userid TEXT,
				s3path TEXT,
				recordline INTEGER
			);

You can create an index on userid to optimize query performance. On object upload, for each record you need to insert into the user_objects table a row that indicates the user ID, the URI of the target Amazon S3 object, and the row number that corresponds to the record. For instance, consider uploading the following JSON input:

{"user_id":"V34qejxNsCbcgD8C0HVk-Q","body":"…"}
{"user_id":"ofKDkJKXSKZXu5xJNGiiBQ","body":"…"}
{"user_id":"UgMW8bLE0QMJDCkQ1Ax5Mg","body ":"…"}

When this file lands at the Amazon S3 location s3://gdpr-demo/year=2018/month=2/day=26/input.json, we insert the following tuples into user_objects:

("V34qejxNsCbcgD8C0HVk-Q", "s3://gdpr-demo/year=2018/month=2/day=26/input.json", 0)
("ofKDkJKXSKZXu5xJNGiiBQ", "s3://gdpr-demo/year=2018/month=2/day=26/input.json", 1)
("UgMW8bLE0QMJDCkQ1Ax5Mg", "s3://gdpr-demo/year=2018/month=2/day=26/input.json", 2)

You can implement the index update operation by using a Lambda function triggered on any Amazon S3 ObjectCreated event.

When we get a delete request from a user, we need to query our index to get some information about where we have stored the data to delete. See the following code:

SELECT s3path,
       ARRAY_AGG(recordline)
FROM user_objects
WHERE userid = 'V34qejxNsCbcgD8C0HVk-Q'
GROUP BY s3path;

The preceding example SQL query returns rows like the following:

("s3://gdpr-review/year=2015/month=12/day=21/review-part-0.json", {2102,529})

The output indicates that lines 529 and 2102 of S3 object s3://gdpr-review/year=2015/month=12/day=21/review-part-0.json contain the requested user’s data and need to be purged. We then need to download the object, remove those rows, and overwrite the object. For a Python implementation of the Lambda function that implements this functionality, see deleteUserRecords.py in the GitHub repo.

Having the record line available allows you to perform the deletion efficiently in byte format. For implementation simplicity, we purge the rows by replacing the deleted rows with an empty JSON object. You pay a slight storage overhead, but you don’t need to update subsequent row metadata in your index, which would be costly. To eliminate empty JSON objects, we can implement an offline vacuum and index update process.

Indexing by file name and grouping by index key

For this use case, we created a DynamoDB table to store our index. We chose DynamoDB because of its ease of use and scalability; you can use its on-demand pricing model so you don’t need to guess how many capacity units you might need. When files are uploaded to the data lake, a Lambda function parses the file name (for example, 1001-.csv) to identify the user identifier and populates the DynamoDB metadata table. userid is the partition key, and each storage layer has its own attribute. For example, if user 1001 had data in Amazon S3 and Amazon RDS, their record looks like the following code:

{"userid:": 1001, "s3":{"s3://path1", "s3://path2"}, "RDS":{"db1.table1.column1"}}

For a sample Python implementation of this functionality, see update-dynamo-metadata.py in the GitHub repo.

On delete request, we query the metastore table, which is DynamoDB, and generate a purge report that contains details on what storage layers contain user records, and storage layer specifics that can speed up locating the records. We store the purge report to Amazon S3. For a sample Lambda function that implements this logic, see generate-purge-report.py in the GitHub repo.
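
For reference, a purge request against this index boils down to a single key lookup. The following AWS CLI sketch assumes a hypothetical table name of user_metadata and the numeric userid partition key from the example above:

# Fetch the index entry that lists every storage layer holding data for user 1001.
aws dynamodb query \
  --table-name user_metadata \
  --key-condition-expression "userid = :u" \
  --expression-attribute-values '{":u": {"N": "1001"}}'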

After the purging is approved, we use the report as input to delete the required resources. For a sample Lambda function implementation, see gdpr-purge-data.py in the GitHub repo.

Implementation and technology alternatives

We explored and evaluated multiple implementation options, all of which present tradeoffs, such as implementation simplicity, efficiency, critical data compliance, and feature completeness:

  • Scan every record of the data file to create an index – Whenever a file is uploaded, we iterate through its records and generate tuples (userid, s3Uri, row_number) that are then inserted to our metadata storing layer. On delete request, we fetch the metadata records for requested user IDs, download the corresponding S3 objects, perform the delete in place, and re-upload the updated objects, overwriting the existing object. This is the most flexible approach because it supports a single object to store multiple users’ data, which is a very common practice. The flexibility comes at a cost because it requires downloading and re-uploading the object, which introduces a network bottleneck in delete operations. User activity datasets such as customer product reviews are a good fit for this approach, because it’s unexpected to have multiple records for the same user within each partition (such as a date partition), and it’s preferable to combine multiple users’ activity in a single file. It’s similar to what was described in the section “Indexing by S3 URI and row number” and sample code is available in the GitHub repo.
  • Store metadata as file name prefix – Adding the user ID as the prefix of the uploaded object under the different partitions that are defined based on query pattern enables you to reduce the required search operations on delete request. The metadata handling utility finds the user ID from the file name and maintains the index accordingly. This approach is efficient in locating the resources to purge but assumes a single user per object, and requires you to store user IDs within the filename, which might require InfoSec considerations. Clickstream data, where you would expect to have multiple click events for a single customer on a single date partition during a session, is a good fit. We covered this approach in the section “Indexing by file name and grouping by index key” and you can download the codebase from the GitHub repo.
  • Use a metadata file – Along with uploading a new object, we also upload a metadata file that’s picked up by an indexing utility to create and maintain the index up to date. On delete request, we query the index, which points us to the records to purge. A good fit for this approach is a use case that already involves uploading a metadata file whenever a new object is uploaded, such as uploading multimedia data, along with their metadata. Otherwise, uploading a metadata file on every object upload might introduce too much of an overhead.
  • Use the tagging feature of AWS services – Whenever a new file is uploaded to Amazon S3, we use the Put Object Tagging Amazon S3 operation to add a key-value pair for the user identifier. Whenever there is a user data delete request, we fetch objects with that tag and delete them. This option is straightforward to implement using the existing Amazon S3 API and can therefore be a good initial version of your implementation (a minimal CLI sketch follows this list). However, it involves significant limitations: it assumes a 1:1 cardinality between Amazon S3 objects and users (each object only contains data for a single user), searching objects based on a tag is limited and inefficient, and storing user identifiers as tags might not be compliant with your organization’s InfoSec policy.
  • Use Apache Hudi – Apache Hudi is becoming a very popular option for performing record-level data deletion on Amazon S3. Its current version is restricted to Amazon EMR, and you can use it if you are building your data lake from scratch, because you need to store your data as Hudi datasets. Hudi is a very active project, and additional features and integrations with more AWS services are expected.
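
To illustrate the tagging option mentioned above, here is a minimal AWS CLI sketch that tags an object with the user identifier it contains and reads the tag back later; the bucket and key reuse the earlier example paths.

# Tag an existing object with the user identifier it contains.
aws s3api put-object-tagging \
  --bucket gdpr-demo \
  --key year=2018/month=2/day=26/input.json \
  --tagging 'TagSet=[{Key=userid,Value=V34qejxNsCbcgD8C0HVk-Q}]'

# Read the tags back when processing a delete request.
aws s3api get-object-tagging \
  --bucket gdpr-demo \
  --key year=2018/month=2/day=26/input.json

Note that there is no server-side “search by tag” for S3 objects, which is why this approach is described as limited: you still have to enumerate objects and inspect their tags to find a user’s data.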

The key implementation decision of our approach is separating the storage layer we use for our data and the one we use for our metadata. As a result, our design is versatile and can be plugged in any existing data pipeline. Similar to deciding what storage layer to use for your data, there are many factors to consider when deciding how to store your index:

  • Concurrency of requests – If you don’t expect too many simultaneous inserts, even something as simple as Amazon S3 could be a starting point for your index. However, if you get multiple concurrent writes for multiple users, you need to look into a service that copes better with transactions.
  • Existing team knowledge and infrastructure – In this post, we demonstrated using DynamoDB and RDS Postgres for storing and querying the metadata index. If your team has no experience with either of those but is comfortable with Amazon ES, Amazon DocumentDB (with MongoDB compatibility), or any other storage layer, use those. Furthermore, if you’re already running (and paying for) a MySQL database that’s not used to capacity, you could use that for your index for no additional cost.
  • Size of index – The volume of your metadata is orders of magnitude lower than your actual data. However, if your dataset grows significantly, you might need to consider going for a scalable, distributed storage solution rather than, for instance, a relational database management system.

Conclusion

GDPR has transformed best practices and introduced several extra technical challenges in designing and implementing a data lake. The reference architecture and scripts in this post may help you delete data in a manner that’s compliant with GDPR.

Let us know your feedback in the comments and how you implemented this solution in your organization, so that others can learn from it.

 


About the Authors

George Komninos is a Data Lab Solutions Architect at AWS. He helps customers convert their ideas to a production-ready data product. Before AWS, he spent 3 years at Alexa Information domain as a data engineer. Outside of work, George is a football fan and supports the greatest team in the world, Olympiacos Piraeus.


Sakti Mishra is a Data Lab Solutions Architect at AWS. He helps customers architect data analytics solutions, which gives them an accelerated path towards modernization initiatives. Outside of work, Sakti enjoys learning new technologies, watching movies, and travel.

New EC2 T4g Instances – Burstable Performance Powered by AWS Graviton2 – Try Them for Free

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-t4g-instances-burstable-performance-powered-by-aws-graviton2/

Two years ago Amazon Elastic Compute Cloud (EC2) T3 instances were first made available, offering a very cost effective way to run general purpose workloads. While current T3 instances offer sufficient compute performance for many use cases, many customers have told us that they have additional workloads that would benefit from increased peak performance and lower cost.

Today, we are launching T4g instances, a new generation of low-cost burstable instances powered by AWS Graviton2, a processor custom built by AWS using 64-bit Arm Neoverse cores. Using T4g instances, you can enjoy a performance benefit of up to 40% at a 20% lower cost in comparison to T3 instances, providing the best price/performance for a broader spectrum of workloads.

T4g instances are designed for applications that don’t use CPU at full power most of the time, using the same credit model as T3 instances with unlimited mode enabled by default. Examples of production workloads that require high CPU performance only during times of heavy data processing are web/application servers, small/medium data stores, and many microservices. Compared to previous generations, the performance of T4g instances makes it possible to migrate additional workloads such as caching servers, search engine indexing, and e-commerce platforms.

T4g instances are available in 7 sizes, providing up to 5 Gbps of network bandwidth and up to 2.7 Gbps of Amazon Elastic Block Store (EBS) bandwidth:

Name | vCPUs | Baseline Performance/vCPU | CPU Credits Earned/Hour | Memory
t4g.nano | 2 | 5% | 6 | 0.5 GiB
t4g.micro | 2 | 10% | 12 | 1 GiB
t4g.small | 2 | 20% | 24 | 2 GiB
t4g.medium | 2 | 20% | 24 | 4 GiB
t4g.large | 2 | 30% | 36 | 8 GiB
t4g.xlarge | 4 | 40% | 96 | 16 GiB
t4g.2xlarge | 8 | 40% | 192 | 32 GiB

Free Trial
To make it easier to develop, test, and run your applications on T4g instances, all AWS customers are automatically enrolled in a free trial of the t4g.micro size. From September 2020 until December 31, 2020, you can run a t4g.micro instance and automatically get 750 free hours per month deducted from your bill, including any CPU credits used during those free 750 hours. The 750 hours are calculated in aggregate across all Regions. For details on the terms and conditions of the free trial, please refer to the EC2 FAQs.

During the free trial, have a look at this getting started guide on using the Arm-based AWS Graviton processors. There, you can find suggestions on how to build and optimize your applications, using different programming languages and operating systems, and on managing container-based workloads. Some of the tips are specific for the Graviton processor, but most of the content works generally for anyone using Arm to run their code.

Using T4g Instances
You can start an EC2 instance in different ways, for example using the EC2 console, the AWS Command Line Interface (CLI), AWS SDKs, or AWS CloudFormation. For my first T4g instance, I use the AWS CLI:

$ aws ec2 run-instances \
  --instance-type t4g.micro \
  --image-id ami-09a67037138f86e67 \
  --security-groups MySecurityGroup \
  --key-name my-key-pair

The Amazon Machine Image (AMI) I am using is based on Amazon Linux 2. Other platforms are available, such as Ubuntu 18.04 or newer, Red Hat Enterprise Linux 8.0 and newer, and SUSE Enterprise Server 15 and newer. You can find additional AMIs in the AWS Marketplace, for example Fedora, Debian, NetBSD, CentOS, and NGINX Plus. For containerized applications, Amazon ECS and Amazon Elastic Kubernetes Service optimized AMIs are available as well.

The security group I selected gives me SSH access to the instance. I connect to the instance and do a general update:

$ sudo yum update -y

Since the kernel has been updated, I reboot the instance.

I’d like to set up this instance as a development environment. I can use it to build new applications, or to recompile my existing apps to the 64-bit Arm architecture. To install most development tools, such as Git, GCC, and Make, I use this group of packages:

$ sudo yum groupinstall -y "Development Tools"

AWS is working with several open source communities to drive improvements to the performance of software stacks running on AWS Graviton2. For example, you can see our contributions to PHP for Arm64 in this post.

Using the latest versions helps you obtain maximum performance from your Graviton2-based instances. The amazon-linux-extras command enables new versions for some of my favorite programming environments:

$ sudo amazon-linux-extras enable golang1.11 corretto8 php7.4 python3.8 ruby2.6

The output of the amazon-linux-extras command tells me which packages to install with yum:

$ yum clean metadata
$ sudo yum install -y golang java-1.8.0-amazon-corretto \
  php-cli php-pdo php-fpm php-json php-mysqlnd \
  python38 ruby ruby-irb rubygem-rake rubygem-json rubygems

Let’s check the versions of the tools that I just installed:

$ go version
go version go1.13.14 linux/arm64
$ java -version
openjdk version "1.8.0_265"
OpenJDK Runtime Environment Corretto-8.265.01.1 (build 1.8.0_265-b01)
OpenJDK 64-Bit Server VM Corretto-8.265.01.1 (build 25.265-b01, mixed mode)
$ php -v
PHP 7.4.9 (cli) (built: Aug 21 2020 21:45:13) ( NTS )
Copyright (c) The PHP Group
Zend Engine v3.4.0, Copyright (c) Zend Technologies
$ python3.8 -V
Python 3.8.5
$ ruby -v
ruby 2.6.3p62 (2019-04-16 revision 67580) [aarch64-linux]

It looks like I am ready to go! Many more packages are available with yum, such as MariaDB and PostgreSQL. If you’re interested in databases, you might also want to try the preview of Amazon RDS powered by AWS Graviton2 processors.

Available Now
T4g instances are available today in US East (N. Virginia, Ohio), US West (Oregon), Asia Pacific (Tokyo, Mumbai), and Europe (Frankfurt, Ireland).

You now have a broad choice of Graviton2-based instances to better optimize your workloads for cost and performance: low cost burstable general-purpose (T4g), general purpose (M6g), compute optimized (C6g) and memory optimized (R6g) instances. Local NVMe-based SSD storage options are also available.

You can use the free trial to develop new applications, or migrate your existing workloads to the AWS Graviton2 processor. Let me know how that goes!

Danilo

How to configure an LDAPS endpoint for Simple AD

Post Syndicated from Marco Sommella original https://aws.amazon.com/blogs/security/how-to-configure-ldaps-endpoint-for-simple-ad/

In this blog post, we show you how to configure an LDAPS (LDAP over SSL or TLS) encrypted endpoint for Simple AD so that you can extend Simple AD over untrusted networks. Our solution uses a Network Load Balancer (NLB) for SSL/TLS termination. The data is then decrypted and sent to Simple AD. Network Load Balancer offers integrated certificate management, SSL/TLS termination, and the ability to use a scalable Amazon Elastic Compute Cloud (Amazon EC2) backend to process decrypted traffic. Network Load Balancer also tightly integrates with Amazon Route 53, enabling you to use a custom domain for the LDAPS endpoint. To simplify testing and deployment, we have provided an AWS CloudFormation template to provision the NLB.

Simple AD, which is powered by Samba 4, supports basic Active Directory (AD) authentication features such as users, groups, and the ability to join domains. Simple AD also includes an integrated Lightweight Directory Access Protocol (LDAP) server. LDAP is a standard application protocol for accessing and managing directory information. You can use the BIND operation from Simple AD to authenticate LDAP client sessions. This makes LDAP a common choice for centralized authentication and authorization for services such as Secure Shell (SSH), client-based virtual private networks (VPNs), and many other applications. Authentication, the process of confirming the identity of a principal, typically involves the transmission of highly sensitive information such as user names and passwords. To protect this information in transit over untrusted networks, companies often require encryption as part of their information security strategy.

This post assumes that you understand concepts such as Amazon Virtual Private Cloud (Amazon VPC) and its components, including subnets, routing, internet and network address translation (NAT) gateways, DNS, and security groups. If needed, you should familiarize yourself with these concepts and review the solution overview and prerequisites in the next section before proceeding with the deployment.

Note: This solution is intended for use by clients who require only an LDAPS endpoint. If your requirements extend beyond this, you should consider accessing the Simple AD servers directly or by using AWS Directory Service for Microsoft AD.

Solution overview

The following description explains the Simple AD LDAPS environment. The AWS CloudFormation template creates the Network Load Balancer and its related resources.

  1. The LDAP client sends an LDAPS request to the NLB on TCP port 636.
  2. The NLB terminates the SSL/TLS session and decrypts the traffic using a certificate. The NLB sends the decrypted LDAP traffic to Simple AD on TCP port 389.
  3. The Simple AD servers send an LDAP response to the NLB. The NLB encrypts the response and sends it to the client.

The following diagram illustrates how the solution works and shows the prerequisites (listed in the following section).

Figure 1: LDAPS with Simple AD Architecture


Note: Amazon VPC prevents third parties from intercepting traffic within the VPC. Because of this, the VPC protects the decrypted traffic between the NLB and Simple AD. The NLB encryption provides an additional layer of security for client connections and protects traffic coming from hosts outside the VPC.

Prerequisites

  1. Our approach requires an Amazon VPC with one public and two private subnets. If you don’t have an Amazon VPC that meets that requirement, use the following instructions to set up a sample environment:
    1. Identify an AWS Region that supports Simple AD and network load balancing.
    2. Identify two Availability Zones in that Region to use with Simple AD. The Availability Zones are needed as parameters in the AWS CloudFormation template used later in this process.
    3. Create or choose an Amazon VPC in the region you chose.
    4. Enable DNS support within your VPC so you can use Route 53 to resolve the LDAPS endpoint.
    5. Create two private subnets, one per Availability Zone. The Simple AD servers use the subnets that you create.
    6. Create a public subnet in the same VPC.
    7. The LDAP service requires a DNS domain that resolves within your VPC and from your LDAP clients. If you don’t have an existing DNS domain, create a private hosted zone and associate it with your VPC. To avoid encryption protocol errors, you must ensure that the DNS domain name is consistent across your Route 53 zone and in the SSL/TLS certificate.
  2. Make sure you’ve completed the Simple AD prerequisites.
  3. You can use a certificate issued by your preferred certificate authority or a certificate issued by AWS Certificate Manager (ACM). If you don’t have a certificate authority, you can create a self-signed certificate by following the instructions in section 2 (Create a certificate).

Note: To prevent unauthorized direct connections to your Simple AD servers, you can modify the Simple AD security group on port 389 to block traffic from locations outside of the Simple AD VPC. You can find the security group in the Amazon EC2 console by creating a search filter for your Simple AD directory ID. It is also important to allow the Simple AD servers to communicate with each other as shown on Simple AD Prerequisites.

Solution deployment

This solution includes five main parts:

  1. Create a Simple AD directory.
  2. (Optional) Create an SSL/TLS certificate, if you don’t already have one.
  3. Create the NLB by using the supplied AWS CloudFormation template.
  4. Create a Route 53 record.
  5. Test LDAPS access using an Amazon Linux 2 client.

1. Create a Simple AD directory

With the prerequisites completed, your first step is to create a Simple AD directory in your private VPC subnets.

To create a Simple AD directory:

  1. In the Directory Service console navigation pane, choose Directories and then choose Set up directory.
  2. Choose Simple AD.

    Figure 2: Select directory type


  3. Provide the following information:
    1. Directory Size: The size of the directory. The options are Small or Large. Which you should choose depends on the anticipated size of your directory.
    2. Directory DNS: The fully qualified domain name (FQDN) of the directory, such as corp.example.com.

      Note: You will need the directory FQDN when you test your solution.

    3. NetBIOS name: The short name for the directory, such as corp.
    4. Administrator password: The password for the directory administrator. The directory creation process creates an administrator account with the user name Administrator and this password. Don’t lose this password, because it can’t be recovered. You also need this password for testing LDAPS access in a later step.
    5. Description: An optional description for the directory.
    Figure 3: Directory information


  4. Select the VPC and subnets, and then choose Next:
    • VPC: Use the dropdown list to select the VPC to install the directory in.
    • Subnets: Use the dropdown lists to select two private subnets for the directory servers. The two subnets must be in different Availability Zones. Make a note of the VPC and subnet IDs to use as input parameters for the AWS CloudFormation template. In the following example, the subnets are in the us-east-1a and us-east-1c Availability Zones.
    Figure 4: Choose VPC and subnets


  5. Review the directory information and make any necessary changes. When the information is correct, choose Create directory.

    Figure 5: Review and create the directory


  6. It takes several minutes to create the directory. From the AWS Directory Service console, refresh the screen periodically and wait until the directory Status value changes to Active before continuing.
  7. When the status has changed to Active, choose your Simple AD directory and note the two IP addresses in the DNS address section. You will enter them in a later step when you run the AWS CloudFormation template.

Note: How to administer your Simple AD implementation is out of scope for this post. See the documentation to add users, groups, or instances to your directory. Also see the previous blog post, How to Manage Identities in Simple AD Directories.

2. Add a certificate

Now that you have a Simple AD directory, you need an SSL/TLS certificate. The certificate will be used with the NLB to secure the LDAPS endpoint. You then import the certificate into ACM, which is integrated with the NLB.

As mentioned earlier, you can use a certificate issued by your preferred certificate authority or a certificate issued by AWS Certificate Manager (ACM).

(Optional) Create a self-signed certificate

If you don’t already have a certificate authority, you can use the following instructions to generate a self-signed certificate using OpenSSL.

Note: OpenSSL is a standard, open source library that supports a wide range of cryptographic functions, including the creation and signing of x509 certificates.

Use the command line interface to create a certificate:

  1. You must have a system with OpenSSL installed to complete this step. If you don’t have OpenSSL, you can install it on Amazon Linux by running the command sudo yum install openssl. If you don’t have access to an Amazon Linux instance you can create one with SSH access enabled to proceed with this step. Use the command line to run the command openssl version to see if you already have OpenSSL installed.
    $ openssl version
    OpenSSL 1.0.1k-fips 8 Jan 2015
    

  2. Create a private key using the openssl genrsa command.
    $ openssl genrsa 2048 > privatekey.pem
    Generating RSA private key, 2048 bit long modulus
    ......................................................................................................................................................................+++
    ..........................+++
    e is 65537 (0x10001)
    

  3. Generate a certificate signing request (CSR) using the openssl req command. Provide the requested information for each field. The Common Name is the FQDN for your LDAPS endpoint (for example, ldap.corp.example.com). The Common Name must use the domain name you will later register in Route 53. You will encounter certificate errors if the names do not match.
    $ openssl req -new -key privatekey.pem -out server.csr
    You are about to be asked to enter information that will be incorporated into your certificate request.
    

  4. Use the openssl x509 command to sign the certificate. The following example uses the private key from the previous step (privatekey.pem) and the signing request (server.csr) to create a public certificate named server.crt that is valid for 365 days. This certificate must be updated within 365 days to avoid disruption of LDAPS functionality.
    $ openssl x509 -req -sha256 -days 365 -in server.csr -signkey privatekey.pem -out server.crt
    Signature ok
    subject=/C=XX/L=Default City/O=Default Company Ltd/CN=ldap.corp.example.com
    Getting Private key
    

  5. You should see three files: privatekey.pem, server.crt, and server.csr.
    $ ls
    privatekey.pem server.crt server.csr
    

  6. Restrict access to the private key.
    $ chmod 600 privatekey.pem
    

Note: Keep the private key and public certificate to use later. You can discard the signing request, because you are using a self-signed certificate and not using a certificate authority. Always store the private key in a secure location, and avoid adding it to your source code.

Import a certificate

For this step, you can either use a certificate obtained from a certificate authority, or a self-signed certificate that you created using the optional procedure above.

  1. In the ACM console, choose Import a certificate.
  2. Using a Linux text editor, paste the contents of your certificate file (called server.crt if you followed the procedure above) in the Certificate body box.
  3. Using a Linux text editor, paste the contents of your privatekey.pem file in the Certificate private key box. (For a self-signed certificate, you can leave the Certificate chain box blank.)
  4. Choose Review and import. Confirm the information and choose Import.
  5. Take note of the Amazon Resource Name (ARN) of the imported certificate.
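
If you prefer the AWS CLI to the console, you can import the same certificate with a command like the following. This is a minimal sketch: the file names match the ones created in the optional procedure above, and the Region is an assumption you should adjust for your environment.

# Import the self-signed certificate and private key into ACM
aws acm import-certificate \
    --certificate fileb://server.crt \
    --private-key fileb://privatekey.pem \
    --region us-east-1

The command returns the certificate ARN that you need in the next section.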

3. Create the NLB by using the supplied AWS CloudFormation template

Now that you have a Simple AD directory and SSL/TLS certificate, you’re ready to use the AWS CloudFormation template to create the NLB.

Create the NLB:

  1. Load the AWS CloudFormation template to deploy an internal NLB. After you load the template, provide the input parameters from the following table:

    Input parameter: Description
    VPCId: The target VPC for this solution. Must be the VPC where you deployed Simple AD and available in your Simple AD directory details page.
    SubnetId1: The Simple AD primary subnet. This information is available in your Simple AD directory details page.
    SubnetId2: The Simple AD secondary subnet. This information is available in your Simple AD directory details page.
    SimpleADPriIP: The primary Simple AD server IP. This information is available in your Simple AD directory details page.
    SimpleADSecIP: The secondary Simple AD server IP. This information is available in your Simple AD directory details page.
    LDAPSCertificateARN: The Amazon Resource Name (ARN) for the SSL/TLS certificate. This information is available in the ACM console.
  2. Enter the input parameters and choose Next.
  3. On the Options page, accept the defaults and choose Next.
  4. On the Review page, confirm the details and choose Create. The stack will be created in approximately 5 minutes.
  5. Wait until the AWS CloudFormation stack status is CREATE_COMPLETE before starting the next procedure, Create a Route 53 record.
  6. Go to Outputs and note the FQDN of your new NLB. The FQDN is in the output variable named LDAPSURL.

    Note: You can find the parameters of your Simple AD on the directory details page by choosing your Simple AD in the Directory Service console.
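
You can also launch the stack from the AWS CLI. The sketch below assumes you saved the template locally as nlb-template.yaml; the parameter values are placeholders that you should replace with the values from your Simple AD directory details page and the ACM console.

# Create the NLB stack with the input parameters from the table above
aws cloudformation create-stack \
    --stack-name simplead-ldaps-nlb \
    --template-body file://nlb-template.yaml \
    --parameters \
        ParameterKey=VPCId,ParameterValue=vpc-0123456789abcdef0 \
        ParameterKey=SubnetId1,ParameterValue=subnet-11111111 \
        ParameterKey=SubnetId2,ParameterValue=subnet-22222222 \
        ParameterKey=SimpleADPriIP,ParameterValue=10.0.1.10 \
        ParameterKey=SimpleADSecIP,ParameterValue=10.0.2.10 \
        ParameterKey=LDAPSCertificateARN,ParameterValue=arn:aws:acm:us-east-1:111122223333:certificate/example

# Wait for CREATE_COMPLETE, then read the LDAPSURL output
aws cloudformation wait stack-create-complete --stack-name simplead-ldaps-nlb
aws cloudformation describe-stacks --stack-name simplead-ldaps-nlb --query "Stacks[0].Outputs"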

4. Create a Route 53 record

The next step is to create a Route 53 record in your private hosted zone so that clients can resolve your LDAPS endpoint.

Note: Don’t start this procedure until the AWS CloudFormation stack status is CREATE_COMPLETE.

Create a Route 53 record:

  1. If you don’t have an existing DNS domain for use with LDAP, create a private hosted zone and associate it with your VPC. The hosted zone name should be consistent with your Simple AD (for example, corp.example.com).
  2. When the AWS CloudFormation stack is in CREATE_COMPLETE status, locate the value of the LDAPSURL on the Outputs tab of the stack. Copy this value for use in the next step.
  3. On the Route 53 console, choose Hosted Zones and then choose the zone you used for the Common Name value for your self-signed certificate. Choose Create Record Set and enter the following information:
    1. Name: A short name for the record set (remember that the FQDN has to match the Common Name of your certificate).
    2. Type: Leave as A – IPv4 address.
    3. Alias: Select Yes.
    4. Alias Target: Paste the value of the LDAPSURL from the Outputs tab of the stack.
  4. Leave the defaults for Routing Policy and Evaluate Target Health, and choose Create.
Figure 6: Create a Route 53 record
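
If you script this step instead of using the console, the equivalent CLI calls look roughly like the following. The load balancer name, hosted zone ID, and record name are placeholders; an alias record needs both the NLB DNS name and its canonical hosted zone ID.

# Look up the NLB created by the stack (the name is a placeholder)
aws elbv2 describe-load-balancers --names simplead-ldaps-nlb \
    --query "LoadBalancers[0].[DNSName,CanonicalHostedZoneId]"

# Create the alias A record in the private hosted zone
cat > change-batch.json <<'EOF'
{
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "ldap.corp.example.com",
      "Type": "A",
      "AliasTarget": {
        "HostedZoneId": "NLB_CANONICAL_HOSTED_ZONE_ID",
        "DNSName": "NLB_DNS_NAME",
        "EvaluateTargetHealth": false
      }
    }
  }]
}
EOF
aws route53 change-resource-record-sets \
    --hosted-zone-id ZPRIVATEZONEID \
    --change-batch file://change-batch.json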

5. Test LDAPS access using an Amazon Linux 2 client

At this point, you’re ready to test your LDAPS endpoint from an Amazon Linux client.

Test LDAPS access:

  1. Create an Amazon Linux 2 instance with SSH access enabled to test the solution. Launch the instance on one of the public subnets in your VPC. Make sure the IP assigned to the instance is in the trusted IP range you specified in the security group associated with the Simple AD.
  2. Use SSH to sign in to the instance and complete the following steps to verify access.
    1. Install the openldap-clients package and any required dependencies:
      sudo yum install -y openldap-clients
      

    2. Add the server.crt file to the /etc/openldap/certs/ directory so that the LDAPS client will trust your SSL/TLS certificate. You can download the certificate directly from the NLB and save it in the proper format, copy the file using Secure Copy, or create it using a text editor:
      openssl s_client -connect <LDAPSURL>:636 -showcerts </dev/null 2>/dev/null | openssl x509 -outform PEM > server.crt 
      

      Replace <LDAPSURL> with the FQDN of your NLB; you can find this value in the Outputs section of the stack created in CloudFormation.

    3. Edit the /etc/openldap/ldap.conf file to define the following configuration parameters:
      • BASE: The Simple AD directory name.
      • URI: Your DNS alias.
      • TLS_CACERT: The path to your public certificate.
      • TLSCACertificateFile: The path to your self-signed certificate authority. If you used the instructions in section 2 (Add a certificate) to create a certificate, the path will be /etc/ssl/certs/ca-bundle.crt.

      Here’s an example of the file:

      BASE dc=corp,dc=example,dc=com
      URI ldaps://ldap.corp.example.com
      TLS_CACERT /etc/openldap/certs/server.crt
      TLSCACertificateFile /etc/ssl/certs/ca-bundle.crt
      

  3. To test the solution, query the directory through the LDAPS endpoint, as shown in the following command. Replace corp.example.com with your domain name and use the Administrator password that you configured in step 3 of section 1 (Create a Simple AD directory).
    $ ldapsearch -D "Administrator@corp.example.com" -W sAMAccountName=Administrator
    

  4. The response will include the directory information in LDAP Data Interchange Format (LDIF) for the administrator distinguished name (DN) from your Simple AD LDAP server.
    # extended LDIF
    #
    # LDAPv3
    # base <dc=corp,dc=example,dc=com> (default) with scope subtree
    # filter: sAMAccountName=Administrator
    # requesting: ALL
    #
    
    # Administrator, Users, corp.example.com
    dn: CN=Administrator,CN=Users,DC=corp,DC=example,DC=com
    objectClass: top
    objectClass: person
    objectClass: organizationalPerson
    objectClass: user
    description: Built-in account for administering the computer/domain
    instanceType: 4
    whenCreated: 20170721123204.0Z
    uSNCreated: 3223
    name: Administrator
    objectGUID:: l3h0HIiKO0a/ShL4yVK/vw==
    userAccountControl: 512
    …
    

You can now use the LDAPS endpoint for directory operations and authentication within your environment. For example, you can verify an authenticated bind with ldapwhoami, as shown in the sketch below.
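
ldapwhoami is part of the openldap-clients package installed earlier; it performs a bind against the endpoint and prints the identity it bound as. The bind DN and domain below are the example values used earlier; substitute your own.

# Verify an LDAPS bind using the example DNS alias and account
ldapwhoami -H ldaps://ldap.corp.example.com -D "Administrator@corp.example.com" -W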

Troubleshooting

If the ldapsearch command returns something like the following error, there are a few things you can do to help identify issues.

ldap_sasl_bind(SIMPLE): Can't contact LDAP server (-1)
  1. You might be able to obtain additional error details by adding the -d1 debug flag to the ldapsearch command.
    $ ldapsearch -D "Administrator@corp.example.com" -W sAMAccountName=Administrator -d1
    

  2. Verify that the parameters in ldap.conf match your configured LDAPS URI endpoint and that all parameters can be resolved by DNS. You can use the following dig command, substituting your configured endpoint DNS name.
    $ dig ldap.corp.example.com
    

  3. Confirm that the client instance you're connecting from is in the trusted IP range you specified in the security group associated with your Simple AD directory.
  4. Confirm that the path to your public SSL/TLS certificate in ldap.conf as TLS_CACERT is correct. You configured this as part of step 2 in section 5 (Test LDAPS access using an Amazon Linux 2 client). You can check your SSL/TLS connection with the following command, replacing ldap.corp.example.com with the DNS name of your endpoint.
    $ echo -n | openssl s_client -connect ldap.corp.example.com:636
    

  5. Verify that the status of your Simple AD IPs is Healthy in the Amazon EC2 console.
    1. Open the EC2 console and choose Load Balancing and then Target Groups in the navigation pane.
    2. Choose your LDAPS target and then choose Targets.

Conclusion

You can use NLB to provide an LDAPS endpoint for Simple AD and securely transport sensitive authentication information over untrusted networks. You can explore using LDAPS to authenticate SSH users or integrate with other software solutions that support LDAP authentication. The AWS CloudFormation template for this solution is available on GitHub.

If you have comments about this post, submit them in the Comments section below. If you have questions about or issues implementing this solution, start a new thread on the AWS Directory Service forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Marco Sommella

Marco is a Cloud Support Engineer II in the Windows Team based in Dublin. He is a Subject Matter Expert on Directory Service and EC2 Windows. Marco has over 10 years of experience as a Windows and Linux system administrator and is passionate about automation and coding. He is actively involved in the public AWS Systems Manager Automations released by AWS Support and Amazon EC2.

Cameron Worrell

Cameron is a Solutions Architect with a passion for security and enterprise transformation. He joined AWS in 2015.

Field Notes: Deploying UiPath RPA Software on AWS

Post Syndicated from Yuchen Lin original https://aws.amazon.com/blogs/architecture/field-notes-deploying-uipath-rpa-software-on-aws/

Running UiPath RPA software on AWS leverages the elasticity of the AWS Cloud to set up, operate, and scale robotic process automation. It provides cost-efficient and resizable capacity, and scales the robots to meet your business workload. This reduces the need for administration tasks, such as hardware provisioning, environment setup, and backups. It frees you to focus on business process optimization by automating more processes.

This blog post guides you in deploying UiPath robotic processing automation (RPA) software on AWS. RPA software uses the user interface to capture data and manipulate applications just like humans do. It runs as a software robot to interpret, and trigger responses, as well as communicate with other systems to perform a variety of repetitive tasks.

UiPath Enterprise RPA Platform provides the full automation lifecycle, including discover, build, manage, run, engage, and measure, with different products. This blog post focuses on the Platform's core products: build with UiPath Studio, manage with UiPath Orchestrator, and run with UiPath Robots.

About UiPath software

UiPath Enterprise RPA Platform’s core products are:

UiPath Studio and UiPath Robot are individual products; you can deploy each on a standalone machine.

UiPath Orchestrator contains Web Server, SQL Server, and Indexer Server (Elasticsearch) components. You can use a Single Machine deployment or a Multi-Node deployment, depending on your workload capacity and availability requirements.

For information on UiPath platform offerings, review UiPath platform products.

UiPath on AWS

You can deploy all UiPath products on AWS.

  • UiPath Studio is needed for automation design jobs and runs on a single machine. You deploy it with Amazon EC2.
  • UiPath Robots are needed for automation tasks; each Robot runs on a single machine and scales with the business workload. You deploy them with Amazon EC2 and scale with Amazon EC2 Auto Scaling.
  • UiPath Orchestrator is needed for automation administration jobs and contains three logical components that run on multiple machines. You deploy Web Server with Amazon EC2, SQL Server with Amazon RDS, and Indexer Server with Amazon Elasticsearch Service. For Multi-Node deployment, you deploy High Availability Add-On with Amazon EC2.

The architecture of UiPath Enterprise RPA Platform on AWS looks like the following diagram:

Figure 1 – UiPath Enterprise RPA Platform on AWS

By deploying the UiPath Enterprise RPA Platform on AWS, you can set up, operate, and scale workloads. This helps control the infrastructure cost of meeting process automation demand.

Prerequisites

For this walkthrough, you should have the following prerequisites:

  • An AWS account
  • AWS resources
  • UiPath Enterprise RPA Platform software
  • Basic knowledge of Amazon EC2, EC2 Auto Scaling, Amazon RDS, Amazon Elasticsearch Service.
  • Basic knowledge of setting up Windows Server, IIS, SQL Server, and Elasticsearch.
  • Basic knowledge of Redis Enterprise to set up the High Availability Add-on.
  • Basic knowledge of UiPath Studio, UiPath Robot, and UiPath Orchestrator.

Deployment Steps

Deploy UiPath Studio
UiPath Studio deploys on a single machine. Amazon EC2 instances provide secure and resizable compute capacity in the cloud, and the ability to launch applications when needed without upfront commitments.

  1. Download the UiPath Enterprise RPA Platform. UiPath Studio is integrated in the installation package.
  2. Launch an EC2 instance with a Windows OS-based Amazon Machine Image (AMI) that meets the UiPath Studio hardware requirements and software requirements.
  3. Install the UiPath Studio software. For UiPath Studio installation steps, review the UiPath Studio Guide.

Optionally, you can save the installation and pre-configuration work completed for UiPath Studio as a custom Amazon Machine Image (AMI). Then, you can launch more UiPath Studio instances from this AMI. For details, visit Launch an EC2 instance from a custom AMI tutorial.
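
As a sketch, once the custom AMI exists, launching another Studio machine from the CLI is a single call. All of the IDs below are placeholders for your own AMI, subnet, security group, and key pair.

# Launch a UiPath Studio instance from the custom AMI
aws ec2 run-instances \
    --image-id ami-0123456789abcdef0 \
    --instance-type m5.large \
    --subnet-id subnet-0123456789abcdef0 \
    --security-group-ids sg-0123456789abcdef0 \
    --key-name my-uipath-keypair \
    --count 1 \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=uipath-studio}]'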

UiPath Robot Deployment

Each UiPath Robot deploys on a single machine with Amazon EC2. Amazon EC2 Auto Scaling helps you add or remove Robots to meet automation workload changes in demand.

  1. Download the UiPath Enterprise RPA Platform. The UiPath Robot is integrated in the installation package.
  2. Launch an EC2 instance with a Windows OS based Amazon Machine Image (AMI) that meets the UiPath Robot hardware requirements and software requirements.
  3. Install the business application (Microsoft Office, SAP, etc.) required for your business processes. Alternatively, select the business application AMI from the AWS Marketplace.
  4. Install the UiPath Robot software. For UiPath Robot installation steps, review Installing the Robot.

Optionally, you can save the installation and pre-configuration work completed for UiPath Robot as a custom Amazon Machine Image (AMI). Then you can create a launch template with the instance configuration information, create Auto Scaling groups from that launch template, and scale the Robots.

Scale the Robots’ Capacity

Amazon EC2 Auto Scaling groups help you use scaling policies to scale compute capacity based on resource use. By monitoring the process queue and creating a customized scaling policy, the UiPath Robot can automatically scale based on the workload. For details, review Scaling the size of your Auto Scaling group.
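
A target tracking policy on the Auto Scaling group is the simplest starting point. The sketch below scales on average CPU utilization; the group name and target value are assumptions, and a customized metric specification pointing at your own queue-depth metric would follow the same shape.

# Target tracking configuration (keep average CPU around 60%)
cat > robot-scaling.json <<'EOF'
{
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "ASGAverageCPUUtilization"
  },
  "TargetValue": 60.0
}
EOF

# Attach the policy to the Robots' Auto Scaling group
aws autoscaling put-scaling-policy \
    --auto-scaling-group-name uipath-robots \
    --policy-name robot-cpu-target-tracking \
    --policy-type TargetTrackingScaling \
    --target-tracking-configuration file://robot-scaling.json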

Use the Robot Logs

UiPath Robot generates multiple diagnostic and execution logs. Amazon CloudWatch provides log collection, storage, and analysis, and enables complete visibility of the Robots and automation tasks. For CloudWatch agent setup on the Robot, review Quick Start: Enable Your Amazon EC2 Instances Running Windows Server to Send logs to CloudWatch Logs.

Monitor the Automation Jobs

UiPath Robot uses the user interface to capture data and manipulate applications. When UiPath Robot runs, it is important to capture processing screens for troubleshooting and auditing. This screen capture activity can be integrated into the process by using UiPath Studio.

Amazon S3 provides cost-effective storage for retaining all Robot logs and processing screen captures. Amazon S3 Object Lifecycle Management automates the transition between different storage classes, and helps you manage the screenshots so that they are stored cost effectively throughout their lifecycle. For lifecycle policy creation, review How Do I Create a Lifecycle Policy for an S3 Bucket?.
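
As a sketch, a lifecycle configuration that moves screenshots to a colder storage class after 30 days and expires them after a year can be applied from the CLI. The bucket name and prefix are placeholders.

# Lifecycle rule for Robot screenshots
cat > lifecycle.json <<'EOF'
{
  "Rules": [{
    "ID": "archive-robot-screenshots",
    "Filter": { "Prefix": "screenshots/" },
    "Status": "Enabled",
    "Transitions": [{ "Days": 30, "StorageClass": "STANDARD_IA" }],
    "Expiration": { "Days": 365 }
  }]
}
EOF

aws s3api put-bucket-lifecycle-configuration \
    --bucket my-uipath-robot-logs \
    --lifecycle-configuration file://lifecycle.json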

UiPath Orchestrator Deployment

Deployment Components
UiPath Orchestrator Server Platform has many logical components, grouped in three layers:

  • presentation layer
  • web service layer
  • persistence layer

The presentation layer and web service layer are built into one ASP.NET website. The persistence layer contains SQL Server and Elasticsearch. There are three deployment components to be set up:

  • web application
  • SQL Server
  • Elasticsearch

The Web Server, SQL Server, and Elasticsearch Server require separate environments. Review the hardware requirements and software requirements for more details.

Note: Set up the Web Server, SQL Server, and Elasticsearch Server environments before running the UiPath Enterprise Platform installation wizard.

Set up Web Server with Amazon EC2

UiPath Orchestrator Web Server deploys on Windows Server with IIS 7.5 or later. For details, review the software requirements.

AWS provides various AMIs for Windows Server that can help you set up the environment required for the Web Server.

The Microsoft Windows Server 2019 Base AMI includes most prerequisites for installation except some features of Web Server (IIS) to be enabled. For configuration steps, review Server Roles and Features.

The Web Server should be placed in the correct subnet (public or private) and have a proper security group (allowing HTTPS access) according to your business requirements. Review Allow user to connect EC2 on HTTP or HTTPS.

Set up SQL Server with Amazon RDS

Amazon Relational Database Service (Amazon RDS) provides you with a managed database service. With a few clicks, you can set up, operate, and scale a relational database in the AWS Cloud.

Amazon RDS supports the SQL Server engine. For UiPath Orchestrator, both Standard Edition and Enterprise Edition are supported. For details, review the software requirements.

Amazon RDS can be set up in multiple Availability Zones to meet requirements for high availability.

UiPath Orchestrator can connect to the created Amazon RDS database with SQL Server Authentication.

Set up Elasticsearch Server with Amazon Elasticsearch Service (Amazon ES)

Amazon ES is a fully managed service for you to deploy, secure, and operate Elasticsearch at scale with generally zero downtime.

Elasticsearch Service provides a managed ELK stack, with no upfront costs or usage requirements, and without the operational overhead.

All messages logged by UiPath Robots are sent through the Logging REST endpoint to the Indexer Server where they are indexed for future utilization.

Install UiPath Orchestrator on the Web Server

After Web Server, SQL Server, Elasticsearch Server environment are ready, download the UiPath Enterprise RPA Platform, and install it on the Web Server.

The UiPath Enterprise Platform installation wizard guides you in configuring and setting up each environment, including connecting to SQL Server and configuring the Elasticsearch API URL.

After you complete setup, the UiPath Orchestrator Portal is available for you to visit and manage processes, jobs, and robots.

The UiPath Orchestrator dashboard appears like in the following screenshot:

Figure 2 – UiPath Orchestrator Portal

Set up Orchestrator High Availability Architecture

One Orchestrator can handle many robots in a typical configuration, but any product running on a single server is vulnerable to failure if something happens to that server.

The High Availability add-on (HAA) enables you to add a second Orchestrator server to your environment that is generally fully synchronized with the first server.

To set up multi-node deployment, launch Amazon EC2 instances with a Linux OS-based Amazon Machine Image (AMI) that meets the HAA hardware and software requirements. Follow the installation guide to set up HAA.

Elastic Load Balancing automatically distributes incoming application traffic across multiple targets. Network Load Balancer should be set up to allow Robots to communicate with multi-node Orchestrators.

Cleaning up

To avoid incurring future charges, delete all the resources.

Conclusion

In this post, I showed you how to deploy the UiPath Enterprise RPA Platform on AWS to further optimize and automate your business processes. AWS managed services like Amazon EC2, Amazon RDS, and Amazon Elasticsearch Service help you set up the environment with high availability. This reduces the maintenance effort of backend services, as well as the effort of scaling Orchestrator capabilities. Amazon EC2 Auto Scaling helps you add or remove robots to meet automation workload changes in demand.

To learn more about how to integrate UiPath with AWS services, check out The UiPath and AWS partnership.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Seamlessly Join a Linux Instance to AWS Directory Service for Microsoft Active Directory

Post Syndicated from Martin Beeby original https://aws.amazon.com/blogs/aws/seamlessly-join-a-linux-instance-to-aws-directory-service-for-microsoft-active-directory/

Many customers I speak to use Active Directory to manage centralized user authentication and authorization for a variety of applications and services. For these customers, Active Directory is a critical piece of their IT jigsaw.

At AWS, we offer the AWS Directory Service for Microsoft Active Directory that provides our customers with a highly available and resilient Active Directory service that is built on actual Microsoft Active Directory. AWS manages the infrastructure required to run Active Directory and handles all of the patching and software updates needed. It’s fully managed, so for example, if a domain controller fails, our monitoring will automatically detect and replace that failed controller.

Manually connecting a machine to Active Directory is a thankless task; you have to connect to the computer, make a series of manual changes, and then perform a reboot. While none of this is particularly challenging, it does take time, and if you have several machines that you want to onboard, then this task quickly becomes a time sink.

Today the team is unveiling a new feature which enables a Linux EC2 instance, as it is launched, to connect to AWS Directory Service for Microsoft Active Directory seamlessly. This complements the existing feature that allows Windows EC2 instances to seamlessly domain join as they are launched. This capability enables customers to move faster and improves the experience for administrators.

Now you can have both your Windows and Linux EC2 instances seamlessly connect to AWS Directory Service for Microsoft Active Directory. The directory can be in your own account or shared with you from another account, the only caveat being that both the instance and the directory must be in the same region.

To show you how the process works, let’s take an existing AWS Directory Service for Microsoft Active Directory and work through the steps required to have a Linux EC2 instance seamlessly join that directory.

Create and Store AD Credentials
To seamlessly join a Linux machine to my AWS Managed Active Directory Domain, I will need an account that has permissions to join instances into the domain. While members of the AWS Delegated Administrators have sufficient privileges to join machines to the domain, I have created a service account that has the minimum privileges required. Our documentation explains how you go about creating this sort of service account.

The seamless domain join feature needs to know the credentials of my active directory service account. To achieve this, I need to create a secret using AWS Secrets Manager with specifically named secret keys, which the seamless domain feature will use to join instances to the directory.

In the AWS Secrets Manager console I click on the Store a new secret button. On the next screen, when asked to Select a secret type, I choose the option named Other type of secrets. I can now add two secret key/values. The first is called awsSeamlessDomainUsername, and in the value textbox, I enter the username for my Active Directory service account. The second key is called awsSeamlessDomainPassword, and here I enter the password for my service account.

Since this is a demo, I chose to use the DefaultEncryptionKey for the secret, but you might decide to use your own key.

After clicking next, I am asked to give the secret a name. I add the following name, replacing d-xxxxxxxxx with my directory ID.

aws/directory-services/d-xxxxxxxxx/seamless-domain-join

The domain join will fail if you mistype this name or if you have any leading or trailing spaces.

I note down the Secret ARN, as I will need it when I create my IAM policy.
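
The same secret can also be created from the CLI. This is a sketch only: the secret name and key names come from the steps above, the credentials are placeholders, and the default encryption key is used unless you pass --kms-key-id. Replace d-xxxxxxxxx with your directory ID.

# Store the seamless domain join credentials in Secrets Manager
aws secretsmanager create-secret \
    --name aws/directory-services/d-xxxxxxxxx/seamless-domain-join \
    --secret-string '{"awsSeamlessDomainUsername":"svc-domain-join","awsSeamlessDomainPassword":"REPLACE_ME"}'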

Create The Required IAM Policy and Role
Now I need to create an IAM policy that gives permission to read my seamless-domain-join secret.

I sign in to the IAM console and choose Policies. In the content pane, I select Create policy. I switch over to the JSON tab and copy the text from the following JSON policy document, replacing the Secrets Manager ARN with the one I noted down earlier.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetSecretValue",
                "secretsmanager:DescribeSecret"
            ],
            "Resource": [
                "arn:aws:secretsmanager:us-east-1:############:secret:aws/directory-service/d-xxxxxxxxxx/seamless-domain-join-example"
            ]
        }
    ]
}

On the Review page, I name the policy SeamlessDomainJoin-Secret-Readonly then choose Create policy to save my work.

Now I need to create an IAM Role that will use this policy (and a few others). In the IAM Console, I choose Roles, and then in the content pane, choose Create role. Under Select type of trusted entity, I select AWS service, select EC2 as the use case, and click Next:Permissions.


I attach the following policies to my Role: AmazonSSMManagedInstanceCore, AmazonSSMDirectoryServiceAccess, and SeamlessDomainJoin-Secret-Readonly.

I click through to the Review screen, where it asks for a Role name. I call the role EC2DomainJoin, but you can call it whatever you like. I then create the role by pressing the button at the bottom right of the screen.

Create an Amazon Machine Image
When I launch a Linux Instance later I will need to pick a Linux Amazon Machine Image (AMI) as a template. Currently, the default Linux AMIs do not contain the version of AWS Systems Manager agent (SSM agent) that this new seamless domain feature needs. Therefore I am going to have to create an AMI with an updated SSM agent. To do this, I first create a new Linux Instance in my account and then connect to it using my SSH client. I then follow the documentation to update the SSM agent to 2.3.1644.0 or newer. Once the instance has finished updating I am then able to create a new AMI based on this instance using the following documentation.

I now have a new AMI which I can use in the next step. In the future, the base AMIs will be updated to use the newer SSM agent, and then we can skip this section. If you are interested in knowing which version of the SSM agent an instance is using, this documentation explains how you can check.

Seamless Join
To start, I need to create a Linux instance, and so I head over to the EC2 console and choose Launch Instance.

Next, I pick a Linux Amazon Machine Image (AMI). I select the AMI which I created earlier.

When configuring the instance, I am careful to choose the Amazon Virtual Private Cloud that contains my directory. Using the drop-down labeled Domain join directory I am able to select the directory that I want this instance to join.

In the IAM role, I select the EC2DomainJoin role that I created earlier.

When I launch this instance, it will seamlessly join my directory. Once the instance comes online, I can confirm everything is working correctly by using SSH to connect to the instance using the administrator credentials of my AWS Directory Service for Microsoft Active Directory.

This new feature is available from today, and we look forward to hearing your feedback about this new capability.

Happy Joining

— Martin

Folding@home infectious disease research with Spot Instances

Post Syndicated from Ben Peven original https://aws.amazon.com/blogs/compute/foldinghome-infectious-disease-research-with-spot-instances/

This post was contributed by Jarman Hauser, Jessie Xie, and Kinnar Kumar Sen.

Folding@home (FAH) is a distributed computing project that uses computational modeling to simulate protein structure, stability, and shape (how it folds). These simulations help to advance drug discoveries and cures for diseases linked to protein dynamics within human cells. The FAH software crowdsources its distributed compute platform allowing anyone to contribute by donating unused computational resources from personal computers, laptops, and cloud servers.

Overview

In this post, I walk through deploying EC2 Spot Instances optimized for the latest Folding@home client software. I describe how to be flexible across a combination of GPU-optimized Amazon EC2 Spot Instances configured in an EC2 Auto Scaling group. The Auto Scaling group handles launching and maintaining a desired capacity, and automatically requests resources to replace any that are interrupted or manually shut down.

Spot Instances are spare EC2 capacity available at up to a 90% discount compared to On-Demand Instance prices. The only difference between On-Demand Instances and Spot Instances is that Spot Instances can be interrupted by EC2 with two minutes of notification when EC2 needs the capacity back. This makes Spot Instances a great fit for stateless, fault-tolerant workloads like big data, containers, batch processing, AI/ML training, CI/CD and test/dev. For more information, see Amazon EC2 Spot Instances.

In addition to being flexible across instance types, another best practice for using Spot Instances effectively is to select the appropriate allocation strategy. Allocation strategies in EC2 Auto Scaling help you automatically provision capacity according to your workload requirements. We recommend using the capacity-optimized strategy to automatically provision instances from the most-available Spot Instance pools by looking at real-time capacity data. Because your Spot Instance capacity is sourced from pools with optimal capacity, this decreases the possibility that your Spot Instances are reclaimed. For more information about allocation strategies, see Spot Instances in the EC2 Auto Scaling user guide and configuring Spot capacity optimization in this user guide.

What you’ll build

Amazon CloudWatch instance metrics and logs for real-time monitoring of the protein folding progress.

Folding@home and CFn template architecture

Image 1.1 – Architectural diagram of deploying the AWS CloudFormation template

To complete the setup, you must have an AWS account with permissions to the listed resources above. When you sign up for AWS, your AWS account is automatically signed up for all services in AWS, including Amazon EC2. If you don’t have an AWS account, find more info about creating an account here.

Costs and licensing

The AWS CloudFormation (CFn) template includes customizable configuration parameters. Some of these settings, such as instance type, affect the cost of deployment. For cost estimates, see the pricing pages for each AWS service you are using. Prices are subject to change. You are responsible for the cost of the AWS services used. There is no additional cost for using the CFn template.

Note: There is no additional charge to use Deep Learning AMIs; you pay only for the AWS resources while they're running. Folding@home client software is free, open-source software that is distributed under the Folding@home EULA.

Tip: After you deploy the AWS CloudFormation template, we recommend that you enable AWS Cost Explorer. Cost Explorer is an easy-to-use interface that lets you visualize, understand, and manage your AWS costs and usage; for example, you can break down costs to show hourly costs for your protein folding project.

How to deploy

Part one: Download and configure the CFn template

The first thing you must do is download the template, and then make a few edits to it.

Once downloaded, open the template file in your favorite text editor to make a few edits to the configuration before deploying.

In the User Information section, you have the option to create a unique user name, join or create a new team, or contribute anonymously. For this example, I leave the values set to default and contribute as an anonymous user on the default team. More details about teams and leaderboards can be found here and details about PASSKEYs here.

<!-- User Information -->
<team v="Default"/>
<user v="0"/> 
<passkey v="PASSKEY_HERE"/>

Once edited and saved to a location you can easily find later, in the next section you’ll learn how to upload the template in the AWS CloudFormation console.

Part two: Launching the stack

Next, log into the AWS Management Console, choose the Region you want to run the solution in, then navigate to AWS CloudFormation to launch the template.

In the AWS CloudFormation console, click on Create stack. Upload the template we just configured and click on Next to specify stack details.

Create stack

Enter a stack name and adjust the capacity parameters as needed. In this example, I set desiredCapacity and minSize to 2 to handle protein folding jobs assigned to the client, and maxSize to 12. Setting maxSize to 12 ensures you have capacity for larger jobs that get assigned. These parameters can be adjusted based on your desired capacity.

specify Stack details

If breaking out usage and cost data is required, you can optionally add additional configurations like tags, permissions, stack policies, rollback options, and more advanced options in the next stack configuration step. Click Next to review, and then create the stack.

Under the Events tab, you can see the status of the AWS resources being created. When the status is CREATE_COMPLETE (approx. 3–5 minutes), the environment with Folding@home is installed and ready. Once the stack is created, the GPU instances will begin protein simulation.

Check status of deployed AWS resources

Results

The AWS CloudFormation template creates a log group "fahlog" that each of the instances sends log data to. This allows you to visualize the protein folding progress in near real time via the Amazon CloudWatch console. To see the log data, navigate over to the Resources tab and click on the cloudWatchLogGroup link for 'fahlog'. Alternatively, you can navigate to the Amazon CloudWatch console and choose 'fahlog' under log groups. Note: Sometimes it takes a bit of time for Folding@home Work Units (WUs) to be downloaded to the instances and to allocate all the available GPUs.

fahlog log group

In the CloudWatch console, check out the Insights feature in the left navigation menu to see analytics for your protein folding logs. Select ‘fahlog’ in the search box and run the default query that is provided for you in the query editor window to see your protein folding results.

fahlog insights
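
If you prefer the CLI, you can run the same kind of Logs Insights query against fahlog. The filter pattern below is an assumption based on the client's progress messages (lines containing "Completed"); adjust it to match whatever appears in your own logs.

# Start a Logs Insights query over the last hour of fahlog data
QUERY_ID=$(aws logs start-query \
    --log-group-name fahlog \
    --start-time $(date -d '1 hour ago' +%s) \
    --end-time $(date +%s) \
    --query-string 'fields @timestamp, @message | filter @message like /Completed/ | sort @timestamp desc | limit 20' \
    --query queryId --output text)

# Fetch the results once the query has finished
aws logs get-query-results --query-id "$QUERY_ID"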

Another thing you can do is create a dashboard in the CloudWatch console that automatically refreshes based on the time intervals you set. Under Dashboards in the left navigation bar, I was able to quickly create a few widgets to visualize CPU utilization, network in/out, and protein folding completed steps. This is a nifty tool that, with a little more time, you could use to configure more detailed metrics like cost per fold and GPU monitoring.

CloudWatch metrics

Part three: Clean up

You can let this run as long as you want to contribute to this project. When you're ready to stop, AWS CloudFormation gives you the option to delete the stack and the resources it created. On the AWS CloudFormation console, select the stack, and select delete. When you delete a stack, you delete the stack and all of its resources.

Conclusion

In this post, I shared how to launch a cluster of EC2 GPU-optimized Spot Instances to aid in Folding@home's protein dynamics research that could lead to therapeutics for infectious diseases. I leveraged Spot best practices by being flexible with instance selections across multiple families, sizes, and Availability Zones, and by choosing the capacity-optimized allocation strategy to ensure our cluster scales optimally and securely. Now you are ready to donate compute capacity with Spot Instances to aid disease research efforts on Folding@home.

About Folding@home

Folding@home is currently based at the Washington University School of Medicine in St. Louis, under the directorship of Dr. Greg Bowman. The project was started by the Pande Laboratory at Stanford University, under the direction of Dr. Vijay Pande, who led the project until 2019.[4] Since 2019, Folding@home has been led by Dr. Greg Bowman of Washington University in St. Louis, a former student of Dr. Pande, in close collaboration with Dr. John Chodera of MSKCC and Vince Voelz of Temple University.[5]

With heightened interest in the project, Folding@home has grown to a community of 2M+ users, bringing together the compute power of over 600K GPUs and 1.6M CPUs.

This outpouring of support has made Folding@home one of the world's fastest computing systems, achieving speeds of approximately 1.2 exaFLOPS (or 2.3 x86 exaFLOPS) by April 9, 2020, making it the world's first exaFLOP computing system. Folding@home's COVID-19 effort specifically focuses on better understanding how the viral protein's moving parts enable it to infect a human host, evade an immune response, and create new copies of the virus. The project is leveraging this insight to help design new therapeutic antibodies and small molecules that might prevent infection. They are engaged with a number of experimental collaborators to quickly iterate between computational design and experimental testing.

Quickly build STIG-compliant Amazon Machine Images using Amazon EC2 Image Builder

Post Syndicated from Sepehr Samiei original https://aws.amazon.com/blogs/security/quickly-build-stig-compliant-amazon-machine-images-using-amazon-ec2-image-builder/

In this post, we discuss how to implement the operating system security requirements defined by the Defense Information Systems Agency (DISA) Security Technical Implementation Guides (STIGs).

As an Amazon Web Services (AWS) customer, you can use Amazon Machine Images (AMIs) published by AWS or APN partners. These AMIs, which are owned and published by AWS, are pre-configured based on a variety of standards to help you quickly get started with your deployments while helping you follow your required compliance guidelines. For example, you can use AMIs that have been preconfigured for you per STIG standards. You can also use Amazon Elastic Compute Cloud (Amazon EC2) Image Builder to automate configuration of any custom images imported from an on-premises system.

Organizations of all sizes are moving more and more of their workloads to AWS. In most enterprises and organizations, often starting with an AMI with a known configuration is the best way to address the organization’s security requirements for operating system configuration. You can take advantage of the tools available in AWS to ensure this is a consistent and repeatable process.

If you want to use your own custom AMI, you can follow the steps in this post to see how to build a golden Windows operating system image that follows STIG compliance guidelines using Amazon EC2 Image Builder.

Image Builder

We understand that keeping server images hardened and up to date can be time consuming, resource intensive, and subject to human error if performed manually. Currently, customers either manually build the automation scripts to implement STIG security measures to harden the server image, or procure, run, and maintain tools to automate the process to harden the golden image.

Image Builder significantly reduces the effort of keeping images STIG-compliant and updated by providing a simple graphical interface, built-in automation to match the STIG requirements, and AWS-provided security settings. With Image Builder, there are no manual steps needed to update an image, nor do you have to build your own automation pipeline.

Customers can use Image Builder to build an operating system image for use with Amazon EC2, as well as on-premises systems. It simplifies the creation, maintenance, validation, sharing, and deployment of Linux and Windows Server images. This blog post discusses how to build a Windows Server golden image.

Image Builder is provided at no cost to customers and is available in all commercial AWS regions. You’re charged only for the underlying AWS resources that are used to create, store, and share the images.

What is a STIG?

STIGs are the configuration standards submitted by OS or software vendors to DISA for approval. Once approved, the configuration standards are used to configure security hardened information systems and software. STIGs contain technical guidance to help secure information systems or software that might otherwise be vulnerable to a malicious attack.

DISA develops and maintains STIGs and defines the vulnerability Severity Category Codes (CAT) which are referred to as CAT I, II, and III.

Severity category code: DISA category code guidelines
CAT I: Any vulnerability, the exploitation of which will directly and immediately result in loss of confidentiality, availability, or integrity.
CAT II: Any vulnerability, the exploitation of which has a potential to result in loss of confidentiality, availability, or integrity.
CAT III: Any vulnerability, the existence of which degrades measures to protect against loss of confidentiality, availability, or integrity.

For a complete list of STIGs, see Windows 2019, 2016, and 2012. How to View SRGs and STIGs provides instructions for viewing the lists.

Image Builder STIG components

To make your systems compliant with STIG standards, you must install, configure, and test a variety of security settings. Image Builder provides STIG components that you can leverage to quickly build STIG-compliant images on standalone servers by applying local Group Policies. The STIG components of Image Builder scan for misconfigurations and run a remediation script. Image Builder defines the STIG components as low, medium, and high, which align with DISA CAT III, CAT II, and CAT I respectively (with some exceptions as outlined in Windows STIG Components).
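
You can see which STIG components are available to your account, and their current versions, from the CLI. This sketch filters the AWS-owned component list by name and assumes the components follow the stig-build-* naming used at the time of writing.

# List AWS-managed STIG components
aws imagebuilder list-components --owner Amazon \
    --query "componentVersionList[?contains(name, 'stig')].[name, version, arn]" \
    --output table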

Building a golden Windows Server image using STIG-compliance guidelines

Image Builder can be used with the AWS Management Console, AWS CLI, or APIs to create images in your AWS account. In this example, we use the AWS console, which provides a step-by-step wizard covering the four steps to build a golden image that follows STIG compliance guidelines. A graphical representation outlining the process is provided below, followed by a description of the steps to build the image.
 

Figure 1: Image Builder Process

Step 1: Define the source image

The first step is to define the base OS image to use as the foundation layer of the golden image. You can select an existing image that you own, an image owned by Amazon, or an image shared with you.

Define image recipe

Open the console and search for the Image Builder service. Under EC2 Image Builder, select Recipe in the left pane. Select the Create recipe button in the top right corner. Enter a Name, Version, and Description for your recipe, as shown in Figure 2.
 

Figure 2: Name and describe the image recipe

Select source image

Select a source image for your golden image.

  1. Under Source image, select Windows as the image operating system.
  2. For this example, choose Select managed images. A managed image is an Image-Builder-managed image created by you, shared with you, or provided by AWS.
  3. Select Browse images to choose from available images. In the screenshot below, I’ve selected a Windows Server 2016 image provided by AWS.

 

Figure 3: Select source image

Step 2: Build components

You can create your own components using scripts to add or remove software or define the OS configuration, along with the required answer files, scripts, and settings from registered repositories and Amazon Simple Storage Service (Amazon S3) buckets. AWS provides pre-defined components for regular updates as well as security settings, for example STIG, Amazon Inspector, and more.

Select Browse build components and then select the STIG component that has the latest version or the one that meets your requirements. You can choose more than one component to perform the desired changes to your golden image as shown in the screenshot below.
 

Figure 4: Select build components

Step 3: Select tests

You can define your own tests based on the level of compliance required for your specific workload. You can also use AWS-provided tests to validate images before deployment. At the time of writing this blog, AWS-provided tests do not include pre-canned tests to validate STIG configuration. For Windows, custom tests are written in PowerShell. In the screenshot below, I've added an AWS-provided test to validate Windows activation.
 

Figure 5: Select tests

Once done, select Create Recipe.

Step 4: Create pipeline and distribute images

The last step triggers creation of the golden image and distributes the output AMI to selected AWS Regions and AWS accounts.

Create pipeline

  1. Select the recipe that we just created and select Create pipeline from this recipe from the Actions menu in the upper right corner.
     
    Figure 6: Select create pipeline from Actions menu

  2. Enter a pipeline Name and Description. For the IAM role, you can use the dropdown menu to select an existing IAM role. The best practice is to use an IAM role with the least privileges necessary for the task.
     
    Figure 7: Pipeline details

    If you don't want to use an existing IAM role, select Create new instance profile role and refer to the user guide to create a role. In the screenshot below, I've defined a custom policy called ImageBuilder-S3Logs for Amazon S3 to perform limited operations. You can use an AWS managed policy to grant write access to S3 or customize the policy to fit your organization's security requirements. If you choose to customize the policy, the instance profile specified in your infrastructure configuration must have s3:PutObject permission for the target bucket. A sample Amazon S3 policy that grants write access to the imagebuilderlog bucket is provided below for your convenience. Please change the bucket name if you are going to use the sample policy.
     

    Figure 8: IAM Policy for SSM

    
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject"
                ],
                "Resource": [
                    "arn:aws:s3:::imagebuilderlog/*",
                    "arn:aws:s3:::imagebuilderlog"
                ]
            }
        ]
    }
    

  3. Build a schedule to define the frequency at which the pipeline produces new images with the specific customization defined in steps 1 through 3. You can either choose to run it manually or define a schedule using the schedule builder or a CRON expression in the Create Pipeline wizard.
  4. Infrastructure Settings will allow Image Builder to launch an Amazon EC2 instance to customize the image. This is an optional step; however, it's recommended to configure the infrastructure settings. If you don't provide an entry, AWS chooses service-specific defaults. Infrastructure settings allow you to specify the infrastructure within which to build and test your image. You can specify the instance type, subnet, and security group to associate with the instance that Image Builder uses to capture and build the image.

    Image Builder requires communication with AWS Systems Manager (SSM) Service endpoint to capture and build the image. The communication can happen over public internet or using a VPC endpoint. In both cases, the Security Group must allow SSM Agent running on the instance to talk to Systems Manager. In this example, I’ve used SSM endpoint for the Image Builder instance to communicate with Systems Manager. This article provides details on how to configure endpoints and security group to allow SSM Agent communication with Systems Manager.
     

    Figure 9: Optional infrastructure settings

Distribute image

Image distribution is configured in the Additional Settings section of the wizard which also provides options to associate license configuration using AWS License Manager and assign a name and tags to the output AMI.

To distribute the image to another AWS Region, choose the target Region from the drop-down menu. The current Region is included by default. In addition, you can add AWS user account numbers to set launch permission for the output AMI. Launch permissions allow specific AWS user account(s) to launch the image in the current Region as well as other Regions.
 

Figure 10: Image distribution settings

Optionally, you can leverage AWS License Manager to track the use of licenses and assist in preventing a licensing breach. You can do so by associating the license configuration with the image. License configuration can be defined in License Manager. Finally, define the output AMI details.

Select Review to review the pipeline configuration, and then select Create Pipeline to create the pipeline.
 

Figure 11: Review pipeline configuration

Once the pipeline is created you can create the golden image instantly by selecting Run Pipeline under Actions. You could also configure a schedule to create a new golden image at regular time intervals. Scheduling the pipeline allows Image Builder to automate your ongoing image maintenance process, for example, monthly patch updates.

Optionally, you can select an Amazon Simple Notification Service (Amazon SNS) topic in the configuration section of the pipeline. This way you can get automated notification alerts on the progress of your build pipeline. These notifications enable you to build further automation into your operations. For example, you could trigger automatic redeployment of an application using the most recent golden image.
 

Figure 12: Run pipeline

Summary

In this post, we showed you how to build a custom Windows Server golden image that you can leverage to follow STIG compliance guidelines using Amazon EC2 Image Builder. We used the EC2 Image Builder console to define the source OS image, define software and STIG components, configure test cases, create and schedule an image build pipeline, and distribute the image to AWS accounts and Regions. Alternatively, you can leverage AMIs published by AWS or APN partners to help meet your STIG compliance standards. More details on AWS-published AMIs can be found in this link.

Image Builder is a powerful tool that is offered at no cost, other than the cost of the underlying AWS resources used to create, store, and share the images. It automates tasks associated with the creation and maintenance of security hardened server images. In addition, it offers pre-configured components for Windows and Linux that customers can leverage to meet STIG compliance requirements.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon EC2 forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Sepehr Samiei

Sepehr is a Senior Microsoft Tech Specialized Solutions Architect at AWS. He started his professional career as a .Net developer, which continued for more than 10 years. Early on, he quickly became a fan of cloud computing and he loves to help customers use the power of Microsoft tech on AWS. His wife and daughter are the most precious parts of his life.

Garry Singh

Garry Singh is a solutions architect at AWS. He provides guidance and technical assistance to help customers achieve the best outcomes for Microsoft workloads on AWS.

TensorFlow Serving on Kubernetes with Amazon EC2 Spot Instances

Post Syndicated from Ben Peven original https://aws.amazon.com/blogs/compute/tensorflow-serving-on-kubernetes-spot-instances/

This post is contributed by Kinnar Sen – Sr. Specialist Solutions Architect, EC2 Spot

TensorFlow (TF) is a popular choice for machine learning research and application development. It's a machine learning (ML) platform, which is used to build (train) and deploy (serve) machine learning models. TF Serving is a part of the TF framework and is used for deploying ML models in production environments. TF Serving can be containerized using Docker and deployed in a cluster with Kubernetes. It is easy to run production-grade workloads on Kubernetes using Amazon Elastic Kubernetes Service (Amazon EKS), a managed service for creating and managing Kubernetes clusters. To cost-optimize TF Serving workloads, you can use Amazon EC2 Spot Instances. Spot Instances are spare EC2 capacity available at up to a 90% discount compared to On-Demand Instance prices.

In this post, I illustrate the deployment of TensorFlow Serving using Kubernetes via Amazon EKS and Spot Instances to build a scalable, resilient, and cost-optimized machine learning inference service.

Overview

About TensorFlow Serving (TF Serving)

TensorFlow Serving is the recommended way to serve TensorFlow models. A flexible, high-performance system for serving models, TF Serving enables users to quickly deploy models to production environments. It provides out-of-the-box integration with TF models and can be extended to serve other kinds of models and data. TF Serving deploys a model server with gRPC/REST endpoints and can be used to serve multiple models (or versions). Requests can be served in two ways: individually or in batches. Batching is often used to unlock the high throughput of hardware accelerators (if used for inference) for offline high-volume inference jobs.
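
Outside of Kubernetes, the quickest way to see TF Serving's REST endpoint in action is the official Docker image. The model path and model name below are placeholders for a SavedModel you have exported, and the input shape in the request depends on your model.

# Serve a SavedModel over REST on port 8501 (8500 is the gRPC port)
docker run -p 8501:8501 \
    --mount type=bind,source=/path/to/saved_model_dir,target=/models/mymodel \
    -e MODEL_NAME=mymodel -t tensorflow/serving

# Send a prediction request to the REST endpoint
curl -d '{"instances": [[1.0, 2.0, 5.0]]}' \
    -X POST http://localhost:8501/v1/models/mymodel:predict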

Amazon EC2 Spot Instances

Spot Instances are spare Amazon EC2 capacity that enables customers to save up to 90% over On-Demand Instance prices. The price of Spot Instances is determined by long-term trends in supply and demand of spare capacity pools. Capacity pools can be defined as a group of EC2 instances belonging to particular instance family, size, and Availability Zone (AZ). If EC2 needs capacity back for On-Demand usage, Spot Instances can be interrupted by EC2 with a two-minute notification. There are many graceful ways to handle the interruption to ensure that the application is well architected for resilience and fault tolerance. This can be automated via the application and/or infrastructure deployments. Spot Instances are ideal for stateless, fault tolerant, loosely coupled and flexible workloads that can handle interruptions.

TensorFlow Serving (TF Serving) and Kubernetes

Each pod in the Kubernetes cluster runs a TF Docker image with a TF Serving-based server and a model. The model contains the architecture of the TensorFlow graph, the model weights, and assets. This is a Deployment setup with a configurable number of replicas. The replicas are exposed externally by a Service and an External Load Balancer that distributes requests to the service endpoints. To keep up with the demands of the service, Kubernetes can scale the number of replicated pods using the Kubernetes Replication Controller.

Architecture

There are a few goals that we want to achieve through this solution.

  • Cost optimization – By using EC2 Spot Instances
  • High throughput – By using Application Load Balancer (ALB) created by Ingress Controller
  • Resilience – Ensuring high availability by replenishing nodes and gracefully handling the Spot interruptions
  • Elasticity – By using Horizontal Pod Autoscaler, Cluster Autoscaler, and EC2 Auto Scaling

This can be achieved by using the following components.

Component | Role | Details | Deployment Method
--- | --- | --- | ---
Cluster Autoscaler | Scales EC2 instances automatically according to pods running in the cluster | Open source | A Deployment on On-Demand Instances
EC2 Auto Scaling group | Provisions and maintains EC2 instance capacity | AWS | AWS CloudFormation via eksctl
AWS Node Termination Handler | Detects EC2 Spot interruptions and automatically drains nodes | Open source | A DaemonSet on Spot and On-Demand Instances
AWS ALB Ingress Controller | Provisions and maintains Application Load Balancer | Open source | A Deployment on On-Demand Instances

You can find more details about each component in this AWS blog. Let’s go through the steps that allow the deployment to be elastic.

  1. HTTP requests flow in through the ALB and the Ingress object.
  2. The Horizontal Pod Autoscaler (HPA) monitors the metrics (CPU / RAM), and once the threshold is breached a replica (pod) is launched.
  3. If there are sufficient cluster resources, the pod starts running; otherwise it goes into a pending state.
  4. If one or more pods are in a pending state, the Cluster Autoscaler (CA) triggers a scale-up request to the Auto Scaling group.
    1. If the HPA tries to schedule more pods than the current cluster capacity can support, the CA adds capacity to support them.
  5. The Auto Scaling group provisions a new node, and the application scales up.
  6. A scale down happens in the reverse fashion when requests start tapering off.

AWS ALB Ingress controller and ALB

We will be using an ALB along with an Ingress resource instead of the default External Load Balancer created by the TF Serving deployment. The open source AWS ALB Ingress controller triggers the creation of an ALB and the necessary supporting AWS resources whenever a Kubernetes user declares an Ingress resource in the cluster. The Ingress resource uses the ALB to route HTTP(S) traffic to different endpoints within the cluster. ALB is ideal for advanced load balancing of HTTP and HTTPS traffic. ALB provides advanced request routing targeted at delivery of modern application architectures, including microservices and container-based applications. This allows the deployment to maintain a high throughput and improve load balancing.

Spot Instance interruptions

To gracefully handle interruptions, we will use the AWS node termination handler. This handler runs a DaemonSet (one pod per node) on each host to perform monitoring and react accordingly. When it receives the Spot Instance 2-minute interruption notification, it uses the Kubernetes API to cordon the node. This is done by tainting it to ensure that no new pods are scheduled there, then it drains it, removing any existing pods from the ALB.

One of the best practices for using Spot is diversification where instances are chosen from across different instance types, sizes, and Availability Zone. The capacity-optimized allocation strategy for EC2 Auto Scaling provisions Spot Instances from the most-available Spot Instance pools by analyzing capacity metrics, thus lowering the chance of interruptions.

Tutorial

Set up the cluster

We are using eksctl to create an Amazon EKS cluster named TensorFlowServingCluster in combination with a managed node group. The managed node group has three On-Demand t3.medium nodes, and it bootstraps with the labels lifecycle=OnDemand and intent=control-apps. Be sure to replace <YOUR REGION> with the Region you are launching your cluster into.

eksctl create cluster --name=TensorFlowServingCluster --node-private-networking --managed --nodes=3 --alb-ingress-access --region=<YOUR REGION> --node-type t3.medium --node-labels="lifecycle=OnDemand,intent=control-apps" --asg-access

Check the nodes provisioned by using kubectl get nodes.

Next, create the Spot NodeGroups. First create the eksctl configuration file: copy the nodegroup configuration below into a file named spot_nodegroups.yml. Then run the eksctl command that follows the configuration to add the new Spot nodes to the cluster.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
    name: TensorFlowServingCluster
    region: <YOUR REGION>
nodeGroups:
    - name: prod-4vcpu-16gb-spot
      minSize: 0
      maxSize: 15
      desiredCapacity: 10
      instancesDistribution:
        instanceTypes: ["m5.xlarge", "m5d.xlarge", "m4.xlarge","t3.xlarge","t3a.xlarge","m5a.xlarge","t2.xlarge"] 
        onDemandBaseCapacity: 0
        onDemandPercentageAboveBaseCapacity: 0
        spotAllocationStrategy: capacity-optimized
      labels:
        lifecycle: Ec2Spot
        intent: apps
        aws.amazon.com/spot: "true"
      tags:
        k8s.io/cluster-autoscaler/node-template/label/lifecycle: Ec2Spot
        k8s.io/cluster-autoscaler/node-template/label/intent: apps
      iam:
        withAddonPolicies:
          autoScaler: true
          albIngress: true
    - name: prod-8vcpu-32gb-spot
      minSize: 0
      maxSize: 15
      desiredCapacity: 10
      instancesDistribution:
        instanceTypes: ["m5.2xlarge", "m5n.2xlarge", "m5d.2xlarge", "m5dn.2xlarge","m5a.2xlarge", "m4.2xlarge"] 
        onDemandBaseCapacity: 0
        onDemandPercentageAboveBaseCapacity: 0
        spotAllocationStrategy: capacity-optimized
      labels:
        lifecycle: Ec2Spot
        intent: apps
        aws.amazon.com/spot: "true"
      tags:
        k8s.io/cluster-autoscaler/node-template/label/lifecycle: Ec2Spot
        k8s.io/cluster-autoscaler/node-template/label/intent: apps
      iam:
        withAddonPolicies:
          autoScaler: true
          albIngress: true
eksctl create nodegroup -f spot_nodegroups.yml

A few points to note here; for more technical details, refer to the EC2 Spot workshop.

  • There are two diversified node groups created with a fixed vCPU:Memory ratio. This adheres to the Spot best practice of diversifying instances, and helps the Cluster Autoscaler function properly.
  • Capacity-optimized Spot allocation strategy is used in both the node groups.

Once the nodes are created, you can check the number of instances provisioned using the command below. It should display 20 as we configured each of our two node groups with a desired capacity of 10 instances.

kubectl get nodes --selector=lifecycle=Ec2Spot | expr $(wc -l) - 1

The cluster setup is complete.

Install the AWS Node Termination Handler

kubectl apply -f https://github.com/aws/aws-node-termination-handler/releases/download/v1.3.1/all-resources.yaml

This installs the Node Termination Handler on both Spot Instance and On-Demand Instance nodes, which helps the handler respond to both EC2 maintenance events and Spot Instance interruptions.
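To confirm that the handler is running on every node before moving on, you can check the resources it created; the namespace and DaemonSet name below are assumptions based on the defaults in the upstream manifest.

# Verify the DaemonSet and its pods (names assumed from the upstream manifest defaults)
kubectl get daemonset aws-node-termination-handler -n kube-system
kubectl get pods -n kube-system -o wide | grep aws-node-termination-handler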

Deploy Cluster Autoscaler

For additional detail, see the Amazon EKS documentation here. Next, download the Cluster Autoscaler deployment manifest into a local configuration file:

curl -o cluster_autoscaler.yml https://raw.githubusercontent.com/awslabs/ec2-spot-workshops/master/content/using_ec2_spot_instances_with_eks/scaling/deploy_ca.files/cluster_autoscaler.yml

Open the file you created and edit it. Add the AWS Region and the cluster name (TensorFlowServingCluster) where indicated.

Run the commands below to deploy Cluster Autoscaler.

kubectl apply -f cluster_autoscaler.yml
kubectl -n kube-system annotate deployment.apps/cluster-autoscaler cluster-autoscaler.kubernetes.io/safe-to-evict="false"

Use this command to view the Cluster Autoscaler (CA) logs and confirm that the node groups were auto-discovered. Use Ctrl + C to exit the log view.

kubectl logs -f deployment/cluster-autoscaler -n kube-system --tail=10

Deploy TensorFlow Serving

The TensorFlow Model Server is deployed in pods, and it loads the model from Amazon S3.

Amazon S3 access

We are using Kubernetes Secrets to store and manage the AWS Credentials for S3 Access.

Copy the following and create a file called kustomization.yml. Add the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY details in the file.

namespace: default
secretGenerator:
- name: s3-credentials
  literals:
  - AWS_ACCESS_KEY_ID=<<AWS_ACCESS_KEY_ID>>
  - AWS_SECRET_ACCESS_KEY=<<AWS_SECRET_ACCESS_KEY>>
generatorOptions:
  disableNameSuffixHash: true

Create the secret file and deploy.

kubectl kustomize . > secret.yaml
kubectl apply -f secret.yaml

We recommend using Sealed Secrets for production workloads. Sealed Secrets provide a mechanism to encrypt a Secret object, making it more secure. For further details, take a look at the AWS workshop here.
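As a rough sketch of that approach, assuming the Sealed Secrets controller is installed in the cluster and the kubeseal CLI is installed on your workstation, you could convert the generated secret.yaml before applying it:

# Encrypt the plain Secret into a SealedSecret that is safe to keep in source control
kubeseal --format yaml < secret.yaml > sealed-secret.yaml
kubectl apply -f sealed-secret.yaml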

ALB Ingress Controller

Deploy RBAC Roles and RoleBindings needed by the AWS ALB Ingress controller.

kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/aws-alb-ingress-controller/v1.1.4/docs/examples/rbac-role.yaml

Download the AWS ALB Ingress controller YAML into a local file.

curl -sS "https://raw.githubusercontent.com/kubernetes-sigs/aws-alb-ingress-controller/v1.1.4/docs/examples/alb-ingress-controller.yaml" > alb-ingress-controller.yaml

Change the --cluster-name flag to TensorFlowServingCluster and add the Region under the --aws-region flag. Also add the lines below just before serviceAccountName.

nodeSelector:
    lifecycle: OnDemand

Deploy the AWS ALB Ingress controller and verify that it is running.

kubectl apply -f alb-ingress-controller.yaml
kubectl logs -n kube-system $(kubectl get po -n kube-system | grep alb-ingress | awk '{print $1}')

Deploy the application

Next, download a model as explained in the TF official documentation, then upload it to Amazon S3.

mkdir /tmp/resnet

curl -s http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NHWC_jpg.tar.gz | \
tar --strip-components=2 -C /tmp/resnet -xvz

RANDOM_SUFFIX=$(cat /dev/urandom | tr -dc 'a-z0-9' | fold -w 10 | head -n 1)

S3_BUCKET="resnet-model-k8serving-${RANDOM_SUFFIX}"
aws s3 mb s3://${S3_BUCKET}
aws s3 sync /tmp/resnet/1538687457/ s3://${S3_BUCKET}/resnet/1/

Copy the following code and create a file named tf_deployment.yml. Don’t forget to replace <AWS_REGION> with the AWS Region you plan to use.

A few things to note here:

  • NodeSelector is used to route the TF Serving replica pods to Spot Instance nodes.
  • ServiceType LoadBalancer is used.
  • model_base_path points at Amazon S3. Replace <S3_BUCKET> with the S3 bucket name you created in the previous step.

apiVersion: v1
kind: Service
metadata:
  labels:
    app: resnet-service
  name: resnet-service
spec:
  ports:
  - name: grpc
    port: 9000
    targetPort: 9000
  - name: http
    port: 8500
    targetPort: 8500
  selector:
    app: resnet-service
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: resnet-service
  name: resnet-v1
spec:
  replicas: 25
  selector:
    matchLabels:
      app: resnet-service
  template:
    metadata:
      labels:
        app: resnet-service
        version: v1
    spec:
      nodeSelector:
        lifecycle: Ec2Spot
      containers:
      - args:
        - --port=9000
        - --rest_api_port=8500
        - --model_name=resnet
        - --model_base_path=s3://<S3_BUCKET>/resnet/
        command:
        - /usr/bin/tensorflow_model_server
        env:
        - name: AWS_REGION
          value: <AWS_REGION>
        - name: S3_ENDPOINT
          value: s3.<AWS_REGION>.amazonaws.com
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: s3-credentials
              key: AWS_ACCESS_KEY_ID
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: s3-credentials
              key: AWS_SECRET_ACCESS_KEY
        image: tensorflow/serving
        imagePullPolicy: IfNotPresent
        name: resnet-service
        ports:
        - containerPort: 9000
        - containerPort: 8500
        resources:
          limits:
            cpu: "4"
            memory: 4Gi
          requests:
            cpu: "2"
            memory: 2Gi

Deploy the application.

kubectl apply -f tf_deployment.yml

Copy the code below and create a file named ingress.yml.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: "resnet-service"
  namespace: "default"
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
  labels:
    app: resnet-service
spec:
  rules:
    - http:
        paths:
          - path: "/v1/models/resnet:predict"
            backend:
              serviceName: "resnet-service"
              servicePort: 8500

Deploy the ingress.

kubectl apply -f ingress.yml

Deploy the Metrics Server and the Horizontal Pod Autoscaler, which scales up the deployment when CPU utilization exceeds 50% of the allocated container resources.

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml
kubectl autoscale deployment resnet-v1 --cpu-percent=50 --min=20 --max=100

Load testing

Download the Python helper file written for testing the deployed application.

curl -o submit_mc_tf_k8s_requests.py https://raw.githubusercontent.com/awslabs/ec2-spot-labs/master/tensorflow-serving-load-testing-sample/python/submit_mc_tf_k8s_requests.py

Get the address of the Ingress using the command below.

kubectl get ingress resnet-service
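If you only want the ALB hostname, for example to paste into the load-testing command later, a jsonpath query such as the following can help:

# Print just the ALB DNS name from the Ingress status
kubectl get ingress resnet-service -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'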

Install a Python Virtual Env. and install the library requirements.

pip3 install virtualenv
virtualenv venv
source venv/bin/activate
pip3 install tqdm
pip3 install requests

Run the following command to warm up the cluster, after replacing the Ingress address. It runs a Python application that predicts the class of a downloaded image against the ResNet model served by the TF Serving REST API, using multiple parallel processes. Here, “p” is the number of processes and “r” is the number of requests per process.

python submit_mc_tf_k8s_requests.py -p 100 -r 100 -u 'http://<INGRESS ADDRESS>:80/v1/models/resnet:predict'


You can use the command below to observe the scaling of the cluster.

kubectl get hpa -w

We ran the above again with 10,000 requests per process to send 1 million requests to the application. The results are below:

The deployment was able to serve ~300 requests per second with an average latency of ~320 ms per request.

Cleanup

Now that you’ve successfully deployed and run TensorFlow Serving using EC2 Spot Instances, it’s time to clean up your environment. Remove the Ingress, the deployment, and the Ingress controller.

kubectl delete -f ingress.yml
kubectl delete -f tf_deployment.yml
kubectl delete -f alb-ingress-controller.yaml

Remove the model files from Amazon S3.

aws s3 rb s3://${S3_BUCKET}/ --force 

Delete the node groups and the cluster.

eksctl delete nodegroup -f spot_nodegroups.yml --approve
eksctl delete cluster --name TensorFlowServingCluster

Conclusion

In this blog, we demonstrated how TensorFlow Serving can be deployed onto a Kubernetes cluster backed by Spot Instances, achieving both resilience and cost optimization. There are multiple optimizations that can be applied to TensorFlow Serving to further improve performance. This deployment can also be extended and used for serving multiple models with different versions. We hope you consider running TensorFlow Serving using EC2 Spot Instances to cost optimize the solution.

New – Amazon EC2 Instances based on AWS Graviton2 with local NVMe-based SSD storage

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/new-graviton2-instance-types-c6g-r6g-and-their-d-variant/

A few weeks ago, I wrote a post to announce the new AWS Graviton2 Amazon Elastic Compute Cloud (EC2) instance type, the M6g. Since then, hundreds of customers have observed significant cost-performance benefits. These include Honeycomb.io, SmugMug, Redbox, and Valnet Inc.

On June 11, we announced two new families of instances based on AWS Graviton2 processors: the C6g and R6g, in addition to the existing M6g. The M family instances are workhorses intended to address a broad array of general-purpose workloads such as application servers, gaming servers, midsize databases, caching fleets, web tiers, and the like. The C family of instances is well suited for compute-intensive workloads, such as high performance computing (HPC), batch processing, ad serving, video encoding, gaming, scientific modeling, distributed analytics, and CPU-based machine learning inference. The R family of instances is well suited for memory-intensive workloads, such as open-source databases, in-memory caches, or real-time big data analysis. To learn more about these instance types and why they are relevant for your workloads, you can hear more about them in this video from James Hamilton, VP & Distinguished Engineer at AWS.

Today, I have additional news to share with you: we are adding a “d” variant to all three families. The M6gd, C6gd, and R6gd instance types have NVM Express (NVMe) locally attached SSD drives, up to 2 x 1.9 TB. They offer 50% more storage GB/vCPU compared to M5d, C5d, and R5d instances. These are a great fit for applications that need access to high-speed, low latency local storage including those that need temporary storage of data for scratch space, temporary files, and caches. The data on an instance store volume persists only during the life of the associated EC2 instance.
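If you want to confirm the instance store that comes with a given size before choosing one, the AWS CLI can list it. The following is a sketch; the family filter is an example and can be adjusted.

# List the local instance store size (GB) for the M6gd family
aws ec2 describe-instance-types \
  --filters "Name=instance-type,Values=m6gd.*" \
  --query "InstanceTypes[].[InstanceType, InstanceStorageInfo.TotalSizeInGB]" \
  --output table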

Instance types based on Graviton2 processors deliver up to 40% better price performance than their equivalent x86-based M5, C5, and R5 families.

All AWS Graviton2 based instances are built on the AWS Nitro System, a collection of AWS-designed hardware and software that allows the delivery of isolated multi-tenancy, private networking, and fast local storage. These instances provide up to 19 Gbps Amazon Elastic Block Store (EBS) bandwidth and up to 25 Gbps network bandwidth.

Partner ecosystem support for these Arm-based AWS Graviton2 instances is robust, from Linux distributions (Amazon Linux 2, CentOS, Debian, Fedora, Red Hat Enterprise Linux, SUSE Linux Enterprise Server, Ubuntu, or FreeBSD), to language runtimes (Java with Amazon Corretto, Node.js, Python, Go,…), container services (Docker, Amazon Elastic Container Service, Amazon Elastic Kubernetes Service, Amazon Elastic Container Registry), agents (Amazon CloudWatch, AWS Systems Manager, Amazon Inspector), developer tools (AWS Code Suite, Jenkins, GitLab, Chef, Drone.io, Travis CI), and security & monitoring solutions (such as Datadog, Dynatrace, Crowdstrike, Qualys, Rapid7, Tenable, or Honeycomb.io).

Here is a table to summarize the technical characteristics of each instance type in these families.

Instance Size | vCPU | Memory (GiB) M6g / R6g / C6g | Local Storage ("d" variant) | Network Bandwidth (Gbps) | EBS Bandwidth (Mbps)
--- | --- | --- | --- | --- | ---
medium | 1 | 4 / 8 / 2 | 1 x 59 GB NVMe | Up to 10 | Up to 4,750
large | 2 | 8 / 16 / 4 | 1 x 118 GB NVMe | Up to 10 | Up to 4,750
xlarge | 4 | 16 / 32 / 8 | 1 x 237 GB NVMe | Up to 10 | Up to 4,750
2xlarge | 8 | 32 / 64 / 16 | 1 x 474 GB NVMe | Up to 10 | Up to 4,750
4xlarge | 16 | 64 / 128 / 32 | 1 x 950 GB NVMe | Up to 10 | 4,750
8xlarge | 32 | 128 / 256 / 64 | 1 x 1900 GB NVMe | 12 | 9,000
12xlarge | 48 | 192 / 384 / 96 | 2 x 1425 GB NVMe | 20 | 13,500
16xlarge | 64 | 256 / 512 / 128 | 2 x 1900 GB NVMe | 25 | 19,000
metal | 64 | 256 / 512 / 128 | 2 x 1900 GB NVMe | 25 | 19,000

These M6g, C6g, and R6g families of instances are available for you today in the following AWS Regions: US East (N. Virginia), US West (Oregon), US East (Ohio), Asia Pacific (Tokyo), Europe (Frankfurt), and Europe (Ireland). Their disk-based variants M6gd, C6gd, and R6gd are available in US East (N. Virginia), US West (Oregon), US East (Ohio) and Europe (Ireland) AWS Regions.

If you are optimizing applications for Arm architecture, be sure to have a look at our Getting Started collection of resources or learn more about AWS Graviton2 based EC2 instances.

Let us know your feedback!

— seb

Must-know best practices for Amazon EBS encryption

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/must-know-best-practices-for-amazon-ebs-encryption/

This blog post covers common encryption workflows on Amazon EBS. Examples of these workflows are: setting up permissions policies, creating encrypted EBS volumes, running Amazon EC2 instances, taking snapshots, and sharing your encrypted data using customer-managed CMK.

Introduction

The Amazon Elastic Block Store (Amazon EBS) service provides high-performance block-level storage volumes for Amazon EC2 instances. Customers have been using Amazon EBS for over a decade to support a broad range of applications, including relational and non-relational databases, containerized applications, big data analytics engines, and many more. For Amazon EBS, security is always our top priority. One of the most powerful mechanisms we provide to secure your data against unauthorized access is encryption.

Amazon EBS offers a straightforward encryption solution for data at rest, data in transit, and all volume backups. Amazon EBS encryption is supported by all volume types, and includes built-in key management infrastructure without requiring you to build, maintain, and secure your own keys. We use AWS Key Management Service (AWS KMS) envelope encryption with customer master keys (CMK) for your encrypted volumes and snapshots. We also offer an easy way to ensure all your newly created Amazon EBS resources are always encrypted: simply select encryption by default. This means you no longer need to write IAM policies to require the use of encrypted volumes. All your new Amazon EBS volumes are automatically encrypted at creation.

You can choose from two types of CMKs: AWS managed and customer managed. The AWS managed CMK is the default on Amazon EBS (unless you explicitly override it), and does not require you to create a key or manage any policies related to the key. Any user with EC2 permissions in your account is able to encrypt/decrypt EBS resources encrypted with that key. If your compliance and security goals require more granular control over who can access your encrypted data, a customer-managed CMK is the way to go.

In the following section, I dive into some best practices with your customer-managed CMK to accomplish your encryption workflows.

Defining permissions policies

To get started with encryption using your own customer-managed CMK, you first need to create the CMK and set up the policies needed. For simplicity, I use a fictitious account ID 111111111111 and an AWS KMS customer master key (CMK) with the alias cmk1 in Region us-east-1.
As you go through this post, be sure to change the account ID and the AWS KMS CMK to match your own.

  1. Log on to the AWS Management Console as an admin user. Navigate to the AWS KMS service, and create a new KMS key in the desired Region.

kms console screenshot

  2. Go to the AWS Identity and Access Management (IAM) console and navigate to the Policies console. In the create policy wizard, click on the JSON tab, and add the following policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "kms:GenerateDataKeyWithoutPlaintext",
                "kms:ReEncrypt*",
                "kms:CreateGrant"
            ],
            "Resource": [
                "arn:aws:kms:us-east-1:<111111111111>:key/<key-id of cmk1>"
            ]
        }
    ]
}
  3. Go to IAM Users, click on Add permissions, and choose Attach existing policies directly. Select the policy you created in the previous step, along with the AmazonEC2FullAccess policy.

You now have all the necessary policies to start encrypting data with your own CMK on Amazon EBS.

Enabling encryption by default

Encryption by default ensures that all new EBS volumes created in your account are always encrypted, even if you don’t specify the encrypted=true request parameter. You have the option to choose the default key to be the AWS managed key or a key that you create. If you use IAM policies that require the use of encrypted volumes, you can use this feature to avoid launch failures that would occur if unencrypted volumes were inadvertently referenced when an instance is launched. Before turning on encryption by default, make sure to go through some of the limitations in the Considerations section at the end of this blog.

Use the following steps to opt in to encryption by default in the console (a CLI alternative follows these steps):

  1. Logon to EC2 console in the AWS Management Console.
  2. Click on Settings, then Amazon EBS encryption, on the right side of the Dashboard console (note: settings are specific to individual AWS Regions in your account).
  3. Check the box Always Encrypt new EBS volumes.
  4. By default, the AWS managed key is used for Amazon EBS encryption. Click on Change the default key and select your desired key. In this blog, the desired key is cmk1.
  5. You’re done! Any new volume created from now on will be encrypted with the KMS key selected in the previous step.
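If you prefer the AWS CLI, the same opt-in can be scripted. This is a sketch that assumes your key alias is cmk1; run it once per Region.

# Opt in to EBS encryption by default in the current Region
aws ec2 enable-ebs-encryption-by-default

# Point the default at your customer managed CMK (alias assumed to be cmk1)
aws ec2 modify-ebs-default-kms-key-id --kms-key-id alias/cmk1

# Confirm the settings
aws ec2 get-ebs-encryption-by-default
aws ec2 get-ebs-default-kms-key-id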

Creating encrypted Amazon EBS volumes

To create an encrypted volume, simply go to Volumes under Amazon EBS in your EC2 console, and click Create Volume.

Then, select your preferred volume attributes and mark the encryption flag. Choose your designated master key (CMK), and voilà, your volume is encrypted!

If you turned on encryption by default in the previous section, the encryption option is already selected and grayed out. Similarly, in the AWS CLI, your volume is always encrypted regardless of whether you set encrypted=true, and you can override the default encryption key by specifying a different one, as the following image shows:

encryption and master key
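The equivalent AWS CLI call might look like the following sketch; the size, Availability Zone, volume type, and key alias are placeholders.

# Create an encrypted volume with a specific CMK
aws ec2 create-volume \
  --availability-zone us-east-1a \
  --size 100 \
  --volume-type gp2 \
  --encrypted \
  --kms-key-id alias/cmk1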

Launching instances with encrypted volumes

When launching an EC2 instance, you can easily specify encryption with your CMK even if the Amazon Machine Image (AMI) you selected is not encrypted.

Follow the steps in the Launch Wizard under EC2 console, and select your CMK in the Add Storage section. If you previously set encryption by default, you see your selected default key, which can be changed to any other key of your choice as the following image shows:

adding encrypted storage to instance
Alternatively, using the RunInstances API/CLI, you can provide the kmsKeyID for encrypting the volumes that are created from the AMI by specifying encryption in the block device mapping (BDM) object. If you don’t specify the kmsKeyID in the BDM but set the encryption flag to “true”, then your default encryption key is used for encrypting the volume. If you turned on encryption by default, any RunInstances call results in encrypted volumes, even if you haven’t set the encryption flag to “true.”
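As a sketch of that CLI call, with the AMI ID, device name, and key alias as placeholders:

# Launch an instance whose root volume is encrypted with a specific CMK
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type t3.micro \
  --block-device-mappings '[{"DeviceName":"/dev/xvda","Ebs":{"Encrypted":true,"KmsKeyId":"alias/cmk1"}}]'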

For more detailed information on launching encrypted EBS-backed EC2 instances, see this blog.

Auto Scaling Groups and Spot Instances

When you specify a customer-managed CMK, you must give the appropriate service-linked role access to the CMK so that EC2 Auto Scaling / Spot Instances can launch instances on your behalf (AWSServiceRoleForEC2Spot / AWSServiceRoleForAutoScaling). To do this, you must modify the CMK’s key policy. For more information, click here.
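One way to do this, shown here as a sketch that uses a grant instead of editing the key policy document directly, is to give the Auto Scaling service-linked role the operations it needs on cmk1 (account ID and key ID are placeholders):

# Grant the EC2 Auto Scaling service-linked role access to the CMK
aws kms create-grant \
  --key-id arn:aws:kms:us-east-1:111111111111:key/<key-id of cmk1> \
  --grantee-principal arn:aws:iam::111111111111:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling \
  --operations Decrypt GenerateDataKeyWithoutPlaintext ReEncryptFrom ReEncryptTo CreateGrant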

Creating and sharing encrypted snapshots

Now that you’ve launched an instance and have some encrypted EBS volumes, you may want to create snapshots to back up the data on your volumes. Whenever you create a snapshot from an encrypted volume, the snapshot is always encrypted with the same key you provided for the volume. Other than the create-snapshot permission, users do not need any additional key policy settings for creating encrypted snapshots.
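A minimal CLI example, assuming a placeholder volume ID, looks like this; note that no key parameters are needed because the snapshot inherits the volume's CMK.

# Snapshot an encrypted volume; the snapshot is encrypted with the volume's CMK automatically
aws ec2 create-snapshot \
  --volume-id vol-0123456789abcdef0 \
  --description "Backup of encrypted data volume"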

Sharing encrypted snapshots

If you want another account in your organization to create a volume from that snapshot (for use cases such as test/dev accounts, disaster recovery (DR), and so on), you can take that encrypted snapshot and share it with different accounts. To do that, you need to create policy settings for the source (111111111111) and target (222222222222) accounts.

In the source account, complete the following steps:

  1. Select Snapshots in the EC2 console.
  2. Click Actions, then Modify Permissions.
  3. Add the AWS account number of your target account.
  4. Go to the AWS KMS console and select the KMS key associated with your snapshot.
  5. In the Other AWS accounts section, click Add other AWS accounts and add the target account.

Target account:
Users in the target account have several options with the shared snapshot. They can launch an instance directly or copy the snapshot to the target account. You can use the same CMK as in the original account (cmk1), or re-encrypt it with a different CMK.

I recommend that you re-encrypt the snapshot using a CMK owned by the target account. This protects you if the original CMK is compromised, or if the owner revokes permissions, which could cause you to lose access to any encrypted volumes that you created using the snapshot.
When re-encrypting with a different CMK (cmk2 in this example), you only need the ReEncryptFrom permission on cmk1 (source). Also, make sure you have the required permissions on your target account for cmk2.

The following JSON policy documents show an example of these permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kms:ReEncryptFrom"
            ],
            "Resource": [
                "arn:aws:kms:us-east-1:<111111111111>:key/<key-id of cmk1>"
            ]
        }
    ]
}

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kms:GenerateDataKeyWithoutPlaintext",
                "kms:ReEncrypt*",
                "kms:CreateGrant"
            ],
            "Resource": [
                "arn:aws:kms:us-east-1:<222222222222>:key/<key-id of cmk2>"
            ]
        }
    ]
}

You can now select snapshots at the EC2 console in the target account. Locate the snapshot by ID or description.

If you want to copy the snapshot, you must also allow the kms:DescribeKey action in the policy. Keep in mind that changing the encryption status of a snapshot during a copy operation results in a full (not incremental) copy, which might incur greater data transfer and storage charges.
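For example, a copy in the target account that re-encrypts the shared snapshot with cmk2 might look like the following sketch; the snapshot ID and Regions are placeholders.

# Copy the shared snapshot, re-encrypting it with cmk2 owned by the target account
aws ec2 copy-snapshot \
  --region us-east-1 \
  --source-region us-east-1 \
  --source-snapshot-id snap-0123456789abcdef0 \
  --encrypted \
  --kms-key-id arn:aws:kms:us-east-1:<222222222222>:key/<key-id of cmk2> \
  --description "Re-encrypted copy of shared snapshot"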

 

The same sharing capabilities can be applied to sharing AMIs. Check out this blog for more information.

Considerations

  • A few older instance types don’t support Amazon EBS encryption. You won’t be able to launch new instances in the C1, M1, M2, or T1 families.
  • You won’t be able to share encrypted AMIs publicly, and any AMIs you share across accounts need access to your chosen KMS key.
  • You won’t be able to share snapshots or AMIs if you encrypt them with the AWS managed CMK.
  • Amazon EBS snapshots are encrypted with the key used by the volume itself.
  • The default encryption settings are per Region, as are the KMS keys.
  • Amazon EBS does not support asymmetric CMKs. For more information, see Using Symmetric and Asymmetric Keys.

Conclusion

In this blog post, I discussed several best practices for using Amazon EBS encryption with your customer-managed CMK, which gives you more granular control to meet your compliance goals. I started with the policies needed, then covered how to create encrypted volumes, launch encrypted instances, create encrypted backups, and share encrypted data. Now that you are an encryption expert, go ahead and turn on encryption by default so that you’ll have the peace of mind that your new volumes are always encrypted on Amazon EBS. To learn more, visit the Amazon EBS landing page.
If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the Amazon EC2 forum or contact AWS Support.

 

 

How to enable X11 forwarding from Red Hat Enterprise Linux (RHEL), Amazon Linux, SUSE Linux, Ubuntu server to support GUI-based installations from Amazon EC2

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/how-to-enable-x11-forwarding-from-red-hat-enterprise-linux-rhel-amazon-linux-suse-linux-ubuntu-server-to-support-gui-based-installations-from-amazon-ec2/

This post was written by Sivasamy Subramaniam, AWS Database Consultant.

In this post, I discuss enabling X11 forwarding from Red Hat Enterprise Linux (RHEL), Amazon Linux, SUSE Linux, and Ubuntu servers running on Amazon EC2. This is helpful for system and database administrators, and application teams, that want to perform software installations on Amazon EC2 using a GUI. This blog provides detailed steps around SSH and X11 tools, various network and operating system (OS) level settings, and best practices to achieve X11 forwarding on Amazon EC2 when installing databases such as Oracle using a GUI.

There are several techniques to connect to Amazon EC2 instances and manage OS-level configurations. You typically use an SSH client (such as PuTTY or the OpenSSH client) to establish the connection from a Windows or Mac-based laptop, bastion host, or jump server to EC2 instances running Linux. During application installation or configuration, the application might require you to install software such as an Oracle database or a third-party database using GUI methods. This article covers the steps required to forward the X11 screen to your highly secure Windows-based bastion host.

 

Prerequisites

To complete this walkthrough, the following is required:

 

  • Ensure that you have a bastion host running on Amazon EC2 with Windows OS. This OS must have access to the EC2 machines running Linux such as RHEL, Amazon Linux, SUSE Linux, and Ubuntu servers. If not, please configure a bastion host using Windows operating system with needed SSH access via port 22 to EC2 instance running Linux-based operating systems.
  • I recommend having Windows-based hosts in the same Availability Zone or Region as the EC2 Linux hosts that you plan to connect and forward X11 to. This is to avoid any high latency in X11 forwarding during your application installations.
  • Install tools such as putty and Xming in the Windows based bastion host that you want to SSH to Linux EC2 host and X11 forwarding.
  • Install XQuartz if you want to redirect X11 to macOS; XQuartz provides the X11 display server on a Mac. To start using X11 forwarding to your Mac, use the -X switch. In other words, the SSH command looks like this: “ssh -X -i "<ssh private key file name>" <user_name>@<ip-address>”. You must log out and log back in after installing XQuartz for it to work properly.
  • In order to securely configure or install putty, refer to the section Configuring ssh-agent on Windows in the blog post Securely Connect to Linux Instances Running in a Private Amazon VPC.
  • I don’t recommend forwarding X11 to the laptop because it’s not secure, and you must resolve latency issues if the user is located in a different Region than the one hosting the EC2 instance.
  • You may need sudo permission to run X11 forwarding commands as a root user in order to complete the setup.

 

Solution

Connect to your EC2 instance using SSH client, and perform following setup as needed.

Step 1: Verify or install required X11 packages.

Find your OS information and release, then install the required X11 packages along with xclock or xterm. Installing the xclock or xterm packages is optional; they are installed in this post only to test the X11 forwarding.

  • List OS information and release with following command:

sudo cat /etc/os-release

  • List and install X11 packages with following command based on your operating system release and version:

Amazon Linux:

sudo yum list installed '*X11*'

sudo yum install xclock xterm

sudo yum install xorg-x11-xauth.x86_64 -y

sudo yum list installed '*X11*'

Red Hat Enterprise Linux:

sudo yum list installed '*X11*'

sudo yum install xterm

sudo yum install xorg-x11-xauth

sudo yum list installed '*X11*'

Note: The xorg-x11-apps package (which provides xclock) is available in the CodeReady Linux Builder repository for RHEL 8. I skipped installing this package and used only xterm to test the X11 forwarding.

SUSE:

sudo zypper install xclock

sudo zypper install xauth

Ubuntu:

sudo apt list installed '*X11*'

sudo apt install x11-apps

sudo apt list installed '*X11*'

 

Step 2: Verify and configure X11 forwarding

  • On the Linux server, edit sshd_config file, set X11Forwarding parameter to yes, and restart the sshd service:

sudo  cat /etc/ssh/sshd_config |grep -i X11Forwarding

  • Enable “X11Forwarding yes” if it is set to “no”, and restart the sshd service.

sudo  vi /etc/ssh/sshd_config

sudo  cat /etc/ssh/sshd_config |grep -i X11Forwarding

  • X11Forwarding yes

x110 forwarding yes and no

  • To restart sshd and confirm that it is running:

sudo service sshd restart
sudo service sshd status
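If you prefer to script the sshd_config change from the previous bullets rather than editing the file by hand, a sed one-liner such as the following can work. It assumes the stock entry is either commented out or set to no; review the file afterwards and then restart sshd as shown above.

# Enable X11Forwarding whether the entry is commented out or currently set to "no"
sudo sed -i 's/^#\?X11Forwarding .*/X11Forwarding yes/' /etc/ssh/sshd_config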

 

Step 3: Configure putty and Xming to perform X11 forwarding connect and verify X11 forwarding

Log in to your Windows bastion host. Then, open a fresh PuTTY session, and use a private key or password-based authentication per your organization setup. Then, test the xclock or xterm command to see x11 forwarding in action.

  • Click the xming utility you installed on Windows bastion host and have it running.

click on xming icon

  • Select Session from the Category pane on the left. Set Host Name to your private IP, Port to 22, and Connection Type to SSH. Note that you use the private IP of the EC2 instance here because you connect from inside the VPC/network.

putty config host ip

  • Go to Connection and click Data. Then, set Auto-login username to ec2-user, ubuntu, or whichever user you are allowed to log in as.
  • Go to Connection, select SSH, and then click Auth. Then, click Browse to select the private key generated earlier, if you are using key-based authentication.
  • Go to Connection, select SSH, and then click X11. Then, select Enable X11 forwarding.
  • Set X display location to localhost:0.0

putty config with x11 forwarding

  • Go back to Session and click on Save after creating a session name in Saved session.

Now that you have set up PuTTY and Xming and configured the X11 settings, you can click the Load button and then the Open button. This opens a new SSH terminal with X11 forwarding enabled. Now, I move on to testing the X11 forwarding.

  • Test the X11 forwarding from the user you logged in as:

Example:

xauth list

export DISPLAY=localhost:10.0

xclock or xterm

X11 forwarding on Mac

  • To start using X11 forwarding to your Mac, simply use the -X switch.

Example:

“SSH -X -i "<ssh private key file name>" <user_name>@<ip-address>

xclock or xterm

 

You see an xclock or xterm window open, similar to the one following. This means your X11 forwarding setup is working as expected, and you can start GUI-based application installation or configuration by running the installer or configuration tools.

xclock to demonstrate this is correct

Step 4: Configure the EC2 Linux session to forward X11 if you are switching to a different user after login to run GUI-based installations or commands

In this example, ec2-user is the user logged in with SSH, which then switches to the oracle user.

  • From the logged-in user, identify the xauth details:

xauth list

env|grep DISPLAY

xauth list | grep unix`echo $DISPLAY | cut -c10-12` > /tmp/xauth

ll /tmp/xauth ; cat /tmp/xauth

  • Switch to the user where you want to run GUI-based installation or tools:

sudo su - oracle

xauth add `cat /tmp/xauth`

xauth list

env|grep DISPLAY

export DISPLAY=localhost:10.0

xclock

You see an xclock or xterm window open, similar to the one following. This means your X11 forwarding setup is working as expected even after switching to a different user. You can start GUI-based application installation or configuration by running the installer or configuration tools.

another xclock to show the forwarding worked  

Conclusion

In this blog, I demonstrated how to configure Amazon EC2 instances running various Linux-based operating systems to forward X11 to a Windows bastion host. This is helpful for any application installation that requires GUI-based installation methods. It is also helpful when bastion hosts provide highly secure, low-latency environments for SSH-related operations, including GUI-based installations, because this setup requires no additional network configuration beyond opening port 22 for standard SSH authentication. Please try this walkthrough for yourself, and leave any comments following!

 

 

Executing Ansible playbooks in your Amazon EC2 Image Builder pipeline

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/executing-ansible-playbooks-in-your-amazon-ec2-image-builder-pipeline/

This post is contributed by Andrew Pearce – Sr. Systems Dev Engineer, AWS

Since launching Amazon EC2 Image Builder, many customers say they want to re-use existing investments in configuration management technologies (such as Ansible, Chef, or Puppet) with Image Builder pipelines.

In this blog, I walk through creating a document that can execute an Ansible playbook in an Image Builder pipeline. I then use the document to create an Image Builder component that can be added to an Image Builder image recipe. In addition to this blog, you can find further samples in the amazon-ec2-image-builder-samples GitHub repository.

Ansible playbook requirements

There are likely some changes required so your existing Ansible playbooks can work in Image Builder.

When executing Ansible within Image Builder, the playbook must be configured to execute using the localhost. In a traditional Ansible environment, the control server executes playbooks against remote hosts using a protocol like SSH or WinRM. In Image Builder, the host executing the playbook is also the host that must be configured.

There are a number of ways to accomplish this in Ansible. In this example, I set the value of hosts to 127.0.0.1, gather_facts to false, and connection to local. These values set execution at the localhost, prevent gathering Ansible facts about remote hosts, and force local execution.

In this walkthrough, I use the following sample playbook. This installs Apache, configures a default webpage, and ensures that Apache is enabled and running.

---
- name: Apache Hello World
  hosts: 127.0.0.1
  gather_facts: false
  connection: local
  tasks:
    - name: Install Apache
      yum:
        name: httpd
        state: latest
    - name: Create a default page
      shell: echo "<h1>Hello world from EC2 Image Builder and Ansible</h1>" > /var/www/html/index.html
    - name: Enable Apache
      service: name=httpd enabled=yes state=started

Creating the Ansible document

This example uses the build phase to install Ansible, download the playbook from an S3 bucket, execute the playbook and cleanup the system. It also ensures that Apache is returning the correct content in both the validate and test phases.

I use Amazon Linux 2 as the target operating system, and assume that the playbook is already uploaded to my S3 bucket.

Document design diagram

Document Design Diagram

To start, I must specify the top-level properties for the document. I specify a name and description to describe the document’s purpose, and a schemaVersion, which at the time of writing is 1.0. I also include a build phase, as I want this document to be used when building the image.

name: 'Ansible Playbook Execution on Amazon Linux 2'
description: 'This is a sample component that demonstrates how to download and execute an Ansible playbook against Amazon Linux 2.'
schemaVersion: 1.0
phases:
  - name: build
    steps:

The first required step is to install Ansible. With Amazon Linux 2, I can install Ansible using Amazon Linux Extras, so I use the ExecuteBash action to perform the work.

     - name: InstallAnsible
        action: ExecuteBash
        inputs:
          commands:
           - sudo amazon-linux-extras install -y ansible2

Once Ansible is installed, I must download the playbook for execution. I use the S3Download action to download the playbook and store it in a temporary location. Note that the S3Download action accepts a list of inputs, so a single S3Download step could download multiple files.

     - name: DownloadPlaybook
        action: S3Download
        inputs:
          - source: 's3://mybucket/my-playbook.yml'
            destination: '/tmp/my-playbook.yml'

Now Ansible is installed and the playbook is downloaded. I use the ExecuteBinary action to invoke Ansible and perform the work described in the playbook. This step uses a feature of the component management application called chaining.

Chaining allows you to reference the input or output values of a step in subsequent steps. In the following example, {{build.DownloadPlaybook.inputs[0].destination}} passes the string of the first “destination” property value in the DownloadPlaybook step, which executes in the build phase. The “[0]” indicates the first value in the array (position zero).

For more information about how chaining works, review the Component Manager documentation. In this example, I refer to the S3Download destination path to avoid mistyping the value, which causes the build to fail.

      - name: InvokeAnsible
        action: ExecuteBinary
        inputs:
          path: ansible-playbook
          arguments:
            - '{{build.DownloadPlaybook.inputs[0].destination}}'

Finally, I clean up the system, so I use another ExecuteBash action combined with chaining to delete the downloaded playbook.

      - name: DeletePlaybook
        action: ExecuteBash
        inputs:
          commands:
            - rm '{{build.DownloadPlaybook.inputs[0].destination}}'

Now, I have installed Ansible, downloaded the playbook, executed it, and cleaned up the system. The next step is to add a validate phase where I use curl to ensure that Apache responds with the desired content. Including a validate phase allows you to identify build issues more quickly. This fails the build before Image Builder creates an AMI and moves to the test phase.

In the validate phase, I use grep to confirm that Apache returned a value containing my required string. If the command fails, a non-zero exit code is returned, indicating a failure. This failure is returned to Image Builder and the image build is marked as “Failed”.

  - name: validate
    steps:
      - name: ValidateResponse
        action: ExecuteBash
        inputs:
          commands:
            - curl -s http://127.0.0.1 | grep "Hello world from EC2 Image Builder and Ansible"

After the validate phase executes, the EC2 instance is stopped and an Amazon Machine Image (AMI) is created. Image Builder moves to the test stage where a new EC2 instance is launched using the AMI that was created in the previous step. During this stage, the test phase of a document is executed.

To ensure that Apache is still functioning as required, I add a test phase to the document using the same curl command as the validate phase. This ensures that Apache is started and still returning the correct content.

  - name: test
    steps:
      - name: ValidateResponse
        action: ExecuteBash
        inputs:
          commands:
            - curl -s http://127.0.0.1 | grep "Hello world from EC2 Image Builder and Ansible"

Here is the complete document:

name: 'Ansible Playbook Execution on Amazon Linux 2'
description: 'This is a sample component that demonstrates how to download and execute an Ansible playbook against Amazon Linux 2.'
schemaVersion: 1.0
phases:
  - name: build
    steps:
      - name: InstallAnsible
        action: ExecuteBash
        inputs:
          commands:
           - sudo amazon-linux-extras install -y ansible2
      - name: DownloadPlaybook
        action: S3Download
        inputs:
          - source: 's3://mybucket/playbooks/my-playbook.yml'
            destination: '/tmp/my-playbook.yml'
      - name: InvokeAnsible
        action: ExecuteBinary
        inputs:
          path: ansible-playbook
          arguments:
            - '{{build.DownloadPlaybook.inputs[0].destination}}'
      - name: DeletePlaybook
        action: ExecuteBash
        inputs:
          commands:
            - rm '{{build.DownloadPlaybook.inputs[0].destination}}'
  - name: validate
    steps:
      - name: ValidateResponse
        action: ExecuteBash
        inputs:
          commands:
            - curl -s http://127.0.0.1 | grep "Hello world from EC2 Image Builder and Ansible"
  - name: test
    steps:
      - name: ValidateResponse
        action: ExecuteBash
        inputs:
          commands:
            - curl -s http://127.0.0.1 | grep "Hello world from EC2 Image Builder and Ansible"

I now have a document that installs Ansible, downloads a playbook from an S3 bucket, and invokes Ansible to perform the work described in the playbook. I am also validating the installation was successful by ensuring Apache is returning the desired content.

Next, I use this document to create a component in Image Builder and assign the component to an image recipe. The image recipe could then be used when creating an image pipeline. The blog post Automate OS Image Build Pipelines with EC2 Image Builder provides a tutorial demonstrating how to create an image pipeline with the EC2 Image Builder console. For more information, review Getting started with EC2 Image Builder.
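If you script your pipelines, creating the component from this document with the AWS CLI might look like the following sketch; the component name, version, and local file name are assumptions.

# Create an Image Builder component from the YAML document saved locally
aws imagebuilder create-component \
  --name "ansible-playbook-execution" \
  --semantic-version "1.0.0" \
  --platform "Linux" \
  --data file://ansible-playbook-execution.yml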

Conclusion

In this blog, I walk through creating a document that can execute Ansible playbooks for use in Image Builder. I define the required changes for Ansible playbooks and when each phase of the document would execute. I also note the amazon-ec2-image-builder-samples GitHub repository, which provides sample documents for executing Ansible playbooks and Chef recipes.

We continue to use this repository for publishing sample content. We hope Image Builder is providing value and making it easier for you to build virtual machine (VM) images. As always, keep sending us your feedback!

Best practices for handling EC2 Spot Instance interruptions

Post Syndicated from Ben Peven original https://aws.amazon.com/blogs/compute/best-practices-for-handling-ec2-spot-instance-interruptions/

This post is contributed by Scott Horsfield – Sr. Specialist Solutions Architect, EC2 Spot

Amazon EC2 Spot Instances are spare compute capacity in the AWS Cloud available to you at steep discounts compared to On-Demand Instance prices. The only difference between an On-Demand Instance and a Spot Instance is that a Spot Instance can be interrupted by Amazon EC2 with two minutes of notification when EC2 needs the capacity back. Fortunately, there is a lot of spare capacity available, and handling interruptions in order to build resilient workloads with EC2 Spot is simple and straightforward. In this post, I introduce and walk through several best practices that help you design your applications to be fault tolerant and resilient to interruptions.

By using EC2 Spot Instances, customers can access additional compute capacity between 70%-90% off of On-Demand Instance pricing. This allows customers to run highly optimized and massively scalable workloads that would not otherwise be possible. These benefits make interruptions an acceptable trade-off for many workloads.

When you follow the best practices, the impact of interruptions is insignificant because interruptions are infrequent and don’t affect the availability of your application. At the time of writing this blog post, less than 5% of Spot Instances are interrupted by EC2 before being terminated intentionally by a customer, because they are automatically handled through integrations with AWS services.

Many applications can run on EC2 Spot Instances with no modifications, especially applications that already adhere to modern software development best practices such as microservice architecture. With microservice architectures, individual components are stateless, scalable, and fault tolerant by design. Other applications may require some additional automation to properly handle being interrupted. Examples of these applications include those that must gracefully remove themselves from a cluster before termination to avoid impact to availability or performance of the workload.

In this post, I cover best practices around handing interruptions so that you too can access the massive scale and steep discounts that Spot Instances can provide.

Architect for fault tolerance and reliability

Ensuring your applications are architected to follow modern software development best practices is the first step in successfully adopting Spot Instances. Practices such as graceful degradation, externalizing state, and microservice architecture already encourage the patterns that allow your workloads to be resilient to Spot interruptions.

A great place to start when designing new workloads, or evaluating existing workloads for fault tolerance and reliability, is the Well-Architected Framework. The Well-Architected Framework is designed to help cloud architects build secure, high-performing, resilient, and efficient infrastructure for their applications. Based on five pillars — Operational Excellence, Security, Reliability, Performance Efficiency, and Cost Optimization — the Framework provides a consistent approach for customers to evaluate architectures, and implement designs that will scale over time.

The Well-Architected Tool is a free tool available in the AWS Management Console. Using this tool, you can create self-assessments to identify and correct gaps in your current architecture that might affect your toleration to Spot interruptions.

Use Spot integrated services

Some AWS services that you might already use, such as Amazon ECS, Amazon EKS, AWS Batch, and AWS Elastic Beanstalk have built-in support for managing the lifecycle of EC2 Spot Instances. These services do so through integration with EC2 Auto Scaling. This service allows you to build fleets of compute using On-Demand Instance and Spot Instances with a mixture of instance types and launch strategies. Through these services, tasks such as replacing interrupted instances are handled automatically for you. For many fault-tolerant workloads, simply replacing the instance upon interruption is enough to ensure the reliability of your service. For advanced use cases, EC2 Auto Scaling groups provide notifications, auto-replacement of interrupted instances, lifecycle hooks, and weights. These more advanced features allow for more control by the user over the composition and lifecycle of your compute infrastructure.

Spot-integrated services automate processes for handling interruptions. This allows you to stay focused on building new features and capabilities, and avoid the additional cost that custom automation may accrue over time.

Most examples in this post use EC2 Auto Scaling groups to demonstrate best practices because of the built-in integration with other AWS services. Also, Auto Scaling brings flexibility when building scalable fault-tolerant applications with EC2 Spot Instances. However, these same best practices can also be applied to other services such as Amazon EMR, EC2 fleet, and your own custom automation and frameworks.

Choose the right Spot Allocation Strategy

It is a best practice to use the capacity-optimized Spot Allocation Strategy when configuring your EC2 Auto Scaling group. This allows Auto Scaling groups to launch instances from Spot Instance pools with the most available capacity. Since Spot Instances can be interrupted when EC2 needs the capacity back, launching instances optimized for available capacity is a key best practice for reducing the possibility of interruptions.

You can simply select the Spot allocation strategy when creating new Auto Scaling groups, or modifying an existing Auto Scaling group from the console.

capacity-optimized allocation

Choose instance types across multiple instance sizes, families, and Availability Zones

It is important that the Auto Scaling group has a diverse set of options to choose from so that services launch instances optimally based on capacity. You do this by configuring the Auto Scaling group to launch instances of multiple sizes, and families, across multiple Availability Zones. Each instance type of a particular size, family, and Availability Zone in each Region is a separate Spot capacity pool. When you provide the Auto Scaling group and the capacity-optimized Spot Allocation Strategy a diverse set of Spot capacity pools, your instances are launched from the deepest pools available.

When you create a new Auto Scaling group, or modify an existing Auto Scaling group, you can specify a primary instance type and secondary instance types. The Auto Scaling group also provides recommended options. The following image shows what this configuration looks like in the console.

instance type selection

When using the recommended options, the Auto Scaling group is automatically configured with a diverse list of instance types across multiple instance families, generations, and sizes. We recommend leaving as many of these instance types in place as possible.
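
Outside of the console, the same two best practices can be combined through the API. The following boto3 sketch is illustrative only; the group name, launch template, subnets, and instance types are assumptions you would replace with your own values.

import boto3

autoscaling = boto3.client('autoscaling')

# A minimal sketch: a Spot-heavy Auto Scaling group that uses the
# capacity-optimized allocation strategy and a diversified set of
# instance types (all names and IDs below are placeholders)
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName='spot-fault-tolerant-asg',
    MinSize=2,
    MaxSize=20,
    DesiredCapacity=4,
    VPCZoneIdentifier='subnet-aaaa1111,subnet-bbbb2222,subnet-cccc3333',
    MixedInstancesPolicy={
        'LaunchTemplate': {
            'LaunchTemplateSpecification': {
                'LaunchTemplateName': 'my-launch-template',
                'Version': '$Latest'
            },
            # Diversify across families, generations, and sizes so the
            # allocation strategy has many Spot capacity pools to choose from
            'Overrides': [
                {'InstanceType': 'm5.large'},
                {'InstanceType': 'm5a.large'},
                {'InstanceType': 'm4.large'},
                {'InstanceType': 'c5.large'},
                {'InstanceType': 'c5a.large'},
                {'InstanceType': 'r5.large'}
            ]
        },
        'InstancesDistribution': {
            'OnDemandBaseCapacity': 0,
            'OnDemandPercentageAboveBaseCapacity': 0,
            'SpotAllocationStrategy': 'capacity-optimized'
        }
    }
)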

Handle interruption notices

For many workloads, the replacement of interrupted instances from a diverse set of instance choices is enough to maintain the reliability of your application. In other cases, you can gracefully decommission an application on an instance that is being interrupted. You can do this by knowing that an instance is going to be interrupted and responding through automation to react to the interruption. The good news is, there are several ways you can capture an interruption warning, which is published two minutes before EC2 reclaims the instance.

Inside an instance

The Instance Metadata Service is a secure endpoint that you can query for information about an instance directly from the instance. When a Spot Instance interruption occurs, you can retrieve data about the interruption through this service. This can be useful if you must perform an action on the instance before the instance is terminated, such as gracefully stopping a process or preventing further work from being pulled from a queue. Keep in mind that any actions you automate must be completed within two minutes.

To query this service, first retrieve an access token, and then pass this token to the Instance Metadata Service to authenticate your request. Information about interruptions can be accessed through http://169.254.169.254/latest/meta-data/spot/instance-action. This URI returns a 404 response code when the instance is not marked for interruption. The following code demonstrates how you could query the Instance Metadata Service to detect an interruption.

TOKEN=`curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"` \
&& curl -H "X-aws-ec2-metadata-token: $TOKEN" -v http://169.254.169.254/latest/meta-data/spot/instance-action

If the instance is marked for interruption, you receive a 200 response code. You also receive a JSON formatted response that includes the action that is taken upon interruption (terminate, stop, or hibernate) and the time when that action will be taken (that is, the end of your two-minute warning period).

{"action": "terminate", "time": "2020-05-18T08:22:00Z"}

The following sample script demonstrates this pattern.

#!/bin/bash

TOKEN=`curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"`

while sleep 5; do

    HTTP_CODE=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" -s -w %{http_code} -o /dev/null http://169.254.169.254/latest/meta-data/spot/instance-action)

    if [[ "$HTTP_CODE" -eq 401 ]] ; then
        echo 'Refreshing Authentication Token'
        TOKEN=`curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 30"`
    elif [[ "$HTTP_CODE" -eq 200 ]] ; then
        # Insert Your Code to Handle Interruption Here
    else
        echo 'Not Interrupted'
    fi

done

Developing automation with the Amazon EC2 Metadata Mock

When developing automation for handling Spot Interruptions within an EC2 Instance, it’s useful to mock Spot Interruptions to test your automation. The Amazon EC2 Metadata Mock (AEMM) project allows for running a mock endpoint of the EC2 Instance Metadata Service, and simulating Spot interruptions. You can use AEMM to serve a mock endpoint locally as you develop your automation, or within your test environment to support automated testing.

1. Download the latest Amazon EC2 Metadata Mock binary from the project repository.

2. Run the Amazon EC2 Metadata Mock configured to mock Spot Interruptions.

amazon-ec2-metadata-mock spotitn --instance-action terminate --imdsv2 

3. Test the mock endpoint with curl.

TOKEN=`curl -X PUT "http://localhost:1338/latest/api/token" -H "X-aws-ec2-  metadata-token-ttl-seconds: 21600"` \
&& curl -H "X-aws-ec2-metadata-token: $TOKEN" –v http://localhost:1338/latest/meta-data/spot/instance-action

With the default configuration, you receive a response for a simulated Spot Interruption with a time set to two minutes after the request was received.

{"action": "terminate", "time": "2020-05-13T08:22:00Z"}

Outside an instance

When a Spot Interruption occurs, a Spot instance interruption notice event is generated. You can create rules using Amazon CloudWatch Events or Amazon EventBridge to capture these events, and trigger a response such as invoking a Lambda Function. This can be useful if you need to take action outside of the instance to respond to an interruption, such as graceful removal of an interrupted instance from a load balancer to allow in-flight requests to complete, or draining containers running on the instance.

The generated event contains useful information such as the instance that is interrupted, in addition to the action (terminate, stop, hibernate) that is taken when that instance is interrupted. The following example event demonstrates a typical EC2 Spot Instance Interruption Warning.

{
    "version": "0",
    "id": "12345678-1234-1234-1234-123456789012",
    "detail-type": "EC2 Spot Instance Interruption Warning",
    "source": "aws.ec2",
    "account": "123456789012",
    "time": "yyyy-mm-ddThh:mm:ssZ",
    "region": "us-east-2",
    "resources": ["arn:aws:ec2:us-east-2:123456789012:instance/i-1234567890abcdef0"],
    "detail": {
        "instance-id": "i-1234567890abcdef0",
        "instance-action": "action"
    }
}

Capturing this event is as simple as creating a new event rule that matches the pattern of the event, where the source is aws.ec2 and the detail-type is EC2 Spot Instance Interruption Warning.

aws events put-rule --name "EC2SpotInterruptionWarning" \
    --event-pattern "{\"source\":[\"aws.ec2\"],\"detail-type\":[\"EC2 Spot Instance Interruption Warning\"]}" \
    --role-arn "arn:aws:iam::123456789012:role/MyRoleForThisRule"

When responding to these events, it’s common to retrieve additional details about the instance that is being interrupted, such as tags. When triggering additional API calls in response to Spot Interruption Warnings, keep in mind that AWS APIs implement request throttling. So, ensure that you are implementing exponential backoff and retries as needed and using paginated API calls.

For example, if multiple EC2 Spot Instances were interrupted, you could aggregate the instance-ids by temporarily storing them in DynamoDB and then combine the instance-ids into a single DescribeInstances API call. This allows you to retrieve details about multiple instances rather than implementing individual DescribeInstances API calls that may exceed API limits and result in throttling.
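
As a sketch of the first half of that pattern, a small Lambda function subscribed to the interruption warning rule could extract the instance ID from the event shown earlier and persist it for later aggregation. The DynamoDB table name and attribute names here are assumptions used only for illustration.

import boto3

dynamodb = boto3.resource('dynamodb')
# Hypothetical table, keyed on InstanceId, used to aggregate interruptions
table = dynamodb.Table('SpotInterruptionQueue')

def handler(event, context):
    # The interruption warning delivered by the event rule carries the
    # instance ID and action in its detail section
    instance_id = event['detail']['instance-id']
    instance_action = event['detail']['instance-action']

    table.put_item(Item={
        'InstanceId': instance_id,
        'InstanceAction': instance_action,
        'WarningTime': event['time']
    })

    return {'instance-id': instance_id}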

The following script demonstrates describing multiple EC2 Instances with a reusable paginator that can accept API-specific arguments. Additional examples can be found here.

import boto3

ec2 = boto3.client('ec2')

def paginate(method, **kwargs):
    client = method.__self__
    paginator = client.get_paginator(method.__name__)
    for page in paginator.paginate(**kwargs).result_key_iters():
        for item in page:
            yield item

def describe_instances(instance_ids):
    described_instances = []

    response = paginate(ec2.describe_instances, InstanceIds=instance_ids)

    for item in response:
        described_instances.append(item)

    return described_instances

if __name__ == "__main__":

    instance_ids = ['instance1','instance2', '...']

    print(describe_instances(instance_ids))

Container services

When running containerized workloads, it’s common to want to let the container orchestrator know when a node is going to be interrupted. This allows the node to be marked so that the orchestrator stops placing new containers on the interrupted node and drains running containers off of it. Amazon EKS and Amazon ECS are two popular services that manage containers on AWS, and each has a simple integration for handling interruptions.

Amazon EKS/Kubernetes

With Kubernetes workloads, including self-managed clusters and those running on Amazon EKS, use the AWS-maintained AWS Node Termination Handler to monitor for Spot Interruptions and make requests to the Kubernetes API to mark the node as non-schedulable. This project runs as a DaemonSet on your Kubernetes nodes. In addition to handling Spot Interruptions, it can also be configured to handle Scheduled Maintenance Events.

You can use kubectl to apply the termination handler with default options to your cluster with the following command.

kubectl apply -f https://github.com/aws/aws-node-termination-handler/releases/download/v1.5.0/all-resources.yaml

Amazon ECS

With Amazon ECS workloads you can enable Spot Instance draining by passing a configuration parameter to the ECS container agent. Once enabled, when a container instance is marked for interruption, ECS receives the Spot Instance interruption notice and places the instance in DRAINING status. This prevents new tasks from being scheduled for placement on the container instance. If there are container instances in the cluster that are available, replacement service tasks are started on those container instances to maintain your desired number of running tasks.

You can enable this setting by configuring the user data on your container instances to apply the setting at boot time.

#!/bin/bash
cat <<'EOF' >> /etc/ecs/ecs.config
ECS_CLUSTER=MyCluster
ECS_ENABLE_SPOT_INSTANCE_DRAINING=true
ECS_CONTAINER_STOP_TIMEOUT=120s
EOF

Inside a Container Running on Amazon ECS

When an ECS container instance is interrupted and marked as DRAINING, running tasks are stopped on the instance. When these tasks are stopped, a SIGTERM signal is sent to the running task, and ECS waits up to 2 minutes before forcefully stopping the task, resulting in a SIGKILL signal sent to the running container. This 2-minute window is configurable through a stopTimeout container timeout option, or through ECS Agent Configuration as shown in the prior code, giving you flexibility within your container to handle the interruption. Setting this value to more than 120 seconds does not prevent your instance from being interrupted after the 2-minute warning, so I recommend setting it to 120 seconds or less.
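
If you prefer to manage the window per task rather than through the agent configuration, stopTimeout can also be set in the container definition. The following boto3 sketch registers a task definition with a 110-second stop timeout; the family name, image, and memory values are assumptions.

import boto3

ecs = boto3.client('ecs')

ecs.register_task_definition(
    family='spot-aware-task',          # placeholder family name
    requiresCompatibilities=['EC2'],
    containerDefinitions=[
        {
            'name': 'app',
            'image': '123456789012.dkr.ecr.us-east-1.amazonaws.com/app:latest',
            'memory': 512,
            'essential': True,
            # Keep this at or below 120 seconds so cleanup can finish
            # before the Spot two-minute warning expires
            'stopTimeout': 110
        }
    ]
)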

You can capture the SIGTERM signal within your containerized applications. This allows you to perform actions such as preventing the processing of new work, checkpointing the progress of a batch job, or gracefully exiting the application to complete tasks such as ensuring database connections are properly closed.

The following example Golang application shows how you can capture a SIGTERM signal, perform cleanup actions, and gracefully exit an application running within a container.

package main

import (
    "os"
    "os/signal"
    "syscall"
    "time"
    "log"
)

func cleanup() {
    log.Println("Cleaning Up and Exiting")
    time.Sleep(10 * time.Second)
    os.Exit(0)
}

func do_work() {
    log.Println("Working ...")
    time.Sleep(10 * time.Second)
}

func main() {
    log.Println("Application Started")

    // Use a buffered channel, as recommended for signal.Notify
    signalChannel := make(chan os.Signal, 1)
    signal.Notify(signalChannel, os.Interrupt, syscall.SIGTERM)

    // Capture SIGTERM once, in a single goroutine
    go func() {
        sig := <-signalChannel
        switch sig {
        case syscall.SIGTERM:
            log.Println("SIGTERM Captured - Calling Cleanup")
            cleanup()
        }
    }()

    for {
        do_work()
    }
}

EC2 Spot Instances are a great fit for containerized workloads because containers can flexibly run on instances from multiple instance families, generations, and sizes. Whether you’re using Amazon ECS, Amazon EKS, or your own self-hosted orchestrator, use the patterns and tools discussed in this section to handle interruptions. Existing projects such as the AWS Node Termination Handler are thoroughly tested and vetted by multiple customers, and remove the need for investment in your own customized automation.

Implement checkpointing

For some workloads, interruptions can be very costly. This is the case when an application must restart work on a replacement instance, especially if it must restart that work from the beginning. This is common in batch workloads, sustained load testing scenarios, and machine learning training.

To lower the cost of interruption, investigate patterns for implementing checkpointing within your application. Checkpointing means that as work is completed within your application, progress is persisted externally. So, if work is interrupted, it can be restarted from where it left off rather than from the beginning. This is similar to saving your progress in a video game and restarting from the last save point vs. the start of the level. When working with an interruptible service such as EC2 Spot, resuming progress from the latest checkpoint can significantly reduce the cost of being interrupted, and reduce any delays that interruptions could otherwise incur.

For some workloads, enabling checkpointing may be as simple as setting a configuration option to save progress externally (Amazon S3 is often a great place to store checkpointing data). For example, when running an object detection model training job on Amazon SageMaker with Managed Spot Training, enabling checkpointing is as simple as passing the right configuration to the training job.

The following example demonstrates how to configure a SageMaker Estimator to train an object detection model with checkpointing enabled. You can access the complete example notebook here.

od_model = sagemaker.estimator.Estimator(
    image_name=training_image,
    base_job_name='SPOT-object-detection-checkpoint',
    role=role, 
    train_instance_count=1, 
    train_instance_type='ml.p3.2xlarge',
    train_volume_size = 50,
    train_use_spot_instances=True,
    train_max_wait=3600,
    train_max_run=3600,
    input_mode= 'File',
    checkpoint_s3_uri="s3://{}/{}".format(s3_checkpoint_path, uuid.uuid4()),
    output_path=s3_output_location,
    sagemaker_session=sess)

For other workloads, enabling checkpointing may require extending your custom framework, or a public framework, to persist data externally, and then reload that data when an instance is replaced and work is resumed.

For example, MXNet, a popular open source library for deep learning, has a mechanism for executing custom callbacks during the execution of deep learning jobs. Custom callbacks can be invoked at the completion of each training epoch, and can perform custom actions such as saving checkpointing data to S3. Your deep learning training container can then be configured to look for any saved checkpointing data once restarted, and load that data locally so that training continues from the latest checkpoint.
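
As a rough sketch of that idea, an epoch-end callback could wrap MXNet's built-in checkpointing and copy the resulting files to S3. The bucket name, local prefix, and checkpoint file-naming pattern below are assumptions; verify them against the files your version of MXNet actually writes.

import boto3
import mxnet as mx

s3 = boto3.client('s3')

BUCKET = 'my-training-checkpoints'   # placeholder bucket name
PREFIX = '/tmp/model'                # local checkpoint file prefix

def s3_checkpoint_callback(period=1):
    # do_checkpoint saves the symbol and per-epoch parameter files
    # locally at the end of each epoch
    save_locally = mx.callback.do_checkpoint(PREFIX, period)

    def _callback(epoch, symbol, arg_params, aux_params):
        save_locally(epoch, symbol, arg_params, aux_params)
        # File name pattern is an assumption; adjust it to match the
        # checkpoint files produced by do_checkpoint in your environment
        params_file = '%s-%04d.params' % (PREFIX, epoch + 1)
        s3.upload_file(params_file, BUCKET,
                       'checkpoints/%s' % params_file.split('/')[-1])

    return _callback

# Hypothetical usage with an mx.mod.Module training loop:
# module.fit(train_iter, num_epoch=10,
#            epoch_end_callback=s3_checkpoint_callback())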

By enabling checkpointing, you ensure that your workload is resilient to any interruptions that occur. Checkpointing can help to minimize data loss and the impact of an interruption on your workload by saving state and recovering from a saved state. When using existing frameworks, or developing your own framework, look for ways to integrate checkpointing so that your workloads have the flexibility to run on EC2 Spot Instances.

Track the right metrics

The best practices discussed in this blog can help you build reliable and scalable architectures that are resilient to Spot Interruptions. Architecting for fault tolerance is the best way you can ensure that your applications are resilient to interruptions, and is where you should focus most of your energy. Monitoring your services is important because it helps ensure that best practices are being effectively implemented. It’s critical to pick metrics that reflect availability and reliability, and not get caught up tracking metrics that do not directly correlate to service availability and reliability.

Some customers choose to track Spot Interruptions, which can be useful when done for the right reasons, such as evaluating tolerance to interruptions or conducting testing. However, frequency of interruption, or number of interruptions of a particular instance type, are examples of metrics that do not directly reflect the availability or reliability of your applications. Since Spot interruptions can fluctuate dynamically based on overall Spot Instance availability and demand for On-Demand Instances, tracking interruptions often results in misleading conclusions.

Rather than tracking interruptions, look to track metrics that reflect the true reliability and availability of your service.

Conclusion

The best practices we’ve discussed, when followed, can help you run your stateless, scalable, and fault tolerant workloads at significant savings with EC2 Spot Instances. Architect your applications to be fault tolerant and resilient to failure by using the Well-Architected Framework. Lean on Spot Integrated Services such as EC2 Auto Scaling, AWS Batch, Amazon ECS, and Amazon EKS to handle the provisioning and replacement of interrupted instances. Be flexible with your instance selections by choosing instance types across multiple families, sizes, and Availability Zones. Use the capacity-optimized allocation strategy to launch instances from the Spot Instance pools with the most available capacity. Lastly, track the right metrics that represent the true availability of your application. These best practices help you safely optimize and scale your workloads with EC2 Spot Instances.

What is a cyber range and how do you build one on AWS?

Post Syndicated from John Formento, Jr. original https://aws.amazon.com/blogs/security/what-is-cyber-range-how-do-you-build-one-aws/

In this post, we provide advice on how you can build a current cyber range using AWS services.

Conducting security incident simulations is a valuable exercise for organizations. As described in the AWS Security Incident Response Guide, security incident response simulations (SIRS) are useful tools to improve how an organization handles security events. These simulations can be tabletop sessions, individualized labs, or full team exercises conducted using a cyber range.

A cyber range is an isolated virtual environment used by security engineers, researchers, and enthusiasts to practice their craft and experiment with new techniques. Traditionally, these ranges were developed on premises, but on-prem ranges can be expensive to build and maintain (and do not reflect the new realities of cloud architectures).

In this post, we share concepts for building a cyber range in AWS. First, we cover the networking components of a cyber range, then how to control access to the cyber range. We also explain how to make the exercise realistic and how to integrate your own tools into a cyber range on AWS. As we go through each component, we relate them back to AWS services.

Using AWS to build a cyber range provides flexibility, since you only pay for services when they are in use. Environments can also be templated to a certain degree, which makes creation and teardown easier. This allows you to iterate on your design and build a cyber range that is as close to your production environment as possible.

Networking

Designing the network architecture is a critical component of building a cyber range. The cyber range must be an isolated environment so you have full control over it and keep the live environment safe. The purpose of the cyber range is to be able to play with various types of malware and malicious code, so keeping it separate from live environments is necessary. That being said, the range should simulate closely or even replicate real-world environments that include applications on the public internet, as well as internal systems and defenses.

There are several services you can use to create an isolated cyber range in AWS:

  • Amazon Virtual Private Cloud (Amazon VPC), which lets you provision a logically isolated section of AWS. This is where you can launch AWS resources in a virtual network that you define.
    • Traffic mirroring is an Amazon VPC feature that you can use to copy network traffic from an elastic network interface of Amazon Elastic Compute Cloud (Amazon EC2) instances.
  • AWS Transit Gateway, which is a service that enables you to connect your Amazon VPCs and your on-premises networks to a single gateway.
  • Interface VPC endpoints, which enable you to privately connect your VPC to supported AWS services and VPC endpoint services powered by AWS PrivateLink. You’re able to do this without an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection.

Amazon VPC provides the fundamental building blocks to create the isolated software defined network for your cyber range. With a VPC, you have fine grained control over IP CIDR ranges, subnets, and routing. When replicating real-world environments, you want the ability to establish communications between multiple VPCs. By using AWS Transit Gateway, you can add or subtract an environment from your cyber range and route traffic between VPCs. Since the cyber range is isolated from the public internet, you need a way to connect to the various AWS services without leaving the cyber range’s isolated network. VPC endpoints can be created for the AWS services so they can function without an internet connection.
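
To make that concrete, the following boto3 sketch creates an isolated VPC with no internet gateway and adds an interface endpoint so instances can still reach an AWS service privately. The CIDR ranges, Region, and service name are assumptions, and a real range would add further endpoints and security groups.

import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')

# Isolated VPC: no internet gateway or NAT device is ever attached
vpc = ec2.create_vpc(CidrBlock='10.100.0.0/16')
vpc_id = vpc['Vpc']['VpcId']

subnet = ec2.create_subnet(VpcId=vpc_id, CidrBlock='10.100.1.0/24')
subnet_id = subnet['Subnet']['SubnetId']

# Interface endpoint so Systems Manager can be reached without internet
# access (ssmmessages and ec2messages endpoints are also needed for
# Session Manager)
ec2.create_vpc_endpoint(
    VpcEndpointType='Interface',
    VpcId=vpc_id,
    ServiceName='com.amazonaws.us-east-1.ssm',
    SubnetIds=[subnet_id],
    PrivateDnsEnabled=True
)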

A network TAP (test access point) is an external monitoring device that mirrors traffic passing between network nodes. A network TAP can be hardware or virtual; it sends traffic to security and monitoring tools. Though it's unconventional in a typical architecture, routing all traffic through an EC2 Nitro instance enables you to use traffic mirroring to provide the network TAP functionality.
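
As a hedged sketch of that pattern, the Traffic Mirroring APIs let you copy an instance's traffic to the network interface of a monitoring appliance. The ENI IDs below are placeholders.

import boto3

ec2 = boto3.client('ec2')

# ENI of the monitoring/TAP appliance (for example, a Nitro-based EC2
# instance running your IDS) - placeholder ID
target = ec2.create_traffic_mirror_target(
    NetworkInterfaceId='eni-00000000000000001'
)

# Accept all traffic in both directions
mirror_filter = ec2.create_traffic_mirror_filter()
filter_id = mirror_filter['TrafficMirrorFilter']['TrafficMirrorFilterId']
for direction in ('ingress', 'egress'):
    ec2.create_traffic_mirror_filter_rule(
        TrafficMirrorFilterId=filter_id,
        TrafficDirection=direction,
        RuleNumber=100,
        RuleAction='accept',
        SourceCidrBlock='0.0.0.0/0',
        DestinationCidrBlock='0.0.0.0/0'
    )

# Mirror traffic from the source instance's ENI to the target
ec2.create_traffic_mirror_session(
    NetworkInterfaceId='eni-00000000000000002',   # placeholder source ENI
    TrafficMirrorTargetId=target['TrafficMirrorTarget']['TrafficMirrorTargetId'],
    TrafficMirrorFilterId=filter_id,
    SessionNumber=1
)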

Accessing the system

Due to the isolated nature of a cyber range, administrators and participants cannot rely on typical tools to connect to resources, such as the secure shell (SSH) protocol and Microsoft remote desktop protocol (RDP). However, there are several AWS services that help achieve this goal, depending on the type of access role.

There are typically two types of access roles for a cyber range: an administrator and a participant. The administrator – sometimes called the Black Team – is responsible for designing, building, and maintaining the environment. The participant – often called the Red Team (attacking), Blue Team (defending), or Purple Team (both) – then plays within the simulated environment.

For the administrator role, the following AWS services would be useful:

  • Amazon EC2, which provides scalable computing capacity on AWS.
  • AWS Systems Manager, which gives you visibility into and control of your infrastructure on AWS.

The administrator can use Systems Manager to establish SSH or RDP sessions, or to run commands manually or on a schedule. Files can be transferred in and out of the environment using a combination of S3 and VPC endpoints. Amazon EC2 can host all the web applications and security tools used by the participants, which are managed by the administrators with Systems Manager.

For the participant role, the most useful AWS service is Amazon WorkSpaces, which is a managed, secure cloud desktop service.

The participants are provided with virtual desktops that are contained in the isolated cyber range. Here, they either initiate an attack on resources in the environment or defend against the attack. With Amazon WorkSpaces, participants can use the same operating system environment they are accustomed to in the real world, while still being fully controlled and isolated within the cyber range.

Realism

A cyber range must be realistic enough to provide a satisfactory experience for its participants. This realism must factor in tactics, techniques, procedures, communications, toolsets, and more. Constructing a cyber range on AWS gives the builder full control of the environment. It also means the environment is repeatable and auditable.

It is important to replicate the toolsets that are used in real life. By creating re-usable “Golden” Amazon Machine Images (AMIs) that contain tools and configurations that would typically be installed on machines, a builder can easily slot the appropriate systems into an environment. You can also use AWS PrivateLink – a feature of Amazon VPC – to establish connections from your isolated environment to customer-defined outside tools.

To emulate the tactics of an adversary, you can replicate a scaled-down, functional copy of the internet. Using private hosted zones in Amazon Route 53, you can use a “.” record to control name resolution for the entire “internet”. Alternatively, you can use a Route 53 Resolver rule to forward all requests for a specific domain to your name servers. With these strategies, you can create adversaries that act from known malicious domains to test your defense capabilities. You can also use Amazon VPC route tables to send all traffic to a centralized web server under your control. This web server can respond as a variety of websites to emulate the internet.
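
For the Resolver-based approach, a hedged boto3 sketch might look like the following. The outbound Resolver endpoint ID, domain name, and target IP address are placeholders for values in your own range.

import boto3

resolver = boto3.client('route53resolver')

# Forward all lookups for a "known malicious" domain to a name server
# you control inside the cyber range (all values are placeholders)
resolver.create_resolver_rule(
    CreatorRequestId='cyber-range-forward-rule-1',
    Name='forward-malicious-domain',
    RuleType='FORWARD',
    DomainName='malicious-example.com',
    TargetIps=[{'Ip': '10.100.1.10', 'Port': 53}],
    ResolverEndpointId='rslvr-out-0000000000000000'
)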

Logging and monitoring

It is important for responders to exercise using the same tools and techniques that they would use in the real world. AWS VPC traffic mirroring is an effective way to utilize many IDS products. Our documentation provides guidance for using popular open source tools such as Zeek and Suricata. Additionally, responders can leverage any network monitoring tools that support VXLAN.

You can interact with the tools outside of the VPC by using AWS PrivateLink. PrivateLink provides secure connectivity to resources outside of a network, without the need for firewall rules or route tables. PrivateLink also enables integration with many AWS Partner offerings.

You can use Amazon CloudWatch Logs to aggregate operating system logs, application logs, and logs from AWS resources. You can also easily share CloudWatch Logs with a dedicated security logging account.

In addition, if there are third-party tools that you currently use, you can leverage the AWS Marketplace to easily procure the specific tools and install them within your AWS account.

Summary

In this post, I covered what a cyber range is, the value of using one, and how to think about creating one using AWS services. AWS provides a great platform to build a cost effective cyber range that can be used to bolster your security practices.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

John Formento, Jr.

John is a Solutions Architect at Amazon Web Services. He helps large enterprises achieve their goals by architecting secure and scalable solutions on the AWS Cloud. John holds 5 AWS certifications including AWS Certified Solutions Architect – Professional and DevOps Engineer – Professional.

Author

Adam Cerini

Adam is a Senior Solutions Architect with Amazon Web Services. He focuses on helping Public Sector customers architect scalable, secure, and cost effective systems. Adam holds 5 AWS certifications including AWS Certified Solutions Architect – Professional and AWS Certified Security – Specialist.

Serverless Architecture for a Web Scraping Solution

Post Syndicated from Dzidas Martinaitis original https://aws.amazon.com/blogs/architecture/serverless-architecture-for-a-web-scraping-solution/

If you are interested in serverless architecture, you may have read many contradictory articles and wonder if serverless architectures are cost effective or expensive. I would like to clear the air around the issue of effectiveness through an analysis of a web scraping solution. The use case is fairly simple: at certain times during the day, I want to run a Python script and scrape a website. The execution of the script takes less than 15 minutes. This is an important consideration, which we will come back to later. The project can be considered as a standard extract, transform, load process without a user interface, and can be packed into a self-contained function or a library.

Subsequently, we need an environment to execute the script. We have at least two options to consider: on-premises (such as on your local machine, a Raspberry Pi server at home, a virtual machine in a data center, and so on) or deploying it to the cloud. At first glance, the former option may feel more appealing: you have the infrastructure available free of charge, so why not use it? The main concern of an on-premises hosted solution is reliability: can you assure its availability in case of a power outage or a hardware or network failure? Additionally, does your local infrastructure support continuous integration and continuous deployment (CI/CD) tools to eliminate any manual intervention? With these two constraints in mind, I will continue the analysis of the solutions in the cloud rather than on-premises.

Let’s start with the pricing of three cloud-based scenarios and go into details below.

Pricing table of three cloud-based scenarios

*The AWS Lambda free usage tier includes 1M free requests per month and 400,000 GB-seconds of compute time per month. Review AWS Lambda pricing.

Option #1

The first option, an instance of a virtual machine in AWS (called Amazon Elastic Compute Cloud or EC2), is the most primitive one. However, it definitely does not resemble a serverless architecture, so let’s consider it as a reference point or a baseline. This option is similar to an on-premises solution giving you full control of the instance, but you would need to manually spin up an instance, install your environment, set up a scheduler to execute your script at a specific time, and keep it running 24/7. And don’t forget the security (setting up a VPC, route tables, etc.). Additionally, you will need to monitor the health of the instance and maybe run manual updates.

Option #2

The second option is to containerize the solution and deploy it on Amazon Elastic Container Service (ECS). The biggest advantage to this is platform independence. Having a Docker file (a text document that contains all the commands you could call on the command line to assemble an image) with a copy of your environment and the script enables you to reuse the solution locally—on the AWS platform, or somewhere else. A huge advantage to running it on AWS is that you can integrate with other services, such as AWS CodeCommit, AWS CodeBuild, AWS Batch, etc. You can also benefit from discounted compute resources such as Amazon EC2 Spot instances.

Architecture of CloudWatch, Batch, ECR

The architecture, seen in the diagram above, consists of Amazon CloudWatch, AWS Batch, and Amazon Elastic Container Registry (ECR). CloudWatch allows you to create a trigger (such as starting a job when a code update is committed to a code repository) or a scheduled event (such as executing a script every hour). We want the latter: executing a job based on a schedule. When triggered, AWS Batch will fetch a pre-built Docker image from Amazon ECR and execute it in a predefined environment. AWS Batch is a free-of-charge service and allows you to configure the environment and resources needed for a task execution. It relies on ECS, which manages resources at the execution time. You pay only for the compute resources consumed during the execution of a task.
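
A hedged boto3 sketch of wiring the scheduled event to AWS Batch might look like this; the job queue ARN, job definition, and IAM role are placeholders you would replace with your own resources.

import boto3

events = boto3.client('events')

# Run the scraping job once per day
events.put_rule(
    Name='daily-web-scraper',
    ScheduleExpression='rate(1 day)'
)

# Target the Batch job queue; the role must allow CloudWatch Events /
# EventBridge to submit Batch jobs (ARNs below are placeholders)
events.put_targets(
    Rule='daily-web-scraper',
    Targets=[{
        'Id': 'scraper-batch-job',
        'Arn': 'arn:aws:batch:us-east-1:123456789012:job-queue/scraper-queue',
        'RoleArn': 'arn:aws:iam::123456789012:role/EventsInvokeBatchRole',
        'BatchParameters': {
            'JobDefinition': 'scraper-job-definition',
            'JobName': 'daily-scrape'
        }
    }]
)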

You may wonder where the pre-built Docker image came from. It was pulled from Amazon ECR, and now you have two options to store your Docker image there:

  • You can build a Docker image locally and upload it to Amazon ECR.
  • You just commit a few configuration files (such as Dockerfile, buildspec.yml, etc.) to AWS CodeCommit (a code repository) and build the Docker image on the AWS platform. This option, shown in the image below, allows you to build a full CI/CD pipeline. After updating a script file locally and committing the changes to a code repository on AWS CodeCommit, a CloudWatch event is triggered and AWS CodeBuild builds a new Docker image and commits it to Amazon ECR. When a scheduler starts a new task, it fetches the new image with your updated script file. If you feel like exploring further, or you want to actually implement this approach, please take a look at the example of the project on GitHub.

CodeCommit. CodeBuild, ECR

Option #3

The third option is based on AWS Lambda, which allows you to build a very lean infrastructure on demand, scales continuously, and has a generous monthly free tier. The major constraint of Lambda is that the execution time is capped at 15 minutes. If you have a task running longer than 15 minutes, you need to split it into subtasks and run them in parallel, or you can fall back to Option #2.

By default, Lambda gives you access to standard libraries (such as the Python Standard Library). In addition, you can build your own package to support the execution of your function or use Lambda Layers to gain access to external libraries or even external Linux based programs.

Lambda Layer

You can access AWS Lambda via the web console to create a new function, update your Lambda code, or execute it. However, if you go beyond the “Hello World” functionality, you may realize that online development is not sustainable. For example, if you want to access external libraries from your function, you need to archive them locally, upload to Amazon Simple Storage Service (Amazon S3), and link it to your Lambda function.

One way to automate Lambda function development is to use the AWS Cloud Development Kit (AWS CDK), which is an open source software development framework to model and provision your cloud application resources using familiar programming languages. Initially, the setup and learning might feel strenuous; however, the benefits are worth it. To give you an example, please take a look at this Python class on GitHub, which creates a Lambda function, a CloudWatch event, IAM policies, and Lambda layers.

In summary, the AWS CDK allows you to have infrastructure as code, and all changes are stored in a code repository. For a deployment, AWS CDK builds an AWS CloudFormation template, which is a standard way to model infrastructure on AWS. Additionally, the AWS Serverless Application Model (SAM) allows you to test and debug your serverless code locally, meaning that you can indeed create a continuous integration.
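
To give a feel for what that looks like, here is a minimal CDK sketch in Python (written against CDK v2 style imports; the asset path, handler name, and schedule are assumptions rather than the project's actual code).

from aws_cdk import (
    Duration,
    Stack,
    aws_events as events,
    aws_events_targets as targets,
    aws_lambda as _lambda,
)
from constructs import Construct

class ScraperStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        # Lambda function packaged from a local directory (assumed path)
        scraper = _lambda.Function(
            self, 'ScraperFunction',
            runtime=_lambda.Runtime.PYTHON_3_9,
            handler='scraper.handler',
            code=_lambda.Code.from_asset('lambda'),
            timeout=Duration.minutes(15)
        )

        # Scheduled rule that invokes the function every hour
        rule = events.Rule(
            self, 'ScraperSchedule',
            schedule=events.Schedule.rate(Duration.hours(1))
        )
        rule.add_target(targets.LambdaFunction(scraper))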

See an example of a Lambda-based web scraper on GitHub.

Conclusion

In this blog post, we reviewed two serverless architectures for a web scraper on AWS cloud. Additionally, we have explored the ways to implement a CI/CD pipeline in order to avoid any future manual interventions.

Using Amazon EFS for AWS Lambda in your serverless applications

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-amazon-efs-for-aws-lambda-in-your-serverless-applications/

Serverless applications are event-driven, using ephemeral compute functions to integrate services and transform data. While AWS Lambda includes a 512-MB temporary file system for your code, this is an ephemeral scratch resource not intended for durable storage.

Amazon EFS is a fully managed, elastic, shared file system designed to be consumed by other AWS services, such as Lambda. With the release of Amazon EFS for Lambda, you can now easily share data across function invocations. You can also read large reference data files, and write function output to a persistent and shared store. There is no additional charge for using file systems from your Lambda function within the same VPC.

EFS for Lambda makes it simpler to use a serverless architecture to implement many common workloads. It opens new capabilities, such as building and importing large code libraries directly into your Lambda functions. Since the code is loaded dynamically, you can also ensure that the latest version of these libraries is always used by every new execution environment. For appending to existing files, EFS is also a preferable option to Amazon S3.

This blog post shows how to enable EFS for Lambda in your AWS account, and walks through some common use-cases.

Capabilities and behaviors of Lambda with EFS

EFS is built to scale on demand to petabytes of data, growing and shrinking automatically as files are written and deleted. When used with Lambda, your code has low-latency access to a file system where data is persisted after the function terminates.

EFS is a highly reliable NFS-based regional service, with all data stored durably across multiple Availability Zones. It is cost-optimized, due to no provisioning requirements and no purchase commitments. It uses built-in lifecycle management to optimize between an SSD-performance class and an infrequent access class that offers 92% lower cost.

EFS offers two performance modes – general purpose and MaxIO. General purpose is suitable for most Lambda workloads, providing lower operational latency and higher performance for individual files.

You also choose between two throughput modes – bursting and provisioned. The bursting mode uses a credit system to determine when a file system can burst. With bursting, your throughput is calculated based upon the amount of data you are storing. Provisioned throughput is useful when you need more throughput than provided by the bursting mode. Total throughput available is divided across the number of concurrent Lambda invocations.

The Lambda service mounts EFS file systems when the execution environment is prepared. This adds minimal latency when the function is invoked for the first time, often within hundreds of milliseconds. When the execution environment is already warm from previous invocations, the EFS mount is already available.

EFS can be used with Provisioned Concurrency for Lambda. When the reserved capacity is prepared, the Lambda service also configures and mounts the EFS file system. Since Provisioned Concurrency executes any initialization code, any libraries or packages consumed from EFS at this point are downloaded. In this use-case, it’s recommended to use provisioned throughput when configuring EFS.

The EFS file system is shared across Lambda functions as it scales up the number of concurrent executions. As files are written by one instance of a Lambda function, all other instances can access and modify this data, depending upon the access point permissions. The EFS file system scales with your Lambda functions, supporting up to 25,000 concurrent connections.

Creating an EFS file system

Configuring EFS for Lambda is straightforward. I show how to do this in the AWS Management Console but you can also use the AWS CLI, AWS SDK, AWS Serverless Application Model (AWS SAM), and AWS CloudFormation. EFS file systems are always created within a customer VPC, so Lambda functions using the EFS file system must all reside in the same VPC.

To create an EFS file system:

  1. Navigate to the EFS console.
  2. Choose Create File System.
    EFS: Create File System
  3. On the Configure network access page, select your preferred VPC. Only resources within this VPC can access this EFS file system. Accept the default mount targets, and choose Next Step.
  4. On Configure file system settings, you can choose to enable encryption of data at rest. Review this setting, then accept the other defaults and choose Next Step. This uses bursting mode instead of provisioned throughput.
  5. On the Configure client access page, choose Add access point.
    EFS: Add access point
  6. Enter the following parameters. This configuration creates a file system with open read/write permissions – read more about settings to secure your access points. Choose Next Step.
    EFS: Access points
  7. On the Review and create page, check your settings and choose Create File System.
  8. In the EFS console, you see the new file system and its configuration. Wait until the Mount target state changes to Available before proceeding to the next steps.

Alternatively, you can use CloudFormation to create the EFS access point. With the AWS::EFS::AccessPoint resource, the preceding configuration is defined as follows:

  AccessPointResource:
    Type: 'AWS::EFS::AccessPoint'
    Properties:
      FileSystemId: !Ref FileSystemResource
      PosixUser:
        Uid: "1000"
        Gid: "1000"
      RootDirectory:
        CreationInfo:
          OwnerGid: "1000"
          OwnerUid: "1000"
          Permissions: "0777"
        Path: "/lambda"

For more information, see the example setup template in the code repository.
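
If you are scripting the setup instead, the following boto3 sketch creates an equivalent file system and access point. The creation token is an assumption, and you still need to create mount targets and security groups in your VPC before Lambda can use the file system.

import boto3

efs = boto3.client('efs')

# General purpose performance mode and bursting throughput, matching
# the defaults used in the console walkthrough above
fs = efs.create_file_system(
    CreationToken='lambda-efs-example',
    PerformanceMode='generalPurpose',
    Encrypted=True
)

# Access point matching the CloudFormation example: POSIX user 1000:1000
# and a /lambda root directory with open permissions
efs.create_access_point(
    FileSystemId=fs['FileSystemId'],
    PosixUser={'Uid': 1000, 'Gid': 1000},
    RootDirectory={
        'Path': '/lambda',
        'CreationInfo': {
            'OwnerUid': 1000,
            'OwnerGid': 1000,
            'Permissions': '0777'
        }
    }
)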

Working with AWS Cloud9 and Amazon EC2

You can mount EFS access points on Amazon EC2 instances. This can be useful for browsing file systems contents and downloading files from other locations. The EFS console shows customized mount instructions directly under each created file system:

EFS customized mount instructions

The instance must have access to the same security group and reside in the same VPC as the EFS file system. After connecting via SSH to the EC2 instance, you mount the EFS mount target to a directory. You can also mount EFS in AWS Cloud9 instances using the terminal window.

Any files you write into the EFS file system are available to any Lambda functions using the same EFS file system. Similarly, any files written by Lambda functions are available to the EC2 instance.

Sharing large code packages with Lambda

EFS is useful for sharing software packages or binaries that are otherwise too large for Lambda layers. You can copy these to EFS and have Lambda use these packages as if they were installed in the Lambda deployment package.

For example, on EFS you can install Puppeteer, which runs a headless Chromium browser, using the following script run on an EC2 instance or AWS Cloud9 terminal:

  mkdir node && cd node
  npm init -y
  npm i puppeteer --save

Building packages in EC2 for EFS

You can then use this package from a Lambda function connected to this folder in the EFS file system. You include the Puppeteer package with the mount path in the require declaration:

const puppeteer = require('/mnt/efs/node/node_modules/puppeteer')

In Node.js, to avoid changing declarations manually, you can add the EFS mount path to the Node.js module search path by using app-module-path. Lambda functions support a range of other runtimes, including Python, Java, and Go. Many other runtimes offer similar ways to add the EFS path to the list of default package locations.

There is an important difference between using packages in EFS compared with Lambda layers. When you use Lambda layers to include packages, these are downloaded to an immutable code package. Any changes to the underlying layer do not affect existing functions published using that layer.

Since EFS is a dynamic binding, any changes or upgrades to packages are available immediately to the Lambda function when the execution environment is prepared. This means you can output a build process to an EFS mount, and immediately consume any new versions of the build from a Lambda function.

Configuring AWS Lambda to use EFS

Lambda functions that access EFS must run from within a VPC. Read this guide to learn more about setting up Lambda functions to access resources from a VPC. There are also sample CloudFormation templates you can use to configure private and public VPC access.

The execution role for the Lambda function must provide access to the VPC and EFS. For development and testing purposes, this post uses the AWSLambdaVPCAccessExecutionRole and AmazonElasticFileSystemClientFullAccess managed policies in IAM. For production systems, you should use more restrictive policies to control access to EFS resources.

Once your Lambda function is configured to use a VPC, next configure EFS in Lambda:

  1. Navigate to the Lambda console and select your function from the list.
  2. Scroll down to the File system panel, and choose Add file system.
    EFS: Add file system
  3. In the File system configuration:
  • From the EFS file system dropdown, select the required file system. From the Access point dropdown, choose the required EFS access point.
  • In the Local mount path, enter the path your Lambda function uses to access this resource. Enter an absolute path.
  • Choose Save.
    EFS: Add file system

The File system panel now shows the configuration of the EFS mount, and the function is ready to use EFS. Alternatively, you can use an AWS Serverless Application Model (SAM) template to add the EFS configuration to a function resource:

AWSTemplateFormatVersion: '2010-09-09'
Resources:
  MyLambdaFunction:
    Type: AWS::Serverless::Function
    Properties:
	...
      FileSystemConfigs:
      - Arn: arn:aws:elasticfilesystem:us-east-1:xxxxxx:accesspoint/
fsap-123abcdef12abcdef
        LocalMountPath: /mnt/efs

To learn more, see the SAM documentation on this feature.
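
You can also attach the file system to an existing function from a script. The following boto3 sketch assumes the function is already configured for the same VPC as the EFS mount targets; the function name and access point ARN are placeholders.

import boto3

lambda_client = boto3.client('lambda')

lambda_client.update_function_configuration(
    FunctionName='my-efs-function',    # placeholder function name
    FileSystemConfigs=[{
        # Placeholder access point ARN
        'Arn': 'arn:aws:elasticfilesystem:us-east-1:123456789012:'
               'access-point/fsap-0123456789abcdef0',
        'LocalMountPath': '/mnt/efs'
    }]
)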

Example applications

You can view and download these examples from this GitHub repository. To deploy, follow the instructions in the repo’s README.md file.

1. Processing large video files

The first example uses EFS to process a 60-minute MP4 video and create screenshots for each second of the recording. It uses the FFmpeg Linux package to process the video. After copying the MP4 to the EFS file location, invoke the Lambda function to create a series of JPG frames. This uses the following code to execute FFmpeg and pass the EFS mount path and input file parameters:

const os = require('os')

const inputFile = process.env.INPUT_FILE
const efsPath = process.env.EFS_PATH

const { exec } = require('child_process')

const execPromise = async (command) => {
	console.log(command)
	return new Promise((resolve, reject) => {
		const ls = exec(command, function (error, stdout, stderr) {
		  if (error) {
		    console.log('Error: ', error)
		    reject(error)
		  }
		  console.log('stdout: ', stdout);
		  console.log('stderr: ' ,stderr);
		})
		
		ls.on('exit', function (code) {
		  console.log('Finished: ', code);
		  resolve()
		})
	})
}

// The Lambda handler
exports.handler = async function (eventObject, context) {
	await execPromise(`/opt/bin/ffmpeg -loglevel error -i ${efsPath}/${inputFile} -s 240x135 -vf fps=1 ${efsPath}/%d.jpg`)
}

In this example, the process writes more than 2000 individual JPG files back to the EFS file system during a single invocation:

Console output from sample application

2. Archiving large numbers of files

Using the output from the first application, the second example creates a single archive file from the JPG files. The code uses the Node.js archiver package for processing:

const outputFile = process.env.OUTPUT_FILE
const efsPath = process.env.EFS_PATH

const fs = require('fs')
const archiver = require('archiver')

// The Lambda handler
exports.handler = function (event) {

  const output = fs.createWriteStream(`${efsPath}/${outputFile}`)
  const archive = archiver('zip', {
    zlib: { level: 9 } // Sets the compression level.
  })
  
  output.on('close', function() {
    console.log(archive.pointer() + ' total bytes')
  })
  
  output.on('end', function() {
    console.log('Data has been drained')
  })
  
  archive.pipe(output)  

  // append files from a glob pattern
  archive.glob(`${efsPath}/*.jpg`)
  archive.finalize()
}

After executing this Lambda function, the resulting ZIP file is written back to the EFS file system:

Console output from second sample application.

3. Unzipping archives with a large number of files

The last example shows how to unzip an archive containing many files. This uses the Node.js unzipper package for processing:

const inputFile = process.env.INPUT_FILE
const efsPath = process.env.EFS_PATH
const destinationDir = process.env.DESTINATION_DIR

const fs = require('fs')
const unzipper = require('unzipper')

// The Lambda handler
exports.handler = function (event) {

  fs.createReadStream(`${efsPath}/${inputFile}`)
    .pipe(unzipper.Extract({ path: `${efsPath}/${destinationDir}` }))

}

Once this Lambda function is executed, the archive is unzipped into a destination directory in the EFS file system. This example shows the screenshots unzipped into the frames subdirectory:

Console output from third sample application.

Conclusion

EFS for Lambda allows you to share data across function invocations, read large reference data files, and write function output to a persistent and shared store. After configuring EFS, you provide the Lambda function with an access point ARN, allowing you to read and write to this file system. Lambda securely connects the function instances to the EFS mount targets in the same Availability Zone and subnet.

EFS opens a range of potential new use-cases for Lambda. In this post, I show how this enables you to access large code packages and binaries, and process large numbers of files. You can interact with the file system via EC2 or AWS Cloud9 and pass information to and from your Lambda functions.

EFS for Lambda is supported at launch in APN Partner solutions, including Epsagon, Lumigo, Datadog, HashiCorp Terraform, and Pulumi. To learn more about how to use EFS for Lambda, see the AWS News Blog post and read the documentation.

Introducing Instance Refresh for EC2 Auto Scaling

Post Syndicated from Ben Peven original https://aws.amazon.com/blogs/compute/introducing-instance-refresh-for-ec2-auto-scaling/

This post is contributed to by: Ran Sheinberg – Principal EC2 Spot SA, and Isaac Vallhonrat – Sr. EC2 Spot Specialist SA

Today, we are launching Instance Refresh. This is a new feature in EC2 Auto Scaling that enables automatic deployments of instances in Auto Scaling Groups (ASGs), in order to release new application versions or make infrastructure updates.

Amazon EC2 Auto Scaling is used for a wide variety of workload types and applications. EC2 Auto Scaling helps you maintain application availability through a rich feature set. This feature set includes integration with Elastic Load Balancing, automatically replacing unhealthy instances, balancing instances across Availability Zones, provisioning instances across multiple pricing options and instance types, dynamically adding and removing instances, and more.

Many customers use an immutable infrastructure approach. This approach encourages replacing EC2 instances to update the application or configuration, instead of deploying into EC2 instances that are already running. This can be done by baking code and software in golden Amazon Machine Images (AMIs), and rolling out new EC2 Instances that use the new AMI version. Another common pattern for rolling out application updates is changing the package version that the instance pulls when it boots (via updates to instance user data). Or, keeping that pointer static, and pushing a new version to the code repository or another type of artifact (container, package on Amazon S3) to be fetched by an instance when it boots and gets provisioned.

Until today, EC2 Auto Scaling customers used different methods for replacing EC2 instances inside EC2 Auto Scaling groups when a deployment or operating system update was needed. For example, UpdatePolicy within AWS CloudFormation, create_before_destroy lifecycle in Hashicorp Terraform, using AWS CodeDeploy, or even custom scripts that call the EC2 Auto Scaling API.

Customers told us that they want native deployment functionality built into EC2 Auto Scaling to take away the heavy lifting of custom solutions, or deployments that are initiated from outside of Auto Scaling groups.

Introducing Instance Refresh in EC2 Auto Scaling

You can trigger an Instance Refresh using the EC2 Auto Scaling groups Management Console, or use the new StartInstanceRefresh API in the AWS CLI or any AWS SDK. All you need to do is specify the percentage of healthy instances to keep in the group while the ASG terminates and launches instances, and the warm-up time, which is the time period that the ASG waits between the groups of instances that it refreshes. If your ASG is using Health Checks, the ASG waits for the instances in the group to be healthy before it continues to the next group of instances.

Instance Refresh in action

To get started with Instance Refresh in the AWS Management Console, click on an existing ASG in the EC2 Auto Scaling Management Console. Then click the Instance refresh tab.

When clicking the Start instance refresh button, I am presented with the following options:

start instance refresh

With the default configuration, the ASG works to keep 90% of the instances running and does not proceed to the next group of instances if that percentage is not kept. After each group, the ASG waits for the newly launched instances to transition into the healthy state, and for the 300-second warm-up time to pass, before proceeding to the next group of instances.

I can also initiate the same action from the AWS CLI by using the following code:

aws autoscaling start-instance-refresh --auto-scaling-group-name ASG-Instance-Refresh --preferences MinHealthyPercentage=90,InstanceWarmup=300

After initializing the instance refresh process, I can see ongoing instance refreshes in the console:

initialize instance refresh

The following image demonstrates how an active Instance refresh looks in the EC2 Instances console. Moreover, ASG strives to keep the capacity balanced between Availability Zones by terminating and launching instances in different Availability Zones in each group.

active instance refresh

Automate your workflow with Instance Refresh

You can now use this new functionality to create automations that work for your use-case.

To get started quickly, we created an example solution based on AWS Lambda. Visit the solution page on Github and see the deployment instructions.

Here’s an overview of what the solution contains and how it works:

  • An EC2 Auto Scaling group with two instances running
  • An EC2 Image Builder pipeline, set up to build and test an AMI
  • An SNS topic that would get notified when the image build completes
  • A Lambda function that is subscribed to the SNS topic, which gets triggered when the image build completes
  • The Lambda function gets the new AMI ID from the SNS notification, creates a new Launch Template version, and then triggers an Instance Refresh in the ASG, which starts the rolling update of instances.
  • Because you can configure the ASG with LaunchTemplateVersion = $Latest, every new instance that is launched by the Instance Refresh process uses the new AMI from the latest version of the Launch Template.

See the automation flow in the following diagram.

instance refresh automation flow
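
A hedged sketch of the core of that Lambda function follows. The launch template name, Auto Scaling group name, and the shape of the EC2 Image Builder SNS message are assumptions to adapt to your own pipeline.

import json
import boto3

ec2 = boto3.client('ec2')
autoscaling = boto3.client('autoscaling')

LAUNCH_TEMPLATE = 'my-launch-template'   # placeholder launch template name
ASG_NAME = 'my-asg'                      # placeholder Auto Scaling group name

def handler(event, context):
    # Extract the new AMI ID from the Image Builder SNS notification
    # (message structure is an assumption; adjust it to your pipeline)
    message = json.loads(event['Records'][0]['Sns']['Message'])
    ami_id = message['outputResources']['amis'][0]['image']

    # Create a new launch template version that points at the new AMI
    ec2.create_launch_template_version(
        LaunchTemplateName=LAUNCH_TEMPLATE,
        SourceVersion='$Latest',
        LaunchTemplateData={'ImageId': ami_id}
    )

    # Start a rolling replacement of the instances in the ASG
    autoscaling.start_instance_refresh(
        AutoScalingGroupName=ASG_NAME,
        Preferences={'MinHealthyPercentage': 90, 'InstanceWarmup': 300}
    )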

Conclusion

We hope that the new Instance Refresh functionality in your ASGs allows for a more streamlined approach to launching and updating your application deployments running on EC2. You can now create automations that fit your use case. This allows you to more easily refresh the EC2 instances running in your Auto Scaling groups when deploying a new version of your application or when you must replace the AMI being used. Visit the user guide to learn more and get started.