Tag Archives: EC2

UI Testing at Scale with AWS Lambda

Post Syndicated from Stas Neyman original https://aws.amazon.com/blogs/devops/ui-testing-at-scale-with-aws-lambda/

This is a guest blog post by Wes Couch and Kurt Waechter from the Blackboard Internal Product Development team about their experience using AWS Lambda.

One year ago, one of our UI test suites took hours to run. Last month, it took 16 minutes. Today, it takes 39 seconds. Here’s how we did it.

The backstory:

Blackboard is a global leader in delivering robust and innovative education software and services to clients in higher education, government, K12, and corporate training. We have a large product development team working across the globe in at least 10 different time zones, with an internal tools team providing support for quality and workflows. We have been using Selenium Webdriver to perform automated cross-browser UI testing since 2007. Because we are now practicing continuous delivery, the automated UI testing challenge has grown due to the faster release schedule. On top of that, every commit made to each branch triggers an execution of our automated UI test suite. If you have ever implemented an automated UI testing infrastructure, you know that it can be very challenging to scale and maintain. Although there are services that are useful for testing different browser/OS combinations, they don’t meet our scale needs.

It used to take three hours to synchronously run our functional UI suite, which revealed the obvious need for parallel execution. Previously, we used Mesos to orchestrate a Selenium Grid Docker container for each test run. This way, we were able to run eight concurrent threads for test execution, which took an average of 16 minutes. Although this setup is fine for a single workflow, the cracks started to show when we reached the scale required for Blackboard’s mature product lines. Going beyond eight concurrent sessions on a single container introduced performance problems that impact the reliability of tests (for example, issues in Webdriver or the browser popping up frequently). We tried Mesos and considered Kubernetes for Selenium Grid orchestration, but the answer to scaling a Selenium Grid was to think smaller, not larger. This led to our breakthrough with AWS Lambda.

The solution:

We started using AWS Lambda for UI testing because it doesn’t require costly infrastructure or countless man hours to maintain. The steps we outline in this blog post took one work day, from inception to implementation. By simply packaging the UI test suite into a Lambda function, we can execute these tests in parallel on a massive scale. We use a custom JUnit test runner that invokes the Lambda function with a request to run each test from the suite. The runner then aggregates the results returned from each Lambda test execution.

Selenium is the industry standard for testing UI at scale. Although there are other options to achieve the same thing in Lambda, we chose this mature suite of tools. Selenium is backed by Google, Firefox, and others to help the industry drive their browsers with code. This makes Lambda and Selenium a compelling stack for achieving UI testing at scale.

Making Chrome Run in Lambda

Currently, Chrome for Linux will not run in Lambda due to an absent mount point. By rebuilding Chrome with a slight modification, as Marco Lüthy originally demonstrated, you can run it inside Lambda anyway! It took about two hours to build the current master branch of Chromium to build on a c4.4xlarge. Unfortunately, the current version of ChromeDriver, 2.33, does not support any version of Chrome above 62, so we’ll be using Marco’s modified version of version 60 for the near future.

Required System Libraries

The Lambda runtime environment comes with a subset of common shared libraries. This means we need to include some extra libraries to get Chrome and ChromeDriver to work. Anything that exists in the java resources folder during compile time is included in the base directory of the compiled jar file. When this jar file is deployed to Lambda, it is placed in the /var/task/ directory. This allows us to simply place the libraries in the java resources folder under a folder named lib/ so they are right where they need to be when the Lambda function is invoked.

To get these libraries, create an EC2 instance and choose the Amazon Linux AMI.

Next, use ssh to connect to the server. After you connect to the new instance, search for the libraries to find their locations.

sudo find / -name libgconf-2.so.4
sudo find / -name libORBit-2.so.0

Now that you have the locations of the libraries, copy these files from the EC2 instance and place them in the java resources folder under lib/.

Packaging the Tests

To deploy the test suite to Lambda, we used a simple Gradle tool called ShadowJar, which is similar to the Maven Shade Plugin. It packages the libraries and dependencies inside the jar that is built. Usually test dependencies and sources aren’t included in a jar, but for this instance we want to include them. To include the test dependencies, add this section to the build.gradle file.

shadowJar {
   from sourceSets.test.output
   configurations = [project.configurations.testRuntime]
}

Deploying the Test Suite

Now that our tests are packaged with the dependencies in a jar, we need to get them into a running Lambda function. We use  simple SAM  templates to upload the packaged jar into S3, and then deploy it to Lambda with our settings.

{
   "AWSTemplateFormatVersion": "2010-09-09",
   "Transform": "AWS::Serverless-2016-10-31",
   "Resources": {
       "LambdaTestHandler": {
           "Type": "AWS::Serverless::Function",
           "Properties": {
               "CodeUri": "./build/libs/your-test-jar-all.jar",
               "Runtime": "java8",
               "Handler": "com.example.LambdaTestHandler::handleRequest",
               "Role": "<YourLambdaRoleArn>",
               "Timeout": 300,
               "MemorySize": 1536
           }
       }
   }
}

We use the maximum timeout available to ensure our tests have plenty of time to run. We also use the maximum memory size because this ensures our Lambda function can support Chrome and other resources required to run a UI test.

Specifying the handler is important because this class executes the desired test. The test handler should be able to receive a test class and method. With this information it will then execute the test and respond with the results.

public LambdaTestResult handleRequest(TestRequest testRequest, Context context) {
   LoggerContainer.LOGGER = new Logger(context.getLogger());
  
   BlockJUnit4ClassRunner runner = getRunnerForSingleTest(testRequest);
  
   Result result = new JUnitCore().run(runner);

   return new LambdaTestResult(result);
}

Creating a Lambda-Compatible ChromeDriver

We provide developers with an easily accessible ChromeDriver for local test writing and debugging. When we are running tests on AWS, we have configured ChromeDriver to run them in Lambda.

To configure ChromeDriver, we first need to tell ChromeDriver where to find the Chrome binary. Because we know that ChromeDriver is going to be unzipped into the root task directory, we should point the ChromeDriver configuration at that location.

The settings for getting ChromeDriver running are mostly related to Chrome, which must have its working directories pointed at the tmp/ folder.

Start with the default DesiredCapabilities for ChromeDriver, and then add the following settings to enable your ChromeDriver to start in Lambda.

public ChromeDriver createLambdaChromeDriver() {
   ChromeOptions options = new ChromeOptions();

   // Set the location of the chrome binary from the resources folder
   options.setBinary("/var/task/chrome");

   // Include these settings to allow Chrome to run in Lambda
   options.addArguments("--disable-gpu");
   options.addArguments("--headless");
   options.addArguments("--window-size=1366,768");
   options.addArguments("--single-process");
   options.addArguments("--no-sandbox");
   options.addArguments("--user-data-dir=/tmp/user-data");
   options.addArguments("--data-path=/tmp/data-path");
   options.addArguments("--homedir=/tmp");
   options.addArguments("--disk-cache-dir=/tmp/cache-dir");
  
   DesiredCapabilities desiredCapabilities = DesiredCapabilities.chrome();
   desiredCapabilities.setCapability(ChromeOptions.CAPABILITY, options);
  
   return new ChromeDriver(desiredCapabilities);
}

Executing Tests in Parallel

You can approach parallel test execution in Lambda in many different ways. Your approach depends on the structure and design of your test suite. For our solution, we implemented a custom test runner that uses reflection and JUnit libraries to create a list of test cases we want run. When we have the list, we create a TestRequest object to pass into the Lambda function that we have deployed. In this TestRequest, we place the class name, test method, and the test run identifier. When the Lambda function receives this TestRequest, our LambdaTestHandler generates and runs the JUnit test. After the test is complete, the test result is sent to the test runner. The test runner compiles a result after all of the tests are complete. By executing the same Lambda function multiple times with different test requests, we can effectively run the entire test suite in parallel.

To get screenshots and other test data, we pipe those files during test execution to an S3 bucket under the test run identifier prefix. When the tests are complete, we link the files to each test execution in the report generated from the test run. This lets us easily investigate test executions.

Pro Tip: Dynamically Loading Binaries

AWS Lambda has a limit of 250 MB of uncompressed space for packaged Lambda functions. Because we have libraries and other dependencies to our test suite, we hit this limit when we tried to upload a function that contained Chrome and ChromeDriver (~140 MB). This test suite was not originally intended to be used with Lambda. Otherwise, we would have scrutinized some of the included libraries. To get around this limit, we used the Lambda functions temporary directory, which allows up to 500 MB of space at runtime. Downloading these binaries at runtime moves some of that space requirement into the temporary directory. This allows more room for libraries and dependencies. You can do this by grabbing Chrome and ChromeDriver from an S3 bucket and marking them as executable using built-in Java libraries. If you take this route, be sure to point to the new location for these executables in order to create a ChromeDriver.

private static void downloadS3ObjectToExecutableFile(String key) throws IOException {
   File file = new File("/tmp/" + key);

   GetObjectRequest request = new GetObjectRequest("s3-bucket-name", key);

   FileUtils.copyInputStreamToFile(s3client.getObject(request).getObjectContent(), file);
   file.setExecutable(true);
}

Lambda-Selenium Project Source

We have compiled an open source example that you can grab from the Blackboard Github repository. Grab the code and try it out!

https://blackboard.github.io/lambda-selenium/

Conclusion

One year ago, one of our UI test suites took hours to run. Last month, it took 16 minutes. Today, it takes 39 seconds. Thanks to AWS Lambda, we can reduce our build times and perform automated UI testing at scale!

Serverless Automated Cost Controls, Part1

Post Syndicated from Shankar Ramachandran original https://aws.amazon.com/blogs/compute/serverless-automated-cost-controls-part1/

This post courtesy of Shankar Ramachandran, Pubali Sen, and George Mao

In line with AWS’s continual efforts to reduce costs for customers, this series focuses on how customers can build serverless automated cost controls. This post provides an architecture blueprint and a sample implementation to prevent budget overruns.

This solution uses the following AWS products:

  • AWS Budgets – An AWS Cost Management tool that helps customers define and track budgets for AWS costs, and forecast for up to three months.
  • Amazon SNS – An AWS service that makes it easy to set up, operate, and send notifications from the cloud.
  • AWS Lambda – An AWS service that lets you run code without provisioning or managing servers.

You can fine-tune a budget for various parameters, for example filtering by service or tag. The Budgets tool lets you post notifications on an SNS topic. A Lambda function that subscribes to the SNS topic can act on the notification. Any programmatically implementable action can be taken.

The diagram below describes the architecture blueprint.

In this post, we describe how to use this blueprint with AWS Step Functions and IAM to effectively revoke the ability of a user to start new Amazon EC2 instances, after a budget amount is exceeded.

Freedom with guardrails

AWS lets you quickly spin up resources as you need them, deploying hundreds or even thousands of servers in minutes. This means you can quickly develop and roll out new applications. Teams can experiment and innovate more quickly and frequently. If an experiment fails, you can always de-provision those servers without risk.

This improved agility also brings in the need for effective cost controls. Your Finance and Accounting department must budget, monitor, and control the AWS spend. For example, this could be a budget per project. Further, Finance and Accounting must take appropriate actions if the budget for the project has been exceeded, for example. Call it “freedom with guardrails” – where Finance wants to give developers freedom, but with financial constraints.

Architecture

This section describes how to use the blueprint introduced earlier to implement a “freedom with guardrails” solution.

  1. The budget for “Project Beta” is set up in Budgets. In this example, we focus on EC2 usage and identify the instances that belong to this project by filtering on the tag Project with the value Beta. For more information, see Creating a Budget.
  2. The budget configuration also includes settings to send a notification on an SNS topic when the usage exceeds 100% of the budgeted amount. For more information, see Creating an Amazon SNS Topic for Budget Notifications.
  3. The master Lambda function receives the SNS notification.
  4. It triggers execution of a Step Functions state machine with the parameters for completing the configured action.
  5. The action Lambda function is triggered as a task in the state machine. The function interacts with IAM to effectively remove the user’s permissions to create an EC2 instance.

This decoupled modular design allows for extensibility.  New actions (serially or in parallel) can be added by simply adding new steps.

Implementing the solution

All the instructions and code needed to implement the architecture have been posted on the Serverless Automated Cost Controls GitHub repo. We recommend that you try this first in a Dev/Test environment.

This implementation description can be broken down into two parts:

  1. Create a solution stack for serverless automated cost controls.
  2. Verify the solution by testing the EC2 fleet.

To tie this back to the “freedom with guardrails” scenario, the Finance department performs a one-time implementation of the solution stack. To simulate resources for Project Beta, the developers spin up the test EC2 fleet.

Prerequisites

There are two prerequisites:

  • Make sure that you have the necessary IAM permissions. For more information, see the section titled “Required IAM permissions” in the README.
  • Define and activate a cost allocation tag with the key Project. For more information, see Using Cost Allocation Tags. It can take up to 12 hours for the tags to propagate to Budgets.

Create resources

The solution stack includes creating the following resources:

  • Three Lambda functions
  • One Step Functions state machine
  • One SNS topic
  • One IAM group
  • One IAM user
  • IAM policies as needed
  • One budget

Two of the Lambda functions were described in the previous section, to a) receive the SNS notification and b) trigger the Step Functions state machine. Another Lambda function is used to create the budget, as a custom AWS CloudFormation resource. The SNS topic connects Budgets with Lambda function A. Lambda function B is configured as a task in Step Functions. A budget for $2 is created which is filtered by Service: EC2 and Tag: Project, Beta. A test IAM group and user is created to enable you to validate this Cost Control Solution.

To create the serverless automated cost control solution stack, choose the button below. It takes few minutes to spin up the stack. You can monitor the progress in the CloudFormation console.

When you see the CREATE_COMPLETE status for the stack you had created, choose Outputs. Copy the following four values that you need later:

  • TemplateURL
  • UserName
  • SignInURL
  • Password

Verify the stack

The next step is to verify the serverless automated cost controls solution stack that you just created. To do this, spin up an EC2 fleet of t2.micro instances, representative of the resources needed for Project Beta, and tag them with Project, Beta.

  1. Browse to the SignInURL, and log in using the UserName and Password values copied on from the stack output.
  2. In the CloudFormation console, choose Create Stack.
  3. For Choose a template, select Choose an Amazon S3 template URL and paste the TemplateURL value from the preceding section. Choose Next.
  4. Give this stack a name, such as “testEc2FleetForProjectBeta”. Choose Next.
  5. On the Specify Details page, enter parameters such as the UserName and Password copied in the previous section. Choose Next.
  6. Ignore any errors related to listing IAM roles. The test user has a minimal set of permissions that is just sufficient to spin up this test stack (in line with security best practices).
  7. On the Options page, choose Next.
  8. On the Review page, choose Create. It takes a few minutes to spin up the stack, and you can monitor the progress in the CloudFormation console. 
  9. When you see the status “CREATE_COMPLETE”, open the EC2 console to verify that four t2.micro instances have been spun up, with the tag of Project, Beta.

The hourly cost for these instances depends on the region in which they are running. On the average (irrespective of the region), you can expect the aggregate cost for this EC2 fleet to exceed the set $2 budget in 48 hours.

Verify the solution

The first step is to identify the test IAM group that was created in the previous section. The group should have “projectBeta” in the name, prepended with the CloudFormation stack name and appended with an alphanumeric string. Verify that the managed policy associated is: “EC2FullAccess”, which indicates that the users in this group have unrestricted access to EC2.

There are two stages of verification for this serverless automated cost controls solution: simulating a notification and waiting for a breach.

Simulated notification

Because it takes at least a few hours for the aggregate cost of the EC2 fleet to breach the set budget, you can verify the solution by simulating the notification from Budgets.

  1. Log in to the SNS console (using your regular AWS credentials).
  2. Publish a message on the SNS topic that has “budgetNotificationTopic” in the name. The complete name is appended by the CloudFormation stack identifier.  
  3. Copy the following text as the body of the notification: “This is a mock notification”.
  4. Choose Publish.
  5. Open the IAM console to verify that the policy for the test group has been switched to “EC2ReadOnly”. This prevents users in this group from creating new instances.
  6. Verify that the test user created in the previous section cannot spin up new EC2 instances.  You can log in as the test user and try creating a new EC2 instance (via the same CloudFormation stack or the EC2 console). You should get an error message indicating that you do not have the necessary permissions.
  7. If you are proceeding to stage 2 of the verification, then you must switch the permissions back to “EC2FullAccess” for the test group, which can be done in the IAM console.

Automatic notification

Within 48 hours, the aggregate cost of the EC2 fleet spun up in the earlier section breaches the budget rule and triggers an automatic notification. This results in the permissions getting switched out, just as in the simulated notification.

Clean up

Use the following steps to delete your resources and stop incurring costs.

  1. Open the CloudFormation console.
  2. Delete the EC2 fleet by deleting the appropriate stack (for example, delete the stack named “testEc2FleetForProjectBeta”).                                               
  3. Next, delete the “costControlStack” stack.                                                                                                                                                    

Conclusion

Using Lambda in tandem with Budgets, you can build Serverless automated cost controls on AWS. Find all the resources (instructions, code) for implementing the solution discussed in this post on the Serverless Automated Cost Controls GitHub repo.

Stay tuned to this series for more tips about building serverless automated cost controls. In the next post, we discuss using smart lighting to influence developer behavior and describe a solution to encourage cost-aware development practices.

If you have questions or suggestions, please comment below.

 

The 10 Most Viewed Security-Related AWS Knowledge Center Articles and Videos for November 2017

Post Syndicated from Maggie Burke original https://aws.amazon.com/blogs/security/the-10-most-viewed-security-related-aws-knowledge-center-articles-and-videos-for-november-2017/

AWS Knowledge Center image

The AWS Knowledge Center helps answer the questions most frequently asked by AWS Support customers. The following 10 Knowledge Center security articles and videos have been the most viewed this month. It’s likely you’ve wondered about a few of these topics yourself, so here’s a chance to learn the answers!

  1. How do I create an AWS Identity and Access Management (IAM) policy to restrict access for an IAM user, group, or role to a particular Amazon Virtual Private Cloud (VPC)?
    Learn how to apply a custom IAM policy to restrict IAM user, group, or role permissions for creating and managing Amazon EC2 instances in a specified VPC.
  2. How do I use an MFA token to authenticate access to my AWS resources through the AWS CLI?
    One IAM best practice is to protect your account and its resources by using a multi-factor authentication (MFA) device. If you plan use the AWS Command Line Interface (CLI) while using an MFA device, you must create a temporary session token.
  3. Can I restrict an IAM user’s EC2 access to specific resources?
    This article demonstrates how to link multiple AWS accounts through AWS Organizations and isolate IAM user groups in their own accounts.
  4. I didn’t receive a validation email for the SSL certificate I requested through AWS Certificate Manager (ACM)—where is it?
    Can’t find your ACM validation emails? Be sure to check the email address to which you requested that ACM send validation emails.
  5. How do I create an IAM policy that has a source IP restriction but still allows users to switch roles in the AWS Management Console?
    Learn how to write an IAM policy that not only includes a source IP restriction but also lets your users switch roles in the console.
  6. How do I allow users from another account to access resources in my account through IAM?
    If you have the 12-digit account number and permissions to create and edit IAM roles and users for both accounts, you can permit specific IAM users to access resources in your account.
  7. What are the differences between a service control policy (SCP) and an IAM policy?
    Learn how to distinguish an SCP from an IAM policy.
  8. How do I share my customer master keys (CMKs) across multiple AWS accounts?
    To grant another account access to your CMKs, create an IAM policy on the secondary account that grants access to use your CMKs.
  9. How do I set up AWS Trusted Advisor notifications?
    Learn how to receive free weekly email notifications from Trusted Advisor.
  10. How do I use AWS Key Management Service (AWS KMS) encryption context to protect the integrity of encrypted data?
    Encryption context name-value pairs used with AWS KMS encryption and decryption operations provide a method for checking ciphertext authenticity. Learn how to use encryption context to help protect your encrypted data.

The AWS Security Blog will publish an updated version of this list regularly going forward. You also can subscribe to the AWS Knowledge Center Videos playlist on YouTube.

– Maggie

How to Patch, Inspect, and Protect Microsoft Windows Workloads on AWS—Part 2

Post Syndicated from Koen van Blijderveen original https://aws.amazon.com/blogs/security/how-to-patch-inspect-and-protect-microsoft-windows-workloads-on-aws-part-2/

Yesterday in Part 1 of this blog post, I showed you how to:

  1. Launch an Amazon EC2 instance with an AWS Identity and Access Management (IAM) role, an Amazon Elastic Block Store (Amazon EBS) volume, and tags that Amazon EC2 Systems Manager (Systems Manager) and Amazon Inspector use.
  2. Configure Systems Manager to install the Amazon Inspector agent and patch your EC2 instances.

Today in Steps 3 and 4, I show you how to:

  1. Take Amazon EBS snapshots using Amazon EBS Snapshot Scheduler to automate snapshots based on instance tags.
  2. Use Amazon Inspector to check if your EC2 instances running Microsoft Windows contain any common vulnerabilities and exposures (CVEs).

To catch up on Steps 1 and 2, see yesterday’s blog post.

Step 3: Take EBS snapshots using EBS Snapshot Scheduler

In this section, I show you how to use EBS Snapshot Scheduler to take snapshots of your instances at specific intervals. To do this, I will show you how to:

  • Determine the schedule for EBS Snapshot Scheduler by providing you with best practices.
  • Deploy EBS Snapshot Scheduler by using AWS CloudFormation.
  • Tag your EC2 instances so that EBS Snapshot Scheduler backs up your instances when you want them backed up.

In addition to making sure your EC2 instances have all the available operating system patches applied on a regular schedule, you should take snapshots of the EBS storage volumes attached to your EC2 instances. Taking regular snapshots allows you to restore your data to a previous state quickly and cost effectively. With Amazon EBS snapshots, you pay only for the actual data you store, and snapshots save only the data that has changed since the previous snapshot, which minimizes your cost. You will use EBS Snapshot Scheduler to make regular snapshots of your EC2 instance. EBS Snapshot Scheduler takes advantage of other AWS services including CloudFormation, Amazon DynamoDB, and AWS Lambda to make backing up your EBS volumes simple.

Determine the schedule

As a best practice, you should back up your data frequently during the hours when your data changes the most. This reduces the amount of data you lose if you have to restore from a snapshot. For the purposes of this blog post, the data for my instances changes the most between the business hours of 9:00 A.M. to 5:00 P.M. Pacific Time. During these hours, I will make snapshots hourly to minimize data loss.

In addition to backing up frequently, another best practice is to establish a strategy for retention. This will vary based on how you need to use the snapshots. If you have compliance requirements to be able to restore for auditing, your needs may be different than if you are able to detect data corruption within three hours and simply need to restore to something that limits data loss to five hours. EBS Snapshot Scheduler enables you to specify the retention period for your snapshots. For this post, I only need to keep snapshots for recent business days. To account for weekends, I will set my retention period to three days, which is down from the default of 15 days when deploying EBS Snapshot Scheduler.

Deploy EBS Snapshot Scheduler

In Step 1 of Part 1 of this post, I showed how to configure an EC2 for Windows Server 2012 R2 instance with an EBS volume. You will use EBS Snapshot Scheduler to take eight snapshots each weekday of your EC2 instance’s EBS volumes:

  1. Navigate to the EBS Snapshot Scheduler deployment page and choose Launch Solution. This takes you to the CloudFormation console in your account. The Specify an Amazon S3 template URL option is already selected and prefilled. Choose Next on the Select Template page.
  2. On the Specify Details page, retain all default parameters except for AutoSnapshotDeletion. Set AutoSnapshotDeletion to Yes to ensure that old snapshots are periodically deleted. The default retention period is 15 days (you will specify a shorter value on your instance in the next subsection).
  3. Choose Next twice to move to the Review step, and start deployment by choosing the I acknowledge that AWS CloudFormation might create IAM resources check box and then choosing Create.

Tag your EC2 instances

EBS Snapshot Scheduler takes a few minutes to deploy. While waiting for its deployment, you can start to tag your instance to define its schedule. EBS Snapshot Scheduler reads tag values and looks for four possible custom parameters in the following order:

  • <snapshot time> – Time in 24-hour format with no colon.
  • <retention days> – The number of days (a positive integer) to retain the snapshot before deletion, if set to automatically delete snapshots.
  • <time zone> – The time zone of the times specified in <snapshot time>.
  • <active day(s)>all, weekdays, or mon, tue, wed, thu, fri, sat, and/or sun.

Because you want hourly backups on weekdays between 9:00 A.M. and 5:00 P.M. Pacific Time, you need to configure eight tags—one for each hour of the day. You will add the eight tags shown in the following table to your EC2 instance.

Tag Value
scheduler:ebs-snapshot:0900 0900;3;utc;weekdays
scheduler:ebs-snapshot:1000 1000;3;utc;weekdays
scheduler:ebs-snapshot:1100 1100;3;utc;weekdays
scheduler:ebs-snapshot:1200 1200;3;utc;weekdays
scheduler:ebs-snapshot:1300 1300;3;utc;weekdays
scheduler:ebs-snapshot:1400 1400;3;utc;weekdays
scheduler:ebs-snapshot:1500 1500;3;utc;weekdays
scheduler:ebs-snapshot:1600 1600;3;utc;weekdays

Next, you will add these tags to your instance. If you want to tag multiple instances at once, you can use Tag Editor instead. To add the tags in the preceding table to your EC2 instance:

  1. Navigate to your EC2 instance in the EC2 console and choose Tags in the navigation pane.
  2. Choose Add/Edit Tags and then choose Create Tag to add all the tags specified in the preceding table.
  3. Confirm you have added the tags by choosing Save. After adding these tags, navigate to your EC2 instance in the EC2 console. Your EC2 instance should look similar to the following screenshot.
    Screenshot of how your EC2 instance should look in the console
  4. After waiting a couple of hours, you can see snapshots beginning to populate on the Snapshots page of the EC2 console.Screenshot of snapshots beginning to populate on the Snapshots page of the EC2 console
  5. To check if EBS Snapshot Scheduler is active, you can check the CloudWatch rule that runs the Lambda function. If the clock icon shown in the following screenshot is green, the scheduler is active. If the clock icon is gray, the rule is disabled and does not run. You can enable or disable the rule by selecting it, choosing Actions, and choosing Enable or Disable. This also allows you to temporarily disable EBS Snapshot Scheduler.Screenshot of checking to see if EBS Snapshot Scheduler is active
  1. You can also monitor when EBS Snapshot Scheduler has run by choosing the name of the CloudWatch rule as shown in the previous screenshot and choosing Show metrics for the rule.Screenshot of monitoring when EBS Snapshot Scheduler has run by choosing the name of the CloudWatch rule

If you want to restore and attach an EBS volume, see Restoring an Amazon EBS Volume from a Snapshot and Attaching an Amazon EBS Volume to an Instance.

Step 4: Use Amazon Inspector

In this section, I show you how to you use Amazon Inspector to scan your EC2 instance for common vulnerabilities and exposures (CVEs) and set up Amazon SNS notifications. To do this I will show you how to:

  • Install the Amazon Inspector agent by using EC2 Run Command.
  • Set up notifications using Amazon SNS to notify you of any findings.
  • Define an Amazon Inspector target and template to define what assessment to perform on your EC2 instance.
  • Schedule Amazon Inspector assessment runs to assess your EC2 instance on a regular interval.

Amazon Inspector can help you scan your EC2 instance using prebuilt rules packages, which are built and maintained by AWS. These prebuilt rules packages tell Amazon Inspector what to scan for on the EC2 instances you select. Amazon Inspector provides the following prebuilt packages for Microsoft Windows Server 2012 R2:

  • Common Vulnerabilities and Exposures
  • Center for Internet Security Benchmarks
  • Runtime Behavior Analysis

In this post, I’m focused on how to make sure you keep your EC2 instances patched, backed up, and inspected for common vulnerabilities and exposures (CVEs). As a result, I will focus on how to use the CVE rules package and use your instance tags to identify the instances on which to run the CVE rules. If your EC2 instance is fully patched using Systems Manager, as described earlier, you should not have any findings with the CVE rules package. Regardless, as a best practice I recommend that you use Amazon Inspector as an additional layer for identifying any unexpected failures. This involves using Amazon CloudWatch to set up weekly Amazon Inspector scans, and configuring Amazon Inspector to notify you of any findings through SNS topics. By acting on the notifications you receive, you can respond quickly to any CVEs on any of your EC2 instances to help ensure that malware using known CVEs does not affect your EC2 instances. In a previous blog post, Eric Fitzgerald showed how to remediate Amazon Inspector security findings automatically.

Install the Amazon Inspector agent

To install the Amazon Inspector agent, you will use EC2 Run Command, which allows you to run any command on any of your EC2 instances that have the Systems Manager agent with an attached IAM role that allows access to Systems Manager.

  1. Choose Run Command under Systems Manager Services in the navigation pane of the EC2 console. Then choose Run a command.
    Screenshot of choosing "Run a command"
  2. To install the Amazon Inspector agent, you will use an AWS managed and provided command document that downloads and installs the agent for you on the selected EC2 instance. Choose AmazonInspector-ManageAWSAgent. To choose the target EC2 instance where this command will be run, use the tag you previously assigned to your EC2 instance, Patch Group, with a value of Windows Servers. For this example, set the concurrent installations to 1 and tell Systems Manager to stop after 5 errors.
    Screenshot of installing the Amazon Inspector agent
  3. Retain the default values for all other settings on the Run a command page and choose Run. Back on the Run Command page, you can see if the command that installed the Amazon Inspector agent executed successfully on all selected EC2 instances.
    Screenshot showing that the command that installed the Amazon Inspector agent executed successfully on all selected EC2 instances

Set up notifications using Amazon SNS

Now that you have installed the Amazon Inspector agent, you will set up an SNS topic that will notify you of any findings after an Amazon Inspector run.

To set up an SNS topic:

  1. In the AWS Management Console, choose Simple Notification Service under Messaging in the Services menu.
  2. Choose Create topic, name your topic (only alphanumeric characters, hyphens, and underscores are allowed) and give it a display name to ensure you know what this topic does (I’ve named mine Inspector). Choose Create topic.
    "Create new topic" page
  3. To allow Amazon Inspector to publish messages to your new topic, choose Other topic actions and choose Edit topic policy.
  4. For Allow these users to publish messages to this topic and Allow these users to subscribe to this topic, choose Only these AWS users. Type the following ARN for the US East (N. Virginia) Region in which you are deploying the solution in this post: arn:aws:iam::316112463485:root. This is the ARN of Amazon Inspector itself. For the ARNs of Amazon Inspector in other AWS Regions, see Setting Up an SNS Topic for Amazon Inspector Notifications (Console). Amazon Resource Names (ARNs) uniquely identify AWS resources across all of AWS.
    Screenshot of editing the topic policy
  5. To receive notifications from Amazon Inspector, subscribe to your new topic by choosing Create subscription and adding your email address. After confirming your subscription by clicking the link in the email, the topic should display your email address as a subscriber. Later, you will configure the Amazon Inspector template to publish to this topic.
    Screenshot of subscribing to the new topic

Define an Amazon Inspector target and template

Now that you have set up the notification topic by which Amazon Inspector can notify you of findings, you can create an Amazon Inspector target and template. A target defines which EC2 instances are in scope for Amazon Inspector. A template defines which packages to run, for how long, and on which target.

To create an Amazon Inspector target:

  1. Navigate to the Amazon Inspector console and choose Get started. At the time of writing this blog post, Amazon Inspector is available in the US East (N. Virginia), US West (N. California), US West (Oregon), EU (Ireland), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Sydney), and Asia Pacific (Tokyo) Regions.
  2. For Amazon Inspector to be able to collect the necessary data from your EC2 instance, you must create an IAM service role for Amazon Inspector. Amazon Inspector can create this role for you if you choose Choose or create role and confirm the role creation by choosing Allow.
    Screenshot of creating an IAM service role for Amazon Inspector
  3. Amazon Inspector also asks you to tag your EC2 instance and install the Amazon Inspector agent. You already performed these steps in Part 1 of this post, so you can proceed by choosing Next. To define the Amazon Inspector target, choose the previously used Patch Group tag with a Value of Windows Servers. This is the same tag that you used to define the targets for patching. Then choose Next.
    Screenshot of defining the Amazon Inspector target
  4. Now, define your Amazon Inspector template, and choose a name and the package you want to run. For this post, use the Common Vulnerabilities and Exposures package and choose the default duration of 1 hour. As you can see, the package has a version number, so always select the latest version of the rules package if multiple versions are available.
    Screenshot of defining an assessment template
  5. Configure Amazon Inspector to publish to your SNS topic when findings are reported. You can also choose to receive a notification of a started run, a finished run, or changes in the state of a run. For this blog post, you want to receive notifications if there are any findings. To start, choose Assessment Templates from the Amazon Inspector console and choose your newly created Amazon Inspector assessment template. Choose the icon below SNS topics (see the following screenshot).
    Screenshot of choosing an assessment template
  6. A pop-up appears in which you can choose the previously created topic and the events about which you want SNS to notify you (choose Finding reported).
    Screenshot of choosing the previously created topic and the events about which you want SNS to notify you

Schedule Amazon Inspector assessment runs

The last step in using Amazon Inspector to assess for CVEs is to schedule the Amazon Inspector template to run using Amazon CloudWatch Events. This will make sure that Amazon Inspector assesses your EC2 instance on a regular basis. To do this, you need the Amazon Inspector template ARN, which you can find under Assessment templates in the Amazon Inspector console. CloudWatch Events can run your Amazon Inspector assessment at an interval you define using a Cron-based schedule. Cron is a well-known scheduling agent that is widely used on UNIX-like operating systems and uses the following syntax for CloudWatch Events.

Image of Cron schedule

All scheduled events use a UTC time zone, and the minimum precision for schedules is one minute. For more information about scheduling CloudWatch Events, see Schedule Expressions for Rules.

To create the CloudWatch Events rule:

  1. Navigate to the CloudWatch console, choose Events, and choose Create rule.
    Screenshot of starting to create a rule in the CloudWatch Events console
  2. On the next page, specify if you want to invoke your rule based on an event pattern or a schedule. For this blog post, you will select a schedule based on a Cron expression.
  3. You can schedule the Amazon Inspector assessment any time you want using the Cron expression, or you can use the Cron expression I used in the following screenshot, which will run the Amazon Inspector assessment every Sunday at 10:00 P.M. GMT.
    Screenshot of scheduling an Amazon Inspector assessment with a Cron expression
  4. Choose Add target and choose Inspector assessment template from the drop-down menu. Paste the ARN of the Amazon Inspector template you previously created in the Amazon Inspector console in the Assessment template box and choose Create a new role for this specific resource. This new role is necessary so that CloudWatch Events has the necessary permissions to start the Amazon Inspector assessment. CloudWatch Events will automatically create the new role and grant the minimum set of permissions needed to run the Amazon Inspector assessment. To proceed, choose Configure details.
    Screenshot of adding a target
  5. Next, give your rule a name and a description. I suggest using a name that describes what the rule does, as shown in the following screenshot.
  6. Finish the wizard by choosing Create rule. The rule should appear in the Events – Rules section of the CloudWatch console.
    Screenshot of completing the creation of the rule
  7. To confirm your CloudWatch Events rule works, wait for the next time your CloudWatch Events rule is scheduled to run. For testing purposes, you can choose your CloudWatch Events rule and choose Edit to change the schedule to run it sooner than scheduled.
    Screenshot of confirming the CloudWatch Events rule works
  8. Now navigate to the Amazon Inspector console to confirm the launch of your first assessment run. The Start time column shows you the time each assessment started and the Status column the status of your assessment. In the following screenshot, you can see Amazon Inspector is busy Collecting data from the selected assessment targets.
    Screenshot of confirming the launch of the first assessment run

You have concluded the last step of this blog post by setting up a regular scan of your EC2 instance with Amazon Inspector and a notification that will let you know if your EC2 instance is vulnerable to any known CVEs. In a previous Security Blog post, Eric Fitzgerald explained How to Remediate Amazon Inspector Security Findings Automatically. Although that blog post is for Linux-based EC2 instances, the post shows that you can learn about Amazon Inspector findings in other ways than email alerts.

Conclusion

In this two-part blog post, I showed how to make sure you keep your EC2 instances up to date with patching, how to back up your instances with snapshots, and how to monitor your instances for CVEs. Collectively these measures help to protect your instances against common attack vectors that attempt to exploit known vulnerabilities. In Part 1, I showed how to configure your EC2 instances to make it easy to use Systems Manager, EBS Snapshot Scheduler, and Amazon Inspector. I also showed how to use Systems Manager to schedule automatic patches to keep your instances current in a timely fashion. In Part 2, I showed you how to take regular snapshots of your data by using EBS Snapshot Scheduler and how to use Amazon Inspector to check if your EC2 instances running Microsoft Windows contain any common vulnerabilities and exposures (CVEs).

If you have comments about today’s or yesterday’s post, submit them in the “Comments” section below. If you have questions about or issues implementing any part of this solution, start a new thread on the Amazon EC2 forum or the Amazon Inspector forum, or contact AWS Support.

– Koen

Now You Can Use AWS Shield Advanced to Help Protect Your Amazon EC2 Instances and Network Load Balancers

Post Syndicated from Ritwik Manan original https://aws.amazon.com/blogs/security/now-you-can-use-aws-shield-advanced-to-protect-your-amazon-ec2-instances-and-network-load-balancers/

AWS Shield image

Starting today, AWS Shield Advanced can help protect your Amazon EC2 instances and Network Load Balancers against infrastructure-layer Distributed Denial of Service (DDoS) attacks. Enable AWS Shield Advanced on an AWS Elastic IP address and attach the address to an internet-facing EC2 instance or Network Load Balancer. AWS Shield Advanced automatically detects the type of AWS resource behind the Elastic IP address and mitigates DDoS attacks.

AWS Shield Advanced also ensures that all your Amazon VPC network access control lists (ACLs) are automatically executed on AWS Shield at the edge of the AWS network, giving you access to additional bandwidth and scrubbing capacity as well as mitigating large volumetric DDoS attacks. You also can customize additional mitigations on AWS Shield by engaging the AWS DDoS Response Team, which can preconfigure the mitigations or respond to incidents as they happen. For every incident detected by AWS Shield Advanced, you also get near-real-time visibility via Amazon CloudWatch metrics and details about the incident, such as the geographic origin and source IP address of the attack.

AWS Shield Advanced for Elastic IP addresses extends the coverage of DDoS cost protection, which safeguards against scaling charges as a result of a DDoS attack. DDoS cost protection now allows you to request service credits for Elastic Load Balancing, Amazon CloudFront, Amazon Route 53, and your EC2 instance hours in the event that these increase as the result of a DDoS attack.

Get started protecting EC2 instances and Network Load Balancers

To get started:

  1. Sign in to the AWS Management Console and navigate to the AWS WAF and AWS Shield console.
  2. Activate AWS Shield Advanced by choosing Activate AWS Shield Advanced and accepting the terms.
  3. Navigate to Protected Resources through the navigation pane.
  4. Choose the Elastic IP addresses that you want to protect (these can point to EC2 instances or Network Load Balancers).

If AWS Shield Advanced detects a DDoS attack, you can get details about the attack by checking CloudWatch, or the Incidents tab on the AWS WAF and AWS Shield console. To learn more about this new feature and AWS Shield Advanced, see the AWS Shield home page.

If you have comments or questions about this post, submit them in the “Comments” section below, start a new thread in the AWS Shield forum, or contact AWS Support.

– Ritwik

Access Resources in a VPC from AWS CodeBuild Builds

Post Syndicated from John Pignata original https://aws.amazon.com/blogs/devops/access-resources-in-a-vpc-from-aws-codebuild-builds/

John Pignata, Startup Solutions Architect, Amazon Web Services

In this blog post we’re going to discuss a new AWS CodeBuild feature that is available starting today. CodeBuild builds can now access resources in a VPC directly without these resources being exposed to the public internet. These resources include Amazon Relational Database Service (Amazon RDS) databases, Amazon ElastiCache clusters, internal services running on Amazon Elastic Compute Cloud (Amazon EC2), and Amazon EC2 Container Service (Amazon ECS), or any service endpoints that are only reachable from within a specific VPC.

CodeBuild is a fully managed build service that compiles source code, runs tests, and produces software packages that are ready to deploy. As part of the build process, developers often require access to resources that should be isolated from the public Internet. Now CodeBuild builds can be optionally configured to have VPC connectivity and access these resources directly.

Accessing Resources in a VPC

You can configure builds to have access to a VPC when you create a CodeBuild project or you can update an existing CodeBuild project with VPC configuration attributes. Here’s how it looks in the console:

 

To configure VPC connectivity: select a VPC, one or more subnets within that VPC, and one or more VPC security groups that CodeBuild should apply when attaching to your VPC. Once configured, commands running as part of your build will be able to access resources in your VPC without transiting across the public Internet.

Use Cases

The availability of VPC connectivity from CodeBuild builds unlocks many potential uses. For example, you can:

  • Run integration tests from your build against data in an Amazon RDS instance that’s isolated on a private subnet.
  • Query data in an ElastiCache cluster directly from tests.
  • Interact with internal web services hosted on Amazon EC2, Amazon ECS, or services that use internal Elastic Load Balancing.
  • Retrieve dependencies from self-hosted, internal artifact repositories such as PyPI for Python, Maven for Java, npm for Node.js, and so on.
  • Access objects in an Amazon S3 bucket configured to allow access only through a VPC endpoint.
  • Query external web services that require fixed IP addresses through the Elastic IP address of the NAT gateway associated with your subnet(s).

… and more! Your builds can now access any resource that’s hosted in your VPC without any compromise on network isolation.

Internet Connectivity

CodeBuild requires access to resources on the public Internet to successfully execute builds. At a minimum, it must be able to reach your source repository system (such as AWS CodeCommit, GitHub, Bitbucket), Amazon Simple Storage Service (Amazon S3) to deliver build artifacts, and Amazon CloudWatch Logs to stream logs from the build process. The interface attached to your VPC will not be assigned a public IP address so to enable Internet access from your builds, you will need to set up a managed NAT Gateway or NAT instance for the subnets you configure. You must also ensure your security groups allow outbound access to these services.

IP Address Space

Each running build will be assigned an IP address from one of the subnets in your VPC that you designate for CodeBuild to use. As CodeBuild scales to meet your build volume, ensure that you select subnets with enough address space to accommodate your expected number of concurrent builds.

Service Role Permissions

CodeBuild requires new permissions in order to manage network interfaces on your VPCs. If you create a service role for your new projects, these permissions will be included in that role’s policy automatically. For existing service roles, you can edit the policy document to include the additional actions. For the full policy document to apply to your service role, see Advanced Setup in the CodeBuild documentation.

For more information, see VPC Support in the CodeBuild documentation. We hope you find the ability to access internal resources on a VPC useful in your build processes! If you have any questions or feedback, feel free to reach out to us through the AWS CodeBuild forum or leave a comment!

How to Patch, Inspect, and Protect Microsoft Windows Workloads on AWS—Part 1

Post Syndicated from Koen van Blijderveen original https://aws.amazon.com/blogs/security/how-to-patch-inspect-and-protect-microsoft-windows-workloads-on-aws-part-1/

Most malware tries to compromise your systems by using a known vulnerability that the maker of the operating system has already patched. To help prevent malware from affecting your systems, two security best practices are to apply all operating system patches to your systems and actively monitor your systems for missing patches. In case you do need to recover from a malware attack, you should make regular backups of your data.

In today’s blog post (Part 1 of a two-part post), I show how to keep your Amazon EC2 instances that run Microsoft Windows up to date with the latest security patches by using Amazon EC2 Systems Manager. Tomorrow in Part 2, I show how to take regular snapshots of your data by using Amazon EBS Snapshot Scheduler and how to use Amazon Inspector to check if your EC2 instances running Microsoft Windows contain any common vulnerabilities and exposures (CVEs).

What you should know first

To follow along with the solution in this post, you need one or more EC2 instances. You may use existing instances or create new instances. For the blog post, I assume this is an EC2 for Microsoft Windows Server 2012 R2 instance installed from the Amazon Machine Images (AMIs). If you are not familiar with how to launch an EC2 instance, see Launching an Instance. I also assume you launched or will launch your instance in a private subnet. A private subnet is not directly accessible via the internet, and access to it requires either a VPN connection to your on-premises network or a jump host in a public subnet (a subnet with access to the internet). You must make sure that the EC2 instance can connect to the internet using a network address translation (NAT) instance or NAT gateway to communicate with Systems Manager and Amazon Inspector. The following diagram shows how you should structure your Amazon Virtual Private Cloud (VPC). You should also be familiar with Restoring an Amazon EBS Volume from a Snapshot and Attaching an Amazon EBS Volume to an Instance.

Later on, you will assign tasks to a maintenance window to patch your instances with Systems Manager. To do this, the AWS Identity and Access Management (IAM) user you are using for this post must have the iam:PassRole permission. This permission allows this IAM user to assign tasks to pass their own IAM permissions to the AWS service. In this example, when you assign a task to a maintenance window, IAM passes your credentials to Systems Manager. This safeguard ensures that the user cannot use the creation of tasks to elevate their IAM privileges because their own IAM privileges limit which tasks they can run against an EC2 instance. You should also authorize your IAM user to use EC2, Amazon Inspector, Amazon CloudWatch, and Systems Manager. You can achieve this by attaching the following AWS managed policies to the IAM user you are using for this example: AmazonInspectorFullAccess, AmazonEC2FullAccess, and AmazonSSMFullAccess.

Architectural overview

The following diagram illustrates the components of this solution’s architecture.

Diagram showing the components of this solution's architecture

For this blog post, Microsoft Windows EC2 is Amazon EC2 for Microsoft Windows Server 2012 R2 instances with attached Amazon Elastic Block Store (Amazon EBS) volumes, which are running in your VPC. These instances may be standalone Windows instances running your Windows workloads, or you may have joined them to an Active Directory domain controller. For instances joined to a domain, you can be using Active Directory running on an EC2 for Windows instance, or you can use AWS Directory Service for Microsoft Active Directory.

Amazon EC2 Systems Manager is a scalable tool for remote management of your EC2 instances. You will use the Systems Manager Run Command to install the Amazon Inspector agent. The agent enables EC2 instances to communicate with the Amazon Inspector service and run assessments, which I explain in detail later in this blog post. You also will create a Systems Manager association to keep your EC2 instances up to date with the latest security patches.

You can use the EBS Snapshot Scheduler to schedule automated snapshots at regular intervals. You will use it to set up regular snapshots of your Amazon EBS volumes. EBS Snapshot Scheduler is a prebuilt solution by AWS that you will deploy in your AWS account. With Amazon EBS snapshots, you pay only for the actual data you store. Snapshots save only the data that has changed since the previous snapshot, which minimizes your cost.

You will use Amazon Inspector to run security assessments on your EC2 for Windows Server instance. In this post, I show how to assess if your EC2 for Windows Server instance is vulnerable to any of the more than 50,000 CVEs registered with Amazon Inspector.

In today’s and tomorrow’s posts, I show you how to:

  1. Launch an EC2 instance with an IAM role, Amazon EBS volume, and tags that Systems Manager and Amazon Inspector will use.
  2. Configure Systems Manager to install the Amazon Inspector agent and patch your EC2 instances.
  3. Take EBS snapshots by using EBS Snapshot Scheduler to automate snapshots based on instance tags.
  4. Use Amazon Inspector to check if your EC2 instances running Microsoft Windows contain any common vulnerabilities and exposures (CVEs).

Step 1: Launch an EC2 instance

In this section, I show you how to launch your EC2 instances so that you can use Systems Manager with the instances and use instance tags with EBS Snapshot Scheduler to automate snapshots. This requires three things:

  • Create an IAM role for Systems Manager before launching your EC2 instance.
  • Launch your EC2 instance with Amazon EBS and the IAM role for Systems Manager.
  • Add tags to instances so that you can automate policies for which instances you take snapshots of and when.

Create an IAM role for Systems Manager

Before launching your EC2 instance, I recommend that you first create an IAM role for Systems Manager, which you will use to update the EC2 instance you will launch. AWS already provides a preconfigured policy that you can use for your new role, and it is called AmazonEC2RoleforSSM.

  1. Sign in to the IAM console and choose Roles in the navigation pane. Choose Create new role.
    Screenshot of choosing "Create role"
  2. In the role-creation workflow, choose AWS service > EC2 > EC2 to create a role for an EC2 instance.
    Screenshot of creating a role for an EC2 instance
  3. Choose the AmazonEC2RoleforSSM policy to attach it to the new role you are creating.
    Screenshot of attaching the AmazonEC2RoleforSSM policy to the new role you are creating
  4. Give the role a meaningful name (I chose EC2SSM) and description, and choose Create role.
    Screenshot of giving the role a name and description

Launch your EC2 instance

To follow along, you need an EC2 instance that is running Microsoft Windows Server 2012 R2 and that has an Amazon EBS volume attached. You can use any existing instance you may have or create a new instance.

When launching your new EC2 instance, be sure that:

  • The operating system is Microsoft Windows Server 2012 R2.
  • You attach at least one Amazon EBS volume to the EC2 instance.
  • You attach the newly created IAM role (EC2SSM).
  • The EC2 instance can connect to the internet through a network address translation (NAT) gateway or a NAT instance.
  • You create the tags shown in the following screenshot (you will use them later).

If you are using an already launched EC2 instance, you can attach the newly created role as described in Easily Replace or Attach an IAM Role to an Existing EC2 Instance by Using the EC2 Console.

Add tags

The final step of configuring your EC2 instances is to add tags. You will use these tags to configure Systems Manager in Step 2 of this blog post and to configure Amazon Inspector in Part 2. For this example, I add a tag key, Patch Group, and set the value to Windows Servers. I could have other groups of EC2 instances that I treat differently by having the same tag key but a different tag value. For example, I might have a collection of other servers with the Patch Group tag key with a value of IAS Servers.

Screenshot of adding tags

Note: You must wait a few minutes until the EC2 instance becomes available before you can proceed to the next section.

At this point, you now have at least one EC2 instance you can use to configure Systems Manager, use EBS Snapshot Scheduler, and use Amazon Inspector.

Note: If you have a large number of EC2 instances to tag, you may want to use the EC2 CreateTags API rather than manually apply tags to each instance.

Step 2: Configure Systems Manager

In this section, I show you how to use Systems Manager to apply operating system patches to your EC2 instances, and how to manage patch compliance.

To start, I will provide some background information about Systems Manager. Then, I will cover how to:

  • Create the Systems Manager IAM role so that Systems Manager is able to perform patch operations.
  • Associate a Systems Manager patch baseline with your instance to define which patches Systems Manager should apply.
  • Define a maintenance window to make sure Systems Manager patches your instance when you tell it to.
  • Monitor patch compliance to verify the patch state of your instances.

Systems Manager is a collection of capabilities that helps you automate management tasks for AWS-hosted instances on EC2 and your on-premises servers. In this post, I use Systems Manager for two purposes: to run remote commands and apply operating system patches. To learn about the full capabilities of Systems Manager, see What Is Amazon EC2 Systems Manager?

Patch management is an important measure to prevent malware from infecting your systems. Most malware attacks look for vulnerabilities that are publicly known and in most cases are already patched by the maker of the operating system. These publicly known vulnerabilities are well documented and therefore easier for an attacker to exploit than having to discover a new vulnerability.

Patches for these new vulnerabilities are available through Systems Manager within hours after Microsoft releases them. There are two prerequisites to use Systems Manager to apply operating system patches. First, you must attach the IAM role you created in the previous section, EC2SSM, to your EC2 instance. Second, you must install the Systems Manager agent on your EC2 instance. If you have used a recent Microsoft Windows Server 2012 R2 AMI published by AWS, Amazon has already installed the Systems Manager agent on your EC2 instance. You can confirm this by logging in to an EC2 instance and looking for Amazon SSM Agent under Programs and Features in Windows. To install the Systems Manager agent on an instance that does not have the agent preinstalled or if you want to use the Systems Manager agent on your on-premises servers, see the documentation about installing the Systems Manager agent. If you forgot to attach the newly created role when launching your EC2 instance or if you want to attach the role to already running EC2 instances, see Attach an AWS IAM Role to an Existing Amazon EC2 Instance by Using the AWS CLI or use the AWS Management Console.

To make sure your EC2 instance receives operating system patches from Systems Manager, you will use the default patch baseline provided and maintained by AWS, and you will define a maintenance window so that you control when your EC2 instances should receive patches. For the maintenance window to be able to run any tasks, you also must create a new role for Systems Manager. This role is a different kind of role than the one you created earlier: Systems Manager will use this role instead of EC2. Earlier we created the EC2SSM role with the AmazonEC2RoleforSSM policy, which allowed the Systems Manager agent on our instance to communicate with the Systems Manager service. Here we need a new role with the policy AmazonSSMMaintenanceWindowRole to make sure the Systems Manager service is able to execute commands on our instance.

Create the Systems Manager IAM role

To create the new IAM role for Systems Manager, follow the same procedure as in the previous section, but in Step 3, choose the AmazonSSMMaintenanceWindowRole policy instead of the previously selected AmazonEC2RoleforSSM policy.

Screenshot of creating the new IAM role for Systems Manager

Finish the wizard and give your new role a recognizable name. For example, I named my role MaintenanceWindowRole.

Screenshot of finishing the wizard and giving your new role a recognizable name

By default, only EC2 instances can assume this new role. You must update the trust policy to enable Systems Manager to assume this role.

To update the trust policy associated with this new role:

  1. Navigate to the IAM console and choose Roles in the navigation pane.
  2. Choose MaintenanceWindowRole and choose the Trust relationships tab. Then choose Edit trust relationship.
  3. Update the policy document by copying the following policy and pasting it in the Policy Document box. As you can see, I have added the ssm.amazonaws.com service to the list of allowed Principals that can assume this role. Choose Update Trust Policy.
    {
       "Version":"2012-10-17",
       "Statement":[
          {
             "Sid":"",
             "Effect":"Allow",
             "Principal":{
                "Service":[
                   "ec2.amazonaws.com",
                   "ssm.amazonaws.com"
               ]
             },
             "Action":"sts:AssumeRole"
          }
       ]
    }

Associate a Systems Manager patch baseline with your instance

Next, you are going to associate a Systems Manager patch baseline with your EC2 instance. A patch baseline defines which patches Systems Manager should apply. You will use the default patch baseline that AWS manages and maintains. Before you can associate the patch baseline with your instance, though, you must determine if Systems Manager recognizes your EC2 instance.

Navigate to the EC2 console, scroll down to Systems Manager Shared Resources in the navigation pane, and choose Managed Instances. Your new EC2 instance should be available there. If your instance is missing from the list, verify the following:

  1. Go to the EC2 console and verify your instance is running.
  2. Select your instance and confirm you attached the Systems Manager IAM role, EC2SSM.
  3. Make sure that you deployed a NAT gateway in your public subnet to ensure your VPC reflects the diagram at the start of this post so that the Systems Manager agent can connect to the Systems Manager internet endpoint.
  4. Check the Systems Manager Agent logs for any errors.

Now that you have confirmed that Systems Manager can manage your EC2 instance, it is time to associate the AWS maintained patch baseline with your EC2 instance:

  1. Choose Patch Baselines under Systems Manager Services in the navigation pane of the EC2 console.
  2. Choose the default patch baseline as highlighted in the following screenshot, and choose Modify Patch Groups in the Actions drop-down.
    Screenshot of choosing Modify Patch Groups in the Actions drop-down
  3. In the Patch group box, enter the same value you entered under the Patch Group tag of your EC2 instance in “Step 1: Configure your EC2 instance.” In this example, the value I enter is Windows Servers. Choose the check mark icon next to the patch group and choose Close.Screenshot of modifying the patch group

Define a maintenance window

Now that you have successfully set up a role and have associated a patch baseline with your EC2 instance, you will define a maintenance window so that you can control when your EC2 instances should receive patches. By creating multiple maintenance windows and assigning them to different patch groups, you can make sure your EC2 instances do not all reboot at the same time. The Patch Group resource tag you defined earlier will determine to which patch group an instance belongs.

To define a maintenance window:

  1. Navigate to the EC2 console, scroll down to Systems Manager Shared Resources in the navigation pane, and choose Maintenance Windows. Choose Create a Maintenance Window.
    Screenshot of starting to create a maintenance window in the Systems Manager console
  2. Select the Cron schedule builder to define the schedule for the maintenance window. In the example in the following screenshot, the maintenance window will start every Saturday at 10:00 P.M. UTC.
  3. To specify when your maintenance window will end, specify the duration. In this example, the four-hour maintenance window will end on the following Sunday morning at 2:00 A.M. UTC (in other words, four hours after it started).
  4. Systems manager completes all tasks that are in process, even if the maintenance window ends. In my example, I am choosing to prevent new tasks from starting within one hour of the end of my maintenance window because I estimated my patch operations might take longer than one hour to complete. Confirm the creation of the maintenance window by choosing Create maintenance window.
    Screenshot of completing all boxes in the maintenance window creation process
  5. After creating the maintenance window, you must register the EC2 instance to the maintenance window so that Systems Manager knows which EC2 instance it should patch in this maintenance window. To do so, choose Register new targets on the Targets tab of your newly created maintenance window. You can register your targets by using the same Patch Group tag you used before to associate the EC2 instance with the AWS-provided patch baseline.
    Screenshot of registering new targets
  6. Assign a task to the maintenance window that will install the operating system patches on your EC2 instance:
    1. Open Maintenance Windows in the EC2 console, select your previously created maintenance window, choose the Tasks tab, and choose Register run command task from the Register new task drop-down.
    2. Choose the AWS-RunPatchBaseline document from the list of available documents.
    3. For Parameters:
      1. For Role, choose the role you created previously (called MaintenanceWindowRole).
      2. For Execute on, specify how many EC2 instances Systems Manager should patch at the same time. If you have a large number of EC2 instances and want to patch all EC2 instances within the defined time, make sure this number is not too low. For example, if you have 1,000 EC2 instances, a maintenance window of 4 hours, and 2 hours’ time for patching, make this number at least 500.
      3. For Stop after, specify after how many errors Systems Manager should stop.
      4. For Operation, choose Install to make sure to install the patches.
        Screenshot of stipulating maintenance window parameters

Now, you must wait for the maintenance window to run at least once according to the schedule you defined earlier. Note that if you don’t want to wait, you can adjust the schedule to run sooner by choosing Edit maintenance window on the Maintenance Windows page of Systems Manager. If your maintenance window has expired, you can check the status of any maintenance tasks Systems Manager has performed on the Maintenance Windows page of Systems Manager and select your maintenance window.

Screenshot of the maintenance window successfully created

Monitor patch compliance

You also can see the overall patch compliance of all EC2 instances that are part of defined patch groups by choosing Patch Compliance under Systems Manager Services in the navigation pane of the EC2 console. You can filter by Patch Group to see how many EC2 instances within the selected patch group are up to date, how many EC2 instances are missing updates, and how many EC2 instances are in an error state.

Screenshot of monitoring patch compliance

In this section, you have set everything up for patch management on your instance. Now you know how to patch your EC2 instance in a controlled manner and how to check if your EC2 instance is compliant with the patch baseline you have defined. Of course, I recommend that you apply these steps to all EC2 instances you manage.

Summary

In Part 1 of this blog post, I have shown how to configure EC2 instances for use with Systems Manager, EBS Snapshot Scheduler, and Amazon Inspector. I also have shown how to use Systems Manager to keep your Microsoft Windows–based EC2 instances up to date. In Part 2 of this blog post tomorrow, I will show how to take regular snapshots of your data by using EBS Snapshot Scheduler and how to use Amazon Inspector to check if your EC2 instances running Microsoft Windows contain any CVEs.

If you have comments about this post, submit them in the “Comments” section below. If you have questions about or issues implementing this solution, start a new thread on the EC2 forum or the Amazon Inspector forum, or contact AWS Support.

– Koen

New – Interactive AWS Cost Explorer API

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-interactive-aws-cost-explorer-api/

We launched the AWS Cost Explorer a couple of years ago in order to allow you to track, allocate, and manage your AWS costs. The response to that launch, and to additions that we have made since then, has been very positive. However our customers are, as Jeff Bezos has said, “beautifully, wonderfully, dissatisfied.”

I see this first-hand every day. We launch something and that launch inspires our customers to ask for even more. For example, with many customers going all-in and moving large parts of their IT infrastructure to the AWS Cloud, we’ve had many requests for the raw data that feeds into the Cost Explorer. These customers want to programmatically explore their AWS costs, update ledgers and accounting systems with per-application and per-department costs, and to build high-level dashboards that summarize spending. Some of these customers have been going to the trouble of extracting the data from the charts and reports provided by Cost Explorer!

New Cost Explorer API
Today we are making the underlying data that feeds into Cost Explorer available programmatically. The new Cost Explorer API gives you a set of functions that allow you do everything that I described above. You can retrieve cost and usage data that is filtered and grouped across multiple dimensions (Service, Linked Account, tag, Availability Zone, and so forth), aggregated by day or by month. This gives you the power to start simple (total monthly costs) and to refine your requests to any desired level of detail (writes to DynamoDB tables that have been tagged as production) while getting responses in seconds.

Here are the operations:

GetCostAndUsage – Retrieve cost and usage metrics for a single account or all accounts (master accounts in an organization have access to all member accounts) with filtering and grouping.

GetDimensionValues – Retrieve available filter values for a specified filter over a specified period of time.

GetTags – Retrieve available tag keys and tag values over a specified period of time.

GetReservationUtilization – Retrieve EC2 Reserved Instance utilization over a specified period of time, with daily or monthly granularity plus filtering and grouping.

I believe that these functions, and the data that they return, will give you the ability to do some really interesting things that will give you better insights into your business. For example, you could tag the resources used to support individual marketing campaigns or development projects and then deep-dive into the costs to measure business value. You how have the potential to know, down to the penny, how much you spend on infrastructure for important events like Cyber Monday or Black Friday.

Things to Know
Here are a couple of things to keep in mind as you start to think about ways to make use of the API:

Grouping – The Cost Explorer web application provides you with one level of grouping; the APIs give you two. For example you could group costs or RI utilization by Service and then by Region.

Pagination – The functions can return very large amounts of data and follow the AWS-wide model for pagination by including a nextPageToken if additional data is available. You simply call the same function again, supplying the token, to move forward.

Regions – The service endpoint is in the US East (Northern Virginia) Region and returns usage data for all public AWS Regions.

Pricing – Each API call costs $0.01. To put this into perspective, let’s say you use this API to build a dashboard and it gets 1000 hits per month from your users. Your operating cost for the dashboard should be $10 or so; this is far less expensive than setting up your own systems to extract & ingest the data and respond to interactive queries.

The Cost Explorer API is available now and you can start using it today. To learn more, read about the Cost Explorer API.

Jeff;

Amazon QuickSight Update – Geospatial Visualization, Private VPC Access, and More

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-quicksight-update-geospatial-visualization-private-vpc-access-and-more/

We don’t often recognize or celebrate anniversaries at AWS. With nearly 100 services on our list, we’d be eating cake and drinking champagne several times a week. While that might sound like fun, we’d rather spend our working hours listening to customers and innovating. With that said, Amazon QuickSight has now been generally available for a little over a year and I would like to give you a quick update!

QuickSight in Action
Today, tens of thousands of customers (from startups to enterprises, in industries as varied as transportation, legal, mining, and healthcare) are using QuickSight to analyze and report on their business data.

Here are a couple of examples:

Gemini provides legal evidence procurement for California attorneys who represent injured workers. They have gone from creating custom reports and running one-off queries to creating and sharing dynamic QuickSight dashboards with drill-downs and filtering. QuickSight is used to track sales pipeline, measure order throughput, and to locate bottlenecks in the order processing pipeline.

Jivochat provides a real-time messaging platform to connect visitors to website owners. QuickSight lets them create and share interactive dashboards while also providing access to the underlying datasets. This has allowed them to move beyond the sharing of static spreadsheets, ensuring that everyone is looking at the same and is empowered to make timely decisions based on current data.

Transfix is a tech-powered freight marketplace that matches loads and increases visibility into logistics for Fortune 500 shippers in retail, food and beverage, manufacturing, and other industries. QuickSight has made analytics accessible to both BI engineers and non-technical business users. They scrutinize key business and operational metrics including shipping routes, carrier efficient, and process automation.

Looking Back / Looking Ahead
The feedback on QuickSight has been incredibly helpful. Customers tell us that their employees are using QuickSight to connect to their data, perform analytics, and make high-velocity, data-driven decisions, all without setting up or running their own BI infrastructure. We love all of the feedback that we get, and use it to drive our roadmap, leading to the introduction of over 40 new features in just a year. Here’s a summary:

Looking forward, we are watching an interesting trend develop within our customer base. As these customers take a close look at how they analyze and report on data, they are realizing that a serverless approach offers some tangible benefits. They use Amazon Simple Storage Service (S3) as a data lake and query it using a combination of QuickSight and Amazon Athena, giving them agility and flexibility without static infrastructure. They also make great use of QuickSight’s dashboards feature, monitoring business results and operational metrics, then sharing their insights with hundreds of users. You can read Building a Serverless Analytics Solution for Cleaner Cities and review Serverless Big Data Analytics using Amazon Athena and Amazon QuickSight if you are interested in this approach.

New Features and Enhancements
We’re still doing our best to listen and to learn, and to make sure that QuickSight continues to meet your needs. I’m happy to announce that we are making seven big additions today:

Geospatial Visualization – You can now create geospatial visuals on geographical data sets.

Private VPC Access – You can now sign up to access a preview of a new feature that allows you to securely connect to data within VPCs or on-premises, without the need for public endpoints.

Flat Table Support – In addition to pivot tables, you can now use flat tables for tabular reporting. To learn more, read about Using Tabular Reports.

Calculated SPICE Fields – You can now perform run-time calculations on SPICE data as part of your analysis. Read Adding a Calculated Field to an Analysis for more information.

Wide Table Support – You can now use tables with up to 1000 columns.

Other Buckets – You can summarize the long tail of high-cardinality data into buckets, as described in Working with Visual Types in Amazon QuickSight.

HIPAA Compliance – You can now run HIPAA-compliant workloads on QuickSight.

Geospatial Visualization
Everyone seems to want this feature! You can now take data that contains a geographic identifier (country, city, state, or zip code) and create beautiful visualizations with just a few clicks. QuickSight will geocode the identifier that you supply, and can also accept lat/long map coordinates. You can use this feature to visualize sales by state, map stores to shipping destinations, and so forth. Here’s a sample visualization:

To learn more about this feature, read Using Geospatial Charts (Maps), and Adding Geospatial Data.

Private VPC Access Preview
If you have data in AWS (perhaps in Amazon Redshift, Amazon Relational Database Service (RDS), or on EC2) or on-premises in Teradata or SQL Server on servers without public connectivity, this feature is for you. Private VPC Access for QuickSight uses an Elastic Network Interface (ENI) for secure, private communication with data sources in a VPC. It also allows you to use AWS Direct Connect to create a secure, private link with your on-premises resources. Here’s what it looks like:

If you are ready to join the preview, you can sign up today.

Jeff;

 

How AWS Managed Microsoft AD Helps to Simplify the Deployment and Improve the Security of Active Directory–Integrated .NET Applications

Post Syndicated from Peter Pereira original https://aws.amazon.com/blogs/security/how-aws-managed-microsoft-ad-helps-to-simplify-the-deployment-and-improve-the-security-of-active-directory-integrated-net-applications/

Companies using .NET applications to access sensitive user information, such as employee salary, Social Security Number, and credit card information, need an easy and secure way to manage access for users and applications.

For example, let’s say that your company has a .NET payroll application. You want your Human Resources (HR) team to manage and update the payroll data for all the employees in your company. You also want your employees to be able to see their own payroll information in the application. To meet these requirements in a user-friendly and secure way, you want to manage access to the .NET application by using your existing Microsoft Active Directory identities. This enables you to provide users with single sign-on (SSO) access to the .NET application and to manage permissions using Active Directory groups. You also want the .NET application to authenticate itself to access the database, and to limit access to the data in the database based on the identity of the application user.

Microsoft Active Directory supports these requirements through group Managed Service Accounts (gMSAs) and Kerberos constrained delegation (KCD). AWS Directory Service for Microsoft Active Directory, also known as AWS Managed Microsoft AD, enables you to manage gMSAs and KCD through your administrative account, helping you to migrate and develop .NET applications that need these native Active Directory features.

In this blog post, I give an overview of how to use AWS Managed Microsoft AD to manage gMSAs and KCD and demonstrate how you can configure a gMSA and KCD in six steps for a .NET application:

  1. Create your AWS Managed Microsoft AD.
  2. Create your Amazon RDS for SQL Server database.
  3. Create a gMSA for your .NET application.
  4. Deploy your .NET application.
  5. Configure your .NET application to use the gMSA.
  6. Configure KCD for your .NET application.

Solution overview

The following diagram shows the components of a .NET application that uses Amazon RDS for SQL Server with a gMSA and KCD. The diagram also illustrates authentication and access and is numbered to show the six key steps required to use a gMSA and KCD. To deploy this solution, the AWS Managed Microsoft AD directory must be in the same Amazon Virtual Private Cloud (VPC) as RDS for SQL Server. For this example, my company name is Example Corp., and my directory uses the domain name, example.com.

Diagram showing the components of a .NET application that uses Amazon RDS for SQL Server with a gMSA and KCD

Deploy the solution

The following six steps (numbered to correlate with the preceding diagram) walk you through configuring and using a gMSA and KCD.

1. Create your AWS Managed Microsoft AD directory

Using the Directory Service console, create your AWS Managed Microsoft AD directory in your Amazon VPC. In my example, my domain name is example.com.

Image of creating an AWS Managed Microsoft AD directory in an Amazon VPC

2. Create your Amazon RDS for SQL Server database

Using the RDS console, create your Amazon RDS for SQL Server database instance in the same Amazon VPC where your directory is running, and enable Windows Authentication. To enable Windows Authentication, select your directory in the Microsoft SQL Server Windows Authentication section in the Configure Advanced Settings step of the database creation workflow (see the following screenshot).

In my example, I create my Amazon RDS for SQL Server db-example database, and enable Windows Authentication to allow my db-example database to authenticate against my example.com directory.

Screenshot of configuring advanced settings

3. Create a gMSA for your .NET application

Now that you have deployed your directory, database, and application, you can create a gMSA for your .NET application.

To perform the next steps, you must install the Active Directory administration tools on a Windows server that is joined to your AWS Managed Microsoft AD directory domain. If you do not have a Windows server joined to your directory domain, you can deploy a new Amazon EC2 for Microsoft Windows Server instance and join it to your directory domain.

To create a gMSA for your .NET application:

  1. Log on to the instance on which you installed the Active Directory administration tools by using a user that is a member of the Admins security group or the Managed Service Accounts Admins security group in your organizational unit (OU). For my example, I use the Admin user in the example OU.

Screenshot of logging on to the instance on which you installed the Active Directory administration tools

  1. Identify which .NET application servers (hosts) will run your .NET application. Create a new security group in your OU and add your .NET application servers as members of this new group. This allows a group of application servers to use a single gMSA, instead of creating one gMSA for each server. In my example, I create a group, App_server_grp, in my example OU. I also add Appserver1, which is my .NET application server computer name, as a member of this new group.

Screenshot of creating a new security group

  1. Create a gMSA in your directory by running Windows PowerShell from the Start menu. The basic syntax to create the gMSA at the Windows PowerShell command prompt follows.
    PS C:\Users\admin> New-ADServiceAccount -name [gMSAname] -DNSHostName [domainname] -PrincipalsAllowedToRetrieveManagedPassword [AppServersSecurityGroup] -TrustedForDelegation $truedn <Enter>

    In my example, the gMSAname is gMSAexample, the DNSHostName is example.com, and the PrincipalsAllowedToRetrieveManagedPassword is the recently created security group, App_server_grp.

    PS C:\Users\admin> New-ADServiceAccount -name gMSAexample -DNSHostName example.com -PrincipalsAllowedToRetrieveManagedPassword App_server_grp -TrustedForDelegation $truedn <Enter>

    To confirm you created the gMSA, you can run the Get-ADServiceAccount command from the PowerShell command prompt.

    PS C:\Users\admin> Get-ADServiceAccount gMSAexample <Enter>
    
    DistinguishedName : CN=gMSAexample,CN=Managed Service Accounts,DC=example,DC=com
    Enabled           : True
    Name              : gMSAexample
    ObjectClass       : msDS-GroupManagedServiceAccount
    ObjectGUID        : 24d8b68d-36d5-4dc3-b0a9-edbbb5dc8a5b
    SamAccountName    : gMSAexample$
    SID               : S-1-5-21-2100421304-991410377-951759617-1603
    UserPrincipalName :

    You also can confirm you created the gMSA by opening the Active Directory Users and Computers utility located in your Administrative Tools folder, expand the domain (example.com in my case), and expand the Managed Service Accounts folder.
    Screenshot of confirming the creation of the gMSA

4. Deploy your .NET application

Deploy your .NET application on IIS on Amazon EC2 for Windows Server instances. For this step, I assume you are the application’s expert and already know how to deploy it. Make sure that all of your instances are joined to your directory.

5. Configure your .NET application to use the gMSA

You can configure your .NET application to use the gMSA to enforce strong password security policy and ensure password rotation of your service account. This helps to improve the security and simplify the management of your .NET application. Configure your .NET application in two steps:

  1. Grant to gMSA the required permissions to run your .NET application in the respective application folders. This is a critical step because when you change the application pool identity account to use gMSA, downtime can occur if the gMSA does not have the application’s required permissions. Therefore, make sure you first test the configurations in your development and test environments.
  2. Configure your application pool identity on IIS to use the gMSA as the service account. When you configure a gMSA as the service account, you include the $ at the end of the gMSA name. You do not need to provide a password because AWS Managed Microsoft AD automatically creates and rotates the password. In my example, my service account is gMSAexample$, as shown in the following screenshot.

Screenshot of configuring application pool identity

You have completed all the steps to use gMSA to create and rotate your .NET application service account password! Now, you will configure KCD for your .NET application.

6. Configure KCD for your .NET application

You now are ready to allow your .NET application to have access to other services by using the user identity’s permissions instead of the application service account’s permissions. Note that KCD and gMSA are independent features, which means you do not have to create a gMSA to use KCD. For this example, I am using both features to show how you can use them together. To configure a regular service account such as a user or local built-in account, see the Kerberos constrained delegation with ASP.NET blog post on MSDN.

In my example, my goal is to delegate to the gMSAexample account the ability to enforce the user’s permissions to my db-example SQL Server database, instead of the gMSAexample account’s permissions. For this, I have to update the msDS-AllowedToDelegateTo gMSA attribute. The value for this attribute is the service principal name (SPN) of the service instance that you are targeting, which in this case is the db-example Amazon RDS for SQL Server database.

The SPN format for the msDS-AllowedToDelegateTo attribute is a combination of the service class, the Kerberos authentication endpoint, and the port number. The Amazon RDS for SQL Server Kerberos authentication endpoint format is [database_name].[domain_name]. The value for my msDS-AllowedToDelegateTo attribute is MSSQLSvc/db-example.example.com:1433, where MSSQLSvc and 1433 are the SQL Server Database service class and port number standards, respectively.

Follow these steps to perform the msDS-AllowedToDelegateTo gMSA attribute configuration:

  1. Log on to your Active Directory management instance with a user identity that is a member of the Kerberos Delegation Admins security group. In this case, I will use admin.
  2. Open the Active Directory Users and Groups utility located in your Administrative Tools folder, choose View, and then choose Advanced Features.
  3. Expand your domain name (example.com in this example), and then choose the Managed Service Accounts security group. Right-click the gMSA account for the application pool you want to enable for Kerberos delegation, choose Properties, and choose the Attribute Editor tab.
  4. Search for the msDS-AllowedToDelegateTo attribute on the Attribute Editor tab and choose Edit.
  5. Enter the MSSQLSvc/db-example.example.com:1433 value and choose Add.
    Screenshot of entering the value of the multi-valued string
  6. Choose OK and Apply, and your KCD configuration is complete.

Congratulations! At this point, your application is using a gMSA rather than an embedded static user identity and password, and the application is able to access SQL Server using the identity of the application user. The gMSA eliminates the need for you to rotate the application’s password manually, and it allows you to better scope permissions for the application. When you use KCD, you can enforce access to your database consistently based on user identities at the database level, which prevents improper access that might otherwise occur because of an application error.

Summary

In this blog post, I demonstrated how to simplify the deployment and improve the security of your .NET application by using a group Managed Service Account and Kerberos constrained delegation with your AWS Managed Microsoft AD directory. I also outlined the main steps to get your .NET environment up and running on a managed Active Directory and SQL Server infrastructure. This approach will make it easier for you to build new .NET applications in the AWS Cloud or migrate existing ones in a more secure way.

For additional information about using group Managed Service Accounts and Kerberos constrained delegation with your AWS Managed Microsoft AD directory, see the AWS Directory Service documentation.

To learn more about AWS Directory Service, see the AWS Directory Service home page. If you have questions about this post or its solution, start a new thread on the Directory Service forum.

– Peter

Amazon EC2 Update – X1e Instances in Five More Sizes and a Stronger SLA

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-ec2-update-x1e-instances-in-five-more-sizes-and-a-stronger-sla/

Earlier this year we launched the x1e.32xlarge instances in four AWS Regions with 4 TB of memory. Today, two months after that launch, customers are using these instances to run high-performance relational and NoSQL databases, in-memory databases, and other enterprise applications that are able to take advantage of large amounts of memory.

Five More Sizes of X1e
I am happy to announce that we are extending the memory-optimized X1e family with five additional instance sizes. Here’s the lineup:

Model vCPUs Memory (GiB) SSD Storage (GB) Networking Performance
x1e.xlarge 4 122 120 Up to 10 Gbps
x1e.2xlarge 8 244 240 Up to 10 Gbps
x1e.4xlarge 16 488 480 Up to 10 Gbps
x1e.8xlarge 32 976 960 Up to 10 Gbps
x1e.16xlarge 64 1,952 1,920 10 Gbps
x1e.32xlarge 128 3,904 3,840 25 Gbps

The instances are powered by quad socket Intel® Xeon® E7 8880 processors running at 2.3 GHz, with large L3 caches and plenty of memory bandwidth. ENA networking and EBS optimization are standard, with up to 14 Gbps of dedicated throughput (depending on instance size) to EBS.

As part of today’s launch we are also making all sizes of X1e available in the Asia Pacific (Sydney) Region. This means that you can now launch them in On-Demand and Reserved Instance form in the US East (Northern Virginia), US West (Oregon), EU (Ireland), Asia Pacific (Tokyo), and Asia Pacific (Sydney) Regions.

Stronger EC2 SLA
I also have another piece of good news!

Effective immediately, we are increasing the EC2 Service Level Agreement (SLA) for both EC2 and EBS to 99.99%, for all regions and for all AWS customers. This change was made possible by our continuous investment in infrastructure and quality of service, along with our focus on operational excellence.

Jeff;

Use the New Visual Editor to Create and Modify Your AWS IAM Policies

Post Syndicated from Joy Chatterjee original https://aws.amazon.com/blogs/security/use-the-new-visual-editor-to-create-and-modify-your-aws-iam-policies/

Today, AWS Identity and Access Management (IAM) made it easier for you to create and modify your IAM policies by using a point-and-click visual editor in the IAM console. The new visual editor guides you through granting permissions for IAM policies without requiring you to write policies in JSON (although you can still author and edit policies in JSON, if you prefer). This update to the IAM console makes it easier to grant least privilege for the AWS service actions you select by listing all the supported resource types and request conditions you can specify. Policy summaries identify unrecognized services and actions and permissions errors when you import existing policies, and now you can use the visual editor to correct them. In this blog post, I give a brief overview of policy concepts and show you how to create a new policy by using the visual editor.

IAM policy concepts

You use IAM policies to define permissions for your IAM entities (groups, users, and roles). Policies are composed of one or more statements that include the following elements:

  • Effect: Determines if a policy statement allows or explicitly denies access.
  • Action: Defines AWS service actions in a policy (these typically map to individual AWS APIs.)
  • Resource: Defines the AWS resources to which actions can apply. The defined resources must be supported by the actions defined in the Action element for permissions to be granted.
  • Condition: Defines when a permission is allowed or denied. The conditions defined in a policy must be supported by the actions defined in the Action element for the permission to be granted.

To grant permissions, you attach policies to groups, users, or roles. Now that I have reviewed the elements of a policy, I will demonstrate how to create an IAM policy with the visual editor.

How to create an IAM policy with the visual editor

Let’s say my human resources (HR) recruiter, Casey, needs to review files located in an Amazon S3 bucket for all the product manager (PM) candidates our HR team has interviewed in 2017. To grant this access, I will create and attach a policy to Casey that grants list and limited read access to all folders that begin with PM_Candidate in the pmrecruiting2017 S3 bucket. To create this new policy, I navigate to the Policies page in the IAM console and choose Create policy. Note that I could also use the visual editor to modify existing policies by choosing Import existing policy; however, for Casey, I will create a new policy.

Image of the "Create policy" button

On the Visual editor tab, I see a section that includes Service, Actions, Resources, and Request Conditions.

Image of the "Visual editor" tab

Select a service

To grant S3 permissions, I choose Select a service, type S3 in the search box, and choose S3 from the list.

Image of choosing "S3"

Select actions

After selecting S3, I can define actions for Casey by using one of four options:

  1. Filter actions in the service by using the search box.
  2. Type actions by choosing Add action next to Manual actions. For example, I can type List* to grant all S3 actions that begin with List*.
  3. Choose access levels from List, Read, Write, Permissions management, and Tagging.
  4. Select individual actions by expanding each access level.

In the following screenshot, I choose options 3 and 4, and choose List and s3:GetObject from the Read access level.

Screenshot of options in the "Select actions" section

We introduced access levels when we launched policy summaries earlier in 2017. Access levels give you a way to categorize actions and help you understand the permissions in a policy. The following table gives you a quick overview of access levels.

Access level Description Example actions
List Actions that allow you to see a list of resources s3:ListBucket, s3:ListAllMyBuckets
Read Actions that allow you to read the content in resources s3:GetObject, s3:GetBucketTagging
Write Actions that allow you to create, delete, or modify resources s3:PutObject, s3:DeleteBucket
Permissions management Actions that allow you to grant or modify permissions to resources s3:PutBucketPolicy
Tagging Actions that allow you to create, delete, or modify tags
Note: Some services support authorization based on tags.
s3:PutBucketTagging, s3:DeleteObjectVersionTagging

Note: By default, all actions you choose will be allowed. To deny actions, choose Switch to deny permissions in the upper right corner of the Actions section.

As shown in the preceding screenshot, if I choose the question mark icon next to GetObject, I can see the description and supported resources and conditions for this action, which can help me scope permissions.

Screenshot of GetObject

The visual editor makes it easy to decide which actions I should select by providing in an integrated documentation panel the action description, supported resources or conditions, and any required actions for every AWS service action. Some AWS service actions have required actions, which are other AWS service actions that need to be granted in a policy for an action to run. For example, the AWS Directory Service action, ds:CreateDirectory, requires seven Amazon EC2 actions to be able to create a Directory Service directory.

Choose resources

In the Resources section, I can choose the resources on which actions can be taken. I choose Resources and see two ways that I can define or select resources:

  1. Define specific resources
  2. Select all resources

Specific is the default option, and only the applicable resources are presented based on the service and actions I chose previously. Because I want to grant Casey access to some objects in a specific bucket, I choose Specific and choose Add ARN under bucket.

Screenshot of Resources section

In the pop-up, I type the bucket name, pmrecruiting2017, and choose Add to specify the S3 bucket resource.

Screenshot of specifying the S3 bucket resource

To specify the objects, I choose Add ARN under object and grant Casey access to all objects starting with PM_Candidate in the pmrecruiting2017 bucket. The visual editor helps you build your Amazon Resource Name (ARN) and validates that it is structured correctly. For AWS services that are AWS Region specific, the visual editor prompts for AWS Region and account number.

The visual editor displays all applicable resources in the Resources section based on the actions I choose. For Casey, I defined an S3 bucket and object in the Resources section. In this example, when the visual editor creates the policy, it creates three statements. The first statement includes all actions that require a wildcard (*) for the Resource element because this action does not support resource-level permissions. The second statement includes all S3 actions that support an S3 bucket. The third statement includes all actions that support an S3 object resource. The visual editor generates policy syntax for you based on supported permissions in AWS services.

Specify request conditions

For additional security, I specify a condition to restrict access to the S3 bucket from inside our internal network. To do this, I choose Specify request conditions in the Request Conditions section, and choose the Source IP check box. A condition is composed of a condition key, an operator, and a value. I choose aws:SourceIp for my Key so that I can control from where the S3 files can be accessed. By default, IpAddress is the Operator, and I set the Value to my internal network.

Screenshot of "Request conditions" section

To add other conditions, choose Add condition and choose Save changes after choosing the key, operator, and value.

After specifying my request condition, I am now able to review all the elements of these S3 permissions.

Screenshot of S3 permissions

Next, I can choose to grant permissions for another service by choosing Add new permissions (bottom left of preceding screenshot), or I can review and create this new policy. Because I have granted all the permissions Casey needs, I choose Review policy. I type a name and a description, and I review the policy summary before choosing Create policy. 

Now that I have created the policy, I attach it to Casey by choosing the Attached entities tab of the policy I just created. I choose Attach and choose Casey. I then choose Attach policy. Casey should now be able to access the interview files she needs to review.

Summary

The visual editor makes it easier to create and modify your IAM policies by guiding you through each element of the policy. The visual editor helps you define resources and request conditions so that you can grant least privilege and generate policies. To start using the visual editor, sign in to the IAM console, navigate to the Policies page, and choose Create policy.

If you have comments about this post, submit them in the “Comments” section below. If you have questions about or suggestions for this solution, start a new thread on the IAM forum.

– Joy

Resume AWS Step Functions from Any State

Post Syndicated from Andy Katz original https://aws.amazon.com/blogs/compute/resume-aws-step-functions-from-any-state/


Yash Pant, Solutions Architect, AWS


Aaron Friedman, Partner Solutions Architect, AWS

When we discuss how to build applications with customers, we often align to the Well Architected Framework pillars of security, reliability, performance efficiency, cost optimization, and operational excellence. Designing for failure is an essential component to developing well architected applications that are resilient to spurious errors that may occur.

There are many ways you can use AWS services to achieve high availability and resiliency of your applications. For example, you can couple Elastic Load Balancing with Auto Scaling and Amazon EC2 instances to build highly available applications. Or use Amazon API Gateway and AWS Lambda to rapidly scale out a microservices-based architecture. Many AWS services have built in solutions to help with the appropriate error handling, such as Dead Letter Queues (DLQ) for Amazon SQS or retries in AWS Batch.

AWS Step Functions is an AWS service that makes it easy for you to coordinate the components of distributed applications and microservices. Step Functions allows you to easily design for failure, by incorporating features such as error retries and custom error handling from AWS Lambda exceptions. These features allow you to programmatically handle many common error modes and build robust, reliable applications.

In some rare cases, however, your application may fail in an unexpected manner. In these situations, you might not want to duplicate in a repeat execution those portions of your state machine that have already run. This is especially true when orchestrating long-running jobs or executing a complex state machine as part of a microservice. Here, you need to know the last successful state in your state machine from which to resume, so that you don’t duplicate previous work. In this post, we present a solution to enable you to resume from any given state in your state machine in the case of an unexpected failure.

Resuming from a given state

To resume a failed state machine execution from the state at which it failed, you first run a script that dynamically creates a new state machine. When the new state machine is executed, it resumes the failed execution from the point of failure. The script contains the following two primary steps:

  1. Parse the execution history of the failed execution to find the name of the state at which it failed, as well as the JSON input to that state.
  2. Create a new state machine, which adds an additional state to failed state machine, called "GoToState". "GoToState" is a choice state at the beginning of the state machine that branches execution directly to the failed state, allowing you to skip states that had succeeded in the previous execution.

The full script along with a CloudFormation template that creates a demo of this is available in the aws-sfn-resume-from-any-state GitHub repo.

Diving into the script

In this section, we walk you through the script and highlight the core components of its functionality. The script contains a main function, which adds a command line parameter for the failedExecutionArn so that you can easily call the script from the command line:

python gotostate.py --failedExecutionArn '<Failed_Execution_Arn>'

Identifying the failed state in your execution

First, the script extracts the name of the failed state along with the input to that state. It does so by using the failed state machine execution history, which is identified by the Amazon Resource Name (ARN) of the execution. The failed state is marked in the execution history, along with the input to that state (which is also the output of the preceding successful state). The script is able to parse these values from the log.

The script loops through the execution history of the failed state machine, and traces it backwards until it finds the failed state. If the state machine failed in a parallel state, then it must restart from the beginning of the parallel state. The script is able to capture the name of the parallel state that failed, rather than any substate within the parallel state that may have caused the failure. The following code is the Python function that does this.


def parseFailureHistory(failedExecutionArn):

    '''
    Parses the execution history of a failed state machine to get the name of failed state and the input to the failed state:
    Input failedExecutionArn = A string containing the execution ARN of a failed state machine y
    Output = A list with two elements: [name of failed state, input to failed state]
    '''
    failedAtParallelState = False
    try:
        #Get the execution history
        response = client.get\_execution\_history(
            executionArn=failedExecutionArn,
            reverseOrder=True
        )
        failedEvents = response['events']
    except Exception as ex:
        raise ex
    #Confirm that the execution actually failed, raise exception if it didn't fail.
    try:
        failedEvents[0]['executionFailedEventDetails']
    except:
        raise('Execution did not fail')
        
    '''
    If you have a 'States.Runtime' error (for example, if a task state in your state machine attempts to execute a Lambda function in a different region than the state machine), get the ID of the failed state, and use it to determine the failed state name and input.
    '''
    
    if failedEvents[0]['executionFailedEventDetails']['error'] == 'States.Runtime':
        failedId = int(filter(str.isdigit, str(failedEvents[0]['executionFailedEventDetails']['cause'].split()[13])))
        failedState = failedEvents[-1 \* failedId]['stateEnteredEventDetails']['name']
        failedInput = failedEvents[-1 \* failedId]['stateEnteredEventDetails']['input']
        return (failedState, failedInput)
        
    '''
    You need to loop through the execution history, tracing back the executed steps.
    The first state you encounter is the failed state. If you failed on a parallel state, you need the name of the parallel state rather than the name of a state within a parallel state that it failed on. This is because you can only attach goToState to the parallel state, but not a substate within the parallel state.
    This loop starts with the ID of the latest event and uses the previous event IDs to trace back the execution to the beginning (id 0). However, it returns as soon it finds the name of the failed state.
    '''

    currentEventId = failedEvents[0]['id']
    while currentEventId != 0:
        #multiply event ID by -1 for indexing because you're looking at the reversed history
        currentEvent = failedEvents[-1 \* currentEventId]
        
        '''
        You can determine if the failed state was a parallel state because it and an event with 'type'='ParallelStateFailed' appears in the execution history before the name of the failed state
        '''

        if currentEvent['type'] == 'ParallelStateFailed':
            failedAtParallelState = True

        '''
        If the failed state is not a parallel state, then the name of failed state to return is the name of the state in the first 'TaskStateEntered' event type you run into when tracing back the execution history
        '''

        if currentEvent['type'] == 'TaskStateEntered' and failedAtParallelState == False:
            failedState = currentEvent['stateEnteredEventDetails']['name']
            failedInput = currentEvent['stateEnteredEventDetails']['input']
            return (failedState, failedInput)

        '''
        If the failed state was a parallel state, then you need to trace execution back to the first event with 'type'='ParallelStateEntered', and return the name of the state
        '''

        if currentEvent['type'] == 'ParallelStateEntered' and failedAtParallelState:
            failedState = failedState = currentEvent['stateEnteredEventDetails']['name']
            failedInput = currentEvent['stateEnteredEventDetails']['input']
            return (failedState, failedInput)
        #Update the ID for the next execution of the loop
        currentEventId = currentEvent['previousEventId']
        

Create the new state machine

The script uses the name of the failed state to create the new state machine, with "GoToState" branching execution directly to the failed state.

To do this, the script requires the Amazon States Language (ASL) definition of the failed state machine. It modifies the definition to append "GoToState", and create a new state machine from it.

The script gets the ARN of the failed state machine from the execution ARN of the failed state machine. This ARN allows it to get the ASL definition of the failed state machine by calling the DesribeStateMachine API action. It creates a new state machine with "GoToState".

When the script creates the new state machine, it also adds an additional input variable called "resuming". When you execute this new state machine, you specify this resuming variable as true in the input JSON. This tells "GoToState" to branch execution to the state that had previously failed. Here’s the function that does this:

def attachGoToState(failedStateName, stateMachineArn):

    '''
    Given a state machine ARN and the name of a state in that state machine, create a new state machine that starts at a new choice state called 'GoToState'. "GoToState" branches to the named state, and sends the input of the state machine to that state, when a variable called "resuming" is set to True.
    Input failedStateName = A string with the name of the failed state
          stateMachineArn = A string with the ARN of the state machine
    Output response from the create_state_machine call, which is the API call that creates a new state machine
    '''

    try:
        response = client.describe\_state\_machine(
            stateMachineArn=stateMachineArn
        )
    except:
        raise('Could not get ASL definition of state machine')
    roleArn = response['roleArn']
    stateMachine = json.loads(response['definition'])
    #Create a name for the new state machine
    newName = response['name'] + '-with-GoToState'
    #Get the StartAt state for the original state machine, because you point the 'GoToState' to this state
    originalStartAt = stateMachine['StartAt']

    '''
    Create the GoToState with the variable $.resuming.
    If new state machine is executed with $.resuming = True, then the state machine skips to the failed state.
    Otherwise, it executes the state machine from the original start state.
    '''

    goToState = {'Type':'Choice', 'Choices':[{'Variable':'$.resuming', 'BooleanEquals':False, 'Next':originalStartAt}], 'Default':failedStateName}
    #Add GoToState to the set of states in the new state machine
    stateMachine['States']['GoToState'] = goToState
    #Add StartAt
    stateMachine['StartAt'] = 'GoToState'
    #Create new state machine
    try:
        response = client.create_state_machine(
            name=newName,
            definition=json.dumps(stateMachine),
            roleArn=roleArn
        )
    except:
        raise('Failed to create new state machine with GoToState')
    return response

Testing the script

Now that you understand how the script works, you can test it out.

The following screenshot shows an example state machine that has failed, called "TestMachine". This state machine successfully completed "FirstState" and "ChoiceState", but when it branched to "FirstMatchState", it failed.

Use the script to create a new state machine that allows you to rerun this state machine, but skip the "FirstState" and the "ChoiceState" steps that already succeeded. You can do this by calling the script as follows:

python gotostate.py --failedExecutionArn 'arn:aws:states:us-west-2:<AWS_ACCOUNT_ID>:execution:TestMachine-with-GoToState:b2578403-f41d-a2c7-e70c-7500045288595

This creates a new state machine called "TestMachine-with-GoToState", and returns its ARN, along with the input that had been sent to "FirstMatchState". You can then inspect the input to determine what caused the error. In this case, you notice that the input to "FirstMachState" was the following:

{
"foo": 1,
"Message": true
}

However, this state machine expects the "Message" field of the JSON to be a string rather than a Boolean. Execute the new "TestMachine-with-GoToState" state machine, change the input to be a string, and add the "resuming" variable that "GoToState" requires:

{
"foo": 1,
"Message": "Hello!",
"resuming":true
}

When you execute the new state machine, it skips "FirstState" and "ChoiceState", and goes directly to "FirstMatchState", which was the state that failed:

Look at what happens when you have a state machine with multiple parallel steps. This example is included in the GitHub repository associated with this post. The repo contains a CloudFormation template that sets up this state machine and provides instructions to replicate this solution.

The following state machine, "ParallelStateMachine", takes an input through two subsequent parallel states before doing some final processing and exiting, along with the JSON with the ASL definition of the state machine.

{
  "Comment": "An example of the Amazon States Language using a parallel state to execute two branches at the same time.",
  "StartAt": "Parallel",
  "States": {
    "Parallel": {
      "Type": "Parallel",
      "ResultPath":"$.output",
      "Next": "Parallel 2",
      "Branches": [
        {
          "StartAt": "Parallel Step 1, Process 1",
          "States": {
            "Parallel Step 1, Process 1": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:LambdaA",
              "End": true
            }
          }
        },
        {
          "StartAt": "Parallel Step 1, Process 2",
          "States": {
            "Parallel Step 1, Process 2": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:LambdaA",
              "End": true
            }
          }
        }
      ]
    },
    "Parallel 2": {
      "Type": "Parallel",
      "Next": "Final Processing",
      "Branches": [
        {
          "StartAt": "Parallel Step 2, Process 1",
          "States": {
            "Parallel Step 2, Process 1": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXXX:function:LambdaB",
              "End": true
            }
          }
        },
        {
          "StartAt": "Parallel Step 2, Process 2",
          "States": {
            "Parallel Step 2, Process 2": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:LambdaB",
              "End": true
            }
          }
        }
      ]
    },
    "Final Processing": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:LambdaC",
      "End": true
    }
  }
}

First, use an input that initially fails:

{
  "Message": "Hello!"
}

This fails because the state machine expects you to have a variable in the input JSON called "foo" in the second parallel state to run "Parallel Step 2, Process 1" and "Parallel Step 2, Process 2". Instead, the original input gets processed by the first parallel state and produces the following output to pass to the second parallel state:

{
"output": [
    {
      "Message": "Hello!"
    },
    {
      "Message": "Hello!"
    }
  ],
}

Run the script on the failed state machine to create a new state machine that allows it to resume directly at the second parallel state instead of having to redo the first parallel state. This creates a new state machine called "ParallelStateMachine-with-GoToState". The following JSON was created by the script to define the new state machine in ASL. It contains the "GoToState" value that was attached by the script.

{
   "Comment":"An example of the Amazon States Language using a parallel state to execute two branches at the same time.",
   "States":{
      "Final Processing":{
         "Resource":"arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:LambdaC",
         "End":true,
         "Type":"Task"
      },
      "GoToState":{
         "Default":"Parallel 2",
         "Type":"Choice",
         "Choices":[
            {
               "Variable":"$.resuming",
               "BooleanEquals":false,
               "Next":"Parallel"
            }
         ]
      },
      "Parallel":{
         "Branches":[
            {
               "States":{
                  "Parallel Step 1, Process 1":{
                     "Resource":"arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:LambdaA",
                     "End":true,
                     "Type":"Task"
                  }
               },
               "StartAt":"Parallel Step 1, Process 1"
            },
            {
               "States":{
                  "Parallel Step 1, Process 2":{
                     "Resource":"arn:aws:lambda:us-west-2:XXXXXXXXXXXX:LambdaA",
                     "End":true,
                     "Type":"Task"
                  }
               },
               "StartAt":"Parallel Step 1, Process 2"
            }
         ],
         "ResultPath":"$.output",
         "Type":"Parallel",
         "Next":"Parallel 2"
      },
      "Parallel 2":{
         "Branches":[
            {
               "States":{
                  "Parallel Step 2, Process 1":{
                     "Resource":"arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:LambdaB",
                     "End":true,
                     "Type":"Task"
                  }
               },
               "StartAt":"Parallel Step 2, Process 1"
            },
            {
               "States":{
                  "Parallel Step 2, Process 2":{
                     "Resource":"arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:LambdaB",
                     "End":true,
                     "Type":"Task"
                  }
               },
               "StartAt":"Parallel Step 2, Process 2"
            }
         ],
         "Type":"Parallel",
         "Next":"Final Processing"
      }
   },
   "StartAt":"GoToState"
}

You can then execute this state machine with the correct input by adding the "foo" and "resuming" variables:

{
  "foo": 1,
  "output": [
    {
      "Message": "Hello!"
    },
    {
      "Message": "Hello!"
    }
  ],
  "resuming": true
}

This yields the following result. Notice that this time, the state machine executed successfully to completion, and skipped the steps that had previously failed.


Conclusion

When you’re building out complex workflows, it’s important to be prepared for failure. You can do this by taking advantage of features such as automatic error retries in Step Functions and custom error handling of Lambda exceptions.

Nevertheless, state machines still have the possibility of failing. With the methodology and script presented in this post, you can resume a failed state machine from its point of failure. This allows you to skip the execution of steps in the workflow that had already succeeded, and recover the process from the point of failure.

To see more examples, please visit the Step Functions Getting Started page.

If you have questions or suggestions, please comment below.

Event-Driven Computing with Amazon SNS and AWS Compute, Storage, Database, and Networking Services

Post Syndicated from Christie Gifrin original https://aws.amazon.com/blogs/compute/event-driven-computing-with-amazon-sns-compute-storage-database-and-networking-services/

Contributed by Otavio Ferreira, Manager, Software Development, AWS Messaging

Like other developers around the world, you may be tackling increasingly complex business problems. A key success factor, in that case, is the ability to break down a large project scope into smaller, more manageable components. A service-oriented architecture guides you toward designing systems as a collection of loosely coupled, independently scaled, and highly reusable services. Microservices take this even further. To improve performance and scalability, they promote fine-grained interfaces and lightweight protocols.

However, the communication among isolated microservices can be challenging. Services are often deployed onto independent servers and don’t share any compute or storage resources. Also, you should avoid hard dependencies among microservices, to preserve maintainability and reusability.

If you apply the pub/sub design pattern, you can effortlessly decouple and independently scale out your microservices and serverless architectures. A pub/sub messaging service, such as Amazon SNS, promotes event-driven computing that statically decouples event publishers from subscribers, while dynamically allowing for the exchange of messages between them. An event-driven architecture also introduces the responsiveness needed to deal with complex problems, which are often unpredictable and asynchronous.

What is event-driven computing?

Given the context of microservices, event-driven computing is a model in which subscriber services automatically perform work in response to events triggered by publisher services. This paradigm can be applied to automate workflows while decoupling the services that collectively and independently work to fulfil these workflows. Amazon SNS is an event-driven computing hub, in the AWS Cloud, that has native integration with several AWS publisher and subscriber services.

Which AWS services publish events to SNS natively?

Several AWS services have been integrated as SNS publishers and, therefore, can natively trigger event-driven computing for a variety of use cases. In this post, I specifically cover AWS compute, storage, database, and networking services, as depicted below.

Compute services

  • Auto Scaling: Helps you ensure that you have the correct number of Amazon EC2 instances available to handle the load for your application. You can configure Auto Scaling lifecycle hooks to trigger events, as Auto Scaling resizes your EC2 cluster.As an example, you may want to warm up the local cache store on newly launched EC2 instances, and also download log files from other EC2 instances that are about to be terminated. To make this happen, set an SNS topic as your Auto Scaling group’s notification target, then subscribe two Lambda functions to this SNS topic. The first function is responsible for handling scale-out events (to warm up cache upon provisioning), whereas the second is in charge of handling scale-in events (to download logs upon termination).

  • AWS Elastic Beanstalk: An easy-to-use service for deploying and scaling web applications and web services developed in a number of programming languages. You can configure event notifications for your Elastic Beanstalk environment so that notable events can be automatically published to an SNS topic, then pushed to topic subscribers.As an example, you may use this event-driven architecture to coordinate your continuous integration pipeline (such as Jenkins CI). That way, whenever an environment is created, Elastic Beanstalk publishes this event to an SNS topic, which triggers a subscribing Lambda function, which then kicks off a CI job against your newly created Elastic Beanstalk environment.

  • Elastic Load Balancing: Automatically distributes incoming application traffic across Amazon EC2 instances, containers, or other resources identified by IP addresses.You can configure CloudWatch alarms on Elastic Load Balancing metrics, to automate the handling of events derived from Classic Load Balancers. As an example, you may leverage this event-driven design to automate latency profiling in an Amazon ECS cluster behind a Classic Load Balancer. In this example, whenever your ECS cluster breaches your load balancer latency threshold, an event is posted by CloudWatch to an SNS topic, which then triggers a subscribing Lambda function. This function runs a task on your ECS cluster to trigger a latency profiling tool, hosted on the cluster itself. This can enhance your latency troubleshooting exercise by making it timely.

Storage services

  • Amazon S3: Object storage built to store and retrieve any amount of data.You can enable S3 event notifications, and automatically get them posted to SNS topics, to automate a variety of workflows. For instance, imagine that you have an S3 bucket to store incoming resumes from candidates, and a fleet of EC2 instances to encode these resumes from their original format (such as Word or text) into a portable format (such as PDF).In this example, whenever new files are uploaded to your input bucket, S3 publishes these events to an SNS topic, which in turn pushes these messages into subscribing SQS queues. Then, encoding workers running on EC2 instances poll these messages from the SQS queues; retrieve the original files from the input S3 bucket; encode them into PDF; and finally store them in an output S3 bucket.

  • Amazon EFS: Provides simple and scalable file storage, for use with Amazon EC2 instances, in the AWS Cloud.You can configure CloudWatch alarms on EFS metrics, to automate the management of your EFS systems. For example, consider a highly parallelized genomics analysis application that runs against an EFS system. By default, this file system is instantiated on the “General Purpose” performance mode. Although this performance mode allows for lower latency, it might eventually impose a scaling bottleneck. Therefore, you may leverage an event-driven design to handle it automatically.Basically, as soon as the EFS metric “Percent I/O Limit” breaches 95%, CloudWatch could post this event to an SNS topic, which in turn would push this message into a subscribing Lambda function. This function automatically creates a new file system, this time on the “Max I/O” performance mode, then switches the genomics analysis application to this new file system. As a result, your application starts experiencing higher I/O throughput rates.

  • Amazon Glacier: A secure, durable, and low-cost cloud storage service for data archiving and long-term backup.You can set a notification configuration on an Amazon Glacier vault so that when a job completes, a message is published to an SNS topic. Retrieving an archive from Amazon Glacier is a two-step asynchronous operation, in which you first initiate a job, and then download the output after the job completes. Therefore, SNS helps you eliminate polling your Amazon Glacier vault to check whether your job has been completed, or not. As usual, you may subscribe SQS queues, Lambda functions, and HTTP endpoints to your SNS topic, to be notified when your Amazon Glacier job is done.

  • AWS Snowball: A petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data.You can leverage Snowball notifications to automate workflows related to importing data into and exporting data from AWS. More specifically, whenever your Snowball job status changes, Snowball can publish this event to an SNS topic, which in turn can broadcast the event to all its subscribers.As an example, imagine a Geographic Information System (GIS) that distributes high-resolution satellite images to users via Web browser. In this example, the GIS vendor could capture up to 80 TB of satellite images; create a Snowball job to import these files from an on-premises system to an S3 bucket; and provide an SNS topic ARN to be notified upon job status changes in Snowball. After Snowball changes the job status from “Importing” to “Completed”, Snowball publishes this event to the specified SNS topic, which delivers this message to a subscribing Lambda function, which finally creates a CloudFront web distribution for the target S3 bucket, to serve the images to end users.

Database services

  • Amazon RDS: Makes it easy to set up, operate, and scale a relational database in the cloud.RDS leverages SNS to broadcast notifications when RDS events occur. As usual, these notifications can be delivered via any protocol supported by SNS, including SQS queues, Lambda functions, and HTTP endpoints.As an example, imagine that you own a social network website that has experienced organic growth, and needs to scale its compute and database resources on demand. In this case, you could provide an SNS topic to listen to RDS DB instance events. When the “Low Storage” event is published to the topic, SNS pushes this event to a subscribing Lambda function, which in turn leverages the RDS API to increase the storage capacity allocated to your DB instance. The provisioning itself takes place within the specified DB maintenance window.

  • Amazon ElastiCache: A web service that makes it easy to deploy, operate, and scale an in-memory data store or cache in the cloud.ElastiCache can publish messages using Amazon SNS when significant events happen on your cache cluster. This feature can be used to refresh the list of servers on client machines connected to individual cache node endpoints of a cache cluster. For instance, an ecommerce website fetches product details from a cache cluster, with the goal of offloading a relational database and speeding up page load times. Ideally, you want to make sure that each web server always has an updated list of cache servers to which to connect.To automate this node discovery process, you can get your ElastiCache cluster to publish events to an SNS topic. Thus, when ElastiCache event “AddCacheNodeComplete” is published, your topic then pushes this event to all subscribing HTTP endpoints that serve your ecommerce website, so that these HTTP servers can update their list of cache nodes.

  • Amazon Redshift: A fully managed data warehouse that makes it simple to analyze data using standard SQL and BI (Business Intelligence) tools.Amazon Redshift uses SNS to broadcast relevant events so that data warehouse workflows can be automated. As an example, imagine a news website that sends clickstream data to a Kinesis Firehose stream, which then loads the data into Amazon Redshift, so that popular news and reading preferences might be surfaced on a BI tool. At some point though, this Amazon Redshift cluster might need to be resized, and the cluster enters a ready-only mode. Hence, this Amazon Redshift event is published to an SNS topic, which delivers this event to a subscribing Lambda function, which finally deletes the corresponding Kinesis Firehose delivery stream, so that clickstream data uploads can be put on hold.At a later point, after Amazon Redshift publishes the event that the maintenance window has been closed, SNS notifies a subscribing Lambda function accordingly, so that this function can re-create the Kinesis Firehose delivery stream, and resume clickstream data uploads to Amazon Redshift.

  • AWS DMS: Helps you migrate databases to AWS quickly and securely. The source database remains fully operational during the migration, minimizing downtime to applications that rely on the database.DMS also uses SNS to provide notifications when DMS events occur, which can automate database migration workflows. As an example, you might create data replication tasks to migrate an on-premises MS SQL database, composed of multiple tables, to MySQL. Thus, if replication tasks fail due to incompatible data encoding in the source tables, these events can be published to an SNS topic, which can push these messages into a subscribing SQS queue. Then, encoders running on EC2 can poll these messages from the SQS queue, encode the source tables into a compatible character set, and restart the corresponding replication tasks in DMS. This is an event-driven approach to a self-healing database migration process.

Networking services

  • Amazon Route 53: A highly available and scalable cloud-based DNS (Domain Name System). Route 53 health checks monitor the health and performance of your web applications, web servers, and other resources.You can set CloudWatch alarms and get automated Amazon SNS notifications when the status of your Route 53 health check changes. As an example, imagine an online payment gateway that reports the health of its platform to merchants worldwide, via a status page. This page is hosted on EC2 and fetches platform health data from DynamoDB. In this case, you could configure a CloudWatch alarm for your Route 53 health check, so that when the alarm threshold is breached, and the payment gateway is no longer considered healthy, then CloudWatch publishes this event to an SNS topic, which pushes this message to a subscribing Lambda function, which finally updates the DynamoDB table that populates the status page. This event-driven approach avoids any kind of manual update to the status page visited by merchants.

  • AWS Direct Connect (AWS DX): Makes it easy to establish a dedicated network connection from your premises to AWS, which can reduce your network costs, increase bandwidth throughput, and provide a more consistent network experience than Internet-based connections.You can monitor physical DX connections using CloudWatch alarms, and send SNS messages when alarms change their status. As an example, when a DX connection state shifts to 0 (zero), indicating that the connection is down, this event can be published to an SNS topic, which can fan out this message to impacted servers through HTTP endpoints, so that they might reroute their traffic through a different connection instead. This is an event-driven approach to connectivity resilience.

More event-driven computing on AWS

In addition to SNS, event-driven computing is also addressed by Amazon CloudWatch Events, which delivers a near real-time stream of system events that describe changes in AWS resources. With CloudWatch Events, you can route each event type to one or more targets, including:

Many AWS services publish events to CloudWatch. As an example, you can get CloudWatch Events to capture events on your ETL (Extract, Transform, Load) jobs running on AWS Glue and push failed ones to an SQS queue, so that you can retry them later.

Conclusion

Amazon SNS is a pub/sub messaging service that can be used as an event-driven computing hub to AWS customers worldwide. By capturing events natively triggered by AWS services, such as EC2, S3 and RDS, you can automate and optimize all kinds of workflows, namely scaling, testing, encoding, profiling, broadcasting, discovery, failover, and much more. Business use cases presented in this post ranged from recruiting websites, to scientific research, geographic systems, social networks, retail websites, and news portals.

Start now by visiting Amazon SNS in the AWS Management Console, or by trying the AWS 10-Minute Tutorial, Send Fan-out Event Notifications with Amazon SNS and Amazon SQS.

 

Introducing Cloud Native Networking for Amazon ECS Containers

Post Syndicated from Nathan Taber original https://aws.amazon.com/blogs/compute/introducing-cloud-native-networking-for-ecs-containers/

This post courtesy of ECS Sr. Software Dev Engineer Anirudh Aithal.

Today, AWS announced Task Networking for Amazon ECS. This feature brings Amazon EC2 networking capabilities to tasks using elastic network interfaces.

An elastic network interface is a virtual network interface that you can attach to an instance in a VPC. When you launch an EC2 virtual machine, an elastic network interface is automatically provisioned to provide networking capabilities for the instance.

A task is a logical group of running containers. Previously, tasks running on Amazon ECS shared the elastic network interface of their EC2 host. Now, the new awsvpc networking mode lets you attach an elastic network interface directly to a task.

This simplifies network configuration, allowing you to treat each container just like an EC2 instance with full networking features, segmentation, and security controls in the VPC.

In this post, I cover how awsvpc mode works and show you how you can start using elastic network interfaces with your tasks running on ECS.

Background:  Elastic network interfaces in EC2

When you launch EC2 instances within a VPC, you don’t have to configure an additional overlay network for those instances to communicate with each other. By default, routing tables in the VPC enable seamless communication between instances and other endpoints. This is made possible by virtual network interfaces in VPCs called elastic network interfaces. Every EC2 instance that launches is automatically assigned an elastic network interface (the primary network interface). All networking parameters—such as subnets, security groups, and so on—are handled as properties of this primary network interface.

Furthermore, an IPv4 address is allocated to every elastic network interface by the VPC at creation (the primary IPv4 address). This primary address is unique and routable within the VPC. This effectively makes your VPC a flat network, resulting in a simple networking topology.

Elastic network interfaces can be treated as fundamental building blocks for connecting various endpoints in a VPC, upon which you can build higher-level abstractions. This allows elastic network interfaces to be leveraged for:

  • VPC-native IPv4 addressing and routing (between instances and other endpoints in the VPC)
  • Network traffic isolation
  • Network policy enforcement using ACLs and firewall rules (security groups)
  • IPv4 address range enforcement (via subnet CIDRs)

Why use awsvpc?

Previously, ECS relied on the networking capability provided by Docker’s default networking behavior to set up the network stack for containers. With the default bridge network mode, containers on an instance are connected to each other using the docker0 bridge. Containers use this bridge to communicate with endpoints outside of the instance, using the primary elastic network interface of the instance on which they are running. Containers share and rely on the networking properties of the primary elastic network interface, including the firewall rules (security group subscription) and IP addressing.

This means you cannot address these containers with the IP address allocated by Docker (it’s allocated from a pool of locally scoped addresses), nor can you enforce finely grained network ACLs and firewall rules. Instead, containers are addressable in your VPC by the combination of the IP address of the primary elastic network interface of the instance, and the host port to which they are mapped (either via static or dynamic port mapping). Also, because a single elastic network interface is shared by multiple containers, it can be difficult to create easily understandable network policies for each container.

The awsvpc networking mode addresses these issues by provisioning elastic network interfaces on a per-task basis. Hence, containers no longer share or contend use these resources. This enables you to:

  • Run multiple copies of the container on the same instance using the same container port without needing to do any port mapping or translation, simplifying the application architecture.
  • Extract higher network performance from your applications as they no longer contend for bandwidth on a shared bridge.
  • Enforce finer-grained access controls for your containerized applications by associating security group rules for each Amazon ECS task, thus improving the security for your applications.

Associating security group rules with a container or containers in a task allows you to restrict the ports and IP addresses from which your application accepts network traffic. For example, you can enforce a policy allowing SSH access to your instance, but blocking the same for containers. Alternatively, you could also enforce a policy where you allow HTTP traffic on port 80 for your containers, but block the same for your instances. Enforcing such security group rules greatly reduces the surface area of attack for your instances and containers.

ECS manages the lifecycle and provisioning of elastic network interfaces for your tasks, creating them on-demand and cleaning them up after your tasks stop. You can specify the same properties for the task as you would when launching an EC2 instance. This means that containers in such tasks are:

  • Addressable by IP addresses and the DNS name of the elastic network interface
  • Attachable as ‘IP’ targets to Application Load Balancers and Network Load Balancers
  • Observable from VPC flow logs
  • Access controlled by security groups

­This also enables you to run multiple copies of the same task definition on the same instance, without needing to worry about port conflicts. You benefit from higher performance because you don’t need to perform any port translations or contend for bandwidth on the shared docker0 bridge, as you do with the bridge networking mode.

Getting started

If you don’t already have an ECS cluster, you can create one using the create cluster wizard. In this post, I use “awsvpc-demo” as the cluster name. Also, if you are following along with the command line instructions, make sure that you have the latest version of the AWS CLI or SDK.

Registering the task definition

The only change to make in your task definition for task networking is to set the networkMode parameter to awsvpc. In the ECS console, enter this value for Network Mode.

 

If you plan on registering a container in this task definition with an ECS service, also specify a container port in the task definition. This example specifies an NGINX container exposing port 80:

This creates a task definition named “nginx-awsvpc" with networking mode set to awsvpc. The following commands illustrate registering the task definition from the command line:

$ cat nginx-awsvpc.json
{
        "family": "nginx-awsvpc",
        "networkMode": "awsvpc",
        "containerDefinitions": [
            {
                "name": "nginx",
                "image": "nginx:latest",
                "cpu": 100,
                "memory": 512,
                "essential": true,
                "portMappings": [
                  {
                    "containerPort": 80,
                    "protocol": "tcp"
                  }
                ]
            }
        ]
}

$ aws ecs register-task-definition --cli-input-json file://./nginx-awsvpc.json

Running the task

To run a task with this task definition, navigate to the cluster in the Amazon ECS console and choose Run new task. Specify the task definition as “nginx-awsvpc“. Next, specify the set of subnets in which to run this task. You must have instances registered with ECS in at least one of these subnets. Otherwise, ECS can’t find a candidate instance to attach the elastic network interface.

You can use the console to narrow down the subnets by selecting a value for Cluster VPC:

 

Next, select a security group for the task. For the purposes of this example, create a new security group that allows ingress only on port 80. Alternatively, you can also select security groups that you’ve already created.

Next, run the task by choosing Run Task.

You should have a running task now. If you look at the details of the task, you see that it has an elastic network interface allocated to it, along with the IP address of the elastic network interface:

You can also use the command line to do this:

$ aws ecs run-task --cluster awsvpc-ecs-demo --network-configuration "awsvpcConfiguration={subnets=["subnet-c070009b"],securityGroups=["sg-9effe8e4"]}" nginx-awsvpc $ aws ecs describe-tasks --cluster awsvpc-ecs-demo --task $ECS_TASK_ARN --query tasks[0]
{
    "taskArn": "arn:aws:ecs:us-west-2:xx..x:task/f5xx-...",
    "group": "family:nginx-awsvpc",
    "attachments": [
        {
            "status": "ATTACHED",
            "type": "ElasticNetworkInterface",
            "id": "xx..",
            "details": [
                {
                    "name": "subnetId",
                    "value": "subnet-c070009b"
                },
                {
                    "name": "networkInterfaceId",
                    "value": "eni-b0aaa4b2"
                },
                {
                    "name": "macAddress",
                    "value": "0a:47:e4:7a:2b:02"
                },
                {
                    "name": "privateIPv4Address",
                    "value": "10.0.0.35"
                }
            ]
        }
    ],
    ...
    "desiredStatus": "RUNNING",
    "taskDefinitionArn": "arn:aws:ecs:us-west-2:xx..x:task-definition/nginx-awsvpc:2",
    "containers": [
        {
            "containerArn": "arn:aws:ecs:us-west-2:xx..x:container/62xx-...",
            "taskArn": "arn:aws:ecs:us-west-2:xx..x:task/f5x-...",
            "name": "nginx",
            "networkBindings": [],
            "lastStatus": "RUNNING",
            "networkInterfaces": [
                {
                    "privateIpv4Address": "10.0.0.35",
                    "attachmentId": "xx.."
                }
            ]
        }
    ]
}

When you describe an “awsvpc” task, details of the elastic network interface are returned via the “attachments” object. You can also get this information from the “containers” object. For example:

$ aws ecs describe-tasks --cluster awsvpc-ecs-demo --task $ECS_TASK_ARN --query tasks[0].containers[0].networkInterfaces[0].privateIpv4Address
"10.0.0.35"

Conclusion

The nginx container is now addressable in your VPC via the 10.0.0.35 IPv4 address. You did not have to modify the security group on the instance to allow requests on port 80, thus improving instance security. Also, you ensured that all ports apart from port 80 were blocked for this application without modifying the application itself, which makes it easier to manage your task on the network. You did not have to interact with any of the elastic network interface API operations, as ECS handled all of that for you.

You can read more about the task networking feature in the ECS documentation. For a detailed look at how this new networking mode is implemented on an instance, see Under the Hood: Task Networking for Amazon ECS.

Please use the comments section below to send your feedback.

Under the Hood: Task Networking for Amazon ECS

Post Syndicated from Nathan Taber original https://aws.amazon.com/blogs/compute/under-the-hood-task-networking-for-amazon-ecs/

This post courtsey of ECS Sr. Software Dev Engineer Anirudh Aithal.

Today, AWS announced Task Networking for Amazon ECS, which enables elastic network interfaces to be attached to containers.

In this post, I take a closer look at how this new container-native “awsvpc” network mode is implemented using container networking interface plugins on ECS managed instances (referred to as container instances).

This post is a deep dive into how task networking works with Amazon ECS. If you want to learn more about how you can start using task networking for your containerized applications, see Introducing Cloud Native Networking for Amazon ECS Containers. Cloud Native Computing Foundation (CNCF) hosts the Container Networking Interface (CNI) project, which consists of a specification and libraries for writing plugins to configure network interfaces in Linux containers. For more about cloud native computing in AWS, see Adrian Cockcroft’s post on Cloud Native Computing.

Container instance setup

Before I discuss the details of enabling task networking on container instances, look at how a typical instance looks in ECS.

The diagram above shows a typical container instance. The ECS agent, which itself is running as a container, is responsible for:

  • Registering the EC2 instance with the ECS backend
  • Ensuring that task state changes communicated to it by the ECS backend are enacted on the container instance
  • Interacting with the Docker daemon to create, start, stop, and monitor
  • Relaying container state and task state transitions to the ECS backend

Because the ECS agent is just acting as the supervisor for containers under its management, it offloads the problem of setting up networking for containers to either the Docker daemon (for containers configured with one of Docker’s default networking modes) or a set of CNI plugins (for containers in task with networking mode set to awsvpc).

In either case, network stacks of containers are configured via network namespaces. As per the ip-netns(8) manual, “A network namespace is logically another copy of the network stack, with its own routes, firewall rules, and network devices.” The network namespace construct makes the partitioning of network stack between processes and containers running on a host possible.

Network namespaces and CNI plugins

CNI plugins are executable files that comply with the CNI specification and configure the network connectivity of containers. The CNI project defines a specification for the plugins and provides a library for interacting with plugins, thus providing a consistent, reliable, and simple interface with which to interact with the plugins.

You specify the container or its network namespace and invoke the plugin with the ADD command to add network interfaces to a container, and then the DEL command to tear them down. For example, the reference bridge plugin adds all containers on the same host into a bridge that resides in the host network namespace.

This plugin model fits in nicely with the ECS agent’s “minimal intrusion in the container lifecycle” model, as the agent doesn’t need to concern itself with the details of the network setup for containers. It’s also an extensible model, which allows the agent to switch to a different set of plugins if the need arises in future. Finally, the ECS agent doesn’t need to monitor the liveliness of these plugins as they are only invoked when required.

Invoking CNI plugins from the ECS agent

When ECS attaches an elastic network interface to the instance and sends the message to the agent to provision the elastic network interface for containers in a task, the elastic network interface (as with any network device) shows up in the global default network namespace of the host. The ECS agent invokes a chain of CNI plugins to ensure that the elastic network interface is configured appropriately in the container’s network namespace. You can review these plugins in the amazon-ecs-cni-plugins GitHub repo.

The first plugin invoked in this chain is the ecs-eni plugin, which ensures that the elastic network interface is attached to container’s network namespace and configured with the VPC-allocated IP addresses and the default route to use the subnet gateway. The container also needs to make HTTP requests to the credentials endpoint (hosted by the ECS agent) for getting IAM role credentials. This is handled by the ecs-bridge and ecs-ipam plugins, which are invoked next. The CNI library provides mechanisms to interpret the results from the execution of these plugins, which results in an efficient error handling in the agent. The following diagram illustrates the different steps in this process:

To avoid the race condition between configuring the network stack and commands being invoked in application containers, the ECS agent creates an additional “pause” container for each task before starting the containers in the task definition. It then sets up the network namespace of the pause container by executing the previously mentioned CNI plugins. It also starts the rest of the containers in the task so that they share their network stack of the pause container. This means that all containers in a task are addressable by the IP addresses of the elastic network interface, and they can communicate with each other over the localhost interface.

In this example setup, you have two containers in a task behind an elastic network interface. The following commands show that they have a similar view of the network stack and can talk to each other over the localhost interface.

List the last three containers running on the host (you launched a task with two containers and the ECS agent launched the additional container to configure the network namespace):

$ docker ps -n 3 --format "{{.ID}}\t{{.Names}}\t{{.Command}}\t{{.Status}}"
7d7b7fbc30b9	ecs-front-envoy-5-envoy-sds-ecs-ce8bd9eca6dd81a8d101	"/bin/sh -c '/usr/..."	Up 3 days
dfdcb2acfc91	ecs-front-envoy-5-front-envoy-faeae686adf9c1d91000	"/bin/sh -c '/usr/..."	Up 3 days
f731f6dbb81c	ecs-front-envoy-5-internalecspause-a8e6e19e909fa9c9e901	"./pause"	Up 3 days

List interfaces for these containers and make sure that they are the same:

$ for id in `docker ps -n 3 -q`; do pid=`docker inspect $id -f '{{.State.Pid}}'`; echo container $id; sudo nsenter -t $pid -n ip link show; done
container 7d7b7fbc30b9
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: [email protected]: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 0a:58:a9:fe:ac:0c brd ff:ff:ff:ff:ff:ff link-netnsid 0
27: eth12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 02:5a:a1:1a:43:42 brd ff:ff:ff:ff:ff:ff

container dfdcb2acfc91
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: [email protected]: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 0a:58:a9:fe:ac:0c brd ff:ff:ff:ff:ff:ff link-netnsid 0
27: eth12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 02:5a:a1:1a:43:42 brd ff:ff:ff:ff:ff:ff

container f731f6dbb81c
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: [email protected]: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 0a:58:a9:fe:ac:0c brd ff:ff:ff:ff:ff:ff link-netnsid 0
27: eth12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 02:5a:a1:1a:43:42 brd ff:ff:ff:ff:ff:ff

Conclusion

All of this work means that you can use the new awsvpc networking mode and benefit from native networking support for your containers. You can learn more about using awsvpc mode in Introducing Cloud Native Networking for Amazon ECS Containers or the ECS documentation.

I appreciate your feedback in the comments section. You can also reach me on GitHub in either the ECS CNI Plugins or the ECS Agent repositories.