Tag Archives: CloudWatch Events

How to Patch, Inspect, and Protect Microsoft Windows Workloads on AWS—Part 2

Post Syndicated from Koen van Blijderveen original https://aws.amazon.com/blogs/security/how-to-patch-inspect-and-protect-microsoft-windows-workloads-on-aws-part-2/

Yesterday in Part 1 of this blog post, I showed you how to:

  1. Launch an Amazon EC2 instance with an AWS Identity and Access Management (IAM) role, an Amazon Elastic Block Store (Amazon EBS) volume, and tags that Amazon EC2 Systems Manager (Systems Manager) and Amazon Inspector use.
  2. Configure Systems Manager to install the Amazon Inspector agent and patch your EC2 instances.

Today in Steps 3 and 4, I show you how to:

  1. Take Amazon EBS snapshots using Amazon EBS Snapshot Scheduler to automate snapshots based on instance tags.
  2. Use Amazon Inspector to check if your EC2 instances running Microsoft Windows contain any common vulnerabilities and exposures (CVEs).

To catch up on Steps 1 and 2, see yesterday’s blog post.

Step 3: Take EBS snapshots using EBS Snapshot Scheduler

In this section, I show you how to use EBS Snapshot Scheduler to take snapshots of your instances at specific intervals. To do this, I will show you how to:

  • Determine the schedule for EBS Snapshot Scheduler by providing you with best practices.
  • Deploy EBS Snapshot Scheduler by using AWS CloudFormation.
  • Tag your EC2 instances so that EBS Snapshot Scheduler backs up your instances when you want them backed up.

In addition to making sure your EC2 instances have all the available operating system patches applied on a regular schedule, you should take snapshots of the EBS storage volumes attached to your EC2 instances. Taking regular snapshots allows you to restore your data to a previous state quickly and cost effectively. With Amazon EBS snapshots, you pay only for the actual data you store, and snapshots save only the data that has changed since the previous snapshot, which minimizes your cost. You will use EBS Snapshot Scheduler to make regular snapshots of your EC2 instance. EBS Snapshot Scheduler takes advantage of other AWS services including CloudFormation, Amazon DynamoDB, and AWS Lambda to make backing up your EBS volumes simple.

Determine the schedule

As a best practice, you should back up your data frequently during the hours when your data changes the most. This reduces the amount of data you lose if you have to restore from a snapshot. For the purposes of this blog post, the data for my instances changes the most between the business hours of 9:00 A.M. to 5:00 P.M. Pacific Time. During these hours, I will make snapshots hourly to minimize data loss.

In addition to backing up frequently, another best practice is to establish a strategy for retention. This will vary based on how you need to use the snapshots. If you have compliance requirements to be able to restore for auditing, your needs may be different than if you are able to detect data corruption within three hours and simply need to restore to something that limits data loss to five hours. EBS Snapshot Scheduler enables you to specify the retention period for your snapshots. For this post, I only need to keep snapshots for recent business days. To account for weekends, I will set my retention period to three days, which is down from the default of 15 days when deploying EBS Snapshot Scheduler.

Deploy EBS Snapshot Scheduler

In Step 1 of Part 1 of this post, I showed how to configure an EC2 for Windows Server 2012 R2 instance with an EBS volume. You will use EBS Snapshot Scheduler to take eight snapshots each weekday of your EC2 instance’s EBS volumes:

  1. Navigate to the EBS Snapshot Scheduler deployment page and choose Launch Solution. This takes you to the CloudFormation console in your account. The Specify an Amazon S3 template URL option is already selected and prefilled. Choose Next on the Select Template page.
  2. On the Specify Details page, retain all default parameters except for AutoSnapshotDeletion. Set AutoSnapshotDeletion to Yes to ensure that old snapshots are periodically deleted. The default retention period is 15 days (you will specify a shorter value on your instance in the next subsection).
  3. Choose Next twice to move to the Review step, and start deployment by choosing the I acknowledge that AWS CloudFormation might create IAM resources check box and then choosing Create.

Tag your EC2 instances

EBS Snapshot Scheduler takes a few minutes to deploy. While waiting for its deployment, you can start to tag your instance to define its schedule. EBS Snapshot Scheduler reads tag values and looks for four possible custom parameters in the following order:

  • <snapshot time> – Time in 24-hour format with no colon.
  • <retention days> – The number of days (a positive integer) to retain the snapshot before deletion, if set to automatically delete snapshots.
  • <time zone> – The time zone of the times specified in <snapshot time>.
  • <active day(s)>all, weekdays, or mon, tue, wed, thu, fri, sat, and/or sun.

Because you want hourly backups on weekdays between 9:00 A.M. and 5:00 P.M. Pacific Time, you need to configure eight tags—one for each hour of the day. You will add the eight tags shown in the following table to your EC2 instance.

Tag Value
scheduler:ebs-snapshot:0900 0900;3;utc;weekdays
scheduler:ebs-snapshot:1000 1000;3;utc;weekdays
scheduler:ebs-snapshot:1100 1100;3;utc;weekdays
scheduler:ebs-snapshot:1200 1200;3;utc;weekdays
scheduler:ebs-snapshot:1300 1300;3;utc;weekdays
scheduler:ebs-snapshot:1400 1400;3;utc;weekdays
scheduler:ebs-snapshot:1500 1500;3;utc;weekdays
scheduler:ebs-snapshot:1600 1600;3;utc;weekdays

Next, you will add these tags to your instance. If you want to tag multiple instances at once, you can use Tag Editor instead. To add the tags in the preceding table to your EC2 instance:

  1. Navigate to your EC2 instance in the EC2 console and choose Tags in the navigation pane.
  2. Choose Add/Edit Tags and then choose Create Tag to add all the tags specified in the preceding table.
  3. Confirm you have added the tags by choosing Save. After adding these tags, navigate to your EC2 instance in the EC2 console. Your EC2 instance should look similar to the following screenshot.
    Screenshot of how your EC2 instance should look in the console
  4. After waiting a couple of hours, you can see snapshots beginning to populate on the Snapshots page of the EC2 console.Screenshot of snapshots beginning to populate on the Snapshots page of the EC2 console
  5. To check if EBS Snapshot Scheduler is active, you can check the CloudWatch rule that runs the Lambda function. If the clock icon shown in the following screenshot is green, the scheduler is active. If the clock icon is gray, the rule is disabled and does not run. You can enable or disable the rule by selecting it, choosing Actions, and choosing Enable or Disable. This also allows you to temporarily disable EBS Snapshot Scheduler.Screenshot of checking to see if EBS Snapshot Scheduler is active
  1. You can also monitor when EBS Snapshot Scheduler has run by choosing the name of the CloudWatch rule as shown in the previous screenshot and choosing Show metrics for the rule.Screenshot of monitoring when EBS Snapshot Scheduler has run by choosing the name of the CloudWatch rule

If you want to restore and attach an EBS volume, see Restoring an Amazon EBS Volume from a Snapshot and Attaching an Amazon EBS Volume to an Instance.

Step 4: Use Amazon Inspector

In this section, I show you how to you use Amazon Inspector to scan your EC2 instance for common vulnerabilities and exposures (CVEs) and set up Amazon SNS notifications. To do this I will show you how to:

  • Install the Amazon Inspector agent by using EC2 Run Command.
  • Set up notifications using Amazon SNS to notify you of any findings.
  • Define an Amazon Inspector target and template to define what assessment to perform on your EC2 instance.
  • Schedule Amazon Inspector assessment runs to assess your EC2 instance on a regular interval.

Amazon Inspector can help you scan your EC2 instance using prebuilt rules packages, which are built and maintained by AWS. These prebuilt rules packages tell Amazon Inspector what to scan for on the EC2 instances you select. Amazon Inspector provides the following prebuilt packages for Microsoft Windows Server 2012 R2:

  • Common Vulnerabilities and Exposures
  • Center for Internet Security Benchmarks
  • Runtime Behavior Analysis

In this post, I’m focused on how to make sure you keep your EC2 instances patched, backed up, and inspected for common vulnerabilities and exposures (CVEs). As a result, I will focus on how to use the CVE rules package and use your instance tags to identify the instances on which to run the CVE rules. If your EC2 instance is fully patched using Systems Manager, as described earlier, you should not have any findings with the CVE rules package. Regardless, as a best practice I recommend that you use Amazon Inspector as an additional layer for identifying any unexpected failures. This involves using Amazon CloudWatch to set up weekly Amazon Inspector scans, and configuring Amazon Inspector to notify you of any findings through SNS topics. By acting on the notifications you receive, you can respond quickly to any CVEs on any of your EC2 instances to help ensure that malware using known CVEs does not affect your EC2 instances. In a previous blog post, Eric Fitzgerald showed how to remediate Amazon Inspector security findings automatically.

Install the Amazon Inspector agent

To install the Amazon Inspector agent, you will use EC2 Run Command, which allows you to run any command on any of your EC2 instances that have the Systems Manager agent with an attached IAM role that allows access to Systems Manager.

  1. Choose Run Command under Systems Manager Services in the navigation pane of the EC2 console. Then choose Run a command.
    Screenshot of choosing "Run a command"
  2. To install the Amazon Inspector agent, you will use an AWS managed and provided command document that downloads and installs the agent for you on the selected EC2 instance. Choose AmazonInspector-ManageAWSAgent. To choose the target EC2 instance where this command will be run, use the tag you previously assigned to your EC2 instance, Patch Group, with a value of Windows Servers. For this example, set the concurrent installations to 1 and tell Systems Manager to stop after 5 errors.
    Screenshot of installing the Amazon Inspector agent
  3. Retain the default values for all other settings on the Run a command page and choose Run. Back on the Run Command page, you can see if the command that installed the Amazon Inspector agent executed successfully on all selected EC2 instances.
    Screenshot showing that the command that installed the Amazon Inspector agent executed successfully on all selected EC2 instances

Set up notifications using Amazon SNS

Now that you have installed the Amazon Inspector agent, you will set up an SNS topic that will notify you of any findings after an Amazon Inspector run.

To set up an SNS topic:

  1. In the AWS Management Console, choose Simple Notification Service under Messaging in the Services menu.
  2. Choose Create topic, name your topic (only alphanumeric characters, hyphens, and underscores are allowed) and give it a display name to ensure you know what this topic does (I’ve named mine Inspector). Choose Create topic.
    "Create new topic" page
  3. To allow Amazon Inspector to publish messages to your new topic, choose Other topic actions and choose Edit topic policy.
  4. For Allow these users to publish messages to this topic and Allow these users to subscribe to this topic, choose Only these AWS users. Type the following ARN for the US East (N. Virginia) Region in which you are deploying the solution in this post: arn:aws:iam::316112463485:root. This is the ARN of Amazon Inspector itself. For the ARNs of Amazon Inspector in other AWS Regions, see Setting Up an SNS Topic for Amazon Inspector Notifications (Console). Amazon Resource Names (ARNs) uniquely identify AWS resources across all of AWS.
    Screenshot of editing the topic policy
  5. To receive notifications from Amazon Inspector, subscribe to your new topic by choosing Create subscription and adding your email address. After confirming your subscription by clicking the link in the email, the topic should display your email address as a subscriber. Later, you will configure the Amazon Inspector template to publish to this topic.
    Screenshot of subscribing to the new topic

Define an Amazon Inspector target and template

Now that you have set up the notification topic by which Amazon Inspector can notify you of findings, you can create an Amazon Inspector target and template. A target defines which EC2 instances are in scope for Amazon Inspector. A template defines which packages to run, for how long, and on which target.

To create an Amazon Inspector target:

  1. Navigate to the Amazon Inspector console and choose Get started. At the time of writing this blog post, Amazon Inspector is available in the US East (N. Virginia), US West (N. California), US West (Oregon), EU (Ireland), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Sydney), and Asia Pacific (Tokyo) Regions.
  2. For Amazon Inspector to be able to collect the necessary data from your EC2 instance, you must create an IAM service role for Amazon Inspector. Amazon Inspector can create this role for you if you choose Choose or create role and confirm the role creation by choosing Allow.
    Screenshot of creating an IAM service role for Amazon Inspector
  3. Amazon Inspector also asks you to tag your EC2 instance and install the Amazon Inspector agent. You already performed these steps in Part 1 of this post, so you can proceed by choosing Next. To define the Amazon Inspector target, choose the previously used Patch Group tag with a Value of Windows Servers. This is the same tag that you used to define the targets for patching. Then choose Next.
    Screenshot of defining the Amazon Inspector target
  4. Now, define your Amazon Inspector template, and choose a name and the package you want to run. For this post, use the Common Vulnerabilities and Exposures package and choose the default duration of 1 hour. As you can see, the package has a version number, so always select the latest version of the rules package if multiple versions are available.
    Screenshot of defining an assessment template
  5. Configure Amazon Inspector to publish to your SNS topic when findings are reported. You can also choose to receive a notification of a started run, a finished run, or changes in the state of a run. For this blog post, you want to receive notifications if there are any findings. To start, choose Assessment Templates from the Amazon Inspector console and choose your newly created Amazon Inspector assessment template. Choose the icon below SNS topics (see the following screenshot).
    Screenshot of choosing an assessment template
  6. A pop-up appears in which you can choose the previously created topic and the events about which you want SNS to notify you (choose Finding reported).
    Screenshot of choosing the previously created topic and the events about which you want SNS to notify you

Schedule Amazon Inspector assessment runs

The last step in using Amazon Inspector to assess for CVEs is to schedule the Amazon Inspector template to run using Amazon CloudWatch Events. This will make sure that Amazon Inspector assesses your EC2 instance on a regular basis. To do this, you need the Amazon Inspector template ARN, which you can find under Assessment templates in the Amazon Inspector console. CloudWatch Events can run your Amazon Inspector assessment at an interval you define using a Cron-based schedule. Cron is a well-known scheduling agent that is widely used on UNIX-like operating systems and uses the following syntax for CloudWatch Events.

Image of Cron schedule

All scheduled events use a UTC time zone, and the minimum precision for schedules is one minute. For more information about scheduling CloudWatch Events, see Schedule Expressions for Rules.

To create the CloudWatch Events rule:

  1. Navigate to the CloudWatch console, choose Events, and choose Create rule.
    Screenshot of starting to create a rule in the CloudWatch Events console
  2. On the next page, specify if you want to invoke your rule based on an event pattern or a schedule. For this blog post, you will select a schedule based on a Cron expression.
  3. You can schedule the Amazon Inspector assessment any time you want using the Cron expression, or you can use the Cron expression I used in the following screenshot, which will run the Amazon Inspector assessment every Sunday at 10:00 P.M. GMT.
    Screenshot of scheduling an Amazon Inspector assessment with a Cron expression
  4. Choose Add target and choose Inspector assessment template from the drop-down menu. Paste the ARN of the Amazon Inspector template you previously created in the Amazon Inspector console in the Assessment template box and choose Create a new role for this specific resource. This new role is necessary so that CloudWatch Events has the necessary permissions to start the Amazon Inspector assessment. CloudWatch Events will automatically create the new role and grant the minimum set of permissions needed to run the Amazon Inspector assessment. To proceed, choose Configure details.
    Screenshot of adding a target
  5. Next, give your rule a name and a description. I suggest using a name that describes what the rule does, as shown in the following screenshot.
  6. Finish the wizard by choosing Create rule. The rule should appear in the Events – Rules section of the CloudWatch console.
    Screenshot of completing the creation of the rule
  7. To confirm your CloudWatch Events rule works, wait for the next time your CloudWatch Events rule is scheduled to run. For testing purposes, you can choose your CloudWatch Events rule and choose Edit to change the schedule to run it sooner than scheduled.
    Screenshot of confirming the CloudWatch Events rule works
  8. Now navigate to the Amazon Inspector console to confirm the launch of your first assessment run. The Start time column shows you the time each assessment started and the Status column the status of your assessment. In the following screenshot, you can see Amazon Inspector is busy Collecting data from the selected assessment targets.
    Screenshot of confirming the launch of the first assessment run

You have concluded the last step of this blog post by setting up a regular scan of your EC2 instance with Amazon Inspector and a notification that will let you know if your EC2 instance is vulnerable to any known CVEs. In a previous Security Blog post, Eric Fitzgerald explained How to Remediate Amazon Inspector Security Findings Automatically. Although that blog post is for Linux-based EC2 instances, the post shows that you can learn about Amazon Inspector findings in other ways than email alerts.

Conclusion

In this two-part blog post, I showed how to make sure you keep your EC2 instances up to date with patching, how to back up your instances with snapshots, and how to monitor your instances for CVEs. Collectively these measures help to protect your instances against common attack vectors that attempt to exploit known vulnerabilities. In Part 1, I showed how to configure your EC2 instances to make it easy to use Systems Manager, EBS Snapshot Scheduler, and Amazon Inspector. I also showed how to use Systems Manager to schedule automatic patches to keep your instances current in a timely fashion. In Part 2, I showed you how to take regular snapshots of your data by using EBS Snapshot Scheduler and how to use Amazon Inspector to check if your EC2 instances running Microsoft Windows contain any common vulnerabilities and exposures (CVEs).

If you have comments about today’s or yesterday’s post, submit them in the “Comments” section below. If you have questions about or issues implementing any part of this solution, start a new thread on the Amazon EC2 forum or the Amazon Inspector forum, or contact AWS Support.

– Koen

Event-Driven Computing with Amazon SNS and AWS Compute, Storage, Database, and Networking Services

Post Syndicated from Christie Gifrin original https://aws.amazon.com/blogs/compute/event-driven-computing-with-amazon-sns-compute-storage-database-and-networking-services/

Contributed by Otavio Ferreira, Manager, Software Development, AWS Messaging

Like other developers around the world, you may be tackling increasingly complex business problems. A key success factor, in that case, is the ability to break down a large project scope into smaller, more manageable components. A service-oriented architecture guides you toward designing systems as a collection of loosely coupled, independently scaled, and highly reusable services. Microservices take this even further. To improve performance and scalability, they promote fine-grained interfaces and lightweight protocols.

However, the communication among isolated microservices can be challenging. Services are often deployed onto independent servers and don’t share any compute or storage resources. Also, you should avoid hard dependencies among microservices, to preserve maintainability and reusability.

If you apply the pub/sub design pattern, you can effortlessly decouple and independently scale out your microservices and serverless architectures. A pub/sub messaging service, such as Amazon SNS, promotes event-driven computing that statically decouples event publishers from subscribers, while dynamically allowing for the exchange of messages between them. An event-driven architecture also introduces the responsiveness needed to deal with complex problems, which are often unpredictable and asynchronous.

What is event-driven computing?

Given the context of microservices, event-driven computing is a model in which subscriber services automatically perform work in response to events triggered by publisher services. This paradigm can be applied to automate workflows while decoupling the services that collectively and independently work to fulfil these workflows. Amazon SNS is an event-driven computing hub, in the AWS Cloud, that has native integration with several AWS publisher and subscriber services.

Which AWS services publish events to SNS natively?

Several AWS services have been integrated as SNS publishers and, therefore, can natively trigger event-driven computing for a variety of use cases. In this post, I specifically cover AWS compute, storage, database, and networking services, as depicted below.

Compute services

  • Auto Scaling: Helps you ensure that you have the correct number of Amazon EC2 instances available to handle the load for your application. You can configure Auto Scaling lifecycle hooks to trigger events, as Auto Scaling resizes your EC2 cluster.As an example, you may want to warm up the local cache store on newly launched EC2 instances, and also download log files from other EC2 instances that are about to be terminated. To make this happen, set an SNS topic as your Auto Scaling group’s notification target, then subscribe two Lambda functions to this SNS topic. The first function is responsible for handling scale-out events (to warm up cache upon provisioning), whereas the second is in charge of handling scale-in events (to download logs upon termination).

  • AWS Elastic Beanstalk: An easy-to-use service for deploying and scaling web applications and web services developed in a number of programming languages. You can configure event notifications for your Elastic Beanstalk environment so that notable events can be automatically published to an SNS topic, then pushed to topic subscribers.As an example, you may use this event-driven architecture to coordinate your continuous integration pipeline (such as Jenkins CI). That way, whenever an environment is created, Elastic Beanstalk publishes this event to an SNS topic, which triggers a subscribing Lambda function, which then kicks off a CI job against your newly created Elastic Beanstalk environment.

  • Elastic Load Balancing: Automatically distributes incoming application traffic across Amazon EC2 instances, containers, or other resources identified by IP addresses.You can configure CloudWatch alarms on Elastic Load Balancing metrics, to automate the handling of events derived from Classic Load Balancers. As an example, you may leverage this event-driven design to automate latency profiling in an Amazon ECS cluster behind a Classic Load Balancer. In this example, whenever your ECS cluster breaches your load balancer latency threshold, an event is posted by CloudWatch to an SNS topic, which then triggers a subscribing Lambda function. This function runs a task on your ECS cluster to trigger a latency profiling tool, hosted on the cluster itself. This can enhance your latency troubleshooting exercise by making it timely.

Storage services

  • Amazon S3: Object storage built to store and retrieve any amount of data.You can enable S3 event notifications, and automatically get them posted to SNS topics, to automate a variety of workflows. For instance, imagine that you have an S3 bucket to store incoming resumes from candidates, and a fleet of EC2 instances to encode these resumes from their original format (such as Word or text) into a portable format (such as PDF).In this example, whenever new files are uploaded to your input bucket, S3 publishes these events to an SNS topic, which in turn pushes these messages into subscribing SQS queues. Then, encoding workers running on EC2 instances poll these messages from the SQS queues; retrieve the original files from the input S3 bucket; encode them into PDF; and finally store them in an output S3 bucket.

  • Amazon EFS: Provides simple and scalable file storage, for use with Amazon EC2 instances, in the AWS Cloud.You can configure CloudWatch alarms on EFS metrics, to automate the management of your EFS systems. For example, consider a highly parallelized genomics analysis application that runs against an EFS system. By default, this file system is instantiated on the “General Purpose” performance mode. Although this performance mode allows for lower latency, it might eventually impose a scaling bottleneck. Therefore, you may leverage an event-driven design to handle it automatically.Basically, as soon as the EFS metric “Percent I/O Limit” breaches 95%, CloudWatch could post this event to an SNS topic, which in turn would push this message into a subscribing Lambda function. This function automatically creates a new file system, this time on the “Max I/O” performance mode, then switches the genomics analysis application to this new file system. As a result, your application starts experiencing higher I/O throughput rates.

  • Amazon Glacier: A secure, durable, and low-cost cloud storage service for data archiving and long-term backup.You can set a notification configuration on an Amazon Glacier vault so that when a job completes, a message is published to an SNS topic. Retrieving an archive from Amazon Glacier is a two-step asynchronous operation, in which you first initiate a job, and then download the output after the job completes. Therefore, SNS helps you eliminate polling your Amazon Glacier vault to check whether your job has been completed, or not. As usual, you may subscribe SQS queues, Lambda functions, and HTTP endpoints to your SNS topic, to be notified when your Amazon Glacier job is done.

  • AWS Snowball: A petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data.You can leverage Snowball notifications to automate workflows related to importing data into and exporting data from AWS. More specifically, whenever your Snowball job status changes, Snowball can publish this event to an SNS topic, which in turn can broadcast the event to all its subscribers.As an example, imagine a Geographic Information System (GIS) that distributes high-resolution satellite images to users via Web browser. In this example, the GIS vendor could capture up to 80 TB of satellite images; create a Snowball job to import these files from an on-premises system to an S3 bucket; and provide an SNS topic ARN to be notified upon job status changes in Snowball. After Snowball changes the job status from “Importing” to “Completed”, Snowball publishes this event to the specified SNS topic, which delivers this message to a subscribing Lambda function, which finally creates a CloudFront web distribution for the target S3 bucket, to serve the images to end users.

Database services

  • Amazon RDS: Makes it easy to set up, operate, and scale a relational database in the cloud.RDS leverages SNS to broadcast notifications when RDS events occur. As usual, these notifications can be delivered via any protocol supported by SNS, including SQS queues, Lambda functions, and HTTP endpoints.As an example, imagine that you own a social network website that has experienced organic growth, and needs to scale its compute and database resources on demand. In this case, you could provide an SNS topic to listen to RDS DB instance events. When the “Low Storage” event is published to the topic, SNS pushes this event to a subscribing Lambda function, which in turn leverages the RDS API to increase the storage capacity allocated to your DB instance. The provisioning itself takes place within the specified DB maintenance window.

  • Amazon ElastiCache: A web service that makes it easy to deploy, operate, and scale an in-memory data store or cache in the cloud.ElastiCache can publish messages using Amazon SNS when significant events happen on your cache cluster. This feature can be used to refresh the list of servers on client machines connected to individual cache node endpoints of a cache cluster. For instance, an ecommerce website fetches product details from a cache cluster, with the goal of offloading a relational database and speeding up page load times. Ideally, you want to make sure that each web server always has an updated list of cache servers to which to connect.To automate this node discovery process, you can get your ElastiCache cluster to publish events to an SNS topic. Thus, when ElastiCache event “AddCacheNodeComplete” is published, your topic then pushes this event to all subscribing HTTP endpoints that serve your ecommerce website, so that these HTTP servers can update their list of cache nodes.

  • Amazon Redshift: A fully managed data warehouse that makes it simple to analyze data using standard SQL and BI (Business Intelligence) tools.Amazon Redshift uses SNS to broadcast relevant events so that data warehouse workflows can be automated. As an example, imagine a news website that sends clickstream data to a Kinesis Firehose stream, which then loads the data into Amazon Redshift, so that popular news and reading preferences might be surfaced on a BI tool. At some point though, this Amazon Redshift cluster might need to be resized, and the cluster enters a ready-only mode. Hence, this Amazon Redshift event is published to an SNS topic, which delivers this event to a subscribing Lambda function, which finally deletes the corresponding Kinesis Firehose delivery stream, so that clickstream data uploads can be put on hold.At a later point, after Amazon Redshift publishes the event that the maintenance window has been closed, SNS notifies a subscribing Lambda function accordingly, so that this function can re-create the Kinesis Firehose delivery stream, and resume clickstream data uploads to Amazon Redshift.

  • AWS DMS: Helps you migrate databases to AWS quickly and securely. The source database remains fully operational during the migration, minimizing downtime to applications that rely on the database.DMS also uses SNS to provide notifications when DMS events occur, which can automate database migration workflows. As an example, you might create data replication tasks to migrate an on-premises MS SQL database, composed of multiple tables, to MySQL. Thus, if replication tasks fail due to incompatible data encoding in the source tables, these events can be published to an SNS topic, which can push these messages into a subscribing SQS queue. Then, encoders running on EC2 can poll these messages from the SQS queue, encode the source tables into a compatible character set, and restart the corresponding replication tasks in DMS. This is an event-driven approach to a self-healing database migration process.

Networking services

  • Amazon Route 53: A highly available and scalable cloud-based DNS (Domain Name System). Route 53 health checks monitor the health and performance of your web applications, web servers, and other resources.You can set CloudWatch alarms and get automated Amazon SNS notifications when the status of your Route 53 health check changes. As an example, imagine an online payment gateway that reports the health of its platform to merchants worldwide, via a status page. This page is hosted on EC2 and fetches platform health data from DynamoDB. In this case, you could configure a CloudWatch alarm for your Route 53 health check, so that when the alarm threshold is breached, and the payment gateway is no longer considered healthy, then CloudWatch publishes this event to an SNS topic, which pushes this message to a subscribing Lambda function, which finally updates the DynamoDB table that populates the status page. This event-driven approach avoids any kind of manual update to the status page visited by merchants.

  • AWS Direct Connect (AWS DX): Makes it easy to establish a dedicated network connection from your premises to AWS, which can reduce your network costs, increase bandwidth throughput, and provide a more consistent network experience than Internet-based connections.You can monitor physical DX connections using CloudWatch alarms, and send SNS messages when alarms change their status. As an example, when a DX connection state shifts to 0 (zero), indicating that the connection is down, this event can be published to an SNS topic, which can fan out this message to impacted servers through HTTP endpoints, so that they might reroute their traffic through a different connection instead. This is an event-driven approach to connectivity resilience.

More event-driven computing on AWS

In addition to SNS, event-driven computing is also addressed by Amazon CloudWatch Events, which delivers a near real-time stream of system events that describe changes in AWS resources. With CloudWatch Events, you can route each event type to one or more targets, including:

Many AWS services publish events to CloudWatch. As an example, you can get CloudWatch Events to capture events on your ETL (Extract, Transform, Load) jobs running on AWS Glue and push failed ones to an SQS queue, so that you can retry them later.

Conclusion

Amazon SNS is a pub/sub messaging service that can be used as an event-driven computing hub to AWS customers worldwide. By capturing events natively triggered by AWS services, such as EC2, S3 and RDS, you can automate and optimize all kinds of workflows, namely scaling, testing, encoding, profiling, broadcasting, discovery, failover, and much more. Business use cases presented in this post ranged from recruiting websites, to scientific research, geographic systems, social networks, retail websites, and news portals.

Start now by visiting Amazon SNS in the AWS Management Console, or by trying the AWS 10-Minute Tutorial, Send Fan-out Event Notifications with Amazon SNS and Amazon SQS.

 

Capturing Custom, High-Resolution Metrics from Containers Using AWS Step Functions and AWS Lambda

Post Syndicated from Nathan Taber original https://aws.amazon.com/blogs/compute/capturing-custom-high-resolution-metrics-from-containers-using-aws-step-functions-and-aws-lambda/

Contributed by Trevor Sullivan, AWS Solutions Architect

When you deploy containers with Amazon ECS, are you gathering all of the key metrics so that you can correctly monitor the overall health of your ECS cluster?

By default, ECS writes metrics to Amazon CloudWatch in 5-minute increments. For complex or large services, this may not be sufficient to make scaling decisions quickly. You may want to respond immediately to changes in workload or to identify application performance problems. Last July, CloudWatch announced support for high-resolution metrics, up to a per-second basis.

These high-resolution metrics can be used to give you a clearer picture of the load and performance for your applications, containers, clusters, and hosts. In this post, I discuss how you can use AWS Step Functions, along with AWS Lambda, to cost effectively record high-resolution metrics into CloudWatch. You implement this solution using a serverless architecture, which keeps your costs low and makes it easier to troubleshoot the solution.

To show how this works, you retrieve some useful metric data from an ECS cluster running in the same AWS account and region (Oregon, us-west-2) as the Step Functions state machine and Lambda function. However, you can use this architecture to retrieve any custom application metrics from any resource in any AWS account and region.

Why Step Functions?

Step Functions enables you to orchestrate multi-step tasks in the AWS Cloud that run for any period of time, up to a year. Effectively, you’re building a blueprint for an end-to-end process. After it’s built, you can execute the process as many times as you want.

For this architecture, you gather metrics from an ECS cluster, every five seconds, and then write the metric data to CloudWatch. After your ECS cluster metrics are stored in CloudWatch, you can create CloudWatch alarms to notify you. An alarm can also trigger an automated remediation activity such as scaling ECS services, when a metric exceeds a threshold defined by you.

When you build a Step Functions state machine, you define the different states inside it as JSON objects. The bulk of the work in Step Functions is handled by the common task state, which invokes Lambda functions or Step Functions activities. There is also a built-in library of other useful states that allow you to control the execution flow of your program.

One of the most useful state types in Step Functions is the parallel state. Each parallel state in your state machine can have one or more branches, each of which is executed in parallel. Another useful state type is the wait state, which waits for a period of time before moving to the next state.

In this walkthrough, you combine these three states (parallel, wait, and task) to create a state machine that triggers a Lambda function, which then gathers metrics from your ECS cluster.

Step Functions pricing

This state machine is executed every minute, resulting in 60 executions per hour, and 1,440 executions per day. Step Functions is billed per state transition, including the Start and End state transitions, and giving you approximately 37,440 state transitions per day. To reach this number, I’m using this estimated math:

26 state transitions per-execution x 60 minutes x 24 hours

Based on current pricing, at $0.000025 per state transition, the daily cost of this metric gathering state machine would be $0.936.

Step Functions offers an indefinite 4,000 free state transitions every month. This benefit is available to all customers, not just customers who are still under the 12-month AWS Free Tier. For more information and cost example scenarios, see Step Functions pricing.

Why Lambda?

The goal is to capture metrics from an ECS cluster, and write the metric data to CloudWatch. This is a straightforward, short-running process that makes Lambda the perfect place to run your code. Lambda is one of the key services that makes up “Serverless” application architectures. It enables you to consume compute capacity only when your code is actually executing.

The process of gathering metric data from ECS and writing it to CloudWatch takes a short period of time. In fact, my average Lambda function execution time, while developing this post, is only about 250 milliseconds on average. For every five-second interval that occurs, I’m only using 1/20th of the compute time that I’d otherwise be paying for.

Lambda pricing

For billing purposes, Lambda execution time is rounded up to the nearest 100-ms interval. In general, based on the metrics that I observed during development, a 250-ms runtime would be billed at 300 ms. Here, I calculate the cost of this Lambda function executing on a daily basis.

Assuming 31 days in each month, there would be 535,680 five-second intervals (31 days x 24 hours x 60 minutes x 12 five-second intervals = 535,680). The Lambda function is invoked every five-second interval, by the Step Functions state machine, and runs for a 300-ms period. At current Lambda pricing, for a 128-MB function, you would be paying approximately the following:

Total compute

Total executions = 535,680
Total compute = total executions x (3 x $0.000000208 per 100 ms) = $0.334 per day

Total requests

Total requests = (535,680 / 1000000) * $0.20 per million requests = $0.11 per day

Total Lambda Cost

$0.11 requests + $0.334 compute time = $0.444 per day

Similar to Step Functions, Lambda offers an indefinite free tier. For more information, see Lambda Pricing.

Walkthrough

In the following sections, I step through the process of configuring the solution just discussed. If you follow along, at a high level, you will:

  • Configure an IAM role and policy
  • Create a Step Functions state machine to control metric gathering execution
  • Create a metric-gathering Lambda function
  • Configure a CloudWatch Events rule to trigger the state machine
  • Validate the solution

Prerequisites

You should already have an AWS account with a running ECS cluster. If you don’t have one running, you can easily deploy a Docker container on an ECS cluster using the AWS Management Console. In the example produced for this post, I use an ECS cluster running Windows Server (currently in beta), but either a Linux or Windows Server cluster works.

Create an IAM role and policy

First, create an IAM role and policy that enables Step Functions, Lambda, and CloudWatch to communicate with each other.

  • The CloudWatch Events rule needs permissions to trigger the Step Functions state machine.
  • The Step Functions state machine needs permissions to trigger the Lambda function.
  • The Lambda function needs permissions to query ECS and then write to CloudWatch Logs and metrics.

When you create the state machine, Lambda function, and CloudWatch Events rule, you assign this role to each of those resources. Upon execution, each of these resources assumes the specified role and executes using the role’s permissions.

  1. Open the IAM console.
  2. Choose Roles, create New Role.
  3. For Role Name, enter WriteMetricFromStepFunction.
  4. Choose Save.

Create the IAM role trust relationship
The trust relationship (also known as the assume role policy document) for your IAM role looks like the following JSON document. As you can see from the document, your IAM role needs to trust the Lambda, CloudWatch Events, and Step Functions services. By configuring your role to trust these services, they can assume this role and inherit the role permissions.

  1. Open the IAM console.
  2. Choose Roles and select the IAM role previously created.
  3. Choose Trust RelationshipsEdit Trust Relationships.
  4. Enter the following trust policy text and choose Save.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "events.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "states.us-west-2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Create an IAM policy

After you’ve finished configuring your role’s trust relationship, grant the role access to the other AWS resources that make up the solution.

The IAM policy is what gives your IAM role permissions to access various resources. You must whitelist explicitly the specific resources to which your role has access, because the default IAM behavior is to deny access to any AWS resources.

I’ve tried to keep this policy document as generic as possible, without allowing permissions to be too open. If the name of your ECS cluster is different than the one in the example policy below, make sure that you update the policy document before attaching it to your IAM role. You can attach this policy as an inline policy, instead of creating the policy separately first. However, either approach is valid.

  1. Open the IAM console.
  2. Select the IAM role, and choose Permissions.
  3. Choose Add in-line policy.
  4. Choose Custom Policy and then enter the following policy. The inline policy name does not matter.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [ "logs:*" ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [ "cloudwatch:PutMetricData" ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [ "states:StartExecution" ],
            "Resource": [
                "arn:aws:states:*:*:stateMachine:WriteMetricFromStepFunction"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [ "lambda:InvokeFunction" ],
            "Resource": "arn:aws:lambda:*:*:function:WriteMetricFromStepFunction"
        },
        {
            "Effect": "Allow",
            "Action": [ "ecs:Describe*" ],
            "Resource": "arn:aws:ecs:*:*:cluster/ECSEsgaroth"
        }
    ]
}

Create a Step Functions state machine

In this section, you create a Step Functions state machine that invokes the metric-gathering Lambda function every five (5) seconds, for a one-minute period. If you divide a minute (60) seconds into equal parts of five-second intervals, you get 12. Based on this math, you create 12 branches, in a single parallel state, in the state machine. Each branch triggers the metric-gathering Lambda function at a different five-second marker, throughout the one-minute period. After all of the parallel branches finish executing, the Step Functions execution completes and another begins.

Follow these steps to create your Step Functions state machine:

  1. Open the Step Functions console.
  2. Choose DashboardCreate State Machine.
  3. For State Machine Name, enter WriteMetricFromStepFunction.
  4. Enter the state machine code below into the editor. Make sure that you insert your own AWS account ID for every instance of “676655494xxx”
  5. Choose Create State Machine.
  6. Select the WriteMetricFromStepFunction IAM role that you previously created.
{
    "Comment": "Writes ECS metrics to CloudWatch every five seconds, for a one-minute period.",
    "StartAt": "ParallelMetric",
    "States": {
      "ParallelMetric": {
        "Type": "Parallel",
        "Branches": [
          {
            "StartAt": "WriteMetricLambda",
            "States": {
             	"WriteMetricLambda": {
                  "Type": "Task",
				  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
    	  {
            "StartAt": "WaitFive",
            "States": {
            	"WaitFive": {
            		"Type": "Wait",
            		"Seconds": 5,
            		"Next": "WriteMetricLambdaFive"
          		},
             	"WriteMetricLambdaFive": {
                  "Type": "Task",
				  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
    	  {
            "StartAt": "WaitTen",
            "States": {
            	"WaitTen": {
            		"Type": "Wait",
            		"Seconds": 10,
            		"Next": "WriteMetricLambda10"
          		},
             	"WriteMetricLambda10": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
    	  {
            "StartAt": "WaitFifteen",
            "States": {
            	"WaitFifteen": {
            		"Type": "Wait",
            		"Seconds": 15,
            		"Next": "WriteMetricLambda15"
          		},
             	"WriteMetricLambda15": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait20",
            "States": {
            	"Wait20": {
            		"Type": "Wait",
            		"Seconds": 20,
            		"Next": "WriteMetricLambda20"
          		},
             	"WriteMetricLambda20": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait25",
            "States": {
            	"Wait25": {
            		"Type": "Wait",
            		"Seconds": 25,
            		"Next": "WriteMetricLambda25"
          		},
             	"WriteMetricLambda25": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait30",
            "States": {
            	"Wait30": {
            		"Type": "Wait",
            		"Seconds": 30,
            		"Next": "WriteMetricLambda30"
          		},
             	"WriteMetricLambda30": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait35",
            "States": {
            	"Wait35": {
            		"Type": "Wait",
            		"Seconds": 35,
            		"Next": "WriteMetricLambda35"
          		},
             	"WriteMetricLambda35": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait40",
            "States": {
            	"Wait40": {
            		"Type": "Wait",
            		"Seconds": 40,
            		"Next": "WriteMetricLambda40"
          		},
             	"WriteMetricLambda40": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait45",
            "States": {
            	"Wait45": {
            		"Type": "Wait",
            		"Seconds": 45,
            		"Next": "WriteMetricLambda45"
          		},
             	"WriteMetricLambda45": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait50",
            "States": {
            	"Wait50": {
            		"Type": "Wait",
            		"Seconds": 50,
            		"Next": "WriteMetricLambda50"
          		},
             	"WriteMetricLambda50": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait55",
            "States": {
            	"Wait55": {
            		"Type": "Wait",
            		"Seconds": 55,
            		"Next": "WriteMetricLambda55"
          		},
             	"WriteMetricLambda55": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          }
        ],
        "End": true
      }
  }
}

Now you’ve got a shiny new Step Functions state machine! However, you might ask yourself, “After the state machine has been created, how does it get executed?” Before I answer that question, create the Lambda function that writes the custom metric, and then you get the end-to-end process moving.

Create a Lambda function

The meaty part of the solution is a Lambda function, written to consume the Python 3.6 runtime, that retrieves metric values from ECS, and then writes them to CloudWatch. This Lambda function is what the Step Functions state machine is triggering every five seconds, via the Task states. Key points to remember:

The Lambda function needs permission to:

  • Write CloudWatch metrics (PutMetricData API).
  • Retrieve metrics from ECS clusters (DescribeCluster API).
  • Write StdOut to CloudWatch Logs.

Boto3, the AWS SDK for Python, is included in the Lambda execution environment for Python 2.x and 3.x.

Because Lambda includes the AWS SDK, you don’t have to worry about packaging it up and uploading it to Lambda. You can focus on writing code and automatically take a dependency on boto3.

As for permissions, you’ve already created the IAM role and attached a policy to it that enables your Lambda function to access the necessary API actions. When you create your Lambda function, make sure that you select the correct IAM role, to ensure it is invoked with the correct permissions.

The following Lambda function code is generic. So how does the Lambda function know which ECS cluster to gather metrics for? Your Step Functions state machine automatically passes in its state to the Lambda function. When you create your CloudWatch Events rule, you specify a simple JSON object that passes the desired ECS cluster name into your Step Functions state machine, which then passes it to the Lambda function.

Use the following property values as you create your Lambda function:

Function Name: WriteMetricFromStepFunction
Description: This Lambda function retrieves metric values from an ECS cluster and writes them to Amazon CloudWatch.
Runtime: Python3.6
Memory: 128 MB
IAM Role: WriteMetricFromStepFunction

import boto3

def handler(event, context):
    cw = boto3.client('cloudwatch')
    ecs = boto3.client('ecs')
    print('Got boto3 client objects')
    
    Dimension = {
        'Name': 'ClusterName',
        'Value': event['ECSClusterName']
    }

    cluster = get_ecs_cluster(ecs, Dimension['Value'])
    
    cw_args = {
       'Namespace': 'ECS',
       'MetricData': [
           {
               'MetricName': 'RunningTask',
               'Dimensions': [ Dimension ],
               'Value': cluster['runningTasksCount'],
               'Unit': 'Count',
               'StorageResolution': 1
           },
           {
               'MetricName': 'PendingTask',
               'Dimensions': [ Dimension ],
               'Value': cluster['pendingTasksCount'],
               'Unit': 'Count',
               'StorageResolution': 1
           },
           {
               'MetricName': 'ActiveServices',
               'Dimensions': [ Dimension ],
               'Value': cluster['activeServicesCount'],
               'Unit': 'Count',
               'StorageResolution': 1
           },
           {
               'MetricName': 'RegisteredContainerInstances',
               'Dimensions': [ Dimension ],
               'Value': cluster['registeredContainerInstancesCount'],
               'Unit': 'Count',
               'StorageResolution': 1
           }
        ]
    }
    cw.put_metric_data(**cw_args)
    print('Finished writing metric data')
    
def get_ecs_cluster(client, cluster_name):
    cluster = client.describe_clusters(clusters = [ cluster_name ])
    print('Retrieved cluster details from ECS')
    return cluster['clusters'][0]

Create the CloudWatch Events rule

Now you’ve created an IAM role and policy, Step Functions state machine, and Lambda function. How do these components actually start communicating with each other? The final step in this process is to set up a CloudWatch Events rule that triggers your metric-gathering Step Functions state machine every minute. You have two choices for your CloudWatch Events rule expression: rate or cron. In this example, use the cron expression.

A couple key learning points from creating the CloudWatch Events rule:

  • You can specify one or more targets, of different types (for example, Lambda function, Step Functions state machine, SNS topic, and so on).
  • You’re required to specify an IAM role with permissions to trigger your target.
    NOTE: This applies only to certain types of targets, including Step Functions state machines.
  • Each target that supports IAM roles can be triggered using a different IAM role, in the same CloudWatch Events rule.
  • Optional: You can provide custom JSON that is passed to your target Step Functions state machine as input.

Follow these steps to create the CloudWatch Events rule:

  1. Open the CloudWatch console.
  2. Choose Events, RulesCreate Rule.
  3. Select Schedule, Cron Expression, and then enter the following rule:
    0/1 * * * ? *
  4. Choose Add Target, Step Functions State MachineWriteMetricFromStepFunction.
  5. For Configure Input, select Constant (JSON Text).
  6. Enter the following JSON input, which is passed to Step Functions, while changing the cluster name accordingly:
    { "ECSClusterName": "ECSEsgaroth" }
  7. Choose Use Existing Role, WriteMetricFromStepFunction (the IAM role that you previously created).

After you’ve completed with these steps, your screen should look similar to this:

Validate the solution

Now that you have finished implementing the solution to gather high-resolution metrics from ECS, validate that it’s working properly.

  1. Open the CloudWatch console.
  2. Choose Metrics.
  3. Choose custom and select the ECS namespace.
  4. Choose the ClusterName metric dimension.

You should see your metrics listed below.

Troubleshoot configuration issues

If you aren’t receiving the expected ECS cluster metrics in CloudWatch, check for the following common configuration issues. Review the earlier procedures to make sure that the resources were properly configured.

  • The IAM role’s trust relationship is incorrectly configured.
    Make sure that the IAM role trusts Lambda, CloudWatch Events, and Step Functions in the correct region.
  • The IAM role does not have the correct policies attached to it.
    Make sure that you have copied the IAM policy correctly as an inline policy on the IAM role.
  • The CloudWatch Events rule is not triggering new Step Functions executions.
    Make sure that the target configuration on the rule has the correct Step Functions state machine and IAM role selected.
  • The Step Functions state machine is being executed, but failing part way through.
    Examine the detailed error message on the failed state within the failed Step Functions execution. It’s possible that the
  • IAM role does not have permissions to trigger the target Lambda function, that the target Lambda function may not exist, or that the Lambda function failed to complete successfully due to invalid permissions.
    Although the above list covers several different potential configuration issues, it is not comprehensive. Make sure that you understand how each service is connected to each other, how permissions are granted through IAM policies, and how IAM trust relationships work.

Conclusion

In this post, you implemented a Serverless solution to gather and record high-resolution application metrics from containers running on Amazon ECS into CloudWatch. The solution consists of a Step Functions state machine, Lambda function, CloudWatch Events rule, and an IAM role and policy. The data that you gather from this solution helps you rapidly identify issues with an ECS cluster.

To gather high-resolution metrics from any service, modify your Lambda function to gather the correct metrics from your target. If you prefer not to use Python, you can implement a Lambda function using one of the other supported runtimes, including Node.js, Java, or .NET Core. However, this post should give you the fundamental basics about capturing high-resolution metrics in CloudWatch.

If you found this post useful, or have questions, please comment below.

Automating Security Group Updates with AWS Lambda

Post Syndicated from Ian Scofield original https://aws.amazon.com/blogs/compute/automating-security-group-updates-with-aws-lambda/

Customers often use public endpoints to perform cross-region replication or other application layer communication to remote regions. But a common problem is how do you protect these endpoints? It can be tempting to open up the security groups to the world due to the complexity of keeping security groups in sync across regions with a dynamically changing infrastructure.

Consider a situation where you are running large clusters of instances in different regions that all require internode connectivity. One approach would be to use a VPN tunnel between regions to provide a secure tunnel over which to send your traffic. A good example of this is the Transit VPC Solution, which is a published AWS solution to help customers quickly get up and running. However, this adds additional cost and complexity to your solution due to the newly required additional infrastructure.

Another approach, which I’ll explore in this post, is to restrict access to the nodes by whitelisting the public IP addresses of your hosts in the opposite region. Today, I’ll outline a solution that allows for cross-region security group updates, can handle remote region failures, and supports external actions such as manually terminating instances or adding instances to an existing Auto Scaling group.

Solution overview

The overview of this solution is diagrammed below. Although this post covers limiting access to your instances, you should still implement encryption to protect your data in transit.

If your entire infrastructure is running in a single region, you can reference a security group as the source, allowing your IP addresses to change without any updates required. However, if you’re going across the public internet between regions to perform things like application-level traffic or cross-region replication, this is no longer an option. Security groups are regional. When you go across regions it can be tempting to drop security to enable this communication.

Although using an Elastic IP address can provide you with a static IP address that you can define as a source for your security groups, this may not always be feasible, especially when automatic scaling is desired.

In this example scenario, you have a distributed database that requires full internode communication for replication. If you place a cluster in us-east-1 and us-west-2, you must provide a secure method of communication between the two. Because the database uses cloud best practices, you can add or remove nodes as the load varies.

To start the process of updating your security groups, you must know when an instance has come online to trigger your workflow. Auto Scaling groups have the concept of lifecycle hooks that enable you to perform custom actions as the group launches or terminates instances.

When Auto Scaling begins to launch or terminate an instance, it puts the instance into a wait state (Pending:Wait or Terminating:Wait). The instance remains in this state while you perform your various actions until either you tell Auto Scaling to Continue, Abandon, or the timeout period ends. A lifecycle hook can trigger a CloudWatch event, publish to an Amazon SNS topic, or send to an Amazon SQS queue. For this example, you use CloudWatch Events to trigger an AWS Lambda function that updates an Amazon DynamoDB table.

Component breakdown

Here’s a quick breakdown of the components involved in this solution:

• Lambda function
• CloudWatch event
• DynamoDB table

Lambda function

The Lambda function automatically updates your security groups, in the following way:

1. Determines whether a change was triggered by your Auto Scaling group lifecycle hook or manually invoked for a “true up” functionality, which I discuss later in this post.
2. Describes the instances in the Auto Scaling group and obtain public IP addresses for each instance.
3. Updates both local and remote DynamoDB tables.
4. Compares the list of public IP addresses for both local and remote clusters with what’s already in the local region security group. Update the security group.
5. Compares the list of public IP addresses for both local and remote clusters with what’s already in the remote region security group. Update the security group
6. Signals CONTINUE back to the lifecycle hook.

CloudWatch event

The CloudWatch event triggers when an instance passes through either the launching or terminating states. When the Lambda function gets invoked, it receives an event that looks like the following:

{
	"account": "123456789012",
	"region": "us-east-1",
	"detail": {
		"LifecycleHookName": "hook-launching",
		"AutoScalingGroupName": "",
		"LifecycleActionToken": "33965228-086a-4aeb-8c26-f82ed3bef495",
		"LifecycleTransition": "autoscaling:EC2_INSTANCE_LAUNCHING",
		"EC2InstanceId": "i-017425ec54f22f994"
	},
	"detail-type": "EC2 Instance-launch Lifecycle Action",
	"source": "aws.autoscaling",
	"version": "0",
	"time": "2017-05-03T02:20:59Z",
	"id": "cb930cf8-ce8b-4b6c-8011-af17966eb7e2",
	"resources": [
		"arn:aws:autoscaling:us-east-1:123456789012:autoScalingGroup:d3fe9d96-34d0-4c62-b9bb-293a41ba3765:autoScalingGroupName/"
	]
}

DynamoDB table

You use DynamoDB to store lists of remote IP addresses in a local table that is updated by the opposite region as a failsafe source of truth. Although you can describe your Auto Scaling group for the local region, you must maintain a list of IP addresses for the remote region.

To minimize the number of describe calls and prevent an issue in the remote region from blocking your local scaling actions, we keep a list of the remote IP addresses in a local DynamoDB table. Each Lambda function in each region is responsible for updating the public IP addresses of its Auto Scaling group for both the local and remote tables.

As with all the infrastructure in this solution, there is a DynamoDB table in both regions that mirror each other. For example, the following screenshot shows a sample DynamoDB table. The Lambda function in us-east-1 would update the DynamoDB entry for us-east-1 in both tables in both regions.

By updating a DynamoDB table in both regions, it allows the local region to gracefully handle issues with the remote region, which would otherwise prevent your ability to scale locally. If the remote region becomes inaccessible, you have a copy of the latest configuration from the table that you can use to continue to sync with your security groups. When the remote region comes back online, it pushes its updated public IP addresses to the DynamoDB table. The security group is updated to reflect the current status by the remote Lambda function.

 

Walkthrough

Note: All of the following steps are performed in both regions. The Launch Stack buttons will default to the us-east-1 region.

Here’s a quick overview of the steps involved in this process:

1. An instance is launched or terminated, which triggers an Auto Scaling group lifecycle hook, triggering the Lambda function via CloudWatch Events.
2. The Lambda function retrieves the list of public IP addresses for all instances in the local region Auto Scaling group.
3. The Lambda function updates the local and remote region DynamoDB tables with the public IP addresses just received for the local Auto Scaling group.
4. The Lambda function updates the local region security group with the public IP addresses, removing and adding to ensure that it mirrors what is present for the local and remote Auto Scaling groups.
5. The Lambda function updates the remote region security group with the public IP addresses, removing and adding to ensure that it mirrors what is present for the local and remote Auto Scaling groups.

Prerequisites

To deploy this solution, you need to have Auto Scaling groups, launch configurations, and a base security group in both regions. To expedite this process, this CloudFormation template can be launched in both regions.

Step 1: Launch the AWS SAM template in the first region

To make the deployment process easy, I’ve created an AWS Serverless Application Model (AWS SAM) template, which is a new specification that makes it easier to manage and deploy serverless applications on AWS. This template creates the following resources:

• A Lambda function, to perform the various security group actions
• A DynamoDB table, to track the state of the local and remote Auto Scaling groups
• Auto Scaling group lifecycle hooks for instance launching and terminating
• A CloudWatch event, to track the EC2 Instance-Launch Lifecycle-Action and EC2 Instance-terminate Lifecycle-Action events
• A pointer from the CloudWatch event to the Lambda function, and the necessary permissions

Download the template from here or click to launch.

Upon launching the template, you’ll be presented with a list of parameters which includes the remote/local names for your Auto Scaling Groups, AWS region, Security Group IDs, DynamoDB table names, as well as where the code for the Lambda function is located. Because this is the first region you’re launching the stack in, fill out all the parameters except for the RemoteTable parameter as it hasn’t been created yet (you fill this in later).

Step 2: Test the local region

After the stack has finished launching, you can test the local region. Open the EC2 console and find the Auto Scaling group that was created when launching the prerequisite stack. Change the desired number of instances from 0 to 1.

For both regions, check your security group to verify that the public IP address of the instance created is now in the security group.

Local region:

Remote region:

Now, change the desired number of instances for your group back to 0 and verify that the rules are properly removed.

Local region:

Remote region:

Step 3: Launch in the remote region

When you deploy a Lambda function using CloudFormation, the Lambda zip file needs to reside in the same region you are launching the template. Once you choose your remote region, create an Amazon S3 bucket and upload the Lambda zip file there. Next, go to the remote region and launch the same SAM template as before, but make sure you update the CodeBucket and CodeKey parameters. Also, because this is the second launch, you now have all the values and can fill out all the parameters, specifically the RemoteTable value.

 

Step 4: Update the local region Lambda environment variable

When you originally launched the template in the local region, you didn’t have the name of the DynamoDB table for the remote region, because you hadn’t created it yet. Now that you have launched the remote template, you can perform a CloudFormation stack update on the initial SAM template. This populates the remote DynamoDB table name into the initial Lambda function’s environment variables.

In the CloudFormation console in the initial region, select the stack. Under Actions, choose Update Stack, and select the SAM template used for both regions. Under Parameters, populate the remote DynamoDB table name, as shown below. Choose Next and let the stack update complete. This updates your Lambda function and completes the setup process.

 

Step 5: Final testing

You now have everything fully configured and in place to trigger security group changes based on instances being added or removed to your Auto Scaling groups in both regions. Test this by changing the desired capacity of your group in both regions.

True up functionality
If an instance is manually added or removed from the Auto Scaling group, the lifecycle hooks don’t get triggered. To account for this, the Lambda function supports a “true up” functionality in which the function can be manually invoked. If you paste in the following JSON text for your test event, it kicks off the entire workflow. For added peace of mind, you can also have this function fire via a CloudWatch event with a CRON expression for nearly continuous checking.

{
	"detail": {
		"AutoScalingGroupName": "<your ASG name>"
	},
	"trueup":true
}

Extra credit

Now that all the resources are created in both regions, go back and break down the policy to incorporate resource-level permissions for specific security groups, Auto Scaling groups, and the DynamoDB tables.

Although this post is centered around using public IP addresses for your instances, you could instead use a VPN between regions. In this case, you would still be able to use this solution to scope down the security groups to the cluster instances. However, the code would need to be modified to support private IP addresses.

 

Conclusion

At this point, you now have a mechanism in place that captures when a new instance is added to or removed from your cluster and updates the security groups in both regions. This ensures that you are locking down your infrastructure securely by allowing access only to other cluster members.

Keep in mind that this architecture (lifecycle hooks, CloudWatch event, Lambda function, and DynamoDB table) requires that the infrastructure to be deployed in both regions, to have synchronization going both ways.

Because this Lambda function is modifying security group rules, it’s important to have an audit log of what has been modified and who is modifying them. The out-of-the-box function provides logs in CloudWatch for what IP addresses are being added and removed for which ports. As these are all API calls being made, they are logged in CloudTrail and can be traced back to the IAM role that you created for your lifecycle hooks. This can provide historical data that can be used for troubleshooting or auditing purposes.

Security is paramount at AWS. We want to ensure that customers are protecting access to their resources. This solution helps you keep your security groups in both regions automatically in sync with your Auto Scaling group resources. Let us know if you have any questions or other solutions you’ve come up with!

Creating a Cost-Efficient Amazon ECS Cluster for Scheduled Tasks

Post Syndicated from Nathan Taber original https://aws.amazon.com/blogs/compute/creating-a-cost-efficient-amazon-ecs-cluster-for-scheduled-tasks/

Madhuri Peri
Sr. DevOps Consultant

When you use Amazon Relational Database Service (Amazon RDS), depending on the logging levels on the RDS instances and the volume of transactions, you could generate a lot of log data. To ensure that everything is running smoothly, many customers search for log error patterns using different log aggregation and visualization systems, such as Amazon Elasticsearch Service, Splunk, or other tool of their choice. A module needs to periodically retrieve the RDS logs using the SDK, and then send them to Amazon S3. From there, you can stream them to your log aggregation tool.

One option is writing an AWS Lambda function to retrieve the log files. However, because of the time that this function needs to execute, depending on the volume of log files retrieved and transferred, it is possible that Lambda could time out on many instances.  Another approach is launching an Amazon EC2 instance that runs this job periodically. However, this would require you to run an EC2 instance continuously, not an optimal use of time or money.

Using the new Amazon CloudWatch integration with Amazon EC2 Container Service, you can trigger this job to run in a container on an existing Amazon ECS cluster. Additionally, this would allow you to improve costs by running containers on a fleet of Spot Instances.

In this post, I will show you how to use the new scheduled tasks (cron) feature in Amazon ECS and launch tasks using CloudWatch events, while leveraging Spot Fleet to maximize availability and cost optimization for containerized workloads.

Architecture

The following diagram shows how the various components described schedule a task that retrieves log files from Amazon RDS database instances, and deposits the logs into an S3 bucket.

Amazon ECS cluster container instances are using Spot Fleet, which is a perfect match for the workload that needs to run when it can. This improves cluster costs.

The task definition defines which Docker image to retrieve from the Amazon EC2 Container Registry (Amazon ECR) repository and run on the Amazon ECS cluster.

The container image has Python code functions to make AWS API calls using boto3. It iterates over the RDS database instances, retrieves the logs, and deposits them in the S3 bucket. Many customers choose these logs to be delivered to their centralized log-store. CloudWatch Events defines the schedule for when the container task has to be launched.

Walkthrough

To provide the basic framework, we have built an AWS CloudFormation template that creates the following resources:

  • Amazon ECR repository for storing the Docker image to be used in the task definition
  • S3 bucket that holds the transferred logs
  • Task definition, with image name and S3 bucket as environment variables provided via input parameter
  • CloudWatch Events rule
  • Amazon ECS cluster
  • Amazon ECS container instances using Spot Fleet
  • IAM roles required for the container instance profiles

Before you begin

Ensure that Git, Docker, and the AWS CLI are installed on your computer.

In your AWS account, instantiate one Amazon Aurora instance using the console. For more information, see Creating an Amazon Aurora DB Cluster.

Implementation Steps

  1. Clone the code from GitHub that performs RDS API calls to retrieve the log files.
    git clone https://github.com/awslabs/aws-ecs-scheduled-tasks.git
  2. Build and tag the image.
    cd aws-ecs-scheduled-tasks/container-code/src && ls

    Dockerfile		rdslogsshipper.py	requirements.txt

    docker build -t rdslogsshipper .

    Sending build context to Docker daemon 9.728 kB
    Step 1 : FROM python:3
     ---> 41397f4f2887
    Step 2 : WORKDIR /usr/src/app
     ---> Using cache
     ---> 59299c020e7e
    Step 3 : COPY requirements.txt ./
     ---> 8c017e931c3b
    Removing intermediate container df09e1bed9f2
    Step 4 : COPY rdslogsshipper.py /usr/src/app
     ---> 099a49ca4325
    Removing intermediate container 1b1da24a6699
    Step 5 : RUN pip install --no-cache-dir -r requirements.txt
     ---> Running in 3ed98b30901d
    Collecting boto3 (from -r requirements.txt (line 1))
      Downloading boto3-1.4.6-py2.py3-none-any.whl (128kB)
    Collecting botocore (from -r requirements.txt (line 2))
      Downloading botocore-1.6.7-py2.py3-none-any.whl (3.6MB)
    Collecting s3transfer<0.2.0,>=0.1.10 (from boto3->-r requirements.txt (line 1))
      Downloading s3transfer-0.1.10-py2.py3-none-any.whl (54kB)
    Collecting jmespath<1.0.0,>=0.7.1 (from boto3->-r requirements.txt (line 1))
      Downloading jmespath-0.9.3-py2.py3-none-any.whl
    Collecting python-dateutil<3.0.0,>=2.1 (from botocore->-r requirements.txt (line 2))
      Downloading python_dateutil-2.6.1-py2.py3-none-any.whl (194kB)
    Collecting docutils>=0.10 (from botocore->-r requirements.txt (line 2))
      Downloading docutils-0.14-py3-none-any.whl (543kB)
    Collecting six>=1.5 (from python-dateutil<3.0.0,>=2.1->botocore->-r requirements.txt (line 2))
      Downloading six-1.10.0-py2.py3-none-any.whl
    Installing collected packages: six, python-dateutil, docutils, jmespath, botocore, s3transfer, boto3
    Successfully installed boto3-1.4.6 botocore-1.6.7 docutils-0.14 jmespath-0.9.3 python-dateutil-2.6.1 s3transfer-0.1.10 six-1.10.0
     ---> f892d3cb7383
    Removing intermediate container 3ed98b30901d
    Step 6 : COPY . .
     ---> ea7550c04fea
    Removing intermediate container b558b3ebd406
    Successfully built ea7550c04fea
  3. Run the CloudFormation stack and get the names for the Amazon ECR repo and S3 bucket. In the stack, choose Outputs.
  4. Open the ECS console and choose Repositories. The rdslogs repo has been created. Choose View Push Commands and follow the instructions to connect to the repository and push the image for the code that you built in Step 2. The screenshot shows the final result:
  5. Associate the CloudWatch scheduled task with the created Amazon ECS Task Definition, using a new CloudWatch event rule that is scheduled to run at intervals. The following rule is scheduled to run every 15 minutes:
    aws --profile default --region us-west-2 events put-rule --name demo-ecs-task-rule  --schedule-expression "rate(15 minutes)"

    {
        "RuleArn": "arn:aws:events:us-west-2:12345678901:rule/demo-ecs-task-rule"
    }
  6. CloudWatch requires IAM permissions to place a task on the Amazon ECS cluster when the CloudWatch event rule is executed, in addition to an IAM role that can be assumed by CloudWatch Events. This is done in three steps:
    1. Create the IAM role to be assumed by CloudWatch.
      aws --profile default --region us-west-2 iam create-role --role-name Test-Role --assume-role-policy-document file://event-role.json

      {
          "Role": {
              "AssumeRolePolicyDocument": {
                  "Version": "2012-10-17", 
                  "Statement": [
                      {
                          "Action": "sts:AssumeRole", 
                          "Effect": "Allow", 
                          "Principal": {
                              "Service": "events.amazonaws.com"
                          }
                      }
                  ]
              }, 
              "RoleId": "AROAIRYYLDCVZCUACT7FS", 
              "CreateDate": "2017-07-14T22:44:52.627Z", 
              "RoleName": "Test-Role", 
              "Path": "/", 
              "Arn": "arn:aws:iam::12345678901:role/Test-Role"
          }
      }

      The following is an example of the event-role.json file used earlier:

      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Principal": {
                    "Service": "events.amazonaws.com"
                  },
                  "Action": "sts:AssumeRole"
              }
          ]
      }
    2. Create the IAM policy defining the ECS cluster and task definition. You need to get these values from the CloudFormation outputs and resources.
      aws --profile default --region us-west-2 iam create-policy --policy-name test-policy --policy-document file://event-policy.json

      {
          "Policy": {
              "PolicyName": "test-policy", 
              "CreateDate": "2017-07-14T22:51:20.293Z", 
              "AttachmentCount": 0, 
              "IsAttachable": true, 
              "PolicyId": "ANPAI7XDIQOLTBUMDWGJW", 
              "DefaultVersionId": "v1", 
              "Path": "/", 
              "Arn": "arn:aws:iam::123455678901:policy/test-policy", 
              "UpdateDate": "2017-07-14T22:51:20.293Z"
          }
      }

      The following is an example of the event-policy.json file used earlier:

      {
          "Version": "2012-10-17",
          "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ecs:RunTask"
                ],
                "Resource": [
                    "arn:aws:ecs:*::task-definition/"
                ],
                "Condition": {
                    "ArnLike": {
                        "ecs:cluster": "arn:aws:ecs:*::cluster/"
                    }
                }
            }
          ]
      }
    3. Attach the IAM policy to the role.
      aws --profile default --region us-west-2 iam attach-role-policy --role-name Test-Role --policy-arn arn:aws:iam::1234567890:policy/test-policy
  7. Associate the CloudWatch rule created earlier to place the task on the ECS cluster. The following command shows an example. Replace the AWS account ID and region with your settings.
    aws events put-targets --rule demo-ecs-task-rule --targets "Id"="1","Arn"="arn:aws:ecs:us-west-2:12345678901:cluster/test-cwe-blog-ecsCluster-15HJFWCH4SP67","EcsParameters"={"TaskDefinitionArn"="arn:aws:ecs:us-west-2:12345678901:task-definition/test-cwe-blog-taskdef:8"},"RoleArn"="arn:aws:iam::12345678901:role/Test-Role"

    {
        "FailedEntries": [], 
        "FailedEntryCount": 0
    }

That’s it. The logs now run based on the defined schedule.

To test this, open the Amazon ECS console, select the Amazon ECS cluster that you created, and then choose Tasks, Run New Task. Select the task definition created by the CloudFormation template, and the cluster should be selected automatically. As this runs, the S3 bucket should be populated with the RDS logs for the instance.

Conclusion

In this post, you’ve seen that the choices for workloads that need to run at a scheduled time include Lambda with CloudWatch events or EC2 with cron. However, sometimes the job could run outside of Lambda execution time limits or be not cost-effective for an EC2 instance.

In such cases, you can schedule the tasks on an ECS cluster using CloudWatch rules. In addition, you can use a Spot Fleet cluster with Amazon ECS for cost-conscious workloads that do not have hard requirements on execution time or instance availability in the Spot Fleet. For more information, see Powering your Amazon ECS Cluster with Amazon EC2 Spot Instances and Scheduled Events.

If you have questions or suggestions, please comment below.

How to Query Personally Identifiable Information with Amazon Macie

Post Syndicated from Chad Woolf original https://aws.amazon.com/blogs/security/how-to-query-personally-identifiable-information-with-amazon-macie/

Amazon Macie logo

In August 2017 at the AWS Summit New York, AWS launched a new security and compliance service called Amazon Macie. Macie uses machine learning to automatically discover, classify, and protect sensitive data in AWS. In this blog post, I demonstrate how you can use Macie to help enable compliance with applicable regulations, starting with data retention.

How to query retained PII with Macie

Data retention and mandatory data deletion are common topics across compliance frameworks, so knowing what is stored and how long it has been or needs to be stored is of critical importance. For example, you can use Macie for Payment Card Industry Data Security Standard (PCI DSS) 3.2, requirement 3, “Protect stored cardholder data,” which mandates a “quarterly process for identifying and securely deleting stored cardholder data that exceeds defined retention.” You also can use Macie for ISO 27017 requirement 12.3.1, which calls for “retention periods for backup data.” In each of these cases, you can use Macie’s built-in queries to identify the age of data in your Amazon S3 buckets and to help meet your compliance needs.

To get started with Macie and run your first queries of personally identifiable information (PII) and sensitive data, follow the initial setup as described in the launch post on the AWS Blog. After you have set up Macie, walk through the following steps to start running queries. Start by focusing on the S3 buckets that you want to inventory and capture important compliance related activity and data.

To start running Macie queries:

  1. In the AWS Management Console, launch the Macie console (you can type macie to find the console).
  2. Click Dashboard in the navigation pane. This shows you an overview of the risk level and data classification type of all inventoried S3 buckets, categorized by date and type.
    Screenshot of "Dashboard" in the navigation pane
  3. Choose S3 objects by PII priority. This dashboard lets you sort by PII priority and PII types.
    Screenshot of "S3 objects by PII priority"
  4. In this case, I want to find information about credit card numbers. I choose the magnifying glass for the type cc_number (note that PII types can be used for custom queries). This view shows the events where PII classified data has been uploaded to S3. When I scroll down, I see the individual files that have been identified.
    Screenshot showing the events where PII classified data has been uploaded to S3
  5. Before looking at the files, I want to continue to build the query by only showing items with high priority. To do so, I choose the row called Object PII Priority and then the magnifying glass icon next to High.
    Screenshot of refining the query for high priority events
  6. To view the results matching these queries, I scroll down and choose any file listed. This shows vital information such as creation date, location, and object access control list (ACL).
  7. The piece I am most interested in this case is the Object PII details line to understand more about what was found in the file. In this case, I see name and credit card information, which is what caused the high priority. Scrolling up again, I also see that the query fields have updated as I interacted with the UI.
    Screenshot showing "Object PII details"

Let’s say that I want to get an alert every time Macie finds new data matching this query. This alert can be used to automate response actions by using AWS Lambda and Amazon CloudWatch Events.

  1. I choose the left green icon called Save query as alert.
    Screenshot of "Save query as alert" button
  2. I can customize the alert and change things like category or severity to fit my needs based on the alert data.
  3. Another way to find the information I am looking for is to run custom queries. To start using custom queries, I choose Research in the navigation pane.
    1. To learn more about custom Macie queries and what you can do on the Research tab, see Using the Macie Research Tab.
  4. I change the type of query I want to run from CloudTrail data to S3 objects in the drop-down list menu.
    Screenshot of choosing "S3 objects" from the drop-down list menu
  5. Because I want PII data, I start typing in the query box, which has an autocomplete feature. I choose the pii_types: query. I can now type the data I want to look for. In this case, I want to see all files matching the credit card filter so I type cc_number and press Enter. The query box now says, pii_types:cc_number. I press Enter again to enable autocomplete, and then I type AND pii_types:email to require both a credit card number and email address in a single object.
    The query looks for all files matching the credit card filter ("cc_number")
  6. I choose the magnifying glass to search and Macie shows me all S3 objects that are tagged as PII of type Credit Cards. I can further specify that I only want to see PII of type Credit Card that are classified as High priority by adding AND and pii_impact:high to the query.
    Screenshot showing narrowing the query results furtherAs before, I can save this new query as an alert by clicking Save query as alert, which will be triggered by data matching the query going forward.

Advanced tip

Try the following advanced queries using Lucene query syntax and save the queries as alerts in Macie.

  • Use a regular-expression based query to search for a minimum of 10 credit card numbers and 10 email addresses in a single object:
    • pii_explain.cc_number:/([1-9][0-9]|[0-9]{3,}) distinct Credit Card Numbers.*/ AND pii_explain.email:/([1-9][0-9]|[0-9]{3,}) distinct Email Addresses.*/
  • Search for objects containing at least one credit card, name, and email address that have an object policy enabling global access (searching for S3 AllUsers or AuthenticatedUsers permissions):
    • (object_acl.Grants.Grantee.URI:”http\://acs.amazonaws.com/groups/global/AllUsers” OR  object_acl.Grants.Grantee.URI:”http\://acs.amazonaws.com/groups/global/AllUsers”) AND (pii_types.cc_number AND pii_types.email AND pii_types.name)

These are two ways to identify and be alerted about PII by using Macie. In a similar way, you can create custom alerts for various AWS CloudTrail events by choosing a different data set on which to run the queries again. In the examples in this post, I identified credit cards stored in plain text (all data in this post is example data only), determined how long they had been stored in S3 by viewing the result details, and set up alerts to notify or trigger actions on new sensitive data being stored. With queries like these, you can build a reliable data validation program.

If you have comments about this post, submit them in the “Comments” section below. If you have questions about how to use Macie, start a new thread on the Macie forum or contact AWS Support.

-Chad

Automating Amazon EBS Snapshot Management with AWS Step Functions and Amazon CloudWatch Events

Post Syndicated from Andy Katz original https://aws.amazon.com/blogs/compute/automating-amazon-ebs-snapshot-management-with-aws-step-functions-and-amazon-cloudwatch-events/

Brittany Doncaster, Solutions Architect

Business continuity is important for building mission-critical workloads on AWS. As an AWS customer, you might define recovery point objectives (RPO) and recovery time objectives (RTO) for different tier applications in your business. After the RPO and RTO requirements are defined, it is up to your architects to determine how to meet those requirements.

You probably store persistent data in Amazon EBS volumes, which live within a single Availability Zone. And, following best practices, you take snapshots of your EBS volumes to back up the data on Amazon S3, which provides 11 9’s of durability. If you are following these best practices, then you’ve probably recognized the need to manage the number of snapshots you keep for a particular EBS volume and delete older, unneeded snapshots. Doing this cleanup helps save on storage costs.

Some customers also have policies stating that backups need to be stored a certain number of miles away as part of a disaster recovery (DR) plan. To meet these requirements, customers copy their EBS snapshots to the DR region. Then, the same snapshot management and cleanup has to also be done in the DR region.

All of this snapshot management logic consists of different components. You would first tag your snapshots so you could manage them. Then, determine how many snapshots you currently have for a particular EBS volume and assess that value against a retention rule. If the number of snapshots was greater than your retention value, then you would clean up old snapshots. And finally, you might copy the latest snapshot to your DR region. All these steps are just an example of a simple snapshot management workflow. But how do you automate something like this in AWS? How do you do it without servers?

One of the most powerful AWS services released in 2016 was Amazon CloudWatch Events. It enables you to build event-driven IT automation, based on events happening within your AWS infrastructure. CloudWatch Events integrates with AWS Lambda to let you execute your custom code when one of those events occurs. However, the actions to take based on those events aren’t always composed of a single Lambda function. Instead, your business logic may consist of multiple steps (like in the case of the example snapshot management flow described earlier). And you may want to run those steps in sequence or in parallel. You may also want to have retry logic or exception handling for each step.

AWS Step Functions serves just this purpose―to help you coordinate your functions and microservices. Step Functions enables you to simplify your effort and pull the error handling, retry logic, and workflow logic out of your Lambda code. Step Functions integrates with Lambda to provide a mechanism for building complex serverless applications. Now, you can kick off a Step Functions state machine based on a CloudWatch event.

In this post, I discuss how you can target Step Functions in a CloudWatch Events rule. This allows you to have event-driven snapshot management based on snapshot completion events firing in CloudWatch Event rules.

As an example of what you could do with Step Functions and CloudWatch Events, we’ve developed a reference architecture that performs management of your EBS snapshots.

Automating EBS Snapshot Management with Step Functions

This architecture assumes that you have already set up CloudWatch Events to create the snapshots on a schedule or that you are using some other means of creating snapshots according to your needs.

This architecture covers the pieces of the workflow that need to happen after a snapshot has been created.

  • It creates a CloudWatch Events rule to invoke a Step Functions state machine execution when an EBS snapshot is created.
  • The state machine then tags the snapshot, cleans up the oldest snapshots if the number of snapshots is greater than the defined number to retain, and copies the snapshot to a DR region.
  • When the DR region snapshot copy is completed, another state machine kicks off in the DR region. The new state machine has a similar flow and uses some of the same Lambda code to clean up the oldest snapshots that are greater than the defined number to retain.
  • Also, both state machines demonstrate how you can use Step Functions to handle errors within your workflow. Any errors that are caught during execution result in the execution of a Lambda function that writes a message to an SNS topic. Therefore, if any errors occur, you can subscribe to the SNS topic and get notified.

The following is an architecture diagram of the reference architecture:

Creating the Lambda functions and Step Functions state machines

First, pull the code from GitHub and use the AWS CLI to create S3 buckets for the Lambda code in the primary and DR regions. For this example, assume that the primary region is us-west-2 and the DR region is us-east-2. Run the following commands, replacing the italicized text in <> with your own unique bucket names.

git clone https://github.com/awslabs/aws-step-functions-ebs-snapshot-mgmt.git

cd aws-step-functions-ebs-snapshot-mgmt/

aws s3 mb s3://<primary region bucket name> --region us-west-2

aws s3 mb s3://<DR region bucket name> --region us-east-2

Next, use the Serverless Application Model (SAM), which uses AWS CloudFormation to deploy the Lambda functions and Step Functions state machines in the primary and DR regions. Replace the italicized text in <> with the S3 bucket names that you created earlier.

aws cloudformation package --template-file PrimaryRegionTemplate.yaml --s3-bucket <primary region bucket name>  --output-template-file tempPrimary.yaml --region us-west-2

aws cloudformation deploy --template-file tempPrimary.yaml --stack-name ebsSnapshotMgmtPrimary --capabilities CAPABILITY_IAM --region us-west-2

aws cloudformation package --template-file DR_RegionTemplate.yaml --s3-bucket <DR region bucket name> --output-template-file tempDR.yaml  --region us-east-2

aws cloudformation deploy --template-file tempDR.yaml --stack-name ebsSnapshotMgmtDR --capabilities CAPABILITY_IAM --region us-east-2

CloudWatch event rule verification

The CloudFormation templates deploy the following resources:

  • The Lambda functions that are coordinated by Step Functions
  • The Step Functions state machine
  • The SNS topic
  • The CloudWatch Events rules that trigger the state machine execution

So, all of the CloudWatch event rules have been created for you by performing the preceding commands. The next section demonstrates how you could create the CloudWatch event rule manually. To jump straight to testing the workflow, see the “Testing in your Account” section. Otherwise, you begin by setting up the CloudWatch event rule in the primary region for the createSnapshot event and also the CloudWatch event rule in the DR region for the copySnapshot command.

First, open the CloudWatch console in the primary region.

Choose Create Rule and create a rule for the createSnapshot command, with your newly created Step Function state machine as the target.

For Event Source, choose Event Pattern and specify the following values:

  • Service Name: EC2
  • Event Type: EBS Snapshot Notification
  • Specific Event: createSnapshot

For Target, choose Step Functions state machine, then choose the state machine created by the CloudFormation commands. Choose Create a new role for this specific resource. Your completed rule should look like the following:

Choose Configure Details and give the rule a name and description.

Choose Create Rule. You now have a CloudWatch Events rule that triggers a Step Functions state machine execution when the EBS snapshot creation is complete.

Now, set up the CloudWatch Events rule in the DR region as well. This looks almost same, but is based off the copySnapshot event instead of createSnapshot.

In the upper right corner in the console, switch to your DR region. Choose CloudWatch, Create Rule.

For Event Source, choose Event Pattern and specify the following values:

  • Service Name: EC2
  • Event Type: EBS Snapshot Notification
  • Specific Event: copySnapshot

For Target, choose Step Functions state machine, then select the state machine created by the CloudFormation commands. Choose Create a new role for this specific resource. Your completed rule should look like in the following:

As in the primary region, choose Configure Details and then give this rule a name and description. Complete the creation of the rule.

Testing in your account

To test this setup, open the EC2 console and choose Volumes. Select a volume to snapshot. Choose Actions, Create Snapshot, and then create a snapshot.

This results in a new execution of your state machine in the primary and DR regions. You can view these executions by going to the Step Functions console and selecting your state machine.

From there, you can see the execution of the state machine.

Primary region state machine:

DR region state machine:

I’ve also provided CloudFormation templates that perform all the earlier setup without using git clone and running the CloudFormation commands. Choose the Launch Stack buttons below to launch the primary and DR region stacks in Dublin and Ohio, respectively. From there, you can pick up at the Testing in Your Account section above to finish the example. All of the code for this example architecture is located in the aws-step-functions-ebs-snapshot-mgmt AWSLabs repo.

Launch EBS Snapshot Management into Ireland with CloudFormation
Primary Region eu-west-1 (Ireland)

Launch EBS Snapshot Management into Ohio with CloudFormation
DR Region us-east-2 (Ohio)

Summary

This reference architecture is just an example of how you can use Step Functions and CloudWatch Events to build event-driven IT automation. The possibilities are endless:

  • Use this pattern to perform other common cleanup type jobs such as managing Amazon RDS snapshots, old versions of Lambda functions, or old Amazon ECR images—all triggered by scheduled events.
  • Use Trusted Advisor events to identify unused EC2 instances or EBS volumes, then coordinate actions on them, such as alerting owners, stopping, or snapshotting.

Happy coding and please let me know what useful state machines you build!

Automate Your IT Operations Using AWS Step Functions and Amazon CloudWatch Events

Post Syndicated from Andy Katz original https://aws.amazon.com/blogs/compute/automate-your-it-operations-using-aws-step-functions-and-amazon-cloudwatch-events/


Rob Percival, Associate Solutions Architect

Are you interested in reducing the operational overhead of your AWS Cloud infrastructure? One way to achieve this is to automate the response to operational events for resources in your AWS account.

Amazon CloudWatch Events provides a near real-time stream of system events that describe the changes and notifications for your AWS resources. From this stream, you can create rules to route specific events to AWS Step Functions, AWS Lambda, and other AWS services for further processing and automated actions.

In this post, learn how you can use Step Functions to orchestrate serverless IT automation workflows in response to CloudWatch events sourced from AWS Health, a service that monitors and generates events for your AWS resources. As a real-world example, I show automating the response to a scenario where an IAM user access key has been exposed.

Serverless workflows with Step Functions and Lambda

Step Functions makes it easy to develop and orchestrate components of operational response automation using visual workflows. Building automation workflows from individual Lambda functions that perform discrete tasks lets you develop, test, and modify the components of your workflow quickly and seamlessly. As serverless services, Step Functions and Lambda also provide the benefits of more productive development, reduced operational overhead, and no costs incurred outside of when the workflows are actively executing.

Example workflow

As an example, this post focuses on automating the response to an event generated by AWS Health when an IAM access key has been publicly exposed on GitHub. This is a diagram of the automation workflow:

AWS proactively monitors popular code repository sites for IAM access keys that have been publicly exposed. Upon detection of an exposed IAM access key, AWS Health generates an AWS_RISK_CREDENTIALS_EXPOSED event in the AWS account related to the exposed key. A configured CloudWatch Events rule detects this event and invokes a Step Functions state machine. The state machine then orchestrates the automated workflow that deletes the exposed IAM access key, summarizes the recent API activity for the exposed key, and sends the summary message to an Amazon SNS topic to notify the subscribers―in that order.

The corresponding Step Functions state machine diagram of this automation workflow can be seen below:

While this particular example focuses on IT automation workflows in response to the AWS_RISK_CREDENTIALS_EXPOSEDevent sourced from AWS Health, it can be generalized to integrate with other events from these services, other event-generating AWS services, and even run on a time-based schedule.

Walkthrough

To follow along, use the code and resources found in the aws-health-tools GitHub repo. The code and resources include an AWS CloudFormation template, in addition to instructions on how to use it.

Launch Stack into N. Virginia with CloudFormation

The Step Functions state machine execution starts with the exposed keys event details in JSON, a sanitized example of which is provided below:

{
    "version": "0",
    "id": "121345678-1234-1234-1234-123456789012",
    "detail-type": "AWS Health Event",
    "source": "aws.health",
    "account": "123456789012",
    "time": "2016-06-05T06:27:57Z",
    "region": "us-east-1",
    "resources": [],
    "detail": {
        "eventArn": "arn:aws:health:us-east-1::event/AWS_RISK_CREDENTIALS_EXPOSED_XXXXXXXXXXXXXXXXX",
        "service": "RISK",
        "eventTypeCode": "AWS_RISK_CREDENTIALS_EXPOSED",
        "eventTypeCategory": "issue",
        "startTime": "Sat, 05 Jun 2016 15:10:09 GMT",
        "eventDescription": [
            {
                "language": "en_US",
                "latestDescription": "A description of the event is provided here"
            }
        ],
        "affectedEntities": [
            {
                "entityValue": "ACCESS_KEY_ID_HERE"
            }
        ]
    }
}

After it’s invoked, the state machine execution proceeds as follows.

Step 1: Delete the exposed IAM access key pair

The first thing you want to do when you determine that an IAM access key has been exposed is to delete the key pair so that it can no longer be used to make API calls. This Step Functions task state deletes the exposed access key pair detailed in the incoming event, and retrieves the IAM user associated with the key to look up API activity for the user in the next step. The user name, access key, and other details about the event are passed to the next step as JSON.

This state contains a powerful error-handling feature offered by Step Functions task states called a catch configuration. Catch configurations allow you to reroute and continue state machine invocation at new states depending on potential errors that occur in your task function. In this case, the catch configuration skips to Step 3. It immediately notifies your security team that errors were raised in the task function of this step (Step 1), when attempting to look up the corresponding IAM user for a key or delete the user’s access key.

Note: Step Functions also offers a retry configuration for when you would rather retry a task function that failed due to error, with the option to specify an increasing time interval between attempts and a maximum number of attempts.

Step 2: Summarize recent API activity for key

After you have deleted the access key pair, you’ll want to have some immediate insight into whether it was used for malicious activity in your account. Another task state, this step uses AWS CloudTrail to look up and summarize the most recent API activity for the IAM user associated with the exposed key. The summary is in the form of counts for each API call made and resource type and name affected. This summary information is then passed to the next step as JSON. This step requires information that you obtained in Step 1. Step Functions ensures the successful completion of Step 1 before moving to Step 2.

Step 3: Notify security

The summary information gathered in the last step can provide immediate insight into any malicious activity on your account made by the exposed key. To determine this and further secure your account if necessary, you must notify your security team with the gathered summary information.

This final task state generates an email message providing in-depth detail about the event using the API activity summary, and publishes the message to an SNS topic subscribed to by the members of your security team.

If the catch configuration of the task state in Step 1 was triggered, then the security notification email instead directs your security team to log in to the console and navigate to the Personal Health Dashboard to view more details on the incident.

Lessons learned

When implementing this use case with Step Functions and Lambda, consider the following:

  • One of the most important parts of implementing automation in response to operational events is to ensure visibility into the response and resolution actions is retained. Step Functions and Lambda enable you to orchestrate your granular response and resolution actions that provides direct visibility into the state of the automation workflow.
  • This basic workflow currently executes these steps serially with a catch configuration for error handling. More sophisticated workflows can leverage the parallel execution, branching logic, and time delay functionality provided by Step Functions.
  • Catch and retry configurations for task states allow for orchestrating reliable workflows while maintaining the granularity of each Lambda function. Without leveraging a catch configuration in Step 1, you would have had to duplicate code from the function in Step 3 to ensure that your security team was notified on failure to delete the access key.
  • Step Functions and Lambda are serverless services, so there is no cost for these services when they are not running. Because this IT automation workflow only runs when an IAM access key is exposed for this account (which is hopefully rare!), the total monthly cost for this workflow is essentially $0.

Conclusion

Automating the response to operational events for resources in your AWS account can free up the valuable time of your engineers. Step Functions and Lambda enable granular IT automation workflows to achieve this result while gaining direct visibility into the orchestration and state of the automation.

For more examples of how to use Step Functions to automate the operations of your AWS resources, or if you’d like to see how Step Functions can be used to build and orchestrate serverless applications, visit Getting Started on the Step Functions website.

AWS Summit New York – Summary of Announcements

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-summit-new-york-summary-of-announcements/

Whew – what a week! Tara, Randall, Ana, and I have been working around the clock to create blog posts for the announcements that we made at the AWS Summit in New York. Here’s a summary to help you to get started:

Amazon Macie – This new service helps you to discover, classify, and secure content at scale. Powered by machine learning and making use of Natural Language Processing (NLP), Macie looks for patterns and alerts you to suspicious behavior, and can help you with governance, compliance, and auditing. You can read Tara’s post to see how to put Macie to work; you select the buckets of interest, customize the classification settings, and review the results in the Macie Dashboard.

AWS GlueRandall’s post (with deluxe animated GIFs) introduces you to this new extract, transform, and load (ETL) service. Glue is serverless and fully managed, As you can see from the post, Glue crawls your data, infers schemas, and generates ETL scripts in Python. You define jobs that move data from place to place, with a wide selection of transforms, each expressed as code and stored in human-readable form. Glue uses Development Endpoints and notebooks to provide you with a testing environment for the scripts you build. We also announced that Amazon Athena now integrates with Amazon Glue, as does Apache Spark and Hive on Amazon EMR.

AWS Migration Hub – This new service will help you to migrate your application portfolio to AWS. My post outlines the major steps and shows you how the Migration Hub accelerates, tracks,and simplifies your migration effort. You can begin with a discovery step, or you can jump right in and migrate directly. Migration Hub integrates with tools from our migration partners and builds upon the Server Migration Service and the Database Migration Service.

CloudHSM Update – We made a major upgrade to AWS CloudHSM, making the benefits of hardware-based key management available to a wider audience. The service is offered on a pay-as-you-go basis, and is fully managed. It is open and standards compliant, with support for multiple APIs, programming languages, and cryptography extensions. CloudHSM is an integral part of AWS and can be accessed from the AWS Management Console, AWS Command Line Interface (CLI), and through API calls. Read my post to learn more and to see how to set up a CloudHSM cluster.

Managed Rules to Secure S3 Buckets – We added two new rules to AWS Config that will help you to secure your S3 buckets. The s3-bucket-public-write-prohibited rule identifies buckets that have public write access and the s3-bucket-public-read-prohibited rule identifies buckets that have global read access. As I noted in my post, you can run these rules in response to configuration changes or on a schedule. The rules make use of some leading-edge constraint solving techniques, as part of a larger effort to use automated formal reasoning about AWS.

CloudTrail for All Customers – Tara’s post revealed that AWS CloudTrail is now available and enabled by default for all AWS customers. As a bonus, Tara reviewed the principal benefits of CloudTrail and showed you how to review your event history and to deep-dive on a single event. She also showed you how to create a second trail, for use with CloudWatch CloudWatch Events.

Encryption of Data at Rest for EFS – When you create a new file system, you now have the option to select a key that will be used to encrypt the contents of the files on the file system. The encryption is done using an industry-standard AES-256 algorithm. My post shows you how to select a key and to verify that it is being used.

Watch the Keynote
My colleagues Adrian Cockcroft and Matt Wood talked about these services and others on the stage, and also invited some AWS customers to share their stories. Here’s the video:

Jeff;

 

New – API & CloudFormation Support for Amazon CloudWatch Dashboards

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-api-cloudformation-support-for-amazon-cloudwatch-dashboards/

We launched CloudWatch Dashboards a couple of years ago. In the post that I wrote for the launch, I showed you how to interactively create a dashboard that displayed chosen CloudWatch metrics in graphical form. After the launch, we added additional features including a full screen mode, a dark theme, control over the range of the Y axis, simplified renaming, persistent storage, and new visualization options.

New API & CLI
While console support is wonderful for interactive use, many customers have asked us to support programmatic creation and manipulation of dashboards and the widgets within. They would like to dynamically build and maintain dashboards, adding and removing widgets as the corresponding AWS resources are created and destroyed. Other customers are interested in setting up and maintaining a consistent set of dashboards across two or more AWS accounts.

I am happy to announce that API, CLI, and AWS CloudFormation support for CloudWatch Dashboards is available now and that you can start using it today!

There are four new API functions (and equivalent CLI commands):

ListDashboards / aws cloudwatch list-dashboards – Fetch a list of all dashboards within an account, or a subset that share a common prefix.

GetDashboard / aws cloudwatch get-dashboard – Fetch details for a single dashboard.

PutDashboard / aws cloudwatch put-dashboard – Create a new dashboard or update an existing one.

DeleteDashboards / aws cloudwatch delete-dashboards – Delete one or more dashboards.

Dashboard Concepts
I want to show you how to use these functions and commands. Before I dive in, I should review a couple of important dashboard concepts and attributes.

Global – Dashboards are part of an AWS account, and are not associated with a specific AWS Region. Each account can have up to 500 dashboards.

Named – Each dashboard has a name that is unique within the AWS account. Names can be up to 255 characters long.

Grid Model – Each dashboard is composed of a grid of cells. The grid is 24 cells across and as tall as necessary. Each widget on the dashboard is positioned at a particular set of grid coordinates, and has a size that spans an integral number of grid cells.

Widgets (Visualizations) – Each widget can display text or a set of CloudWatch metrics. Text is specified using Markdown; metrics can be displayed as single values, line charts, or stacked area charts. Each dashboard can have up to 100 widgets. Widgets that display metrics can also be associated with a CloudWatch Alarm.

Dashboards have a JSON representation that you can now see and edit from within the console. Simply click on the Action menu and choose View/edit source:

Here’s the source for my dashboard:

You can use this JSON as a starting point for your own applications. As you can see, there’s an entry in the widgets array for each widget on the dashboard; each entry describes one widget, starting with its type, position, and size.

Creating a Dashboard Using the API
Let’s say I want to create a dashboard that has a widget for each of my EC2 instances in a particular region. I’ll use Python and the AWS SDK for Python, and start as follows (excuse the amateur nature of my code):

import boto3
import json

cw  = boto3.client("cloudwatch")
ec2 = boto3.client("ec2")

x, y          = [0, 0]
width, height = [3, 3]
max_width     = 12
widgets       = []

Then I simply iterate over the instances, creating a widget dictionary for each one, and appending it to the widgets array:

instances = ec2.describe_instances()
for r in instances['Reservations']:
    for i in r['Instances']:

        widget = {'type'      : 'metric',
                  'x'         : x,
                  'y'         : y,
                  'height'    : height,
                  'width'     : width,
                  'properties': {'view'    : 'timeSeries',
                                 'stacked' : False,
                                 'metrics' : [['AWS/EC2', 'NetworkIn', 'InstanceId', i['InstanceId']],
                                              ['.',       'NetworkOut', '.',         '.']
                                             ],
                                 'period'  : 300,
                                 'stat'    : 'Average',
                                 'region'  : 'us-east-1',
                                 'title'   : i['InstanceId']
                                }
                 }

        widgets.append(widget)

I update the position (x and y) within the loop, and form a grid (if I don’t specify positions, the widgets will be laid out left to right, top to bottom):

        x += width
        if (x + width > max_width):
            x = 0
            y += height

After I have processed all of the instances, I create a JSON version of the widget array:

body   = {'widgets' : widgets}
body_j = json.dumps(body)

And I create or update my dashboard:

cw.put_dashboard(DashboardName = "EC2_Networking",
                 DashboardBody = body_j)

I run the code, and get the following dashboard:

The CloudWatch team recommends that dashboards created programmatically include a text widget indicating that the dashboard was generated automatically, along with a link to the source code or CloudFormation template that did the work. This will discourage users from making manual, out-of-band changers to the dashboards.

As I mentioned earlier, each metric widget can also be associated with a CloudWatch Alarm. You can create the alarms programmatically or by using a CloudFormation template such as the Sample CPU Utilization Alarm. If you decide to do this, the alarm threshold will be displayed in the widget. To learn more about this, read Tara Walker’s recent post, Amazon CloudWatch Launches Alarms on Dashboards.

Going one step further, I could use CloudWatch Events and a Lamba Function to track the creation and deletion of certain resources and update a dashboard in concert with the changes. To learn how to do this, read Keeping CloudWatch Dashboards up to Date Using AWS Lambda.

Accessing a Dashboard Using the CLI
I can also access and manipulate my dashboards from the command line. For example, I can generate a simple list:

$ aws cloudwatch list-dashboards --output table
----------------------------------------------
|               ListDashboards               |
+--------------------------------------------+
||             DashboardEntries             ||
|+-----------------+----------------+-------+|
||  DashboardName  | LastModified   | Size  ||
|+-----------------+----------------+-------+|
||  Disk-Metrics   |  1496405221.0  |  316  ||
||  EC2_Networking |  1498090434.0  |  2830 ||
||  Main-Metrics   |  1498085173.0  |  234  ||
|+-----------------+----------------+-------+|

And I can get rid of the Disk-Metrics dashboard:

$ aws cloudwatch delete-dashboards --dashboard-names Disk-Metrics

I can also retrieve the JSON that defines a dashboard:

Creating a Dashboard Using CloudFormation
Dashboards can also be specified in CloudFormation templates. Here’s a simple template in YAML (the DashboardBody is still specified in JSON):

Resources:
  MyDashboard:
    Type: "AWS::CloudWatch::Dashboard"
    Properties:
      DashboardName: SampleDashboard
      DashboardBody: '{"widgets":[{"type":"text","x":0,"y":0,"width":6,"height":6,"properties":{"markdown":"Hi there from CloudFormation"}}]}'

I place the template in a file and then create a stack using the console or the CLI:

$ aws cloudformation create-stack --stack-name MyDashboard --template-body file://dash.yaml
{
    "StackId": "arn:aws:cloudformation:us-east-1:xxxxxxxxxxxx:stack/MyDashboard/a2a3fb20-5708-11e7-8ffd-500c21311262"
}

Here’s the dashboard:

Available Now
This feature is available now and you can start using it today. You can create 3 dashboards with up to 50 metrics per dashboard at no charge; additional dashboards are priced at $3 per month, as listed on the CloudWatch Pricing page. You can make up to 1 million calls to the new API functions each month at no charge; beyond that you pay $.01 for every 1,000 calls.

Jeff;

New – Cross-Account Delivery of CloudWatch Events

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-cross-account-delivery-of-cloudwatch-events/

CloudWatch Events allow you to track and respond to changes in your AWS resources. You get a near real-time stream of events that you can route to one or more targets (AWS Lambda functions, Amazon Kinesis streams, Amazon SNS topics, and more) using rules. The events that are generated depend on the particular AWS service. For example, here are the events generated for EC2 instances:

Or for S3 (CloudTrail must be enabled in order to create rules that use these events):

See the CloudWatch Event Types list to see which services and events are available.

New Cross-Account Event Delivery
Our customers have asked us to extend CloudWatch Events to handle some interesting & powerful use cases that span multiple AWS accounts, and we are happy to oblige. Today we are adding support for controlled, cross-account delivery of CloudWatch Events. As you will see, you can now arrange to route events from one AWS account to another. As is the case with the existing event delivery model, you can use CloudWatch Events rules to specify which events you would like to send to another account.

Here are some of the use cases that have been shared with us:

Separation of Concerns – Customers would like to handle and respond to events in a separate account in order to implement advanced security schemes.

Rollup – Customers are using AWS Organizations and would like to track certain types of events across the entire organization, across a multitude of AWS accounts.

Each AWS account uses a resource event bus to distribute events. This object dates back to the introduction of CloudWatch Events, but has never been formally called out as such. AWS services, the PutEvents function, and other accounts can publish events to it.

The event bus (currently one per account, with plans to allow more in the future) now has an associated access policy. This policy specifies the set of AWS accounts that are allowed to send events to the bus. You can add one or more accounts, or you can specify that any account is allowed to send events.

You can create event distribution topologies that work on a fan-in or a fan-out basis. A fan-in model allows you to handle events from multiple accounts in one place. A fan-out model allows you to route different types of events to distinct locations and accounts.

In order to avoid the possibility of creating a loop, events that are sent from one account to another will not be sent to a third one. You should take this in to account when you are planning your cross-account implementation.

Using Cross-Account Event Delivery
In order to test this new feature, I made use of my work and my personal AWS accounts. I log in to my personal account and went to the CloudWatch Console. Then I select Event Buses, clicked on Add Permission, and enter the Account ID of my work account:

I can see all of my buses (just one is allowed right now) and permissions in one place:

Next, I log in to my work account and create a rule that will send events to the event bus in my personal account. In this case my personal account is interested in changes of state for EC2 instances running in my work account:

Back in my personal account, I create a rule that will fire on any EC2 event, targeting it at an SNS topic that is configured to send email:

After testing this rule with an EC2 instance launched in my personal account, I launch an instance in my work account and wait for the email message:

The account and resources fields in the message are from the source (work) account.

Things to Know
This functionality is available in all AWS Regions where CloudWatch Events is available and you can start using it today. It is also accessible from the CloudWatch Events APIs and the AWS Command Line Interface (CLI).

Events forwarded from one account to another are considered custom events. The sending account is charged $1 for every million events (see the CloudWatch Pricing page for more info).

Jeff;

PS – AWS CloudFormation support is in the works and coming soon!

How to Create an AMI Builder with AWS CodeBuild and HashiCorp Packer – Part 2

Post Syndicated from Heitor Lessa original https://aws.amazon.com/blogs/devops/how-to-create-an-ami-builder-with-aws-codebuild-and-hashicorp-packer-part-2/

Written by AWS Solutions Architects Jason Barto and Heitor Lessa

 
In Part 1 of this post, we described how AWS CodeBuild, AWS CodeCommit, and HashiCorp Packer can be used to build an Amazon Machine Image (AMI) from the latest version of Amazon Linux. In this post, we show how to use AWS CodePipeline, AWS CloudFormation, and Amazon CloudWatch Events to continuously ship new AMIs. We use Ansible by Red Hat to harden the OS on the AMIs through a well-known set of security controls outlined by the Center for Internet Security in its CIS Amazon Linux Benchmark.

You’ll find the source code for this post in our GitHub repo.

At the end of this post, we will have the following architecture:

Requirements

 
To follow along, you will need Git and a text editor. Make sure Git is configured to work with AWS CodeCommit, as described in Part 1.

Technologies

 
In addition to the services and products used in Part 1 of this post, we also use these AWS services and third-party software:

AWS CloudFormation gives developers and systems administrators an easy way to create and manage a collection of related AWS resources, provisioning and updating them in an orderly and predictable fashion.

Amazon CloudWatch Events enables you to react selectively to events in the cloud and in your applications. Specifically, you can create CloudWatch Events rules that match event patterns, and take actions in response to those patterns.

AWS CodePipeline is a continuous integration and continuous delivery service for fast and reliable application and infrastructure updates. AWS CodePipeline builds, tests, and deploys your code every time there is a code change, based on release process models you define.

Amazon SNS is a fast, flexible, fully managed push notification service that lets you send individual messages or to fan out messages to large numbers of recipients. Amazon SNS makes it simple and cost-effective to send push notifications to mobile device users or email recipients. The service can even send messages to other distributed services.

Ansible is a simple IT automation system that handles configuration management, application deployment, cloud provisioning, ad-hoc task-execution, and multinode orchestration.

Getting Started

 
We use CloudFormation to bootstrap the following infrastructure:

Component Purpose
AWS CodeCommit repository Git repository where the AMI builder code is stored.
S3 bucket Build artifact repository used by AWS CodePipeline and AWS CodeBuild.
AWS CodeBuild project Executes the AWS CodeBuild instructions contained in the build specification file.
AWS CodePipeline pipeline Orchestrates the AMI build process, triggered by new changes in the AWS CodeCommit repository.
SNS topic Notifies subscribed email addresses when an AMI build is complete.
CloudWatch Events rule Defines how the AMI builder should send a custom event to notify an SNS topic.
Region AMI Builder Launch Template
N. Virginia (us-east-1)
Ireland (eu-west-1)

After launching the CloudFormation template linked here, we will have a pipeline in the AWS CodePipeline console. (Failed at this stage simply means we don’t have any data in our newly created AWS CodeCommit Git repository.)

Next, we will clone the newly created AWS CodeCommit repository.

If this is your first time connecting to a AWS CodeCommit repository, please see instructions in our documentation on Setup steps for HTTPS Connections to AWS CodeCommit Repositories.

To clone the AWS CodeCommit repository (console)

  1. From the AWS Management Console, open the AWS CloudFormation console.
  2. Choose the AMI-Builder-Blogpost stack, and then choose Output.
  3. Make a note of the Git repository URL.
  4. Use git to clone the repository.

For example: git clone https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/AMI-Builder_repo

To clone the AWS CodeCommit repository (CLI)

# Retrieve CodeCommit repo URL
git_repo=$(aws cloudformation describe-stacks --query 'Stacks[0].Outputs[?OutputKey==`GitRepository`].OutputValue' --output text --stack-name "AMI-Builder-Blogpost")

# Clone repository locally
git clone ${git_repo}

Bootstrap the Repo with the AMI Builder Structure

 
Now that our infrastructure is ready, download all the files and templates required to build the AMI.

Your local Git repo should have the following structure:

.
├── ami_builder_event.json
├── ansible
├── buildspec.yml
├── cloudformation
├── packer_cis.json

Next, push these changes to AWS CodeCommit, and then let AWS CodePipeline orchestrate the creation of the AMI:

git add .
git commit -m "My first AMI"
git push origin master

AWS CodeBuild Implementation Details

 
While we wait for the AMI to be created, let’s see what’s changed in our AWS CodeBuild buildspec.yml file:

...
phases:
  ...
  build:
    commands:
      ...
      - ./packer build -color=false packer_cis.json | tee build.log
  post_build:
    commands:
      - egrep "${AWS_REGION}\:\sami\-" build.log | cut -d' ' -f2 > ami_id.txt
      # Packer doesn't return non-zero status; we must do that if Packer build failed
      - test -s ami_id.txt || exit 1
      - sed -i.bak "s/<<AMI-ID>>/$(cat ami_id.txt)/g" ami_builder_event.json
      - aws events put-events --entries file://ami_builder_event.json
      ...
artifacts:
  files:
    - ami_builder_event.json
    - build.log
  discard-paths: yes

In the build phase, we capture Packer output into a file named build.log. In the post_build phase, we take the following actions:

  1. Look up the AMI ID created by Packer and save its findings to a temporary file (ami_id.txt).
  2. Forcefully make AWS CodeBuild to fail if the AMI ID (ami_id.txt) is not found. This is required because Packer doesn’t fail if something goes wrong during the AMI creation process. We have to tell AWS CodeBuild to stop by informing it that an error occurred.
  3. If an AMI ID is found, we update the ami_builder_event.json file and then notify CloudWatch Events that the AMI creation process is complete.
  4. CloudWatch Events publishes a message to an SNS topic. Anyone subscribed to the topic will be notified in email that an AMI has been created.

Lastly, the new artifacts phase instructs AWS CodeBuild to upload files built during the build process (ami_builder_event.json and build.log) to the S3 bucket specified in the Outputs section of the CloudFormation template. These artifacts can then be used as an input artifact in any later stage in AWS CodePipeline.

For information about customizing the artifacts sequence of the buildspec.yml, see the Build Specification Reference for AWS CodeBuild.

CloudWatch Events Implementation Details

 
CloudWatch Events allow you to extend the AMI builder to not only send email after the AMI has been created, but to hook up any of the supported targets to react to the AMI builder event. This event publication means you can decouple from Packer actions you might take after AMI completion and plug in other actions, as you see fit.

For more information about targets in CloudWatch Events, see the CloudWatch Events API Reference.

In this case, CloudWatch Events should receive the following event, match it with a rule we created through CloudFormation, and publish a message to SNS so that you can receive an email.

Example CloudWatch custom event

[
        {
            "Source": "com.ami.builder",
            "DetailType": "AmiBuilder",
            "Detail": "{ \"AmiStatus\": \"Created\"}",
            "Resources": [ "ami-12cd5guf" ]
        }
]

Cloudwatch Events rule

{
  "detail-type": [
    "AmiBuilder"
  ],
  "source": [
    "com.ami.builder"
  ],
  "detail": {
    "AmiStatus": [
      "Created"
    ]
  }
}

Example SNS message sent in email

{
    "version": "0",
    "id": "f8bdede0-b9d7...",
    "detail-type": "AmiBuilder",
    "source": "com.ami.builder",
    "account": "<<aws_account_number>>",
    "time": "2017-04-28T17:56:40Z",
    "region": "eu-west-1",
    "resources": ["ami-112cd5guf "],
    "detail": {
        "AmiStatus": "Created"
    }
}

Packer Implementation Details

 
In addition to the build specification file, there are differences between the current version of the HashiCorp Packer template (packer_cis.json) and the one used in Part 1.

Variables

  "variables": {
    "vpc": "{{env `BUILD_VPC_ID`}}",
    "subnet": "{{env `BUILD_SUBNET_ID`}}",
         “ami_name”: “Prod-CIS-Latest-AMZN-{{isotime \”02-Jan-06 03_04_05\”}}”
  },
  • ami_name: Prefixes a name used by Packer to tag resources during the Builders sequence.
  • vpc and subnet: Environment variables defined by the CloudFormation stack parameters.

We no longer assume a default VPC is present and instead use the VPC and subnet specified in the CloudFormation parameters. CloudFormation configures the AWS CodeBuild project to use these values as environment variables. They are made available throughout the build process.

That allows for more flexibility should you need to change which VPC and subnet will be used by Packer to launch temporary resources.

Builders

  "builders": [{
    ...
    "ami_name": “{{user `ami_name`| clean_ami_name}}”,
    "tags": {
      "Name": “{{user `ami_name`}}”,
    },
    "run_tags": {
      "Name": “{{user `ami_name`}}",
    },
    "run_volume_tags": {
      "Name": “{{user `ami_name`}}",
    },
    "snapshot_tags": {
      "Name": “{{user `ami_name`}}",
    },
    ...
    "vpc_id": "{{user `vpc` }}",
    "subnet_id": "{{user `subnet` }}"
  }],

We now have new properties (*_tag) and a new function (clean_ami_name) and launch temporary resources in a VPC and subnet specified in the environment variables. AMI names can only contain a certain set of ASCII characters. If the input in project deviates from the expected characters (for example, includes whitespace or slashes), Packer’s clean_ami_name function will fix it.

For more information, see functions on the HashiCorp Packer website.

Provisioners

  "provisioners": [
    {
        "type": "shell",
        "inline": [
            "sudo pip install ansible"
        ]
    }, 
    {
        "type": "ansible-local",
        "playbook_file": "ansible/playbook.yaml",
        "role_paths": [
            "ansible/roles/common"
        ],
        "playbook_dir": "ansible",
        "galaxy_file": "ansible/requirements.yaml"
    },
    {
      "type": "shell",
      "inline": [
        "rm .ssh/authorized_keys ; sudo rm /root/.ssh/authorized_keys"
      ]
    }

We used shell provisioner to apply OS patches in Part 1. Now, we use shell to install Ansible on the target machine and ansible-local to import, install, and execute Ansible roles to make our target machine conform to our standards.

Packer uses shell to remove temporary keys before it creates an AMI from the target and temporary EC2 instance.

Ansible Implementation Details

 
Ansible provides OS patching through a custom Common role that can be easily customized for other tasks.

CIS Benchmark and Cloudwatch Logs are implemented through two Ansible third-party roles that are defined in ansible/requirements.yaml as seen in the Packer template.

The Ansible provisioner uses Ansible Galaxy to download these roles onto the target machine and execute them as instructed by ansible/playbook.yaml.

For information about how these components are organized, see the Playbook Roles and Include Statements in the Ansible documentation.

The following Ansible playbook (ansible</playbook.yaml) controls the execution order and custom properties:

---
- hosts: localhost
  connection: local
  gather_facts: true    # gather OS info that is made available for tasks/roles
  become: yes           # majority of CIS tasks require root
  vars:
    # CIS Controls whitepaper:  http://bit.ly/2mGAmUc
    # AWS CIS Whitepaper:       http://bit.ly/2m2Ovrh
    cis_level_1_exclusions:
    # 3.4.2 and 3.4.3 effectively blocks access to all ports to the machine
    ## This can break automation; ignoring it as there are stronger mechanisms than that
      - 3.4.2 
      - 3.4.3
    # CloudWatch Logs will be used instead of Rsyslog/Syslog-ng
    ## Same would be true if any other software doesn't support Rsyslog/Syslog-ng mechanisms
      - 4.2.1.4
      - 4.2.2.4
      - 4.2.2.5
    # Autofs is not installed in newer versions, let's ignore
      - 1.1.19
    # Cloudwatch Logs role configuration
    logs:
      - file: /var/log/messages
        group_name: "system_logs"
  roles:
    - common
    - anthcourtney.cis-amazon-linux
    - dharrisio.aws-cloudwatch-logs-agent

Both third-party Ansible roles can be easily configured through variables (vars). We use Ansible playbook variables to exclude CIS controls that don’t apply to our case and to instruct the CloudWatch Logs agent to stream the /var/log/messages log file to CloudWatch Logs.

If you need to add more OS or application logs, you can easily duplicate the playbook and make changes. The CloudWatch Logs agent will ship configured log messages to CloudWatch Logs.

For more information about parameters you can use to further customize third-party roles, download Ansible roles for the Cloudwatch Logs Agent and CIS Amazon Linux from the Galaxy website.

Committing Changes

 
Now that Ansible and CloudWatch Events are configured as a part of the build process, commiting any changes to the AWS CodeComit Git Repository will triger a new AMI build process that can be followed through the AWS CodePipeline console.

When the build is complete, an email will be sent to the email address you provided as a part of the CloudFormation stack deployment. The email serves as notification that an AMI has been built and is ready for use.

Summary

 
We used AWS CodeCommit, AWS CodePipeline, AWS CodeBuild, Packer, and Ansible to build a pipeline that continuously builds new, hardened CIS AMIs. We used Amazon SNS so that email addresses subscribed to a SNS topic are notified upon completion of the AMI build.

By treating our AMI creation process as code, we can iterate and track changes over time. In this way, it’s no different from a software development workflow. With that in mind, software patches, OS configuration, and logs that need to be shipped to a central location are only a git commit away.

Next Steps

 
Here are some ideas to extend this AMI builder:

  • Hook up a Lambda function in Cloudwatch Events to update EC2 Auto Scaling configuration upon completion of the AMI build.
  • Use AWS CodePipeline parallel steps to build multiple Packer images.
  • Add a commit ID as a tag for the AMI you created.
  • Create a scheduled Lambda function through Cloudwatch Events to clean up old AMIs based on timestamp (name or additional tag).
  • Implement Windows support for the AMI builder.
  • Create a cross-account or cross-region AMI build.

Cloudwatch Events allow the AMI builder to decouple AMI configuration and creation so that you can easily add your own logic using targets (AWS Lambda, Amazon SQS, Amazon SNS) to add events or recycle EC2 instances with the new AMI.

If you have questions or other feedback, feel free to leave it in the comments or contribute to the AMI Builder repo on GitHub.

Visualize and Monitor Amazon EC2 Events with Amazon CloudWatch Events and Amazon Kinesis Firehose

Post Syndicated from Karan Desai original https://aws.amazon.com/blogs/big-data/visualize-and-monitor-amazon-ec2-events-with-amazon-cloudwatch-events-and-amazon-kinesis-firehose/

Monitoring your AWS environment is important for security, performance, and cost control purposes. For example, by monitoring and analyzing API calls made to your Amazon EC2 instances, you can trace security incidents and gain insights into administrative behaviors and access patterns. The kinds of events you might monitor include console logins, Amazon EBS snapshot creation/deletion/modification, VPC creation/deletion/modification, and instance reboots, etc.

In this post, I show you how to build a near real-time API monitoring solution for EC2 events using Amazon CloudWatch Events and Amazon Kinesis Firehose. Please be sure to have Amazon CloudTrail enabled in your account.

  • CloudWatch Events offers a near real-time stream of system events that describe changes in AWS resources. CloudWatch Events now supports Kinesis Firehose as a target.
  • Kinesis Firehose is a fully managed service for continuously capturing, transforming, and delivering data in minutes to storage and analytics destinations such as Amazon S3, Amazon Kinesis Analytics, Amazon Redshift, and Amazon Elasticsearch Service.

Walkthrough

For this walkthrough, you create a CloudWatch event rule that matches specific EC2 events such as:

  • Starting, stopping, and terminating an instance
  • Creating and deleting VPC route tables
  • Creating and deleting a security group
  • Creating, deleting, and modifying instance volumes and snapshots

Your CloudWatch event target is a Kinesis Firehose delivery stream that delivers this data to an Elasticsearch cluster, where you set up Kibana for visualization. Using this solution, you can easily load and visualize EC2 events in minutes without setting up complicated data pipelines.

Set up the Elasticsearch cluster

Create the Amazon ES domain in the Amazon ES console, or by using the create-elasticsearch-domain command in the AWS CLI.

This example uses the following configuration:

  • Domain Name: esLogSearch
  • Elasticsearch Version: 1
  • Instance Count: 2
  • Instance type:elasticsearch
  • Enable dedicated master: true
  • Enable zone awareness: true
  • Restrict Amazon ES to an IP-based access policy

Other settings are left as the defaults.

Create a Kinesis Firehose delivery stream

In the Kinesis Firehose console, create a new delivery stream with Amazon ES as the destination. For detailed steps, see Create a Kinesis Firehose Delivery Stream to Amazon Elasticsearch Service.

Set up CloudWatch Events

Create a rule, and configure the event source and target. You can choose to configure multiple event sources with several AWS resources, along with options to specify specific or multiple event types.

In the CloudWatch console, choose Events.

For Service Name, choose EC2.

In Event Pattern Preview, choose Edit and copy the pattern below. For this walkthrough, I selected events that are specific to the EC2 API, but you can modify it to include events for any of your AWS resources.

 

{
	"source": [
		"aws.ec2"
	],
	"detail-type": [
		"AWS API Call via CloudTrail"
	],
	"detail": {
		"eventSource": [
			"ec2.amazonaws.com"
		],
		"eventName": [
			"RunInstances",
			"StopInstances",
			"StartInstances",
			"CreateFlowLogs",
			"CreateImage",
			"CreateNatGateway",
			"CreateVpc",
			"DeleteKeyPair",
			"DeleteNatGateway",
			"DeleteRoute",
			"DeleteRouteTable",
"CreateSnapshot",
"DeleteSnapshot",
			"DeleteVpc",
			"DeleteVpcEndpoints",
			"DeleteSecurityGroup",
			"ModifyVolume",
			"ModifyVpcEndpoint",
			"TerminateInstances"
		]
	}
}

The following screenshot shows what your event looks like in the console.

Next, choose Add target and select the delivery stream that you just created.

Set up Kibana on the Elasticsearch cluster

Amazon ES provides a default installation of Kibana with every Amazon ES domain. You can find the Kibana endpoint on your domain dashboard in the Amazon ES console. You can restrict Amazon ES access to an IP-based access policy.

In the Kibana console, for Index name or pattern, type log. This is the name of the Elasticsearch index.

For Time-field name, choose @time.

To view the events, choose Discover.

The following chart demonstrates the API operations and the number of times that they have been triggered in the past 12 hours.

Summary

In this post, you created a continuous, near real-time solution to monitor various EC2 events such as starting and shutting down instances, creating VPCs, etc. Likewise, you can build a continuous monitoring solution for all the API operations that are relevant to your daily AWS operations and resources.

With Kinesis Firehose as a new target for CloudWatch Events, you can retrieve, transform, and load system events to the storage and analytics destination of your choice in minutes, without setting up complicated data pipelines.

If you have any questions or suggestions, please comment below.


Additional Reading

Learn how to build a serverless architecture to analyze Amazon CloudFront access logs using AWS Lambda, Amazon Athena, and Amazon Kinesis Analytics

 

 

 

Amazon EC2 Container Service – Launch Recap, Customer Stories, and Code

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-ec2-container-service-launch-recap-customer-stories-and-code/

Today seems like a good time to recap some of the features that we have added to Amazon EC2 Container Service over the last year or so, and to share some customer success stories and code with you! The service makes it easy for you to run any number of Docker containers across a managed cluster of EC2 instances, with full console, API, CloudFormation, CLI, and PowerShell support. You can store your Linux and Windows Docker images in the EC2 Container Registry for easy access.

Launch Recap
Let’s start by taking a look at some of the newest ECS features and some helpful how-to blog posts that will show you how to use them:

Application Load Balancing – We added support for the application load balancer last year. This high-performance load balancing option runs at the application level and allows you to define content-based routing rules. It provides support for dynamic ports and can be shared across multiple services, making it easier for you to run microservices in containers. To learn more, read about Service Load Balancing.

IAM Roles for Tasks – You can secure your infrastructure by assigning IAM roles to ECS tasks. This allows you to grant permissions on a fine-grained, per-task basis, customizing the permissions to the needs of each task. Read IAM Roles for Tasks to learn more.

Service Auto Scaling – You can define scaling policies that scale your services (tasks) up and down in response to changes in demand. You set the desired minimum and maximum number of tasks, create one or more scaling policies, and Service Auto Scaling will take care of the rest. The documentation for Service Auto Scaling will help you to make use of this feature.

Blox – Scheduling, in a container-based environment, is the process of assigning tasks to instances. ECS gives you three options: automated (via the built-in Service Scheduler), manual (via the RunTask function), and custom (via a scheduler that you provide). Blox is an open source scheduler that supports a one-task-per-host model, with room to accommodate other models in the future. It monitors the state of the cluster and is well-suited to running monitoring agents, log collectors, and other daemon-style tasks.

Windows – We launched ECS with support for Linux containers and followed up with support for running Windows Server 2016 Base with Containers.

Container Instance Draining – From time to time you may need to remove an instance from a running cluster in order to scale the cluster down or to perform a system update. Earlier this year we added a set of lifecycle hooks that allow you to better manage the state of the instances. Read the blog post How to Automate Container Instance Draining in Amazon ECS to see how to use the lifecycle hooks and a Lambda function to automate the process of draining existing work from an instance while preventing new work from being scheduled for it.

CI/CD Pipeline with Code* – Containers simplify software deployment and are an ideal target for a CI/CD (Continuous Integration / Continuous Deployment) pipeline. The post Continuous Deployment to Amazon ECS using AWS CodePipeline, AWS CodeBuild, Amazon ECR, and AWS CloudFormation shows you how to build and operate a CI/CD pipeline using multiple AWS services.

CloudWatch Logs Integration – This launch gave you the ability to configure the containers that run your tasks to send log information to CloudWatch Logs for centralized storage and analysis. You simply install the Amazon ECS Container Agent and enable the awslogs log driver.

CloudWatch Events – ECS generates CloudWatch Events when the state of a task or a container instance changes. These events allow you to monitor the state of the cluster using a Lambda function. To learn how to capture the events and store them in an Elasticsearch cluster, read Monitor Cluster State with Amazon ECS Event Stream.

Task Placement Policies – This launch provided you with fine-grained control over the placement of tasks on container instances within clusters. It allows you to construct policies that include cluster constraints, custom constraints (location, instance type, AMI, and attribute), placement strategies (spread or bin pack) and to use them without writing any code. Read Introducing Amazon ECS Task Placement Policies to see how to do this!

EC2 Container Service in Action
Many of our customers from large enterprises to hot startups and across all industries, such as financial services, hospitality, and consumer electronics, are using Amazon ECS to run their microservices applications in production. Companies such as Capital One, Expedia, Okta, Riot Games, and Viacom rely on Amazon ECS.

Mapbox is a platform for designing and publishing custom maps. The company uses ECS to power their entire batch processing architecture to collect and process over 100 million miles of sensor data per day that they use for powering their maps. They also optimize their batch processing architecture on ECS using Spot Instances. The Mapbox platform powers over 5,000 apps and reaches more than 200 million users each month. Its backend runs on ECS allowing it to serve more than 1.3 billion requests per day. To learn more about their recent migration to ECS, read their recent blog post, We Switched to Amazon ECS, and You Won’t Believe What Happened Next.

Travel company Expedia designed their backends with a microservices architecture. With the popularization of Docker, they decided they would like to adopt Docker for its faster deployments and environment portability. They chose to use ECS to orchestrate all their containers because it had great integration with the AWS platform, everything from ALB to IAM roles to VPC integration. This made ECS very easy to use with their existing AWS infrastructure. ECS really reduced the heavy lifting of deploying and running containerized applications. Expedia runs 75% of all apps on AWS in ECS allowing it to process 4 billion requests per hour. Read Kuldeep Chowhan‘s blog post, How Expedia Runs Hundreds of Applications in Production Using Amazon ECS to learn more.

Realtor.com provides home buyers and sellers with a comprehensive database of properties that are currently for sale. Their move to AWS and ECS has helped them to support business growth that now numbers 50 million unique monthly users who drive up to 250,000 requests per second at peak times. ECS has helped them to deploy their code more quickly while increasing utilization of their cloud infrastructure. Read the Realtor.com Case Study to learn more about how they use ECS, Kinesis, and other AWS services.

Instacart talks about how they use ECS to power their same-day grocery delivery service:

Capital One talks about how they use ECS to automate their operations and their infrastructure management:

Code
Clever developers are using ECS as a base for their own work. For example:

Rack is an open source PaaS (Platform as a Service). It focuses on infrastructure automation, runs in an isolated VPC, and uses a single-tenant build service for security.

Empire is also an open source PaaS. It provides a Heroku-like workflow and is targeted at small and medium sized startups, with an emphasis on microservices.

Cloud Container Cluster Visualizer (c3vis) helps to visualize resource utilization within ECS clusters:

Stay Tuned
We have plenty of new features in the works for ECS, so stay tuned!

Jeff;