Tag Archives: fail

Easier Certificate Validation Using DNS with AWS Certificate Manager

Post Syndicated from Todd Cignetti original https://aws.amazon.com/blogs/security/easier-certificate-validation-using-dns-with-aws-certificate-manager/

Secure Sockets Layer/Transport Layer Security (SSL/TLS) certificates are used to secure network communications and establish the identity of websites over the internet. Before issuing a certificate for your website, Amazon must validate that you control the domain name for your site. You can now use AWS Certificate Manager (ACM) Domain Name System (DNS) validation to establish that you control a domain name when requesting SSL/TLS certificates with ACM. Previously ACM supported only email validation, which required the domain owner to receive an email for each certificate request and validate the information in the request before approving it.

With DNS validation, you write a CNAME record to your DNS configuration to establish control of your domain name. After you have configured the CNAME record, ACM can automatically renew DNS-validated certificates before they expire, as long as the DNS record has not changed. To make it even easier to validate your domain, ACM can update your DNS configuration for you if you manage your DNS records with Amazon Route 53. In this blog post, I demonstrate how to request a certificate for a website by using DNS validation. To perform the equivalent steps using the AWS CLI or AWS APIs and SDKs, see AWS Certificate Manager in the AWS CLI Reference and the ACM API Reference.
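For example, with the AWS SDK for Python (Boto3), the equivalent request looks roughly like the following minimal sketch; the domain name is the example used later in this post, and you should substitute a domain you control.

import boto3

# Minimal sketch of requesting a certificate with DNS validation via the API.
acm = boto3.client('acm')

response = acm.request_certificate(
    DomainName='www.example.com',   # use a domain name that you control
    ValidationMethod='DNS',         # DNS validation instead of email validation
)
print(response['CertificateArn'])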

Requesting an SSL/TLS certificate by using DNS validation

In this section, I walk you through the four steps required to obtain an SSL/TLS certificate through ACM to identify your site over the internet. SSL/TLS provides encryption for sensitive data in transit, and authentication by using certificates to establish the identity of your site and to secure connections between browsers or applications and your site. DNS validation and SSL/TLS certificates provisioned through ACM are free.

Step 1: Request a certificate

To get started, sign in to the AWS Management Console and navigate to the ACM console. Choose Get started to request a certificate.

Screenshot of getting started in the ACM console

If you previously managed certificates in ACM, you will instead see a table with your certificates and a button to request a new certificate. Choose Request a certificate to request a new certificate.

Screenshot of choosing "Request a certificate"

Type the name of your domain in the Domain name box and choose Next. In this example, I type www.example.com. You must use a domain name that you control. Requesting certificates for domains that you don’t control violates the AWS Service Terms.

Screenshot of entering a domain name

Step 2: Select a validation method

With DNS validation, you write a CNAME record to your DNS configuration to establish control of your domain name. Choose DNS validation, and then choose Review.

Screenshot of selecting validation method

Step 3: Review your request

Review your request and choose Confirm and request to request the certificate.

Screenshot of reviewing request and confirming it

Step 4: Submit your request

After a brief delay while ACM populates your domain validation information, choose the down arrow (highlighted in the following screenshot) to display all the validation information for your domain.

Screenshot of validation information

ACM displays the CNAME record you must add to your DNS configuration to validate that you control the domain name in your certificate request. If you use a DNS provider other than Route 53 or if you use a different AWS account to manage DNS records in Route 53, copy the DNS CNAME information from the validation information, or export it to a file (choose Export DNS configuration to a file) and write it to your DNS configuration. For information about how to add or modify DNS records, check with your DNS provider. For more information about using DNS with Route 53 DNS, see the Route 53 documentation.

If you manage DNS records for your domain with Route 53 in the same AWS account, choose Create record in Route 53 to have ACM update your DNS configuration for you.
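If you script your DNS updates instead, the following hedged boto3 sketch retrieves the CNAME validation record from ACM and upserts it into a Route 53 hosted zone. The certificate ARN and hosted zone ID are placeholders.

import boto3

acm = boto3.client('acm')
route53 = boto3.client('route53')

certificate_arn = 'arn:aws:acm:us-east-1:111122223333:certificate/example'  # placeholder
hosted_zone_id = 'Z1D633PJN98FT9'                                           # placeholder

# The validation record can take a short while to appear after the request.
cert = acm.describe_certificate(CertificateArn=certificate_arn)['Certificate']
record = cert['DomainValidationOptions'][0]['ResourceRecord']

route53.change_resource_record_sets(
    HostedZoneId=hosted_zone_id,
    ChangeBatch={'Changes': [{
        'Action': 'UPSERT',
        'ResourceRecordSet': {
            'Name': record['Name'],
            'Type': record['Type'],   # CNAME
            'TTL': 300,
            'ResourceRecords': [{'Value': record['Value']}],
        },
    }]},
)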

After updating your DNS configuration, choose Continue to return to the ACM table view.

ACM then displays a table that includes all your certificates. The certificate you requested is displayed so that you can see the status of your request. After you write the DNS record or have ACM write the record for you, it typically takes DNS 30 minutes to propagate the record, and it might take several hours for Amazon to validate it and issue the certificate. During this time, ACM shows the Validation status as Pending validation. After ACM validates the domain name, ACM updates the Validation status to Success. After the certificate is issued, the certificate status is updated to Issued. If ACM cannot validate your DNS record and issue the certificate after 72 hours, the request times out, and ACM displays a Timed out validation status. To recover, you must make a new request. Refer to the Troubleshooting Section of the ACM User Guide for instructions about troubleshooting validation or issuance failures.

Screenshot of a certificate issued and validation successful

You now have an ACM certificate that you can use to secure your application or website. For information about how to deploy certificates with other AWS services, see the documentation for Amazon CloudFront, Amazon API Gateway, Application Load Balancers, and Classic Load Balancers. Note that your certificate must be in the US East (N. Virginia) Region to use the certificate with CloudFront.

ACM automatically renews certificates that are deployed and in use with other AWS services as long as the CNAME record remains in your DNS configuration. To learn more about ACM DNS validation, see the ACM FAQs and the ACM documentation.

If you have comments about this post, submit them in the “Comments” section below. If you have questions about this blog post, start a new thread on the ACM forum or contact AWS Support.

– Todd

Serverless Automated Cost Controls, Part 1

Post Syndicated from Shankar Ramachandran original https://aws.amazon.com/blogs/compute/serverless-automated-cost-controls-part1/

This post courtesy of Shankar Ramachandran, Pubali Sen, and George Mao

In line with AWS’s continual efforts to reduce costs for customers, this series focuses on how customers can build serverless automated cost controls. This post provides an architecture blueprint and a sample implementation to prevent budget overruns.

This solution uses the following AWS products:

  • AWS Budgets – An AWS Cost Management tool that helps customers define and track budgets for AWS costs, and forecast for up to three months.
  • Amazon SNS – An AWS service that makes it easy to set up, operate, and send notifications from the cloud.
  • AWS Lambda – An AWS service that lets you run code without provisioning or managing servers.

You can fine-tune a budget for various parameters, for example by filtering by service or tag. The Budgets tool lets you post notifications on an SNS topic. A Lambda function that subscribes to the SNS topic can act on the notification and take any action that can be implemented programmatically.
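For illustration, here is a hedged boto3 sketch of creating such a budget with an SNS notification at 100% of the budgeted amount. The account ID and topic ARN are placeholders, and cost filters are omitted for brevity; the sample implementation in this series creates the budget for you.

import boto3

budgets = boto3.client('budgets')

# Hedged sketch: a $2 monthly cost budget that notifies an SNS topic when
# actual spend exceeds 100% of the budgeted amount.
budgets.create_budget(
    AccountId='111122223333',   # placeholder account ID
    Budget={
        'BudgetName': 'projectBeta-ec2',
        'BudgetType': 'COST',
        'TimeUnit': 'MONTHLY',
        'BudgetLimit': {'Amount': '2', 'Unit': 'USD'},
    },
    NotificationsWithSubscribers=[{
        'Notification': {
            'NotificationType': 'ACTUAL',
            'ComparisonOperator': 'GREATER_THAN',
            'Threshold': 100.0,
            'ThresholdType': 'PERCENTAGE',
        },
        'Subscribers': [{
            'SubscriptionType': 'SNS',
            'Address': 'arn:aws:sns:us-east-1:111122223333:budgetNotificationTopic',  # placeholder
        }],
    }],
)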

The diagram below describes the architecture blueprint.

In this post, we describe how to use this blueprint with AWS Step Functions and IAM to effectively revoke the ability of a user to start new Amazon EC2 instances, after a budget amount is exceeded.

Freedom with guardrails

AWS lets you quickly spin up resources as you need them, deploying hundreds or even thousands of servers in minutes. This means you can quickly develop and roll out new applications. Teams can experiment and innovate more quickly and frequently. If an experiment fails, you can always de-provision those servers without risk.

This improved agility also brings in the need for effective cost controls. Your Finance and Accounting department must budget, monitor, and control the AWS spend. For example, this could be a budget per project. Further, Finance and Accounting must take appropriate action if, for example, the budget for a project is exceeded. Call it “freedom with guardrails” – where Finance wants to give developers freedom, but with financial constraints.

Architecture

This section describes how to use the blueprint introduced earlier to implement a “freedom with guardrails” solution.

  1. The budget for “Project Beta” is set up in Budgets. In this example, we focus on EC2 usage and identify the instances that belong to this project by filtering on the tag Project with the value Beta. For more information, see Creating a Budget.
  2. The budget configuration also includes settings to send a notification on an SNS topic when the usage exceeds 100% of the budgeted amount. For more information, see Creating an Amazon SNS Topic for Budget Notifications.
  3. The master Lambda function receives the SNS notification.
  4. It triggers execution of a Step Functions state machine with the parameters for completing the configured action.
  5. The action Lambda function is triggered as a task in the state machine. The function interacts with IAM to effectively remove the user’s permissions to create an EC2 instance.

This decoupled, modular design allows for extensibility: new actions (run serially or in parallel) can be added simply by adding new steps to the state machine.
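To make the flow concrete, here is a minimal sketch of what the master Lambda function might look like: it reads the SNS budget notification and starts the state machine. The STATE_MACHINE_ARN environment variable and the input shape are assumptions for illustration, not the repo's actual code.

import json
import os

import boto3

sfn = boto3.client('stepfunctions')

def handler(event, context):
    # The SNS notification body from Budgets arrives in the Records array.
    message = event['Records'][0]['Sns']['Message']

    # Start the state machine that carries out the configured action.
    sfn.start_execution(
        stateMachineArn=os.environ['STATE_MACHINE_ARN'],  # assumed environment variable
        input=json.dumps({'notification': message}),
    )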

Implementing the solution

All the instructions and code needed to implement the architecture have been posted on the Serverless Automated Cost Controls GitHub repo. We recommend that you try this first in a Dev/Test environment.

This implementation description can be broken down into two parts:

  1. Create a solution stack for serverless automated cost controls.
  2. Verify the solution by testing the EC2 fleet.

To tie this back to the “freedom with guardrails” scenario, the Finance department performs a one-time implementation of the solution stack. To simulate resources for Project Beta, the developers spin up the test EC2 fleet.

Prerequisites

There are two prerequisites:

  • Make sure that you have the necessary IAM permissions. For more information, see the section titled “Required IAM permissions” in the README.
  • Define and activate a cost allocation tag with the key Project. For more information, see Using Cost Allocation Tags. It can take up to 12 hours for the tags to propagate to Budgets.

Create resources

The solution stack includes creating the following resources:

  • Three Lambda functions
  • One Step Functions state machine
  • One SNS topic
  • One IAM group
  • One IAM user
  • IAM policies as needed
  • One budget

Two of the Lambda functions were described in the previous section: one receives the SNS notification and triggers the Step Functions state machine, and the other carries out the configured action. Another Lambda function is used to create the budget, as a custom AWS CloudFormation resource. The SNS topic connects Budgets with Lambda function A. Lambda function B is configured as a task in Step Functions. A budget for $2 is created, filtered by Service: EC2 and Tag: Project, Beta. A test IAM group and user are created to enable you to validate this cost control solution.

To create the serverless automated cost control solution stack, choose the button below. It takes a few minutes to spin up the stack. You can monitor the progress in the CloudFormation console.

When you see the CREATE_COMPLETE status for the stack you created, choose Outputs. Copy the following four values that you will need later:

  • TemplateURL
  • UserName
  • SignInURL
  • Password

Verify the stack

The next step is to verify the serverless automated cost controls solution stack that you just created. To do this, spin up an EC2 fleet of t2.micro instances, representative of the resources needed for Project Beta, and tag them with Project, Beta.

  1. Browse to the SignInURL, and log in using the UserName and Password values copied from the stack output.
  2. In the CloudFormation console, choose Create Stack.
  3. For Choose a template, select Choose an Amazon S3 template URL and paste the TemplateURL value from the preceding section. Choose Next.
  4. Give this stack a name, such as “testEc2FleetForProjectBeta”. Choose Next.
  5. On the Specify Details page, enter parameters such as the UserName and Password copied in the previous section. Choose Next.
  6. Ignore any errors related to listing IAM roles. The test user has a minimal set of permissions that is just sufficient to spin up this test stack (in line with security best practices).
  7. On the Options page, choose Next.
  8. On the Review page, choose Create. It takes a few minutes to spin up the stack, and you can monitor the progress in the CloudFormation console. 
  9. When you see the status “CREATE_COMPLETE”, open the EC2 console to verify that four t2.micro instances have been spun up, with the tag of Project, Beta.

The hourly cost for these instances depends on the Region in which they are running. On average (irrespective of the Region), you can expect the aggregate cost for this EC2 fleet to exceed the set $2 budget within 48 hours.

Verify the solution

The first step is to identify the test IAM group that was created in the previous section. The group should have “projectBeta” in the name, prefixed with the CloudFormation stack name and suffixed with an alphanumeric string. Verify that the attached managed policy is “EC2FullAccess”, which indicates that the users in this group have unrestricted access to EC2.

There are two stages of verification for this serverless automated cost controls solution: simulating a notification and waiting for a breach.

Simulated notification

Because it takes at least a few hours for the aggregate cost of the EC2 fleet to breach the set budget, you can verify the solution by simulating the notification from Budgets.

  1. Log in to the SNS console (using your regular AWS credentials).
  2. Publish a message on the SNS topic that has “budgetNotificationTopic” in the name. The complete name includes the CloudFormation stack identifier as a suffix.
  3. Copy the following text as the body of the notification: “This is a mock notification”.
  4. Choose Publish.
  5. Open the IAM console to verify that the policy for the test group has been switched to “EC2ReadOnly”. This prevents users in this group from creating new instances.
  6. Verify that the test user created in the previous section cannot spin up new EC2 instances.  You can log in as the test user and try creating a new EC2 instance (via the same CloudFormation stack or the EC2 console). You should get an error message indicating that you do not have the necessary permissions.
  7. If you are proceeding to stage 2 of the verification, then you must switch the permissions back to “EC2FullAccess” for the test group, which can be done in the IAM console.
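If you would rather switch the test group's policy back from a script than in the IAM console (step 7), here is a hedged boto3 sketch. The group name is a placeholder, and the policy ARNs assume the AWS managed policies corresponding to the “EC2ReadOnly” and “EC2FullAccess” names used above.

import boto3

iam = boto3.client('iam')

group_name = 'costControlStack-projectBetaGroup-EXAMPLE'   # placeholder group name

# Remove the restrictive policy and restore full EC2 access before stage 2.
iam.detach_group_policy(
    GroupName=group_name,
    PolicyArn='arn:aws:iam::aws:policy/AmazonEC2ReadOnlyAccess',  # assumed managed policy
)
iam.attach_group_policy(
    GroupName=group_name,
    PolicyArn='arn:aws:iam::aws:policy/AmazonEC2FullAccess',      # assumed managed policy
)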

Automatic notification

Within 48 hours, the aggregate cost of the EC2 fleet spun up in the earlier section breaches the budget rule and triggers an automatic notification. This results in the permissions getting switched out, just as in the simulated notification.

Clean up

Use the following steps to delete your resources and stop incurring costs.

  1. Open the CloudFormation console.
  2. Delete the EC2 fleet by deleting the appropriate stack (for example, delete the stack named “testEc2FleetForProjectBeta”).                                               
  3. Next, delete the “costControlStack” stack.                                                                                                                                                    

Conclusion

Using Lambda in tandem with Budgets, you can build serverless automated cost controls on AWS. Find all the resources (instructions, code) for implementing the solution discussed in this post on the Serverless Automated Cost Controls GitHub repo.

Stay tuned to this series for more tips about building serverless automated cost controls. In the next post, we discuss using smart lighting to influence developer behavior and describe a solution to encourage cost-aware development practices.

If you have questions or suggestions, please comment below.

 

How to Patch, Inspect, and Protect Microsoft Windows Workloads on AWS—Part 2

Post Syndicated from Koen van Blijderveen original https://aws.amazon.com/blogs/security/how-to-patch-inspect-and-protect-microsoft-windows-workloads-on-aws-part-2/

Yesterday in Part 1 of this blog post, I showed you how to:

  1. Launch an Amazon EC2 instance with an AWS Identity and Access Management (IAM) role, an Amazon Elastic Block Store (Amazon EBS) volume, and tags that Amazon EC2 Systems Manager (Systems Manager) and Amazon Inspector use.
  2. Configure Systems Manager to install the Amazon Inspector agent and patch your EC2 instances.

Today in Steps 3 and 4, I show you how to:

  1. Take Amazon EBS snapshots using Amazon EBS Snapshot Scheduler to automate snapshots based on instance tags.
  2. Use Amazon Inspector to check if your EC2 instances running Microsoft Windows contain any common vulnerabilities and exposures (CVEs).

To catch up on Steps 1 and 2, see yesterday’s blog post.

Step 3: Take EBS snapshots using EBS Snapshot Scheduler

In this section, I show you how to use EBS Snapshot Scheduler to take snapshots of your instances at specific intervals. To do this, I will show you how to:

  • Determine the schedule for EBS Snapshot Scheduler by providing you with best practices.
  • Deploy EBS Snapshot Scheduler by using AWS CloudFormation.
  • Tag your EC2 instances so that EBS Snapshot Scheduler backs up your instances when you want them backed up.

In addition to making sure your EC2 instances have all the available operating system patches applied on a regular schedule, you should take snapshots of the EBS storage volumes attached to your EC2 instances. Taking regular snapshots allows you to restore your data to a previous state quickly and cost effectively. With Amazon EBS snapshots, you pay only for the actual data you store, and snapshots save only the data that has changed since the previous snapshot, which minimizes your cost. You will use EBS Snapshot Scheduler to make regular snapshots of your EC2 instance. EBS Snapshot Scheduler takes advantage of other AWS services including CloudFormation, Amazon DynamoDB, and AWS Lambda to make backing up your EBS volumes simple.

Determine the schedule

As a best practice, you should back up your data frequently during the hours when your data changes the most. This reduces the amount of data you lose if you have to restore from a snapshot. For the purposes of this blog post, the data for my instances changes the most between the business hours of 9:00 A.M. to 5:00 P.M. Pacific Time. During these hours, I will make snapshots hourly to minimize data loss.

In addition to backing up frequently, another best practice is to establish a retention strategy. This will vary based on how you need to use the snapshots. If you have compliance requirements to be able to restore for auditing, your needs may differ from those of someone who can detect data corruption within three hours and simply needs to restore to a point that limits data loss to five hours. EBS Snapshot Scheduler enables you to specify the retention period for your snapshots. For this post, I only need to keep snapshots for recent business days. To account for weekends, I will set my retention period to three days, which is down from the default of 15 days when deploying EBS Snapshot Scheduler.

Deploy EBS Snapshot Scheduler

In Step 1 of Part 1 of this post, I showed how to configure an EC2 for Windows Server 2012 R2 instance with an EBS volume. You will use EBS Snapshot Scheduler to take eight snapshots each weekday of your EC2 instance’s EBS volumes:

  1. Navigate to the EBS Snapshot Scheduler deployment page and choose Launch Solution. This takes you to the CloudFormation console in your account. The Specify an Amazon S3 template URL option is already selected and prefilled. Choose Next on the Select Template page.
  2. On the Specify Details page, retain all default parameters except for AutoSnapshotDeletion. Set AutoSnapshotDeletion to Yes to ensure that old snapshots are periodically deleted. The default retention period is 15 days (you will specify a shorter value on your instance in the next subsection).
  3. Choose Next twice to move to the Review step, and start deployment by choosing the I acknowledge that AWS CloudFormation might create IAM resources check box and then choosing Create.

Tag your EC2 instances

EBS Snapshot Scheduler takes a few minutes to deploy. While waiting for its deployment, you can start to tag your instance to define its schedule. EBS Snapshot Scheduler reads tag values and looks for four possible custom parameters in the following order:

  • <snapshot time> – Time in 24-hour format with no colon.
  • <retention days> – The number of days (a positive integer) to retain the snapshot before deletion, if set to automatically delete snapshots.
  • <time zone> – The time zone of the times specified in <snapshot time>.
  • <active day(s)> – all, weekdays, or mon, tue, wed, thu, fri, sat, and/or sun.

Because you want hourly backups on weekdays between 9:00 A.M. and 5:00 P.M. Pacific Time, you need to configure eight tags—one for each hour of the day. You will add the eight tags shown in the following table to your EC2 instance.

Tag                            Value
scheduler:ebs-snapshot:0900    0900;3;utc;weekdays
scheduler:ebs-snapshot:1000    1000;3;utc;weekdays
scheduler:ebs-snapshot:1100    1100;3;utc;weekdays
scheduler:ebs-snapshot:1200    1200;3;utc;weekdays
scheduler:ebs-snapshot:1300    1300;3;utc;weekdays
scheduler:ebs-snapshot:1400    1400;3;utc;weekdays
scheduler:ebs-snapshot:1500    1500;3;utc;weekdays
scheduler:ebs-snapshot:1600    1600;3;utc;weekdays
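If you would rather apply these tags with the API than in the console (the console steps follow), here is a minimal boto3 sketch that builds the eight tags from the preceding table. The instance ID is a placeholder.

import boto3

ec2 = boto3.client('ec2')

instance_id = 'i-0123456789abcdef0'  # placeholder instance ID

# Build one scheduler tag per hour from 0900 through 1600 (see the table above).
tags = [
    {'Key': 'scheduler:ebs-snapshot:{:02d}00'.format(hour),
     'Value': '{:02d}00;3;utc;weekdays'.format(hour)}
    for hour in range(9, 17)
]
ec2.create_tags(Resources=[instance_id], Tags=tags)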

Next, you will add these tags to your instance. If you want to tag multiple instances at once, you can use Tag Editor instead. To add the tags in the preceding table to your EC2 instance:

  1. Navigate to your EC2 instance in the EC2 console and choose Tags in the navigation pane.
  2. Choose Add/Edit Tags and then choose Create Tag to add all the tags specified in the preceding table.
  3. Confirm you have added the tags by choosing Save. After adding these tags, navigate to your EC2 instance in the EC2 console. Your EC2 instance should look similar to the following screenshot.
    Screenshot of how your EC2 instance should look in the console
  4. After waiting a couple of hours, you can see snapshots beginning to populate on the Snapshots page of the EC2 console.
    Screenshot of snapshots beginning to populate on the Snapshots page of the EC2 console
  5. To check if EBS Snapshot Scheduler is active, you can check the CloudWatch rule that runs the Lambda function. If the clock icon shown in the following screenshot is green, the scheduler is active. If the clock icon is gray, the rule is disabled and does not run. You can enable or disable the rule by selecting it, choosing Actions, and choosing Enable or Disable. This also allows you to temporarily disable EBS Snapshot Scheduler.
    Screenshot of checking to see if EBS Snapshot Scheduler is active
  6. You can also monitor when EBS Snapshot Scheduler has run by choosing the name of the CloudWatch rule as shown in the previous screenshot and choosing Show metrics for the rule.
    Screenshot of monitoring when EBS Snapshot Scheduler has run by choosing the name of the CloudWatch rule

If you want to restore and attach an EBS volume, see Restoring an Amazon EBS Volume from a Snapshot and Attaching an Amazon EBS Volume to an Instance.

Step 4: Use Amazon Inspector

In this section, I show you how to use Amazon Inspector to scan your EC2 instance for common vulnerabilities and exposures (CVEs) and set up Amazon SNS notifications. To do this, I will show you how to:

  • Install the Amazon Inspector agent by using EC2 Run Command.
  • Set up notifications using Amazon SNS to notify you of any findings.
  • Define an Amazon Inspector target and template to define what assessment to perform on your EC2 instance.
  • Schedule Amazon Inspector assessment runs to assess your EC2 instance on a regular interval.

Amazon Inspector can help you scan your EC2 instance using prebuilt rules packages, which are built and maintained by AWS. These prebuilt rules packages tell Amazon Inspector what to scan for on the EC2 instances you select. Amazon Inspector provides the following prebuilt packages for Microsoft Windows Server 2012 R2:

  • Common Vulnerabilities and Exposures
  • Center for Internet Security Benchmarks
  • Runtime Behavior Analysis

In this post, I’m focused on how to make sure you keep your EC2 instances patched, backed up, and inspected for common vulnerabilities and exposures (CVEs). As a result, I will focus on how to use the CVE rules package and use your instance tags to identify the instances on which to run the CVE rules. If your EC2 instance is fully patched using Systems Manager, as described earlier, you should not have any findings with the CVE rules package. Regardless, as a best practice I recommend that you use Amazon Inspector as an additional layer for identifying any unexpected failures. This involves using Amazon CloudWatch to set up weekly Amazon Inspector scans, and configuring Amazon Inspector to notify you of any findings through SNS topics. By acting on the notifications you receive, you can respond quickly to any CVEs on any of your EC2 instances to help ensure that malware using known CVEs does not affect your EC2 instances. In a previous blog post, Eric Fitzgerald showed how to remediate Amazon Inspector security findings automatically.

Install the Amazon Inspector agent

To install the Amazon Inspector agent, you will use EC2 Run Command, which allows you to run any command on any of your EC2 instances that have the Systems Manager agent with an attached IAM role that allows access to Systems Manager.

  1. Choose Run Command under Systems Manager Services in the navigation pane of the EC2 console. Then choose Run a command.
    Screenshot of choosing "Run a command"
  2. To install the Amazon Inspector agent, you will use an AWS managed and provided command document that downloads and installs the agent for you on the selected EC2 instance. Choose AmazonInspector-ManageAWSAgent. To choose the target EC2 instance where this command will be run, use the tag you previously assigned to your EC2 instance, Patch Group, with a value of Windows Servers. For this example, set the concurrent installations to 1 and tell Systems Manager to stop after 5 errors. (If you prefer to script this step, see the sketch after this list.)
    Screenshot of installing the Amazon Inspector agent
  3. Retain the default values for all other settings on the Run a command page and choose Run. Back on the Run Command page, you can see if the command that installed the Amazon Inspector agent executed successfully on all selected EC2 instances.
    Screenshot showing that the command that installed the Amazon Inspector agent executed successfully on all selected EC2 instances
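As an alternative to the console, here is a hedged boto3 sketch of sending the same Run Command document. The tag target and the concurrency and error limits mirror the values above; the Operation parameter is an assumption about the AmazonInspector-ManageAWSAgent document's inputs, so check the document description in your account.

import boto3

ssm = boto3.client('ssm')

# Run the agent-install document against instances tagged Patch Group = Windows Servers,
# one instance at a time, stopping after 5 errors.
ssm.send_command(
    DocumentName='AmazonInspector-ManageAWSAgent',
    Targets=[{'Key': 'tag:Patch Group', 'Values': ['Windows Servers']}],
    Parameters={'Operation': ['Install']},   # assumed document parameter
    MaxConcurrency='1',
    MaxErrors='5',
)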

Set up notifications using Amazon SNS

Now that you have installed the Amazon Inspector agent, you will set up an SNS topic that will notify you of any findings after an Amazon Inspector run.

To set up an SNS topic:

  1. In the AWS Management Console, choose Simple Notification Service under Messaging in the Services menu.
  2. Choose Create topic, name your topic (only alphanumeric characters, hyphens, and underscores are allowed) and give it a display name to ensure you know what this topic does (I’ve named mine Inspector). Choose Create topic.
    "Create new topic" page
  3. To allow Amazon Inspector to publish messages to your new topic, choose Other topic actions and choose Edit topic policy.
  4. For Allow these users to publish messages to this topic and Allow these users to subscribe to this topic, choose Only these AWS users. Type the following ARN for the US East (N. Virginia) Region in which you are deploying the solution in this post: arn:aws:iam::316112463485:root. This is the ARN of Amazon Inspector itself. For the ARNs of Amazon Inspector in other AWS Regions, see Setting Up an SNS Topic for Amazon Inspector Notifications (Console). Amazon Resource Names (ARNs) uniquely identify AWS resources across all of AWS.
    Screenshot of editing the topic policy
  5. To receive notifications from Amazon Inspector, subscribe to your new topic by choosing Create subscription and adding your email address. After confirming your subscription by clicking the link in the email, the topic should display your email address as a subscriber. Later, you will configure the Amazon Inspector template to publish to this topic.
    Screenshot of subscribing to the new topic

Define an Amazon Inspector target and template

Now that you have set up the notification topic by which Amazon Inspector can notify you of findings, you can create an Amazon Inspector target and template. A target defines which EC2 instances are in scope for Amazon Inspector. A template defines which packages to run, for how long, and on which target.
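If you prefer to script this setup rather than use the console steps that follow, here is a hedged boto3 sketch of creating a resource group, target, and template, and subscribing the SNS topic to findings. The rules package ARN and topic ARN are placeholders; you can list the real, Region-specific rules package ARNs with list_rules_packages.

import boto3

inspector = boto3.client('inspector')

cve_rules_package_arn = 'arn:aws:inspector:us-east-1:316112463485:rulespackage/EXAMPLE'  # placeholder

# Resource group: the same Patch Group = Windows Servers tag used for patching.
group_arn = inspector.create_resource_group(
    resourceGroupTags=[{'key': 'Patch Group', 'value': 'Windows Servers'}]
)['resourceGroupArn']

# Target: which EC2 instances are in scope for Amazon Inspector.
target_arn = inspector.create_assessment_target(
    assessmentTargetName='WindowsServers',
    resourceGroupArn=group_arn,
)['assessmentTargetArn']

# Template: which rules package to run, and for how long (1 hour, as in this post).
template_arn = inspector.create_assessment_template(
    assessmentTargetArn=target_arn,
    assessmentTemplateName='WindowsServers-CVE',
    durationInSeconds=3600,
    rulesPackageArns=[cve_rules_package_arn],
)['assessmentTemplateArn']

# Notify the SNS topic created earlier whenever a finding is reported.
inspector.subscribe_to_event(
    resourceArn=template_arn,
    event='FINDING_REPORTED',
    topicArn='arn:aws:sns:us-east-1:111122223333:Inspector',  # placeholder topic ARN
)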

To create an Amazon Inspector target:

  1. Navigate to the Amazon Inspector console and choose Get started. At the time of writing this blog post, Amazon Inspector is available in the US East (N. Virginia), US West (N. California), US West (Oregon), EU (Ireland), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Sydney), and Asia Pacific (Tokyo) Regions.
  2. For Amazon Inspector to be able to collect the necessary data from your EC2 instance, you must create an IAM service role for Amazon Inspector. Amazon Inspector can create this role for you if you choose Choose or create role and confirm the role creation by choosing Allow.
    Screenshot of creating an IAM service role for Amazon Inspector
  3. Amazon Inspector also asks you to tag your EC2 instance and install the Amazon Inspector agent. You already performed these steps in Part 1 of this post, so you can proceed by choosing Next. To define the Amazon Inspector target, choose the previously used Patch Group tag with a Value of Windows Servers. This is the same tag that you used to define the targets for patching. Then choose Next.
    Screenshot of defining the Amazon Inspector target
  4. Now, define your Amazon Inspector template, and choose a name and the package you want to run. For this post, use the Common Vulnerabilities and Exposures package and choose the default duration of 1 hour. As you can see, the package has a version number, so always select the latest version of the rules package if multiple versions are available.
    Screenshot of defining an assessment template
  5. Configure Amazon Inspector to publish to your SNS topic when findings are reported. You can also choose to receive a notification of a started run, a finished run, or changes in the state of a run. For this blog post, you want to receive notifications if there are any findings. To start, choose Assessment Templates from the Amazon Inspector console and choose your newly created Amazon Inspector assessment template. Choose the icon below SNS topics (see the following screenshot).
    Screenshot of choosing an assessment template
  6. A pop-up appears in which you can choose the previously created topic and the events about which you want SNS to notify you (choose Finding reported).
    Screenshot of choosing the previously created topic and the events about which you want SNS to notify you

Schedule Amazon Inspector assessment runs

The last step in using Amazon Inspector to assess for CVEs is to schedule the Amazon Inspector template to run using Amazon CloudWatch Events. This will make sure that Amazon Inspector assesses your EC2 instance on a regular basis. To do this, you need the Amazon Inspector template ARN, which you can find under Assessment templates in the Amazon Inspector console. CloudWatch Events can run your Amazon Inspector assessment at an interval you define using a Cron-based schedule. Cron is a well-known scheduling agent that is widely used on UNIX-like operating systems and uses the following syntax for CloudWatch Events.

Image of Cron schedule

All scheduled events use a UTC time zone, and the minimum precision for schedules is one minute. For more information about scheduling CloudWatch Events, see Schedule Expressions for Rules.
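If you would rather create the rule programmatically than with the console steps that follow, here is a hedged boto3 sketch that schedules the assessment for every Sunday at 10:00 P.M. UTC. The template and role ARNs are placeholders; the role must allow CloudWatch Events to start the Amazon Inspector assessment run.

import boto3

events = boto3.client('events')

template_arn = 'arn:aws:inspector:us-east-1:111122223333:target/0-EXAMPLE/template/0-EXAMPLE'  # placeholder
role_arn = 'arn:aws:iam::111122223333:role/CloudWatchEventsInspectorRole'                      # placeholder

events.put_rule(
    Name='weekly-inspector-assessment',
    ScheduleExpression='cron(0 22 ? * SUN *)',  # fields: minutes hours day-of-month month day-of-week year
    State='ENABLED',
)
events.put_targets(
    Rule='weekly-inspector-assessment',
    Targets=[{'Id': 'inspector-template', 'Arn': template_arn, 'RoleArn': role_arn}],
)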

To create the CloudWatch Events rule:

  1. Navigate to the CloudWatch console, choose Events, and choose Create rule.
    Screenshot of starting to create a rule in the CloudWatch Events console
  2. On the next page, specify if you want to invoke your rule based on an event pattern or a schedule. For this blog post, you will select a schedule based on a Cron expression.
  3. You can schedule the Amazon Inspector assessment any time you want using the Cron expression, or you can use the Cron expression I used in the following screenshot, which will run the Amazon Inspector assessment every Sunday at 10:00 P.M. GMT.
    Screenshot of scheduling an Amazon Inspector assessment with a Cron expression
  4. Choose Add target and choose Inspector assessment template from the drop-down menu. Paste the ARN of the Amazon Inspector template you previously created in the Amazon Inspector console in the Assessment template box and choose Create a new role for this specific resource. This new role is necessary so that CloudWatch Events has the necessary permissions to start the Amazon Inspector assessment. CloudWatch Events will automatically create the new role and grant the minimum set of permissions needed to run the Amazon Inspector assessment. To proceed, choose Configure details.
    Screenshot of adding a target
  5. Next, give your rule a name and a description. I suggest using a name that describes what the rule does, as shown in the following screenshot.
  6. Finish the wizard by choosing Create rule. The rule should appear in the Events – Rules section of the CloudWatch console.
    Screenshot of completing the creation of the rule
  7. To confirm your CloudWatch Events rule works, wait for the next time your CloudWatch Events rule is scheduled to run. For testing purposes, you can choose your CloudWatch Events rule and choose Edit to change the schedule to run it sooner than scheduled.
    Screenshot of confirming the CloudWatch Events rule works
  8. Now navigate to the Amazon Inspector console to confirm the launch of your first assessment run. The Start time column shows you the time each assessment started and the Status column the status of your assessment. In the following screenshot, you can see Amazon Inspector is busy Collecting data from the selected assessment targets.
    Screenshot of confirming the launch of the first assessment run

You have concluded the last step of this blog post by setting up a regular scan of your EC2 instance with Amazon Inspector and a notification that will let you know if your EC2 instance is vulnerable to any known CVEs. In a previous Security Blog post, Eric Fitzgerald explained How to Remediate Amazon Inspector Security Findings Automatically. Although that blog post is for Linux-based EC2 instances, the post shows that you can learn about Amazon Inspector findings in other ways than email alerts.

Conclusion

In this two-part blog post, I showed how to make sure you keep your EC2 instances up to date with patching, how to back up your instances with snapshots, and how to monitor your instances for CVEs. Collectively these measures help to protect your instances against common attack vectors that attempt to exploit known vulnerabilities. In Part 1, I showed how to configure your EC2 instances to make it easy to use Systems Manager, EBS Snapshot Scheduler, and Amazon Inspector. I also showed how to use Systems Manager to schedule automatic patches to keep your instances current in a timely fashion. In Part 2, I showed you how to take regular snapshots of your data by using EBS Snapshot Scheduler and how to use Amazon Inspector to check if your EC2 instances running Microsoft Windows contain any common vulnerabilities and exposures (CVEs).

If you have comments about today’s or yesterday’s post, submit them in the “Comments” section below. If you have questions about or issues implementing any part of this solution, start a new thread on the Amazon EC2 forum or the Amazon Inspector forum, or contact AWS Support.

– Koen

Google Wipes 786 Pirate Sites From Search Results

Post Syndicated from Andy original https://torrentfreak.com/google-wipes-786-pirate-sites-from-search-results-171121/

In late July, President Vladimir Putin signed a new law which requires local telecoms watchdog Rozcomnadzor to maintain a list of banned domains while identifying sites, services, and software that provide access to them.

Rozcomnadzor is required to contact the operators of such services with a request for them to block banned resources. If they do not, then they themselves will become blocked. In addition, search engines are also required to remove blocked resources from their search results, in order to discourage people from accessing them.

Removing entire domains from search results is a controversial practice and something which search providers have long protested against. They argue that it’s not their job to act as censors and in any event, content remains online, whether it’s indexed by search or not.

Nevertheless, on October 1 the new law (“On Information, Information Technologies and Information Protection”) came into effect and it appears that Russia’s major search engines have been very busy in its wake.

According to a report from Rozcomnadzor, search providers Google, Yandex, Mail.ru, Rambler, and Sputnik have stopped presenting information in results for sites that have been permanently blocked by ISPs following a decision by the Moscow City Court.

“To date, search engines have stopped access to 786 pirate sites listed in the register of Internet resources which contain content distributed in violation of intellectual property rights,” the watchdog reports.

The domains aren’t being named by Rozcomnadzor or the search engines but are almost definitely those sites that have had complaints filed against them at the City Court on multiple occasions but have failed to take remedial action. Also included will be mirror and proxy sites which either replicate or facilitate access to these blocked and apparently defiant domains.

The news comes in the wake of reports earlier this month that Russia is considering a rapid site blocking mechanism that could see domains rendered inaccessible within 24 hours, without any parties having to attend a court hearing.

While it’s now extremely clear that Russia has one of the most aggressive site-blocking regimes in the world, with both ISPs and search engines required to prevent access to infringing sites, it’s uncertain whether these measures will be enough to tackle rampant online piracy.

New research published in October by Group-IB revealed that despite thousands of domains being blocked, last year the market for pirate video in Russia more than doubled.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Ultimate 3D printer control with OctoPrint

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/octoprint/

Control and monitor your 3D printer remotely with a Raspberry Pi and OctoPrint.

Timelapse of OctoPrint Ornament

Printed on a bq Witbox. The STL file can be found here: http://www.thingiverse.com/thing:191635. OctoPrint is located here: http://www.octoprint.org

3D printing

Whether you have a 3D printer at home or use one at your school or local makerspace, it’s fair to assume you’ve had a failed print or two in your time. Filament knotting or running out, your print peeling away from the print bed — these are common issues for all 3D printing enthusiasts, and they can be costly if they’re discovered too late.

OctoPrint

OctoPrint is free, open-source software, created and maintained by Gina Häußge, that performs a multitude of useful 3D printing–related tasks, including remote control of your printer, live video, and data collection.

The OctoPrint logo

Control and monitoring

To control the print process, use OctoPrint on a Raspberry Pi connected to your 3D printer. First, ensure a safe uninterrupted run by using the software to restrict who can access the printer. Then, before starting your print, use the web app to work on your STL file. The app also allows you to reposition the print head at any time, as well as pause or stop printing if needed.

Live video streaming

Since OctoPrint can stream video of your print as it happens, you can watch out for any faults that may require you to abort and restart. Proud of your print? Record the entire process from start to finish and upload the time-lapse video to your favourite social media platform.

OctoPrint software graphic user interface screenshot

Data capture

Octoprint records real-time data, such as the temperature, giving you another way to monitor your print to ensure a smooth, uninterrupted process. Moreover, the records will help with troubleshooting if there is a problem.

OctoPrint software graphic user interface screenshot

Print the Millennium Falcon

OK, you can print anything you like. However, this design definitely caught our eye this week.

3D-Printed Fillenium Malcon (Timelapse)

This is a Timelapse of my biggest print project so far on my own designed/built printer. It’s 500x170x700(mm) and weights 3 Kilograms of Filament.

You can support the work of Gina and OctoPrint by visiting her Patreon account and following OctoPrint on Twitter, Facebook, or G+. And if you’ve set up a Raspberry Pi to run OctoPrint, or you’ve created some cool Pi-inspired 3D prints, make sure to share them with us on our own social media channels.

The post Ultimate 3D printer control with OctoPrint appeared first on Raspberry Pi.

170 ‘Pirate’ IPTV Vendors Throw in the Towel Facing Legal Pressure

Post Syndicated from Ernesto original https://torrentfreak.com/170-pirate-iptv-vendors-throw-the-in-the-towel-facing-legal-pressure-171121/

Pirate streaming boxes are all the rage this year. They are popular not just among tens of millions of users; they are at the top of the anti-piracy agenda as well.

Dubbed Piracy 3.0 by the MPAA, this is a worrisome trend that copyright holders are trying their best to curb. In the Netherlands, local anti-piracy group BREIN is leading the charge.

Backed by the major film studios, the organization booked a significant victory earlier this year against Filmspeler. In this case, the European Court of Justice ruled that selling or using devices pre-configured to obtain copyright-infringing content is illegal.

Paired with the earlier GS Media ruling, which held that companies with a for-profit motive can’t knowingly link to copyright-infringing material, this provides a powerful enforcement tool.

With these decisions in hand, BREIN previously pressured hundreds of streaming box vendors to halt sales of hardware with pirate addons, but it didn’t stop there. This week the group also highlighted its successes against vendors of unauthorized IPTV services.

“BREIN has already stopped 170 illegal providers of illegal media players and/or IPTV subscriptions. Even providers that only offer illegal IPTV subscriptions are being dealt with,” BREIN reports.

In addition to shutting down the trade in IPTV services, the anti-piracy group also removed 375 advertisements for such services from various marketplaces.

“This is illegal commerce. If you wait until you are warned, you are too late,” BREIN director Tim Kuik says.

“You can be held personally liable. You can also be charged and criminally prosecuted. Willingly committing commercial copyright infringement can lead to an 82,000 euro fine and 4 years imprisonment,” he adds.

While most pirate IPTV vendors threw in the towel voluntarily, some received an extra incentive. Twenty signed a settlement with BREIN for varying amounts, up to tens of thousands of euros. They all face further penalties if they continue to sell pirate subscriptions.

In some cases, the courts were involved. This includes the recent lawsuit against MovieStreamer, which was ordered to stop its IPTV hyperlinking activities immediately. Failure to do so will result in a fine of 5,000 euros per day. In addition, the vendor was also ordered to pay legal costs of 17,527 euros.

While BREIN has already booked plenty of successes, as illustrated here, the pirate streaming box problem is far from solved. The anti-piracy group currently has one case pending in court, but more are likely to follow in the near future.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Using AWS CodeCommit Pull Requests to request code reviews and discuss code

Post Syndicated from Chris Barclay original https://aws.amazon.com/blogs/devops/using-aws-codecommit-pull-requests-to-request-code-reviews-and-discuss-code/

Thank you to Michael Edge, Senior Cloud Architect, for a great blog on CodeCommit pull requests.

~~~~~~~

AWS CodeCommit is a fully managed service for securely hosting private Git repositories. CodeCommit now supports pull requests, which allows repository users to review, comment upon, and interactively iterate on code changes. Used as a collaboration tool between team members, pull requests help you to review potential changes to a CodeCommit repository before merging those changes into the repository. Each pull request goes through a simple lifecycle, as follows:

  • The new features to be merged are added as one or more commits to a feature branch. The commits are not merged into the destination branch.
  • The pull request is created, usually from the difference between two branches.
  • Team members review and comment on the pull request. The pull request might be updated with additional commits that contain changes made in response to comments, or include changes made to the destination branch.
  • Once team members are happy with the pull request, it is merged into the destination branch. The commits are applied to the destination branch in the same order they were added to the pull request.

Commenting is an integral part of the pull request process, and is used to collaborate between the developers and the reviewer. Reviewers add comments and questions to a pull request during the review process, and developers respond to these with explanations. Pull request comments can be added to the overall pull request, a file within the pull request, or a line within a file.

To make the comments more useful, sign in to the AWS Management Console as an AWS Identity and Access Management (IAM) user. The username will then be associated with the comment, indicating the owner of the comment. Pull request comments are a great quality improvement tool as they allow the entire development team visibility into what reviewers are looking for in the code. They also serve as a record of the discussion between team members at a point in time, and shouldn’t be deleted.

AWS CodeCommit is also introducing the ability to add comments to a commit, another useful collaboration feature that allows team members to discuss code changed as part of a commit. This helps you discuss changes made in a repository, including why the changes were made, whether further changes are necessary, or whether changes should be merged. As is the case with pull request comments, you can comment on an overall commit, on a file within a commit, or on a specific line or change within a file, and other repository users can respond to your comments. Comments are not restricted to commits; they can also be used to comment on the differences between two branches, or between two tags. Commit comments are separate from pull request comments, i.e. you will not see commit comments when reviewing a pull request – you will only see pull request comments.
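For example, here is a hedged boto3 sketch of adding a line-level comment to a comparison of two commits. The repository name matches the example used later in this post; the commit IDs, file path, and comment text are placeholders.

import boto3

codecommit = boto3.client('codecommit', region_name='us-east-2')

codecommit.post_comment_for_compared_commit(
    repositoryName='codecommit-demo',
    beforeCommitId='EXAMPLE-BEFORE-COMMIT-ID',   # placeholder commit IDs
    afterCommitId='EXAMPLE-AFTER-COMMIT-ID',
    location={
        'filePath': 'src/main/java/com/amazon/helloworld/Main.java',  # placeholder path
        'filePosition': 5,                # line number in the file
        'relativeFileVersion': 'AFTER',   # comment on the changed version of the file
    },
    content='Consider extracting this string into a constant.',
)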

A pull request example

Let’s get started by running through an example. We’ll take a typical pull request scenario and look at how we’d use CodeCommit and the AWS Management Console for each of the steps.

To try out this scenario, you’ll need:

  • An AWS CodeCommit repository with some sample code in the master branch. We’ve provided sample code below.
  • Two AWS Identity and Access Management (IAM) users, both with the AWSCodeCommitPowerUser managed policy applied to them.
  • Git installed on your local computer, and access configured for AWS CodeCommit.
  • A clone of the AWS CodeCommit repository on your local computer.

In the course of this example, you’ll sign in to the AWS CodeCommit console as one IAM user to create the pull request, and as the other IAM user to review the pull request. To learn more about how to set up your IAM users and how to connect to AWS CodeCommit with Git, see the following topics:

  • Information on creating an IAM user with AWS Management Console access.
  • Instructions on how to access CodeCommit using Git.
  • If you’d like to use the same ‘hello world’ application as used in this article, here is the source code:
package com.amazon.helloworld;

public class Main {
	public static void main(String[] args) {

		System.out.println("Hello, world");
	}
}

The scenario below uses the us-east-2 region.

Creating the branches

Before we jump in and create a pull request, we’ll need at least two branches. In this example, we’ll follow a branching strategy similar to the one described in GitFlow. We’ll create a new branch for our feature from the main development branch (the default branch). We’ll develop the feature in the feature branch. Once we’ve written and tested the code for the new feature in that branch, we’ll create a pull request that contains the differences between the feature branch and the main development branch. Our team lead (the second IAM user) will review the changes in the pull request. Once the changes have been reviewed, the feature branch will be merged into the development branch.

Figure 1: Pull request link

Sign in to the AWS CodeCommit console with the IAM user you want to use as the developer. You can use an existing repository or you can go ahead and create a new one. We won’t be merging any changes to the master branch of your repository, so it’s safe to use an existing repository for this example. You’ll find the Pull requests link has been added just above the Commits link (see Figure 1), and below Commits you’ll find the Branches link. Click Branches and create a new branch called ‘develop’, branched from the ‘master’ branch. Then create a new branch called ‘feature1’, branched from the ‘develop’ branch. You’ll end up with three branches, as you can see in Figure 2. (Your repository might contain other branches in addition to the three shown in the figure).

Figure 2: Create a feature branch

If you haven’t cloned your repo yet, go to the Code link in the CodeCommit console and click the Connect button. Follow the instructions to clone your repo (detailed instructions are here). Open a terminal or command line and paste the git clone command supplied in the Connect instructions for your repository. The example below shows cloning a repository named codecommit-demo:

git clone https://git-codecommit.us-east-2.amazonaws.com/v1/repos/codecommit-demo

If you’ve previously cloned the repo you’ll need to update your local repo with the branches you created. Open a terminal or command line and make sure you’re in the root directory of your repo, then run the following command:

git remote update origin

You’ll see your new branches pulled down to your local repository.

$ git remote update origin
Fetching origin
From https://git-codecommit.us-east-2.amazonaws.com/v1/repos/codecommit-demo
 * [new branch]      develop    -> origin/develop
 * [new branch]      feature1   -> origin/feature1

You can also see your new branches by typing:

git branch --all

$ git branch --all
* master
  remotes/origin/develop
  remotes/origin/feature1
  remotes/origin/master

Now we’ll make a change to the ‘feature1’ branch. Open a terminal or command line and check out the feature1 branch by running the following command:

git checkout feature1

$ git checkout feature1
Branch feature1 set up to track remote branch feature1 from origin.
Switched to a new branch 'feature1'

Make code changes

Edit a file in the repo using your favorite editor and save the changes. Commit your changes to the local repository, and push your changes to CodeCommit. For example:

git commit -am 'added new feature'
git push origin feature1

$ git commit -am 'added new feature'
[feature1 8f6cb28] added new feature
1 file changed, 1 insertion(+), 1 deletion(-)

$ git push origin feature1
Counting objects: 9, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (9/9), 617 bytes | 617.00 KiB/s, done.
Total 9 (delta 2), reused 0 (delta 0)
To https://git-codecommit.us-east-2.amazonaws.com/v1/repos/codecommit-demo
   2774a53..8f6cb28  feature1 -> feature1

Creating the pull request

Now we have a ‘feature1’ branch that differs from the ‘develop’ branch. At this point we want to merge our changes into the ‘develop’ branch. We’ll create a pull request to notify our team members to review our changes and check whether they are ready for a merge.

In the AWS CodeCommit console, click Pull requests. Click Create pull request. On the next page select ‘develop’ as the destination branch and ‘feature1’ as the source branch. Click Compare. CodeCommit will check for merge conflicts and highlight whether the branches can be automatically merged using the fast-forward option, or whether a manual merge is necessary. A pull request can be created in both situations.

Figure 3: Create a pull request

After comparing the two branches, the CodeCommit console displays the information you’ll need in order to create the pull request. In the ‘Details’ section, the ‘Title’ for the pull request is mandatory, and you may optionally provide comments to your reviewers to explain the code change you have made and what you’d like them to review. In the ‘Notifications’ section, there is an option to set up notifications to notify subscribers of changes to your pull request. Notifications will be sent on creation of the pull request as well as for any pull request updates or comments. And finally, you can review the changes that make up this pull request. This includes both the individual commits (a pull request can contain one or more commits, available in the Commits tab) as well as the changes made to each file, i.e. the diff between the two branches referenced by the pull request, available in the Changes tab. After you have reviewed this information and added a title for your pull request, click the Create button. You will see a confirmation screen, as shown in Figure 4, indicating that your pull request has been successfully created, and can be merged without conflicts into the ‘develop’ branch.

Figure 4: Pull request confirmation page
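If you want to create pull requests from scripts or automation rather than the console, here is a hedged boto3 sketch of the same request between the 'feature1' and 'develop' branches; the title and description are examples.

import boto3

codecommit = boto3.client('codecommit', region_name='us-east-2')

response = codecommit.create_pull_request(
    title='Add new feature',
    description='Please review the changes on feature1 before merging into develop.',
    targets=[{
        'repositoryName': 'codecommit-demo',
        'sourceReference': 'feature1',
        'destinationReference': 'develop',
    }],
)
print(response['pullRequest']['pullRequestId'])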

Reviewing the pull request

Now let’s view the pull request from the perspective of the team lead. If you set up notifications for this CodeCommit repository, creating the pull request would have sent an email notification to the team lead, and he/she can use the links in the email to navigate directly to the pull request. In this example, sign in to the AWS CodeCommit console as the IAM user you’re using as the team lead, and click Pull requests. You will see the same information you did during creation of the pull request, plus a record of activity related to the pull request, as you can see in Figure 5.

Figure 5: Team lead reviewing the pull request

Commenting on the pull request

You now perform a thorough review of the changes and make a number of comments using the new pull request comment feature. To gain an overall perspective on the pull request, you might first go to the Commits tab and review how many commits are included in this pull request. Next, you might visit the Changes tab to review the changes, which displays the differences between the feature branch code and the develop branch code. At this point, you can add comments to the pull request as you work through each of the changes. Let’s go ahead and review the pull request. During the review, you can add review comments at three levels:

  • The overall pull request
  • A file within the pull request
  • An individual line within a file

The overall pull request
In the Changes tab near the bottom of the page you’ll see a ‘Comments on changes’ box. We’ll add comments here related to the overall pull request. Add your comments as shown in Figure 6 and click the Save button.

Figure 6: Pull request comment

A specific file in the pull request
Hovering your mouse over a filename in the Changes tab will cause a blue ‘comments’ icon to appear to the left of the filename. Clicking the icon will allow you to enter comments specific to this file, as in the example in Figure 7. Go ahead and add comments for one of the files changed by the developer. Click the Save button to save your comment.

Figure 7: File comment

A specific line in a file in the pull request
A blue ‘comments’ icon will appear as you hover over individual lines within each file in the pull request, allowing you to create comments against lines that have been added, removed or are unchanged. In Figure 8, you add comments against a line that has been added to the source code, encouraging the developer to review the naming standards. Go ahead and add line comments for one of the files changed by the developer. Click the Save button to save your comment.

Figure 8: Line comment

A pull request that has been commented at all three levels will look similar to Figure 9. The pull request comment is shown expanded in the 'Comments on changes' section, while the comments at file and line level are shown collapsed. A 'comment' icon indicates that comments exist at file and line level. Clicking the icon will expand and show the comment. Since you are expecting the developer to make further changes based on your comments, you won't merge the pull request at this stage, but will leave it open awaiting feedback. Each comment you made results in a notification being sent to the developer, who can respond to the comments. This is great for remote working, where developers and the team lead may be in different time zones.

Figure 9: Fully commented pull request
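If you'd rather automate reviews, the same comments can be posted through the CodeCommit API. Here is a rough sketch with Boto3; the pull request ID of '1', the line number, and the comment text are assumptions for illustration, and the commit IDs are looked up from the pull request itself.

import boto3

codecommit = boto3.client('codecommit', region_name='us-east-2')

# Look up the commits referenced by the pull request (ID '1' is assumed here).
pr = codecommit.get_pull_request(pullRequestId='1')
target = pr['pullRequest']['pullRequestTargets'][0]

# An overall pull request comment: no 'location' parameter.
codecommit.post_comment_for_pull_request(
    pullRequestId='1',
    repositoryName='codecommit-demo',
    beforeCommitId=target['destinationCommit'],
    afterCommitId=target['sourceCommit'],
    content='Good start. See my file and line comments below.'
)

# A line-level comment: 'location' pins the comment to a line in the changed file.
codecommit.post_comment_for_pull_request(
    pullRequestId='1',
    repositoryName='codecommit-demo',
    beforeCommitId=target['destinationCommit'],
    afterCommitId=target['sourceCommit'],
    location={
        'filePath': 'src/main/java/com/amazon/helloworld/Main.java',
        'filePosition': 14,
        'relativeFileVersion': 'AFTER'
    },
    content='Please check our naming standards for this variable.'
)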

Adding a little complexity

A typical development team is going to be creating pull requests on a regular basis. It’s highly likely that the team lead will merge other pull requests into the ‘develop’ branch while pull requests on feature branches are in the review stage. This may result in a change to the ‘Mergable’ status of a pull request. Let’s add this scenario into the mix and check out how a developer will handle this.

To test this scenario, we could create a new pull request and ask the team lead to merge this to the ‘develop’ branch. But for the sake of simplicity we’ll take a shortcut. Clone your CodeCommit repo to a new folder, switch to the ‘develop’ branch, and make a change to one of the same files that were changed in your pull request. Make sure you change a line of code that was also changed in the pull request. Commit and push this back to CodeCommit. Since you’ve just changed a line of code in the ‘develop’ branch that has also been changed in the ‘feature1’ branch, the ‘feature1’ branch cannot be cleanly merged into the ‘develop’ branch. Your developer will need to resolve this merge conflict.

A developer reviewing the pull request would see the pull request now looks similar to Figure 10, with a ‘Resolve conflicts’ status rather than the ‘Mergable’ status it had previously (see Figure 5).

Figure 10: Pull request with merge conflicts

Reviewing the review comments

Once the team lead has completed his review, the developer will review the comments and make the suggested changes. As a developer, you’ll see the list of review comments made by the team lead in the pull request Activity tab, as shown in Figure 11. The Activity tab shows the history of the pull request, including commits and comments. You can reply to the review comments directly from the Activity tab, by clicking the Reply button, or you can do this from the Changes tab. The Changes tab shows the comments for the latest commit, as comments on previous commits may be associated with lines that have changed or been removed in the current commit. Comments for previous commits are available to view and reply to in the Activity tab.

In the Activity tab, use the shortcut link (which looks like this </>) to move quickly to the source code associated with the comment. In this example, you will make further changes to the source code to address the pull request review comments, so let’s go ahead and do this now. But first, you will need to resolve the ‘Resolve conflicts’ status.

Figure 11: Pull request activity

Resolving the ‘Resolve conflicts’ status

The ‘Resolve conflicts’ status indicates there is a merge conflict between the ‘develop’ branch and the ‘feature1’ branch. This will require manual intervention to restore the pull request back to the ‘Mergable’ state. We will resolve this conflict next.
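If you want to confirm the conflict programmatically before touching your working copy, the CodeCommit API can report whether a fast-forward merge is still possible. A small Boto3 sketch, using the branch and repository names from this post:

import boto3

codecommit = boto3.client('codecommit', region_name='us-east-2')

# Ask CodeCommit whether 'feature1' can still be fast-forward merged into 'develop'.
conflicts = codecommit.get_merge_conflicts(
    repositoryName='codecommit-demo',
    destinationCommitSpecifier='develop',
    sourceCommitSpecifier='feature1',
    mergeOption='FAST_FORWARD_MERGE'
)

if conflicts['mergeable']:
    print('No conflicts: the pull request can be merged.')
else:
    print('Merge conflicts detected: resolve them locally as shown below.')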

Open a terminal or command line and check out the develop branch by running the following command:

git checkout develop

$ git checkout develop
Switched to branch 'develop'
Your branch is up-to-date with 'origin/develop'.

To incorporate the changes the team lead made to the ‘develop’ branch, merge the remote ‘develop’ branch with your local copy:

git pull

$ git pull
remote: Counting objects: 9, done.
Unpacking objects: 100% (9/9), done.
From https://git-codecommit.us-east-2.amazonaws.com/v1/repos/codecommit-demo
   af13c82..7b36f52  develop    -> origin/develop
Updating af13c82..7b36f52
Fast-forward
 src/main/java/com/amazon/helloworld/Main.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Then checkout the ‘feature1’ branch:

git checkout feature1

$ git checkout feature1
Switched to branch 'feature1'
Your branch is up-to-date with 'origin/feature1'.

Now merge the changes from the ‘develop’ branch into your ‘feature1’ branch:

git merge develop

$ git merge develop
Auto-merging src/main/java/com/amazon/helloworld/Main.java
CONFLICT (content): Merge conflict in src/main/java/com/amazon/helloworld/Main.java
Automatic merge failed; fix conflicts and then commit the result.

Yes, this fails. The file Main.java has been changed in both branches, resulting in a merge conflict that can’t be resolved automatically. However, Main.java will now contain markers that indicate where the conflicting code is, and you can use these to resolve the issues manually. Edit Main.java using your favorite IDE, and you’ll see it looks something like this:

package com.amazon.helloworld;

import java.util.*;

/**
 * This class prints a hello world message
 */

public class Main {
   public static void main(String[] args) {

<<<<<<< HEAD
        Date todaysdate = Calendar.getInstance().getTime();

        System.out.println("Hello, earthling. Today's date is: " + todaysdate);
=======
      System.out.println("Hello, earth");
>>>>>>> develop
   }
}

The code between HEAD and '===' is the code the developer added in the 'feature1' branch (HEAD represents 'feature1' because it is the currently checked-out branch). The code between '===' and '>>> develop' is the code added to the 'develop' branch by the team lead. We'll resolve the conflict by manually merging both changes, resulting in an updated Main.java:

package com.amazon.helloworld;

import java.util.*;

/**
 * This class prints a hello world message
 */

public class Main {
   public static void main(String[] args) {

        Date todaysdate = Calendar.getInstance().getTime();

        System.out.println("Hello, earth. Today's date is: " + todaysdate);
   }
}

After saving the change you can add and commit it to your local repo:

git add src/
git commit -m 'fixed merge conflict by merging changes'

Fixing issues raised by the reviewer

Now you are ready to address the comments made by the team lead. If you are no longer pointing to the ‘feature1’ branch, check out the ‘feature1’ branch by running the following command:

git checkout feature1

$ git checkout feature1
Branch feature1 set up to track remote branch feature1 from origin.
Switched to a new branch 'feature1'

Edit the source code in your favorite IDE and make the changes to address the comments. In this example, the developer has updated the source code as follows:

package com.amazon.helloworld;

import java.util.*;

/**
 *  This class prints a hello world message
 *
 * @author Michael Edge
 * @see HelloEarth
 * @version 1.0
 */

public class Main {
   public static void main(String[] args) {

        Date todaysDate = Calendar.getInstance().getTime();

        System.out.println("Hello, earth. Today's date is: " + todaysDate);
   }
}

After saving the changes, commit and push to the CodeCommit ‘feature1’ branch as you did previously:

git commit -am 'updated based on review comments'
git push origin feature1

Responding to the reviewer

Now that you’ve fixed the code issues you will want to respond to the review comments. In the AWS CodeCommit console, check that your latest commit appears in the pull request Commits tab. You now have a pull request consisting of more than one commit. The pull request in Figure 12 has four commits, which originated from the following activities:

  • 8th Nov: the original commit used to initiate this pull request
  • 10th Nov, 3 hours ago: the commit by the team lead to the ‘develop’ branch, merged into our ‘feature1’ branch
  • 10th Nov, 24 minutes ago: the commit by the developer that resolved the merge conflict
  • 10th Nov, 4 minutes ago: the final commit by the developer addressing the review comments

Figure 12: Pull request with multiple commits

Let’s reply to the review comments provided by the team lead. In the Activity tab, reply to the pull request comment and save it, as shown in Figure 13.

Figure 13: Replying to a pull request comment
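The reply can also be posted through the API. A sketch with Boto3 follows; the pull request ID and the choice of which comment to answer are assumptions for illustration.

import boto3

codecommit = boto3.client('codecommit', region_name='us-east-2')

# Fetch the review comments on the pull request (ID '1' is assumed).
comments = codecommit.get_comments_for_pull_request(
    pullRequestId='1',
    repositoryName='codecommit-demo'
)

# Reply to the first comment in the first thread; in practice you would pick
# the specific comment you want to answer from the returned data.
first_comment = comments['commentsForPullRequestData'][0]['comments'][0]

codecommit.post_comment_reply(
    inReplyTo=first_comment['commentId'],
    content='Thanks - variable names updated in the latest commit.'
)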

At this stage, your code has been committed and you’ve updated your pull request comments, so you are ready for a final review by the team lead.

Final review

The team lead reviews the code changes and comments made by the developer. As team lead, you own the ‘develop’ branch and it’s your decision on whether to merge the changes in the pull request into the ‘develop’ branch. You can close the pull request with or without merging using the Merge and Close buttons at the bottom of the pull request page (see Figure 13). Clicking Close will allow you to add comments on why you are closing the pull request without merging. Merging will perform a fast-forward merge, incorporating the commits referenced by the pull request. Let’s go ahead and click the Merge button to merge the pull request into the ‘develop’ branch.

Figure 14: Merging the pull request

After merging a pull request, development of that feature is complete and the feature branch is no longer needed. It’s common practice to delete the feature branch after merging. CodeCommit provides a check box during merge to automatically delete the associated feature branch, as seen in Figure 14. Clicking the Merge button will merge the pull request into the ‘develop’ branch, as shown in Figure 15. This will update the status of the pull request to ‘Merged’, and will close the pull request.
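For completeness, the merge and the branch cleanup can also be scripted. A minimal Boto3 sketch, again assuming pull request ID '1':

import boto3

codecommit = boto3.client('codecommit', region_name='us-east-2')

# Fast-forward merge the pull request into 'develop'.
merge = codecommit.merge_pull_request_by_fast_forward(
    pullRequestId='1',
    repositoryName='codecommit-demo'
)
print('Pull request status:', merge['pullRequest']['pullRequestStatus'])

# Optionally delete the feature branch, mirroring the console check box.
codecommit.delete_branch(
    repositoryName='codecommit-demo',
    branchName='feature1'
)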

Conclusion

This blog has demonstrated how pull requests can be used to request a code review, and enable reviewers to get a comprehensive summary of what is changing, provide feedback to the author, and merge the code into production. For more information on pull requests, see the documentation.

UK Government Publishes Advice on ‘Illicit Streaming Devices’

Post Syndicated from Andy original https://torrentfreak.com/uk-government-publishes-advice-on-illicit-streaming-devices-171120/

With torrents and other methods of obtaining content simmering away in the background, unauthorized streaming is now the method of choice for millions of pirates around the globe.

Previously accessible only via a desktop browser, streaming is now available on a wide range of devices, from tablets and phones through to dedicated set-top boxes. These, collectively, are now being branded Illicit Streaming Devices (ISD) by the entertainment industries.

It’s terminology the UK government’s Intellectual Property Office has adopted this morning. In a new public advisory, the IPO notes that illicit streaming is the watching of content without the copyright owner’s permission using a variety of devices.

“Illicit streaming devices are physical boxes that are connected to your TV or USB sticks that plug into the TV such as adapted Amazon Fire sticks and so called ‘Kodi’ boxes or Android TV boxes,” the IPO reports.

“These devices are legal when used to watch legitimate, free to air, content. They become illegal once they are adapted to stream illicit content, for example TV programmes, films and subscription sports channels without paying the appropriate subscriptions.”

The IPO notes that streaming devices usually need to be loaded with special software add-ons in order to view copyright-infringing content. However, there are now dedicated apps available to view movies and TV shows which can be loaded straight on to smartphones and tablets.

But how can people know if the device they have is an ISD or not? According to the IPO it’s all down to common sense. If people usually charge for the content you’re getting for free, it’s illegal.

“If you are watching television programmes, films or sporting events where you would normally be paying to view them and you have not paid, you are likely to be using an illicit streaming device (ISD) or app. This could include a film recently released in the cinema, a sporting event that is being broadcast by BT Sport or a television programme, like Game of Thrones, that is only available on Sky,” the IPO says.

In an effort to familiarize the public with some of the terminology used by ISD sellers on eBay, Amazon or Gumtree, for example, the IPO then wanders into a bit of a minefield that really needs much greater clarification.

First up, the government states that ISDs are often described online as being “Fully loaded”, which is a colloquial term for a device with addons already installed. Although they won’t all be infringing, it’s very often the case that the majority are intended to be, so no problems here.

However, the IPO then says that people should keep an eye out for the term 'jail broken', which many readers will understand to be the process some hardware devices, such as Apple products, are put through in order for third-party software to be run on them. On occasion, some ISD sellers do apply this term to Android devices, but the usage is incorrect, limited to a small minority of listings, and of course misleading.

The IPO also warns people against devices marketed as “Plug and Play” but again this is a dual-use term and shouldn’t put consumers off a purchase without a proper investigation. A search on eBay this morning for that exact term didn’t yield any ISDs at all, only games consoles that can be plugged in and played with a minimum of fuss.

“Subscription Gift”, on the other hand, almost certainly references an illicit IPTV or satellite card-sharing subscription and is rarely used for anything else. 100% illegal, no doubt.

The government continues by giving reasons why people should avoid ISDs, not least since their use deprives the content industries of valuable revenue.

“[The creative industries] provide employment for more than 1.9 million people and contributes £84.1 billion to our economy. Using illicit streaming devices is illegal,” the IPO writes.

“If you are not paying for this content you are depriving industry of the revenue it needs to fund the next generation of TV programmes, films and sporting events we all enjoy. Instead it provides funds for the organized criminals who sell or adapt these illicit devices.”

Then, in keeping with the danger-based narrative employed by the entertainment industries recently, the government also warns that ISDs can have a negative effect on child welfare, not to mention on physical safety in the home.

“These devices often lack parental controls. Using them could expose children or young people to explicit or age inappropriate content,” the IPO warns.

“Another important reason for consumers to avoid purchasing these streaming devices is from an electrical safety point of view. Where devices and their power cables have been tested, some have failed EU safety standards and have the potential to present a real danger to the public, causing a fire in your home or premises.”

While there can be no doubt whatsoever that failing EU electrical standards in any way is unacceptable for any device, the recent headlines stating that “Kodi Boxes Can Kill Their Owners” are sensational at best and don’t present the full picture.

As reported this weekend, simply not having recognized branding on such devices means that they fail electrical standards, with non-genuine phone chargers presenting a greater risk around the UK.

Finally, the government offers some advice for people who either want to get off the ISD gravy train or ensure that others don’t benefit from it.

“These devices can be used legally by removing the software. If you are unsure get advice to help you use the device legally. If you wish to watch content that’s only available via subscription, such as sports, you should approach the relevant provider to find out about legal ways to watch,” the IPO advises.

“Get it Right from a Genuine Site helps you get the music, TV, films, games, books, newspapers, magazines and sport that you love from genuine services.”

And, if the public thinks that people selling such devices deserve a visit from the authorities, people are asked to report them to the Crimestoppers charity via an anonymous hotline.

The government’s guidance is exactly what one might expect, given that the advisory is likely to have been strongly assisted by companies including the Federation Against Copyright Theft, Premier League, and Sky, who have taken the lead in this area during the past year or so.

The big question is, however, whether many people using these devices really believe that obtaining subscription TV, movies, and sports for next to free is 100% legal. If there are, they must be in the minority, but at least the government itself is now putting them on the right path.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Why Linus is right (as usual)

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/11/why-linus-is-right-as-usual.html

People are debating this email from Linus Torvalds (maintainer of the Linux kernel). It has strong language, like:

Some security people have scoffed at me when I say that security
problems are primarily “just bugs”.
Those security people are f*cking morons.
Because honestly, the kind of security person who doesn’t accept that
security problems are primarily just bugs, I don’t want to work with.

I thought I’d explain why Linus is right.
Linus has an unwritten manifesto of how the Linux kernel should be maintained. It’s not written down in one place, instead we are supposed to reverse engineer it from his scathing emails, where he calls people morons for not understanding it. This is one such scathing email. The rules he’s expressing here are:
  • Large changes to the kernel should happen in small iterative steps, each one thoroughly debugged.
  • Minor security concerns aren’t major emergencies; they don’t allow bypassing the rules more than any other bug/feature.
Last year, some security “hardening” code was added to the kernel to prevent a class of buffer-overflow/out-of-bounds issues. This code didn’t address any particular 0day vulnerability, but was designed to prevent a class of future potential exploits from being exploited. This is reasonable.
This code had bugs, but that’s no sin. All code has bugs.
The sin, from Linus’s point of view, is that when an overflow/out-of-bounds access was detected, the code would kill the user-mode process or kernel. Linus thinks it should have only generated warnings, and let the offending code continue to run.
Of course, that would in theory make the change of little benefit, because it would no longer prevent 0days from being exploited.
But warnings would only be temporary, the first step. There are likely to be bugs in the large code change, and it would probably uncover bugs in other code. While bounds-checking is a security issue, its first implementation will always find existing code with latent bounds bugs. Or, it'll have "false-positives", triggering on things that aren't actually the flaws it's looking for. Killing things made these bugs worse, causing catastrophic failures in the latest kernel that didn't exist before. Warnings, however, would have equally highlighted the bugs, but without causing catastrophic failures. My car runs multiple copies of Linux — such catastrophic failures would risk my life.
Only after a year, when the bugs have been fixed, would the default behavior of the code be changed to kill buggy code, thus preventing exploitation.
In other words, large changes to the kernel should happen in small, manageable steps. This hardening hasn’t existed for 25 years of the Linux kernel, so there’s no emergency requiring it be added immediately rather than conservatively, no reason to bypass Linus’s development processes. There’s no reason it couldn’t have been warnings for a year while working out problems, followed by killing buggy code later.
Linus was correct here. No vuln has appeared in the last year that this code would’ve stopped, so the fact that it killed processes/kernels rather than generated warnings was unnecessary. Conversely, because it killed things, bugs in the kernel code were costly, and required emergency patches.
Despite his unreasonable tone, Linus is a hugely reasonable person. He's not trying to stop changes to the kernel. He's not trying to stop security improvements. He's not even trying to stop processes from getting killed. That's not why people are moronic. Instead, they are moronic for not understanding that large changes need to be made conservatively, and that security issues are no more important than any other feature/bug.

Update: Also, since most security people aren't developers, they are also a bit clueless about how things actually work. Bounds-checking, which they define as purely a security feature to stop buffer-overflows, is actually overwhelmingly a debugging feature. When you turn on bounds-checking for the first time, it'll trigger on a lot of latent bugs in the code — things that never caused a problem in the past (like reading past the ends of buffers) but cause trouble now. Developers know this; security "experts" tend not to. These kernel changes were made by security people who failed to understand this, who failed to realize that their changes would uncover lots of bugs in existing code, and that killing buggy code was hugely inappropriate.

Update: Another flaw developers are intimately familiar with is how “hardening” code can cause false-positives, triggering on non-buggy code. A good example is where the BIND9 code crashed on an improper assert(). This hardening code designed to prevent exploitation made things worse by triggering on valid input/code.

Update: No, it's probably not okay to call people "morons" as Linus does. They may be wrong, but they usually are reasonable people. On the other hand, security people tend to be sanctimonious bastards with rigid thinking, so after he has dealt with that minority, I can see why Linus treats all security people that way.

Kodi Addon Dev Says “Show of Force” Will Be Met With Defiance

Post Syndicated from Andy original https://torrentfreak.com/kodi-addon-dev-says-show-force-will-met-defiance-171119/

For many years, the members of the MPAA have flexed their muscles all around the globe, working to prevent people from engaging in online piracy. If the last 17 years' 'progress' is anything to go by, it's a war that will go on indefinitely.

With Columbia, Disney, Paramount, Twentieth Century Fox, Universal, and Warner on board, the MPAA has historically relied on sheer power to intimidate opponents. That has certainly worked in many large piracy cases but for many peripheral smaller-scale pirates, their presence is largely ignored.

This week, however, several players in the Kodi scene discovered that these giants – and more besides – have the ability to literally turn up at their front door. As reported Thursday, UK-based Kodi addon developer The_Alpha received a hand-delivered cease-and-desist letter from all of the above, accompanied by new faces Netflix, Amazon and Sky TV.

These companies are part of the Alliance for Creativity and Entertainment (ACE), a massive and recently-formed anti-piracy coalition comprised of 30 global entertainment brands. TorrentFreak reached out to The_Alpha for his thoughts on coming under such a dazzling spotlight but perhaps understandably he didn’t want to comment.

The leader of the Ares Project was willing to go on the record, however, after he too received a hand-delivered threat during the week. His decision was to immediately comply and shut down, but TF is informed that others might not be so willing to follow suit.

A Kodi addon developer living in the UK who spoke to us on condition of anonymity told us that most people operating in the scene expected some kind of trouble – just not on this scale.

“Did you see the [company logos] across the top of Alpha’s letter? That’s some serious shit right there. The film companies are no surprise but Amazon delivers my groceries so I don’t expect this shit from them,” he said.

When the ACE partnership was formed earlier this year, it seemed pretty clear that the main drive was towards the pooling of anti-piracy resources to be more effective and efficient. However, it can’t have escaped ACE that such a broad and powerful alliance could also have a profound psychological effect on its adversaries.

“There’s no doubt in my mind that they’re turning up mob-handed to put the shits up people like Alpha and the rest of us,” the developer said. “It’s hardly a fair dust-up is it? What have we got to fight back with, a giro [state benefits]? It’s a show of force, ‘look how important we are’!”

Interestingly, however, the dev told us that it isn’t necessarily the size of the coalition that has him most concerned. What caught his eye was the inclusion of two influential UK-based companies in the alliance.

“Having Sly [a local derogatory nickname for Sky TV] and the Premier League on the letter makes it much more serious to me than seeing Warner or whatever,” he commented.

“I don’t get involved in footie but Sly is everywhere round here and I think it’s something the Brit dev scene might take notice of, even if most say ‘fuck it’ and carry on anyway.”

When questioned whether that’s likely, our source said that while ACE might be able to tackle some of the bigger targets like Ares Project or Colossus, they fundamentally misunderstand how the Kodi scene works.

“If you want a good example of a scattered pirate scene, I give you Kodi. They can bomb the base or whatever but nobody lives there,” he explained.

“There’s some older blokes like me who can do without the stress but a lot of younger coders, builders and YouTubers who thrive on it. They’re used to running around council estates with real-life problems. A faffy letter from some toff in a suit means literally nothing. Like I said, all they have to lose is a giro.”

Whether this is just bravado will remain to be seen, but our earlier discussions with others in the scene indicate a particular weakness in the UK, with many players vulnerable to being found after failing to hide their identities in the past. To a point, our source agrees that this is a problem.

“People are saying that Alpha was found after trying to raise some charity money related to his disabled son but I don’t know for sure and nor does anybody else. What strikes me is that none of us really thought things would get this on top here because all you ever hear about is America this, Canada that, whatever. Does this means that more of us are getting done in England? You tell me,” he said.

Only time will tell but stamping out the pirate Kodi scene is going to be hard work.

Within hours of several projects disappearing Wednesday and Thursday, YouTube and myriad blogs were being flooded with guides detailing immediate replacements. This ad-hoc network of enthusiasts makes the exchange of information happen at an alarming rate and it’s hard to see how any company – no matter how powerful – will ever be able to keep up.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

The Truth Behind the “Kodi Boxes Can Kill Their Owners” Headlines

Post Syndicated from Andy original https://torrentfreak.com/the-truth-behind-the-kodi-boxes-can-kill-their-owners-headlines-171118/

Another week, another batch of ‘Kodi Box Armageddon’ stories. This time it hasn’t been directly about the content they can provide but the physical risks they pose to their owners.

After being primed in advance, the usual British tabloids jumped into action early Thursday, noting that following tests carried out on “illicit streaming devices” (aka Android set-top devices), 100% of them failed to meet UK national electrical safety regulations.

The tests were carried out by Electrical Safety First, a charity which was prompted into action by anti-piracy outfit Federation Against Copyright Theft.

“A series of product safety tests on popular illicit streaming devices entering the UK have found that 100% fail to meet national electrical safety regulations,” a FACT statement reads.

“The news is all the more significant as the Intellectual Property Office (IPO) estimates that more than one million of these illegal devices have been sold in the UK in the last two years, representing a significant risk to the general public.”

After reading many sensational headlines stating that “Kodi Boxes Might Kill Their Owners”, please excuse us for groaning. This story has absolutely nothing – NOTHING – to do with Kodi or any other piece of software. Quite obviously, software doesn’t catch fire.

So, suspecting that there might be more to this than meets the eye, we decided to look beyond the press releases into the actual Electrical Safety First (ESF) report. While we have no doubt that ESF is extremely competent in its field (it is, no question), the front page of its report is disappointing.

Despite the items sent for testing being straightforward Android-based media players, the ESF report clearly describes itself as examining “illicit streaming devices”. It’s terminology that doesn’t describe the subject matter from an electrical, safety or technical perspective but is pretty convenient for FACT clients Sky and the Premier League.

Nevertheless, the full picture reveals rather more than most of the headlines suggest.

First of all, it’s important to know that ESF tested just nine devices out of the million or so allegedly sold in the UK during the past two years. Even more importantly, every single one of those devices was supplied to ESF by FACT.

Now, we’re not suggesting they were hand-picked to fail but it’s clear that the samples weren’t provided from a neutral source. Also, as we’ll learn shortly, it’s possible to determine in advance if an item will fail to meet UK standards simply by looking at its packaging and casing.

But perhaps even more intriguing is that the electrical testing carried out by ESF related primarily not to the set-top boxes themselves, but to their power supplies. ESF say so themselves.

“The product review relates primarily to the switched mode power supply units for the connection to the mains supply, which were supplied with the devices, to identify any potential risks to consumers such as electric shocks, heating and resistance to fire,” ESF reports.

The set-top boxes themselves were only assessed “in terms of any faults in the marking, warnings and instructions,” the group adds.

So, what we're really talking about here isn't dangerous 'illicit streaming devices' (read: set-top boxes), but the power supply units that come with them. It might seem like a small detail but we'll come to the vast importance of this later on.

Firstly, however, we should note that none of the equipment supplied by FACT complied with Schedule 1 of the Electrical Equipment (Safety) Regulations 1994. This means that they failed to have the “Conformité Européene” or CE logo present. That’s unacceptable.

In addition, none of them lived up to the requirements of Schedule 3 of the Electrical Equipment (Safety) Regulations 1994 either, which in part requires the manufacturer's brand name or trademark to be "clearly printed on the electrical equipment or, where that is not possible, on the packaging." (That's how you can tell they'll definitely fail UK standards, before sending them for testing)

Also, none of the samples were supplied with “sufficient safety or warning information to ensure the safe and correct use, assembly, installation or maintenance of the equipment.” This represents ‘a technical breach’ of the regulations, ESF reports.

Finally, several of the samples were considered to be a potential risk to their users, either via electric shock and/or fire. That’s an important finding and people who suspect they have such devices at home should definitely take note.

However, the really important point isn’t mentioned in the tabloids, probably since it distracts from the “Kodi Armageddon” narrative which underlies the whole study and subsequent reports.

ESF says that one of the key issues is that the set-top boxes come unbranded, something which breaches safety regulations while making it difficult for consumers to assess whether they’re buying a quality product. Crucially, this is not exclusively a set-top box problem, it is much, MUCH bigger.

“Issues with power supply units or unbranded and counterfeit chargers go beyond illicit streaming devices. In the last year, issues have been reported with other consumer electrical devices, such as laptop chargers and counterfeit phone chargers,” the same ESF report reveals.

“The total annual online sales of mains plug-in chargers is estimated to be in the region of 1.8 million and according to Electrical Safety First, it is likely that most of these sales involve cheap, unbranded chargers.”

So, we looked into this issue of problem power supplies and chargers generally, to see where this report fits into the bigger picture. It transpires it’s a massive problem, all over the UK, across a wide range of products. In fact, Trading Standards reports that 99% of non-genuine Apple chargers bought online “fail a basic safety test”.

But buying from reputable High Street retailers doesn’t help either.

During the past year, Poundworld was fined for selling – wait for it – 72,000 dangerous chargers. Home Bargains was also fined for selling “thousands” of power adaptors that fail to meet UK standards.

“All samples provided failed to comply with Electrical Equipment Safety Regulations and were not marked with the manufacturer’s name,” Trading Standards reports.

That sounds familiar.

So, there you have it. Far from this being an isolated “Kodi Box Crisis” as some have proclaimed, this is a broad issue affecting imported electrical items in general. On this basis, one can’t help but think the tabloids missed a trick here. Think of the power of this headline:

ALL UNBRANDED ELECTRICAL EQUIPMENT CAN KILL, DISCONNECT EVERYTHING

or, alternatively:

PIRATES URGED TO SWITCH TO BRANDED AMAZON FIRESTICKS, SAFER FOR KODI

Perhaps not….

The ESF report can be found here (pdf)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Resume AWS Step Functions from Any State

Post Syndicated from Andy Katz original https://aws.amazon.com/blogs/compute/resume-aws-step-functions-from-any-state/


Yash Pant, Solutions Architect, AWS


Aaron Friedman, Partner Solutions Architect, AWS

When we discuss how to build applications with customers, we often align to the Well Architected Framework pillars of security, reliability, performance efficiency, cost optimization, and operational excellence. Designing for failure is an essential component to developing well architected applications that are resilient to spurious errors that may occur.

There are many ways you can use AWS services to achieve high availability and resiliency of your applications. For example, you can couple Elastic Load Balancing with Auto Scaling and Amazon EC2 instances to build highly available applications. Or use Amazon API Gateway and AWS Lambda to rapidly scale out a microservices-based architecture. Many AWS services have built in solutions to help with the appropriate error handling, such as Dead Letter Queues (DLQ) for Amazon SQS or retries in AWS Batch.

AWS Step Functions is an AWS service that makes it easy for you to coordinate the components of distributed applications and microservices. Step Functions allows you to easily design for failure, by incorporating features such as error retries and custom error handling from AWS Lambda exceptions. These features allow you to programmatically handle many common error modes and build robust, reliable applications.

In some rare cases, however, your application may fail in an unexpected manner. In these situations, you might not want to duplicate in a repeat execution those portions of your state machine that have already run. This is especially true when orchestrating long-running jobs or executing a complex state machine as part of a microservice. Here, you need to know the last successful state in your state machine from which to resume, so that you don’t duplicate previous work. In this post, we present a solution to enable you to resume from any given state in your state machine in the case of an unexpected failure.

Resuming from a given state

To resume a failed state machine execution from the state at which it failed, you first run a script that dynamically creates a new state machine. When the new state machine is executed, it resumes the failed execution from the point of failure. The script contains the following two primary steps:

  1. Parse the execution history of the failed execution to find the name of the state at which it failed, as well as the JSON input to that state.
  2. Create a new state machine, which adds an additional state to the failed state machine, called "GoToState". "GoToState" is a choice state at the beginning of the state machine that branches execution directly to the failed state, allowing you to skip states that had succeeded in the previous execution.

The full script along with a CloudFormation template that creates a demo of this is available in the aws-sfn-resume-from-any-state GitHub repo.

Diving into the script

In this section, we walk you through the script and highlight the core components of its functionality. The script contains a main function, which adds a command line parameter for the failedExecutionArn so that you can easily call the script from the command line:

python gotostate.py --failedExecutionArn '<Failed_Execution_Arn>'
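The full main function is in the GitHub repo linked above. As a rough sketch of what it might look like (the way the state machine ARN is derived from the execution ARN here is an assumption, and the two helper functions are the ones shown in the following sections):

import argparse
import json
import boto3

# The excerpts below assume a module-level Step Functions client like this one.
client = boto3.client('stepfunctions')

def main():
    parser = argparse.ArgumentParser(
        description='Create a copy of a state machine that can resume at the failed state.')
    parser.add_argument('--failedExecutionArn', required=True,
                        help='ARN of the failed Step Functions execution')
    args = parser.parse_args()

    # Find the failed state and its input, then build the new state machine.
    failedState, failedInput = parseFailureHistory(args.failedExecutionArn)

    # Derive the state machine ARN from the execution ARN (assumed format:
    # arn:aws:states:region:account:execution:StateMachineName:ExecutionName).
    stateMachineArn = ':'.join(
        args.failedExecutionArn.replace(':execution:', ':stateMachine:').split(':')[:-1])

    response = attachGoToState(failedState, stateMachineArn)
    print('New state machine ARN:', response['stateMachineArn'])
    print('Input to failed state:', json.dumps(json.loads(failedInput), indent=2))

if __name__ == '__main__':
    main()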

Identifying the failed state in your execution

First, the script extracts the name of the failed state along with the input to that state. It does so by using the failed state machine execution history, which is identified by the Amazon Resource Name (ARN) of the execution. The failed state is marked in the execution history, along with the input to that state (which is also the output of the preceding successful state). The script is able to parse these values from the log.

The script loops through the execution history of the failed state machine, and traces it backwards until it finds the failed state. If the state machine failed in a parallel state, then it must restart from the beginning of the parallel state. The script is able to capture the name of the parallel state that failed, rather than any substate within the parallel state that may have caused the failure. The following code is the Python function that does this.


def parseFailureHistory(failedExecutionArn):

    '''
    Parses the execution history of a failed state machine to get the name of failed state and the input to the failed state:
    Input failedExecutionArn = A string containing the execution ARN of a failed state machine
    Output = A list with two elements: [name of failed state, input to failed state]
    '''
    failedAtParallelState = False
    try:
        #Get the execution history
        response = client.get_execution_history(
            executionArn=failedExecutionArn,
            reverseOrder=True
        )
        failedEvents = response['events']
    except Exception as ex:
        raise ex
    #Confirm that the execution actually failed, raise exception if it didn't fail.
    try:
        failedEvents[0]['executionFailedEventDetails']
    except:
        raise Exception('Execution did not fail')
        
    '''
    If you have a 'States.Runtime' error (for example, if a task state in your state machine attempts to execute a Lambda function in a different region than the state machine), get the ID of the failed state, and use it to determine the failed state name and input.
    '''
    
    if failedEvents[0]['executionFailedEventDetails']['error'] == 'States.Runtime':
        # Extract the numeric event ID of the failed state from the error cause message
        failedId = int(''.join(filter(str.isdigit, str(failedEvents[0]['executionFailedEventDetails']['cause'].split()[13]))))
        failedState = failedEvents[-1 * failedId]['stateEnteredEventDetails']['name']
        failedInput = failedEvents[-1 * failedId]['stateEnteredEventDetails']['input']
        return (failedState, failedInput)
        
    '''
    You need to loop through the execution history, tracing back the executed steps.
    The first state you encounter is the failed state. If you failed on a parallel state, you need the name of the parallel state rather than the name of a state within a parallel state that it failed on. This is because you can only attach goToState to the parallel state, but not a substate within the parallel state.
    This loop starts with the ID of the latest event and uses the previous event IDs to trace back the execution to the beginning (id 0). However, it returns as soon it finds the name of the failed state.
    '''

    currentEventId = failedEvents[0]['id']
    while currentEventId != 0:
        #multiply event ID by -1 for indexing because you're looking at the reversed history
        currentEvent = failedEvents[-1 * currentEventId]
        
        '''
        You can determine that the failure occurred in a parallel state because an event with 'type'='ParallelStateFailed' appears in the execution history before the name of the failed state
        '''

        if currentEvent['type'] == 'ParallelStateFailed':
            failedAtParallelState = True

        '''
        If the failed state is not a parallel state, then the name of the failed state to return is the name of the state in the first 'TaskStateEntered' event you run into when tracing back the execution history
        '''

        if currentEvent['type'] == 'TaskStateEntered' and failedAtParallelState == False:
            failedState = currentEvent['stateEnteredEventDetails']['name']
            failedInput = currentEvent['stateEnteredEventDetails']['input']
            return (failedState, failedInput)

        '''
        If the failed state was a parallel state, then you need to trace execution back to the first event with 'type'='ParallelStateEntered', and return the name of the state
        '''

        if currentEvent['type'] == 'ParallelStateEntered' and failedAtParallelState:
            failedState = currentEvent['stateEnteredEventDetails']['name']
            failedInput = currentEvent['stateEnteredEventDetails']['input']
            return (failedState, failedInput)
        #Update the ID for the next execution of the loop
        currentEventId = currentEvent['previousEventId']
        

Create the new state machine

The script uses the name of the failed state to create the new state machine, with "GoToState" branching execution directly to the failed state.

To do this, the script requires the Amazon States Language (ASL) definition of the failed state machine. It modifies the definition to append "GoToState", and creates a new state machine from it.

The script derives the ARN of the failed state machine from the execution ARN of the failed execution. This ARN allows it to retrieve the ASL definition of the state machine by calling the DescribeStateMachine API action. It then creates a new state machine that includes "GoToState".

When the script creates the new state machine, it also adds an additional input variable called "resuming". When you execute this new state machine, you specify this resuming variable as true in the input JSON. This tells "GoToState" to branch execution to the state that had previously failed. Here’s the function that does this:

def attachGoToState(failedStateName, stateMachineArn):

    '''
    Given a state machine ARN and the name of a state in that state machine, create a new state machine that starts at a new choice state called 'GoToState'. "GoToState" branches to the named state, and sends the input of the state machine to that state, when a variable called "resuming" is set to True.
    Input failedStateName = A string with the name of the failed state
          stateMachineArn = A string with the ARN of the state machine
    Output response from the create_state_machine call, which is the API call that creates a new state machine
    '''

    try:
        response = client.describe_state_machine(
            stateMachineArn=stateMachineArn
        )
    except:
        raise Exception('Could not get ASL definition of state machine')
    roleArn = response['roleArn']
    stateMachine = json.loads(response['definition'])
    #Create a name for the new state machine
    newName = response['name'] + '-with-GoToState'
    #Get the StartAt state for the original state machine, because you point the 'GoToState' to this state
    originalStartAt = stateMachine['StartAt']

    '''
    Create the GoToState with the variable $.resuming.
    If new state machine is executed with $.resuming = True, then the state machine skips to the failed state.
    Otherwise, it executes the state machine from the original start state.
    '''

    goToState = {'Type':'Choice', 'Choices':[{'Variable':'$.resuming', 'BooleanEquals':False, 'Next':originalStartAt}], 'Default':failedStateName}
    #Add GoToState to the set of states in the new state machine
    stateMachine['States']['GoToState'] = goToState
    #Add StartAt
    stateMachine['StartAt'] = 'GoToState'
    #Create new state machine
    try:
        response = client.create_state_machine(
            name=newName,
            definition=json.dumps(stateMachine),
            roleArn=roleArn
        )
    except:
        raise Exception('Failed to create new state machine with GoToState')
    return response

Testing the script

Now that you understand how the script works, you can test it out.

The following screenshot shows an example state machine that has failed, called "TestMachine". This state machine successfully completed "FirstState" and "ChoiceState", but when it branched to "FirstMatchState", it failed.

Use the script to create a new state machine that allows you to rerun this state machine, but skip the "FirstState" and the "ChoiceState" steps that already succeeded. You can do this by calling the script as follows:

python gotostate.py --failedExecutionArn 'arn:aws:states:us-west-2:<AWS_ACCOUNT_ID>:execution:TestMachine:b2578403-f41d-a2c7-e70c-7500045288595'

This creates a new state machine called "TestMachine-with-GoToState", and returns its ARN, along with the input that had been sent to "FirstMatchState". You can then inspect the input to determine what caused the error. In this case, you notice that the input to "FirstMatchState" was the following:

{
"foo": 1,
"Message": true
}

However, this state machine expects the "Message" field of the JSON to be a string rather than a Boolean. Execute the new "TestMachine-with-GoToState" state machine, change the input to be a string, and add the "resuming" variable that "GoToState" requires:

{
"foo": 1,
"Message": "Hello!",
"resuming":true
}

When you execute the new state machine, it skips "FirstState" and "ChoiceState", and goes directly to "FirstMatchState", which was the state that failed:
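You can start the new state machine from the Step Functions console, or script it. A small sketch with Boto3 follows; the state machine ARN below is a placeholder, so substitute the ARN returned by the script.

import json
import boto3

sfn = boto3.client('stepfunctions')

# Start the new state machine with the corrected input plus the 'resuming' flag.
response = sfn.start_execution(
    stateMachineArn='arn:aws:states:us-west-2:<AWS_ACCOUNT_ID>:stateMachine:TestMachine-with-GoToState',
    input=json.dumps({
        'foo': 1,
        'Message': 'Hello!',
        'resuming': True
    })
)
print(response['executionArn'])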

Look at what happens when you have a state machine with multiple parallel steps. This example is included in the GitHub repository associated with this post. The repo contains a CloudFormation template that sets up this state machine and provides instructions to replicate this solution.

The following state machine, "ParallelStateMachine", takes an input through two subsequent parallel states before doing some final processing and exiting. The JSON below shows the ASL definition of the state machine.

{
  "Comment": "An example of the Amazon States Language using a parallel state to execute two branches at the same time.",
  "StartAt": "Parallel",
  "States": {
    "Parallel": {
      "Type": "Parallel",
      "ResultPath":"$.output",
      "Next": "Parallel 2",
      "Branches": [
        {
          "StartAt": "Parallel Step 1, Process 1",
          "States": {
            "Parallel Step 1, Process 1": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:LambdaA",
              "End": true
            }
          }
        },
        {
          "StartAt": "Parallel Step 1, Process 2",
          "States": {
            "Parallel Step 1, Process 2": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:LambdaA",
              "End": true
            }
          }
        }
      ]
    },
    "Parallel 2": {
      "Type": "Parallel",
      "Next": "Final Processing",
      "Branches": [
        {
          "StartAt": "Parallel Step 2, Process 1",
          "States": {
            "Parallel Step 2, Process 1": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXXX:function:LambdaB",
              "End": true
            }
          }
        },
        {
          "StartAt": "Parallel Step 2, Process 2",
          "States": {
            "Parallel Step 2, Process 2": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:LambdaB",
              "End": true
            }
          }
        }
      ]
    },
    "Final Processing": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:LambdaC",
      "End": true
    }
  }
}

First, use an input that initially fails:

{
  "Message": "Hello!"
}

This fails because the second parallel state expects the input JSON to contain a variable called "foo" in order to run "Parallel Step 2, Process 1" and "Parallel Step 2, Process 2". Instead, the original input gets processed by the first parallel state and produces the following output to pass to the second parallel state:

{
"output": [
    {
      "Message": "Hello!"
    },
    {
      "Message": "Hello!"
    }
  ]
}

Run the script on the failed state machine to create a new state machine that allows it to resume directly at the second parallel state instead of having to redo the first parallel state. This creates a new state machine called "ParallelStateMachine-with-GoToState". The following JSON was created by the script to define the new state machine in ASL. It contains the "GoToState" value that was attached by the script.

{
   "Comment":"An example of the Amazon States Language using a parallel state to execute two branches at the same time.",
   "States":{
      "Final Processing":{
         "Resource":"arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:LambdaC",
         "End":true,
         "Type":"Task"
      },
      "GoToState":{
         "Default":"Parallel 2",
         "Type":"Choice",
         "Choices":[
            {
               "Variable":"$.resuming",
               "BooleanEquals":false,
               "Next":"Parallel"
            }
         ]
      },
      "Parallel":{
         "Branches":[
            {
               "States":{
                  "Parallel Step 1, Process 1":{
                     "Resource":"arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:LambdaA",
                     "End":true,
                     "Type":"Task"
                  }
               },
               "StartAt":"Parallel Step 1, Process 1"
            },
            {
               "States":{
                  "Parallel Step 1, Process 2":{
                     "Resource":"arn:aws:lambda:us-west-2:XXXXXXXXXXXX:LambdaA",
                     "End":true,
                     "Type":"Task"
                  }
               },
               "StartAt":"Parallel Step 1, Process 2"
            }
         ],
         "ResultPath":"$.output",
         "Type":"Parallel",
         "Next":"Parallel 2"
      },
      "Parallel 2":{
         "Branches":[
            {
               "States":{
                  "Parallel Step 2, Process 1":{
                     "Resource":"arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:LambdaB",
                     "End":true,
                     "Type":"Task"
                  }
               },
               "StartAt":"Parallel Step 2, Process 1"
            },
            {
               "States":{
                  "Parallel Step 2, Process 2":{
                     "Resource":"arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:LambdaB",
                     "End":true,
                     "Type":"Task"
                  }
               },
               "StartAt":"Parallel Step 2, Process 2"
            }
         ],
         "Type":"Parallel",
         "Next":"Final Processing"
      }
   },
   "StartAt":"GoToState"
}

You can then execute this state machine with the correct input by adding the "foo" and "resuming" variables:

{
  "foo": 1,
  "output": [
    {
      "Message": "Hello!"
    },
    {
      "Message": "Hello!"
    }
  ],
  "resuming": true
}

This yields the following result. Notice that this time, the state machine executed successfully to completion, and skipped the steps that had previously failed.


Conclusion

When you’re building out complex workflows, it’s important to be prepared for failure. You can do this by taking advantage of features such as automatic error retries in Step Functions and custom error handling of Lambda exceptions.

Nevertheless, state machines still have the possibility of failing. With the methodology and script presented in this post, you can resume a failed state machine from its point of failure. This allows you to skip the execution of steps in the workflow that had already succeeded, and recover the process from the point of failure.

To see more examples, please visit the Step Functions Getting Started page.

If you have questions or suggestions, please comment below.

Event-Driven Computing with Amazon SNS and AWS Compute, Storage, Database, and Networking Services

Post Syndicated from Christie Gifrin original https://aws.amazon.com/blogs/compute/event-driven-computing-with-amazon-sns-compute-storage-database-and-networking-services/

Contributed by Otavio Ferreira, Manager, Software Development, AWS Messaging

Like other developers around the world, you may be tackling increasingly complex business problems. A key success factor, in that case, is the ability to break down a large project scope into smaller, more manageable components. A service-oriented architecture guides you toward designing systems as a collection of loosely coupled, independently scaled, and highly reusable services. Microservices take this even further. To improve performance and scalability, they promote fine-grained interfaces and lightweight protocols.

However, the communication among isolated microservices can be challenging. Services are often deployed onto independent servers and don’t share any compute or storage resources. Also, you should avoid hard dependencies among microservices, to preserve maintainability and reusability.

If you apply the pub/sub design pattern, you can effortlessly decouple and independently scale out your microservices and serverless architectures. A pub/sub messaging service, such as Amazon SNS, promotes event-driven computing that statically decouples event publishers from subscribers, while dynamically allowing for the exchange of messages between them. An event-driven architecture also introduces the responsiveness needed to deal with complex problems, which are often unpredictable and asynchronous.
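To make the pattern concrete, here is a minimal sketch of pub/sub with Boto3. The topic and queue names are made up for illustration; note also that, in practice, the queue needs an access policy that allows the topic to deliver messages to it.

import boto3

sns = boto3.client('sns')
sqs = boto3.client('sqs')

# The publisher side only needs to know the topic, not who is subscribed to it.
topic_arn = sns.create_topic(Name='order-events')['TopicArn']

# A subscriber service (here represented by an SQS queue) attaches itself to the topic.
queue_url = sqs.create_queue(QueueName='billing-service')['QueueUrl']
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url, AttributeNames=['QueueArn'])['Attributes']['QueueArn']
sns.subscribe(TopicArn=topic_arn, Protocol='sqs', Endpoint=queue_arn)

# Publishing an event fans out to every subscriber, without the publisher changing.
sns.publish(TopicArn=topic_arn, Message='{"orderId": 42, "status": "CREATED"}')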

What is event-driven computing?

Given the context of microservices, event-driven computing is a model in which subscriber services automatically perform work in response to events triggered by publisher services. This paradigm can be applied to automate workflows while decoupling the services that collectively and independently work to fulfil these workflows. Amazon SNS is an event-driven computing hub, in the AWS Cloud, that has native integration with several AWS publisher and subscriber services.

Which AWS services publish events to SNS natively?

Several AWS services have been integrated as SNS publishers and, therefore, can natively trigger event-driven computing for a variety of use cases. In this post, I specifically cover AWS compute, storage, database, and networking services, as covered below.

Compute services

  • Auto Scaling: Helps you ensure that you have the correct number of Amazon EC2 instances available to handle the load for your application. You can configure Auto Scaling lifecycle hooks to trigger events, as Auto Scaling resizes your EC2 cluster. As an example, you may want to warm up the local cache store on newly launched EC2 instances, and also download log files from other EC2 instances that are about to be terminated. To make this happen, set an SNS topic as your Auto Scaling group’s notification target, then subscribe two Lambda functions to this SNS topic. The first function is responsible for handling scale-out events (to warm up cache upon provisioning), whereas the second is in charge of handling scale-in events (to download logs upon termination). A rough sketch of this wiring appears after this list.

  • AWS Elastic Beanstalk: An easy-to-use service for deploying and scaling web applications and web services developed in a number of programming languages. You can configure event notifications for your Elastic Beanstalk environment so that notable events can be automatically published to an SNS topic, then pushed to topic subscribers. As an example, you may use this event-driven architecture to coordinate your continuous integration pipeline (such as Jenkins CI). That way, whenever an environment is created, Elastic Beanstalk publishes this event to an SNS topic, which triggers a subscribing Lambda function, which then kicks off a CI job against your newly created Elastic Beanstalk environment.

  • Elastic Load Balancing: Automatically distributes incoming application traffic across Amazon EC2 instances, containers, or other resources identified by IP addresses. You can configure CloudWatch alarms on Elastic Load Balancing metrics, to automate the handling of events derived from Classic Load Balancers. As an example, you may leverage this event-driven design to automate latency profiling in an Amazon ECS cluster behind a Classic Load Balancer. In this example, whenever your ECS cluster breaches your load balancer latency threshold, an event is posted by CloudWatch to an SNS topic, which then triggers a subscribing Lambda function. This function runs a task on your ECS cluster to trigger a latency profiling tool, hosted on the cluster itself. This can enhance your latency troubleshooting exercise by making it timely.
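As a rough illustration of the Auto Scaling wiring described above, the following boto3 sketch creates an SNS topic, subscribes a Lambda function to it, and points an Auto Scaling group at the topic. The group name and function ARN are assumptions you would replace with your own resources, and the lambda add_permission call that lets SNS invoke the function is omitted for brevity:

import boto3

sns = boto3.client('sns')
autoscaling = boto3.client('autoscaling')

# Create (or look up) the topic that will receive Auto Scaling events.
topic_arn = sns.create_topic(Name='asg-lifecycle-events')['TopicArn']

# Subscribe a Lambda function that warms the cache on scale-out events.
# (A real setup also needs lambda:AddPermission so SNS can invoke it.)
sns.subscribe(
    TopicArn=topic_arn,
    Protocol='lambda',
    Endpoint='arn:aws:lambda:us-west-2:123456789012:function:WarmCacheOnLaunch'  # placeholder
)

# Ask the Auto Scaling group to publish launch and terminate events to the topic.
autoscaling.put_notification_configuration(
    AutoScalingGroupName='my-web-asg',  # placeholder group name
    TopicARN=topic_arn,
    NotificationTypes=[
        'autoscaling:EC2_INSTANCE_LAUNCH',
        'autoscaling:EC2_INSTANCE_TERMINATE'
    ]
)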

Storage services

  • Amazon S3: Object storage built to store and retrieve any amount of data. You can enable S3 event notifications, and automatically get them posted to SNS topics, to automate a variety of workflows. For instance, imagine that you have an S3 bucket to store incoming resumes from candidates, and a fleet of EC2 instances to encode these resumes from their original format (such as Word or text) into a portable format (such as PDF). In this example, whenever new files are uploaded to your input bucket, S3 publishes these events to an SNS topic, which in turn pushes these messages into subscribing SQS queues. Then, encoding workers running on EC2 instances poll these messages from the SQS queues; retrieve the original files from the input S3 bucket; encode them into PDF; and finally store them in an output S3 bucket. A sketch of this fan-out wiring appears after this list.

  • Amazon EFS: Provides simple and scalable file storage, for use with Amazon EC2 instances, in the AWS Cloud. You can configure CloudWatch alarms on EFS metrics, to automate the management of your EFS systems. For example, consider a highly parallelized genomics analysis application that runs against an EFS system. By default, this file system is instantiated on the “General Purpose” performance mode. Although this performance mode allows for lower latency, it might eventually impose a scaling bottleneck. Therefore, you may leverage an event-driven design to handle it automatically. Basically, as soon as the EFS metric “Percent I/O Limit” breaches 95%, CloudWatch could post this event to an SNS topic, which in turn would push this message into a subscribing Lambda function. This function automatically creates a new file system, this time on the “Max I/O” performance mode, then switches the genomics analysis application to this new file system. As a result, your application starts experiencing higher I/O throughput rates.

  • Amazon Glacier: A secure, durable, and low-cost cloud storage service for data archiving and long-term backup. You can set a notification configuration on an Amazon Glacier vault so that when a job completes, a message is published to an SNS topic. Retrieving an archive from Amazon Glacier is a two-step asynchronous operation, in which you first initiate a job, and then download the output after the job completes. Therefore, SNS saves you from having to poll your Amazon Glacier vault to check whether your job has completed. As usual, you may subscribe SQS queues, Lambda functions, and HTTP endpoints to your SNS topic, to be notified when your Amazon Glacier job is done.

  • AWS Snowball: A petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data. You can leverage Snowball notifications to automate workflows related to importing data into and exporting data from AWS. More specifically, whenever your Snowball job status changes, Snowball can publish this event to an SNS topic, which in turn can broadcast the event to all its subscribers. As an example, imagine a Geographic Information System (GIS) that distributes high-resolution satellite images to users via Web browser. In this example, the GIS vendor could capture up to 80 TB of satellite images; create a Snowball job to import these files from an on-premises system to an S3 bucket; and provide an SNS topic ARN to be notified upon job status changes in Snowball. After Snowball changes the job status from “Importing” to “Completed”, Snowball publishes this event to the specified SNS topic, which delivers this message to a subscribing Lambda function, which finally creates a CloudFront web distribution for the target S3 bucket, to serve the images to end users.
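For the resume-encoding example above, a minimal sketch of the S3-to-SNS-to-SQS fan-out with boto3 could look like this. The bucket, topic, and queue are placeholders, and the SNS topic policy that allows S3 to publish, plus the SQS queue policy that allows SNS to deliver, are omitted for brevity:

import boto3

s3 = boto3.client('s3')
sns = boto3.client('sns')

topic_arn = 'arn:aws:sns:us-west-2:123456789012:resume-uploads'   # placeholder
queue_arn = 'arn:aws:sqs:us-west-2:123456789012:resume-encoding'  # placeholder

# Fan the topic out to the queue that the EC2 encoding workers poll.
sns.subscribe(TopicArn=topic_arn, Protocol='sqs', Endpoint=queue_arn)

# Publish an event to the topic whenever a new object lands in the bucket.
s3.put_bucket_notification_configuration(
    Bucket='incoming-resumes',  # placeholder bucket name
    NotificationConfiguration={
        'TopicConfigurations': [
            {
                'TopicArn': topic_arn,
                'Events': ['s3:ObjectCreated:*']
            }
        ]
    }
)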

Database services

  • Amazon RDS: Makes it easy to set up, operate, and scale a relational database in the cloud. RDS leverages SNS to broadcast notifications when RDS events occur. As usual, these notifications can be delivered via any protocol supported by SNS, including SQS queues, Lambda functions, and HTTP endpoints. As an example, imagine that you own a social network website that has experienced organic growth, and needs to scale its compute and database resources on demand. In this case, you could provide an SNS topic to listen to RDS DB instance events. When the “Low Storage” event is published to the topic, SNS pushes this event to a subscribing Lambda function, which in turn leverages the RDS API to increase the storage capacity allocated to your DB instance. The provisioning itself takes place within the specified DB maintenance window. A sketch of this event subscription appears after this list.

  • Amazon ElastiCache: A web service that makes it easy to deploy, operate, and scale an in-memory data store or cache in the cloud. ElastiCache can publish messages using Amazon SNS when significant events happen on your cache cluster. This feature can be used to refresh the list of servers on client machines connected to individual cache node endpoints of a cache cluster. For instance, an ecommerce website fetches product details from a cache cluster, with the goal of offloading a relational database and speeding up page load times. Ideally, you want to make sure that each web server always has an updated list of cache servers to which to connect. To automate this node discovery process, you can get your ElastiCache cluster to publish events to an SNS topic. Thus, when ElastiCache event “AddCacheNodeComplete” is published, your topic then pushes this event to all subscribing HTTP endpoints that serve your ecommerce website, so that these HTTP servers can update their list of cache nodes.

  • Amazon Redshift: A fully managed data warehouse that makes it simple to analyze data using standard SQL and BI (Business Intelligence) tools. Amazon Redshift uses SNS to broadcast relevant events so that data warehouse workflows can be automated. As an example, imagine a news website that sends clickstream data to a Kinesis Firehose stream, which then loads the data into Amazon Redshift, so that popular news and reading preferences might be surfaced on a BI tool. At some point though, this Amazon Redshift cluster might need to be resized, and the cluster enters a read-only mode. Hence, this Amazon Redshift event is published to an SNS topic, which delivers this event to a subscribing Lambda function, which finally deletes the corresponding Kinesis Firehose delivery stream, so that clickstream data uploads can be put on hold. At a later point, after Amazon Redshift publishes the event that the maintenance window has been closed, SNS notifies a subscribing Lambda function accordingly, so that this function can re-create the Kinesis Firehose delivery stream, and resume clickstream data uploads to Amazon Redshift.

  • AWS DMS: Helps you migrate databases to AWS quickly and securely. The source database remains fully operational during the migration, minimizing downtime to applications that rely on the database. DMS also uses SNS to provide notifications when DMS events occur, which can automate database migration workflows. As an example, you might create data replication tasks to migrate an on-premises MS SQL database, composed of multiple tables, to MySQL. Thus, if replication tasks fail due to incompatible data encoding in the source tables, these events can be published to an SNS topic, which can push these messages into a subscribing SQS queue. Then, encoders running on EC2 can poll these messages from the SQS queue, encode the source tables into a compatible character set, and restart the corresponding replication tasks in DMS. This is an event-driven approach to a self-healing database migration process.
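To illustrate the RDS use case above, the sketch below creates an RDS event subscription that publishes low storage and availability events for a single DB instance to an SNS topic. The subscription name, topic ARN, and instance identifier are placeholders:

import boto3

rds = boto3.client('rds')

# Publish storage and availability events for one DB instance to SNS, where a
# subscribing Lambda function can react (for example, by calling the RDS API
# to increase the allocated storage).
rds.create_event_subscription(
    SubscriptionName='social-db-storage-events',                     # placeholder
    SnsTopicArn='arn:aws:sns:us-west-2:123456789012:rds-db-events',  # placeholder
    SourceType='db-instance',
    SourceIds=['social-network-db'],                                  # placeholder DB instance identifier
    EventCategories=['low storage', 'availability'],
    Enabled=True
)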

Networking services

  • Amazon Route 53: A highly available and scalable cloud-based DNS (Domain Name System). Route 53 health checks monitor the health and performance of your web applications, web servers, and other resources. You can set CloudWatch alarms and get automated Amazon SNS notifications when the status of your Route 53 health check changes. As an example, imagine an online payment gateway that reports the health of its platform to merchants worldwide, via a status page. This page is hosted on EC2 and fetches platform health data from DynamoDB. In this case, you could configure a CloudWatch alarm for your Route 53 health check, so that when the alarm threshold is breached, and the payment gateway is no longer considered healthy, then CloudWatch publishes this event to an SNS topic, which pushes this message to a subscribing Lambda function, which finally updates the DynamoDB table that populates the status page. This event-driven approach avoids any kind of manual update to the status page visited by merchants. A sketch of this alarm configuration appears after this list.

  • AWS Direct Connect (AWS DX): Makes it easy to establish a dedicated network connection from your premises to AWS, which can reduce your network costs, increase bandwidth throughput, and provide a more consistent network experience than Internet-based connections. You can monitor physical DX connections using CloudWatch alarms, and send SNS messages when alarms change their status. As an example, when a DX connection state shifts to 0 (zero), indicating that the connection is down, this event can be published to an SNS topic, which can fan out this message to impacted servers through HTTP endpoints, so that they might reroute their traffic through a different connection instead. This is an event-driven approach to connectivity resilience.
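The Route 53 example above hinges on a CloudWatch alarm tied to a health check. A minimal boto3 sketch might look like the following, where the health check ID and topic ARN are placeholders (note that Route 53 health check metrics are published in the us-east-1 region):

import boto3

cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')

# Alarm when the health check reports unhealthy, and notify an SNS topic whose
# Lambda subscriber updates the DynamoDB-backed status page.
cloudwatch.put_metric_alarm(
    AlarmName='payment-gateway-health',
    Namespace='AWS/Route53',
    MetricName='HealthCheckStatus',
    Dimensions=[{'Name': 'HealthCheckId', 'Value': '11111111-2222-3333-4444-555555555555'}],  # placeholder
    Statistic='Minimum',
    Period=60,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator='LessThanThreshold',
    AlarmActions=['arn:aws:sns:us-east-1:123456789012:gateway-status-events']  # placeholder
)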

More event-driven computing on AWS

In addition to SNS, event-driven computing is also addressed by Amazon CloudWatch Events, which delivers a near real-time stream of system events that describe changes in AWS resources. With CloudWatch Events, you can route each event type to one or more targets, including Lambda functions, SNS topics, SQS queues, Kinesis streams, and Step Functions state machines.

Many AWS services publish events to CloudWatch. As an example, you can get CloudWatch Events to capture events on your ETL (Extract, Transform, Load) jobs running on AWS Glue and push failed ones to an SQS queue, so that you can retry them later.

Conclusion

Amazon SNS is a pub/sub messaging service that can be used as an event-driven computing hub for AWS customers worldwide. By capturing events natively triggered by AWS services, such as EC2, S3, and RDS, you can automate and optimize all kinds of workflows, including scaling, testing, encoding, profiling, broadcasting, discovery, failover, and much more. The business use cases presented in this post ranged from recruiting websites to scientific research, geographic systems, social networks, retail websites, and news portals.

Start now by visiting Amazon SNS in the AWS Management Console, or by trying the AWS 10-Minute Tutorial, Send Fan-out Event Notifications with Amazon SNS and Amazon SQS.

 

Capturing Custom, High-Resolution Metrics from Containers Using AWS Step Functions and AWS Lambda

Post Syndicated from Nathan Taber original https://aws.amazon.com/blogs/compute/capturing-custom-high-resolution-metrics-from-containers-using-aws-step-functions-and-aws-lambda/

Contributed by Trevor Sullivan, AWS Solutions Architect

When you deploy containers with Amazon ECS, are you gathering all of the key metrics so that you can correctly monitor the overall health of your ECS cluster?

By default, ECS writes metrics to Amazon CloudWatch in 5-minute increments. For complex or large services, this may not be sufficient to make scaling decisions quickly. You may want to respond immediately to changes in workload or to identify application performance problems. Last July, CloudWatch announced support for high-resolution metrics, with granularity down to one second.

These high-resolution metrics can be used to give you a clearer picture of the load and performance for your applications, containers, clusters, and hosts. In this post, I discuss how you can use AWS Step Functions, along with AWS Lambda, to cost effectively record high-resolution metrics into CloudWatch. You implement this solution using a serverless architecture, which keeps your costs low and makes it easier to troubleshoot the solution.

To show how this works, you retrieve some useful metric data from an ECS cluster running in the same AWS account and region (Oregon, us-west-2) as the Step Functions state machine and Lambda function. However, you can use this architecture to retrieve any custom application metrics from any resource in any AWS account and region.

Why Step Functions?

Step Functions enables you to orchestrate multi-step tasks in the AWS Cloud that run for any period of time, up to a year. Effectively, you’re building a blueprint for an end-to-end process. After it’s built, you can execute the process as many times as you want.

For this architecture, you gather metrics from an ECS cluster, every five seconds, and then write the metric data to CloudWatch. After your ECS cluster metrics are stored in CloudWatch, you can create CloudWatch alarms to notify you. An alarm can also trigger an automated remediation activity such as scaling ECS services, when a metric exceeds a threshold defined by you.

When you build a Step Functions state machine, you define the different states inside it as JSON objects. The bulk of the work in Step Functions is handled by the common task state, which invokes Lambda functions or Step Functions activities. There is also a built-in library of other useful states that allow you to control the execution flow of your program.

One of the most useful state types in Step Functions is the parallel state. Each parallel state in your state machine can have one or more branches, each of which is executed in parallel. Another useful state type is the wait state, which waits for a period of time before moving to the next state.

In this walkthrough, you combine these three states (parallel, wait, and task) to create a state machine that triggers a Lambda function, which then gathers metrics from your ECS cluster.

Step Functions pricing

This state machine is executed every minute, resulting in 60 executions per hour and 1,440 executions per day. Step Functions is billed per state transition, including the Start and End state transitions, which gives you approximately 37,440 state transitions per day. To reach this number, I'm using the following estimate:

26 state transitions per execution x 60 minutes x 24 hours

Based on current pricing, at $0.000025 per state transition, the daily cost of this metric gathering state machine would be $0.936.
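If you want to double-check that arithmetic, the same estimate works out as follows in Python:

transitions_per_execution = 26
executions_per_day = 60 * 24          # one execution per minute
price_per_transition = 0.000025       # USD, pricing at the time of writing

daily_transitions = transitions_per_execution * executions_per_day
daily_cost = daily_transitions * price_per_transition

print(daily_transitions)        # 37440
print(round(daily_cost, 3))     # 0.936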

Step Functions offers a perpetual free tier of 4,000 state transitions every month. This benefit is available to all customers, not just customers who are still under the 12-month AWS Free Tier. For more information and cost example scenarios, see Step Functions pricing.

Why Lambda?

The goal is to capture metrics from an ECS cluster, and write the metric data to CloudWatch. This is a straightforward, short-running process that makes Lambda the perfect place to run your code. Lambda is one of the key services that makes up “Serverless” application architectures. It enables you to consume compute capacity only when your code is actually executing.

The process of gathering metric data from ECS and writing it to CloudWatch takes a short period of time. In fact, my average Lambda function execution time, while developing this post, was only about 250 milliseconds. For every five-second interval that occurs, I'm only using 1/20th of the compute time that I'd otherwise be paying for.

Lambda pricing

For billing purposes, Lambda execution time is rounded up to the nearest 100-ms interval. In general, based on the metrics that I observed during development, a 250-ms runtime would be billed at 300 ms. Here, I calculate the cost of this Lambda function executing over the course of a 31-day month.

Assuming 31 days in each month, there would be 535,680 five-second intervals (31 days x 24 hours x 60 minutes x 12 five-second intervals = 535,680). The Lambda function is invoked every five-second interval, by the Step Functions state machine, and runs for a 300-ms period. At current Lambda pricing, for a 128-MB function, you would be paying approximately the following:

Total compute

Total executions = 535,680
Total compute = total executions x (3 x $0.000000208 per 100 ms) = $0.334 per month

Total requests

Total requests = (535,680 / 1,000,000) x $0.20 per million requests = $0.11 per month

Total Lambda Cost

$0.11 requests + $0.334 compute time = approximately $0.44 per month (about $0.014 per day)
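The same Lambda estimate, worked through in Python using the monthly invocation count from above:

invocations_per_month = 31 * 24 * 60 * 12      # five-second intervals in a 31-day month
billed_ms_per_invocation = 300                  # 250 ms rounded up to the next 100-ms increment
price_per_100ms_128mb = 0.000000208             # USD, pricing at the time of writing
price_per_million_requests = 0.20               # USD

compute_cost = invocations_per_month * (billed_ms_per_invocation / 100) * price_per_100ms_128mb
request_cost = invocations_per_month / 1000000 * price_per_million_requests

print(invocations_per_month)                    # 535680
print(round(compute_cost, 3))                   # ~0.334
print(round(request_cost, 3))                   # ~0.107
print(round(compute_cost + request_cost, 2))    # ~0.44 per month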

Similar to Step Functions, Lambda offers a perpetual free tier. For more information, see Lambda Pricing.

Walkthrough

In the following sections, I step through the process of configuring the solution just discussed. If you follow along, at a high level, you will:

  • Configure an IAM role and policy
  • Create a Step Functions state machine to control metric gathering execution
  • Create a metric-gathering Lambda function
  • Configure a CloudWatch Events rule to trigger the state machine
  • Validate the solution

Prerequisites

You should already have an AWS account with a running ECS cluster. If you don’t have one running, you can easily deploy a Docker container on an ECS cluster using the AWS Management Console. In the example produced for this post, I use an ECS cluster running Windows Server (currently in beta), but either a Linux or Windows Server cluster works.

Create an IAM role and policy

First, create an IAM role and policy that enables Step Functions, Lambda, and CloudWatch to communicate with each other.

  • The CloudWatch Events rule needs permissions to trigger the Step Functions state machine.
  • The Step Functions state machine needs permissions to trigger the Lambda function.
  • The Lambda function needs permissions to query ECS and then write to CloudWatch Logs and metrics.

When you create the state machine, Lambda function, and CloudWatch Events rule, you assign this role to each of those resources. Upon execution, each of these resources assumes the specified role and executes using the role’s permissions.

  1. Open the IAM console.
  2. Choose Roles, Create New Role.
  3. For Role Name, enter WriteMetricFromStepFunction.
  4. Choose Save.

Create the IAM role trust relationship
The trust relationship (also known as the assume role policy document) for your IAM role looks like the following JSON document. As you can see from the document, your IAM role needs to trust the Lambda, CloudWatch Events, and Step Functions services. By configuring your role to trust these services, they can assume this role and inherit the role permissions.

  1. Open the IAM console.
  2. Choose Roles and select the IAM role previously created.
  3. Choose Trust Relationships, Edit Trust Relationships.
  4. Enter the following trust policy text and choose Save.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "events.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "states.us-west-2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Create an IAM policy

After you’ve finished configuring your role’s trust relationship, grant the role access to the other AWS resources that make up the solution.

The IAM policy is what gives your IAM role permissions to access various resources. You must explicitly whitelist the specific resources to which your role has access, because the default IAM behavior is to deny access to AWS resources.

I’ve tried to keep this policy document as generic as possible, without allowing permissions to be too open. If the name of your ECS cluster is different than the one in the example policy below, make sure that you update the policy document before attaching it to your IAM role. You can attach this policy as an inline policy, instead of creating the policy separately first. However, either approach is valid.

  1. Open the IAM console.
  2. Select the IAM role, and choose Permissions.
  3. Choose Add in-line policy.
  4. Choose Custom Policy and then enter the following policy. The inline policy name does not matter.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [ "logs:*" ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [ "cloudwatch:PutMetricData" ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [ "states:StartExecution" ],
            "Resource": [
                "arn:aws:states:*:*:stateMachine:WriteMetricFromStepFunction"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [ "lambda:InvokeFunction" ],
            "Resource": "arn:aws:lambda:*:*:function:WriteMetricFromStepFunction"
        },
        {
            "Effect": "Allow",
            "Action": [ "ecs:Describe*" ],
            "Resource": "arn:aws:ecs:*:*:cluster/ECSEsgaroth"
        }
    ]
}
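If you prefer to script this setup instead of using the console, a rough boto3 equivalent of the two JSON documents above might look like the following, assuming you have saved the trust policy and the permissions policy locally as trust-policy.json and inline-policy.json:

import boto3

iam = boto3.client('iam')

# Create the role with the trust policy shown earlier.
with open('trust-policy.json') as f:
    trust_policy = f.read()

iam.create_role(
    RoleName='WriteMetricFromStepFunction',
    AssumeRolePolicyDocument=trust_policy
)

# Attach the permissions policy shown above as an inline policy.
with open('inline-policy.json') as f:
    inline_policy = f.read()

iam.put_role_policy(
    RoleName='WriteMetricFromStepFunction',
    PolicyName='WriteMetricFromStepFunctionPolicy',  # the inline policy name does not matter
    PolicyDocument=inline_policy
)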

Create a Step Functions state machine

In this section, you create a Step Functions state machine that invokes the metric-gathering Lambda function every five seconds for a one-minute period. If you divide a minute (60 seconds) into five-second intervals, you get 12. Based on this math, you create 12 branches in a single parallel state in the state machine. Each branch triggers the metric-gathering Lambda function at a different five-second marker throughout the one-minute period. After all of the parallel branches finish executing, the Step Functions execution completes and another begins.

Follow these steps to create your Step Functions state machine:

  1. Open the Step Functions console.
  2. Choose Dashboard, Create State Machine.
  3. For State Machine Name, enter WriteMetricFromStepFunction.
  4. Enter the state machine code below into the editor. Make sure that you insert your own AWS account ID for every instance of “676655494xxx”.
  5. Choose Create State Machine.
  6. Select the WriteMetricFromStepFunction IAM role that you previously created.
{
    "Comment": "Writes ECS metrics to CloudWatch every five seconds, for a one-minute period.",
    "StartAt": "ParallelMetric",
    "States": {
      "ParallelMetric": {
        "Type": "Parallel",
        "Branches": [
          {
            "StartAt": "WriteMetricLambda",
            "States": {
             	"WriteMetricLambda": {
                  "Type": "Task",
				  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
    	  {
            "StartAt": "WaitFive",
            "States": {
            	"WaitFive": {
            		"Type": "Wait",
            		"Seconds": 5,
            		"Next": "WriteMetricLambdaFive"
          		},
             	"WriteMetricLambdaFive": {
                  "Type": "Task",
				  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
    	  {
            "StartAt": "WaitTen",
            "States": {
            	"WaitTen": {
            		"Type": "Wait",
            		"Seconds": 10,
            		"Next": "WriteMetricLambda10"
          		},
             	"WriteMetricLambda10": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
    	  {
            "StartAt": "WaitFifteen",
            "States": {
            	"WaitFifteen": {
            		"Type": "Wait",
            		"Seconds": 15,
            		"Next": "WriteMetricLambda15"
          		},
             	"WriteMetricLambda15": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait20",
            "States": {
            	"Wait20": {
            		"Type": "Wait",
            		"Seconds": 20,
            		"Next": "WriteMetricLambda20"
          		},
             	"WriteMetricLambda20": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait25",
            "States": {
            	"Wait25": {
            		"Type": "Wait",
            		"Seconds": 25,
            		"Next": "WriteMetricLambda25"
          		},
             	"WriteMetricLambda25": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait30",
            "States": {
            	"Wait30": {
            		"Type": "Wait",
            		"Seconds": 30,
            		"Next": "WriteMetricLambda30"
          		},
             	"WriteMetricLambda30": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait35",
            "States": {
            	"Wait35": {
            		"Type": "Wait",
            		"Seconds": 35,
            		"Next": "WriteMetricLambda35"
          		},
             	"WriteMetricLambda35": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait40",
            "States": {
            	"Wait40": {
            		"Type": "Wait",
            		"Seconds": 40,
            		"Next": "WriteMetricLambda40"
          		},
             	"WriteMetricLambda40": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait45",
            "States": {
            	"Wait45": {
            		"Type": "Wait",
            		"Seconds": 45,
            		"Next": "WriteMetricLambda45"
          		},
             	"WriteMetricLambda45": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait50",
            "States": {
            	"Wait50": {
            		"Type": "Wait",
            		"Seconds": 50,
            		"Next": "WriteMetricLambda50"
          		},
             	"WriteMetricLambda50": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait55",
            "States": {
            	"Wait55": {
            		"Type": "Wait",
            		"Seconds": 55,
            		"Next": "WriteMetricLambda55"
          		},
             	"WriteMetricLambda55": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          }
        ],
        "End": true
      }
  }
}
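Because the twelve branches differ only in how long they wait, you may find it easier to generate the definition programmatically than to hand-edit the JSON. The following sketch builds an equivalent definition and creates the state machine through boto3; the account ID in the Lambda ARN and the role ARN are placeholders you would replace with your own values:

import json
import boto3

LAMBDA_ARN = 'arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction'  # placeholder account ID
ROLE_ARN = 'arn:aws:iam::676655494xxx:role/WriteMetricFromStepFunction'                    # placeholder role ARN

def build_branch(offset_seconds):
    # Each branch optionally waits, then invokes the metric-gathering Lambda function.
    task_name = 'WriteMetricLambda{}'.format(offset_seconds)
    task_state = {'Type': 'Task', 'Resource': LAMBDA_ARN, 'End': True}
    if offset_seconds == 0:
        return {'StartAt': task_name, 'States': {task_name: task_state}}
    wait_name = 'Wait{}'.format(offset_seconds)
    return {
        'StartAt': wait_name,
        'States': {
            wait_name: {'Type': 'Wait', 'Seconds': offset_seconds, 'Next': task_name},
            task_name: task_state
        }
    }

definition = {
    'Comment': 'Writes ECS metrics to CloudWatch every five seconds, for a one-minute period.',
    'StartAt': 'ParallelMetric',
    'States': {
        'ParallelMetric': {
            'Type': 'Parallel',
            'Branches': [build_branch(offset) for offset in range(0, 60, 5)],
            'End': True
        }
    }
}

sfn = boto3.client('stepfunctions')
sfn.create_state_machine(
    name='WriteMetricFromStepFunction',
    definition=json.dumps(definition),
    roleArn=ROLE_ARN
)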

Now you’ve got a shiny new Step Functions state machine! However, you might ask yourself, “After the state machine has been created, how does it get executed?” Before I answer that question, create the Lambda function that writes the custom metric, and then you get the end-to-end process moving.

Create a Lambda function

The meaty part of the solution is a Lambda function, written for the Python 3.6 runtime, that retrieves metric values from ECS and then writes them to CloudWatch. This Lambda function is what the Step Functions state machine triggers every five seconds, via the Task states. Key points to remember:

The Lambda function needs permission to:

  • Write CloudWatch metrics (PutMetricData API).
  • Retrieve metrics from ECS clusters (DescribeClusters API).
  • Write StdOut to CloudWatch Logs.

Boto3, the AWS SDK for Python, is included in the Lambda execution environment for Python 2.x and 3.x.

Because Lambda includes the AWS SDK, you don’t have to worry about packaging it up and uploading it to Lambda. You can focus on writing code and automatically take a dependency on boto3.

As for permissions, you’ve already created the IAM role and attached a policy to it that enables your Lambda function to access the necessary API actions. When you create your Lambda function, make sure that you select the correct IAM role, to ensure it is invoked with the correct permissions.

The following Lambda function code is generic. So how does the Lambda function know which ECS cluster to gather metrics for? Your Step Functions state machine automatically passes in its state to the Lambda function. When you create your CloudWatch Events rule, you specify a simple JSON object that passes the desired ECS cluster name into your Step Functions state machine, which then passes it to the Lambda function.

Use the following property values as you create your Lambda function:

Function Name: WriteMetricFromStepFunction
Description: This Lambda function retrieves metric values from an ECS cluster and writes them to Amazon CloudWatch.
Runtime: Python3.6
Memory: 128 MB
IAM Role: WriteMetricFromStepFunction

import boto3

def handler(event, context):
    cw = boto3.client('cloudwatch')
    ecs = boto3.client('ecs')
    print('Got boto3 client objects')
    
    Dimension = {
        'Name': 'ClusterName',
        'Value': event['ECSClusterName']
    }

    cluster = get_ecs_cluster(ecs, Dimension['Value'])
    
    cw_args = {
       'Namespace': 'ECS',
       'MetricData': [
           {
               'MetricName': 'RunningTask',
               'Dimensions': [ Dimension ],
               'Value': cluster['runningTasksCount'],
               'Unit': 'Count',
               'StorageResolution': 1
           },
           {
               'MetricName': 'PendingTask',
               'Dimensions': [ Dimension ],
               'Value': cluster['pendingTasksCount'],
               'Unit': 'Count',
               'StorageResolution': 1
           },
           {
               'MetricName': 'ActiveServices',
               'Dimensions': [ Dimension ],
               'Value': cluster['activeServicesCount'],
               'Unit': 'Count',
               'StorageResolution': 1
           },
           {
               'MetricName': 'RegisteredContainerInstances',
               'Dimensions': [ Dimension ],
               'Value': cluster['registeredContainerInstancesCount'],
               'Unit': 'Count',
               'StorageResolution': 1
           }
        ]
    }
    cw.put_metric_data(**cw_args)
    print('Finished writing metric data')
    
def get_ecs_cluster(client, cluster_name):
    cluster = client.describe_clusters(clusters = [ cluster_name ])
    print('Retrieved cluster details from ECS')
    return cluster['clusters'][0]
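If you want to smoke-test the function before deploying it, you can append a small entry point to the module above and run it locally with credentials that can see your cluster (this assumes the ECSEsgaroth cluster name used in the example policy):

# Local smoke test: invoke the handler directly with the same event shape
# that the Step Functions state machine passes in.
if __name__ == '__main__':
    handler({'ECSClusterName': 'ECSEsgaroth'}, None)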

Create the CloudWatch Events rule

Now you’ve created an IAM role and policy, Step Functions state machine, and Lambda function. How do these components actually start communicating with each other? The final step in this process is to set up a CloudWatch Events rule that triggers your metric-gathering Step Functions state machine every minute. You have two choices for your CloudWatch Events rule expression: rate or cron. In this example, use the cron expression.

A couple key learning points from creating the CloudWatch Events rule:

  • You can specify one or more targets, of different types (for example, Lambda function, Step Functions state machine, SNS topic, and so on).
  • You’re required to specify an IAM role with permissions to trigger your target.
    NOTE: This applies only to certain types of targets, including Step Functions state machines.
  • Each target that supports IAM roles can be triggered using a different IAM role, in the same CloudWatch Events rule.
  • Optional: You can provide custom JSON that is passed to your target Step Functions state machine as input.

Follow these steps to create the CloudWatch Events rule:

  1. Open the CloudWatch console.
  2. Choose Events, Rules, Create Rule.
  3. Select Schedule, Cron Expression, and then enter the following rule:
    0/1 * * * ? *
  4. Choose Add Target, Step Functions State Machine, WriteMetricFromStepFunction.
  5. For Configure Input, select Constant (JSON Text).
  6. Enter the following JSON input, which is passed to Step Functions, while changing the cluster name accordingly:
    { "ECSClusterName": "ECSEsgaroth" }
  7. Choose Use Existing Role, WriteMetricFromStepFunction (the IAM role that you previously created).

After you’ve completed these steps (or run the equivalent API calls shown above), the rule is enabled and begins triggering the state machine every minute.

Validate the solution

Now that you have finished implementing the solution to gather high-resolution metrics from ECS, validate that it’s working properly.

  1. Open the CloudWatch console.
  2. Choose Metrics.
  3. Choose custom and select the ECS namespace.
  4. Choose the ClusterName metric dimension.

You should see the custom ECS metrics (RunningTask, PendingTask, ActiveServices, and RegisteredContainerInstances) listed for your cluster.

Troubleshoot configuration issues

If you aren’t receiving the expected ECS cluster metrics in CloudWatch, check for the following common configuration issues. Review the earlier procedures to make sure that the resources were properly configured.

  • The IAM role’s trust relationship is incorrectly configured.
    Make sure that the IAM role trusts Lambda, CloudWatch Events, and Step Functions in the correct region.
  • The IAM role does not have the correct policies attached to it.
    Make sure that you have copied the IAM policy correctly as an inline policy on the IAM role.
  • The CloudWatch Events rule is not triggering new Step Functions executions.
    Make sure that the target configuration on the rule has the correct Step Functions state machine and IAM role selected.
  • The Step Functions state machine is being executed, but failing part way through.
    Examine the detailed error message on the failed state within the failed Step Functions execution. It’s possible that the IAM role does not have permissions to trigger the target Lambda function, that the target Lambda function does not exist, or that the Lambda function failed to complete successfully due to invalid permissions.

Although the above list covers several potential configuration issues, it is not comprehensive. Make sure that you understand how each service is connected to the others, how permissions are granted through IAM policies, and how IAM trust relationships work.

Conclusion

In this post, you implemented a serverless solution to gather and record high-resolution application metrics from containers running on Amazon ECS into CloudWatch. The solution consists of a Step Functions state machine, a Lambda function, a CloudWatch Events rule, and an IAM role and policy. The data that you gather from this solution helps you rapidly identify issues with an ECS cluster.

To gather high-resolution metrics from any service, modify your Lambda function to gather the correct metrics from your target. If you prefer not to use Python, you can implement a Lambda function using one of the other supported runtimes, including Node.js, Java, or .NET Core. However, this post should give you the fundamentals of capturing high-resolution metrics in CloudWatch.

If you found this post useful, or have questions, please comment below.

Staying Busy Between Code Pushes

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2017/11/16/staying-busy-between-code-pushes/

Staying Busy Between Code Pushes.

Maintaining a regular cadence of pushing out releases, adding new features, implementing bug fixes and staying on top of support requests is important for any software to thrive; but especially important for open source software due to its rapid pace. It’s easy to lose yourself in code and forget that events are happening all the time – in every corner of the world, where we can learn, share knowledge, and meet like-minded individuals to build better software, together. There are so many amazing events we’d like to participate in, but there simply isn’t enough time (or budget) to fit them all in. Here’s what we’ve been up to recently; between code pushes.

Recent Events

Øredev Conference | Malmö, Sweden: Øredev is one of the biggest developer conferences in Scandinavia, and Grafana Labs jumped at the chance to be a part of it. In early November, Grafana Labs Principal Developer, Carl Bergquist, gave a great talk on “Monitoring for Everyone”, which discussed the concepts of monitoring and why everyone should care, different ways to monitor your systems, extending your monitoring to containers and microservices, and finally what to monitor and alert on. Watch the video of his talk below.

InfluxDays | San Francisco, CA: Dan Cech, our Director of Platform Services, spoke at InfluxDays in San Francisco on Nov 14, and Grafana Labs sponsored the event. InfluxDB is a popular data source for Grafana, so we wanted to connect to the InfluxDB community and show them how to get the most out of their data. Dan discussed building dashboards, choosing the best panels for your data, setting up alerting in Grafana and a few sneak peeks of the upcoming Grafana 5.0. The video of his talk is forthcoming, but Dan has made his presentation available.

PromCon | Munich, Germany: PromCon is the Prometheus-focused event of the year. In August, Carl Bergquist, had the opportunity to speak at PromCon and take a deep dive into Grafana and Prometheus. Many attendees at PromCon were already familiar with Grafana, since it’s the default dashboard tool for Prometheus, but Carl had a trove of tricks and optimizations to share. He also went over some major changes and what we’re currently working on.

CNCF Meetup | New York, NY: Grafana Co-founder and CEO, Raj Dutt, participated in a panel discussion with the folks of Packet and the Cloud Native Computing Foundation. The discussion focused on the success stories, failures, rationales and in-the-trenches challenges when running cloud native in private or non “public cloud” datacenters (bare metal, colocation, private clouds, special hardware or networking setups, compliance and security-focused deployments).

Percona Live | Dublin: Daniel Lee traveled to Dublin, Ireland this fall to present at the database conference Percona Live. There he showed the new native MySQL support, along with a number of upcoming features in Grafana 5.0. His presentation is available to download.

Big Monitoring Meetup | St. Petersburg, Russian Federation: Alexander Zobnin, our developer located in Russia, is the primary maintainer of our popular Zabbix plugin. He attended the Big Monitoring Meetup to discuss monitoring, Grafana dashboards and democratizing metrics.

Why observability matters – now and in the future | Webinar: Our own Carl Bergquist joined Neil Gehani, Director of Product at Weaveworks, to share best practices on how to get started with monitoring both your application and infrastructure: capture metrics that matter, then aggregate and visualize them in a useful way that allows you to identify bottlenecks and proactively prevent incidents. View Carl’s presentation.

Upcoming Events

We’re going to maintain this momentum with a number of upcoming events, and hope you can join us.

KubeCon | Austin, TX – Dec. 6-8, 2017: We’re sponsoring KubeCon 2017! This is the must-attend conference for cloud native computing professionals. KubeCon + CloudNativeCon brings together leading contributors in:

  • Cloud native applications and computing
  • Containers
  • Microservices
  • Central orchestration processing
  • And more.

Buy Tickets

How to Use Open Source Projects for Performance Monitoring | Webinar
Nov. 29, 1pm EST:
Check out how you can use popular open source projects to monitor the performance of your infrastructure, applications, and cloud, faster, easier, and at scale. In this webinar, Daniel Lee from Grafana Labs and Chris Churilo from InfluxData will provide step-by-step instruction, from download and configuration to collecting metrics and building dashboards and alerts.

RSVP

FOSDEM | Brussels, Belgium – Feb 3-4, 2018: FOSDEM is a free developer conference where thousands of developers of free and open source software gather to share ideas and technology. Carl Bergquist is managing the Cloud and Monitoring Devroom, and the CFP is now open. There is no need to register; all are welcome. If you’re interested in speaking at FOSDEM, submit your talk now!

GrafanaCon EU

Last, but certainly not least, the next GrafanaCon is right around the corner. GrafanaCon EU (to be held in Amsterdam, Netherlands, March 1-2, 2018) is a two-day event with talks centered around Grafana and the surrounding ecosystem. In addition to the latest features and functionality of Grafana, you can expect to see and hear from members of the monitoring community like Graphite, Prometheus, InfluxData, Elasticsearch, Kubernetes, and more. Head to grafanacon.org to see the latest confirmed speakers. We have speakers from Automattic, Bloomberg, CERN, Fastly, Tinder, and more!

Conclusion

The Grafana Labs team is spread across the globe. Having a “post-geographic” company structure gives us the opportunity to take part in events wherever they may be held in the world. As our team continues to grow, we hope to take part in even more events, and hope you can find the time to join us.

Swedish Data Authority Investigates Piracy Settlement Letters

Post Syndicated from Andy original https://torrentfreak.com/swedish-data-authority-investigates-piracy-settlement-letters-171115/

Companies that aim to turn piracy into profit have been in existence for more than a decade but still the controversy around their practices continues.

Most, known colloquially as ‘copyright trolls’, monitor peer-to-peer networks such as BitTorrent, collecting IP addresses and other data in order to home in on a particular Internet account. From there, ISPs are sued to hand over that particular subscriber’s personal details. Once they’re obtained, the pressure begins.

At this point, trolls are in direct contact with the public, usually by letter. Their tone is almost always semi-aggressive, warning account holders that their actions are undermining entire industries. However, as if by magic, all the harm can be undone if they pay up a few hundred dollars, euros, or pounds – quickly.

That’s the case in Sweden, where law firm Njord Law is representing the well-known international copyright trolls behind the movies CELL, IT, London Has Fallen, Mechanic: Resurrection, Criminal, and September of Shiraz.

“Have you, or other people with access to the aforementioned IP address, such as children living at home, viewed or tried to watch [a pirate movie] at the specified time?” Njord Law now writes in its letters to alleged pirates.

“If so, the case can be terminated by paying 4,500 SEK [$550].”

It’s clear that the companies involved are diving directly for cash. Indeed, letter recipients are told they have just two weeks to pay up or face further issues. The big question now is whether these demands are permissible under law, not necessarily from a copyright angle but due to the way they are presented to the alleged pirates.

The Swedish Data Protection Authority (Datainspektionen) is a public authority tasked with protecting the privacy of the individual in the information society. Swedish Radio reports that it has received several complaints from Swedes who have received cash demands and as a result is investigating whether the letters are legal.

As a result, the authority now has to determine whether the letters can be regarded as a debt collection measure. If so, they will have to comply with special laws and would also require special permission.

“They have not classified this as a debt collection fee, but it is not that element that is crucial. A debt collection measure is determined by whether there is any kind of pressure on the recipient to make a payment. Then there is the question of whether such pressure can be considered a debt collection measure,” says lawyer Camilla Sparr.

Of course, the notion that the letters exist for the purposes of collecting a debt is rejected by Njord Law. Lawyer Jeppe Brogaard Clausen says that his company has had no problems in this respect in other jurisdictions.

“We have encountered the same issue in Denmark and Finland and it was judged by the authorities that there is no talk about a debt collection letter,” Clausen told SR.

A lot hinges on the investigation of the Data Protection Authority. Njord Law has already obtained permission to find out the identities behind tens of thousands of IP addresses, including a single batch where 25,000 customers of ISP Telia were targeted.

At least 5,000 letters demanding payment have been sent out already and another 5,000 are lined up for the next few months. Clausen says their purpose is to change Swedes’ attitude towards illegal file sharing but there’s a broad belief that they’re part of a global network of companies whose aims are to generate profit from piracy.

But while the Data Protection Authority does its work, there is plenty of advice for letter recipients who don’t want to cave into demands for cash. Last month, Copyright Professor Sanna Wolk advised them to ignore the letters entirely.

“Do not pay. You do not even have to answer it,” Wolk told people receiving a letter.

“In the end, it’s the court that will decide whether you have to pay or not. We have seen this type of letter in the past, and only very few times those in charge of the claims have taken it to court.”

Of course, should copyright holders actually take a matter to court, then recipients must contest the claim since failure to do so could result in a default judgment. This means they lose the case without even having had the opportunity to mount a defense.

Importantly, one such defense could be that the individual didn’t carry out the offense, perhaps because their WiFi isn’t password protected or that they share their account with others.

“Someone who has an open network cannot be held responsible for copyright violations – such as downloading movies – if they provide others with access to their internet connection. This has been decided in a European Court ruling last year,” Wolk noted.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

The Decision on Transparency

Post Syndicated from Gleb Budman original https://www.backblaze.com/blog/transparency-in-business/

Backblaze transparency

This post by Backblaze’s CEO and co-founder Gleb Budman is the seventh in a series about entrepreneurship. You can choose posts in the series from the list below:

  1. How Backblaze got Started: The Problem, The Solution, and the Stuff In-Between
  2. Building a Competitive Moat: Turning Challenges Into Advantages
  3. From Idea to Launch: Getting Your First Customers
  4. How to Get Your First 1,000 Customers
  5. Surviving Your First Year
  6. How to Compete with Giants
  7. The Decision on Transparency


“Are you crazy?” “Why would you do that?!” “You shouldn’t share that!”

These are just a few of the common questions and comments we heard after posting some of the information we have shared over the years. So was it crazy? Misguided? Should you do it?

With that background I’d like to dig into the decision to become so transparent, from releasing stats on hard drive failures, to storage pod specs, to publishing our cloud storage costs, and open sourcing the Reed-Solomon code. What was the thought process behind becoming so transparent when most companies work so hard to hide their inner workings, especially information such as the Storage Pod specs that would normally be considered a proprietary advantage? Most importantly I’d like to explore the positives and negatives of being so transparent.

Sharing Intellectual Property

The first “transparency” that garnered a flurry of “why would you share that?!” came as a result of us deciding to open source our Storage Pod design: publishing the specs, parts, prices, and how to build it yourself. The Storage Pod was a key component of our infrastructure, gave us a cost (and thus competitive) advantage, took significant effort to develop, and had a fair bit of intellectual property: the “IP.”

The negatives of sharing this are obvious: it allows our competitors to use the design to reduce our cost advantage, and it gives away the IP, which could be patentable or have value as a trade secret.

The positives were certainly less obvious, and at the time we couldn’t have guessed how massive they would be.

We wrestled with the decision: prospective users and others online didn’t believe we could offer our service for such a low price, thinking that we would burn through some cash hoard and then go out of business. We wanted to reassure them, but how?

This is how our response evolved:

We’ve built a lower cost storage platform.
But why would anyone believe us?
Because, we’ve designed our own servers and they’re less expensive.
But why would anyone believe they were so low cost and efficient?
Because here’s how much they cost versus others.
But why would anyone believe they cost that little and still enabled us to efficiently store data?
Because here are all the components they’re made of, this is how to build them, and this is how they work.
Ok, you can’t argue with that.

Great — so that would reassure people. But should we do this? Is it worth it?

This was 2009, we were a tiny company of seven people working from our co-founder’s one-bedroom apartment. We decided that the risk of not having potential customers trust us was more impactful than the risk of our competitors possibly deciding to use our server architecture. The former might kill the company in short order; the latter might make it harder for us to compete in the future. Moreover, we figured that most competitors were established on their own platforms and were unlikely to switch to ours, even if it were better.

Takeaway: Build your brand today. There are no assurances you will make it to tomorrow if you can’t make people believe in you today.

A Sharing Success Story — The Backblaze Storage Pod

So with that, we decided to publish everything about the Storage Pod. As for deciding to actually open source it? That was a ‘thank you’ to the open source community upon whose shoulders we stood as we used software such as Linux, Tomcat, etc.

With eight years of hindsight, here’s what happened:

  • As best as I can tell, none of our direct competitors ever used our Storage Pod design, opting instead to continue paying more for commercial solutions.
  • Hundreds of press articles have been written about Backblaze as a direct result of sharing the Storage Pod design.
  • Millions of people have read press articles or our blog posts about the Storage Pods.
  • Backblaze was established as a storage tech thought leader, and a resource for those looking for information in the space.
  • Our blog became viewed as a resource, not a corporate mouthpiece.
  • Recruiting has been made easier through the awareness of Backblaze, the appreciation for us taking on challenging tech problems in interesting ways, and for our openness.
  • Sourcing for our Storage Pods has become easier because we can point potential vendors to our blog posts and say, “here’s what we need.”

And those are just the direct benefits for us. One of the things that warms my heart is that doing this has helped others:

  • Several companies have started selling servers based on our Storage Pod designs.
  • Netflix credits Backblaze with being the inspiration behind their CDN servers.
  • Many schools, labs, and others have shared that they’ve been able to do what they didn’t think was possible because using our Storage Pod designs provided lower-cost storage.
  • And I want to believe that in general we pushed forward the development of low-cost storage servers in the industry.

So overall, the decision on being transparent and sharing our Storage Pod designs was a clear win.

Takeaway: Never underestimate the value of goodwill. It can help build new markets that fuel your future growth and create new ecosystems.

Sharing An “Almost Acquisition”

Acquisition announcements are par for the course. No company, however, talks about the acquisition that fell through. If rumors appear in the press, the company’s response is always, “no comment.” But in 2010, when Backblaze was almost, but not, acquired, we wrote about it in detail. Crazy?

The negatives of sharing this are slightly less obvious, but the two issues most people worried about were, 1) the fact that the company could be acquired would spook customers, and 2) the fact that it wasn’t would signal to potential acquirers that something was wrong.

So, why share this at all? No one was asking “did you almost get acquired?”

First, we had established a culture of transparency and this was a significant event that occurred for us, thus we defaulted to assuming we would share. Second, we learned that acquisitions fall through all the time, not just during the early fishing stage, but even after term sheets are signed, diligence is done, and all the paperwork is complete. I felt we had learned some things about the process that would be valuable to others that were going through it.

As it turned out, we received emails from startup founders saying they saved the post for the future, and from lawyers, VCs, and advisors saying they shared it with their portfolio companies. Among the most touching emails I received was from a founder who said that after an acquisition fell through she felt so alone that she became incredibly depressed, and that reading our post helped her see that this happens and that things could be OK afterward. Being transparent about almost getting acquired was worth it just to help that one founder.

And what about the concerns? As for spooking customers, maybe some were, but our sign-ups went up, not down, afterward. Any company can be acquired, and many of the world’s largest have been. I believe that being thoughtful about where to take the company, and open about it, gave customers a sense that we would do the right thing if it happened. And as for signaling to potential acquirers? The ones I’ve spoken with all knew this happens regularly enough that it wasn’t a factor.

Takeaway: Being open and transparent is also a form of giving back to others.

Sharing Strategic Data

For years people have been desperate to know how reliable hard drives are. They could go to Amazon for individual reviews, but someone saying “this drive died for me” doesn’t provide statistical insight. Google published a study that showed annualized drive failure rates, but didn’t break down the results by manufacturer or model. Since Backblaze has deployed about 100,000 hard drives to store customer data, we have been able to collect a wealth of data on drive reliability by make, model, and size. Was Backblaze the only one with this data? Of course not: Google, Amazon, Microsoft, and any other cloud-scale storage provider tracked it. Yet none would publish. Should Backblaze?

Again, starting with the main negatives: 1) sharing which drives we liked could increase demand for them, thus reducing availability or increasing prices, and 2) publishing the data might make the drive vendors unhappy with us, thereby making it difficult for us to buy drives.

But we felt that the largest drive purchasers (Amazon, Google, etc.) already had their own stats and would buy the drives they chose, and if individuals or smaller companies used our stats, they wouldn’t sufficiently move the overall market demand. Also, we hoped that the drive companies would see that we were being fair in our analysis and, if anything, would leverage our data to make drives even better.

Again, publishing the data resulted in tremendous value for Backblaze, with millions of people having read the analysis that we put out quarterly. Also, becoming known as the place to go for drive reliability information is a natural fit with being a backup and storage provider. In addition, in a twist from many people’s expectations, some of the drive companies actually started working more closely with us, seeing that we could be a good source of feedback data for them. We’ve also seen many individuals and companies make more data-based decisions on which drives to buy, and researchers have used the data for a variety of analyses.

Backblaze blog analytics showing a spike in readership after a hard drive stats post

Takeaway: Being open and transparent is rarely as risky as it seems.

Sharing Revenue (And Other Metrics)

Journalists always want to publish company revenue and other metrics, and private companies always shy away from sharing. For a long time we did, too. Then, we opened up about that, as well.

The negatives of sharing these numbers are: 1) without the numbers, external parties may perceive you’re doing better than you are, 2) if you share numbers regularly, you may eventually show that growth has slowed or worse, and 3) it gives your competitors information to compare their own business to.

We decided that, while some may have perceived we were bigger, our scale was plenty significant. Since we choose what we share and when, it’s up to us whether to disclose at any point. And if our competitors compare, what will they actually change that would affect us?

I did wait to share revenue until I felt I had the right person to write about it. At one point a journalist said she wouldn’t write about us unless I disclosed revenue. I suggested we had a lot to offer for the story, but didn’t want to share revenue yet. She refused to budge and I walked away from the article. Several years later, I reached out to a journalist who had covered Backblaze before, whom I felt understood our business, and offered to share revenue with him. He wrote a deep dive about the company, with revenue being one of the components of the story.

Sharing these metrics showed that we were at scale and running a real business, one with positive unit economics and margins, but not one where we were gouging customers.

Takeaway: Being open with the press about items typically not shared can be uncomfortable, but the press can amplify your story.

Should You Share?

For Backblaze, I believe the results of transparency have been staggering. However, it’s not for everyone. Apple has, clearly, been wildly successful taking secrecy to the extreme. In their case, early disclosure combined with the long cycle of hardware releases could significantly impact sales of current products.

“For Backblaze, I believe the results of transparency have been staggering.” — Gleb Budman

I will argue, however, that for most startups transparency wins. Most startups need to establish credibility and trust, build awareness and a fan base, show that they understand what their customers need and be useful to them, and show the soul and passion behind the company. Some startup companies try to buy these virtues with investor money, and sometimes amplifying your brand via paid marketing helps. But, authentic transparency can build awareness and trust not only less expensively, but more deeply than money can buy.

Backblaze was open from the beginning. With no outside investors, as founders we were able to express ourselves and make our own decisions. It’s easier to be a company that shares if you do it from the start, but for any company, here are a few suggestions:

  1. Ask about sharing: If something significant happens — good or bad — ask “should we share this?” If you made a tough decision, ask “should we share the thinking behind the decision and why it was tough?”
  2. Default to yes: It’s often scary to share, but look for the reasons to say ‘yes,’ not the reasons to say ‘no.’ That doesn’t mean you won’t sometimes decide not to, but make that the high bar.
  3. Minimize reviews: Press releases tend to be sanitized and boring because they’ve been endlessly wordsmithed by committee. Establish the few things you don’t want shared, but minimize the number of people that have to see anything else before it can go out. Teach, then trust.
  4. Engage: Sharing will result in comments on your blog, social, articles, etc. Reply to people’s questions and engage. It’ll make the readers more engaged and give you a better understanding of what they’re looking for.
  5. Accept mistakes: Things will become public that aren’t perfectly sanitized. Accept that and don’t punish people for oversharing.

Building a company culture that is open to sharing takes time, but continuous practice will build it, and over time the company will find its voice and its approach to sharing.

The post The Decision on Transparency appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

The Pirate Bay & 1337x Must Be Blocked, Austrian Supreme Court Rules

Post Syndicated from Andy original https://torrentfreak.com/the-pirate-bay-1337x-must-be-blocked-austrian-supreme-court-rules-171014/

Following a long-running case, in 2015 Austrian ISPs were ordered by the Commercial Court to block The Pirate Bay and other “structurally-infringing” sites including 1337x.to, isohunt.to, and h33t.to.

The decision was welcomed by the music industry, which looked forward to having more sites blocked in due course.

Soon after, local music rights group LSG sent its lawyers after several other large ISPs urging them to follow suit, or else. However, the ISPs dug in and a year later, in May 2016, things began to unravel. The Vienna Higher Regional Court overruled the earlier decision of the Commercial Court, meaning that local ISPs were free to unblock the previously blocked sites.

The Court concluded that ISP blocks are only warranted if copyright holders have exhausted all their options to take action against those actually carrying out the infringement. This decision was welcomed by the Internet Service Providers Austria (ISPA), which described the decision as an important milestone.

The ISPs argued that only torrent files, not the content itself, were available on the portals. They also had a problem with the restriction of access to legitimate content.

“A problem in this context is that the offending pages also have legal content and it is no longer possible to access that if barriers are put in place,” said ISPA Secretary General Maximilian Schubert.

Taking the case to its ultimate conclusion, the music companies appealed to the Supreme Court. Another year on and its decision has just been published and for the rightsholders, who represent 3,000 artists including The Beatles, Justin Bieber, Eric Clapton, Coldplay, David Guetta, Iggy Azalea, Michael Jackson, Lady Gaga, Metallica, George Michael, One Direction, Katy Perry, and Queen, to name a few, it was worth the effort.

The Court looked at whether “the provision and operation of a BitTorrent platform with the purpose of online file sharing [of non-public domain works]” represents a “communication to the public” under the EU Copyright Directive. Citing the now-familiar BREIN v Filmspeler and BREIN v Ziggo and XS4All cases that both received European Court of Justice rulings earlier this year, the Supreme Court concluded it was.

Citing another Dutch case, in which Playboy publisher Sanoma took on the blog GeenStijl.nl, the Court noted that linking to copyrighted content hosted elsewhere also amounted to a “communication to the public”, a situation mirrored on torrent sites like The Pirate Bay.

“The similarity of the technical procedure in this case when compared to BitTorrent platforms lies in the fact that in both cases the operators of the website did not provide any copyrighted works themselves, but merely provided further information on sites where the protected works were available,” the Court notes in its ruling.

In respect of the potential for blocking legitimate content as well as that infringing copyright, the Court turned the ISPs’ own arguments against them somewhat.

The ISPs had previously argued that blocking The Pirate Bay and other sites was pointless since the torrents they host would still be available elsewhere. The Court noted that point and found that it cuts the other way too: people can easily upload their torrents, including legitimate content, to sites that aren’t blocked, since there’s plenty of choice.

The ISPA criticized the Supreme Court’s ruling, noting that in future ISPs will still find themselves being held responsible for decisions concerning blocking.

“We do not support illegal content on the Internet in any way, but consider it extremely questionable that the decision on what is illegal and what is not falls to ISPs, instead of a court,” said ISPA Secretary General Maximilian Schubert.

“Although we find it positive that a court of last resort has taken the decision, the assessment of the website in the first instance continues to be left to the Internet provider. The Supreme Court’s expansion of the circle of sites that can potentially be blocked further complicates this task for the operator and furthers the privatization of law enforcement.

“It is extremely unpleasant that even after more than 10 years of fierce discussion, there is still no compelling legal basis for a court decision on Internet blocking, which puts providers in the role of both judge and hangman.”

Also of interest is ISPA’s stance on how blocking of content fails to solve the underlying issue. When content is blocked, rather than removed, it simply displaces the problem, leaving others to pick up the pieces, the Internet body argues.

“Illegal content is permanently removed from the network by deletion. Everything else is a placebo with extremely dangerous side effects, which can easily be bypassed by both providers and consumers. The only thing that remains is a blocking infrastructure that can be misused for many purposes and, unfortunately, will be used in many places,” Schubert says.

“The current situation, where providers have to implement blocks for rightsholders practically on the spot if they do not want to engage in time-consuming and cost-intensive litigation, is really not sustainable, so we issue a call to action to the legislature.”

The domains that were listed in the case, many of which are already defunct, are: thepiratebay.se, thepiratebay.gd, thepiratebay.la, thepiratebay.mn, thepiratebay.mu, thepiratebay.sh, thepiratebay.tw, thepiratebay.fm, thepiratebay.ms, thepiratebay.vg, isohunt.to, 1337x.to and h33t.to.

The only domain currently used by The Pirate Bay (thepiratebay.org) is not included in the list; whether it will be added later is unclear.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Building a Multi-region Serverless Application with Amazon API Gateway and AWS Lambda

Post Syndicated from Stefano Buliani original https://aws.amazon.com/blogs/compute/building-a-multi-region-serverless-application-with-amazon-api-gateway-and-aws-lambda/

This post was written by Magnus Bjorkman, Solutions Architect.

Many customers are looking to run their services at global scale, deploying their backend to multiple regions. In this post, we describe how to deploy a Serverless API into multiple regions and how to leverage Amazon Route 53 to route the traffic between regions. We use latency-based routing and health checks to achieve an active-active setup that can fail over between regions in case of an issue. We leverage the new regional API endpoint feature in Amazon API Gateway to make this a seamless process for the API client making the requests. This post does not cover the replication of your data, which is another aspect to consider when deploying applications across regions.

Solution overview

Currently, the default API endpoint type in API Gateway is the edge-optimized API endpoint, which enables clients to access an API through an Amazon CloudFront distribution. This typically improves connection time for geographically diverse clients. By default, a custom domain name is globally unique and the edge-optimized API endpoint would invoke a Lambda function in a single region in the case of Lambda integration. You can’t use this type of endpoint with a Route 53 active-active setup and fail-over.

The new regional API endpoint in API Gateway moves the API endpoint into the region, and the custom domain name is unique per region. This makes it possible to run a full copy of an API in each region and then use Route 53 for an active-active setup with failover. The following diagram shows how you do this:

Active/active multi region architecture

  • Deploy your Rest API stack, consisting of API Gateway and Lambda, in two regions, such as us-east-1 and us-west-2.
  • Choose the regional API endpoint type for your API.
  • Create a custom domain name and choose the regional API endpoint type for that one as well. In both regions, you are configuring the custom domain name to be the same, for example, helloworldapi.replacewithyourcompanyname.com
  • Use the host name of the custom domain names from each region, for example, xxxxxx.execute-api.us-east-1.amazonaws.com and xxxxxx.execute-api.us-west-2.amazonaws.com, to configure record sets in Route 53 for your client-facing domain name, for example, helloworldapi.replacewithyourcompanyname.com

The above solution provides an active-active setup for your API across the two regions, but you are not doing failover yet. For that to work, set up a health check in Route 53:

Route 53 Health Check

A Route 53 health check must have an endpoint to call to check the health of a service. You could do a simple ping of your actual Rest API methods, but instead provide a specific method on your Rest API that does a deep ping: a Lambda function that checks the status of all of the API’s dependencies.

In the case of the Hello World API, you don’t have any other dependencies. In a real-world scenario, you could check on dependencies such as databases, other APIs, and external services. Route 53 health checks cannot use your custom domain name’s DNS address, so you are going to call the API endpoints directly via their region-unique DNS addresses.
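
As an illustration only, a deep-ping health check for a service with real dependencies might look something like the following sketch. It assumes a hypothetical DynamoDB table whose name is passed in a TABLE_NAME environment variable; it is not part of the sample repository.

"""Return health after checking dependencies (illustrative sketch)."""
import logging
import os

import boto3

logging.basicConfig()
logger = logging.getLogger()
logger.setLevel(logging.INFO)

dynamodb = boto3.client('dynamodb')

def lambda_handler(event, context):
    """Report 'ok' only if every dependency responds."""
    try:
        # Hypothetical dependency: a DynamoDB table the service relies on.
        dynamodb.describe_table(TableName=os.environ['TABLE_NAME'])
        status = "ok"
    except Exception:
        logger.exception("dependency check failed")
        status = "fail"

    return {
        "status": status
    }

Because the Route 53 health check later in this walkthrough matches on the string “ok”, any failed dependency takes the region out of rotation automatically.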

Walkthrough

The following sections describe how to set up this solution. You can find the complete solution at the blog-multi-region-serverless-service GitHub repo. Clone or download the repository locally to be able to do the setup as described.

Prerequisites

You need the following resources to set up the solution described in this post:

  • AWS CLI
  • An S3 bucket in each region in which to deploy the solution, which can be used by the AWS Serverless Application Model (SAM). You can use the following CloudFormation templates to create buckets in us-east-1 and us-west-2:
    • us-east-1:
    • us-west-2:
  • A hosted zone registered in Amazon Route 53. This is used for defining the domain name of your API endpoint, for example, helloworldapi.replacewithyourcompanyname.com. You can use a third-party domain name registrar and then configure the DNS in Amazon Route 53, or you can purchase a domain directly from Amazon Route 53.

Deploy API with health checks in two regions

Start by creating a small “Hello World” Lambda function that sends back a message identifying the region in which it has been deployed.


"""Return message."""
import logging

logging.basicConfig()
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    """Lambda handler for getting the hello world message."""

    region = context.invoked_function_arn.split(':')[3]

    logger.info("message: " + "Hello from " + region)
    
    return {
        "message": "Hello from " + region
    }

Also create a Lambda function for doing a health check that returns a value based on another environment variable (either “ok” or “fail”) to allow for ease of testing:


"""Return health."""
import logging
import os

logging.basicConfig()
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    """Lambda handler for getting the health."""

    logger.info("status: " + os.environ['STATUS'])
    
    return {
        "status": os.environ['STATUS']
    }

Deploy both of these using an AWS Serverless Application Model (SAM) template. SAM is a CloudFormation extension that is optimized for serverless, and provides a standard way to create a complete serverless application. You can find the full helloworld-sam.yaml template in the blog-multi-region-serverless-service GitHub repo.

A few things to highlight:

  • You are using inline Swagger to define your API so you can substitute the current region in the x-amazon-apigateway-integration section.
  • Most of the Swagger template covers CORS to allow you to test this from a browser.
  • You are also using substitution to populate the environment variable used by the “Hello World” method with the region into which it is being deployed.

The Swagger allows you to use the same SAM template in both regions.

You can only use SAM from the AWS CLI, so do the following from the command prompt. First, deploy the SAM template in us-east-1 with the following commands, replacing “<your bucket in us-east-1>” with a bucket in your account:


> cd helloworld-api
> aws cloudformation package --template-file helloworld-sam.yaml --output-template-file /tmp/cf-helloworld-sam.yaml --s3-bucket <your bucket in us-east-1> --region us-east-1
> aws cloudformation deploy --template-file /tmp/cf-helloworld-sam.yaml --stack-name multiregionhelloworld --capabilities CAPABILITY_IAM --region us-east-1

Second, do the same in us-west-2:


> aws cloudformation package --template-file helloworld-sam.yaml --output-template-file /tmp/cf-helloworld-sam.yaml --s3-bucket <your bucket in us-west-2> --region us-west-2
> aws cloudformation deploy --template-file /tmp/cf-helloworld-sam.yaml --stack-name multiregionhelloworld --capabilities CAPABILITY_IAM --region us-west-2

The API was created with the default endpoint type, Edge Optimized. Switch it to Regional. In the Amazon API Gateway console, select the API that you just created and choose the wheel icon to edit it.

API Gateway edit API settings

In the edit screen, select the Regional endpoint type and save the API. Do the same in both regions.
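
If you prefer to script this step instead of using the console, the endpoint type can also be switched with the API Gateway update-rest-api call. The following is a minimal sketch, assuming hypothetical API IDs for each region:

"""Switch the Rest APIs from edge-optimized to regional (illustrative sketch)."""
import boto3

# Hypothetical API IDs; substitute the IDs created by the SAM deployments.
API_IDS = {
    "us-east-1": "xxxxxx1",
    "us-west-2": "xxxxxx2",
}

for region, api_id in API_IDS.items():
    apigw = boto3.client('apigateway', region_name=region)
    # Replace the EDGE endpoint type with REGIONAL on the existing API.
    apigw.update_rest_api(
        restApiId=api_id,
        patchOperations=[{
            'op': 'replace',
            'path': '/endpointConfiguration/types/EDGE',
            'value': 'REGIONAL',
        }],
    )
    print(region + ": endpoint type set to REGIONAL")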

Grab the URL for the API in the console by navigating to the method in the prod stage.

API Gateway endpoint link

You can now test this with curl:


> curl https://2wkt1cxxxx.execute-api.us-west-2.amazonaws.com/prod/helloworld
{"message": "Hello from us-west-2"}

Write down the domain name for the URL in each region (for example, 2wkt1cxxxx.execute-api.us-west-2.amazonaws.com), as you need that later when you deploy the Route 53 setup.

Create the custom domain name

Next, create an Amazon API Gateway custom domain name endpoint. As part of using this feature, you must have a hosted zone and domain available to use in Route 53 as well as an SSL certificate that you use with your specific domain name.

You can create the SSL certificate by using AWS Certificate Manager. In the ACM console, choose Get started (if you have no existing certificates) or Request a certificate. Fill out the form with the domain name to use for the custom domain name endpoint, which is the same across the two regions:

Amazon Certificate Manager request new certificate

Go through the remaining steps and validate the certificate for each region before moving on.
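
If you want to script the certificate requests as well, the sketch below shows one way to do it with the ACM API using DNS validation. The domain name is an assumption and should be replaced with your own; the DNS validation record still has to be written to your hosted zone before the certificate is issued.

"""Request the custom domain certificate in both regions (illustrative sketch)."""
import boto3

DOMAIN = "helloworldapi.replacewithyourcompanyname.com"  # substitute your domain

for region in ("us-east-1", "us-west-2"):
    acm = boto3.client('acm', region_name=region)
    arn = acm.request_certificate(
        DomainName=DOMAIN,
        ValidationMethod='DNS',
    )['CertificateArn']

    # The DNS validation CNAME shows up in DomainValidationOptions shortly
    # after the request; write that record to your hosted zone to validate.
    cert = acm.describe_certificate(CertificateArn=arn)['Certificate']
    print(region, arn)
    print(cert['DomainValidationOptions'])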

You are now ready to create the endpoints. In the Amazon API Gateway console, choose Custom Domain Names, Create Custom Domain Name.

API Gateway create custom domain name

A few things to highlight:

  • The domain name is the same as what you requested earlier through ACM.
  • The endpoint configuration should be regional.
  • Select the ACM Certificate that you created earlier.
  • You need to create a base path mapping that connects back to your earlier API Gateway endpoint. Set the base path to v1 so you can version your API, and then select the API and the prod stage.

Choose Save. You should see your newly created custom domain name:

API Gateway custom domain setup

Note the value for Target Domain Name as you need that for the next step. Do this for both regions.
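
The same two steps can also be scripted per region with the API Gateway create-domain-name and create-base-path-mapping calls. The following sketch assumes hypothetical API IDs and certificate ARNs; the regionalDomainName it prints is the target domain name you just noted.

"""Create the regional custom domain names and base path mappings (illustrative sketch)."""
import boto3

DOMAIN = "helloworldapi.replacewithyourcompanyname.com"  # substitute your domain

# Hypothetical values; substitute your API IDs and the ACM certificate ARNs
# that you created in each region.
CONFIG = {
    "us-east-1": {"api_id": "xxxxxx1", "cert_arn": "arn:aws:acm:us-east-1:111111111111:certificate/aaaa"},
    "us-west-2": {"api_id": "xxxxxx2", "cert_arn": "arn:aws:acm:us-west-2:111111111111:certificate/bbbb"},
}

for region, cfg in CONFIG.items():
    apigw = boto3.client('apigateway', region_name=region)
    domain = apigw.create_domain_name(
        domainName=DOMAIN,
        regionalCertificateArn=cfg["cert_arn"],
        endpointConfiguration={'types': ['REGIONAL']},
    )
    apigw.create_base_path_mapping(
        domainName=DOMAIN,
        basePath='v1',
        restApiId=cfg["api_id"],
        stage='prod',
    )
    # regionalDomainName is the target domain name used in the Route 53 records.
    print(region, domain['regionalDomainName'])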

Deploy Route 53 setup

Use the global Route 53 service to provide DNS lookup for the Rest API, distributing the traffic in an active-active setup based on latency. You can find the full CloudFormation template in the blog-multi-region-serverless-service GitHub repo.

The template sets up health checks, for example, for us-east-1:


HealthcheckRegion1:
  Type: "AWS::Route53::HealthCheck"
  Properties:
    HealthCheckConfig:
      Port: "443"
      Type: "HTTPS_STR_MATCH"
      SearchString: "ok"
      ResourcePath: "/prod/healthcheck"
      FullyQualifiedDomainName: !Ref Region1HealthEndpoint
      RequestInterval: "30"
      FailureThreshold: "2"

Use the health check when you set up the record set and the latency routing, for example, for us-east-1:


Region1EndpointRecord:
  Type: AWS::Route53::RecordSet
  Properties:
    Region: us-east-1
    HealthCheckId: !Ref HealthcheckRegion1
    SetIdentifier: "endpoint-region1"
    HostedZoneId: !Ref HostedZoneId
    Name: !Ref MultiregionEndpoint
    Type: CNAME
    TTL: 60
    ResourceRecords:
      - !Ref Region1Endpoint
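
The CloudFormation template is how this walkthrough deploys these resources. Purely for illustration, the equivalent Route 53 API calls for the us-east-1 half of the setup might look like the following sketch; all values are hypothetical placeholders.

"""Create the health check and latency record for one region (illustrative sketch)."""
import uuid

import boto3

route53 = boto3.client('route53')

# Hypothetical placeholders; substitute the values you collected earlier.
HOSTED_ZONE_ID = "ZXXXXXXXXXXXXX"
CLIENT_DOMAIN = "helloworldapi.replacewithyourcompanyname.com"
REGION1_HEALTH_ENDPOINT = "2wkt1cxxxx.execute-api.us-east-1.amazonaws.com"
REGION1_TARGET_DOMAIN = "d-xxxxxxxxxx.execute-api.us-east-1.amazonaws.com"

# Deep-ping health check against the prod stage of the us-east-1 API.
health_check_id = route53.create_health_check(
    CallerReference=str(uuid.uuid4()),
    HealthCheckConfig={
        'Port': 443,
        'Type': 'HTTPS_STR_MATCH',
        'SearchString': 'ok',
        'ResourcePath': '/prod/healthcheck',
        'FullyQualifiedDomainName': REGION1_HEALTH_ENDPOINT,
        'RequestInterval': 30,
        'FailureThreshold': 2,
    },
)['HealthCheck']['Id']

# Latency-based CNAME record pointing at the regional custom domain target.
route53.change_resource_record_sets(
    HostedZoneId=HOSTED_ZONE_ID,
    ChangeBatch={'Changes': [{
        'Action': 'UPSERT',
        'ResourceRecordSet': {
            'Name': CLIENT_DOMAIN,
            'Type': 'CNAME',
            'SetIdentifier': 'endpoint-region1',
            'Region': 'us-east-1',
            'TTL': 60,
            'HealthCheckId': health_check_id,
            'ResourceRecords': [{'Value': REGION1_TARGET_DOMAIN}],
        },
    }]},
)

Repeat the same calls for us-west-2 with its own health endpoint, target domain name, and SetIdentifier.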

You can create the stack by using the following link, copying in the domain names from the previous section, your existing hosted zone name, and the main domain name that is created (for example, helloworldapi.replacewithyourcompanyname.com):

The following screenshot shows what the parameters might look like:
Serverless multi region Route 53 health check

Specifically, the domain names that you collected earlier map as follows:

  • The domain names from the API Gateway “prod”-stage go into Region1HealthEndpoint and Region2HealthEndpoint.
  • The target domain names from the custom domain names go into Region1Endpoint and Region2Endpoint.

Using the Rest API from server-side applications

You are now ready to use your setup. First, test the API from a server-side client by using curl from the command line:


> curl https://helloworldapi.replacewithyourcompanyname.com/v1/helloworld/
{"message": "Hello from us-east-1"}

Testing failover of Rest API in browser

Here’s how you can use this from the browser and test the failover. Find all of the files for this test in the browser-client folder of the blog-multi-region-serverless-service GitHub repo.

Use this HTML file:


<!DOCTYPE HTML>
<html>
<head>
    <meta charset="utf-8"/>
    <meta http-equiv="X-UA-Compatible" content="IE=edge"/>
    <meta name="viewport" content="width=device-width, initial-scale=1"/>
    <title>Multi-Region Client</title>
</head>
<body>
<div>
   <h1>Test Client</h1>

    <p id="client_result">

    </p>
</div>

    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
    <script src="settings.js"></script>
    <script src="client.js"></script>
</body>
</html>

The HTML file uses this JavaScript file to repeatedly call the API and print the history of messages:


var messageHistory = "";

(function call_service() {

   $.ajax({
      url: helloworldMultiregionendpoint+'v1/helloworld/',
      dataType: "json",
      cache: false,
      success: function(data) {
         messageHistory+="<p>"+data['message']+"</p>";
         $('#client_result').html(messageHistory);
      },
      complete: function() {
         // Schedule the next request when the current one's complete
         setTimeout(call_service, 10000);
      },
      error: function(xhr, status, error) {
         $('#client_result').html('ERROR: '+status);
      }
   });

})();

Also, make sure to update the settings in settings.js so that the endpoint variable matches the multi-regional custom domain endpoint for the Hello World API: var helloworldMultiregionendpoint = "https://helloworldapi.replacewithyourcompanyname.com/";

You can now open the HTML file in the browser (you can do this directly from the file system) and you should see something like the following screenshot:

Serverless multi region browser test

You can test failover by changing the environment variable in your health check Lambda function. In the Lambda console, select your health check function and scroll down to the Environment variables section. For the STATUS key, modify the value to fail.

Lambda update environment variable
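
You can also flip the variable from a script, which is handy for repeatable failover drills. Here is a minimal sketch, assuming a hypothetical function name for the health check Lambda in us-east-1:

"""Simulate a regional failure by setting STATUS to fail (illustrative sketch)."""
import boto3

# Hypothetical name; use the health check function deployed by the SAM stack
# in the region you want to take out of rotation.
FUNCTION_NAME = "multiregionhelloworld-HealthCheckFunction-XXXXXXXX"

lambda_client = boto3.client('lambda', region_name='us-east-1')

# Note: this call replaces the entire Variables map, so include every
# environment variable the function needs.
lambda_client.update_function_configuration(
    FunctionName=FUNCTION_NAME,
    Environment={'Variables': {'STATUS': 'fail'}},
)

Set the value back to ok to bring the region back into rotation.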

You should see the region switch in the test client:

Serverless multi region browser test switchover

During an emulated failure like this, the browser might take some additional time to switch over due to connection keep-alive functionality. If you are using a browser like Chrome, you can kill all the connections to see a more immediate fail-over: chrome://net-internals/#sockets

Summary

You have implemented a simple way to build multi-regional serverless applications that fail over seamlessly between regions, whether accessed from the browser or from other applications and services. You achieved this by using the capabilities of Amazon Route 53 to do latency-based routing and health checks for failover. You unlocked the use of these features in a serverless application by leveraging the new regional endpoint feature of Amazon API Gateway.

The setup was fully scripted using CloudFormation, the AWS Serverless Application Model (SAM), and the AWS CLI, and it can be integrated into deployment tools to push the code across the regions to make sure it is available in all the needed regions. For more information about cross-region deployments, see Building a Cross-Region/Cross-Account Code Deployment Solution on AWS on the AWS DevOps blog.