Tag Archives: authorization

GoDaddy to Suspend ‘Pirate’ Domain Following Music Industry Complaints

Post Syndicated from Andy original https://torrentfreak.com/godaddy-to-suspend-pirate-domain-following-music-industry-complaints-180601/

Most piracy-focused sites online conduct their business with minimal interference from outside parties. In many cases, a heap of DMCA notices filed with Google represents the most visible irritant.

Others, particularly those with large audiences, can find themselves on the end of a web blockade. Mostly court-ordered, blocking measures restrict the ability of Internet users to visit a site due to ISPs restricting traffic.

In some regions, where copyright holders have the means to do so, they choose to tackle a site’s infrastructure instead, which could mean complaints to webhosts or other service providers. At times, this has included domain registries, who are asked to disable domains on copyright grounds.

This is exactly what has happened to Fox-MusicaGratis.com, a Spanish-language music piracy site that incurred the wrath of IFPI member UNIMPRO – the Peruvian Union of Phonographic Producers.

Pirate music, suspended domain

In a process that’s becoming more common in the region, UNIMPRO initially filed a complaint with the Copyright Commission (Comisión de Derecho de Autor (CDA)) which conducted an investigation into the platform’s activities.

“The CDA considered, among other things, the irreparable damage that would have been caused to the legitimate rights owners, taking into account the large number of users who could potentially have visited said website, which was making available endless musical recordings for commercial purposes, without authorization of the holders of rights,” a statement from CDA reads.

The administrative process was carried out locally with the involvement of the National Institute for the Defense of Competition and the Protection of Intellectual Property (Indecopi), an autonomous public body tasked with handling anti-competitive behavior, unfair competition, and intellectual property matters.

Indecopi HQ

The matter was decided in favor of the rightsholders and a subsequent ruling included an instruction for US-based domain name registry GoDaddy to suspend Fox-MusicaGratis.com. According to the copyright protection entity, GoDaddy agreed to comply, to prevent further infringement.

This latest action involving a music piracy site registered with GoDaddy follows on the heels of a similar enforcement process back in March.

Mp3Juices-Download-Free.com, Melodiavip.net, Foxmusica.site and Fulltono.me were all music sites offering MP3 content without copyright holders’ permission. They too were the subject of an UNIMPRO complaint which resulted in orders for GoDaddy to suspend their domains.

In the cases of all five websites, GoDaddy was given the chance to appeal but there is no indication that the company has done so. GoDaddy did not respond to a request for comment.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

FCC Asks Amazon & eBay to Help Eliminate Pirate Media Box Sales

Post Syndicated from Andy original https://torrentfreak.com/fcc-asks-amazon-ebay-to-help-eliminate-pirate-media-box-sales-180530/

Over the past several years, anyone looking for a piracy-configured set-top box could do worse than search for one on Amazon or eBay.

Historically, people deploying search terms including “Kodi” or “fully-loaded” were greeted by page after page of Android-type boxes, each ready for illicit plug-and-play entertainment consumption following delivery.

Although the problem persists on both platforms, people are now much less likely to find infringing devices than they were 12 to 24 months ago. Under pressure from entertainment industry groups, both Amazon and eBay have tightened the screws on sellers of such devices. Now, however, both companies have received requests to stem sales from a completetey different direction.

In a letter to eBay CEO Devin Wenig and Amazon CEO Jeff Bezos first spotted by Ars, FCC Commissioner Michael O’Rielly calls on the platforms to take action against piracy-configured boxes that fail to comply with FCC equipment authorization requirements or falsely display FCC logos, contrary to United States law.

“Disturbingly, some rogue set-top box manufacturers and distributors are exploiting the FCC’s trusted logo by fraudulently placing it on devices that have not been approved via the Commission’s equipment authorization process,” O’Rielly’s letter reads.

“Specifically, nine set-top box distributors were referred to the FCC in October for enabling the unlawful streaming of copyrighted material, seven of which displayed the FCC logo, although there was no record of such compliance.”

While O’Rielly admits that the copyright infringement aspects fall outside the jurisdiction of the FCC, he says it’s troubling that many of these devices are used to stream infringing content, “exacerbating the theft of billions of dollars in American innovation and creativity.”

As noted above, both Amazon and eBay have taken steps to reduce sales of pirate boxes on their respective platforms on copyright infringement grounds, something which is duly noted by O’Rielly. However, he points out that devices continue to be sold to members of the public who may believe that the devices are legal since they’re available for sale from legitimate companies.

“For these reasons, I am seeking your further cooperation in assisting the FCC in taking steps to eliminate the non-FCC compliant devices or devices that fraudulently bear the FCC logo,” the Commissioner writes (pdf).

“Moreover, if your company is made aware by the Commission, with supporting evidence, that a particular device is using a fraudulent FCC label or has not been appropriately certified and labeled with a valid FCC logo, I respectfully request that you commit to swiftly removing these products from your sites.”

In the event that Amazon and eBay take action under this request, O’Rielly asks both platforms to hand over information they hold on offending manufacturers, distributors, and suppliers.

Amazon was quick to respond to the FCC. In a letter published by Ars, Amazon’s Public Policy Vice President Brian Huseman assured O’Rielly that the company is not only dedicated to tackling rogue devices on copyright-infringement grounds but also when there is fraudulent use of the FCC’s logos.

Noting that Amazon is a key member of the Alliance for Creativity and Entertainment (ACE) – a group that has been taking legal action against sellers of infringing streaming devices (ISDs) and those who make infringing addons for Kodi-type systems – Huseman says that dealing with the problem is a top priority.

“Our goal is to prevent the sale of ISDs anywhere, as we seek to protect our customers from the risks posed by these devices, in addition to our interest in protecting Amazon Studios content,” Huseman writes.

“In 2017, Amazon became the first online marketplace to prohibit the sale of streaming media players that promote or facilitate piracy. To prevent the sale of these devices, we proactively scan product listings for signs of potentially infringing products, and we also invest heavily in sophisticated, automated real-time tools to review a variety of data sources and signals to identify inauthentic goods.

“These automated tools are supplemented by human reviewers that conduct manual investigations. When we suspect infringement, we take immediate action to remove suspected listings, and we also take enforcement action against sellers’ entire accounts when appropriate.”

Huseman also reveals that since implementing a proactive policy against such devices, “tens of thousands” of listings have been blocked from Amazon. In addition, the platform has been making criminal referrals to law enforcement as well as taking civil action (1,2,3) as part of ACE.

“As noted in your letter, we would also appreciate the opportunity to collaborate further with the FCC to remove non-compliant devices that improperly use the FCC logo or falsely claim FCC certification. If any FCC non-compliant devices are identified, we seek to work with you to ensure they are not offered for sale,” Huseman concludes.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Protecting your API using Amazon API Gateway and AWS WAF — Part I

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/protecting-your-api-using-amazon-api-gateway-and-aws-waf-part-i/

This post courtesy of Thiago Morais, AWS Solutions Architect

When you build web applications or expose any data externally, you probably look for a platform where you can build highly scalable, secure, and robust REST APIs. As APIs are publicly exposed, there are a number of best practices for providing a secure mechanism to consumers using your API.

Amazon API Gateway handles all the tasks involved in accepting and processing up to hundreds of thousands of concurrent API calls, including traffic management, authorization and access control, monitoring, and API version management.

In this post, I show you how to take advantage of the regional API endpoint feature in API Gateway, so that you can create your own Amazon CloudFront distribution and secure your API using AWS WAF.

AWS WAF is a web application firewall that helps protect your web applications from common web exploits that could affect application availability, compromise security, or consume excessive resources.

As you make your APIs publicly available, you are exposed to attackers trying to exploit your services in several ways. The AWS security team published a whitepaper solution using AWS WAF, How to Mitigate OWASP’s Top 10 Web Application Vulnerabilities.

Regional API endpoints

Edge-optimized APIs are endpoints that are accessed through a CloudFront distribution created and managed by API Gateway. Before the launch of regional API endpoints, this was the default option when creating APIs using API Gateway. It primarily helped to reduce latency for API consumers that were located in different geographical locations than your API.

When API requests predominantly originate from an Amazon EC2 instance or other services within the same AWS Region as the API is deployed, a regional API endpoint typically lowers the latency of connections. It is recommended for such scenarios.

For better control around caching strategies, customers can use their own CloudFront distribution for regional APIs. They also have the ability to use AWS WAF protection, as I describe in this post.

Edge-optimized API endpoint

The following diagram is an illustrated example of the edge-optimized API endpoint where your API clients access your API through a CloudFront distribution created and managed by API Gateway.

Regional API endpoint

For the regional API endpoint, your customers access your API from the same Region in which your REST API is deployed. This helps you to reduce request latency and particularly allows you to add your own content delivery network, as needed.

Walkthrough

In this section, you implement the following steps:

  • Create a regional API using the PetStore sample API.
  • Create a CloudFront distribution for the API.
  • Test the CloudFront distribution.
  • Set up AWS WAF and create a web ACL.
  • Attach the web ACL to the CloudFront distribution.
  • Test AWS WAF protection.

Create the regional API

For this walkthrough, use an existing PetStore API. All new APIs launch by default as the regional endpoint type. To change the endpoint type for your existing API, choose the cog icon on the top right corner:

After you have created the PetStore API on your account, deploy a stage called “prod” for the PetStore API.

On the API Gateway console, select the PetStore API and choose Actions, Deploy API.

For Stage name, type prod and add a stage description.

Choose Deploy and the new API stage is created.

Use the following AWS CLI command to update your API from edge-optimized to regional:

aws apigateway update-rest-api \
--rest-api-id {rest-api-id} \
--patch-operations op=replace,path=/endpointConfiguration/types/EDGE,value=REGIONAL

A successful response looks like the following:

{
    "description": "Your first API with Amazon API Gateway. This is a sample API that integrates via HTTP with your demo Pet Store endpoints", 
    "createdDate": 1511525626, 
    "endpointConfiguration": {
        "types": [
            "REGIONAL"
        ]
    }, 
    "id": "{api-id}", 
    "name": "PetStore"
}

After you change your API endpoint to regional, you can now assign your own CloudFront distribution to this API.

Create a CloudFront distribution

To make things easier, I have provided an AWS CloudFormation template to deploy a CloudFront distribution pointing to the API that you just created. Click the button to deploy the template in the us-east-1 Region.

For Stack name, enter RegionalAPI. For APIGWEndpoint, enter your API FQDN in the following format:

{api-id}.execute-api.us-east-1.amazonaws.com

After you fill out the parameters, choose Next to continue the stack deployment. It takes a couple of minutes to finish the deployment. After it finishes, the Output tab lists the following items:

  • A CloudFront domain URL
  • An S3 bucket for CloudFront access logs
Output from CloudFormation

Output from CloudFormation

Test the CloudFront distribution

To see if the CloudFront distribution was configured correctly, use a web browser and enter the URL from your distribution, with the following parameters:

https://{your-distribution-url}.cloudfront.net/{api-stage}/pets

You should get the following output:

[
  {
    "id": 1,
    "type": "dog",
    "price": 249.99
  },
  {
    "id": 2,
    "type": "cat",
    "price": 124.99
  },
  {
    "id": 3,
    "type": "fish",
    "price": 0.99
  }
]

Set up AWS WAF and create a web ACL

With the new CloudFront distribution in place, you can now start setting up AWS WAF to protect your API.

For this demo, you deploy the AWS WAF Security Automations solution, which provides fine-grained control over the requests attempting to access your API.

For more information about deployment, see Automated Deployment. If you prefer, you can launch the solution directly into your account using the following button.

For CloudFront Access Log Bucket Name, add the name of the bucket created during the deployment of the CloudFormation stack for your CloudFront distribution.

The solution allows you to adjust thresholds and also choose which automations to enable to protect your API. After you finish configuring these settings, choose Next.

To start the deployment process in your account, follow the creation wizard and choose Create. It takes a few minutes do finish the deployment. You can follow the creation process through the CloudFormation console.

After the deployment finishes, you can see the new web ACL deployed on the AWS WAF console, AWSWAFSecurityAutomations.

Attach the AWS WAF web ACL to the CloudFront distribution

With the solution deployed, you can now attach the AWS WAF web ACL to the CloudFront distribution that you created earlier.

To assign the newly created AWS WAF web ACL, go back to your CloudFront distribution. After you open your distribution for editing, choose General, Edit.

Select the new AWS WAF web ACL that you created earlier, AWSWAFSecurityAutomations.

Save the changes to your CloudFront distribution and wait for the deployment to finish.

Test AWS WAF protection

To validate the AWS WAF Web ACL setup, use Artillery to load test your API and see AWS WAF in action.

To install Artillery on your machine, run the following command:

$ npm install -g artillery

After the installation completes, you can check if Artillery installed successfully by running the following command:

$ artillery -V
$ 1.6.0-12

As the time of publication, Artillery is on version 1.6.0-12.

One of the WAF web ACL rules that you have set up is a rate-based rule. By default, it is set up to block any requesters that exceed 2000 requests under 5 minutes. Try this out.

First, use cURL to query your distribution and see the API output:

$ curl -s https://{distribution-name}.cloudfront.net/prod/pets
[
  {
    "id": 1,
    "type": "dog",
    "price": 249.99
  },
  {
    "id": 2,
    "type": "cat",
    "price": 124.99
  },
  {
    "id": 3,
    "type": "fish",
    "price": 0.99
  }
]

Based on the test above, the result looks good. But what if you max out the 2000 requests in under 5 minutes?

Run the following Artillery command:

artillery quick -n 2000 --count 10  https://{distribution-name}.cloudfront.net/prod/pets

What you are doing is firing 2000 requests to your API from 10 concurrent users. For brevity, I am not posting the Artillery output here.

After Artillery finishes its execution, try to run the cURL request again and see what happens:

 

$ curl -s https://{distribution-name}.cloudfront.net/prod/pets

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<TITLE>ERROR: The request could not be satisfied</TITLE>
</HEAD><BODY>
<H1>ERROR</H1>
<H2>The request could not be satisfied.</H2>
<HR noshade size="1px">
Request blocked.
<BR clear="all">
<HR noshade size="1px">
<PRE>
Generated by cloudfront (CloudFront)
Request ID: [removed]
</PRE>
<ADDRESS>
</ADDRESS>
</BODY></HTML>

As you can see from the output above, the request was blocked by AWS WAF. Your IP address is removed from the blocked list after it falls below the request limit rate.

Conclusion

In this first part, you saw how to use the new API Gateway regional API endpoint together with Amazon CloudFront and AWS WAF to secure your API from a series of attacks.

In the second part, I will demonstrate some other techniques to protect your API using API keys and Amazon CloudFront custom headers.

How to centralize DNS management in a multi-account environment

Post Syndicated from Mahmoud Matouk original https://aws.amazon.com/blogs/security/how-to-centralize-dns-management-in-a-multi-account-environment/

In a multi-account environment where you require connectivity between accounts, and perhaps connectivity between cloud and on-premises workloads, the demand for a robust Domain Name Service (DNS) that’s capable of name resolution across all connected environments will be high.

The most common solution is to implement local DNS in each account and use conditional forwarders for DNS resolutions outside of this account. While this solution might be efficient for a single-account environment, it becomes complex in a multi-account environment.

In this post, I will provide a solution to implement central DNS for multiple accounts. This solution reduces the number of DNS servers and forwarders needed to implement cross-account domain resolution. I will show you how to configure this solution in four steps:

  1. Set up your Central DNS account.
  2. Set up each participating account.
  3. Create Route53 associations.
  4. Configure on-premises DNS (if applicable).

Solution overview

In this solution, you use AWS Directory Service for Microsoft Active Directory (AWS Managed Microsoft AD) as a DNS service in a dedicated account in a Virtual Private Cloud (DNS-VPC).

The DNS service included in AWS Managed Microsoft AD uses conditional forwarders to forward domain resolution to either Amazon Route 53 (for domains in the awscloud.com zone) or to on-premises DNS servers (for domains in the example.com zone). You’ll use AWS Managed Microsoft AD as the primary DNS server for other application accounts in the multi-account environment (participating accounts).

A participating account is any application account that hosts a VPC and uses the centralized AWS Managed Microsoft AD as the primary DNS server for that VPC. Each participating account has a private, hosted zone with a unique zone name to represent this account (for example, business_unit.awscloud.com).

You associate the DNS-VPC with the unique hosted zone in each of the participating accounts, this allows AWS Managed Microsoft AD to use Route 53 to resolve all registered domains in private, hosted zones in participating accounts.

The following diagram shows how the various services work together:
 

Diagram showing the relationship between all the various services

Figure 1: Diagram showing the relationship between all the various services

 

In this diagram, all VPCs in participating accounts use Dynamic Host Configuration Protocol (DHCP) option sets. The option sets configure EC2 instances to use the centralized AWS Managed Microsoft AD in DNS-VPC as their default DNS Server. You also configure AWS Managed Microsoft AD to use conditional forwarders to send domain queries to Route53 or on-premises DNS servers based on query zone. For domain resolution across accounts to work, we associate DNS-VPC with each hosted zone in participating accounts.

If, for example, server.pa1.awscloud.com needs to resolve addresses in the pa3.awscloud.com domain, the sequence shown in the following diagram happens:
 

How domain resolution across accounts works

Figure 2: How domain resolution across accounts works

 

  • 1.1: server.pa1.awscloud.com sends domain name lookup to default DNS server for the name server.pa3.awscloud.com. The request is forwarded to the DNS server defined in the DHCP option set (AWS Managed Microsoft AD in DNS-VPC).
  • 1.2: AWS Managed Microsoft AD forwards name resolution to Route53 because it’s in the awscloud.com zone.
  • 1.3: Route53 resolves the name to the IP address of server.pa3.awscloud.com because DNS-VPC is associated with the private hosted zone pa3.awscloud.com.

Similarly, if server.example.com needs to resolve server.pa3.awscloud.com, the following happens:

  • 2.1: server.example.com sends domain name lookup to on-premise DNS server for the name server.pa3.awscloud.com.
  • 2.2: on-premise DNS server using conditional forwarder forwards domain lookup to AWS Managed Microsoft AD in DNS-VPC.
  • 1.2: AWS Managed Microsoft AD forwards name resolution to Route53 because it’s in the awscloud.com zone.
  • 1.3: Route53 resolves the name to the IP address of server.pa3.awscloud.com because DNS-VPC is associated with the private hosted zone pa3.awscloud.com.

Step 1: Set up a centralized DNS account

In previous AWS Security Blog posts, Drew Dennis covered a couple of options for establishing DNS resolution between on-premises networks and Amazon VPC. In this post, he showed how you can use AWS Managed Microsoft AD (provisioned with AWS Directory Service) to provide DNS resolution with forwarding capabilities.

To set up a centralized DNS account, you can follow the same steps in Drew’s post to create AWS Managed Microsoft AD and configure the forwarders to send DNS queries for awscloud.com to default, VPC-provided DNS and to forward example.com queries to the on-premise DNS server.

Here are a few considerations while setting up central DNS:

  • The VPC that hosts AWS Managed Microsoft AD (DNS-VPC) will be associated with all private hosted zones in participating accounts.
  • To be able to resolve domain names across AWS and on-premises, connectivity through Direct Connect or VPN must be in place.

Step 2: Set up participating accounts

The steps I suggest in this section should be applied individually in each application account that’s participating in central DNS resolution.

  1. Create the VPC(s) that will host your resources in participating account.
  2. Create VPC Peering between local VPC(s) in each participating account and DNS-VPC.
  3. Create a private hosted zone in Route 53. Hosted zone domain names must be unique across all accounts. In the diagram above, we used pa1.awscloud.com / pa2.awscloud.com / pa3.awscloud.com. You could also use a combination of environment and business unit: for example, you could use pa1.dev.awscloud.com to achieve uniqueness.
  4. Associate VPC(s) in each participating account with the local private hosted zone.

The next step is to change the default DNS servers on each VPC using DHCP option set:

  1. Follow these steps to create a new DHCP option set. Make sure in the DNS Servers to put the private IP addresses of the two AWS Managed Microsoft AD servers that were created in DNS-VPC:
     
    The "Create DHCP options set" dialog box

    Figure 3: The “Create DHCP options set” dialog box

     

  2. Follow these steps to assign the DHCP option set to your VPC(s) in participating account.

Step 3: Associate DNS-VPC with private hosted zones in each participating account

The next steps will associate DNS-VPC with the private, hosted zone in each participating account. This allows instances in DNS-VPC to resolve domain records created in these hosted zones. If you need them, here are more details on associating a private, hosted zone with VPC on a different account.

  1. In each participating account, create the authorization using the private hosted zone ID from the previous step, the region, and the VPC ID that you want to associate (DNS-VPC).
     
    aws route53 create-vpc-association-authorization –hosted-zone-id <hosted-zone-id> –vpc VPCRegion=<region>,VPCId=<vpc-id>
     
  2. In the centralized DNS account, associate DNS-VPC with the hosted zone in each participating account.
     
    aws route53 associate-vpc-with-hosted-zone –hosted-zone-id <hosted-zone-id> –vpc VPCRegion=<region>,VPCId=<vpc-id>
     

After completing these steps, AWS Managed Microsoft AD in the centralized DNS account should be able to resolve domain records in the private, hosted zone in each participating account.

Step 4: Setting up on-premises DNS servers

This step is necessary if you would like to resolve AWS private domains from on-premises servers and this task comes down to configuring forwarders on-premise to forward DNS queries to AWS Managed Microsoft AD in DNS-VPC for all domains in the awscloud.com zone.

The steps to implement conditional forwarders vary by DNS product. Follow your product’s documentation to complete this configuration.

Summary

I introduced a simplified solution to implement central DNS resolution in a multi-account environment that could be also extended to support DNS resolution between on-premise resources and AWS. This can help reduce operations effort and the number of resources needed to implement cross-account domain resolution.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Directory Service forum or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Announcing the new AWS Certified Security – Specialty exam

Post Syndicated from Janna Pellegrino original https://aws.amazon.com/blogs/architecture/announcing-the-new-aws-certified-security-specialty-exam/

Good news for cloud security experts: following our most popular beta exam ever, the AWS Certified Security – Specialty exam is here. This new exam allows experienced cloud security professionals to demonstrate and validate their knowledge of how to secure the AWS platform.

About the exam
The security exam covers incident response, logging and monitoring, infrastructure security, identity and access management, and data protection. The exam is open to anyone who currently holds a Cloud Practitioner or Associate-level certification. We recommend candidates have five years of IT security experience designing and implementing security solutions, and at least two years of hands-on experience securing AWS workloads.

The exam validates:

  • An understanding of specialized data classifications and AWS data protection mechanisms.
  • An understanding of data encryption methods and AWS mechanisms to implement them.
  • An understanding of secure Internet protocols and AWS mechanisms to implement them.
  • A working knowledge of AWS security services and features of services to provide a secure production environment.
  • Competency gained from two or more years of production deployment experience using AWS security services and features.
  • Ability to make trade-off decisions with regard to cost, security, and deployment complexity given a set of application requirements.
  • An understanding of security operations and risk.

Learn more and register >>

How to prepare
We have training and other resources to help you prepare for the exam:

AWS Training (aws.amazon.com/training)

Additional Resources

Learn more and register >>

Please contact us if you have questions about exam registration.

Good luck!

Announcing the new AWS Certified Security – Specialty exam

Post Syndicated from Ozlem Yilmaz original https://aws.amazon.com/blogs/security/announcing-the-new-aws-certified-security-specialty-exam/

Good news for cloud security experts: the AWS Certified Security — Specialty exam is here. This new exam allows experienced cloud security professionals to demonstrate and validate their knowledge of how to secure the AWS platform.

About the exam

The security exam covers incident response, logging and monitoring, infrastructure security, identity and access management, and data protection. The exam is open to anyone who currently holds a Cloud Practitioner or Associate-level certification. We recommend candidates have five years of IT security experience designing and implementing security solutions, and at least two years of hands-on experience securing AWS workloads.

The exam validates your understanding of:

  • Specialized data classifications and AWS data protection mechanisms
  • Data encryption methods and AWS mechanisms to implement them
  • Secure Internet protocols and AWS mechanisms to implement them
  • AWS security services and features of services to provide a secure production environment
  • Making tradeoff decisions with regard to cost, security, and deployment complexity given a set of application requirements
  • Security operations and risk

How to prepare

We have training and other resources to help you prepare for the exam.

AWS Training that includes:

Additional Resources

Learn more and register here, and please contact us if you have questions about exam registration.

Want more AWS Security news? Follow us on Twitter.

AWS AppSync – Production-Ready with Six New Features

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-appsync-production-ready-with-six-new-features/

If you build (or want to build) data-driven web and mobile apps and need real-time updates and the ability to work offline, you should take a look at AWS AppSync. Announced in preview form at AWS re:Invent 2017 and described in depth here, AWS AppSync is designed for use in iOS, Android, JavaScript, and React Native apps. AWS AppSync is built around GraphQL, an open, standardized query language that makes it easy for your applications to request the precise data that they need from the cloud.

I’m happy to announce that the preview period is over and that AWS AppSync is now generally available and production-ready, with six new features that will simplify and streamline your application development process:

Console Log Access – You can now see the CloudWatch Logs entries that are created when you test your GraphQL queries, mutations, and subscriptions from within the AWS AppSync Console.

Console Testing with Mock Data – You can now create and use mock context objects in the console for testing purposes.

Subscription Resolvers – You can now create resolvers for AWS AppSync subscription requests, just as you can already do for query and mutate requests.

Batch GraphQL Operations for DynamoDB – You can now make use of DynamoDB’s batch operations (BatchGetItem and BatchWriteItem) across one or more tables. in your resolver functions.

CloudWatch Support – You can now use Amazon CloudWatch Metrics and CloudWatch Logs to monitor calls to the AWS AppSync APIs.

CloudFormation Support – You can now define your schemas, data sources, and resolvers using AWS CloudFormation templates.

A Brief AppSync Review
Before diving in to the new features, let’s review the process of creating an AWS AppSync API, starting from the console. I click Create API to begin:

I enter a name for my API and (for demo purposes) choose to use the Sample schema:

The schema defines a collection of GraphQL object types. Each object type has a set of fields, with optional arguments:

If I was creating an API of my own I would enter my schema at this point. Since I am using the sample, I don’t need to do this. Either way, I click on Create to proceed:

The GraphQL schema type defines the entry points for the operations on the data. All of the data stored on behalf of a particular schema must be accessible using a path that begins at one of these entry points. The console provides me with an endpoint and key for my API:

It also provides me with guidance and a set of fully functional sample apps that I can clone:

When I clicked Create, AWS AppSync created a pair of Amazon DynamoDB tables for me. I can click Data Sources to see them:

I can also see and modify my schema, issue queries, and modify an assortment of settings for my API.

Let’s take a quick look at each new feature…

Console Log Access
The AWS AppSync Console already allows me to issue queries and to see the results, and now provides access to relevant log entries.In order to see the entries, I must enable logs (as detailed below), open up the LOGS, and check the checkbox. Here’s a simple mutation query that adds a new event. I enter the query and click the arrow to test it:

I can click VIEW IN CLOUDWATCH for a more detailed view:

To learn more, read Test and Debug Resolvers.

Console Testing with Mock Data
You can now create a context object in the console where it will be passed to one of your resolvers for testing purposes. I’ll add a testResolver item to my schema:

Then I locate it on the right-hand side of the Schema page and click Attach:

I choose a data source (this is for testing and the actual source will not be accessed), and use the Put item mapping template:

Then I click Select test context, choose Create New Context, assign a name to my test content, and click Save (as you can see, the test context contains the arguments from the query along with values to be returned for each field of the result):

After I save the new Resolver, I click Test to see the request and the response:

Subscription Resolvers
Your AWS AppSync application can monitor changes to any data source using the @aws_subscribe GraphQL schema directive and defining a Subscription type. The AWS AppSync client SDK connects to AWS AppSync using MQTT over Websockets and the application is notified after each mutation. You can now attach resolvers (which convert GraphQL payloads into the protocol needed by the underlying storage system) to your subscription fields and perform authorization checks when clients attempt to connect. This allows you to perform the same fine grained authorization routines across queries, mutations, and subscriptions.

To learn more about this feature, read Real-Time Data.

Batch GraphQL Operations
Your resolvers can now make use of DynamoDB batch operations that span one or more tables in a region. This allows you to use a list of keys in a single query, read records multiple tables, write records in bulk to multiple tables, and conditionally write or delete related records across multiple tables.

In order to use this feature the IAM role that you use to access your tables must grant access to DynamoDB’s BatchGetItem and BatchPutItem functions.

To learn more, read the DynamoDB Batch Resolvers tutorial.

CloudWatch Logs Support
You can now tell AWS AppSync to log API requests to CloudWatch Logs. Click on Settings and Enable logs, then choose the IAM role and the log level:

CloudFormation Support
You can use the following CloudFormation resource types in your templates to define AWS AppSync resources:

AWS::AppSync::GraphQLApi – Defines an AppSync API in terms of a data source (an Amazon Elasticsearch Service domain or a DynamoDB table).

AWS::AppSync::ApiKey – Defines the access key needed to access the data source.

AWS::AppSync::GraphQLSchema – Defines a GraphQL schema.

AWS::AppSync::DataSource – Defines a data source.

AWS::AppSync::Resolver – Defines a resolver by referencing a schema and a data source, and includes a mapping template for requests.

Here’s a simple schema definition in YAML form:

  AppSyncSchema:
    Type: "AWS::AppSync::GraphQLSchema"
    DependsOn:
      - AppSyncGraphQLApi
    Properties:
      ApiId: !GetAtt AppSyncGraphQLApi.ApiId
      Definition: |
        schema {
          query: Query
          mutation: Mutation
        }
        type Query {
          singlePost(id: ID!): Post
          allPosts: [Post]
        }
        type Mutation {
          putPost(id: ID!, title: String!): Post
        }
        type Post {
          id: ID!
          title: String!
        }

Available Now
These new features are available now and you can start using them today! Here are a couple of blog posts and other resources that you might find to be of interest:

Jeff;

 

 

How to migrate a Hue database from an existing Amazon EMR cluster

Post Syndicated from Anvesh Ragi original https://aws.amazon.com/blogs/big-data/how-to-migrate-a-hue-database-from-an-existing-amazon-emr-cluster/

Hadoop User Experience (Hue) is an open-source, web-based, graphical user interface for use with Amazon EMR and Apache Hadoop. The Hue database stores things like users, groups, authorization permissions, Apache Hive queries, Apache Oozie workflows, and so on.

There might come a time when you want to migrate your Hue database to a new EMR cluster. For example, you might want to upgrade from an older version of the Amazon EMR AMI (Amazon Machine Image), but your Hue application and its database have had a lot of customization.You can avoid re-creating these user entities and retain query/workflow histories in Hue by migrating the existing Hue database, or remote database in Amazon RDS, to a new cluster.

By default, Hue user information and query histories are stored in a local MySQL database on the EMR cluster’s master node. However, you can create one or more Hue-enabled clusters using a configuration stored in Amazon S3 and a remote MySQL database in Amazon RDS. This allows you to preserve user information and query history that Hue creates without keeping your Amazon EMR cluster running.

This post describes the step-by-step process for migrating the Hue database from an existing EMR cluster.

Note: Amazon EMR supports different Hue versions across different AMI releases. Keep in mind the compatibility of Hue versions between the old and new clusters in this migration activity. Currently, Hue 3.x.x versions are not compatible with Hue 4.x.x versions, and therefore a migration between these two Hue versions might create issues. In addition, Hue 3.10.0 is not backward compatible with its previous 3.x.x versions.

Before you begin

First, let’s create a new testUser in Hue on an existing EMR cluster, as shown following:

You will use these credentials later to log in to Hue on the new EMR cluster and validate whether you have successfully migrated the Hue database.

Let’s get started!

Migration how-to

Follow these steps to migrate your database to a new EMR cluster and then validate the migration process.

1.) Make a backup of the existing Hue database.

Use SSH to connect to the master node of the old cluster, as shown following (if you are using Linux/Unix/macOS), and dump the Hue database to a JSON file.

$ ssh -i ~/key.pem [email protected]
$ /usr/lib/hue/build/env/bin/hue dumpdata > ./hue-mysql.json

Edit the hue-mysql.json output file by removing all JSON objects that have useradmin.userprofile in the model field, and save the file. For example, remove the objects as shown following:

{
  "pk": 1,
  "model": "useradmin.userprofile",
  "fields": {
    "last_activity": "2018-01-10T11:41:04",
    "creation_method": "HUE",
    "first_login": false,
    "user": 1,
    "home_directory": "/user/hue_admin"
  }
},

2.) Store the hue-mysql.json file on persistent storage like Amazon S3.

You can copy the file from the old EMR cluster to Amazon S3 using the AWS CLI or Secure Copy (SCP) client. For example, the following uses the AWS CLI:

$ aws s3 cp ./hue-mysql.json s3://YourBucketName/folder/

3.) Recover/reload the backed-up Hue database into the new EMR cluster.

a.) Use SSH to connect to the master node of the new EMR cluster, and stop the Hue service that is already running.

$ ssh -i ~/key.pem [email protected]
$ sudo stop hue
hue stop/waiting

b.) Connect to the Hue database—either the local MySQL database or the remote database in Amazon RDS for your cluster as shown following, using the mysql client.

$ mysql -h HOST –u USER –pPASSWORD

For a local MySQL database, you can find the hostname, user name, and password for connecting to the database in the /etc/hue/conf/hue.ini file on the master node.

[[database]]
    engine = mysql
    name = huedb
    case_insensitive_collation = utf8_unicode_ci
    test_charset = utf8
    test_collation = utf8_bin
    host = ip-172-31-37-133.us-west-2.compute.internal
    user = hue
    test_name = test_huedb
    password = QdWbL3Ai6GcBqk26
    port = 3306

Based on the preceding example configuration, the sample command is as follows. (Replace the host, user, and password details based on your EMR cluster settings.)

$ mysql -h ip-172-31-37-133.us-west-2.compute.internal -u hue -pQdWbL3Ai6GcBqk26

c.) Drop the existing Hue database with the name huedb from the MySQL server.

mysql> DROP DATABASE IF EXISTS huedb;

d.) Create a new empty database with the same name huedb.

mysql> CREATE DATABASE huedb DEFAULT CHARACTER SET utf8 DEFAULT COLLATE=utf8_bin;

e.) Now, synchronize Hue with its database huedb.

$ sudo /usr/lib/hue/build/env/bin/hue syncdb --noinput
$ sudo /usr/lib/hue/build/env/bin/hue migrate

(This populates the new huedb with all Hue tables that are required.)

f.) Log in to MySQL again, and drop the foreign key to clean tables.

mysql> SHOW CREATE TABLE huedb.auth_permission;

In the following example, replace <id value> with the actual value from the preceding output.

mysql> ALTER TABLE huedb.auth_permission DROP FOREIGN KEY
content_type_id_refs_id_<id value>;

g.) Delete the contents of the django_content_type

mysql> DELETE FROM huedb.django_content_type;

h.) Download the backed-up Hue database dump from Amazon S3 to the new EMR cluster, and load it into Hue.

$ aws s3 cp s3://YourBucketName/folder/hue-mysql.json ./
$ sudo /usr/lib/hue/build/env/bin/hue loaddata ./hue-mysql.json

i.) In MySQL, add the foreign key content_type_id back to the auth_permission

mysql> use huedb;
mysql> ALTER TABLE huedb.auth_permission ADD FOREIGN KEY (`content_type_id`) REFERENCES `django_content_type` (`id`);

j.) Start the Hue service again.

$ sudo start hue
hue start/running, process XXXX

That’s it! Now, verify whether you can successfully access the Hue UI, and sign in using your existing testUser credentials.

After a successful sign in to Hue on the new EMR cluster, you should see a similar Hue homepage as shown following with testUser as the user signed in:

Conclusion

You have now learned how to migrate an existing Hue database to a new Amazon EMR cluster and validate the migration process. If you have any similar Amazon EMR administration topics that you want to see covered in a future post, please let us know in the comments below.


Additional Reading

If you found this post useful, be sure to check out Anomaly Detection Using PySpark, Hive, and Hue on Amazon EMR and Dynamically Create Friendly URLs for Your Amazon EMR Web Interfaces.


About the Author


Anvesh Ragi is a Big Data Support Engineer with Amazon Web Services. He works closely with AWS customers to provide them architectural and engineering assistance for their data processing workflows. In his free time, he enjoys traveling and going for hikes.

Serverless Dynamic Web Pages in AWS: Provisioned with CloudFormation

Post Syndicated from AWS Admin original https://aws.amazon.com/blogs/architecture/serverless-dynamic-web-pages-in-aws-provisioned-with-cloudformation/

***This blog is authored by Mike Okner of Monsanto, an AWS customer. It originally appeared on the Monsanto company blog. Minor edits were made to the original post.***

Recently, I was looking to create a status page app to monitor a few important internal services. I wanted this app to be as lightweight, reliable, and hassle-free as possible, so using a “serverless” architecture that doesn’t require any patching or other maintenance was quite appealing.

I also don’t deploy anything in a production AWS environment outside of some sort of template (usually CloudFormation) as a rule. I don’t want to have to come back to something I created ad hoc in the console after 6 months and try to recall exactly how I architected all of the resources. I’ll inevitably forget something and create more problems before solving the original one. So building the status page in a template was a requirement.

The Design
I settled on a design using two Lambda functions, both written in Python 3.6.

The first Lambda function makes requests out to a list of important services and writes their current status to a DynamoDB table. This function is executed once per minute via CloudWatch Event Rule.

The second Lambda function reads each service’s status & uptime information from DynamoDB and renders a Jinja template. This function is behind an API Gateway that has been configured to return text/html instead of its default application/json Content-Type.

The CloudFormation Template
AWS provides a Serverless Application Model template transformer to streamline the templating of Lambda + API Gateway designs, but it assumes (like everything else about the API Gateway) that you’re actually serving an API that returns JSON content. So, unfortunately, it won’t work for this use-case because we want to return HTML content. Instead, we’ll have to enumerate every resource like usual.

The Skeleton
We’ll be using YAML for the template in this example. I find it easier to read than JSON, but you can easily convert between the two with a converter if you disagree.

---
AWSTemplateFormatVersion: '2010-09-09'
Description: Serverless status page app
Resources:
  # [...Resources]

The Status-Checker Lambda Resource
This one is triggered on a schedule by CloudWatch, and looks like:

# Status Checker Lambda
CheckerLambda:
  Type: AWS::Lambda::Function
  Properties:
    Code: ./lambda.zip
    Environment:
      Variables:
        TABLE_NAME: !Ref DynamoTable
    Handler: checker.handler
    Role:
      Fn::GetAtt:
      - CheckerLambdaRole
      - Arn
    Runtime: python3.6
    Timeout: 45
CheckerLambdaRole:
  Type: AWS::IAM::Role
  Properties:
    ManagedPolicyArns:
    - arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess
    - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
    AssumeRolePolicyDocument:
      Version: '2012-10-17'
      Statement:
      - Action:
        - sts:AssumeRole
        Effect: Allow
        Principal:
          Service:
          - lambda.amazonaws.com
CheckerLambdaTimer:
  Type: AWS::Events::Rule
  Properties:
    ScheduleExpression: rate(1 minute)
    Targets:
    - Id: CheckerLambdaTimerLambdaTarget
      Arn:
        Fn::GetAtt:
        - CheckerLambda
        - Arn
CheckerLambdaTimerPermission:
  Type: AWS::Lambda::Permission
  Properties:
    Action: lambda:invokeFunction
    FunctionName: !Ref CheckerLambda
    SourceArn:
      Fn::GetAtt:
      - CheckerLambdaTimer
      - Arn
    Principal: events.amazonaws.com

Let’s break that down a bit.

The CheckerLambda is the actual Lambda function. The Code section is a local path to a ZIP file containing the code and its dependencies. I’m using CloudFormation’s packaging feature to automatically push the deployable to S3.

The CheckerLambdaRole is the IAM role the Lambda will assume which grants it access to DynamoDB in addition to the usual Lambda logging permissions.

The CheckerLambdaTimer is the CloudWatch Events Rule that triggers the checker to run once per minute.

The CheckerLambdaTimerPermission grants CloudWatch the ability to invoke the checker Lambda function on its interval.

The Web Page Gateway
The API Gateway handles incoming requests for the web page, invokes the Lambda, and then returns the Lambda’s results as HTML content. Its template looks like:

# API Gateway for Web Page Lambda
PageGateway:
  Type: AWS::ApiGateway::RestApi
  Properties:
    Name: Service Checker Gateway
PageResource:
  Type: AWS::ApiGateway::Resource
  Properties:
    RestApiId: !Ref PageGateway
    ParentId:
      Fn::GetAtt:
      - PageGateway
      - RootResourceId
    PathPart: page
PageGatewayMethod:
  Type: AWS::ApiGateway::Method
  Properties:
    AuthorizationType: NONE
    HttpMethod: GET
    Integration:
      Type: AWS
      IntegrationHttpMethod: POST
      Uri:
        Fn::Sub: arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${WebRenderLambda.Arn}/invocations
      RequestTemplates:
        application/json: |
          {
              "method": "$context.httpMethod",
              "body" : $input.json('$'),
              "headers": {
                  #foreach($param in $input.params().header.keySet())
                  "$param": "$util.escapeJavaScript($input.params().header.get($param))"
                  #if($foreach.hasNext),#end
                  #end
              }
          }
      IntegrationResponses:
      - StatusCode: 200
        ResponseParameters:
          method.response.header.Content-Type: "'text/html'"
        ResponseTemplates:
          text/html: "$input.path('$')"
    ResourceId: !Ref PageResource
    RestApiId: !Ref PageGateway
    MethodResponses:
    - StatusCode: 200
      ResponseParameters:
        method.response.header.Content-Type: true
PageGatewayProdStage:
  Type: AWS::ApiGateway::Stage
  Properties:
    DeploymentId: !Ref PageGatewayDeployment
    RestApiId: !Ref PageGateway
    StageName: Prod
PageGatewayDeployment:
  Type: AWS::ApiGateway::Deployment
  DependsOn: PageGatewayMethod
  Properties:
    RestApiId: !Ref PageGateway
    Description: PageGateway deployment
    StageName: Stage

There’s a lot going on here, but the real meat is in the PageGatewayMethod section. There are a couple properties that deviate from the default which is why we couldn’t use the SAM transformer.

First, we’re passing request headers through to the Lambda in theRequestTemplates section. I’m doing this so I can validate incoming auth headers. The API Gateway can do some types of auth, but I found it easier to check auth myself in the Lambda function since the Gateway is designed to handle API calls and not browser requests.

Next, note that in the IntegrationResponses section we’re defining the Content-Type header to be ‘text/html’ (with single-quotes) and defining the ResponseTemplate to be $input.path(‘$’). This is what makes the request render as a HTML page in your browser instead of just raw text.

Due to the StageName and PathPart values in the other sections, your actual page will be accessible at https://someId.execute-api.region.amazonaws.com/Prod/page. I have the page behind an existing reverse-proxy and give it a saner URL for end-users. The reverse proxy also attaches the auth header I mentioned above. If that header isn’t present, the Lambda will render an error page instead so the proxy can’t be bypassed.

The Web Page Rendering Lambda
This Lambda is invoked by calls to the API Gateway and looks like:

# Web Page Lambda
WebRenderLambda:
  Type: AWS::Lambda::Function
  Properties:
    Code: ./lambda.zip
    Environment:
      Variables:
        TABLE_NAME: !Ref DynamoTable
    Handler: web.handler
    Role:
      Fn::GetAtt:
      - WebRenderLambdaRole
      - Arn
    Runtime: python3.6
    Timeout: 30
WebRenderLambdaRole:
  Type: AWS::IAM::Role
  Properties:
    ManagedPolicyArns:
    - arn:aws:iam::aws:policy/AmazonDynamoDBReadOnlyAccess
    - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
    AssumeRolePolicyDocument:
      Version: '2012-10-17'
      Statement:
      - Action:
        - sts:AssumeRole
        Effect: Allow
        Principal:
          Service:
          - lambda.amazonaws.com
WebRenderLambdaGatewayPermission:
  Type: AWS::Lambda::Permission
  Properties:
    FunctionName: !Ref WebRenderLambda
    Action: lambda:invokeFunction
    Principal: apigateway.amazonaws.com
    SourceArn:
      Fn::Sub:
      - arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${__ApiId__}/*/*/*
      - __ApiId__: !Ref PageGateway

The WebRenderLambda and WebRenderLambdaRole should look familiar.

The WebRenderLambdaGatewayPermission is similar to the Status Checker’s CloudWatch permission, only this time it allows the API Gateway to invoke this Lambda.

The DynamoDB Table
This one is straightforward.

# DynamoDB table
DynamoTable:
  Type: AWS::DynamoDB::Table
  Properties:
    AttributeDefinitions:
    - AttributeName: name
      AttributeType: S
    ProvisionedThroughput:
      WriteCapacityUnits: 1
      ReadCapacityUnits: 1
    TableName: status-page-checker-results
    KeySchema:
    - KeyType: HASH
      AttributeName: name

The Deployment
We’ve made it this far defining every resource in a template that we can check in to version control, so we might as well script the deployment as well rather than manually manage the CloudFormation Stack via the AWS web console.

Since I’m using the packaging feature, I first run:

$ aws cloudformation package \
    --template-file template.yaml \
    --s3-bucket <some-bucket-name> \
    --output-template-file template-packaged.yaml
Uploading to 34cd6e82c5e8205f9b35e71afd9e1548 1922559 / 1922559.0 (100.00%) Successfully packaged artifacts and wrote output template to file template-packaged.yaml.

Then to deploy the template (whether new or modified), I run:

$ aws cloudformation deploy \
    --region '<aws-region>' \
    --template-file template-packaged.yaml \
    --stack-name '<some-name>' \
    --capabilities CAPABILITY_IAM
Waiting for changeset to be created.. Waiting for stack create/update to complete Successfully created/updated stack - <some-name>

And that’s it! You’ve just created a dynamic web page that will never require you to SSH anywhere, patch a server, recover from a disaster after Amazon terminates your unhealthy EC2, or any other number of pitfalls that are now the problem of some ops person at AWS. And you can reproduce deployments and make changes with confidence because everything is defined in the template and can be tracked in version control.

Best Practices for Running Apache Kafka on AWS

Post Syndicated from Prasad Alle original https://aws.amazon.com/blogs/big-data/best-practices-for-running-apache-kafka-on-aws/

This post was written in partnership with Intuit to share learnings, best practices, and recommendations for running an Apache Kafka cluster on AWS. Thanks to Vaishak Suresh and his colleagues at Intuit for their contribution and support.

Intuit, in their own words: Intuit, a leading enterprise customer for AWS, is a creator of business and financial management solutions. For more information on how Intuit partners with AWS, see our previous blog post, Real-time Stream Processing Using Apache Spark Streaming and Apache Kafka on AWS. Apache Kafka is an open-source, distributed streaming platform that enables you to build real-time streaming applications.

The best practices described in this post are based on our experience in running and operating large-scale Kafka clusters on AWS for more than two years. Our intent for this post is to help AWS customers who are currently running Kafka on AWS, and also customers who are considering migrating on-premises Kafka deployments to AWS.

AWS offers Amazon Kinesis Data Streams, a Kafka alternative that is fully managed.

Running your Kafka deployment on Amazon EC2 provides a high performance, scalable solution for ingesting streaming data. AWS offers many different instance types and storage option combinations for Kafka deployments. However, given the number of possible deployment topologies, it’s not always trivial to select the most appropriate strategy suitable for your use case.

In this blog post, we cover the following aspects of running Kafka clusters on AWS:

  • Deployment considerations and patterns
  • Storage options
  • Instance types
  • Networking
  • Upgrades
  • Performance tuning
  • Monitoring
  • Security
  • Backup and restore

Note: While implementing Kafka clusters in a production environment, make sure also to consider factors like your number of messages, message size, monitoring, failure handling, and any operational issues.

Deployment considerations and patterns

In this section, we discuss various deployment options available for Kafka on AWS, along with pros and cons of each option. A successful deployment starts with thoughtful consideration of these options. Considering availability, consistency, and operational overhead of the deployment helps when choosing the right option.

Single AWS Region, Three Availability Zones, All Active

One typical deployment pattern (all active) is in a single AWS Region with three Availability Zones (AZs). One Kafka cluster is deployed in each AZ along with Apache ZooKeeper and Kafka producer and consumer instances as shown in the illustration following.

In this pattern, this is the Kafka cluster deployment:

  • Kafka producers and Kafka cluster are deployed on each AZ.
  • Data is distributed evenly across three Kafka clusters by using Elastic Load Balancer.
  • Kafka consumers aggregate data from all three Kafka clusters.

Kafka cluster failover occurs this way:

  • Mark down all Kafka producers
  • Stop consumers
  • Debug and restack Kafka
  • Restart consumers
  • Restart Kafka producers

Following are the pros and cons of this pattern.

ProsCons
  • Highly available
  • Can sustain the failure of two AZs
  • No message loss during failover
  • Simple deployment

 

  • Very high operational overhead:
    • All changes need to be deployed three times, one for each Kafka cluster
    • Maintaining and monitoring three Kafka clusters
    • Maintaining and monitoring three consumer clusters

A restart is required for patching and upgrading brokers in a Kafka cluster. In this approach, a rolling upgrade is done separately for each cluster.

Single Region, Three Availability Zones, Active-Standby

Another typical deployment pattern (active-standby) is in a single AWS Region with a single Kafka cluster and Kafka brokers and Zookeepers distributed across three AZs. Another similar Kafka cluster acts as a standby as shown in the illustration following. You can use Kafka mirroring with MirrorMaker to replicate messages between any two clusters.

In this pattern, this is the Kafka cluster deployment:

  • Kafka producers are deployed on all three AZs.
  • Only one Kafka cluster is deployed across three AZs (active).
  • ZooKeeper instances are deployed on each AZ.
  • Brokers are spread evenly across all three AZs.
  • Kafka consumers can be deployed across all three AZs.
  • Standby Kafka producers and a Multi-AZ Kafka cluster are part of the deployment.

Kafka cluster failover occurs this way:

  • Switch traffic to standby Kafka producers cluster and Kafka cluster.
  • Restart consumers to consume from standby Kafka cluster.

Following are the pros and cons of this pattern.

ProsCons
  • Less operational overhead when compared to the first option
  • Only one Kafka cluster to manage and consume data from
  • Can handle single AZ failures without activating a standby Kafka cluster
  • Added latency due to cross-AZ data transfer among Kafka brokers
  • For Kafka versions before 0.10, replicas for topic partitions have to be assigned so they’re distributed to the brokers on different AZs (rack-awareness)
  • The cluster can become unavailable in case of a network glitch, where ZooKeeper does not see Kafka brokers
  • Possibility of in-transit message loss during failover

Intuit recommends using a single Kafka cluster in one AWS Region, with brokers distributing across three AZs (single region, three AZs). This approach offers stronger fault tolerance than otherwise, because a failed AZ won’t cause Kafka downtime.

Storage options

There are two storage options for file storage in Amazon EC2:

Ephemeral storage is local to the Amazon EC2 instance. It can provide high IOPS based on the instance type. On the other hand, Amazon EBS volumes offer higher resiliency and you can configure IOPS based on your storage needs. EBS volumes also offer some distinct advantages in terms of recovery time. Your choice of storage is closely related to the type of workload supported by your Kafka cluster.

Kafka provides built-in fault tolerance by replicating data partitions across a configurable number of instances. If a broker fails, you can recover it by fetching all the data from other brokers in the cluster that host the other replicas. Depending on the size of the data transfer, it can affect recovery process and network traffic. These in turn eventually affect the cluster’s performance.

The following table contrasts the benefits of using an instance store versus using EBS for storage.

Instance storeEBS
  • Instance storage is recommended for large- and medium-sized Kafka clusters. For a large cluster, read/write traffic is distributed across a high number of brokers, so the loss of a broker has less of an impact. However, for smaller clusters, a quick recovery for the failed node is important, but a failed broker takes longer and requires more network traffic for a smaller Kafka cluster.
  • Storage-optimized instances like h1, i3, and d2 are an ideal choice for distributed applications like Kafka.

 

  • The primary advantage of using EBS in a Kafka deployment is that it significantly reduces data-transfer traffic when a broker fails or must be replaced. The replacement broker joins the cluster much faster.
  • Data stored on EBS is persisted in case of an instance failure or termination. The broker’s data stored on an EBS volume remains intact, and you can mount the EBS volume to a new EC2 instance. Most of the replicated data for the replacement broker is already available in the EBS volume and need not be copied over the network from another broker. Only the changes made after the original broker failure need to be transferred across the network. That makes this process much faster.

 

 

Intuit chose EBS because of their frequent instance restacking requirements and also other benefits provided by EBS.

Generally, Kafka deployments use a replication factor of three. EBS offers replication within their service, so Intuit chose a replication factor of two instead of three.

Instance types

The choice of instance types is generally driven by the type of storage required for your streaming applications on a Kafka cluster. If your application requires ephemeral storage, h1, i3, and d2 instances are your best option.

Intuit used r3.xlarge instances for their brokers and r3.large for ZooKeeper, with ST1 (throughput optimized HDD) EBS for their Kafka cluster.

Here are sample benchmark numbers from Intuit tests.

ConfigurationBroker bytes (MB/s)
  • r3.xlarge
  • ST1 EBS
  • 12 brokers
  • 12 partitions

 

Aggregate 346.9

If you need EBS storage, then AWS has a newer-generation r4 instance. The r4 instance is superior to R3 in many ways:

  • It has a faster processor (Broadwell).
  • EBS is optimized by default.
  • It features networking based on Elastic Network Adapter (ENA), with up to 10 Gbps on smaller sizes.
  • It costs 20 percent less than R3.

Note: It’s always best practice to check for the latest changes in instance types.

Networking

The network plays a very important role in a distributed system like Kafka. A fast and reliable network ensures that nodes can communicate with each other easily. The available network throughput controls the maximum amount of traffic that Kafka can handle. Network throughput, combined with disk storage, is often the governing factor for cluster sizing.

If you expect your cluster to receive high read/write traffic, select an instance type that offers 10-Gb/s performance.

In addition, choose an option that keeps interbroker network traffic on the private subnet, because this approach allows clients to connect to the brokers. Communication between brokers and clients uses the same network interface and port. For more details, see the documentation about IP addressing for EC2 instances.

If you are deploying in more than one AWS Region, you can connect the two VPCs in the two AWS Regions using cross-region VPC peering. However, be aware of the networking costs associated with cross-AZ deployments.

Upgrades

Kafka has a history of not being backward compatible, but its support of backward compatibility is getting better. During a Kafka upgrade, you should keep your producer and consumer clients on a version equal to or lower than the version you are upgrading from. After the upgrade is finished, you can start using a new protocol version and any new features it supports. There are three upgrade approaches available, discussed following.

Rolling or in-place upgrade

In a rolling or in-place upgrade scenario, upgrade one Kafka broker at a time. Take into consideration the recommendations for doing rolling restarts to avoid downtime for end users.

Downtime upgrade

If you can afford the downtime, you can take your entire cluster down, upgrade each Kafka broker, and then restart the cluster.

Blue/green upgrade

Intuit followed the blue/green deployment model for their workloads, as described following.

If you can afford to create a separate Kafka cluster and upgrade it, we highly recommend the blue/green upgrade scenario. In this scenario, we recommend that you keep your clusters up-to-date with the latest Kafka version. For additional details on Kafka version upgrades or more details, see the Kafka upgrade documentation.

The following illustration shows a blue/green upgrade.

In this scenario, the upgrade plan works like this:

  • Create a new Kafka cluster on AWS.
  • Create a new Kafka producers stack to point to the new Kafka cluster.
  • Create topics on the new Kafka cluster.
  • Test the green deployment end to end (sanity check).
  • Using Amazon Route 53, change the new Kafka producers stack on AWS to point to the new green Kafka environment that you have created.

The roll-back plan works like this:

  • Switch Amazon Route 53 to the old Kafka producers stack on AWS to point to the old Kafka environment.

For additional details on blue/green deployment architecture using Kafka, see the re:Invent presentation Leveraging the Cloud with a Blue-Green Deployment Architecture.

Performance tuning

You can tune Kafka performance in multiple dimensions. Following are some best practices for performance tuning.

 These are some general performance tuning techniques:

  • If throughput is less than network capacity, try the following:
    • Add more threads
    • Increase batch size
    • Add more producer instances
    • Add more partitions
  • To improve latency when acks =-1, increase your num.replica.fetches value.
  • For cross-AZ data transfer, tune your buffer settings for sockets and for OS TCP.
  • Make sure that num.io.threads is greater than the number of disks dedicated for Kafka.
  • Adjust num.network.threads based on the number of producers plus the number of consumers plus the replication factor.
  • Your message size affects your network bandwidth. To get higher performance from a Kafka cluster, select an instance type that offers 10 Gb/s performance.

For Java and JVM tuning, try the following:

  • Minimize GC pauses by using the Oracle JDK, which uses the new G1 garbage-first collector.
  • Try to keep the Kafka heap size below 4 GB.

Monitoring

Knowing whether a Kafka cluster is working correctly in a production environment is critical. Sometimes, just knowing that the cluster is up is enough, but Kafka applications have many moving parts to monitor. In fact, it can easily become confusing to understand what’s important to watch and what you can set aside. Items to monitor range from simple metrics about the overall rate of traffic, to producers, consumers, brokers, controller, ZooKeeper, topics, partitions, messages, and so on.

For monitoring, Intuit used several tools, including Newrelec, Wavefront, Amazon CloudWatch, and AWS CloudTrail. Our recommended monitoring approach follows.

For system metrics, we recommend that you monitor:

  • CPU load
  • Network metrics
  • File handle usage
  • Disk space
  • Disk I/O performance
  • Garbage collection
  • ZooKeeper

For producers, we recommend that you monitor:

  • Batch-size-avg
  • Compression-rate-avg
  • Waiting-threads
  • Buffer-available-bytes
  • Record-queue-time-max
  • Record-send-rate
  • Records-per-request-avg

For consumers, we recommend that you monitor:

  • Batch-size-avg
  • Compression-rate-avg
  • Waiting-threads
  • Buffer-available-bytes
  • Record-queue-time-max
  • Record-send-rate
  • Records-per-request-avg

Security

Like most distributed systems, Kafka provides the mechanisms to transfer data with relatively high security across the components involved. Depending on your setup, security might involve different services such as encryption, Kerberos, Transport Layer Security (TLS) certificates, and advanced access control list (ACL) setup in brokers and ZooKeeper. The following tells you more about the Intuit approach. For details on Kafka security not covered in this section, see the Kafka documentation.

Encryption at rest

For EBS-backed EC2 instances, you can enable encryption at rest by using Amazon EBS volumes with encryption enabled. Amazon EBS uses AWS Key Management Service (AWS KMS) for encryption. For more details, see Amazon EBS Encryption in the EBS documentation. For instance store–backed EC2 instances, you can enable encryption at rest by using Amazon EC2 instance store encryption.

Encryption in transit

Kafka uses TLS for client and internode communications.

Authentication

Authentication of connections to brokers from clients (producers and consumers) to other brokers and tools uses either Secure Sockets Layer (SSL) or Simple Authentication and Security Layer (SASL).

Kafka supports Kerberos authentication. If you already have a Kerberos server, you can add Kafka to your current configuration.

Authorization

In Kafka, authorization is pluggable and integration with external authorization services is supported.

Backup and restore

The type of storage used in your deployment dictates your backup and restore strategy.

The best way to back up a Kafka cluster based on instance storage is to set up a second cluster and replicate messages using MirrorMaker. Kafka’s mirroring feature makes it possible to maintain a replica of an existing Kafka cluster. Depending on your setup and requirements, your backup cluster might be in the same AWS Region as your main cluster or in a different one.

For EBS-based deployments, you can enable automatic snapshots of EBS volumes to back up volumes. You can easily create new EBS volumes from these snapshots to restore. We recommend storing backup files in Amazon S3.

For more information on how to back up in Kafka, see the Kafka documentation.

Conclusion

In this post, we discussed several patterns for running Kafka in the AWS Cloud. AWS also provides an alternative managed solution with Amazon Kinesis Data Streams, there are no servers to manage or scaling cliffs to worry about, you can scale the size of your streaming pipeline in seconds without downtime, data replication across availability zones is automatic, you benefit from security out of the box, Kinesis Data Streams is tightly integrated with a wide variety of AWS services like Lambda, Redshift, Elasticsearch and it supports open source frameworks like Storm, Spark, Flink, and more. You may refer to kafka-kinesis connector.

If you have questions or suggestions, please comment below.


Additional Reading

If you found this post useful, be sure to check out Implement Serverless Log Analytics Using Amazon Kinesis Analytics and Real-time Clickstream Anomaly Detection with Amazon Kinesis Analytics.


About the Author

Prasad Alle is a Senior Big Data Consultant with AWS Professional Services. He spends his time leading and building scalable, reliable Big data, Machine learning, Artificial Intelligence and IoT solutions for AWS Enterprise and Strategic customers. His interests extend to various technologies such as Advanced Edge Computing, Machine learning at Edge. In his spare time, he enjoys spending time with his family.

 

 

Best Practices for Running Apache Cassandra on Amazon EC2

Post Syndicated from Prasad Alle original https://aws.amazon.com/blogs/big-data/best-practices-for-running-apache-cassandra-on-amazon-ec2/

Apache Cassandra is a commonly used, high performance NoSQL database. AWS customers that currently maintain Cassandra on-premises may want to take advantage of the scalability, reliability, security, and economic benefits of running Cassandra on Amazon EC2.

Amazon EC2 and Amazon Elastic Block Store (Amazon EBS) provide secure, resizable compute capacity and storage in the AWS Cloud. When combined, you can deploy Cassandra, allowing you to scale capacity according to your requirements. Given the number of possible deployment topologies, it’s not always trivial to select the most appropriate strategy suitable for your use case.

In this post, we outline three Cassandra deployment options, as well as provide guidance about determining the best practices for your use case in the following areas:

  • Cassandra resource overview
  • Deployment considerations
  • Storage options
  • Networking
  • High availability and resiliency
  • Maintenance
  • Security

Before we jump into best practices for running Cassandra on AWS, we should mention that we have many customers who decided to use DynamoDB instead of managing their own Cassandra cluster. DynamoDB is fully managed, serverless, and provides multi-master cross-region replication, encryption at rest, and managed backup and restore. Integration with AWS Identity and Access Management (IAM) enables DynamoDB customers to implement fine-grained access control for their data security needs.

Several customers who have been using large Cassandra clusters for many years have moved to DynamoDB to eliminate the complications of administering Cassandra clusters and maintaining high availability and durability themselves. Gumgum.com is one customer who migrated to DynamoDB and observed significant savings. For more information, see Moving to Amazon DynamoDB from Hosted Cassandra: A Leap Towards 60% Cost Saving per Year.

AWS provides options, so you’re covered whether you want to run your own NoSQL Cassandra database, or move to a fully managed, serverless DynamoDB database.

Cassandra resource overview

Here’s a short introduction to standard Cassandra resources and how they are implemented with AWS infrastructure. If you’re already familiar with Cassandra or AWS deployments, this can serve as a refresher.

ResourceCassandraAWS
Cluster

A single Cassandra deployment.

 

This typically consists of multiple physical locations, keyspaces, and physical servers.

A logical deployment construct in AWS that maps to an AWS CloudFormation StackSet, which consists of one or many CloudFormation stacks to deploy Cassandra.
DatacenterA group of nodes configured as a single replication group.

A logical deployment construct in AWS.

 

A datacenter is deployed with a single CloudFormation stack consisting of Amazon EC2 instances, networking, storage, and security resources.

Rack

A collection of servers.

 

A datacenter consists of at least one rack. Cassandra tries to place the replicas on different racks.

A single Availability Zone.
Server/nodeA physical virtual machine running Cassandra software.An EC2 instance.
TokenConceptually, the data managed by a cluster is represented as a ring. The ring is then divided into ranges equal to the number of nodes. Each node being responsible for one or more ranges of the data. Each node gets assigned with a token, which is essentially a random number from the range. The token value determines the node’s position in the ring and its range of data.Managed within Cassandra.
Virtual node (vnode)Responsible for storing a range of data. Each vnode receives one token in the ring. A cluster (by default) consists of 256 tokens, which are uniformly distributed across all servers in the Cassandra datacenter.Managed within Cassandra.
Replication factorThe total number of replicas across the cluster.Managed within Cassandra.

Deployment considerations

One of the many benefits of deploying Cassandra on Amazon EC2 is that you can automate many deployment tasks. In addition, AWS includes services, such as CloudFormation, that allow you to describe and provision all your infrastructure resources in your cloud environment.

We recommend orchestrating each Cassandra ring with one CloudFormation template. If you are deploying in multiple AWS Regions, you can use a CloudFormation StackSet to manage those stacks. All the maintenance actions (scaling, upgrading, and backing up) should be scripted with an AWS SDK. These may live as standalone AWS Lambda functions that can be invoked on demand during maintenance.

You can get started by following the Cassandra Quick Start deployment guide. Keep in mind that this guide does not address the requirements to operate a production deployment and should be used only for learning more about Cassandra.

Deployment patterns

In this section, we discuss various deployment options available for Cassandra in Amazon EC2. A successful deployment starts with thoughtful consideration of these options. Consider the amount of data, network environment, throughput, and availability.

  • Single AWS Region, 3 Availability Zones
  • Active-active, multi-Region
  • Active-standby, multi-Region

Single region, 3 Availability Zones

In this pattern, you deploy the Cassandra cluster in one AWS Region and three Availability Zones. There is only one ring in the cluster. By using EC2 instances in three zones, you ensure that the replicas are distributed uniformly in all zones.

To ensure the even distribution of data across all Availability Zones, we recommend that you distribute the EC2 instances evenly in all three Availability Zones. The number of EC2 instances in the cluster is a multiple of three (the replication factor).

This pattern is suitable in situations where the application is deployed in one Region or where deployments in different Regions should be constrained to the same Region because of data privacy or other legal requirements.

ProsCons

●     Highly available, can sustain failure of one Availability Zone.

●     Simple deployment

●     Does not protect in a situation when many of the resources in a Region are experiencing intermittent failure.

 

Active-active, multi-Region

In this pattern, you deploy two rings in two different Regions and link them. The VPCs in the two Regions are peered so that data can be replicated between two rings.

We recommend that the two rings in the two Regions be identical in nature, having the same number of nodes, instance types, and storage configuration.

This pattern is most suitable when the applications using the Cassandra cluster are deployed in more than one Region.

ProsCons

●     No data loss during failover.

●     Highly available, can sustain when many of the resources in a Region are experiencing intermittent failures.

●     Read/write traffic can be localized to the closest Region for the user for lower latency and higher performance.

●     High operational overhead

●     The second Region effectively doubles the cost

 

Active-standby, multi-region

In this pattern, you deploy two rings in two different Regions and link them. The VPCs in the two Regions are peered so that data can be replicated between two rings.

However, the second Region does not receive traffic from the applications. It only functions as a secondary location for disaster recovery reasons. If the primary Region is not available, the second Region receives traffic.

We recommend that the two rings in the two Regions be identical in nature, having the same number of nodes, instance types, and storage configuration.

This pattern is most suitable when the applications using the Cassandra cluster require low recovery point objective (RPO) and recovery time objective (RTO).

ProsCons

●     No data loss during failover.

●     Highly available, can sustain failure or partitioning of one whole Region.

●     High operational overhead.

●     High latency for writes for eventual consistency.

●     The second Region effectively doubles the cost.

Storage options

In on-premises deployments, Cassandra deployments use local disks to store data. There are two storage options for EC2 instances:

Your choice of storage is closely related to the type of workload supported by the Cassandra cluster. Instance store works best for most general purpose Cassandra deployments. However, in certain read-heavy clusters, Amazon EBS is a better choice.

The choice of instance type is generally driven by the type of storage:

  • If ephemeral storage is required for your application, a storage-optimized (I3) instance is the best option.
  • If your workload requires Amazon EBS, it is best to go with compute-optimized (C5) instances.
  • Burstable instance types (T2) don’t offer good performance for Cassandra deployments.

Instance store

Ephemeral storage is local to the EC2 instance. It may provide high input/output operations per second (IOPs) based on the instance type. An SSD-based instance store can support up to 3.3M IOPS in I3 instances. This high performance makes it an ideal choice for transactional or write-intensive applications such as Cassandra.

In general, instance storage is recommended for transactional, large, and medium-size Cassandra clusters. For a large cluster, read/write traffic is distributed across a higher number of nodes, so the loss of one node has less of an impact. However, for smaller clusters, a quick recovery for the failed node is important.

As an example, for a cluster with 100 nodes, the loss of 1 node is 3.33% loss (with a replication factor of 3). Similarly, for a cluster with 10 nodes, the loss of 1 node is 33% less capacity (with a replication factor of 3).

 Ephemeral storageAmazon EBSComments

IOPS

(translates to higher query performance)

Up to 3.3M on I3

80K/instance

10K/gp2/volume

32K/io1/volume

This results in a higher query performance on each host. However, Cassandra implicitly scales well in terms of horizontal scale. In general, we recommend scaling horizontally first. Then, scale vertically to mitigate specific issues.

 

Note: 3.3M IOPS is observed with 100% random read with a 4-KB block size on Amazon Linux.

AWS instance typesI3Compute optimized, C5Being able to choose between different instance types is an advantage in terms of CPU, memory, etc., for horizontal and vertical scaling.
Backup/ recoveryCustomBasic building blocks are available from AWS.

Amazon EBS offers distinct advantage here. It is small engineering effort to establish a backup/restore strategy.

a) In case of an instance failure, the EBS volumes from the failing instance are attached to a new instance.

b) In case of an EBS volume failure, the data is restored by creating a new EBS volume from last snapshot.

Amazon EBS

EBS volumes offer higher resiliency, and IOPs can be configured based on your storage needs. EBS volumes also offer some distinct advantages in terms of recovery time. EBS volumes can support up to 32K IOPS per volume and up to 80K IOPS per instance in RAID configuration. They have an annualized failure rate (AFR) of 0.1–0.2%, which makes EBS volumes 20 times more reliable than typical commodity disk drives.

The primary advantage of using Amazon EBS in a Cassandra deployment is that it reduces data-transfer traffic significantly when a node fails or must be replaced. The replacement node joins the cluster much faster. However, Amazon EBS could be more expensive, depending on your data storage needs.

Cassandra has built-in fault tolerance by replicating data to partitions across a configurable number of nodes. It can not only withstand node failures but if a node fails, it can also recover by copying data from other replicas into a new node. Depending on your application, this could mean copying tens of gigabytes of data. This adds additional delay to the recovery process, increases network traffic, and could possibly impact the performance of the Cassandra cluster during recovery.

Data stored on Amazon EBS is persisted in case of an instance failure or termination. The node’s data stored on an EBS volume remains intact and the EBS volume can be mounted to a new EC2 instance. Most of the replicated data for the replacement node is already available in the EBS volume and won’t need to be copied over the network from another node. Only the changes made after the original node failed need to be transferred across the network. That makes this process much faster.

EBS volumes are snapshotted periodically. So, if a volume fails, a new volume can be created from the last known good snapshot and be attached to a new instance. This is faster than creating a new volume and coping all the data to it.

Most Cassandra deployments use a replication factor of three. However, Amazon EBS does its own replication under the covers for fault tolerance. In practice, EBS volumes are about 20 times more reliable than typical disk drives. So, it is possible to go with a replication factor of two. This not only saves cost, but also enables deployments in a region that has two Availability Zones.

EBS volumes are recommended in case of read-heavy, small clusters (fewer nodes) that require storage of a large amount of data. Keep in mind that the Amazon EBS provisioned IOPS could get expensive. General purpose EBS volumes work best when sized for required performance.

Networking

If your cluster is expected to receive high read/write traffic, select an instance type that offers 10–Gb/s performance. As an example, i3.8xlarge and c5.9xlarge both offer 10–Gb/s networking performance. A smaller instance type in the same family leads to a relatively lower networking throughput.

Cassandra generates a universal unique identifier (UUID) for each node based on IP address for the instance. This UUID is used for distributing vnodes on the ring.

In the case of an AWS deployment, IP addresses are assigned automatically to the instance when an EC2 instance is created. With the new IP address, the data distribution changes and the whole ring has to be rebalanced. This is not desirable.

To preserve the assigned IP address, use a secondary elastic network interface with a fixed IP address. Before swapping an EC2 instance with a new one, detach the secondary network interface from the old instance and attach it to the new one. This way, the UUID remains same and there is no change in the way that data is distributed in the cluster.

If you are deploying in more than one region, you can connect the two VPCs in two regions using cross-region VPC peering.

High availability and resiliency

Cassandra is designed to be fault-tolerant and highly available during multiple node failures. In the patterns described earlier in this post, you deploy Cassandra to three Availability Zones with a replication factor of three. Even though it limits the AWS Region choices to the Regions with three or more Availability Zones, it offers protection for the cases of one-zone failure and network partitioning within a single Region. The multi-Region deployments described earlier in this post protect when many of the resources in a Region are experiencing intermittent failure.

Resiliency is ensured through infrastructure automation. The deployment patterns all require a quick replacement of the failing nodes. In the case of a regionwide failure, when you deploy with the multi-Region option, traffic can be directed to the other active Region while the infrastructure is recovering in the failing Region. In the case of unforeseen data corruption, the standby cluster can be restored with point-in-time backups stored in Amazon S3.

Maintenance

In this section, we look at ways to ensure that your Cassandra cluster is healthy:

  • Scaling
  • Upgrades
  • Backup and restore

Scaling

Cassandra is horizontally scaled by adding more instances to the ring. We recommend doubling the number of nodes in a cluster to scale up in one scale operation. This leaves the data homogeneously distributed across Availability Zones. Similarly, when scaling down, it’s best to halve the number of instances to keep the data homogeneously distributed.

Cassandra is vertically scaled by increasing the compute power of each node. Larger instance types have proportionally bigger memory. Use deployment automation to swap instances for bigger instances without downtime or data loss.

Upgrades

All three types of upgrades (Cassandra, operating system patching, and instance type changes) follow the same rolling upgrade pattern.

In this process, you start with a new EC2 instance and install software and patches on it. Thereafter, remove one node from the ring. For more information, see Cassandra cluster Rolling upgrade. Then, you detach the secondary network interface from one of the EC2 instances in the ring and attach it to the new EC2 instance. Restart the Cassandra service and wait for it to sync. Repeat this process for all nodes in the cluster.

Backup and restore

Your backup and restore strategy is dependent on the type of storage used in the deployment. Cassandra supports snapshots and incremental backups. When using instance store, a file-based backup tool works best. Customers use rsync or other third-party products to copy data backups from the instance to long-term storage. For more information, see Backing up and restoring data in the DataStax documentation. This process has to be repeated for all instances in the cluster for a complete backup. These backup files are copied back to new instances to restore. We recommend using S3 to durably store backup files for long-term storage.

For Amazon EBS based deployments, you can enable automated snapshots of EBS volumes to back up volumes. New EBS volumes can be easily created from these snapshots for restoration.

Security

We recommend that you think about security in all aspects of deployment. The first step is to ensure that the data is encrypted at rest and in transit. The second step is to restrict access to unauthorized users. For more information about security, see the Cassandra documentation.

Encryption at rest

Encryption at rest can be achieved by using EBS volumes with encryption enabled. Amazon EBS uses AWS KMS for encryption. For more information, see Amazon EBS Encryption.

Instance store–based deployments require using an encrypted file system or an AWS partner solution. If you are using DataStax Enterprise, it supports transparent data encryption.

Encryption in transit

Cassandra uses Transport Layer Security (TLS) for client and internode communications.

Authentication

The security mechanism is pluggable, which means that you can easily swap out one authentication method for another. You can also provide your own method of authenticating to Cassandra, such as a Kerberos ticket, or if you want to store passwords in a different location, such as an LDAP directory.

Authorization

The authorizer that’s plugged in by default is org.apache.cassandra.auth.Allow AllAuthorizer. Cassandra also provides a role-based access control (RBAC) capability, which allows you to create roles and assign permissions to these roles.

Conclusion

In this post, we discussed several patterns for running Cassandra in the AWS Cloud. This post describes how you can manage Cassandra databases running on Amazon EC2. AWS also provides managed offerings for a number of databases. To learn more, see Purpose-built databases for all your application needs.

If you have questions or suggestions, please comment below.


Additional Reading

If you found this post useful, be sure to check out Analyze Your Data on Amazon DynamoDB with Apache Spark and Analysis of Top-N DynamoDB Objects using Amazon Athena and Amazon QuickSight.


About the Authors

Prasad Alle is a Senior Big Data Consultant with AWS Professional Services. He spends his time leading and building scalable, reliable Big data, Machine learning, Artificial Intelligence and IoT solutions for AWS Enterprise and Strategic customers. His interests extend to various technologies such as Advanced Edge Computing, Machine learning at Edge. In his spare time, he enjoys spending time with his family.

 

 

 

Provanshu Dey is a Senior IoT Consultant with AWS Professional Services. He works on highly scalable and reliable IoT, data and machine learning solutions with our customers. In his spare time, he enjoys spending time with his family and tinkering with electronics & gadgets.