Tag Archives: ads

Announcing the new AWS Certified Security – Specialty exam

Post Syndicated from Janna Pellegrino original https://aws.amazon.com/blogs/architecture/announcing-the-new-aws-certified-security-specialty-exam/

Good news for cloud security experts: following our most popular beta exam ever, the AWS Certified Security – Specialty exam is here. This new exam allows experienced cloud security professionals to demonstrate and validate their knowledge of how to secure the AWS platform.

About the exam
The security exam covers incident response, logging and monitoring, infrastructure security, identity and access management, and data protection. The exam is open to anyone who currently holds a Cloud Practitioner or Associate-level certification. We recommend candidates have five years of IT security experience designing and implementing security solutions, and at least two years of hands-on experience securing AWS workloads.

The exam validates:

  • An understanding of specialized data classifications and AWS data protection mechanisms.
  • An understanding of data encryption methods and AWS mechanisms to implement them.
  • An understanding of secure Internet protocols and AWS mechanisms to implement them.
  • A working knowledge of AWS security services and features of services to provide a secure production environment.
  • Competency gained from two or more years of production deployment experience using AWS security services and features.
  • Ability to make trade-off decisions with regard to cost, security, and deployment complexity given a set of application requirements.
  • An understanding of security operations and risk.

Learn more and register >>

How to prepare
We have training and other resources to help you prepare for the exam:

AWS Training (aws.amazon.com/training)

Additional Resources

Learn more and register >>

Please contact us if you have questions about exam registration.

Good luck!

Announcing the new AWS Certified Security – Specialty exam

Post Syndicated from Ozlem Yilmaz original https://aws.amazon.com/blogs/security/announcing-the-new-aws-certified-security-specialty-exam/

Good news for cloud security experts: the AWS Certified Security — Specialty exam is here. This new exam allows experienced cloud security professionals to demonstrate and validate their knowledge of how to secure the AWS platform.

About the exam

The security exam covers incident response, logging and monitoring, infrastructure security, identity and access management, and data protection. The exam is open to anyone who currently holds a Cloud Practitioner or Associate-level certification. We recommend candidates have five years of IT security experience designing and implementing security solutions, and at least two years of hands-on experience securing AWS workloads.

The exam validates your understanding of:

  • Specialized data classifications and AWS data protection mechanisms
  • Data encryption methods and AWS mechanisms to implement them
  • Secure Internet protocols and AWS mechanisms to implement them
  • AWS security services and features of services to provide a secure production environment
  • Making tradeoff decisions with regard to cost, security, and deployment complexity given a set of application requirements
  • Security operations and risk

How to prepare

We have training and other resources to help you prepare for the exam.

AWS Training that includes:

Additional Resources

Learn more and register here, and please contact us if you have questions about exam registration.

Want more AWS Security news? Follow us on Twitter.

Registrars Suspend 11 Pirate Site Domains, 89 More in the Crosshairs

Post Syndicated from Andy original https://torrentfreak.com/registrars-suspend-11-pirate-site-domains-89-more-in-the-crosshairs-180423/

In addition to website blocking which is running rampant across dozens of countries right now, targeting the domains of pirate sites is considered to be a somewhat effective anti-piracy tool.

The vast majority of websites are found using a recognizable name so when they become inaccessible, site operators have to work quickly to get the message out to fans. That can mean losing visitors, at least in the short term, and also contributes to the rise of copy-cat sites that may not have users’ best interests at heart.

Nevertheless, crime-fighting has always been about disrupting the ability of the enemy to do business so with this in mind, authorities in India began taking advice from the UK’s Police Intellectual Property Crime Unit (PIPCU) a couple of years ago.

After studying the model developed by PIPCU, India formed its Digital Crime Unit (DCU), which follows a multi-stage plan.

Initially, pirate sites and their partners are told to cease-and-desist. Next, complaints are filed with advertisers, who are asked to stop funding site activities. Service providers and domain registrars also receive a written complaint from the DCU, asking them to suspend services to the sites in question.

Last July, the DCU earmarked around 9,000 sites where pirated content was being made available. From there, 1,300 were placed on a shortlist for targeted action. Precisely how many have been contacted thus far is unclear but authorities are now reporting success.

According to local reports, the Maharashtra government’s Digital Crime Unit has managed to have 11 pirate site domains suspended following complaints from players in the entertainment industry.

As is often the case (and to avoid them receiving even more attention) the sites in question aren’t being named but according to Brijesh Singh, special Inspector General of Police in Maharashtra, the sites had a significant number of visitors.

Their domain registrars were sent a notice under Section 149 of the Code Of Criminal Procedure, which grants police the power to take preventative action when a crime is suspected. It’s yet to be confirmed officially but it seems likely that pirate sites utilizing local registrars were targeted by the authorities.

“Responding to our notice, the domain names of all these websites, that had a collective viewership of over 80 million, were suspended,” Singh said.

Laxman Kamble, a police inspector attached to the state government’s Cyber Cell, said the pilot project was launched after the government received complaints from Viacom and Star but back in January there were reports that the MPAA had also become involved.

Using the model pioneered by London’s PIPCU, 19 parameters were applied to list of pirate sites in order to place them on the shortlist. They are reported to include the type of content being uploaded, downloaded, and the number of downloads overall.

Kamble reports that a further 89 websites, that have domains registered abroad but are very popular in India, are now being targeted. Whether overseas registrars will prove as compliant will remain to be seen. After booking initial success, even PIPCU itself experienced problems keeping up the momentum with registrars.

In 2014, information obtained by TorrentFreak following a Freedom of Information request revealed that only five out of 70 domain registrars had complied with police requests to suspend domains.

A year later, PIPCU confirmed that suspending pirate domain names was no longer a priority for them after ICANN ruled that registrars don’t have to suspend domain names without a valid court order.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Netflix, Amazon and Hollywood Sue “SET TV” Over IPTV Piracy

Post Syndicated from Ernesto original https://torrentfreak.com/netflix-amazon-and-hollywood-sue-set-tv-over-iptv-piracy-180422/

In recent years, piracy streaming tools and services have become a prime target for copyright enforcers.

This is particularly true for the Alliance for Creativity and Entertainment (ACE), an anti-piracy partnership forged between Hollywood studios, Netflix, Amazon, and more than two dozen other companies.

After taking action against Kodi-powered devices Tickbox and Dragonbox, key ACE members have now filed a similar lawsuit against the Florida-based company Set Broadcast, LLC, which sells the popular IPTV service SET TV.

The complaint, filed at a California federal court on Friday, further lists company owner Jason Labbosiere and employee Nelson Johnson among the defendants.

According to the movie companies, the Set TV software is little more than a pirate tool, allowing buyers to stream copyright infringing content.

“Defendants market and sell subscriptions to ‘Setvnow,’ a software application that Defendants urge their customers to use as a tool for the mass infringement of Plaintiffs’ copyrighted motion pictures and television shows,” the complaint reads.

In addition to the software, the company also offers a preloaded box. Both allow users to connect to live streams of TV channels and ‘on demand’ content. The latter includes movies that are still in theaters, which SET TV allegedly streams through third-party sources.

“For its on-demand options, Setvnow relies on third-party sources that illicitly reproduce copyrighted works and then provide streams of popular content such as movies still exclusively in theaters and television shows.”

From the complaint

The intended use of SET TV is clear, according to the movie companies. They frame it as a pirate service and believe that this is the main draw for consumers.

“Defendants promote the use of Setvnow for overwhelmingly, if not exclusively, infringing purposes, and that is how their customers use Setvnow,” the complaint reads.

Interestingly, the complaint also states that SET TV pays for sponsored reviews to reach a broader audience. The videos, posted by popular YouTubers such as Solo Man, who is quoted in the complaint, advertise the IPTV service.

“[The] sponsored reviewer promotes Setvnow as a quick and easy way to access on demand movies: ‘You have new releases right there and you simply click on the movie … you click it and click on play again and here you have the movie just like that in 1 2 3 in beautiful HD quality’.”

The lawsuit aims to bring an end to this. The movie companies ask the California District for an injunction to shut down the infringing service and impound all pre-loaded devices. In addition, they’re requesting statutory damages which could go up to several million dollars.

At the time of writing the SET TV website is still in the air, selling subscriptions. The company itself has yet to comment on the allegations.

A copy of the complaint is available here (pdf), courtesy of GeekWire.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

The Pirate Bay Suffers Extended Downtime, Tor Domain Is Up

Post Syndicated from Ernesto original https://torrentfreak.com/pirate-bay-suffers-extended-downtime-tor-domain/

pirate bayThe main Pirate Bay domain has been offline for the last one-and-a-half days.

For most people, the site currently displays a Cloudflare error message across the entire site, with the CDN provider referring to a “bad gateway.”

No further details are available to us and there is no known ETA for the site’s full return. Judging from past experience, however, it’s likely a small technical hiccup that needs fixing.

There are no issues with the domain name itself and Cloudflare seems to be fully functional as well.

Pirate Bay downtime, bad gateway

TorrentFreak hasn’t heard anything from the TPB team but these type of outages are not unusual. The Pirate Bay has had quite a few stints of downtime in recent months. The popular torrent site usually returns after several hours.

Amid the downtime, there’s still some good news for those who desperately need to access the notorious torrent site. TPB is still available via its .onion address on the Tor network, accessible using the popular Tor Browser, for example.

The site’s Tor traffic goes through a separate server and works just fine. However, based on the irregular uploads, that’s not going completely smoothly either.

In addition, some of The Pirate Bay’s unofficial proxy sites are still working fine and showing new torrents.

As always, more details on The Pirate Bay’s current status are available on the official forum, but don’t expect any ETA there.

“Patience is the game we are all playing for now,” TPB moderator demonS notes.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Russia Blacklists 250 Pirate Sites For Displaying Gambling Ads

Post Syndicated from Andy original https://torrentfreak.com/russia-blacklists-250-pirate-sites-for-displaying-gambling-ads-180421/

Blocking alleged pirate sites is usually a question of proving that they’re involved in infringement and then applying to the courts for an injunction.

In Europe, the process is becoming easier, largely thanks to an EU ruling that permits blocking on copyright grounds.

As reported over the past several years, Russia is taking its blocking processes very seriously. Copyright holders can now have sites blocked in just a few days, if they can show their operators as being unresponsive to takedown demands.

This week, however, Russian authorities have again shown that copyright infringement doesn’t have to be the only Achilles’ heel of pirate sites.

Back in 2006, online gambling was completely banned in Russia. Three years later in 2009, land-based gambling was also made illegal in all but four specified regions. Then, in 2012, the Russian Supreme Court ruled that ISPs must block access to gambling sites, something they had previously refused to do.

That same year, telecoms watchdog Rozcomnadzor began publishing a list of banned domains and within those appeared some of the biggest names in gambling. Many shut down access to customers located in Russia but others did not. In response, Rozcomnadzor also began targeting sites that simply offered information on gambling.

Fast forward more than six years and Russia is still taking a hard line against gambling operators. However, it now finds itself in a position where the existence of gambling material can also assist the state in its quest to take down pirate sites.

Following a complaint from the Federal Tax Service of Russia, Rozcomnadzor has again added a large number of ‘pirate’ sites to the country’s official blocklist after they advertised gambling-related products and services.

“Rozkomnadzor, at the request of the Federal Tax Service of Russia, added more than 250 pirate online cinemas and torrent trackers to the unified register of banned information, which hosted illegal advertising of online casinos and bookmakers,” the telecoms watchdog reported.

Almost immediately, 200 of the sites were blocked by local ISPs since they failed to remove the advertising when told to do so. For the remaining 50 sites, breathing space is still available. Their bans can be suspended if the offending ads are removed within a timeframe specified by the authorities, which has not yet run out.

“Information on a significant number of pirate resources with illegal advertising was received by Rozcomnadzor from citizens and organizations through a hotline that operates on the site of the Unified Register of Prohibited Information, all of which were sent to the Federal Tax Service for making decisions on restricting access,” the watchdog revealed.

Links between pirate sites and gambling companies have traditionally been close over the years, with advertising for many top-tier brands appearing on portals large and small. However, in recent times the prevalence of gambling ads has diminished, in part due to campaigns conducted in the United States, Europe, and the UK.

For pirate site operators in Russia, the decision to carry gambling ads now comes with the added risk of being blocked. Only time will tell whether any reduction in traffic is considered serious enough to warrant a gambling boycott of their own.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Implement continuous integration and delivery of serverless AWS Glue ETL applications using AWS Developer Tools

Post Syndicated from Prasad Alle original https://aws.amazon.com/blogs/big-data/implement-continuous-integration-and-delivery-of-serverless-aws-glue-etl-applications-using-aws-developer-tools/

AWS Glue is an increasingly popular way to develop serverless ETL (extract, transform, and load) applications for big data and data lake workloads. Organizations that transform their ETL applications to cloud-based, serverless ETL architectures need a seamless, end-to-end continuous integration and continuous delivery (CI/CD) pipeline: from source code, to build, to deployment, to product delivery. Having a good CI/CD pipeline can help your organization discover bugs before they reach production and deliver updates more frequently. It can also help developers write quality code and automate the ETL job release management process, mitigate risk, and more.

AWS Glue is a fully managed data catalog and ETL service. It simplifies and automates the difficult and time-consuming tasks of data discovery, conversion, and job scheduling. AWS Glue crawls your data sources and constructs a data catalog using pre-built classifiers for popular data formats and data types, including CSV, Apache Parquet, JSON, and more.

When you are developing ETL applications using AWS Glue, you might come across some of the following CI/CD challenges:

  • Iterative development with unit tests
  • Continuous integration and build
  • Pushing the ETL pipeline to a test environment
  • Pushing the ETL pipeline to a production environment
  • Testing ETL applications using real data (live test)
  • Exploring and validating data

In this post, I walk you through a solution that implements a CI/CD pipeline for serverless AWS Glue ETL applications supported by AWS Developer Tools (including AWS CodePipeline, AWS CodeCommit, and AWS CodeBuild) and AWS CloudFormation.

Solution overview

The following diagram shows the pipeline workflow:

This solution uses AWS CodePipeline, which lets you orchestrate and automate the test and deploy stages for ETL application source code. The solution consists of a pipeline that contains the following stages:

1.) Source Control: In this stage, the AWS Glue ETL job source code and the AWS CloudFormation template file for deploying the ETL jobs are both committed to version control. I chose to use AWS CodeCommit for version control.

To get the ETL job source code and AWS CloudFormation template, download the gluedemoetl.zip file. This solution is developed based on a previous post, Build a Data Lake Foundation with AWS Glue and Amazon S3.

2.) LiveTest: In this stage, all resources—including AWS Glue crawlers, jobs, S3 buckets, roles, and other resources that are required for the solution—are provisioned, deployed, live tested, and cleaned up.

The LiveTest stage includes the following actions:

  • Deploy: In this action, all the resources that are required for this solution (crawlers, jobs, buckets, roles, and so on) are provisioned and deployed using an AWS CloudFormation template.
  • AutomatedLiveTest: In this action, all the AWS Glue crawlers and jobs are executed and data exploration and validation tests are performed. These validation tests include, but are not limited to, record counts in both raw tables and transformed tables in the data lake and any other business validations. I used AWS CodeBuild for this action.
  • LiveTestApproval: This action is included for the cases in which a pipeline administrator approval is required to deploy/promote the ETL applications to the next stage. The pipeline pauses in this action until an administrator manually approves the release.
  • LiveTestCleanup: In this action, all the LiveTest stage resources, including test crawlers, jobs, roles, and so on, are deleted using the AWS CloudFormation template. This action helps minimize cost by ensuring that the test resources exist only for the duration of the AutomatedLiveTest and LiveTestApproval

3.) DeployToProduction: In this stage, all the resources are deployed using the AWS CloudFormation template to the production environment.

Try it out

This code pipeline takes approximately 20 minutes to complete the LiveTest test stage (up to the LiveTest approval stage, in which manual approval is required).

To get started with this solution, choose Launch Stack:

This creates the CI/CD pipeline with all of its stages, as described earlier. It performs an initial commit of the sample AWS Glue ETL job source code to trigger the first release change.

In the AWS CloudFormation console, choose Create. After the template finishes creating resources, you see the pipeline name on the stack Outputs tab.

After that, open the CodePipeline console and select the newly created pipeline. Initially, your pipeline’s CodeCommit stage shows that the source action failed.

Allow a few minutes for your new pipeline to detect the initial commit applied by the CloudFormation stack creation. As soon as the commit is detected, your pipeline starts. You will see the successful stage completion status as soon as the CodeCommit source stage runs.

In the CodeCommit console, choose Code in the navigation pane to view the solution files.

Next, you can watch how the pipeline goes through the LiveTest stage of the deploy and AutomatedLiveTest actions, until it finally reaches the LiveTestApproval action.

At this point, if you check the AWS CloudFormation console, you can see that a new template has been deployed as part of the LiveTest deploy action.

At this point, make sure that the AWS Glue crawlers and the AWS Glue job ran successfully. Also check whether the corresponding databases and external tables have been created in the AWS Glue Data Catalog. Then verify that the data is validated using Amazon Athena, as shown following.

Open the AWS Glue console, and choose Databases in the navigation pane. You will see the following databases in the Data Catalog:

Open the Amazon Athena console, and run the following queries. Verify that the record counts are matching.

SELECT count(*) FROM "nycitytaxi_gluedemocicdtest"."data";
SELECT count(*) FROM "nytaxiparquet_gluedemocicdtest"."datalake";

The following shows the raw data:

The following shows the transformed data:

The pipeline pauses the action until the release is approved. After validating the data, manually approve the revision on the LiveTestApproval action on the CodePipeline console.

Add comments as needed, and choose Approve.

The LiveTestApproval stage now appears as Approved on the console.

After the revision is approved, the pipeline proceeds to use the AWS CloudFormation template to destroy the resources that were deployed in the LiveTest deploy action. This helps reduce cost and ensures a clean test environment on every deployment.

Production deployment is the final stage. In this stage, all the resources—AWS Glue crawlers, AWS Glue jobs, Amazon S3 buckets, roles, and so on—are provisioned and deployed to the production environment using the AWS CloudFormation template.

After successfully running the whole pipeline, feel free to experiment with it by changing the source code stored on AWS CodeCommit. For example, if you modify the AWS Glue ETL job to generate an error, it should make the AutomatedLiveTest action fail. Or if you change the AWS CloudFormation template to make its creation fail, it should affect the LiveTest deploy action. The objective of the pipeline is to guarantee that all changes that are deployed to production are guaranteed to work as expected.

Conclusion

In this post, you learned how easy it is to implement CI/CD for serverless AWS Glue ETL solutions with AWS developer tools like AWS CodePipeline and AWS CodeBuild at scale. Implementing such solutions can help you accelerate ETL development and testing at your organization.

If you have questions or suggestions, please comment below.

 


Additional Reading

If you found this post useful, be sure to check out Implement Continuous Integration and Delivery of Apache Spark Applications using AWS and Build a Data Lake Foundation with AWS Glue and Amazon S3.

 


About the Authors

Prasad Alle is a Senior Big Data Consultant with AWS Professional Services. He spends his time leading and building scalable, reliable Big data, Machine learning, Artificial Intelligence and IoT solutions for AWS Enterprise and Strategic customers. His interests extend to various technologies such as Advanced Edge Computing, Machine learning at Edge. In his spare time, he enjoys spending time with his family.

 
Luis Caro is a Big Data Consultant for AWS Professional Services. He works with our customers to provide guidance and technical assistance on big data projects, helping them improving the value of their solutions when using AWS.

 

 

 

Facebook Privacy Fiasco Sees Congress Urged on Anti-Piracy Action

Post Syndicated from Andy original https://torrentfreak.com/facebook-privacy-fiasco-sees-congress-urged-on-anti-piracy-action-180420/

It has been a tumultuous few weeks for Facebook, and some would say quite rightly so. The company is a notorious harvester of personal information but last month’s Cambridge Analytica scandal really brought things to a head.

With Facebook co-founder and Chief Executive Officer Mark Zuckerberg in the midst of a PR nightmare, last Tuesday the entrepreneur appeared before the Senate. A day later he faced a grilling from lawmakers, answering questions concerning the social networking giant’s problems with user privacy and how it responds to breaches.

What practical measures Zuckerberg and his team will take to calm the storm are yet to unfold but the opportunity to broaden the attack on both Facebook and others in the user-generated content field is now being seized upon. Yes, privacy is the number one controversy at the moment but Facebook and others of its ilk need to step up and take responsibility for everything posted on their platforms.

That’s the argument presented by the American Federation of Musicians, the Content Creators Coalition, CreativeFuture, and the Independent Film & Television Alliance, who together represent more than 650 entertainment industry companies and 240,000 members. CreativeFuture alone represents more than 500 companies, including all the big Hollywood studios and major players in the music industry.

In letters sent to the Senate Committee on the Judiciary; the Senate Committee on Commerce, Science, and Transportation; and the House Energy and Commerce Committee, the coalitions urge Congress to not only ensure that Facebook gets its house in order, but that Google, Twitter, and similar platforms do so too.

The letters begin with calls to protect user data and tackle the menace of fake news but given the nature of the coalitions and their entertainment industry members, it’s no surprise to see where this is heading.

“In last week’s hearing, Mr. Zuckerberg stressed several times that Facebook must ‘take a broader view of our responsibility,’ acknowledging that it is ‘responsible for the content’ that appears on its service and must ‘take a more active view in policing the ecosystem’ it created,” the letter reads.

“While most content on Facebook is not produced by Facebook, they are the publisher and distributor of immense amounts of content to billions around the world. It is worth noting that a lot of that content is posted without the consent of the people who created it, including those in the creative industries we represent.”

The letter recalls Zuckerberg as characterizing Facebook’s failure to take a broader view of its responsibilities as a “big mistake” while noting he’s also promised change.

However, the entertainment groups contend that the way the company has conducted itself – and the manner in which many Silicon Valley companies conduct themselves – is supported and encouraged by safe harbors and legal immunities that absolve internet platforms of accountability.

“We agree that change needs to happen – but we must ask ourselves whether we can expect to see real change as long as these companies are allowed to continue to operate in a policy framework that prioritizes the growth of the internet over accountability and protects those that fail to act responsibly. We believe this question must be at the center of any action Congress takes in response to the recent failures,” the groups write.

But while the Facebook fiasco has provided the opportunity for criticism, CreativeFuture and its colleagues see the problem from a much broader perspective. They suck in companies like Google, which is also criticized for shirking its responsibilities, largely because the law doesn’t compel it to act any differently.

“Google, another major global platform that has long resisted meaningful accountability, also needs to step forward and endorse the broader view of responsibility expressed by Mr. Zuckerberg – as do many others,” they continue.

“The real problem is not Facebook, or Mark Zuckerberg, regardless of how sincerely he seeks to own the ‘mistakes’ that led to the hearing last week. The problem is endemic in a system that applies a different set of rules to the internet and fails to impose ordinary norms of accountability on businesses that are built around monetizing other people’s personal information and content.”

Noting that Congress has encouraged technology companies to prosper by using a “light hand” for the past several decades, the groups say their level of success now calls for a fresh approach and a heavier touch.

“Facebook and Google are grown-ups – and it is time they behaved that way. If they will not act, then it is up to you and your colleagues in the House to take action and not let these platforms’ abuses continue to pile up,” they conclude.

But with all that said, there is an interesting conflict that develops when presenting the solution to piracy in the context of a user privacy fiasco.

In the EU, many of the companies involved in the coalitions above are calling for pre-emptive filters to prevent allegedly infringing content being uploaded to Facebook and YouTube. That means that all user uploads to such platforms will have to be opened and scanned to see what they contain before they’re allowed online.

So, user privacy or pro-active anti-piracy filters? It might not be easy or even legal to achieve both.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Married Torrent Tracker Couple Settle With BREIN

Post Syndicated from Ernesto original https://torrentfreak.com/married-torrent-tracker-couple-settles-with-brein-180420/

Dutch anti-piracy group BREIN has targeted operators and uploaders of pirate sites for more than a decade.

The group’s main goal is to shut the sites down. Instead of getting embroiled in dozens of lengthy court battles, it prefers to settle the matter with those responsible.

This week, BREIN announced another victory against a small torrent site, Snuffelland. The private tracker was targeted at a Dutch audience and the anti-piracy group managed to track down its operators.

According to BREIN, the site was run by a married couple from the town of Montfort, a 65-year-old man and a 51-year-old woman. In addition, the group also identified one of the uploaders, a 60-year-old man from Heukelum.

All three are unemployed and their financial position was taken into account in determining the scale of the settlement. The couple agreed to pay 2,500 euros and the uploader settled for 650 euros, with a threat of further penalties if they are caught again.

The private tracker itself was shut down and replaced by a message that was provided by BREIN.

“Making copyright-protected works available infringes the copyrights of the entitled rightsholder. Downloading from unauthorized sources is also prohibited in the Netherlands,” the message reads.

“For providers of legal content, snuffelland.org refers you to thecontentmap.nl and film.nl,” it adds.

These type of shutdowns are nothing new. BREIN has taken down hundreds of smaller sites in the past. However, only in recent years has the group has started to publish these settlement details.

That serves as a deterrent but also provides some more insight into how the group prefers to solve these cases, which appears to be relatively softly. In this case, it also disproves the notion that torrent sites are run by youngsters.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Confused About the Hybrid Cloud? You’re Not Alone

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/confused-about-the-hybrid-cloud-youre-not-alone/

Hybrid Cloud. What is it?

Do you have a clear understanding of the hybrid cloud? If you don’t, it’s not surprising.

Hybrid cloud has been applied to a greater and more varied number of IT solutions than almost any other recent data management term. About the only thing that’s clear about the hybrid cloud is that the term hybrid cloud wasn’t invented by customers, but by vendors who wanted to hawk whatever solution du jour they happened to be pushing.

Let’s be honest. We’re in an industry that loves hype. We can’t resist grafting hyper, multi, ultra, and super and other prefixes onto the beginnings of words to entice customers with something new and shiny. The alphabet soup of cloud-related terms can include various options for where the cloud is located (on-premises, off-premises), whether the resources are private or shared in some degree (private, community, public), what type of services are offered (storage, computing), and what type of orchestrating software is used to manage the workflow and the resources. With so many moving parts, it’s no wonder potential users are confused.

Let’s take a step back, try to clear up the misconceptions, and come up with a basic understanding of what the hybrid cloud is. To be clear, this is our viewpoint. Others are free to do what they like, so bear that in mind.

So, What is the Hybrid Cloud?

The hybrid cloud refers to a cloud environment made up of a mixture of on-premises private cloud resources combined with third-party public cloud resources that use some kind of orchestration between them.

To get beyond the hype, let’s start with Forrester Research‘s idea of the hybrid cloud: “One or more public clouds connected to something in my data center. That thing could be a private cloud; that thing could just be traditional data center infrastructure.”

To put it simply, a hybrid cloud is a mash-up of on-premises and off-premises IT resources.

To expand on that a bit, we can say that the hybrid cloud refers to a cloud environment made up of a mixture of on-premises private cloud[1] resources combined with third-party public cloud resources that use some kind of orchestration[2] between them. The advantage of the hybrid cloud model is that it allows workloads and data to move between private and public clouds in a flexible way as demands, needs, and costs change, giving businesses greater flexibility and more options for data deployment and use.

In other words, if you have some IT resources in-house that you are replicating or augmenting with an external vendor, congrats, you have a hybrid cloud!

Private Cloud vs. Public Cloud

The cloud is really just a collection of purpose built servers. In a private cloud, the servers are dedicated to a single tenant or a group of related tenants. In a public cloud, the servers are shared between multiple unrelated tenants (customers). A public cloud is off-site, while a private cloud can be on-site or off-site — or on-prem or off-prem.

As an example, let’s look at a hybrid cloud meant for data storage, a hybrid data cloud. A company might set up a rule that says all accounting files that have not been touched in the last year are automatically moved off-prem to cloud storage to save cost and reduce the amount of storage needed on-site. The files are still available; they are just no longer stored on your local systems. The rules can be defined to fit an organization’s workflow and data retention policies.

The hybrid cloud concept also contains cloud computing. For example, at the end of the quarter, order processing application instances can be spun up off-premises in a hybrid computing cloud as needed to add to on-premises capacity.

Hybrid Cloud Benefits

If we accept that the hybrid cloud combines the best elements of private and public clouds, then the benefits of hybrid cloud solutions are clear, and we can identify the primary two benefits that result from the blending of private and public clouds.

Diagram of the Components of the Hybrid Cloud

Benefit 1: Flexibility and Scalability

Undoubtedly, the primary advantage of the hybrid cloud is its flexibility. It takes time and money to manage in-house IT infrastructure and adding capacity requires advance planning.

The cloud is ready and able to provide IT resources whenever needed on short notice. The term cloud bursting refers to the on-demand and temporary use of the public cloud when demand exceeds resources available in the private cloud. For example, some businesses experience seasonal spikes that can put an extra burden on private clouds. These spikes can be taken up by a public cloud. Demand also can vary with geographic location, events, or other variables. The public cloud provides the elasticity to deal with these and other anticipated and unanticipated IT loads. The alternative would be fixed cost investments in on-premises IT resources that might not be efficiently utilized.

For a data storage user, the on-premises private cloud storage provides, among other benefits, the highest speed access. For data that is not frequently accessed, or needed with the absolute lowest levels of latency, it makes sense for the organization to move it to a location that is secure, but less expensive. The data is still readily available, and the public cloud provides a better platform for sharing the data with specific clients, users, or with the general public.

Benefit 2: Cost Savings

The public cloud component of the hybrid cloud provides cost-effective IT resources without incurring capital expenses and labor costs. IT professionals can determine the best configuration, service provider, and location for each service, thereby cutting costs by matching the resource with the task best suited to it. Services can be easily scaled, redeployed, or reduced when necessary, saving costs through increased efficiency and avoiding unnecessary expenses.

Comparing Private vs Hybrid Cloud Storage Costs

To get an idea of the difference in storage costs between a purely on-premises solutions and one that uses a hybrid of private and public storage, we’ll present two scenarios. For each scenario we’ll use data storage amounts of 100 terabytes, 1 petabyte, and 2 petabytes. Each table is the same format, all we’ve done is change how the data is distributed: private (on-premises) cloud or public (off-premises) cloud. We are using the costs for our own B2 Cloud Storage in this example. The math can be adapted for any set of numbers you wish to use.

Scenario 1    100% of data on-premises storage

Data Stored
Data stored On-Premises: 100% 100 TB 1,000 TB 2,000 TB
On-premises cost range Monthly Cost
Low — $12/TB/Month $1,200 $12,000 $24,000
High — $20/TB/Month $2,000 $20,000 $40,000

Scenario 2    20% of data on-premises with 80% public cloud storage (B2)

Data Stored
Data stored On-Premises: 20% 20 TB 200 TB 400 TB
Data stored in Cloud: 80% 80 TB 800 TB 1,600 TB
On-premises cost range Monthly Cost
Low — $12/TB/Month $240 $2,400 $4,800
High — $20/TB/Month $400 $4,000 $8,000
Public cloud cost range Monthly Cost
Low — $5/TB/Month (B2) $400 $4,000 $8,000
High — $20/TB/Month $1,600 $16,000 $32,000
On-premises + public cloud cost range Monthly Cost
Low $640 $6,400 $12,800
High $2,000 $20,000 $40,000

As can be seen in the numbers above, using a hybrid cloud solution and storing 80% of the data in the cloud with a provider such as Backblaze B2 can result in significant savings over storing only on-premises. For other cost scenarios, see the B2 Cost Calculator.

When Hybrid Might Not Always Be the Right Fit

There are circumstances where the hybrid cloud might not be the best solution. Smaller organizations operating on a tight IT budget might best be served by a purely public cloud solution. The cost of setting up and running private servers is substantial.

An application that requires the highest possible speed might not be suitable for hybrid, depending on the specific cloud implementation. While latency does play a factor in data storage for some users, it is less of a factor for uploading and downloading data than it is for organizations using the hybrid cloud for computing. Because Backblaze recognized the importance of speed and low-latency for customers wishing to use computing on data stored in B2, we directly connected our data centers with those of our computing partners, ensuring that latency would not be an issue even for a hybrid cloud computing solution.

It is essential to have a good understanding of workloads and their essential characteristics in order to make the hybrid cloud work well for you. Each application needs to be examined for the right mix of private cloud, public cloud, and traditional IT resources that fit the particular workload in order to benefit most from a hybrid cloud architecture.

The Hybrid Cloud Can Be a Win-Win Solution

From the high altitude perspective, any solution that enables an organization to respond in a flexible manner to IT demands is a win. Avoiding big upfront capital expenses for in-house IT infrastructure will appeal to the CFO. Being able to quickly spin up IT resources as they’re needed will appeal to the CTO and VP of Operations.

Should You Go Hybrid?

We’ve arrived at the bottom line and the question is, should you or your organization embrace hybrid cloud infrastructures?

According to 451 Research, by 2019, 69% of companies will operate in hybrid cloud environments, and 60% of workloads will be running in some form of hosted cloud service (up from 45% in 2017). That indicates that the benefits of the hybrid cloud appeal to a broad range of companies.

In Two Years, More Than Half of Workloads Will Run in Cloud

Clearly, depending on an organization’s needs, there are advantages to a hybrid solution. While it might have been possible to dismiss the hybrid cloud in the early days of the cloud as nothing more than a buzzword, that’s no longer true. The hybrid cloud has evolved beyond the marketing hype to offer real solutions for an increasingly complex and challenging IT environment.

If an organization approaches the hybrid cloud with sufficient planning and a structured approach, a hybrid cloud can deliver on-demand flexibility, empower legacy systems and applications with new capabilities, and become a catalyst for digital transformation. The result can be an elastic and responsive infrastructure that has the ability to quickly respond to changing demands of the business.

As data management professionals increasingly recognize the advantages of the hybrid cloud, we can expect more and more of them to embrace it as an essential part of their IT strategy.

Tell Us What You’re Doing with the Hybrid Cloud

Are you currently embracing the hybrid cloud, or are you still uncertain or hanging back because you’re satisfied with how things are currently? Maybe you’ve gone totally hybrid. We’d love to hear your comments below on how you’re dealing with the hybrid cloud.


[1] Private cloud can be on-premises or a dedicated off-premises facility.

[2] Hybrid cloud orchestration solutions are often proprietary, vertical, and task dependent.

The post Confused About the Hybrid Cloud? You’re Not Alone appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

TV Broadcaster Wants App Stores Blocked to Prevent Piracy

Post Syndicated from Andy original https://torrentfreak.com/tv-broadcaster-wants-app-stores-blocked-to-prevent-piracy-180416/

After first targeting torrent and regular streaming platforms with blocking injunctions, last year Village Roadshow and studios including Disney, Universal, Warner Bros, Twentieth Century Fox, and Paramount began looking at a new threat.

The action targeted HDSubs+, a reasonably popular IPTV service that provides hundreds of otherwise premium live channels, movies, and sports for a relatively small monthly fee. The application was filed during October 2017 and targeted Australia’s largest ISPs.

In parallel, Hong Kong-based broadcaster Television Broadcasts Limited (TVB) launched a similar action, demanding that the same ISPs (including Telstra, Optus, TPG, and Vocus, plus subsidiaries) block several ‘pirate’ IPTV services, named in court as A1, BlueTV, EVPAD, FunTV, MoonBox, Unblock, and hTV5.

Due to the similarity of the cases, both applications were heard in Federal Court in Sydney on Friday. Neither case is as straightforward as blocking a torrent or basic streaming portal, so both applicants are having to deal with additional complexities.

The TVB case is of particular interest. Up to a couple of dozen URLs maintain the services, which are used to provide the content, an EPG (electronic program guide), updates and sundry other features. While most of these appear to fit the description of an “online location” designed to assist copyright infringement, where the Android-based software for the IPTV services is hosted provides an interesting dilemma.

ComputerWorld reports that the apps – which offer live broadcasts, video-on-demand, and catch-up TV – are hosted on as-yet-unnamed sites which are functionally similar to Google Play or Apple’s App Store. They’re repositories of applications that also carry non-infringing apps, such as those for Netflix and YouTube.

Nevertheless, despite clear knowledge of this dual use, TVB wants to have these app marketplaces blocked by Australian ISPs, which would not only render the illicit apps inaccessible to the public but all of the non-infringing ones too. Part of its argument that this action would be reasonable appears to be that legal apps – such as Netflix’s for example – can also be freely accessed elsewhere.

It will be up to Justice Nicholas to decide whether the “primary purpose” of these marketplaces is to infringe or facilitate the infringement of TVB’s copyrights. However, TVB also appears to have another problem which is directly connected to the copyright status in Australia of its China-focused live programming.

Justice Nicholas questioned whether watching a stream in Australia of TVB’s live Chinese broadcasts would amount to copyright infringement because no copy of that content is being made.

“If most of what is occurring here is a reproduction of broadcasts that are not protected by copyright, then the primary purpose is not to facilitate copyright infringement,” Justice Nicholas said.

One of the problems appears to be that China is not a party to the 1961 Rome Convention for the Protection of Performers, Producers of Phonograms and Broadcasting Organisations. However, TVB is arguing that it should still receive protection because it airs pre-recorded content and the live broadcasts are also archived for re-transmission via catch-up services.

The question over whether unchoreographed live broadcasts receive protection has been raised in other regions but in most cases, a workaround has been found. The presence of broadcaster logos on screen (which receive copyright protection) is a factor and it’s been reported that broadcasters are able to record the ‘live’ action and transmit a copy just a couple of seconds later, thereby broadcasting an already-copyrighted work.

While TVB attempts to overcome its issues, Village Roadshow is facing some of its own in its efforts to take down HDSubs+.

It appears that at least partly in response to the Roadshow legal action, the service has undergone some modifications, including a change of brand to ‘Press Play Extra’. As reported by ZDNet, there have been structural changes too, which means that Roadshow can no longer “see under the hood”.

According to Justice Nicholas, there is no evidence that the latest version of the app infringes copyright but according to counsel for Village Roadshow, the new app is merely transitional and preparing for a possible future change.

“We submit the difference to be drawn is reactive to my clients serving on the operators a notice,” counsel for Roadshow argued, with an expert describing the new app as “almost like a placeholder.”

In short, Roadshow still wants all of the target domains in its original application blocked because the company believes there’s a good chance they’ll be reactivated in the future.

None of the ISPs involved in either case turned up to the hearings on Friday, which removes one layer of complexity in what appears thus far to be less than straightforward cases.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

How Pirates Use New Technologies for Old Sharing Habits

Post Syndicated from Ernesto original https://torrentfreak.com/how-pirates-use-new-technologies-for-old-sharing-habits-180415/

While piracy today is more widespread than ever, the urge to share content online has been around for several decades.

The first generation used relatively primitive tools, such as a bulletin board systems (BBS), newsgroups or IRC. Nothing too fancy, but they worked well for those who got over the initial learning curve.

When Napster came along things started to change. More content became available and with just a few clicks anyone could get an MP3 transferred from one corner of the world to another. The same was true for Kazaa and Limewire, which further popularized online piracy.

After this initial boom of piracy applications, BitTorrent came along, shaking up the sharing landscape even further. As torrent sites are web-based, pirated media became even more public and easy to find.

At the same time, BitTorrent brought back the smaller and more organized sharing culture of the early days through private trackers.

These communities often focused on a specific type of content and put strict rules and guidelines in place. They promoted sharing and avoided the spam that plagued their public counterparts.

That was fifteen years ago.

Today the piracy landscape is more diverse than ever. Private torrent trackers are still around and so are IRC and newsgroups. However, most piracy today takes place in public. Streaming sites and devices are booming, with central hosting platforms offering the majority of the underlying content.

That said, there is still an urge for some pirates to band together and some use newer technologies to do so.

This week The Outline ran an interesting piece on the use of Telegram channels to share pirated media. These groups use the encrypted communication platform to share copies of movies, TV shows, and a wide range of other material.

Telegram allows users to upload files up to 1.5GB in size, but larger ones can be split, in common with the good old newsgroups.

These type of sharing groups are not new. On social media platforms such as Facebook and VK, there are hundreds or thousands of dedicated communities that do the same. Both public and private. And Reddit has similar groups, relying on external links.

According to an administrator of a piracy-focused Telegram channel, the appeal of the platform is that the groups are not shut down so easily. While that may be the case with hyper-private groups, Telegram will still pull the plug if it receives enough complaints about a channel.

The same is true for Discord, another application that can be used to share content in ‘private’ communities. Discord is particularly popular among gamers, but pirates have also found their way to the platform.

While smaller communities are able to thrive, once the word gets out to copyright holders, the party can soon be over. This is also what the /r/piracy subreddit community found out a few days ago when its Discord server was pulled offline.

This triggered a discussion about possible alternatives. Telegram was mentioned by some, although not everyone liked the idea of connecting their phone number to a pirate group. Others mentioned Slack, Weechat, Hexchat and Riot.im.

None of these tools are revolutionary. At least, not for the intended use by this group. Some may be harder to take down than others, but they are all means to share files, directly or through external links.

What really caught our eye, however, were several mentions of an ancient application layer protocol that, apparently, hasn’t lost its use to pirates.

“I’ll make an IRC server and host that,” one user said, with others suggesting the same.

And so we have come full circle…

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

AWS AppSync – Production-Ready with Six New Features

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-appsync-production-ready-with-six-new-features/

If you build (or want to build) data-driven web and mobile apps and need real-time updates and the ability to work offline, you should take a look at AWS AppSync. Announced in preview form at AWS re:Invent 2017 and described in depth here, AWS AppSync is designed for use in iOS, Android, JavaScript, and React Native apps. AWS AppSync is built around GraphQL, an open, standardized query language that makes it easy for your applications to request the precise data that they need from the cloud.

I’m happy to announce that the preview period is over and that AWS AppSync is now generally available and production-ready, with six new features that will simplify and streamline your application development process:

Console Log Access – You can now see the CloudWatch Logs entries that are created when you test your GraphQL queries, mutations, and subscriptions from within the AWS AppSync Console.

Console Testing with Mock Data – You can now create and use mock context objects in the console for testing purposes.

Subscription Resolvers – You can now create resolvers for AWS AppSync subscription requests, just as you can already do for query and mutate requests.

Batch GraphQL Operations for DynamoDB – You can now make use of DynamoDB’s batch operations (BatchGetItem and BatchWriteItem) across one or more tables. in your resolver functions.

CloudWatch Support – You can now use Amazon CloudWatch Metrics and CloudWatch Logs to monitor calls to the AWS AppSync APIs.

CloudFormation Support – You can now define your schemas, data sources, and resolvers using AWS CloudFormation templates.

A Brief AppSync Review
Before diving in to the new features, let’s review the process of creating an AWS AppSync API, starting from the console. I click Create API to begin:

I enter a name for my API and (for demo purposes) choose to use the Sample schema:

The schema defines a collection of GraphQL object types. Each object type has a set of fields, with optional arguments:

If I was creating an API of my own I would enter my schema at this point. Since I am using the sample, I don’t need to do this. Either way, I click on Create to proceed:

The GraphQL schema type defines the entry points for the operations on the data. All of the data stored on behalf of a particular schema must be accessible using a path that begins at one of these entry points. The console provides me with an endpoint and key for my API:

It also provides me with guidance and a set of fully functional sample apps that I can clone:

When I clicked Create, AWS AppSync created a pair of Amazon DynamoDB tables for me. I can click Data Sources to see them:

I can also see and modify my schema, issue queries, and modify an assortment of settings for my API.

Let’s take a quick look at each new feature…

Console Log Access
The AWS AppSync Console already allows me to issue queries and to see the results, and now provides access to relevant log entries.In order to see the entries, I must enable logs (as detailed below), open up the LOGS, and check the checkbox. Here’s a simple mutation query that adds a new event. I enter the query and click the arrow to test it:

I can click VIEW IN CLOUDWATCH for a more detailed view:

To learn more, read Test and Debug Resolvers.

Console Testing with Mock Data
You can now create a context object in the console where it will be passed to one of your resolvers for testing purposes. I’ll add a testResolver item to my schema:

Then I locate it on the right-hand side of the Schema page and click Attach:

I choose a data source (this is for testing and the actual source will not be accessed), and use the Put item mapping template:

Then I click Select test context, choose Create New Context, assign a name to my test content, and click Save (as you can see, the test context contains the arguments from the query along with values to be returned for each field of the result):

After I save the new Resolver, I click Test to see the request and the response:

Subscription Resolvers
Your AWS AppSync application can monitor changes to any data source using the @aws_subscribe GraphQL schema directive and defining a Subscription type. The AWS AppSync client SDK connects to AWS AppSync using MQTT over Websockets and the application is notified after each mutation. You can now attach resolvers (which convert GraphQL payloads into the protocol needed by the underlying storage system) to your subscription fields and perform authorization checks when clients attempt to connect. This allows you to perform the same fine grained authorization routines across queries, mutations, and subscriptions.

To learn more about this feature, read Real-Time Data.

Batch GraphQL Operations
Your resolvers can now make use of DynamoDB batch operations that span one or more tables in a region. This allows you to use a list of keys in a single query, read records multiple tables, write records in bulk to multiple tables, and conditionally write or delete related records across multiple tables.

In order to use this feature the IAM role that you use to access your tables must grant access to DynamoDB’s BatchGetItem and BatchPutItem functions.

To learn more, read the DynamoDB Batch Resolvers tutorial.

CloudWatch Logs Support
You can now tell AWS AppSync to log API requests to CloudWatch Logs. Click on Settings and Enable logs, then choose the IAM role and the log level:

CloudFormation Support
You can use the following CloudFormation resource types in your templates to define AWS AppSync resources:

AWS::AppSync::GraphQLApi – Defines an AppSync API in terms of a data source (an Amazon Elasticsearch Service domain or a DynamoDB table).

AWS::AppSync::ApiKey – Defines the access key needed to access the data source.

AWS::AppSync::GraphQLSchema – Defines a GraphQL schema.

AWS::AppSync::DataSource – Defines a data source.

AWS::AppSync::Resolver – Defines a resolver by referencing a schema and a data source, and includes a mapping template for requests.

Here’s a simple schema definition in YAML form:

  AppSyncSchema:
    Type: "AWS::AppSync::GraphQLSchema"
    DependsOn:
      - AppSyncGraphQLApi
    Properties:
      ApiId: !GetAtt AppSyncGraphQLApi.ApiId
      Definition: |
        schema {
          query: Query
          mutation: Mutation
        }
        type Query {
          singlePost(id: ID!): Post
          allPosts: [Post]
        }
        type Mutation {
          putPost(id: ID!, title: String!): Post
        }
        type Post {
          id: ID!
          title: String!
        }

Available Now
These new features are available now and you can start using them today! Here are a couple of blog posts and other resources that you might find to be of interest:

Jeff;

 

 

How to retain system tables’ data spanning multiple Amazon Redshift clusters and run cross-cluster diagnostic queries

Post Syndicated from Karthik Sonti original https://aws.amazon.com/blogs/big-data/how-to-retain-system-tables-data-spanning-multiple-amazon-redshift-clusters-and-run-cross-cluster-diagnostic-queries/

Amazon Redshift is a data warehouse service that logs the history of the system in STL log tables. The STL log tables manage disk space by retaining only two to five days of log history, depending on log usage and available disk space.

To retain STL tables’ data for an extended period, you usually have to create a replica table for every system table. Then, for each you load the data from the system table into the replica at regular intervals. By maintaining replica tables for STL tables, you can run diagnostic queries on historical data from the STL tables. You then can derive insights from query execution times, query plans, and disk-spill patterns, and make better cluster-sizing decisions. However, refreshing replica tables with live data from STL tables at regular intervals requires schedulers such as Cron or AWS Data Pipeline. Also, these tables are specific to one cluster and they are not accessible after the cluster is terminated. This is especially true for transient Amazon Redshift clusters that last for only a finite period of ad hoc query execution.

In this blog post, I present a solution that exports system tables from multiple Amazon Redshift clusters into an Amazon S3 bucket. This solution is serverless, and you can schedule it as frequently as every five minutes. The AWS CloudFormation deployment template that I provide automates the solution setup in your environment. The system tables’ data in the Amazon S3 bucket is partitioned by cluster name and query execution date to enable efficient joins in cross-cluster diagnostic queries.

I also provide another CloudFormation template later in this post. This second template helps to automate the creation of tables in the AWS Glue Data Catalog for the system tables’ data stored in Amazon S3. After the system tables are exported to Amazon S3, you can run cross-cluster diagnostic queries on the system tables’ data and derive insights about query executions in each Amazon Redshift cluster. You can do this using Amazon QuickSight, Amazon Athena, Amazon EMR, or Amazon Redshift Spectrum.

You can find all the code examples in this post, including the CloudFormation templates, AWS Glue extract, transform, and load (ETL) scripts, and the resolution steps for common errors you might encounter in this GitHub repository.

Solution overview

The solution in this post uses AWS Glue to export system tables’ log data from Amazon Redshift clusters into Amazon S3. The AWS Glue ETL jobs are invoked at a scheduled interval by AWS Lambda. AWS Systems Manager, which provides secure, hierarchical storage for configuration data management and secrets management, maintains the details of Amazon Redshift clusters for which the solution is enabled. The last-fetched time stamp values for the respective cluster-table combination are maintained in an Amazon DynamoDB table.

The following diagram covers the key steps involved in this solution.

The solution as illustrated in the preceding diagram flows like this:

  1. The Lambda function, invoke_rs_stl_export_etl, is triggered at regular intervals, as controlled by Amazon CloudWatch. It’s triggered to look up the AWS Systems Manager parameter store to get the details of the Amazon Redshift clusters for which the system table export is enabled.
  2. The same Lambda function, based on the Amazon Redshift cluster details obtained in step 1, invokes the AWS Glue ETL job designated for the Amazon Redshift cluster. If an ETL job for the cluster is not found, the Lambda function creates one.
  3. The ETL job invoked for the Amazon Redshift cluster gets the cluster credentials from the parameter store. It gets from the DynamoDB table the last exported time stamp of when each of the system tables was exported from the respective Amazon Redshift cluster.
  4. The ETL job unloads the system tables’ data from the Amazon Redshift cluster into an Amazon S3 bucket.
  5. The ETL job updates the DynamoDB table with the last exported time stamp value for each system table exported from the Amazon Redshift cluster.
  6. The Amazon Redshift cluster system tables’ data is available in Amazon S3 and is partitioned by cluster name and date for running cross-cluster diagnostic queries.

Understanding the configuration data

This solution uses AWS Systems Manager parameter store to store the Amazon Redshift cluster credentials securely. The parameter store also securely stores other configuration information that the AWS Glue ETL job needs for extracting and storing system tables’ data in Amazon S3. Systems Manager comes with a default AWS Key Management Service (AWS KMS) key that it uses to encrypt the password component of the Amazon Redshift cluster credentials.

The following table explains the global parameters and cluster-specific parameters required in this solution. The global parameters are defined once and applicable at the overall solution level. The cluster-specific parameters are specific to an Amazon Redshift cluster and repeat for each cluster for which you enable this post’s solution. The CloudFormation template explained later in this post creates these parameters as part of the deployment process.

Parameter name Type Description
Global parametersdefined once and applied to all jobs
redshift_query_logs.global.s3_prefix String The Amazon S3 path where the query logs are exported. Under this path, each exported table is partitioned by cluster name and date.
redshift_query_logs.global.tempdir String The Amazon S3 path that AWS Glue ETL jobs use for temporarily staging the data.
redshift_query_logs.global.role> String The name of the role that the AWS Glue ETL jobs assume. Just the role name is sufficient. The complete Amazon Resource Name (ARN) is not required.
redshift_query_logs.global.enabled_cluster_list StringList A comma-separated list of cluster names for which system tables’ data export is enabled. This gives flexibility for a user to exclude certain clusters.
Cluster-specific parametersfor each cluster specified in the enabled_cluster_list parameter
redshift_query_logs.<<cluster_name>>.connection String The name of the AWS Glue Data Catalog connection to the Amazon Redshift cluster. For example, if the cluster name is product_warehouse, the entry is redshift_query_logs.product_warehouse.connection.
redshift_query_logs.<<cluster_name>>.user String The user name that AWS Glue uses to connect to the Amazon Redshift cluster.
redshift_query_logs.<<cluster_name>>.password Secure String The password that AWS Glue uses to connect the Amazon Redshift cluster’s encrypted-by key that is managed in AWS KMS.

For example, suppose that you have two Amazon Redshift clusters, product-warehouse and category-management, for which the solution described in this post is enabled. In this case, the parameters shown in the following screenshot are created by the solution deployment CloudFormation template in the AWS Systems Manager parameter store.

Solution deployment

To make it easier for you to get started, I created a CloudFormation template that automatically configures and deploys the solution—only one step is required after deployment.

Prerequisites

To deploy the solution, you must have one or more Amazon Redshift clusters in a private subnet. This subnet must have a network address translation (NAT) gateway or a NAT instance configured, and also a security group with a self-referencing inbound rule for all TCP ports. For more information about why AWS Glue ETL needs the configuration it does, described previously, see Connecting to a JDBC Data Store in a VPC in the AWS Glue documentation.

To start the deployment, launch the CloudFormation template:

CloudFormation stack parameters

The following table lists and describes the parameters for deploying the solution to export query logs from multiple Amazon Redshift clusters.

Property Default Description
S3Bucket mybucket The bucket this solution uses to store the exported query logs, stage code artifacts, and perform unloads from Amazon Redshift. For example, the mybucket/extract_rs_logs/data bucket is used for storing all the exported query logs for each system table partitioned by the cluster. The mybucket/extract_rs_logs/temp/ bucket is used for temporarily staging the unloaded data from Amazon Redshift. The mybucket/extract_rs_logs/code bucket is used for storing all the code artifacts required for Lambda and the AWS Glue ETL jobs.
ExportEnabledRedshiftClusters Requires Input A comma-separated list of cluster names from which the system table logs need to be exported.
DataStoreSecurityGroups Requires Input A list of security groups with an inbound rule to the Amazon Redshift clusters provided in the parameter, ExportEnabledClusters. These security groups should also have a self-referencing inbound rule on all TCP ports, as explained on Connecting to a JDBC Data Store in a VPC.

After you launch the template and create the stack, you see that the following resources have been created:

  1. AWS Glue connections for each Amazon Redshift cluster you provided in the CloudFormation stack parameter, ExportEnabledRedshiftClusters.
  2. All parameters required for this solution created in the parameter store.
  3. The Lambda function that invokes the AWS Glue ETL jobs for each configured Amazon Redshift cluster at a regular interval of five minutes.
  4. The DynamoDB table that captures the last exported time stamps for each exported cluster-table combination.
  5. The AWS Glue ETL jobs to export query logs from each Amazon Redshift cluster provided in the CloudFormation stack parameter, ExportEnabledRedshiftClusters.
  6. The IAM roles and policies required for the Lambda function and AWS Glue ETL jobs.

After the deployment

For each Amazon Redshift cluster for which you enabled the solution through the CloudFormation stack parameter, ExportEnabledRedshiftClusters, the automated deployment includes temporary credentials that you must update after the deployment:

  1. Go to the parameter store.
  2. Note the parameters <<cluster_name>>.user and redshift_query_logs.<<cluster_name>>.password that correspond to each Amazon Redshift cluster for which you enabled this solution. Edit these parameters to replace the placeholder values with the right credentials.

For example, if product-warehouse is one of the clusters for which you enabled system table export, you edit these two parameters with the right user name and password and choose Save parameter.

Querying the exported system tables

Within a few minutes after the solution deployment, you should see Amazon Redshift query logs being exported to the Amazon S3 location, <<S3Bucket_you_provided>>/extract_redshift_query_logs/data/. In that bucket, you should see the eight system tables partitioned by customer name and date: stl_alert_event_log, stl_dlltext, stl_explain, stl_query, stl_querytext, stl_scan, stl_utilitytext, and stl_wlm_query.

To run cross-cluster diagnostic queries on the exported system tables, create external tables in the AWS Glue Data Catalog. To make it easier for you to get started, I provide a CloudFormation template that creates an AWS Glue crawler, which crawls the exported system tables stored in Amazon S3 and builds the external tables in the AWS Glue Data Catalog.

Launch this CloudFormation template to create external tables that correspond to the Amazon Redshift system tables. S3Bucket is the only input parameter required for this stack deployment. Provide the same Amazon S3 bucket name where the system tables’ data is being exported. After you successfully create the stack, you can see the eight tables in the database, redshift_query_logs_db, as shown in the following screenshot.

Now, navigate to the Athena console to run cross-cluster diagnostic queries. The following screenshot shows a diagnostic query executed in Athena that retrieves query alerts logged across multiple Amazon Redshift clusters.

You can build the following example Amazon QuickSight dashboard by running cross-cluster diagnostic queries on Athena to identify the hourly query count and the key query alert events across multiple Amazon Redshift clusters.

How to extend the solution

You can extend this post’s solution in two ways:

  • Add any new Amazon Redshift clusters that you spin up after you deploy the solution.
  • Add other system tables or custom query results to the list of exports from an Amazon Redshift cluster.

Extend the solution to other Amazon Redshift clusters

To extend the solution to more Amazon Redshift clusters, add the three cluster-specific parameters in the AWS Systems Manager parameter store following the guidelines earlier in this post. Modify the redshift_query_logs.global.enabled_cluster_list parameter to append the new cluster to the comma-separated string.

Extend the solution to add other tables or custom queries to an Amazon Redshift cluster

The current solution ships with the export functionality for the following Amazon Redshift system tables:

  • stl_alert_event_log
  • stl_dlltext
  • stl_explain
  • stl_query
  • stl_querytext
  • stl_scan
  • stl_utilitytext
  • stl_wlm_query

You can easily add another system table or custom query by adding a few lines of code to the AWS Glue ETL job, <<cluster-name>_extract_rs_query_logs. For example, suppose that from the product-warehouse Amazon Redshift cluster you want to export orders greater than $2,000. To do so, add the following five lines of code to the AWS Glue ETL job product-warehouse_extract_rs_query_logs, where product-warehouse is your cluster name:

  1. Get the last-processed time-stamp value. The function creates a value if it doesn’t already exist.

salesLastProcessTSValue = functions.getLastProcessedTSValue(trackingEntry=”mydb.sales_2000",job_configs=job_configs)

  1. Run the custom query with the time stamp.

returnDF=functions.runQuery(query="select * from sales s join order o where o.order_amnt > 2000 and sale_timestamp > '{}'".format (salesLastProcessTSValue) ,tableName="mydb.sales_2000",job_configs=job_configs)

  1. Save the results to Amazon S3.

functions.saveToS3(dataframe=returnDF,s3Prefix=s3Prefix,tableName="mydb.sales_2000",partitionColumns=["sale_date"],job_configs=job_configs)

  1. Get the latest time-stamp value from the returned data frame in Step 2.

latestTimestampVal=functions.getMaxValue(returnDF,"sale_timestamp",job_configs)

  1. Update the last-processed time-stamp value in the DynamoDB table.

functions.updateLastProcessedTSValue(“mydb.sales_2000",latestTimestampVal[0],job_configs)

Conclusion

In this post, I demonstrate a serverless solution to retain the system tables’ log data across multiple Amazon Redshift clusters. By using this solution, you can incrementally export the data from system tables into Amazon S3. By performing this export, you can build cross-cluster diagnostic queries, build audit dashboards, and derive insights into capacity planning by using services such as Athena. I also demonstrate how you can extend this solution to other ad hoc query use cases or tables other than system tables by adding a few lines of code.


Additional Reading

If you found this post useful, be sure to check out Using Amazon Redshift Spectrum, Amazon Athena, and AWS Glue with Node.js in Production and Amazon Redshift – 2017 Recap.


About the Author

Karthik Sonti is a senior big data architect at Amazon Web Services. He helps AWS customers build big data and analytical solutions and provides guidance on architecture and best practices.

 

 

 

 

AWS Online Tech Talks – April & Early May 2018

Post Syndicated from Betsy Chernoff original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-april-early-may-2018/

We have several upcoming tech talks in the month of April and early May. Come join us to learn about AWS services and solution offerings. We’ll have AWS experts online to help answer questions in real-time. Sign up now to learn more, we look forward to seeing you.

Note – All sessions are free and in Pacific Time.

April & early May — 2018 Schedule

Compute

April 30, 2018 | 01:00 PM – 01:45 PM PTBest Practices for Running Amazon EC2 Spot Instances with Amazon EMR (300) – Learn about the best practices for scaling big data workloads as well as process, store, and analyze big data securely and cost effectively with Amazon EMR and Amazon EC2 Spot Instances.

May 1, 2018 | 01:00 PM – 01:45 PM PTHow to Bring Microsoft Apps to AWS (300) – Learn more about how to save significant money by bringing your Microsoft workloads to AWS.

May 2, 2018 | 01:00 PM – 01:45 PM PTDeep Dive on Amazon EC2 Accelerated Computing (300) – Get a technical deep dive on how AWS’ GPU and FGPA-based compute services can help you to optimize and accelerate your ML/DL and HPC workloads in the cloud.

Containers

April 23, 2018 | 11:00 AM – 11:45 AM PTNew Features for Building Powerful Containerized Microservices on AWS (300) – Learn about how this new feature works and how you can start using it to build and run modern, containerized applications on AWS.

Databases

April 23, 2018 | 01:00 PM – 01:45 PM PTElastiCache: Deep Dive Best Practices and Usage Patterns (200) – Learn about Redis-compatible in-memory data store and cache with Amazon ElastiCache.

April 25, 2018 | 01:00 PM – 01:45 PM PTIntro to Open Source Databases on AWS (200) – Learn how to tap the benefits of open source databases on AWS without the administrative hassle.

DevOps

April 25, 2018 | 09:00 AM – 09:45 AM PTDebug your Container and Serverless Applications with AWS X-Ray in 5 Minutes (300) – Learn how AWS X-Ray makes debugging your Container and Serverless applications fun.

Enterprise & Hybrid

April 23, 2018 | 09:00 AM – 09:45 AM PTAn Overview of Best Practices of Large-Scale Migrations (300) – Learn about the tools and best practices on how to migrate to AWS at scale.

April 24, 2018 | 11:00 AM – 11:45 AM PTDeploy your Desktops and Apps on AWS (300) – Learn how to deploy your desktops and apps on AWS with Amazon WorkSpaces and Amazon AppStream 2.0

IoT

May 2, 2018 | 11:00 AM – 11:45 AM PTHow to Easily and Securely Connect Devices to AWS IoT (200) – Learn how to easily and securely connect devices to the cloud and reliably scale to billions of devices and trillions of messages with AWS IoT.

Machine Learning

April 24, 2018 | 09:00 AM – 09:45 AM PT Automate for Efficiency with Amazon Transcribe and Amazon Translate (200) – Learn how you can increase the efficiency and reach your operations with Amazon Translate and Amazon Transcribe.

April 26, 2018 | 09:00 AM – 09:45 AM PT Perform Machine Learning at the IoT Edge using AWS Greengrass and Amazon Sagemaker (200) – Learn more about developing machine learning applications for the IoT edge.

Mobile

April 30, 2018 | 11:00 AM – 11:45 AM PTOffline GraphQL Apps with AWS AppSync (300) – Come learn how to enable real-time and offline data in your applications with GraphQL using AWS AppSync.

Networking

May 2, 2018 | 09:00 AM – 09:45 AM PT Taking Serverless to the Edge (300) – Learn how to run your code closer to your end users in a serverless fashion. Also, David Von Lehman from Aerobatic will discuss how they used [email protected] to reduce latency and cloud costs for their customer’s websites.

Security, Identity & Compliance

April 30, 2018 | 09:00 AM – 09:45 AM PTAmazon GuardDuty – Let’s Attack My Account! (300) – Amazon GuardDuty Test Drive – Practical steps on generating test findings.

May 3, 2018 | 09:00 AM – 09:45 AM PTProtect Your Game Servers from DDoS Attacks (200) – Learn how to use the new AWS Shield Advanced for EC2 to protect your internet-facing game servers against network layer DDoS attacks and application layer attacks of all kinds.

Serverless

April 24, 2018 | 01:00 PM – 01:45 PM PTTips and Tricks for Building and Deploying Serverless Apps In Minutes (200) – Learn how to build and deploy apps in minutes.

Storage

May 1, 2018 | 11:00 AM – 11:45 AM PTBuilding Data Lakes That Cost Less and Deliver Results Faster (300) – Learn how Amazon S3 Select And Amazon Glacier Select increase application performance by up to 400% and reduce total cost of ownership by extending your data lake into cost-effective archive storage.

May 3, 2018 | 11:00 AM – 11:45 AM PTIntegrating On-Premises Vendors with AWS for Backup (300) – Learn how to work with AWS and technology partners to build backup & restore solutions for your on-premises, hybrid, and cloud native environments.