Tag Archives: filtering

China’s Website and VPN Blocking Hurts Business, US Says

Post Syndicated from Ernesto original https://torrentfreak.com/chinas-website-and-vpn-blocking-hurts-business-us-says-180407/

The Chinese government is known to keep a tight grip on the websites its citizens are allowed to see on the Internet.

The so-called ‘Great Firewall’ blocks pirate sites, but also a wide variety of other websites which the government believes could have a negative influence on society.

While the exact scope of the blocking effort is unknown, it’s certain that thousands of websites are affected.

The US Government, however, is not happy with this type of censorship. In its latest Trade Barriers report, the Office of the United States Trade Representative (USTR) notes that it has a detrimental impact on businesses around the world.

“China continues to engage in extensive blocking of legitimate websites, imposing significant costs on both suppliers and users of web-based services and products,” the report reads.

According to the US, the Chinese blocking efforts affect billions of dollars in business. The affected services include app stores, news sites, and communication services.

While many of these are targeted intentionally, some are hit by over-blocking. This happens when a blocked site shares an IP-address with other sites, which are then censored as collateral damage.

“While becoming more sophisticated over time, the technical means of blocking, dubbed the Great Firewall, still often appears to affect sites that may not be the intended target, but that may share the same Internet Protocol address,” USTR writes.

According to industry figures, twelve of the top thirty most popular sites on the Internet are currently censored in China. And while it used to be relatively easy to bypass these measures with a VPN, that is changing too.

Starting this month, all unauthorized VPN services are banned. Companies can only operate a VPN if they lease state-approved services via the Government. According to the US, this is hurting even more businesses, not just financially but also in terms of privacy.

“In the past, consumers and business have been able to avoid government-run filtering through the use of VPN services, but a crackdown in 2017 has all but eliminated that option, with popular VPN applications now banned,” USTR writes.

“This development has had a particularly dire effect on foreign businesses, which routinely use VPN services to connect to locations and services outside of China, and which depend on VPN technology to ensure confidentiality of communications.”

Ironically, US companies are helping the Chinese Government keep its Great Firewall up. For example, last year VPN applications started to disappear from Apple’s iOS store following pressure from Chinese authorities.

It’s clear that the United States is not happy with China’s censorship regime. However, it’s unlikely that we’ll see a reversal anytime soon. As long as China is willing to jail its citizens for operating VPN services, there’s still a long way to go.

A copy of USTR’s 2018 National Trade Estimate Report on Foreign Trade Barriers is available here (pdf).

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Cloudflare Fails to Eliminate ‘Moot’ Pirate Site Blocking Threat

Post Syndicated from Ernesto original https://torrentfreak.com/cloudflare-fails-eliminate-moot-pirate-site-blocking-threat/

Representing various major record labels, the RIAA filed a lawsuit against pirate site MP3Skull three years ago.

With millions of visitors per month, the MP3 download site had long been one of the prime sources of pirated music.

In 2016, the record labels won their case against the MP3 download portal but the site initially ignored the court order and continued to operate. This prompted the RIAA to go after third-party services including Cloudflare, demanding that they block associated domain names.

Cloudflare objected and argued that the DMCA shielded the company from the broad blocking requirements. However, the court ruled that the DMCA doesn’t apply in this case, opening the door to widespread anti-piracy filtering.

The court stressed that, before issuing an injunction against Cloudflare, it still had to be determined whether the CDN provider is “in active concert or participation” with the pirate site. However, this has yet to happen. Since MP3Skull has ceased its operations the RIAA has shown little interest in pursuing the matter any further.

While there is no longer an immediate site-blocking threat, the order makes it easier for rightsholders to obtain similar blocking measures in the future. Cloudflare, therefore, asked the court to throw the order out, arguing that since MP3Skull is no longer available the issue is moot.

This week, US District Court Judge Marcia Cooke denied that request.


This is, of course, music to the ears of the RIAA and its members.

The RIAA wants to keep the door open for similar blocking requests in the future. This potential liability for pirate sites is the main reason why the CDN provider asked the court to vacate the order, the RIAA said previously.

While the order remains in place, Judge Cooke suggests that both parties are working on some kind of compromise or clarification, and gave them two weeks to draft this into a new proposal.

“The parties may draft and submit a joint proposed order addressing the issues raised at the hearing on or before April 10, 2018,” Judge Cooke writes.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Google Adds ‘Kodi’ to Autocomplete Piracy Filter

Post Syndicated from Ernesto original https://torrentfreak.com/google-adds-kodi-to-autocomplete-piracy-filter-180328/

In recent years entertainment industry groups have repeatedly urged Google to ramp up its anti-piracy efforts.

These remarks haven’t fallen on deaf ears and Google has made several changes to its search algorithms to make copyright-infringing material less visible.

The company demotes results from domain names for which it receives many DMCA takedown notices, for example, and it has also removed several piracy-related terms from its autocomplete feature.

The latter means that when one types “pirate ba” it won’t suggest pirate bay. Instead, people see “pirate bays” or “pirate books” as suggestions. Whether that’s very effective is up for debate, but it’s intentional.

“Google has taken steps to prevent terms closely associated with piracy from appearing in Autocomplete and Related Search,” the company previously explained.

“This is similar to the approach we have taken for a narrow class of terms related to pornography, violence, and hate speech.”

When the piracy filter was first implemented, several seemingly neutral terms such as BitTorrent and uTorrent were also targeted. While these were later reinstated, we recently noticed another autocomplete ban that’s rather broad.

It turns out that Google has recently removed the term “Kodi” from its autocomplete results. While Kodi can be abused through pirate add-ons, the media player software itself is perfectly legal, which makes it an odd decision.

Users who type in “Kod” get a list of suggestions including “Kodak” and “Kodiak,” but not the much more popular search term Kodi.


Similarly, when typing “addons for k” Google suggests addons for Kokotime and Krypton 17.6. While the latter is a Kodi version, the name of the media player itself doesn’t come up as a suggestion.

Once users type the full Kodi term and add a space, plenty of suggestions suddenly appear, which is similar to other banned terms.


Ironically enough, the Kokotime app is frequently used by pirates as well. Also, the names of all of the pirate Kodi addons we checked still show up fine in the autosuggest feature.

Unfortunately, Google doesn’t document its autocomplete removal decisions, nor does it publish the full list of banned words. However, the search engine confirms that Kodi’s piracy stigma is to blame here.

“Since 2011, we have been filtering certain terms closely associated with copyright infringement from Google Autocomplete. This action is consistent with that long-standing strategy,” a spokesperson told us.

The Kodi team, operated by the XBMC Foundation, is disappointed with the decision and points out that their software does not cross any lines.

“We are surprised and disappointed to discover Kodi has been removed from autocomplete, as Kodi is perfectly legal open source software,” XBMC Foundation President Nathan Betzen told us.

The Kodi team has been actively trying to distance itself from pirate elements. They enforce their trademark against sellers of pirate boxes and are in good contact with Hollywood’s industry group, the MPAA.

“We have a professional relationship with the MPAA, who have specifically made clear in the past their own position that Kodi is legal software,” Betzen notes.

“We hope Google will reconsider this decision in the future, or at a minimum limit their removal to search terms where the legality is actually in dispute.”

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

GetAltName – Discover Sub-Domains From SSL Certificates

Post Syndicated from Darknet original https://www.darknet.org.uk/2018/03/getaltname-discover-sub-domains-from-ssl-certificates/?utm_source=rss&utm_medium=social&utm_campaign=darknetfeed


GetAltName is a little script to discover sub-domains. It extracts Subject Alternative Names from the SSL certificates of HTTPS websites, which can provide you with additional DNS names or virtual servers.

It’s useful in the discovery phase of a pen-testing assessment, as it can provide you with more information about your target and scope.

Features of GetAltName to Discover Sub-Domains

  • Strips wildcards and www’s
  • Returns a unique list (no duplicates)
  • Works on verified and self-signed certs
  • Domain matching system
  • Filtering for main domains and TLDs
  • Gets additional sub-domains from crt.sh
  • Outputs to clipboard

GetAltName Subdomain Extraction Tool Usage

You can output to a text file and also copy the output to your clipboard as a list or a single-line string, which is useful if you’re trying to run a quick scan with Nmap or other tools.
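
GetAltName itself is a standalone script, but the underlying technique is easy to sketch. The snippet below is a minimal illustration (written in Scala, the language used elsewhere on this page, and not the tool’s own code): it opens a TLS connection and reads the dNSName entries from the certificate’s Subject Alternative Name extension. The host name and helper names are made up for the example.

import java.net.URL
import java.security.cert.X509Certificate
import javax.net.ssl.HttpsURLConnection
import scala.collection.JavaConverters._

object SanSketch {
  // Read the dNSName entries (SAN type 2) from a server's leaf certificate.
  def dnsAltNames(host: String): Seq[String] = {
    val conn = new URL(s"https://$host/").openConnection().asInstanceOf[HttpsURLConnection]
    conn.connect()
    try {
      conn.getServerCertificates.headOption match {
        case Some(cert: X509Certificate) =>
          Option(cert.getSubjectAlternativeNames)        // null when the extension is absent
            .map(_.asScala.toSeq)
            .getOrElse(Seq.empty)
            .collect { case entry if entry.get(0) == Integer.valueOf(2) => entry.get(1).toString }
        case _ => Seq.empty
      }
    } finally conn.disconnect()
  }
}

// Example: print unique names, dropping wildcard entries much like GetAltName does.
// SanSketch.dnsAltNames("www.example.org").filterNot(_.startsWith("*.")).distinct.foreach(println)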

Read the rest of GetAltName – Discover Sub-Domains From SSL Certificates now! Only available at Darknet.

Message Filtering Operators for Numeric Matching, Prefix Matching, and Blacklisting in Amazon SNS

Post Syndicated from Christie Gifrin original https://aws.amazon.com/blogs/compute/message-filtering-operators-for-numeric-matching-prefix-matching-and-blacklisting-in-amazon-sns/

This blog was contributed by Otavio Ferreira, Software Development Manager for Amazon SNS

Message filtering simplifies the overall pub/sub messaging architecture by offloading message filtering logic from subscribers, as well as message routing logic from publishers. The initial launch of message filtering provided a basic operator that was based on exact string comparison. For more information, see Simplify Your Pub/Sub Messaging with Amazon SNS Message Filtering.

Today, AWS is announcing an additional set of filtering operators that bring even more power and flexibility to your pub/sub messaging use cases.

Message filtering operators

Amazon SNS now supports both numeric and string matching. Specifically, string matching operators allow for exact, prefix, and “anything-but” comparisons, while numeric matching operators allow for exact and range comparisons, as outlined below. Numeric matching operators work for values between -10e9 and +10e9 inclusive, with five digits of accuracy right of the decimal point.

  • Exact matching on string values (Whitelisting): Subscription filter policy   {"sport": ["rugby"]} matches message attribute {"sport": "rugby"} only.
  • Anything-but matching on string values (Blacklisting): Subscription filter policy {"sport": [{"anything-but": "rugby"}]} matches message attributes such as {"sport": "baseball"} and {"sport": "basketball"} and {"sport": "football"} but not {"sport": "rugby"}
  • Prefix matching on string values: Subscription filter policy {"sport": [{"prefix": "bas"}]} matches message attributes such as {"sport": "baseball"} and {"sport": "basketball"}
  • Exact matching on numeric values: Subscription filter policy {"balance": [{"numeric": ["=", 301.5]}]} matches message attributes {"balance": 301.500} and {"balance": 3.015e2}
  • Range matching on numeric values: Subscription filter policy {"balance": [{"numeric": ["<", 0]}]} matches negative numbers only, and {"balance": [{"numeric": [">", 0, "<=", 150]}]} matches any positive number up to 150.

As usual, you may apply the “AND” logic by appending multiple keys in the subscription filter policy, and the “OR” logic by appending multiple values for the same key, as follows:

  • AND logic: Subscription filter policy {"sport": ["rugby"], "language": ["English"]} matches only messages that carry both attributes {"sport": "rugby"} and {"language": "English"}
  • OR logic: Subscription filter policy {"sport": ["rugby", "football"]} matches messages that carry either the attribute {"sport": "rugby"} or {"sport": "football"}
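
To set such a policy from code, the following is a minimal sketch using Scala with the AWS SDK for Java (the same language as the AWS Glue examples further down this page). It attaches a filter policy that combines a prefix match and a numeric range to an existing subscription; the subscription ARN is a placeholder.

import com.amazonaws.services.sns.AmazonSNSClientBuilder

object FilterPolicyExample {
  def main(args: Array[String]): Unit = {
    val sns = AmazonSNSClientBuilder.defaultClient()

    // Placeholder ARN; replace with a real subscription ARN.
    val subscriptionArn =
      "arn:aws:sns:us-east-1:123456789012:Orders:00000000-0000-0000-0000-000000000000"

    // AND across keys, OR across values; prefix and numeric range operators.
    val filterPolicy =
      """{"sport": [{"prefix": "bas"}], "balance": [{"numeric": [">", 0, "<=", 150]}]}"""

    sns.setSubscriptionAttributes(subscriptionArn, "FilterPolicy", filterPolicy)
  }
}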

Message filtering operators in action

Here’s how this new set of filtering operators works. The following example is based on a pharmaceutical company that develops, produces, and markets a variety of prescription drugs, with research labs located in Asia Pacific and Europe. The company built an internal procurement system to manage the purchasing of lab supplies (for example, chemicals and utensils), office supplies (for example, paper, folders, and markers) and tech supplies (for example, laptops, monitors, and printers) from global suppliers.

This distributed system is composed of the four following subsystems:

  • A requisition system that presents the catalog of products from suppliers, and takes orders from buyers
  • An approval system for orders targeted to Asia Pacific labs
  • Another approval system for orders targeted to European labs
  • A fulfillment system that integrates with shipping partners

As shown in the following diagram, the company leverages AWS messaging services to integrate these distributed systems.

  • Firstly, an SNS topic named “Orders” was created to take all orders placed by buyers on the requisition system.
  • Secondly, two Amazon SQS queues, named “Lab-Orders-AP” and “Lab-Orders-EU” (for Asia Pacific and Europe respectively), were created to backlog orders that are up for review on the approval systems.
  • Lastly, an SQS queue named “Common-Orders” was created to backlog orders that aren’t related to lab supplies, which can already be picked up by shipping partners on the fulfillment system.

The company also uses AWS Lambda functions to automatically process lab supply orders that don’t require approval or that are invalid.

In this example, because different types of orders have been published to the SNS topic, the subscribing endpoints have had to set advanced filter policies on their SNS subscriptions, to have SNS automatically filter out orders they can’t deal with.

As depicted in the above diagram, the following five filter policies have been created:

  • The SNS subscription that points to the SQS queue “Lab-Orders-AP” sets a filter policy that matches lab supply orders, with a total value greater than $1,000, and that target Asia Pacific labs only. These more expensive transactions require an approver to review orders placed by buyers.
  • The SNS subscription that points to the SQS queue “Lab-Orders-EU” sets a filter policy that matches lab supply orders, also with a total value greater than $1,000, but that target European labs instead.
  • The SNS subscription that points to the Lambda function “Lab-Preapproved” sets a filter policy that only matches lab supply orders that aren’t as expensive, up to $1,000, regardless of their target lab location. These orders simply don’t require approval and can be automatically processed.
  • The SNS subscription that points to the Lambda function “Lab-Cancelled” sets a filter policy that only matches lab supply orders with total value of $0 (zero), regardless of their target lab location. These orders carry no actual items, obviously need neither approval nor fulfillment, and as such can be automatically canceled.
  • The SNS subscription that points to the SQS queue “Common-Orders” sets a filter policy that blacklists lab supply orders. Hence, this policy matches only office and tech supply orders, which have a more streamlined fulfillment process, and require no approval, regardless of price or target location.
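
As an illustration, the first of these policies might be expressed as shown below and attached with setSubscriptionAttributes exactly as in the earlier sketch. The attribute names (“category”, “total”, and “location”) and their values are assumptions made for this sketch; the post does not spell out the exact message attributes the company uses.

// Hypothetical filter policy for the "Lab-Orders-AP" subscription:
// lab supply orders over $1,000 that target Asia Pacific labs.
val labOrdersApPolicy =
  """{
    |  "category": ["lab-supplies"],
    |  "total":    [{"numeric": [">", 1000]}],
    |  "location": [{"prefix": "Asia-Pacific-"}]
    |}""".stripMargin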

After the company finished building this advanced pub/sub architecture, they were then able to launch their internal procurement system and allow buyers to begin placing orders. The diagram above shows six example orders published to the SNS topic. Each order contains message attributes that describe the order, and cause them to be filtered in a different manner, as follows:

  • Message #1 is a lab supply order, with a total value of $15,700 and targeting a research lab in Singapore. Because the value is greater than $1,000, and the location “Asia-Pacific-Southeast” matches the prefix “Asia-Pacific-“, this message matches the first SNS subscription and is delivered to SQS queue “Lab-Orders-AP”.
  • Message #2 is a lab supply order, with a total value of $1,833 and targeting a research lab in Ireland. Because the value is greater than $1,000, and the location “Europe-West” matches the prefix “Europe-“, this message matches the second SNS subscription and is delivered to SQS queue “Lab-Orders-EU”.
  • Message #3 is a lab supply order, with a total value of $415. Because the value is greater than $0 and less than $1,000, this message matches the third SNS subscription and is delivered to Lambda function “Lab-Preapproved”.
  • Message #4 is a lab supply order, but with a total value of $0. Therefore, it only matches the fourth SNS subscription, and is delivered to Lambda function “Lab-Cancelled”.
  • Messages #5 and #6 actually aren’t lab supply orders; one is an office supply order, and the other is a tech supply order. Therefore, they only match the fifth SNS subscription, and are both delivered to SQS queue “Common-Orders”.

Although each message only matched a single subscription, each was tested against the filter policy of every subscription in the topic. Hence, depending on which attributes are set on the incoming message, the message might actually match multiple subscriptions, and multiple deliveries will take place. Also, it is important to bear in mind that subscriptions with no filter policies catch every single message published to the topic, as a blank filter policy equates to a catch-all behavior.

Summary

Amazon SNS allows for both string and numeric filtering operators. As explained in this post, string operators allow for exact, prefix, and “anything-but” comparisons, while numeric operators allow for exact and range comparisons. These advanced filtering operators bring even more power and flexibility to your pub/sub messaging functionality and also allow you to simplify your architecture further by removing even more logic from your subscribers.

Message filtering can be implemented easily with existing AWS SDKs by applying message and subscription attributes across all SNS supported protocols (Amazon SQS, AWS Lambda, HTTP, SMS, email, and mobile push). SNS filtering operators for numeric matching, prefix matching, and blacklisting are available now in all AWS Regions, for no extra charge.
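
On the publishing side, all a publisher has to do is attach message attributes; the filter policies do the rest. The sketch below shows this with Scala and the AWS SDK for Java, reusing the “sport” and “balance” attributes from the operator examples above; the topic ARN is a placeholder.

import com.amazonaws.services.sns.AmazonSNSClientBuilder
import com.amazonaws.services.sns.model.{MessageAttributeValue, PublishRequest}

object PublishWithAttributes {
  def main(args: Array[String]): Unit = {
    val sns = AmazonSNSClientBuilder.defaultClient()

    // Placeholder topic ARN; replace with your own.
    val topicArn = "arn:aws:sns:us-east-1:123456789012:Orders"

    // The attributes drive filtering; the message body is passed through untouched.
    val request = new PublishRequest()
      .withTopicArn(topicArn)
      .withMessage("""{"orderId": "1234"}""")
      .addMessageAttributesEntry("sport",
        new MessageAttributeValue().withDataType("String").withStringValue("rugby"))
      .addMessageAttributesEntry("balance",
        new MessageAttributeValue().withDataType("Number").withStringValue("301.5"))

    sns.publish(request)
  }
}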

To experiment with these new filtering operators yourself, and continue learning, try the 10-minute Tutorial Filter Messages Published to Topics. For more information, see Filtering Messages with Amazon SNS in the SNS documentation.

Comcast’s Protected Browsing Blocks TorrentFreak as “Suspicious” Site

Post Syndicated from Ernesto original https://torrentfreak.com/comcasts-protected-browsing-blocks-torrentfreak-as-suspicious-site-18004/

Regular TorrentFreak readers know that website blocking is rampant around the globe.

Thousands of pirate sites have been blocked by court orders for offering access to infringing content. However, there are plenty of voluntary blocking measures as well.

Some Internet providers offer web filtering tools to help their customers avoid malware, adult content, pirate services, or other suspicious content. Comcast’s Xfinity Xfi service, for example, has a “protected browsing” feature.

While this can be useful in some situations, it’s far from perfect. The blocklists that are used can be quite broad, and websites are sometimes miscategorized or flagged as dangerous when that’s not the case.

This also appears to be happening with Xfinity’s protected browsing feature. A reader alerted us that, when he tried to access TorrentFreak, access was denied with a warning that a “suspicious” site was ahead.

A pirate logo on the blocking page suggests that there’s copyright-infringing activity involved. While it’s no secret that we cover a lot of news related to piracy, it goes a bit far to label this type of news reporting as suspicious.


While we don’t know whether the blockade is intentional or a false positive, this is certainly not the only ‘problem’ with Xfinity’s protected browsing feature.

Previously, Comcast users reported that this system prevented people from accessing PayPal as well, which is a bit much, and others reported that it stopped the Steam store from loading properly.

The good news is that the blocking ‘feature’ isn’t mandatory. Subscribers can enable and disable it whenever they please, by changing their network settings.

Unfortunately, Xfinity’s blocking efforts are not unique. We regularly get reports from users who can’t access TorrentFreak because it’s blocked, often on public WiFi networks. In these and other cases, a VPN can always come in handy.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Amazon Redshift – 2017 Recap

Post Syndicated from Larry Heathcote original https://aws.amazon.com/blogs/big-data/amazon-redshift-2017-recap/

We have been busy adding new features and capabilities to Amazon Redshift, and we wanted to give you a glimpse of what we’ve been doing over the past year. In this article, we recap a few of our enhancements and provide a set of resources that you can use to learn more and get the most out of your Amazon Redshift implementation.

In 2017, we made more than 30 announcements about Amazon Redshift. We listened to you, our customers, and delivered Redshift Spectrum, a feature of Amazon Redshift that gives you the ability to extend analytics to your data lake—without moving data. We launched new DC2 nodes, doubling performance at the same price. We also announced many new features that provide greater scalability, better performance, more automation, and easier ways to manage your analytics workloads.

To see a full list of our launches, visit our what’s new page—and be sure to subscribe to our RSS feed.

Major launches in 2017

Amazon Redshift Spectrum—extend analytics to your data lake, without moving data

We launched Amazon Redshift Spectrum to give you the freedom to store data in Amazon S3, in open file formats, and have it available for analytics without the need to load it into your Amazon Redshift cluster. It enables you to easily join datasets across Redshift clusters and S3 to provide unique insights that you would not be able to obtain by querying independent data silos.

With Redshift Spectrum, you can run SQL queries against data in an Amazon S3 data lake as easily as you analyze data stored in Amazon Redshift. And you can do it without loading data or resizing the Amazon Redshift cluster based on growing data volumes. Redshift Spectrum separates compute and storage to meet workload demands for data size, concurrency, and performance. Redshift Spectrum scales processing across thousands of nodes, so results are fast, even with massive datasets and complex queries. You can query open file formats that you already use—such as Apache Avro, CSV, Grok, ORC, Apache Parquet, RCFile, RegexSerDe, SequenceFile, TextFile, and TSV—directly in Amazon S3, without any data movement.

“For complex queries, Redshift Spectrum provided a 67 percent performance gain,” said Rafi Ton, CEO, NUVIAD. “Using the Parquet data format, Redshift Spectrum delivered an 80 percent performance improvement. For us, this was substantial.”

To learn more about Redshift Spectrum, watch our AWS Summit session Intro to Amazon Redshift Spectrum: Now Query Exabytes of Data in S3, and read our announcement blog post Amazon Redshift Spectrum – Exabyte-Scale In-Place Queries of S3 Data.

DC2 nodes—twice the performance of DC1 at the same price

We launched second-generation Dense Compute (DC2) nodes to provide low latency and high throughput for demanding data warehousing workloads. DC2 nodes feature powerful Intel E5-2686 v4 (Broadwell) CPUs, fast DDR4 memory, and NVMe-based solid state disks (SSDs). We’ve tuned Amazon Redshift to take advantage of the better CPU, network, and disk on DC2 nodes, providing up to twice the performance of DC1 at the same price. Our DC2.8xlarge instances now provide twice the memory per slice of data and an optimized storage layout with 30 percent better storage utilization.

“Redshift allows us to quickly spin up clusters and provide our data scientists with a fast and easy method to access data and generate insights,” said Bradley Todd, technology architect at Liberty Mutual. “We saw a 9x reduction in month-end reporting time with Redshift DC2 nodes as compared to DC1.”

Read our customer testimonials to see the performance gains our customers are experiencing with DC2 nodes. To learn more, read our blog post Amazon Redshift Dense Compute (DC2) Nodes Deliver Twice the Performance as DC1 at the Same Price.

Performance enhancements—3x to 5x faster queries

On average, our customers are seeing 3x to 5x performance gains for most of their critical workloads.

We introduced short query acceleration to speed up execution of queries such as reports, dashboards, and interactive analysis. Short query acceleration uses machine learning to predict the execution time of a query, and to move short running queries to an express short query queue for faster processing.

We launched results caching to deliver sub-second response times for queries that are repeated, such as dashboards, visualizations, and those from BI tools. Results caching has an added benefit of freeing up resources to improve the performance of all other queries.

We also introduced late materialization to reduce the amount of data scanned for queries with predicate filters by batching and factoring in the filtering of predicates before fetching data blocks in the next column. For example, if only 10 percent of the table rows satisfy the predicate filters, Amazon Redshift can potentially save 90 percent of the I/O for the remaining columns to improve query performance.

We launched query monitoring rules and pre-defined rule templates. These features make it easier for you to set metrics-based performance boundaries for workload management (WLM) queries, and specify what action to take when a query goes beyond those boundaries. For example, for a queue that’s dedicated to short-running queries, you might create a rule that aborts queries that run for more than 60 seconds. To track poorly designed queries, you might have another rule that logs queries that contain nested loops.

Customer insights

Amazon Redshift and Redshift Spectrum serve customers across a variety of industries and sizes, from startups to large enterprises. Visit our customer page to see the success that customers are having with our recent enhancements. Learn how companies like Liberty Mutual Insurance saw a 9x reduction in month-end reporting time using DC2 nodes. On this page, you can find case studies, videos, and other content that show how our customers are using Amazon Redshift to drive innovation and business results.

In addition, check out these resources to learn about the success our customers are having building out a data warehouse and data lake integration solution with Amazon Redshift.

Partner solutions

You can enhance your Amazon Redshift data warehouse by working with industry-leading experts. Our AWS Partner Network (APN) Partners have certified their solutions to work with Amazon Redshift. They offer software, tools, integration, and consulting services to help you at every step. Visit our Amazon Redshift Partner page and choose an APN Partner. Or, use AWS Marketplace to find and immediately start using third-party software.

To see what our Partners are saying about Amazon Redshift Spectrum and our DC2 nodes mentioned earlier, read their blog posts.

Resources

Blog posts

Visit the AWS Big Data Blog for a list of all Amazon Redshift articles.

YouTube videos

GitHub

Our community of experts contribute on GitHub to provide tips and hints that can help you get the most out of your deployment. Visit GitHub frequently to get the latest technical guidance, code samples, administrative task automation utilities, the analyze & vacuum schema utility, and more.

Customer support

If you are evaluating or considering a proof of concept with Amazon Redshift, or you need assistance migrating your on-premises or other cloud-based data warehouse to Amazon Redshift, our team of product experts and solutions architects can help you with architecting, sizing, and optimizing your data warehouse. Contact us using this support request form, and let us know how we can assist you.

If you are an Amazon Redshift customer, we offer a no-cost health check program. Our team of database engineers and solutions architects give you recommendations for optimizing Amazon Redshift and Amazon Redshift Spectrum for your specific workloads. To learn more, email us at [email protected].

If you have any questions, email us at [email protected].

Additional Reading

If you found this post useful, be sure to check out Amazon Redshift Spectrum – Exabyte-Scale In-Place Queries of S3 Data, Using Amazon Redshift for Fast Analytical Reports and How to Migrate Your Oracle Data Warehouse to Amazon Redshift Using AWS SCT and AWS DMS.


About the Author

Larry Heathcote is a Principal Product Marketing Manager at Amazon Web Services for data warehousing and analytics. Larry is passionate about seeing the results of data-driven insights on business outcomes. He enjoys family time, home projects, grilling out and the taste of classic barbeque.

[$] BPF comes to firewalls

Post Syndicated from corbet original https://lwn.net/Articles/747551/rss

The Linux kernel currently supports two separate network packet-filtering mechanisms: iptables and nftables. For the last few years, it has been generally assumed that nftables would eventually replace the older iptables implementation; few people expected that the kernel developers would, instead, add a third packet filter. But that would appear to be what is happening with the newly announced bpfilter mechanism. Bpfilter may eventually replace both iptables and nftables, but there are a lot of questions that will need to be answered first.

EFF Urges US Copyright Office To Reject Proactive ‘Piracy’ Filters

Post Syndicated from Andy original https://torrentfreak.com/eff-urges-us-copyright-office-to-reject-proactive-piracy-filters-180213/

Faced with millions of individuals consuming unlicensed audiovisual content from a variety of sources, entertainment industry groups have been seeking solutions closer to the roots of the problem.

As widespread site-blocking attempts to tackle ‘pirate’ sites in the background, greater attention has turned to legal platforms that host both licensed and unlicensed content.

Under current legislation, these sites and services can do business relatively comfortably due to the so-called safe harbor provisions of the US Digital Millennium Copyright Act (DMCA) and the European Union Copyright Directive (EUCD).

Both sets of legislation ensure that Internet platforms can avoid being held liable for the actions of others provided they themselves address infringement when they are made aware of specific problems. If a video hosting site has a copy of an unlicensed movie uploaded by a user, for example, it must be removed within a reasonable timeframe upon request from the copyright holder.

However, in both the US and EU there is mounting pressure to make it more difficult for online services to achieve ‘safe harbor’ protections.

Entertainment industry groups believe that platforms use the law to turn a blind eye to infringing content uploaded by users, content that is often monetized before being taken down. With this in mind, copyright holders on both sides of the Atlantic are pressing for more proactive regimes, ones that will see Internet platforms install filtering mechanisms to spot and discard infringing content before it can reach the public.

While such a system would be welcomed by rightsholders, Internet companies are fearful of a future in which they could be held more liable for the infringements of others. They’re supported by the EFF, who yesterday presented a petition to the US Copyright Office urging caution over potential changes to the DMCA.

“As Internet users, website owners, and online entrepreneurs, we urge you to preserve and strengthen the Digital Millennium Copyright Act safe harbors for Internet service providers,” the EFF writes.

“The DMCA safe harbors are key to keeping the Internet open to all. They allow anyone to launch a website, app, or other service without fear of crippling liability for copyright infringement by users.”

It is clear that pressure to introduce mandatory filtering is a concern to the EFF. Filters are blunt instruments that cannot fathom the intricacies of fair use and are liable to stifle free speech and stymie innovation, they argue.

“Major media and entertainment companies and their surrogates want Congress to replace today’s DMCA with a new law that would require websites and Internet services to use automated filtering to enforce copyrights.

“Systems like these, no matter how sophisticated, cannot accurately determine the copyright status of a work, nor whether a use is licensed, a fair use, or otherwise non-infringing. Simply put, automated filters censor lawful and important speech,” the EFF warns.

While its introduction was voluntary and doesn’t affect the company’s safe harbor protections, YouTube already has its own content filtering system in place.

ContentID is able to detect the nature of some content uploaded by users and give copyright holders a chance to remove or monetize it. The company says that the majority of copyright disputes are now handled by ContentID but the system is not perfect and mistakes are regularly flagged by users and mentioned in the media.

However, ContentID was also very expensive to implement so expecting smaller companies to deploy something similar on much more limited budgets could be a burden too far, the EFF warns.

“What’s more, even deeply flawed filters are prohibitively expensive for all but the largest Internet services. Requiring all websites to implement filtering would reinforce the market power wielded by today’s large Internet services and allow them to stifle competition. We urge you to preserve effective, usable DMCA safe harbors, and encourage Congress to do the same,” the EFF notes.

The same arguments, for and against, are currently raging in Europe where the EU Commission proposed mandatory upload filtering in 2016. Since then, opposition to the proposals has been fierce, with warnings of potential human rights breaches and conflicts with existing copyright law.

Back in the US, there are additional requirements for a provider to qualify for safe harbor, including having a named designated agent tasked with receiving copyright infringement notifications. This person’s name must be listed on a platform’s website and submitted to the US Copyright Office, which maintains a centralized online directory of designated agents’ contact information.

Under new rules, agents must be re-registered with the Copyright Office every three years, despite that not being a requirement under the DMCA. The EFF is concerned that by simply failing to re-register an agent, an otherwise responsible website could lose its safe harbor protections, even if the agent’s details have remained the same.

“We’re concerned that the new requirement will particularly disadvantage small and nonprofit websites. We ask you to reconsider this rule,” the EFF concludes.

The EFF’s letter to the Copyright Office can be found here.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

AWS Glue Now Supports Scala Scripts

Post Syndicated from Mehul Shah original https://aws.amazon.com/blogs/big-data/aws-glue-now-supports-scala-scripts/

We are excited to announce AWS Glue support for running ETL (extract, transform, and load) scripts in Scala. Scala lovers can rejoice because they now have one more powerful tool in their arsenal. Scala is the native language for Apache Spark, the underlying engine that AWS Glue offers for performing data transformations.

Beyond its elegant language features, writing Scala scripts for AWS Glue has two main advantages over writing scripts in Python. First, Scala is faster for custom transformations that do a lot of heavy lifting because there is no need to shovel data between Python and Apache Spark’s Scala runtime (that is, the Java virtual machine, or JVM). You can build your own transformations or invoke functions in third-party libraries. Second, it’s simpler to call functions in external Java class libraries from Scala because Scala is designed to be Java-compatible. It compiles to the same bytecode, and its data structures don’t need to be converted.

To illustrate these benefits, we walk through an example that analyzes a recent sample of the GitHub public timeline available from the GitHub archive. This site is an archive of public requests to the GitHub service, recording more than 35 event types ranging from commits and forks to issues and comments.

This post shows how to build an example Scala script that identifies highly negative issues in the timeline. It pulls out issue events in the timeline sample, analyzes their titles using the sentiment prediction functions from the Stanford CoreNLP libraries, and surfaces the most negative issues.

Getting started

Before we start writing scripts, we use AWS Glue crawlers to get a sense of the data—its structure and characteristics. We also set up a development endpoint and attach an Apache Zeppelin notebook, so we can interactively explore the data and author the script.

Crawl the data

The dataset used in this example was downloaded from the GitHub archive website into our sample dataset bucket in Amazon S3, and copied to the following locations:

s3://aws-glue-datasets-<region>/examples/scala-blog/githubarchive/data/

Choose the best folder by replacing <region> with the region that you’re working in, for example, us-east-1. Crawl this folder, and put the results into a database named githubarchive in the AWS Glue Data Catalog, as described in the AWS Glue Developer Guide. This folder contains 12 hours of the timeline from January 22, 2017, and is organized hierarchically (that is, partitioned) by year, month, and day.

When finished, use the AWS Glue console to navigate to the table named data in the githubarchive database. Notice that this data has eight top-level columns, which are common to each event type, and three partition columns that correspond to year, month, and day.

Choose the payload column, and you will notice that it has a complex schema—one that reflects the union of the payloads of event types that appear in the crawled data. Also note that the schema that crawlers generate is a subset of the true schema because they sample only a subset of the data.

Set up the library, development endpoint, and notebook

Next, you need to download and set up the libraries that estimate the sentiment in a snippet of text. The Stanford CoreNLP libraries contain a number of human language processing tools, including sentiment prediction.

Download the Stanford CoreNLP libraries. Unzip the .zip file, and you’ll see a directory full of jar files. For this example, the following jars are required:

  • stanford-corenlp-3.8.0.jar
  • stanford-corenlp-3.8.0-models.jar
  • ejml-0.23.jar

Upload these files to an Amazon S3 path that is accessible to AWS Glue so that it can load these libraries when needed. For this example, they are in s3://glue-sample-other/corenlp/.

Development endpoints are static Spark-based environments that can serve as the backend for data exploration. You can attach notebooks to these endpoints to interactively send commands and explore and analyze your data. These endpoints have the same configuration as that of AWS Glue’s job execution system. So, commands and scripts that work there also work the same when registered and run as jobs in AWS Glue.

To set up an endpoint and a Zeppelin notebook to work with that endpoint, follow the instructions in the AWS Glue Developer Guide. When you are creating an endpoint, be sure to specify the locations of the previously mentioned jars in the Dependent jars path as a comma-separated list. Otherwise, the libraries will not be loaded.

After you set up the notebook server, go to the Zeppelin notebook by choosing Dev Endpoints in the left navigation pane on the AWS Glue console. Choose the endpoint that you created. Next, choose the Notebook Server URL, which takes you to the Zeppelin server. Log in using the notebook user name and password that you specified when creating the notebook. Finally, create a new note to try out this example.

Each notebook is a collection of paragraphs, and each paragraph contains a sequence of commands and the output for that command. Moreover, each notebook includes a number of interpreters. If you set up the Zeppelin server using the console, the (Python-based) pyspark and (Scala-based) spark interpreters are already connected to your new development endpoint, with pyspark as the default. Therefore, throughout this example, you need to prepend %spark at the top of your paragraphs. In this example, we omit these for brevity.

Working with the data

In this section, we use AWS Glue extensions to Spark to work with the dataset. We look at the actual schema of the data and filter out the interesting event types for our analysis.

Start with some boilerplate code to import libraries that you need:

%spark

import com.amazonaws.services.glue.DynamicRecord
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.GlueArgParser
import com.amazonaws.services.glue.util.Job
import com.amazonaws.services.glue.util.JsonOptions
import com.amazonaws.services.glue.types._
import org.apache.spark.SparkContext

Then, create the Spark and AWS Glue contexts needed for working with the data:

@transient val spark: SparkContext = SparkContext.getOrCreate()
val glueContext: GlueContext = new GlueContext(spark)

You need the transient decorator on the SparkContext when working in Zeppelin; otherwise, you will run into a serialization error when executing commands.

Dynamic frames

This section shows how to create a dynamic frame that contains the GitHub records in the table that you crawled earlier. A dynamic frame is the basic data structure in AWS Glue scripts. It is like an Apache Spark data frame, except that it is designed and optimized for data cleaning and transformation workloads. A dynamic frame is well-suited for representing semi-structured datasets like the GitHub timeline.

A dynamic frame is a collection of dynamic records. In Spark lingo, it is an RDD (resilient distributed dataset) of DynamicRecords. A dynamic record is a self-describing record. Each record encodes its columns and types, so every record can have a schema that is unique from all others in the dynamic frame. This is convenient and often more efficient for datasets like the GitHub timeline, where payloads can vary drastically from one event type to another.

The following creates a dynamic frame, github_events, from your table:

val github_events = glueContext
                    .getCatalogSource(database = "githubarchive", tableName = "data")
                    .getDynamicFrame()

The getCatalogSource() method returns a DataSource, which represents a particular table in the Data Catalog. The getDynamicFrame() method returns a dynamic frame from the source.

Recall that the crawler created a schema from only a sample of the data. You can scan the entire dataset, count the rows, and print the complete schema as follows:

github_events.count
github_events.printSchema()

The result looks like the following:

The data has 414,826 records. As before, notice that there are eight top-level columns, and three partition columns. If you scroll down, you’ll also notice that the payload is the most complex column.

Run functions and filter records

This section describes how you can create your own functions and invoke them seamlessly to filter records. Unlike filtering with Python lambdas, Scala scripts do not need to convert records from one language representation to another, thereby reducing overhead and running much faster.

Let’s create a function that picks only the IssuesEvents from the GitHub timeline. These events are generated whenever someone posts an issue for a particular repository. Each GitHub event record has a field, “type”, that indicates the kind of event it is. The issueFilter() function returns true for records that are IssuesEvents.

def issueFilter(rec: DynamicRecord): Boolean = { 
    rec.getField("type").exists(_ == "IssuesEvent") 
}

Note that the getField() method returns an Option[Any] type, so you first need to check that it exists before checking the type.

You pass this function to the filter transformation, which applies the function on each record and returns a dynamic frame of those records that pass.

val issue_events =  github_events.filter(issueFilter)

Now, let’s look at the size and schema of issue_events.

issue_events.count
issue_events.printSchema()

It’s much smaller (14,063 records), and the payload schema is less complex, reflecting only the schema for issues. Keep a few essential columns for your analysis, and drop the rest using the ApplyMapping() transform:

val issue_titles = issue_events.applyMapping(Seq(("id", "string", "id", "string"),
                                                 ("actor.login", "string", "actor", "string"), 
                                                 ("repo.name", "string", "repo", "string"),
                                                 ("payload.action", "string", "action", "string"),
                                                 ("payload.issue.title", "string", "title", "string")))
issue_titles.show()

The ApplyMapping() transform is quite handy for renaming columns, casting types, and restructuring records. The preceding code snippet tells the transform to select the fields (or columns) that are enumerated in the left half of the tuples and map them to the fields and types in the right half.

Estimating sentiment using Stanford CoreNLP

To focus on the most pressing issues, you might want to isolate the records with the most negative sentiments. The Stanford CoreNLP libraries are Java-based and offer sentiment-prediction functions. Accessing these functions through Python is possible, but quite cumbersome. It requires creating Python surrogate classes and objects for those found on the Java side. Instead, with Scala support, you can use those classes and objects directly and invoke their methods. Let’s see how.

First, import the libraries needed for the analysis:

import java.util.Properties
import edu.stanford.nlp.ling.CoreAnnotations
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations
import edu.stanford.nlp.pipeline.{Annotation, StanfordCoreNLP}
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations
import scala.collection.convert.wrapAll._

The Stanford CoreNLP libraries have a main driver that orchestrates all of their analysis. The driver setup is heavyweight, setting up threads and data structures that are shared across analyses. Apache Spark runs on a cluster with a main driver process and a collection of backend executor processes that do most of the heavy sifting of the data.

The Stanford CoreNLP shared objects are not serializable, so they cannot be distributed easily across a cluster. Instead, you need to initialize them once for every backend executor process that might need them. Here is how to accomplish that:

val props = new Properties()
props.setProperty("annotators", "tokenize, ssplit, parse, sentiment")
props.setProperty("parse.maxlen", "70")

object myNLP {
    lazy val coreNLP = new StanfordCoreNLP(props)
}

The properties tell the libraries which annotators to execute and how many words to process. The preceding code creates an object, myNLP, with a field coreNLP that is lazily evaluated. This field is initialized only when it is needed, and only once. So, when the backend executors start processing the records, each executor initializes the driver for the Stanford CoreNLP libraries only one time.

Next is a function that estimates the sentiment of a text string. It first calls Stanford CoreNLP to annotate the text. Then, it pulls out the sentences and takes the average sentiment across all the sentences. The sentiment is a double, from 0.0 as the most negative to 4.0 as the most positive.

def estimatedSentiment(text: String): Double = {
    if ((text == null) || (!text.nonEmpty)) { return Double.NaN }
    val annotations = myNLP.coreNLP.process(text)
    val sentences = annotations.get(classOf[CoreAnnotations.SentencesAnnotation])
    sentences.foldLeft(0.0)( (csum, x) => { 
        csum + RNNCoreAnnotations.getPredictedClass(x.get(classOf[SentimentCoreAnnotations.SentimentAnnotatedTree])) 
    }) / sentences.length
}

Now, let’s estimate the sentiment of the issue titles and add that computed field as part of the records. You can accomplish this with the map() method on dynamic frames:

val issue_sentiments = issue_titles.map((rec: DynamicRecord) => { 
    val mbody = rec.getField("title")
    mbody match {
        case Some(mval: String) => { 
            rec.addField("sentiment", ScalarNode(estimatedSentiment(mval)))
            rec }
        case _ => rec
    }
})

The map() method applies the user-provided function on every record. The function takes a DynamicRecord as an argument and returns a DynamicRecord. The code above computes the sentiment, adds it in a top-level field, sentiment, to the record, and returns the record.

Count the records with sentiment and show the schema. This takes a few minutes because Spark must initialize the library and run the sentiment analysis, which can be involved.

issue_sentiments.count
issue_sentiments.printSchema()

Notice that all records were processed (14,063), and the sentiment value was added to the schema.

Finally, let’s pick out the titles that have the lowest sentiment (less than 1.5). Count them and print out a sample to see what some of the titles look like.

val pressing_issues = issue_sentiments.filter(_.getField("sentiment").exists(_.asInstanceOf[Double] < 1.5))
pressing_issues.count
pressing_issues.show(10)

Next, write them all to a file so that you can handle them later. (You’ll need to replace the output path with your own.)

glueContext.getSinkWithFormat(connectionType = "s3", 
                              options = JsonOptions("""{"path": "s3://<bucket>/out/path/"}"""), 
                              format = "json")
            .writeDynamicFrame(pressing_issues)

Take a look in the output path, and you can see the output files.

Putting it all together

Now, let’s create a job from the preceding interactive session. The following script combines all the commands from earlier. It processes the GitHub archive files and writes out the highly negative issues:

import com.amazonaws.services.glue.DynamicRecord
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.GlueArgParser
import com.amazonaws.services.glue.util.Job
import com.amazonaws.services.glue.util.JsonOptions
import com.amazonaws.services.glue.types._
import org.apache.spark.SparkContext
import java.util.Properties
import edu.stanford.nlp.ling.CoreAnnotations
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations
import edu.stanford.nlp.pipeline.{Annotation, StanfordCoreNLP}
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations
import scala.collection.convert.wrapAll._

object GlueApp {

    object myNLP {
        val props = new Properties()
        props.setProperty("annotators", "tokenize, ssplit, parse, sentiment")
        props.setProperty("parse.maxlen", "70")

        lazy val coreNLP = new StanfordCoreNLP(props)
    }

    def estimatedSentiment(text: String): Double = {
        if ((text == null) || (!text.nonEmpty)) { return Double.NaN }
        val annotations = myNLP.coreNLP.process(text)
        val sentences = annotations.get(classOf[CoreAnnotations.SentencesAnnotation])
        sentences.foldLeft(0.0)( (csum, x) => { 
            csum + RNNCoreAnnotations.getPredictedClass(x.get(classOf[SentimentCoreAnnotations.SentimentAnnotatedTree])) 
        }) / sentences.length
    }

    def main(sysArgs: Array[String]) {
        val spark: SparkContext = SparkContext.getOrCreate()
        val glueContext: GlueContext = new GlueContext(spark)

        val dbname = "githubarchive"
        val tblname = "data"
        val outpath = "s3://<bucket>/out/path/"

        val github_events = glueContext
                            .getCatalogSource(database = dbname, tableName = tblname)
                            .getDynamicFrame()

        val issue_events =  github_events.filter((rec: DynamicRecord) => {
            rec.getField("type").exists(_ == "IssuesEvent")
        })

        val issue_titles = issue_events.applyMapping(Seq(("id", "string", "id", "string"),
                                                         ("actor.login", "string", "actor", "string"), 
                                                         ("repo.name", "string", "repo", "string"),
                                                         ("payload.action", "string", "action", "string"),
                                                         ("payload.issue.title", "string", "title", "string")))

        val issue_sentiments = issue_titles.map((rec: DynamicRecord) => { 
            val mbody = rec.getField("title")
            mbody match {
                case Some(mval: String) => { 
                    rec.addField("sentiment", ScalarNode(estimatedSentiment(mval)))
                    rec }
                case _ => rec
            }
        })

        val pressing_issues = issue_sentiments.filter(_.getField("sentiment").exists(_.asInstanceOf[Double] < 1.5))

        glueContext.getSinkWithFormat(connectionType = "s3", 
                              options = JsonOptions(s"""{"path": "$outpath"}"""), 
                              format = "json")
                    .writeDynamicFrame(pressing_issues)
    }
}

Notice that the script is enclosed in a top-level object called GlueApp, which serves as the script’s entry point for the job. (You’ll need to replace the output path with your own.) Upload the script to an Amazon S3 location so that AWS Glue can load it when needed.

To create the job, open the AWS Glue console. Choose Jobs in the left navigation pane, and then choose Add job. Create a name for the job, and specify a role with permissions to access the data. Choose An existing script that you provide, and choose Scala as the language.

For the Scala class name, type GlueApp to indicate the script’s entry point. Specify the Amazon S3 location of the script.

Choose Script libraries and job parameters. In the Dependent jars path field, enter the Amazon S3 locations of the Stanford CoreNLP libraries from earlier as a comma-separated list (without spaces). Then choose Next.

No connections are needed for this job, so choose Next again. Review the job properties, and choose Finish. Finally, choose Run job to execute the job.

You can simply edit the script’s input table and output path to run this job on whatever GitHub timeline datasets you might have.
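
If you would rather not edit the script each time, you can read these values from job parameters instead of hard-coding them. The following is a minimal sketch that assumes the Glue Scala library’s GlueArgParser utility and placeholder parameter names DATABASE, TABLE, and OUTPUT_PATH, which you would pass as job arguments (for example, --DATABASE githubarchive) when creating or starting the job:

import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.GlueArgParser
import org.apache.spark.SparkContext

object GlueApp {

    def main(sysArgs: Array[String]) {
        // Resolve the placeholder job parameters supplied when the job is created or started.
        val args = GlueArgParser.getResolvedOptions(sysArgs,
                       Seq("DATABASE", "TABLE", "OUTPUT_PATH").toArray)

        val spark: SparkContext = SparkContext.getOrCreate()
        val glueContext: GlueContext = new GlueContext(spark)

        // Same catalog source as before, driven by parameters rather than literals.
        val github_events = glueContext
                            .getCatalogSource(database = args("DATABASE"), tableName = args("TABLE"))
                            .getDynamicFrame()

        // ... the rest of the script is unchanged, except that the sink writes to args("OUTPUT_PATH").
    }
}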

Conclusion

In this post, we showed how to write AWS Glue ETL scripts in Scala via notebooks and how to run them as jobs. Scala has the advantage that it is the native language for the Spark runtime. With Scala, it is easier to call Scala or Java functions and third-party libraries for analyses. Moreover, data processing is faster in Scala because there’s no need to convert records from one language runtime to another.

You can find more examples of Scala scripts in our GitHub examples repository: https://github.com/awslabs/aws-glue-samples. We encourage you to experiment with Scala scripts and let us know about any interesting ETL flows that you want to share.

Happy Glue-ing!

Additional Reading

If you found this post useful, be sure to check out Simplify Querying Nested JSON with the AWS Glue Relationalize Transform and Genomic Analysis with Hail on Amazon EMR and Amazon Athena.

About the Authors

Mehul Shah is a senior software manager for AWS Glue. His passion is leveraging the cloud to build smarter, more efficient, and easier to use data systems. He has three girls, and, therefore, he has no spare time.

Ben Sowell is a software development engineer at AWS Glue.

Vinay Vivili is a software development engineer for AWS Glue.


No Level of Copyright Enforcement Will Ever Be Enough For Big Media

Post Syndicated from Andy original https://torrentfreak.com/no-level-of-copyright-enforcement-will-ever-be-enough-for-big-media-180107/

For more than ten years TorrentFreak has documented a continuous stream of piracy battles so it’s natural that, every now and then, we pause to consider when this war might stop. The answer is always “no time soon” and certainly not in 2018.

When swapping files over the Internet first began it wasn’t a particularly widespread activity. A reasonable amount of content was available, but it was relatively inaccessible. Then peer-to-peer came along and it sparked a revolution.

From the beginning, copyright holders felt that the law would answer their problems, whether that was by suing Napster, Kazaa, or even end users. Some industry players genuinely believed this strategy was just a few steps away from achieving its goals. Just a little bit more pressure and all would be under control.

Then, when the landmark MGM Studios v. Grokster decision was handed down in the studios’ favor during 2005, the excitement online was palpable. As copyright holders rejoiced in this body blow for the pirating masses, file-sharing communities literally shook under the weight of the ruling. For a day, maybe two.

For the majority of file-sharers, the ruling meant absolutely nothing. So what if some company could be held responsible for other people’s infringements? Another will come along, outside of the US if need be, people said. They were right not to be concerned – that’s exactly what happened.

Ever since, this cycle has continued. Eager to stem the tide of content being shared without their permission, rightsholders have advocated stronger anti-piracy enforcement and lobbied for more restrictive interpretations of copyright law. Thus far, however, literally nothing has provided a solution.

One would have thought that given the military-style raid on Kim Dotcom’s Megaupload, a huge void would’ve appeared in the sharing landscape. Instead, the file-locker business took itself apart and reinvented itself in jurisdictions outside the United States. Meanwhile, the BitTorrent scene continued in the background, somewhat obliviously.

With the SOPA debacle still fresh in relatively recent memory, copyright holders are still doggedly pursuing their aims. Site-blocking is rampant, advertisers are being pressured into compliance, and ISPs like Cox Communications now find themselves responsible for the infringements of their users. But has any of this caused any fatal damage to the sharing landscape? Not really.

Instead, we’re seeing a rise in the use of streaming sites, each far more accessible to the newcomer than their predecessors and vastly more difficult for copyright holders to police.

Systems built into Kodi are transforming these platforms into a plug-and-play piracy playground, one in which sites skirt US law and users can consume both at will and in complete privacy. Meanwhile, commercial and unauthorized IPTV offerings are gathering momentum, even as rightsholders try to pull them back.

Faced with problems like these we are now seeing calls for even tougher legislation. While groups like the RIAA dream of filtering the Internet, over in the UK a 2017 consultation had copyright holders excited that end users could be criminalized for simply consuming infringing content, let alone distributing it.

While the introduction of both or either of these measures would cause uproar (and rightly so), history tells us that each would fail in its stated aim of stopping piracy. With that eventuality all but guaranteed, calls for even tougher legislation are being readied for later down the line.

In short, there is no law that can stop piracy and therefore no law that will stop the entertainment industries coming back for harsher measures, pursuing the dream. This much we’ve established from close to two decades of litigation and little to no progress.

But really, is anyone genuinely surprised that they’re still taking this route? Draconian efforts to maintain control over the distribution of content predate the file-sharing wars by a couple of hundred years, at the very least. Why would rightsholders stop now, when the prize is even more valuable?

No one wants a minefield of copyright law. No one wants a restricted Internet. No one wants extended liability for innovators, service providers, or the public. But this is what we’ll get if this problem isn’t solved soon. Something drastic needs to happen, but who will be brave enough to admit it, let alone do something about it?

During a discussion about piracy last year on the BBC, the interviewer challenged a caller who freely admitted to pirating sports content online. The caller’s response was clear:

For far too long, broadcasters and rightsholders have abused their monopoly position, charging ever-increasing amounts for popular content, even while making billions. Piracy is a natural response to that, and effectively a chance for the little guy to get back some control, he argued.

Exactly the same happened in the music market during the late 1990s and 2000s. In response to artificial restriction of the market and the unrealistic hiking of prices, people turned to peer-to-peer networks for their fix. Thanks to this pressure but after years of turmoil, services like Spotify emerged, converting millions of former pirates in the process. Netflix, it appears, is attempting to do the same thing with video.

When people feel that they aren’t getting ripped off and that they have no further use for sub-standard piracy services in the face of stunning legal alternatives, things will change. But be under no illusion, people won’t be bullied there.

If we end up with an Internet stifled in favor of rightsholders, one in which service providers are too scared to innovate, the next generation of consumers will never forget. This will be a major problem for two key reasons. Not only will consumers become enemies but piracy will still exist. We will have come full circle, fueled only by division and hatred.

It’s a natural response to reject monopolistic behavior and it’s a natural response, for most, to be fair when treated with fairness. Destroying freedom is far from fair and will not create a better future – for anyone.

Laws have their place, and no sane person will argue against that, but when the entertainment industries are making billions yet still want more, they’ll have to decide whether this goes on forever with resentment building, or whether accepting a bit less profit now makes more sense in the longer term.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Google Blocks Pirate Search Results Prophylactically

Post Syndicated from Ernesto original https://torrentfreak.com/google-blocks-pirated-search-results-prophylactically-180103/

On an average day, Google processes more than three million takedown notices from copyright holders, and that’s for its search engine alone.

Under the current DMCA legislation, US-based Internet service providers are expected to remove infringing links, if a copyright holder complains.

This process shields these services from direct liability. In recent years there has been a lot of discussion about the effectiveness of the system, but Google has always maintained that it works well.

This was also highlighted by Google’s copyright counsel Caleb Donaldson, in an article he wrote for the American Bar Association’s publication Landslide.

“The DMCA provided Google and other online service providers the legal certainty they needed to grow,” Donaldson writes.

“And the DMCA’s takedown notices help us fight piracy in other ways as well. Indeed, the Web Search notice-and-takedown process provides the cornerstone of Google’s fight against piracy.”

The search engine does indeed go beyond ‘just’ removing links. The takedown notices are also used as a signal to demote domains. Websites for which it receives a lot of takedown notices will be placed lower in search results, for example.

These measures can be expanded and complemented by artificial intelligence in the future, Google’s copyright counsel envisions.

“As we move into a world where artificial intelligence can learn from vast troves of data like these, we will only get better at using the information to better fight against piracy,” Donaldson writes.

Artificial intelligence (AI) is a buzz-term that has a pretty broad meaning nowadays. Donaldson doesn’t go into detail on how AI can fight piracy. It could help to spot erroneous notices, on the one hand, but can also be applied to filter content proactively.

The latter is something Google is slowly opening up to.

Over the past year, we’ve noticed on a few occasions that Google is processing takedown notices for non-indexed links. While we assumed that this was an ‘error’ on the sender’s part, it appears to be a new policy.

“Google has critically expanded notice and takedown in another important way: We accept notices for URLs that are not even in our index in the first place. That way, we can collect information even about pages and domains we have not yet crawled,” Donaldson writes.

In other words, Google blocks URLs before they appear in the search results, as some sort of piracy vaccine.

“We process these URLs as we do the others. Once one of these not-in-index URLs is approved for takedown, we prophylactically block it from appearing in our Search results, and we take all the additional deterrent measures listed above.”

Some submitters are heavily relying on the new feature, Google found. In some cases, the majority of the submitted URLs in a notice are not indexed yet.

The search engine will keep a close eye on these developments. At TorrentFreak, we also found that copyright holders sometimes target links that don’t even exist. Whether Google will also accept these takedown requests in the future is unknown.

It’s clear that artificial intelligence and proactive filtering are becoming more and more common, but Google says that the company will also keep an eye on possible abuse of the system.

“Google will push back if we suspect a notice is mistaken, fraudulent, or abusive, or if we think fair use or another defense excuses that particular use of copyrighted content,” Donaldson notes.

Artificial intelligence and prophylactic blocking surely add a new dimension to the standard DMCA takedown procedure, but whether that will be enough to convince copyright holders that the system works remains to be seen.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Our ‘Kodi Box’ Is Legal & Our Users Don’t Break the Law, TickBox Tells Hollywood

Post Syndicated from Andy original https://torrentfreak.com/our-kodi-box-is-legal-our-users-dont-break-the-law-tickbox-tells-hollywood-171229/

Georgia-based TickBox TV is a provider of set-top boxes that allow users to stream all kinds of popular content. Like other similar devices, TickBoxes use the popular Kodi media player alongside instructions on how to find and use third-party addons.

Of course, these types of add-ons are considered a thorn in the side of the entertainment industries and as a result, Tickbox found itself on the receiving end of a lawsuit in the United States.

Filed in a California federal court in October, Universal, Columbia Pictures, Disney, 20th Century Fox, Paramount Pictures, Warner Bros, Amazon, and Netflix accused Tickbox of inducing and contributing to copyright infringement.

“TickBox sells ‘TickBox TV,’ a computer hardware device that TickBox urges its customers to use as a tool for the mass infringement of Plaintiffs’ copyrighted motion pictures and television shows,” the complaint reads.

“TickBox promotes the use of TickBox TV for overwhelmingly, if not exclusively, infringing purposes, and that is how its customers use TickBox TV. TickBox advertises TickBox TV as a substitute for authorized and legitimate distribution channels such as cable television or video-on-demand services like Amazon Prime and Netflix.”

The copyright holders reference a TickBox TV video which informs customers how to install ‘themes’, more commonly known as ‘builds’. These ‘builds’ are custom Kodi-setups which contain many popular add-ons that specialize in supplying pirate content. Is that illegal? TickBox TV believes not.

In a response filed yesterday, TickBox underlined its position that its device is not sold with any unauthorized or illegal content, and argued that it is not its fault if users choose to download and install third-party programs through which they can search for and view unauthorized content. It goes on to attack the lawsuit on several fronts.

TickBox argues that the plaintiffs’ claims, namely that TickBox can be held secondarily liable under the theory of contributory infringement or inducement liability as described in the famous Grokster and isoHunt cases, are unlikely to succeed. TickBox says the studios need to show four elements – distribution of a device or product, acts of infringement by users of TickBox, an object of promoting its use to infringe copyright, and causation.

“Plaintiffs have failed to establish any of these four elements,” TickBox’s lawyers write.

Firstly, TickBox says that while its device can be programmed to infringe, it’s the third-party software (the builds/themes containing addons) that does all the dirty work, and TickBox has nothing to do with them.

“The Motion spends a great deal of time describing these third-party ‘Themes’ and how they operate to search for and stream videos. But the ‘Themes’ on which Plaintiffs so heavily focus are not the [TickBox], and they have absolutely nothing to do with Defendant. Rather, they are third-party modifications of the open-source media player software [Kodi] which the Box utilizes,” the response reads.

TickBox says its device is merely a small computer, not unlike a smartphone or tablet. Indeed, when it comes to running the ‘pirate’ builds listed in the lawsuit, a device supplied by one of the plaintiffs can accomplish the same task.

“Plaintiffs have identified certain of these third-party ‘builds’ or ‘Themes’ which are available on the internet and which can be downloaded by users to view content streamed by third-party websites; however, this same software can be installed on many different types of devices, even one distributed by affiliates of Plaintiff Amazon Content Services, LLC,” the company adds.

Referencing the Grokster case, TickBox states that particular company was held liable for distributing a device (the Grokster software) “with the object of promoting its use to infringe copyright.” In the isoHunt case, it argues that the provision of torrent files satisfied the first element of inducement liability.

“In contrast, Defendant’s product – the Box – is not software through which users can access unauthorized content, as in Grokster, or even a necessary component of accessing unauthorized content, as in Fung [isoHunt],” TickBox writes.

“Defendant offers a computer, onto which users can voluntarily install legitimate or illegitimate software. The product about which Plaintiffs complain is third-party software which can be downloaded onto a myriad of devices, and which Defendant neither created nor supplies.”

From defending itself, TickBox switches track to highlight weaknesses in the studios’ case against users of its TickBox device. The company states that the plaintiffs have not presented any evidence that buyers of the TickBox streaming unit have actually accessed any copyrighted material.

Interestingly, however, the company also notes that even if people had streamed ‘pirate’ content, that might not constitute infringement.

First up, the company notes that there are no allegations that anyone – from TickBox itself to TickBox device owners – ever violated the plaintiffs’ exclusive right to perform its copyrighted works.

TickBox then further argues that copyright law does not impose liability for viewing streaming content, stating that an infringer is one who violates any of the exclusive rights of the copyright holder, in this case, the right to “perform the copyrighted work publicly.”

“Plaintiffs do not allege that Defendant, Defendant’s product, or the users of Defendant’s product ‘transmit or otherwise communicate a performance’ to the public; instead, Plaintiffs allege that users view streaming material on the Box.

“It is clear precedent [Perfect 10 v Google] in this Circuit that merely viewing copyrighted material online, without downloading, copying, or retransmitting such material, is not actionable.”

Taking this argument to its logical conclusion, TickBox insists that if its users aren’t infringing copyright, it’s impossible to argue that TickBox induced its customers to violate the plaintiffs’ rights. In that respect, plaintiffs’ complaints that TickBox failed to develop “filtering tools” to diminish its customers’ infringing activity are moot, since in TickBox’s eyes no infringement took place.

TickBox also argues that unlike in Grokster, where the defendant profited when users’ accessed infringing content, it does not. And, just to underline the earlier point, it claims that its place in the market is not to compete with entertainment companies, it’s actually to compete with devices such as Amazon’s Firestick – another similar Android-powered device.

Finally, TickBox notes that it has zero connection with any third-party sites that transmit copyrighted works in violation of the plaintiffs’ rights.

“Plaintiff has not alleged any element of contributory infringement vis-à-vis these unknown third-parties. Plaintiff has not alleged that Defendant has distributed any product to those third parties, that Defendant has committed any act which encourages those third parties’ infringement, or that any act of Defendant has, in fact, caused those third parties to infringe,” its response adds.

But even given the above defenses, TickBox says that it “voluntarily took steps” to remove links to the allegedly infringing Kodi builds from its device, following the plaintiffs’ lawsuit. It also claims to have modified its advertising and webpage “to attempt to appease Plaintiffs and resolve their complaint amicably.”

Given the above, TickBox says that the plaintiffs’ application for an injunction is both vague and overly broad and would impose “impermissible hardship” on the company by effectively shutting it down while requiring it to “hack into and delete content” which TickBox users may have downloaded to their boxes.

TickBox raises some very interesting points around apparent weaknesses in the case against it, so it will be intriguing to see how the Court handles its claims and what effect that has on the market for these devices in the US. In particular, how such devices are advertised and promoted is nearly always the final stumbling block.

A copy of Tickbox’s response is available here (pdf), via Variety

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Might Google Class “Torrent” a Dirty Word? France is About to Find Out

Post Syndicated from Andy original https://torrentfreak.com/might-google-class-torrent-a-dirty-word-france-is-about-to-find-out-171223/

Like most countries, France is struggling to find ways to stop online piracy running rampant. A number of options have been tested thus far, with varying results.

One of the more interesting cases has been running since 2015, when music industry group SNEP took Google and Microsoft to court demanding automated filtering of ‘pirate’ search results featuring three local artists.

Before the High Court of Paris, SNEP argued that searches for the artists’ names plus the word “torrent” returned mainly infringing results on Google and Bing. Filtering out results with both sets of terms would reduce the impact of people finding pirate content through search, they said.

While SNEP claimed that its request was in line with Article L336-2 of France’s intellectual property code, which allows for “all appropriate measures” to prevent infringement, both Google and Microsoft fought back, arguing that such filtering would be disproportionate and could restrict freedom of expression.

The Court eventually sided with the search engines, noting that torrent is a common noun that refers to a neutral communication protocol.

“The requested measures are thus tantamount to general monitoring and may block access to lawful websites,” the High Court said.

Despite being told that its demands were too broad, SNEP decided to appeal. The case was heard in November where concerns were expressed over potential false positives.

Since SNEP even wants sites with “torrent” in their URL filtered out via “fully automated procedures that do not require human intervention”, this very site – TorrentFreak.com – could be sucked in. To counter that eventuality, SNEP proposed some kind of whitelist, NextInpact reports.

With no real consensus on how to move forward, the parties were advised to enter discussions on how to get closer to the aim of reducing piracy but without causing collateral damage. Last week the parties agreed to enter negotiations so the details will now have to be hammered out between their respective law firms. Failing that, they will face a ruling from the court.

If this last scenario plays out, the situation appears to favor the search engines, who have a High Court ruling in their favor and already offer comprehensive takedown tools for copyright holders to combat the exploitation of their content online.

Meanwhile, other elements of the French recording industry have booked a notable success against several pirate sites.

SCPP, which represents Warner, Universal, Sony and thousands of others, went to court in February this year demanding that local ISPs Bouygues, Free, Orange, SFR and Numéricable prevent their subscribers from accessing ExtraTorrent, isoHunt, Torrent9 and Cpasbien.

Like SNEP in the filtering case, SCPP also cited Article L336-2 of France’s intellectual property code, demanding that the sites plus their variants, mirrors and proxies should be blocked by the ISPs so that their subscribers can no longer gain access.

This week the Paris Court of First Instance sided with the industry group, ordering the ISPs to block the sites. The service providers were also told to pick up the bill for costs.

These latest cases are yet more examples of France’s determination to crack down on piracy.

In early December it was revealed that, since its inception, the Hadopi anti-piracy agency has sent nine million piracy warnings to citizens. Since the launch of its graduated response regime in 2010, more than 2,000 cases have been referred to prosecutors, resulting in 189 criminal convictions.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Google & Facebook Excluded From Aussie Safe Harbor Copyright Amendments

Post Syndicated from Andy original https://torrentfreak.com/google-facebook-excluded-from-aussie-safe-harbor-copyright-amendments-171205/

Due to a supposed drafting error in Australia’s implementation of the Australia – US Free Trade Agreement (AUSFTA), copyright safe harbor provisions currently only apply to commercial Internet service providers.

This means that while local ISPs such as Telstra receive protection from copyright infringement complaints, services such as Google, Facebook and YouTube face legal uncertainty.

Proposed amendments to the Copyright Act earlier this year would’ve seen enhanced safe harbor protections for such platforms but they were withdrawn at the eleventh hour so that the government could consider “further feedback” from interested parties.

Shortly after, the government embarked on a detailed consultation with entertainment industry groups. They accuse platforms like YouTube of exploiting safe harbor provisions in the US and Europe, which forces copyright holders into an expensive battle to have infringing content taken down. They do not want that in Australia and, at least for now, they appear to have achieved their aims.

According to a report from AFR (paywall), the Australian government is set to introduce new legislation Wednesday which will expand safe harbors for some organizations but will exclude companies such as Google, Facebook, and similar platforms.

Communications Minister Mitch Fifield confirmed the exclusions while noting that additional safeguards will be available to institutions, libraries, and organizations in the disability, archive and culture sectors.

“The measures in the bill will ensure these sectors are protected from legal liability where they can demonstrate that they have taken reasonable steps to deal with copyright infringement by users of their online platforms,” Senator Fifield told AFR.

“Extending the safe harbor scheme in this way will provide greater certainty to institutions in these sectors and enhance their ability to provide more innovative and creative services for all Australians.”

According to the Senator, the government will continue its work with stakeholders to further reform safe harbor provisions, before applying them to other service providers.

The news that Google, Facebook, and similar platforms are to be denied access to the new safe harbor rules will be seen as a victory for rightsholders. They’re desperately trying to tighten up legislation in other regions where such safeguards are already in place, arguing that platforms utilizing user-generated content for profit should obtain appropriate licensing first.

This so-called ‘Value Gap’ (1,2,3) and associated proactive filtering proposals are among the hottest copyright topics right now, generating intense debate across Europe and the United States.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons