Tag Archives: 360

Abandon Proactive Copyright Filters, Huge Coalition Tells EU Heavyweights

Post Syndicated from Andy original https://torrentfreak.com/abandon-proactive-copyright-filters-huge-coalition-tells-eu-heavyweights-171017/

Last September, EU Commission President Jean-Claude Juncker announced plans to modernize copyright law in Europe.

The proposals (pdf) are part of the Digital Single Market reforms, which have been under development for the past several years.

One of the proposals is causing significant concern. Article 13 would require some online service providers to become ‘Internet police’, proactively detecting and filtering allegedly infringing copyright works uploaded to their platforms by users.

Currently, users are generally able to share whatever they like, but should a copyright holder take exception to their upload, mechanisms are available for that content to be taken down. It’s envisioned that proactive filtering, whereby user uploads are routinely scanned and compared to a database of existing protected content, will prevent infringing content from becoming available in the first place.

These proposals are of great concern to digital rights groups, who believe that such filters will not only undermine users’ rights but will also place unfair burdens on Internet platforms, many of which will struggle to fund such a program. Yesterday, in the latest wave of opposition to Article 13, a huge coalition of international rights groups came together to underline their concerns.

Headed up by Civil Liberties Union for Europe (Liberties) and European Digital Rights (EDRi), the coalition is formed of dozens of influential groups, including Electronic Frontier Foundation (EFF), Human Rights Watch, Reporters without Borders, and Open Rights Group (ORG), to name just a few.

In an open letter to European Commission President Jean-Claude Juncker, President of the European Parliament Antonio Tajani, President of the European Council Donald Tusk and a string of others, the groups warn that the proposals undermine the trust established between EU member states.

“Fundamental rights, justice and the rule of law are intrinsically linked and constitute core values on which the EU is founded,” the letter begins.

“Any attempt to disregard these values undermines the mutual trust between member states required for the EU to function. Any such attempt would also undermine the commitments made by the European Union and national governments to their citizens.”

Those citizens, the letter warns, would have their basic rights undermined, should the new proposals be written into EU law.

“Article 13 of the proposal on Copyright in the Digital Single Market includes obligations on internet companies that would be impossible to respect without the imposition of excessive restrictions on citizens’ fundamental rights,” it notes.

A major concern is that by placing new obligations on Internet service providers that allow users to upload content – think YouTube, Facebook, Twitter and Instagram – they will be forced to err on the side of caution. Should there be any concern whatsoever that content might be infringing, fair use considerations and exceptions will be abandoned in favor of staying on the right side of the law.

“Article 13 appears to provoke such legal uncertainty that online services will have no other option than to monitor, filter and block EU citizens’ communications if they are to have any chance of staying in business,” the letter warns.

But while the potential problems for service providers and users are numerous, the groups warn that Article 13 could also be illegal since it contradicts case law of the Court of Justice.

According to the E-Commerce Directive, platforms are already required to remove infringing content once they have been advised it exists. The new proposal, should it go ahead, would force the monitoring of uploads, something which goes against the ‘no general obligation to monitor’ rules present in the Directive.

“The requirement to install a system for filtering electronic communications has twice been rejected by the Court of Justice, in the cases Scarlet Extended (C70/10) and Netlog/Sabam (C 360/10),” the rights groups warn.

“Therefore, a legislative provision that requires internet companies to install a filtering system would almost certainly be rejected by the Court of Justice because it would contravene the requirement that a fair balance be struck between the right to intellectual property on the one hand, and the freedom to conduct business and the right to freedom of expression, such as to receive or impart information, on the other.”

Specifically, the groups note that the proactive filtering of content would violate freedom of expression set out in Article 11 of the Charter of Fundamental Rights. That being the case, the groups expect national courts to disapply it and the rule to be annulled by the Court of Justice.

The latest protests against Article 13 come in the wake of large-scale objections earlier in the year, voicing similar concerns. However, despite the groups’ fears, they have powerful adversaries, each determined to stop the flood of copyrighted content currently being uploaded to the Internet.

Front and center in support of Article 13 is the music industry and its current hot topic, the so-called Value Gap (1,2,3). The industry feels that platforms like YouTube are able to avoid paying expensive licensing fees (for music in particular) by exploiting the safe harbor protections of the DMCA and similar legislation.

They believe that proactively filtering uploads would significantly help to diminish this problem, which may very well be the case. But at what cost to the general public and the platforms they rely upon? Citizens and scholars feel that freedoms will be affected and it’s likely the outcry will continue.

The ball is now with the EU, whose members will soon have to make what could be the most important decision in recent copyright history. The rights groups, who are urging that Article 13 be deleted, are clear where they stand.

The full letter is available here (pdf)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

EU Piracy Report Suppression Raises Questions Over Transparency

Post Syndicated from Andy original https://torrentfreak.com/eu-piracy-report-suppression-raises-questions-transparency-170922/

Over the years, copyright holders have made hundreds of statements against piracy, mainly that it risks bringing industries to their knees through widespread and uncontrolled downloading from the Internet.

But while TV shows like Game of Thrones have been downloaded millions of times, the big question (one could argue the only really important question) is whether this activity actually affects sales. After all, if piracy has a massive negative effect on industry, something needs to be done. If it does not, why all the panic?

Quite clearly, the EU Commission wanted to find out the answer to this potential multi-billion dollar question when it made the decision to invest a staggering 360,000 euros in a dedicated study back in January 2014.

With a final title of ‘Estimating displacement rates of copyrighted content in the EU’, the completed study is an intimidating 307 pages deep. Shockingly, until this week, few people even knew it existed because, for reasons unknown, the EU Commission decided not to release it.

However, thanks to the sheer persistence of Member of the European Parliament Julia Reda, the public now has a copy and it contains quite a few interesting conclusions. But first, some background.

The study uses data from 2014 and covers four broad types of content: music, audio-visual material, books and videogames. Unlike other reports, the study also considered live music attendances and cinema visits in the key regions of Germany, UK, Spain, France, Poland and Sweden.

On average, 51% of adults and 72% of minors in the EU were found to have illegally downloaded or streamed any form of creative content, with Poland and Spain coming out as the worst offenders. However, here’s the kicker.

“In general, the results do not show robust statistical evidence of displacement of sales by online copyright infringements,” the study notes.

“That does not necessarily mean that piracy has no effect but only that the statistical analysis does not prove with sufficient reliability that there is an effect.”

For a study commissioned by the EU with huge sums of public money, this is a potentially damaging conclusion, not least for the countless industry bodies that lobby day in, day out, for tougher copyright law based on the “fact” that piracy is damaging to sales.

That being said, the study did find that certain sectors can be affected by piracy, notably recent top movies.

“The results show a displacement rate of 40 per cent which means that for every ten recent top films watched illegally, four fewer films are consumed legally,” the study notes.

“People do not watch many recent top films a second time but if it happens, displacement is lower: two legal consumptions are displaced by every ten illegal second views. This suggests that the displacement rate for older films is lower than the 40 per cent for recent top films. All in all, the estimated loss for recent top films is 5 per cent of current sales volumes.”

But while there is some negative effect on the movie industry, others can benefit. The study found that piracy had a slightly positive effect on the videogames industry, suggesting that those who play pirate games eventually become buyers of official content.

On top of displacement rates, the study also looked at the public’s willingness to pay for content, to assess whether price influences pirate consumption. Interestingly, the industry that had the most displaced sales – the movie industry – had the greatest number of people unhappy with its pricing model.

“Overall, the analysis indicates that for films and TV-series current prices are higher than 80 per cent of the illegal downloaders and streamers are willing to pay,” the study notes.

For other industries, where sales were not found to have been displaced or were positively affected by piracy, consumer satisfaction with pricing was greatest.

“For books, music and games, prices are at a level broadly corresponding to the willingness to pay of illegal downloaders and streamers. This suggests that a decrease in the price level would not change piracy rates for books, music and games but that prices can have an effect on displacement rates for films and TV-series,” the study concludes.

So, it appears that products that are priced fairly do not suffer significant displacement from piracy. Those that are priced too high, on the other hand, can expect to lose some sales.

Now that it’s been released, the findings of the study should help to paint a more comprehensive picture of the infringement climate in the EU, while laying to rest some of the wild claims of the copyright lobby. That being said, it shouldn’t have taken the toils of Julia Reda to bring them to light.

“This study may have remained buried in a drawer for several more years to come if it weren’t for an access to documents request I filed under the European Union’s Freedom of Information law on July 27, 2017, after having become aware of the public tender for this study dating back to 2013,” Reda explains.

“I would like to invite the Commission to become a provider of more solid and timely evidence to the copyright debate. Such data that is valuable both financially and in terms of its applicability should be available to everyone when it is financed by the European Union – it should not be gathering dust on a shelf until someone actively requests it.”

The full study can be downloaded here (pdf)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

No, Google Drive is Definitely Not The New Pirate Bay

Post Syndicated from Andy original https://torrentfreak.com/no-google-drive-is-definitely-not-the-new-pirate-bay-170910/

Now running close to two decades old, the world of true mainstream file-sharing is less of a mystery to the general public than it has ever been.

Most people now understand the concept of shifting files from one place to another, and a significant majority will be aware of the opportunities to do so with infringing content.

Unsurprisingly, this is a major thorn in the side of rightsholders all over the world, who have been scrambling since the turn of the century in a considerable effort to stem the tide. The results of their work have varied, with some sectors hit harder than others.

One area that has taken a bit of a battering recently involves the dominant peer-to-peer platforms reliant on underlying BitTorrent transfers. Several large-scale sites have shut down recently, not least KickassTorrents, Torrentz, and ExtraTorrent, raising questions of what bad news may arrive next for inhabitants of Torrent Land.

Of course, like any other Internet-related activity, sharing has continued to evolve over the years, with streaming and cloud-hosting now a major hit with consumers. In the main, sites which skirt the borders of legality have been the major hosting and streaming players over the years, but more recently it’s become clear that even the most legitimate companies can become unwittingly involved in the piracy scene.

As reported here on TF back in 2014 and again several times this year (1,2,3), cloud-hosting services operated by Google, including Google Drive, are being used to store and distribute pirate content.

That news was echoed again this week, with a report on Gadgets360 reiterating that Google Drive is still being used for movie piracy. What followed was a string of follow-up reports, some of which declared Google’s service to be ‘The New Pirate Bay.’

No. Just no.

While it’s always tempting for publications to squeeze a reference to The Pirate Bay into a piracy article due to the site’s popularity, it’s particularly out of place in this comparison. In no way, shape, or form can a centralized store of data like Google Drive ever replace the underlying technology of sites like The Pirate Bay.

While the casual pirate might love the idea of streaming a movie with a couple of clicks to a browser of his or her choice, the weakness of the cloud system cannot be overstated. To begin with, anything hosted by Google is vulnerable to immediate takedown on demand, usually within a matter of hours.

“Google Drive has a variety of piracy counter-measures in place,” a spokesperson told Mashable this week, “and we are continuously working to improve our protections to prevent piracy across all of our products.”

When will we ever hear anything like that from The Pirate Bay? Answer: When hell freezes over. But it’s not just compliance with takedown requests that make Google Drive-hosted files vulnerable.

At the point Google Drive responds to a takedown request, it takes down the actual file. On the other hand, even if The Pirate Bay responded to notices (which it doesn’t), it would be unable to do anything about the sharing going on underneath. Removing a torrent file or magnet link from TPB does nothing to negatively affect the decentralized swarm of people sharing files among themselves. Those files stay intact and sharing continues, no matter what happens to the links above.

Importantly, people sharing using BitTorrent do so without any need for central servers – the whole process is decentralized as long as a user can lay his or her hands on a torrent file or magnet link. Those using Google Drive, however, rely on a totally centralized system, where not only is Google king, but it can and will stop the entire party after receiving a few lines of text from a rightsholder.

There is a very good reason why sites like The Pirate Bay have been around for close to 15 years while platforms such as Megaupload, Hotfile, and Rapidshare have all met their makers. File-hosting platforms are expensive-to-run warehouses full of files, each of which brings direct liability for its host once the host is made aware that those files are infringing. These days the choice is clear – take the files down or get brought down, it’s as simple as that.

The Pirate Bay, on the other hand, is nothing more than a treasure map (albeit a valuable one) that points the way to content spread all around the globe in the most decentralized way possible. There are no files to delete, no content to disappear. Comparing a vulnerable Google Drive to this kind of robust system couldn’t be further from the mark.

That being said, this is the way things are going. The cloud, it seems, is here to stay in all its forms. Everyone has access to it and uploading content is easier – much easier – than uploading it to a BitTorrent network. A Google Drive upload is simplicity itself for anyone with a mouse and a file; the same cannot be said about The Pirate Bay.

For this reason alone, platforms like Google Drive and the many dozens of others offering a similar service will continue to become havens for pirated content, until the next big round of legislative change. At the moment, each piece of content has to be removed individually but in the future, it’s possible that pre-emptive filters will kill uploads of pirated content before they see the light of day.

When this comes to pass, millions of people will understand why Google Drive, with its bots checking every file upload for alleged infringement, is not The Pirate Bay. At this point, if people have left it too long, it might be too late to reinvigorate BitTorrent networks to their former glory.

People will try to rebuild them, of course, but realizing why they shouldn’t have been left behind at all is probably the best protection.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

From Data Lake to Data Warehouse: Enhancing Customer 360 with Amazon Redshift Spectrum

Post Syndicated from Dylan Tong original https://aws.amazon.com/blogs/big-data/from-data-lake-to-data-warehouse-enhancing-customer-360-with-amazon-redshift-spectrum/

Achieving a 360° view of your customer has become increasingly challenging as companies embrace omni-channel strategies, engaging customers across websites, mobile, call centers, social media, physical sites, and beyond. The promise of a web where online and physical worlds blend makes understanding your customers more challenging, but also more important. Businesses that are successful in this medium have a significant competitive advantage.

The big data challenge requires the management of data at high velocity and volume. Many customers have identified Amazon S3 as a great data lake solution that removes the complexities of managing a highly durable, fault-tolerant data lake infrastructure economically and at scale.

AWS data services substantially lessen the heavy lifting of adopting technologies, allowing you to spend more time on what matters most—gaining a better understanding of customers to elevate your business. In this post, I show how a recent Amazon Redshift innovation, Redshift Spectrum, can enhance a customer 360 initiative.

Customer 360 solution

A successful customer 360 view benefits from using a variety of technologies to deliver different forms of insights. These could range from real-time analysis of streaming data from wearable devices and mobile interactions to historical analysis that requires interactive, on demand queries on billions of transactions. In some cases, insights can only be inferred through AI via deep learning. Finally, the value of your customer data and insights can’t be fully realized until it is operationalized at scale—readily accessible by fleets of applications. Companies are leveraging AWS for the breadth of services that cover these domains, to drive their data strategy.

A number of AWS customers stream data from various sources into an S3 data lake through Amazon Kinesis. They use Kinesis and technologies in the Hadoop ecosystem like Spark running on Amazon EMR to enrich this data. High-value data is loaded into an Amazon Redshift data warehouse, which allows users to analyze and interact with data through a choice of client tools. Redshift Spectrum expands on this analytics platform by enabling Amazon Redshift to blend and analyze data beyond the data warehouse and across a data lake.

The following diagram illustrates the workflow for such a solution.

This solution delivers value by:

  • Reducing complexity and time to value for deeper insights. For instance, an existing data model in Amazon Redshift may provide insights across dimensions such as customer, geography, time, and product on metrics from sales and financial systems. Down the road, you may gain access to streaming data sources like customer-care call logs and website activity that you want to blend in with the sales data on the same dimensions to understand how web and call center experiences may be correlated with sales performance. Redshift Spectrum can join these dimensions in Amazon Redshift with data in S3 to allow you to quickly gain new insights, and avoid the slow and more expensive alternative of fully integrating these sources with your data warehouse.
  • Providing an additional avenue for optimizing costs and performance. In cases like call logs and clickstream data where volumes could be many TBs to PBs, storing the data exclusively in S3 yields significant cost savings. Interactive analysis on massive datasets may now be economically viable in cases where data was previously analyzed periodically through static reports generated by inexpensive batch processes. In some cases, you can improve the user experience while simultaneously lowering costs. Spectrum is powered by a large-scale infrastructure external to your Amazon Redshift cluster, and excels at scanning and aggregating large volumes of data. For instance, your analysts may be performing data discovery on customer interactions across millions of consumers over years of data across various channels. On this large dataset, certain queries could be slow if you didn’t have a large Amazon Redshift cluster. Alternatively, you could use Redshift Spectrum to achieve a better user experience with a smaller cluster.

Proof of concept walkthrough

To make evaluation easier for you, I’ve conducted a Redshift Spectrum proof-of-concept (PoC) for the customer 360 use case. For those who want to replicate the PoC, the instructions, AWS CloudFormation templates, and public data sets are available in the GitHub repository.

The remainder of this post is a journey through the project, observing best practices in action, and learning how you can achieve business value. The walkthrough involves:

  • An analysis of performance data from the PoC environment involving queries that demonstrate blending and analysis of data across Amazon Redshift and S3. Observe that great results are achievable at scale.
  • Guidance by example on query tuning, design, and data preparation to illustrate the optimization process. This includes tuning a query that combines clickstream data in S3 with customer and time dimensions in Amazon Redshift, and aggregates ~1.9 B out of 3.7 B+ records in under 10 seconds with a small cluster!
  • Guidance and measurements to help you decide between two options: accessing and analyzing data exclusively in Amazon Redshift, or using Redshift Spectrum to access data left in S3.

Stream ingestion and enrichment

The focus of this post isn’t stream ingestion and enrichment on Kinesis and EMR, but be mindful of performance best practices on S3 to ensure good streaming and query performance:

  • Use random object keys: The data files provided for this project are prefixed with SHA-256 hashes to prevent hot partitions. This is important to ensure optimal request rates to support PUT requests from the incoming stream, as well as certain queries from large Amazon Redshift clusters that can send a large number of parallel GET requests.
  • Micro-batch your data stream: S3 isn’t optimized for small random write workloads. Your datasets should be micro-batched into large files. For instance, the “parquet-1” dataset provided batches >7 million records per file. The optimal file size for Redshift Spectrum is usually in the 100 MB to 1 GB range.

If you have an edge case that may pose scalability challenges, AWS would love to hear about it. For further guidance, talk to your solutions architect.

Environment

The project consists of the following environment:

  • Amazon Redshift cluster: 4 X dc1.large
  • Data:
    • Time and customer dimension tables are stored on all Amazon Redshift nodes (ALL distribution style):
      • The data originates from the DWDATE and CUSTOMER tables in the Star Schema Benchmark
      • The customer table contains attributes for 3 million customers.
      • The time data is at the day-level granularity, and spans 7 years, from the start of 1992 to the end of 1998.
    • The clickstream data is stored in an S3 bucket, and serves as a fact table.
      • Various copies of this dataset in CSV and Parquet format have been provided, for reasons to be discussed later.
      • The data is a modified version of the uservisits dataset from AMPLab’s Big Data Benchmark, which was generated by Intel’s Hadoop benchmark tools.
      • Changes were minimal, so that existing test harnesses for this test can be adapted:
        • Increased the 751,754,869-row dataset 5X to 3,758,774,345 rows.
        • Added surrogate keys to support joins with customer and time dimensions. These keys were distributed evenly across the entire dataset to represent user visits from six customers over seven years.
        • Values for the visitDate column were replaced to align with the 7-year timeframe, and the added time surrogate key.

Queries across the data lake and data warehouse 

Imagine a scenario where a business analyst plans to analyze clickstream metrics like ad revenue over time and by customer, market segment and more. The example below is a query that achieves this effect: 
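A minimal sketch of such a query is shown below, assuming the table and column names used in the tuned versions later in this walkthrough (uservisits_csv10, custKey, yearMonthKey, adRevenue, and the SSB customer and dwdate dimension tables); the month-label derivation and the specific three-month window are assumptions:

-- Sketch only: join the S3-resident clickstream fact table with the customer
-- and time dimensions in Amazon Redshift, then total ad revenue for three
-- customers over the last three months of the 1992-1998 range.
SELECT c.c_name,
       c.c_mktsegment,
       t.prettyMonthYear,
       SUM(uv.adRevenue) AS totalRevenue
FROM clickstream.uservisits_csv10 AS uv            -- external (S3) fact table
JOIN customer AS c
  ON c.c_custkey = uv.custKey                      -- customer dimension in Amazon Redshift
JOIN (SELECT DISTINCT d_yearmonthnum,
             d_yearmonth AS prettyMonthYear        -- assumed month label
      FROM dwdate
      WHERE d_yearmonthnum >= 199810) AS t         -- assumed last three months (Oct-Dec 1998)
  ON uv.yearMonthKey = t.d_yearmonthnum            -- time dimension in Amazon Redshift
WHERE c.c_custkey <= 3                             -- three of the six customers
GROUP BY c.c_name, c.c_mktsegment, t.prettyMonthYear, uv.yearMonthKey
ORDER BY c.c_name, c.c_mktsegment, uv.yearMonthKey ASC;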

The query retrieves clickstream data in S3 and joins it with the time and customer dimension tables in Amazon Redshift. It returns the total ad revenue for three customers over the last three months, along with info on their respective market segments.

Unfortunately, this query takes around three minutes to run, and doesn’t enable the interactive experience that you want. However, there are a number of performance optimizations that you can implement to achieve the desired performance.

Performance analysis

Two key utilities provide visibility into Redshift Spectrum:

  • EXPLAIN
    Provides the query execution plan, which includes info around what processing is pushed down to Redshift Spectrum. Steps in the plan that include the prefix S3 are executed on Redshift Spectrum. For instance, the plan for the previous query has the step “S3 Seq Scan clickstream.uservisits_csv10”, indicating that Redshift Spectrum performs a scan on S3 as part of the query execution.
  • SVL_S3QUERY_SUMMARY
    Statistics for Redshift Spectrum queries are stored in this table. While the execution plan presents cost estimates, this table stores actual statistics for past query runs.

You can get the statistics of your last query by inspecting the SVL_S3QUERY_SUMMARY table with the condition (query = pg_last_query_id()). Inspecting the previous query reveals that the entire dataset of nearly 3.8 billion rows was scanned to retrieve less than 66.3 million rows. Improving scan selectivity in your query could yield substantial performance improvements.
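For example (the scan and return column names here are assumptions based on the metrics discussed above; adjust them to the columns available in your cluster):

-- Inspect Redshift Spectrum statistics for the query that just ran.
SELECT query,
       segment,
       elapsed,
       s3_scanned_rows,
       s3_scanned_bytes,
       s3query_returned_rows,
       s3query_returned_bytes,
       files,
       avg_request_parallelism
FROM svl_s3query_summary
WHERE query = pg_last_query_id()
ORDER BY query, segment;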

Partitioning

Partitioning is a key means to improving scan efficiency. In your environment, the data and tables have already been organized, and configured to support partitions. For more information, see the PoC project setup instructions. The clickstream table was defined as:

CREATE EXTERNAL TABLE clickstream.uservisits_csv10
…
PARTITIONED BY(customer int4, visitYearMonth int4)
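For context, a fuller sketch of how such a partitioned external table might be declared follows; the external schema, IAM role ARN, column list, and S3 paths are placeholders for illustration, and the actual DDL is in the PoC setup instructions:

-- Sketch only: declare an external schema and a partitioned external table.
CREATE EXTERNAL SCHEMA clickstream
FROM DATA CATALOG
DATABASE 'clickstream'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;

CREATE EXTERNAL TABLE clickstream.uservisits_csv10 (
    custKey      INT,                -- surrogate key for the customer dimension
    yearMonthKey INT,                -- surrogate key for the time dimension
    visitDate    VARCHAR(30),
    adRevenue    DOUBLE PRECISION
)
PARTITIONED BY (customer INT4, visitYearMonth INT4)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://your-bucket/clickstream/uservisits_csv10/';

-- Each partition is registered explicitly, pointing at the S3 prefix that
-- holds the files for one customer and one year/month:
ALTER TABLE clickstream.uservisits_csv10
ADD PARTITION (customer = 1, visitYearMonth = 199201)
LOCATION 's3://your-bucket/clickstream/uservisits_csv10/customer=1/visitYearMonth=199201/';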

The entire 3.8 billion-row dataset is organized as a collection of large files where each file contains data exclusive to a particular customer and month in a year. This allows you to partition your data into logical subsets by customer and year/month. With partitions, the query engine can target a subset of files:

  • Only for specific customers
  • Only data for specific months
  • A combination of specific customers and year/months

You can use partitions in your queries. Instead of joining your customer data on the surrogate customer key (that is, c.c_custkey = uv.custKey), the partition key “customer” should be used instead:

SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, SUM(uv.adRevenue)
…
ON c.c_custkey = uv.customer
…
ORDER BY c.c_name, c.c_mktsegment, uv.yearMonthKey  ASC

This query should run approximately twice as fast as the previous query. If you look at the statistics for this query in SVL_S3QUERY_SUMMARY, you see that only half the dataset was scanned. This is expected because your query is on three out of six customers on an evenly distributed dataset. However, the scan is still inefficient, and you can benefit from using your year/month partition key as well:

SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, SUM(uv.adRevenue)
…
ON c.c_custkey = uv.customer
…
ON uv.visitYearMonth = t.d_yearmonthnum
…
ORDER BY c.c_name, c.c_mktsegment, uv.visitYearMonth ASC

All joins between the tables are now using partitions. Upon reviewing the statistics for this query, you should observe that Redshift Spectrum scans and returns the exact number of rows, 66,270,117. If you run this query a few times, you should see execution time in the range of 8 seconds, which is a 22.5X improvement on your original query!

Predicate pushdown and storage optimizations 

Previously, I mentioned that Redshift Spectrum performs processing through large-scale infrastructure external to your Amazon Redshift cluster. It is optimized for performing large scans and aggregations on S3. In fact, Redshift Spectrum may even outperform a medium-sized Amazon Redshift cluster on these types of workloads with the proper optimizations. There are two important variables to consider for optimizing large scans and aggregations:

  • File size and count. As a general rule, use files 100 MB-1 GB in size, as Redshift Spectrum and S3 are optimized for reading this object size. However, the number of files operating on a query is directly correlated with the parallelism achievable by a query. There is an inverse relationship between file size and count: the bigger the files, the fewer files there are for the same dataset. Consequently, there is a trade-off between optimizing for object read performance, and the amount of parallelism achievable on a particular query. Large files are best for large scans, as the query likely operates on a sufficiently large number of files. For queries that are more selective and that operate on fewer files, you may find that smaller files allow for more parallelism.
  • Data format. Redshift Spectrum supports various data formats. Columnar formats like Parquet can sometimes lead to substantial performance benefits by providing compression and more efficient I/O for certain workloads. Generally, format types like Parquet should be used for query workloads involving large scans, and high attribute selectivity. Again, there are trade-offs as formats like Parquet require more compute power to process than plaintext. For queries on smaller subsets of data, the I/O efficiency benefit of Parquet is diminished. At some point, Parquet may perform the same or slower than plaintext. Latency, compression rates, and the trade-off between user experience and cost should drive your decision.

To help illustrate how Redshift Spectrum performs on these large aggregation workloads, run a basic query that aggregates the entire ~3.7 billion record dataset on Redshift Spectrum, and compare that with running the query exclusively on Amazon Redshift:

SELECT uv.custKey, COUNT(uv.custKey)
FROM <your clickstream table> as uv
GROUP BY uv.custKey
ORDER BY uv.custKey ASC

For the Amazon Redshift test case, the clickstream data is loaded, and distributed evenly across all nodes (even distribution style) with optimal column compression encodings prescribed by Amazon Redshift’s ANALYZE COMPRESSION command.

The Redshift Spectrum test case uses a Parquet data format with each file containing all the data for a particular customer in a month. This results in files mostly in the range of 220-280 MB, and in effect, is the largest file size for this partitioning scheme. If you run tests with the other datasets provided, you see that this data format and size is optimal and out-performs others by ~60X. 

Performance differences will vary depending on the scenario. The important takeaway is to understand the testing strategy and the workload characteristics where Redshift Spectrum is likely to yield performance benefits. 

The following chart compares the query execution time for the two scenarios. The results indicate that you would have to pay for 12 X DC1.Large nodes to get performance comparable to using a small Amazon Redshift cluster that leverages Redshift Spectrum. 

Chart showing simple aggregation on ~3.7 billion records

So you’ve validated that Spectrum excels at performing large aggregations. Could you benefit by pushing more work down to Redshift Spectrum in your original query? It turns out that you can, by making the following modification:

The clickstream data is stored at a day-level granularity for each customer, while your query rolls the data up to the month level per customer. In the earlier query that uses the year/month partition key, you optimized the query so that it only scans and retrieves the data required, but the day-level data is still sent back to your Amazon Redshift cluster for joining and aggregation. The query shown here pushes aggregation work down to Redshift Spectrum, as indicated by the query plan:
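A sketch of the modified query, under the same naming assumptions as before and with an assumed Parquet copy of the table (uservisits_parquet1), pre-aggregates the S3 data in a derived table so that the SUM and GROUP BY run on Redshift Spectrum before the join:

-- Sketch only: aggregate the clickstream data to customer/month level inside
-- a derived table (pushed down to Redshift Spectrum), then join the small
-- pre-aggregated result with the dimensions in Amazon Redshift.
SELECT c.c_name,
       c.c_mktsegment,
       t.prettyMonthYear,
       uv.totalRevenue
FROM (SELECT customer,
             visitYearMonth,
             SUM(adRevenue) AS totalRevenue        -- aggregated by Spectrum
      FROM clickstream.uservisits_parquet1
      WHERE customer <= 3
        AND visitYearMonth >= 199810               -- assumed last three months
      GROUP BY customer, visitYearMonth) AS uv
JOIN customer AS c
  ON c.c_custkey = uv.customer
JOIN (SELECT DISTINCT d_yearmonthnum,
             d_yearmonth AS prettyMonthYear
      FROM dwdate
      WHERE d_yearmonthnum >= 199810) AS t
  ON uv.visitYearMonth = t.d_yearmonthnum
ORDER BY c.c_name, c.c_mktsegment, uv.visitYearMonth ASC;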

In this query, Redshift Spectrum aggregates the clickstream data to the month level before it is returned to the Amazon Redshift cluster and joined with the dimension tables. This query should complete in about 4 seconds, which is roughly twice as fast as only using the partition key. The speed increase is evident upon reviewing the SVL_S3QUERY_SUMMARY table:

  • Bytes scanned is 21.6X less because of the Parquet data format.
  • Only 90 records are returned back to the Amazon Redshift cluster as a result of the push-down, instead of ~66.2 million, leading to substantially less join overhead, and about 530 MB less data sent back to your cluster.
  • No adverse change in average parallelism.

Assessing the value of Amazon Redshift vs. Redshift Spectrum

At this point, you might be asking yourself, why would I ever not use Redshift Spectrum? Well, you still get additional value for your money by loading data into Amazon Redshift, and querying in Amazon Redshift vs. querying S3.

In fact, it turns out that the last version of our query runs even faster when executed exclusively in native Amazon Redshift, as shown in the following chart:

Chart comparing Amazon Redshift vs. Redshift Spectrum with pushdown aggregation over 3 months of data

As a general rule, queries that aren’t dominated by I/O and which involve multiple joins are better optimized in native Amazon Redshift. For instance, the performance difference between running the partition key query entirely in Amazon Redshift versus with Redshift Spectrum is twice as large as that of the pushdown aggregation query, partly because the former case benefits more from better join performance.

Furthermore, the variability in latency in native Amazon Redshift is lower. For use cases where you have tight performance SLAs on queries, you may want to consider using Amazon Redshift exclusively to support those queries.

On the other hand, when you perform large scans, you could benefit from the best of both worlds: higher performance at lower cost. For instance, imagine that you wanted to enable your business analysts to interactively discover insights across a vast amount of historical data. In the example below, the pushdown aggregation query is modified to analyze seven years of data instead of three months:

SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, uv.totalRevenue
…
WHERE customer <= 3 and visitYearMonth >= 199201
… 
FROM dwdate WHERE d_yearmonthnum >= 199201) as t
…
ORDER BY c.c_name, c.c_mktsegment, uv.visitYearMonth ASC

This query requires scanning and aggregating nearly 1.9 billion records. As shown in the chart below, Redshift Spectrum substantially speeds up this query. A large Amazon Redshift cluster would have to be provisioned to support this use case. With the aid of Redshift Spectrum, you could use an existing small cluster, keep a single copy of your data in S3, and benefit from economical, durable storage while only paying for what you use via the pay per query pricing model.

Chart comparing Amazon Redshift vs. Redshift Spectrum with pushdown aggregation over 7 years of data

Summary

Redshift Spectrum lowers the time to value for deeper insights on customer data queries spanning the data lake and data warehouse. It can enable interactive analysis on datasets in cases that weren’t economically practical or technically feasible before.

There are cases where you can get the best of both worlds from Redshift Spectrum: higher performance at lower cost. However, there are still latency-sensitive use cases where you may want native Amazon Redshift performance. For more best practice tips, see the 10 Best Practices for Amazon Redshift post.

Please visit the Amazon Redshift Spectrum PoC Environment Github page. If you have questions or suggestions, please comment below.

 


Additional Reading

Learn more about how Amazon Redshift Spectrum extends data warehousing out to exabytes – no loading required.


About the Author

Dylan Tong is an Enterprise Solutions Architect at AWS. He works with customers to help drive their success on the AWS platform through thought leadership and guidance on designing well architected solutions. He has spent most of his career building on his expertise in data management and analytics by working for leaders and innovators in the space.

 

 

Awesome Raspberry Pi cases to 3D print at home

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/3d-printed-raspberry-pi-cases/

Unless you’re planning to fit your Raspberry Pi inside a build, you may find yourself in need of a case to protect it from dust, damage and/or the occasional pet attack. Here are some of our favourite 3D-printed cases, for which files are available online so you can recreate them at home.

TARDIS

TARDIS Raspberry PI 3 case – 3D Printing Time lapse

Every Tuesday we’ll 3D print designs from the community and showcase slicer settings, use cases and, of course, time-lapses! This week: TARDIS Raspberry Pi 3 case by https://www.thingiverse.com/Jason3030 – https://www.thingiverse.com/thing:2430122/ – printed on a BCN3D Sigma in blue PLA, 3 hrs 20 min, 0.4 mm layer / 0.6 mm nozzle, 0% infill.

Since I am an avid Whovian, it’s not surprising that this case made its way onto the list. Its outside is aesthetically pleasing to the aspiring Time Lord, and it snugly fits your treasured Pi.



Pop this case on your desk and chuckle with glee every time someone asks what’s inside it:

Person: What’s that?
You: My Raspberry Pi.
Person: What’s a Raspberry Pi?
You: It’s a computer!
Person: There’s a whole computer in that tiny case?
You: Yes…it’s BIGGER ON THE INSIDE!

I’ll get my coat.

Pi crust

Yes, we all wish we’d thought of it first. What better case for a Raspberry Pi than a pie crust?

3D-printed Raspberry Pi cases

While the case is designed to fit the Raspberry Pi Model B, you will be able to upgrade the build to accommodate newer models with a few tweaks.



Just make sure that if you do, you credit Marco Valenzuela, its original baker.

Consoles

Since many people use the Raspberry Pi to run RetroPie, there is a growing trend of 3D-printed console-style Pi cases.

3D-printed Raspberry Pi cases

So why not pop your Raspberry Pi into a case made to look like your favourite vintage console, such as the Nintendo NES or N64?



You could also use an adapter to fit a Raspberry Pi Zero within an actual Atari cartridge, or go modern and print a PlayStation 4 case!

Functional

Maybe you’re looking to use your Raspberry Pi as a component of a larger project, such as a home automation system, learning suite, or makerspace. In that case you may need to attach it to a wall, under a desk, or behind a monitor.

3D-printed Raspberry Pi cases

Coo! Coo!

The Pidgeon, shown above, allows you to turn your Zero W into a surveillance camera, while the piPad lets you keep a breadboard attached for easy access to your Pi’s GPIO pins.



Functional cases with added brackets are great for incorporating your Pi on the sly. The VESA mount case will allow you to attach your Pi to any VESA-compatible monitor, and the Fallout 4 Terminal is just really cool.

Cute

You might want your case to just look cute, especially if it’s going to sit in full view on your desk or shelf.

3D-printed Raspberry Pi cases

The tired cube above is the only one of our featured 3D prints for which you have to buy the files ($1.30), but its adorable face begged to be shared anyway.



If you’d rather save your money for another day, you may want to check out this adorable monster from Adafruit. Be aware that this case will also need some altering to fit newer versions of the Pi.

Our cases

Finally, there are great options for you if you don’t have access to a 3D printer, or if you would like to help the Raspberry Pi Foundation’s mission. You can buy one of the official Raspberry Pi cases for the Raspberry Pi 3 and Raspberry Pi Zero (and Zero W)!

3D-printed Raspberry Pi cases



As with all official Raspberry Pi accessories (and with the Pi itself), your money goes toward helping the Foundation to put the power of digital making into the hands of people all over the world.

3D-printed Raspberry Pi cases

You could also print a replica of the official Astro Pi cases, in which two Pis are currently orbiting the earth on the International Space Station.

Design your own Raspberry Pi case!

If you’ve built a case for your Raspberry Pi, be it with a 3D printer, laser-cutter, or your bare hands, make sure to share it with us in the comments below, or via our social media channels.

And if you’d like to give 3D printing a go, there are plenty of free online learning resources, and sites that offer tutorials and software to get you started, such as TinkerCAD, Instructables, and Adafruit.

The post Awesome Raspberry Pi cases to 3D print at home appeared first on Raspberry Pi.

Running an elastic HiveMQ cluster with auto discovery on AWS

Post Syndicated from The HiveMQ Team original http://www.hivemq.com/blog/running-hivemq-cluster-aws-auto-discovery

hivemq-aws

HiveMQ is a cloud-first MQTT broker with elastic clustering capabilities and a resilient software design, which makes it a perfect fit for common cloud infrastructures. This blog post discussed the benefits an MQTT broker cluster offers. Today’s post aims to be more practical and talks about how to set up a HiveMQ cluster on one of the most popular cloud computing platforms: Amazon Web Services.

Running HiveMQ on cloud infrastructure

Running a HiveMQ cluster on cloud infrastructure like AWS not only offers the possibility of elastically scaling the infrastructure, it also assures that state-of-the-art security standards are in place on the infrastructure side. These platforms are typically highly available, and new virtual machines can be spawned in a snap if they are needed. HiveMQ’s unique ability to add (and remove) cluster nodes at runtime without any manual reconfiguration of the cluster allows it to scale linearly on IaaS providers. New cluster nodes can be started (manually or automatically) and the cluster size adapts automatically. For more detailed information about HiveMQ clustering and how to achieve true high availability and linear scalability with HiveMQ, we recommend reading the HiveMQ Clustering Paper.

As Amazon Web Services is among the best-known and most-used cloud platforms, we want to illustrate the setup of a HiveMQ cluster on AWS in this post. Note that similar concepts to those in this step-by-step guide for running an elastic HiveMQ cluster on AWS apply to other cloud platforms such as Microsoft Azure or Google Cloud Platform.

Setup and Configuration

Amazon Web Services prohibits the use of UDP multicast, which is the default HiveMQ cluster discovery mode. The use of Amazon Simple Storage Service (S3) buckets for auto-discovery is a perfect alternative if the brokers are running on AWS EC2 instances anyway. HiveMQ has a free off-the-shelf plugin available for AWS S3 Cluster Discovery.

The following provides a step-by-step guide on how to set up the brokers on AWS EC2 with automatic cluster member discovery via S3.

Setup Security Group

The first step is creating a security group that allows inbound traffic to the listeners we are going to configure for MQTT communication. It is also vital to have SSH access to the instances. After you have created the security group, you need to edit it and add an additional rule for internal communication between the cluster nodes (meaning the source is the security group itself) on all TCP ports.

To create and edit security groups go to the EC2 console – NETWORK & SECURITY – Security Groups

Inbound traffic

Outbound traffic

The next step is to create an S3 bucket in the S3 console. Make sure to choose a region close to the one you want to run your HiveMQ instances in.

Option A: Create IAM role and assign to EC2 instance

Our recommendation is to configure your EC2 instances in a way that allows them to access the S3 bucket. This way you don’t need to create a specific user and don’t need to use the user’s credentials in the s3discovery.properties file.

Create IAM Role

EC2 Instance Role Type

Select S3 Full Access

Assign new Role to Instance

Option B: Create user and assign IAM policy

The next step is creating a user in the IAM console.

Choose name and set programmatic access

Assign s3 full access role

Review and create

Download credentials

It is important you store these credentials, as they will be needed later for configuring the S3 Cluster Discovery Plugin.

Start EC2 instances with HiveMQ

The next step is spawning two or more EC2 instances with HiveMQ. Follow the steps in the HiveMQ User Guide.

Install s3 discovery plugin

The final step is downloading, installing, and configuring the S3 Cluster Discovery Plugin. After you have downloaded the plugin, you need to configure S3 access in the s3discovery.properties file according to which S3 access option you chose.

Option A:

# AWS Credentials                                          #
############################################################

#
# Use environment variables to specify your AWS credentials
# the following variables need to be set:
# AWS_ACCESS_KEY_ID
# AWS_SECRET_ACCESS_KEY
#
#credentials-type:environment_variables

#
# Use Java system properties to specify your AWS credentials
# the following variables need to be set:
# aws.accessKeyId
# aws.secretKey
#
#credentials-type:java_system_properties

#
# Uses the credentials file which
# can be created by calling 'aws configure' (AWS CLI)
# usually this file is located at ~/.aws/credentials (platform dependent)
# The location of the file can be configured by setting the environment variable
# AWS_CREDENTIAL_PROFILE_FILE to the location of your file
#
#credentials-type:user_credentials_file

#
# Uses the IAM Profile assigned to the EC2 instance running HiveMQ to access S3
# Notice: This only works if HiveMQ is running on an EC2 instance !
#
credentials-type:instance_profile_credentials

#
# Tries to access S3 via the default mechanisms in the following order
# 1) Environment variables
# 2) Java system properties
# 3) User credentials file
# 4) IAM profiles assigned to EC2 instance
#
#credentials-type:default

#
# Uses the credentials specified in this file.
# The variables you must provide are:
# credentials-access-key-id
# credentials-secret-access-key
#
#credentials-type:access_key
#credentials-access-key-id:
#credentials-secret-access-key:

#
# Uses the credentials specified in this file to authenticate with a temporary session
# The variables you must provide are:
# credentials-access-key-id
# credentials-secret-access-key
# credentials-session-token
#
#credentials-type:temporary_session
#credentials-access-key-id:{access_key_id}
#credentials-secret-access-key:{secret_access_key}
#credentials-session-token:{session_token}


############################################################
# S3 Bucket                                                #
############################################################

#
# Region for the S3 bucket used by hivemq
# see http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region for a list of regions for S3
# example: us-west-2
#
s3-bucket-region:<your region here>

#
# Name of the bucket used by HiveMQ
#
s3-bucket-name:<your s3 bucket name here>

#
# Prefix for the filename of every node's file (optional)
#
file-prefix:hivemq/cluster/nodes/

#
# Expiration timeout (in minutes).
# Files with a timestamp older than (timestamp + expiration) will be automatically deleted
# Set to 0 if you do not want the plugin to handle expiration.
#
file-expiration:360

#
# Interval (in minutes) in which the own information in S3 is updated.
# Set to 0 if you do not want the plugin to update its own information.
# If you disable this you also might want to disable expiration.
#
update-interval:180

Option B:

# AWS Credentials                                          #
############################################################

#
# Use environment variables to specify your AWS credentials
# the following variables need to be set:
# AWS_ACCESS_KEY_ID
# AWS_SECRET_ACCESS_KEY
#
#credentials-type:environment_variables

#
# Use Java system properties to specify your AWS credentials
# the following variables need to be set:
# aws.accessKeyId
# aws.secretKey
#
#credentials-type:java_system_properties

#
# Uses the credentials file which
# can be created by calling 'aws configure' (AWS CLI)
# usually this file is located at ~/.aws/credentials (platform dependent)
# The location of the file can be configured by setting the environment variable
# AWS_CREDENTIAL_PROFILE_FILE to the location of your file
#
#credentials-type:user_credentials_file

#
# Uses the IAM Profile assigned to the EC2 instance running HiveMQ to access S3
# Notice: This only works if HiveMQ is running on an EC2 instance !
#
#credentials-type:instance_profile_credentials

#
# Tries to access S3 via the default mechanisms in the following order
# 1) Environment variables
# 2) Java system properties
# 3) User credentials file
# 4) IAM profiles assigned to EC2 instance
#
#credentials-type:default

#
# Uses the credentials specified in this file.
# The variables you must provide are:
# credentials-access-key-id
# credentials-secret-access-key
#
credentials-type:access_key
credentials-access-key-id:<your access key id here>
credentials-secret-access-key:<your secret access key here>

#
# Uses the credentials specified in this file to authenticate with a temporary session
# The variables you must provide are:
# credentials-access-key-id
# credentials-secret-access-key
# credentials-session-token
#
#credentials-type:temporary_session
#credentials-access-key-id:{access_key_id}
#credentials-secret-access-key:{secret_access_key}
#credentials-session-token:{session_token}


############################################################
# S3 Bucket                                                #
############################################################

#
# Region for the S3 bucket used by hivemq
# see http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region for a list of regions for S3
# example: us-west-2
#
s3-bucket-region:<your region here>

#
# Name of the bucket used by HiveMQ
#
s3-bucket-name:<your s3 bucket name here>

#
# Prefix for the filename of every node's file (optional)
#
file-prefix:hivemq/cluster/nodes/

#
# Expiration timeout (in minutes).
# Files with a timestamp older than (timestamp + expiration) will be automatically deleted
# Set to 0 if you do not want the plugin to handle expiration.
#
file-expiration:360

#
# Interval (in minutes) in which the own information in S3 is updated.
# Set to 0 if you do not want the plugin to update its own information.
# If you disable this you also might want to disable expiration.
#
update-interval:180

This file has to be identical on all your cluster nodes.

That’s it. Starting HiveMQ on multiple EC2 instances will now result in them forming a cluster, taking advantage of the S3 bucket for discovery.
You know that your setup was successful when HiveMQ logs something similar to this.

Cluster size = 2, members : [0QMpE, jw8wu].

Enjoy an elastic MQTT broker cluster

We are now able to take advantage of rapid elasticity. Scaling the HiveMQ cluster up or down by adding or removing EC2 instances without the need for administrative intervention is now possible.

For production environments it’s recommended to use automatic provisioning of the EC2 instances (e.g. by using Chef, Puppet, Ansible or similar tools) so you don’t need to configure each EC2 instance manually. Of course HiveMQ can also be used with Docker, which can also ease the provisioning of HiveMQ nodes.

Running an elastic HiveMQ cluster with auto discovery on AWS

Post Syndicated from The HiveMQ Team original https://www.hivemq.com/blog/running-hivemq-cluster-aws-auto-discovery

hivemq-aws

HiveMQ is a cloud-first MQTT broker with elastic clustering capabilities and a resilient software design which is a perfect fit for common cloud infrastructures. This blogpost discussed what benefits a MQTT broker cluster offers. Today’s post aims to be more practical and talk about how to set up a HiveMQ on one of the most popular cloud computing platform: Amazon Webservices.

Running HiveMQ on cloud infrastructure

Running a HiveMQ cluster on cloud infrastructure like AWS not only offers the advantage the possibility of elastically scaling the infrastructure, it also assures that state of the art security standards are in place on the infrastructure side. These platforms are typically highly available and new virtual machines can be spawned in a snap if they are needed. HiveMQ’s unique ability to add (and remove) cluster nodes at runtime without any manual reconfiguration of the cluster allow to scale linearly on IaaS providers. New cluster nodes can be started (manually or automatically) and the cluster sizes adapts automatically. For more detailed information about HiveMQ clustering and how to achieve true high availability and linear scalability with HiveMQ, we recommend reading the HiveMQ Clustering Paper.

As Amazon Webservice is amongst the best known and most used cloud platforms, we want to illustrate the setup of a HiveMQ cluster on AWS in this post. Note that similar concepts as displayed in this step by step guide for Running an elastic HiveMQ cluster on AWS apply to other cloud platforms such as Microsoft Azure or Google Cloud Platform.

Setup and Configuration

Amazon Webservices prohibits the use of UDP multicast, which is the default HiveMQ cluster discovery mode. The use of Amazon Simple Storage Service (S3) buckets for auto-discovery is a perfect alternative if the brokers are running on AWS EC2 instances anyway. HiveMQ has a free off-the-shelf plugin available for AWS S3 Cluster Discovery.

The following provides a step-by-step guide how to setup the brokers on AWS EC2 with automatic cluster member discovery via S3.

Setup Security Group

The first step is creating a security group that allows inbound traffic to the listeners we are going to configure for MQTT communication. It is also vital to have SSH access to the instances. After you have created the security group, edit it and add an additional rule for internal communication between the cluster nodes (meaning the source is the security group itself) on all TCP ports. A scripted sketch of these rules follows the screenshots below.

To create and edit security groups go to the EC2 console – NETWORK & SECURITY – Security Groups

Inbound traffic

Outbound traffic
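If you prefer to script this step instead of clicking through the console, a minimal sketch using the AWS SDK for JavaScript could look like the following. The group name, region and listener ports are assumptions for illustration (1883/8883 are the usual MQTT ports); the console route described above works just as well.

'use strict';
const AWS = require('aws-sdk');
const ec2 = new AWS.EC2({ region: 'us-east-1' }); // pick your region

ec2.createSecurityGroup({
    GroupName: 'hivemq-cluster', // hypothetical name
    Description: 'HiveMQ MQTT listeners, SSH and intra-cluster traffic'
}, function (err, data) {
    if (err) { return console.error(err); }
    const groupId = data.GroupId;
    ec2.authorizeSecurityGroupIngress({
        GroupId: groupId,
        IpPermissions: [
            // MQTT and MQTT over TLS listeners
            { IpProtocol: 'tcp', FromPort: 1883, ToPort: 1883, IpRanges: [{ CidrIp: '0.0.0.0/0' }] },
            { IpProtocol: 'tcp', FromPort: 8883, ToPort: 8883, IpRanges: [{ CidrIp: '0.0.0.0/0' }] },
            // SSH access for administration
            { IpProtocol: 'tcp', FromPort: 22, ToPort: 22, IpRanges: [{ CidrIp: '0.0.0.0/0' }] },
            // all TCP ports from the security group itself (internal cluster communication)
            { IpProtocol: 'tcp', FromPort: 0, ToPort: 65535, UserIdGroupPairs: [{ GroupId: groupId }] }
        ]
    }, function (err) {
        if (err) { console.error(err); } else { console.log('Security group ' + groupId + ' configured'); }
    });
});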

The next step is to create an S3 bucket in the S3 console. Make sure to choose a region close to the region in which you want to run your HiveMQ instances.
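If you’d rather script the bucket creation as well, a minimal sketch with the AWS SDK for JavaScript (the bucket name is a placeholder and must be globally unique) might be:

'use strict';
const AWS = require('aws-sdk');
const s3 = new AWS.S3({ region: 'us-east-1' });

s3.createBucket({
    Bucket: 'my-hivemq-discovery-bucket'
    // outside us-east-1, also pass CreateBucketConfiguration: { LocationConstraint: '<your-region>' }
}, function (err, data) {
    if (err) { console.error(err); } else { console.log('Bucket created at ' + data.Location); }
});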

Option A: Create IAM role and assign to EC2 instance

Our recommendation is to configure your EC2 instances in a way that allows them to access the S3 bucket directly. This way you don’t need to create a specific user and don’t need to put that user’s credentials in the s3discovery.properties file.
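The screenshots below use the managed S3 Full Access policy for simplicity. If you would rather grant the role only what cluster discovery plausibly needs, a tighter inline policy could look roughly like this; the bucket name is a placeholder, and the exact set of actions the plugin requires should be verified against its documentation:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::my-hivemq-discovery-bucket"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::my-hivemq-discovery-bucket/*"
    }
  ]
}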

Create IAM Role

EC2 Instance Role Type

Select S3 Full Access

Assign new Role to Instance

Option B: Create user and assign IAM policy

The next step is creating a user in the IAM console.

Choose name and set programmatic access

Assign s3 full access role

Review and create

Download credentials

It is important that you store these credentials, as they will be needed later for configuring the S3 Cluster Discovery Plugin.

Start EC2 instances with HiveMQ

The next step is spawning 2 or more EC2 instances with HiveMQ. Follow the steps in the HiveMQ User Guide.
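Launching the instances can also be scripted. A minimal sketch with the AWS SDK for JavaScript is shown below; the AMI ID, key pair, security group ID and instance profile name are placeholders, and installing HiveMQ on the instances still follows the User Guide.

'use strict';
const AWS = require('aws-sdk');
const ec2 = new AWS.EC2({ region: 'us-east-1' });

ec2.runInstances({
    ImageId: 'ami-xxxxxxxx',            // a Linux AMI of your choice
    InstanceType: 't2.medium',
    MinCount: 2,
    MaxCount: 2,                        // two nodes are enough to form a cluster
    KeyName: 'my-keypair',
    SecurityGroupIds: ['sg-xxxxxxxx'],  // the security group created earlier
    IamInstanceProfile: { Name: 'HiveMQS3Discovery' } // only needed for Option A
}, function (err, data) {
    if (err) { console.error(err); } else { console.log('Launched ' + data.Instances.length + ' instances'); }
});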

Install s3 discovery plugin

The final step is downloading, installing and configuring the S3 Cluster Discovery Plugin.
After you have downloaded the plugin, configure the S3 access in the s3discovery.properties file according to the S3 access option you chose.

Option A:

############################################################
# AWS Credentials                                          #
############################################################

#
# Use environment variables to specify your AWS credentials
# the following variables need to be set:
# AWS_ACCESS_KEY_ID
# AWS_SECRET_ACCESS_KEY
#
#credentials-type:environment_variables

#
# Use Java system properties to specify your AWS credentials
# the following variables need to be set:
# aws.accessKeyId
# aws.secretKey
#
#credentials-type:java_system_properties

#
# Uses the credentials file which
# can be created by calling 'aws configure' (AWS CLI)
# usually this file is located at ~/.aws/credentials (platform dependent)
# The location of the file can be configured by setting the environment variable
# AWS_CREDENTIAL_PROFILE_FILE to the location of your file
#
#credentials-type:user_credentials_file

#
# Uses the IAM Profile assigned to the EC2 instance running HiveMQ to access S3
# Notice: This only works if HiveMQ is running on an EC2 instance !
#
credentials-type:instance_profile_credentials

#
# Tries to access S3 via the default mechanisms in the following order
# 1) Environment variables
# 2) Java system properties
# 3) User credentials file
# 4) IAM profiles assigned to EC2 instance
#
#credentials-type:default

#
# Uses the credentials specified in this file.
# The variables you must provide are:
# credentials-access-key-id
# credentials-secret-access-key
#
#credentials-type:access_key
#credentials-access-key-id:
#credentials-secret-access-key:

#
# Uses the credentials specified in this file to authenticate with a temporary session
# The variables you must provide are:
# credentials-access-key-id
# credentials-secret-access-key
# credentials-session-token
#
#credentials-type:temporary_session
#credentials-access-key-id:{access_key_id}
#credentials-secret-access-key:{secret_access_key}
#credentials-session-token:{session_token}


############################################################
# S3 Bucket                                                #
############################################################

#
# Region for the S3 bucket used by hivemq
# see http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region for a list of regions for S3
# example: us-west-2
#
s3-bucket-region:

#
# Name of the bucket used by HiveMQ
#
s3-bucket-name:

#
# Prefix for the filename of every node's file (optional)
#
file-prefix:hivemq/cluster/nodes/

#
# Expiration timeout (in minutes).
# Files with a timestamp older than (timestamp + expiration) will be automatically deleted
# Set to 0 if you do not want the plugin to handle expiration.
#
file-expiration:360

#
# Interval (in minutes) in which the own information in S3 is updated.
# Set to 0 if you do not want the plugin to update its own information.
# If you disable this you also might want to disable expiration.
#
update-interval:180

Option B:

############################################################
# AWS Credentials                                          #
############################################################

#
# Use environment variables to specify your AWS credentials
# the following variables need to be set:
# AWS_ACCESS_KEY_ID
# AWS_SECRET_ACCESS_KEY
#
#credentials-type:environment_variables

#
# Use Java system properties to specify your AWS credentials
# the following variables need to be set:
# aws.accessKeyId
# aws.secretKey
#
#credentials-type:java_system_properties

#
# Uses the credentials file which
# can be created by calling 'aws configure' (AWS CLI)
# usually this file is located at ~/.aws/credentials (platform dependent)
# The location of the file can be configured by setting the environment variable
# AWS_CREDENTIAL_PROFILE_FILE to the location of your file
#
#credentials-type:user_credentials_file

#
# Uses the IAM Profile assigned to the EC2 instance running HiveMQ to access S3
# Notice: This only works if HiveMQ is running on an EC2 instance !
#
#credentials-type:instance_profile_credentials

#
# Tries to access S3 via the default mechanisms in the following order
# 1) Environment variables
# 2) Java system properties
# 3) User credentials file
# 4) IAM profiles assigned to EC2 instance
#
#credentials-type:default

#
# Uses the credentials specified in this file.
# The variables you must provide are:
# credentials-access-key-id
# credentials-secret-access-key
#
credentials-type:access_key
credentials-access-key-id:
credentials-secret-access-key:

#
# Uses the credentials specified in this file to authenticate with a temporary session
# The variables you must provide are:
# credentials-access-key-id
# credentials-secret-access-key
# credentials-session-token
#
#credentials-type:temporary_session
#credentials-access-key-id:{access_key_id}
#credentials-secret-access-key:{secret_access_key}
#credentials-session-token:{session_token}


############################################################
# S3 Bucket                                                #
############################################################

#
# Region for the S3 bucket used by hivemq
# see http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region for a list of regions for S3
# example: us-west-2
#
s3-bucket-region:

#
# Name of the bucket used by HiveMQ
#
s3-bucket-name:

#
# Prefix for the filename of every node's file (optional)
#
file-prefix:hivemq/cluster/nodes/

#
# Expiration timeout (in minutes).
# Files with a timestamp older than (timestamp + expiration) will be automatically deleted
# Set to 0 if you do not want the plugin to handle expiration.
#
file-expiration:360

#
# Interval (in minutes) in which the own information in S3 is updated.
# Set to 0 if you do not want the plugin to update its own information.
# If you disable this you also might want to disable expiration.
#
update-interval:180

This file has to be identical on all your cluster nodes.

That’s it. Starting HiveMQ on multiple EC2 instances will now result in them forming a cluster, taking advantage of the S3 bucket for discovery.
You know that your setup was successful when HiveMQ logs something similar to this.

Cluster size = 2, members : [0QMpE, jw8wu].

Enjoy an elastic MQTT broker cluster

We are now able to take advantage of rapid elasticity. Scaling the HiveMQ cluster up or down by adding or removing EC2 instances, without the need for administrative intervention, is now possible.

For production environments it’s recommended to provision the EC2 instances automatically (e.g. with Chef, Puppet, Ansible or similar tools) so you don’t need to configure each EC2 instance manually. Of course HiveMQ can also be used with Docker, which can further ease the provisioning of HiveMQ nodes.

Bitcoin, UASF… and Politics

Post Syndicated from Григор original http://www.gatchev.info/blog/?p=2064

Lately the Net has been abuzz with talk of UASF and Bitcoin. Few people, however, have probably paid attention to these acronyms. (The articles on the subject are usually a salad of yet more acronyms, which doesn’t make them any easier to understand.) What the hell does it mean? And does it matter?

Actually, it isn’t particularly important, except to people who are seriously involved with cryptocurrencies. Everyone else can safely ignore it.

At least at first glance. Because it also offers real insight into how effective some fundamental political concepts are. That is why I intend to devote some of my time to it here – and to waste some of yours.

1. The problems of Bitcoin

An electronic currency controlled not by politicians and their cronies but by strict rules – a dream, right? No more fear that the next populist will fire up the money printer and turn your savings into colorful toilet paper… But there are no ideas without problems (to say nothing of their implementations). The same goes for Bitcoin.

All Bitcoin transactions are recorded in blocks that form a chain – the so-called blockchain. That way every cent (pardon, satoshi 🙂 ) can be traced back to its creation. The addresses between which the money moves are anonymous, but the exchanges themselves are public and visible. Anyone who has the necessary software (freely available) and runs a “full node” – that is, anyone willing to set aside a hundred or so gigabytes of disk space – can trace them and verify their validity.

The problem is that a Bitcoin block has a fixed maximum size – up to 1 megabyte. It holds at most 2-3 thousand transactions. At 6 blocks per hour that means roughly 15,000 transactions per hour, or about 360,000 per day. That sounds like a lot, but it is in fact utterly insufficient – plenty of large banks process more transactions per second. So for some time now the demand for transactions has exceeded the blockchain’s capacity. Which creates a problem for the currency’s users. Some of them are starting to abandon it and turn to traditional currencies, or to other cryptocurrencies. Its influence and role are shrinking accordingly.

2. The state of the solutions

Quite a few solutions to this problem have been proposed. The latest is called SegWit (segregated witness). All of them (and this one in particular), however, face serious resistance from key players in Bitcoin.

Relatively soon after Bitcoin was created, a rule was introduced that transactions are paid for. (Otherwise it was far too easy to generate an enormous number of transactions shuffling a minimal amount back and forth, and thus clog the blockchain.) Every transaction specifies how much it will pay to be included in a block. (This is what “legitimizes” it.)

Which of the waiting transactions get included in a block is decided by whoever creates the block. That is the “miner” who solved the puzzle set by the previous block. He pockets the fees for the included transactions, on top of the standard block “reward”. Miners therefore have an interest in transactions being as expensive as possible – that is, in the blockchain’s capacity being insufficient.

On top of that, quite a few miners use a “hack” in the system’s technology – the so-called ASICBOOST. One of SegWit’s advantages is that it blocks such hacks – that is, those “miners”. (You can find the details here.)

The result is that some miners resist the introduction of SegWit. And it is “mining power” that serves as the “democratic vote” in the Bitcoin system. An attempt to introduce SegWit has already been made and failed. For the sake of a stronger consensus, that attempt required SegWit to be adopted once 95% of the mining power supported it. It soon became clear that this was not going to happen.

3. UASF? WTF? (So what on earth is UASF?)

I don’t know exactly what percentage of miners reject SegWit. But at this point mining is centralized to the degree that almost all of it is done by a small number of powerful companies. It is entirely possible that those rejecting SegWit hold more than 50% of the mining power. If so, introducing SegWit through its support would be impossible. (Of course, that would mean the decline of Bitcoin in the near future and its transformation from “the king of cryptocurrencies” into a cheap museum exhibit. In the end those miners will have dug their own grave. But if there is anything in this world that can be relied upon always and to the very end, it is human stupidity.)

To avoid such a scenario, the developers of the Bitcoin Core Team proposed the so-called User-Activated Soft Fork, UASF for short. Its essence is that from August 1 onward, the nodes in the Bitcoin network that support SegWit will begin to treat blocks that do not signal support for it as invalid.

Miners who reject SegWit can keep mining the old way. Those who support it will continue the new way. Accordingly, from that moment on the Bitcoin blockchain will split in two – a branch without SegWit and a branch with it.

4. What will the result be?

The majority of the mining power may end up on the first one – which, by Satoshi Nakamoto’s rules, would make it the main chain. But if the network is split in two, each side will have its own main branch, so they will not be technically unified. There will be two different currencies named Bitcoin, each claiming to be the real one.

How will this dispute be resolved? Bitcoin users are after lower transaction fees, so the overwhelming majority of them will quickly gravitate toward the SegWit chain. And Bitcoin’s value and acceptance rest simply on the fact that people accept it and are willing to use it. That is why the SegWit-enabled Bitcoin will retain the role (and the price) of the original Bitcoin, while the one without SegWit will get cheaper and lose most of its relevance.

(In fact, a similar “split” has already happened to the No. 2 in the cryptocurrency world – Ethereum. That is why there are Ethereum and Ethereum Classic. The latter lost the battle to be the heir of the original Ethereum, but they still exist, albeit with a much smaller role and price.)

The miners who rejected SegWit will soon find themselves mining something worth pennies. So they will probably switch to supporting SegWit, loudly or quietly. I wouldn’t be surprised if quite a few of them did so as early as August 1. (Although some will surely keep wailing to the world about how bad the decision is and what losses they are suffering from it. There may even be lawsuits… We’ll see the details.)

5. The politics

If you have made it this far, read carefully – the essence of this post is in this part.

I recently spoke with a proud graduate of a Bulgarian economics university. I sat through an explanation of how economies of scale don’t exist and how it’s exactly the opposite. How small companies are more efficient than big ones, and so on…

No wonder they are taught nonsense. Whoever pays, even from behind the scenes, calls the tune. What puzzles me is that the students believe this nonsense when reality is right before their eyes. A reality in which big companies ruin and/or buy up small ones, not the other way around. It cannot be otherwise. Just as Newton’s laws apply equally to laboratory weights and to shipping containers, dissipative laws apply equally to pots of water and to economic systems.

In the IT business the dynamics are far above average. Where it is not and cannot easily be regulated, where things are more laissez-faire – as in Bitcoin mining, for instance – they are even greater. No wonder mining moved so quickly from millions of individual participants to a small number of tyrannosaurs that can easily form cartels. Every system evolves internally in that direction… That is why a “perfect system” and “happily ever after” cannot exist. That is why, if you will, freedom has to be kneaded and baked fresh every day.

The “prevailing mining power” – whether as the prevailing number of individuals in a species, as the bulk of the money, or as control over the memes most popular with voters – can easily become concentrated in a narrow circle of hands. And the laws of systems’ internal evolution, as a concrete expression of the dissipative laws, lead precisely there… At that point every vote starts to support the status quo. Democracy ceases to be an opportunity for change – the only such opportunity left is splitting the differing views into separate systems. Only then does the new get a real chance to compete with the old.

That is also why every biological species around us once began as a tiny, divergent twig on the then-mighty trunk of another species – one that only paleobiologists know today. And every mighty bank, manufacturing company or media company began – as a sum of money, production capacity or intellectual property – as an ordinary loan booth, a little workshop, a studio. In the shadow of the tyrannosaurs of their day, remembered now only by historians. Having found a way to split off and somehow hide from them, so as to gather the strength to compete with them…

Those who get it, get it.

Perl 5.26.0 released

Post Syndicated from corbet original https://lwn.net/Articles/724363/rss

The Perl 5.26.0 release is out. “Perl 5.26.0 represents approximately 13 months of development since Perl 5.24.0 and contains approximately 360,000 lines of changes across 2,600 files from 86 authors.” See this page for a list of changes in this release; new features include indented here-documents, the ability to declare references to variables, Unicode 9.0 support, and the removal of the current directory (“.”) from @INC by default.

Product or Project?

Post Syndicated from Matt Richardson original https://www.raspberrypi.org/blog/product-or-project/

This column is from The MagPi issue 57. You can download a PDF of the full issue for free, or subscribe to receive the print edition in your mailbox or the digital edition on your tablet. All proceeds from the print and digital editions help the Raspberry Pi Foundation achieve its charitable goals.

Image of MagPi magazine and AIY Project Kit

Taking inspiration from a widely known inspirational phrase, I like to tell people, “make the thing you wish to see in the world.” In other words, you don’t have to wait for a company to create the exact product you want. You can be a maker as well as a consumer! Prototyping with hardware has become easier and more affordable, empowering people to make products that suit their needs perfectly. And the people making these things aren’t necessarily electrical engineers, computer scientists, or product designers. They’re not even necessarily adults. They’re often self-taught hobbyists who are empowered by maker-friendly technology.

It’s a subject I’ve been very interested in, and I have written about it before. Here’s what I’ve noticed: the flow between maker project and consumer product moves in both directions. In other words, consumer products can start off as maker projects. Just take a look at the story behind many of the crowdfunded products on sites such as Kickstarter. Conversely, consumer products can evolve into maker products as well. The cover story for the latest issue of The MagPi is a perfect example of that. Google has given you the resources you need to build your own dedicated Google Assistant device. How cool is that?

David Pride on Twitter

@Raspberry_Pi @TheMagP1 Oh this is going to be a ridiculous amount of fun. 😊 #AIYProjects #woodchuck https://t.co/2sWYmpi6T1

But consumer products becoming hackable hardware isn’t always an intentional move by the product’s maker. In the 2000s, TiVo set-top DVRs were a hot product and their most enthusiastic fans figured out how to hack the product to customise it to meet their needs without any kind of support from TiVo.

Embracing change

But since then, things have changed. For example, when Microsoft’s Kinect for the Xbox 360 was released in 2010, makers were immediately enticed by its capabilities. It not only acted as a camera, but it could also sense depth, a feature that would be useful for identifying the position of objects in a space. At first, there was no hacker support from Microsoft, so Adafruit Industries announced a $3,000 bounty to create open-source drivers so that anyone could access the features of Kinect for their own projects. Since then, Microsoft has embraced the use of Kinect for these purposes.

The Create 2 from iRobot

iRobot’s Create 2, a hackable version of the Roomba

Consumer product companies even make versions of their products that are specifically meant for hacking, making, and learning. Belkin’s WeMo home automation product line includes the WeMo Maker, a device that can act as a remote relay or sensor and hook into your home automation system. And iRobot offers Create 2, a hackable version of its Roomba floor-cleaning robot. While iRobot aimed the robot at STEM educators, you could use it for personal projects too. Electronic instrument maker Korg takes its maker-friendly approach to the next level by releasing the schematics for some of its analogue synthesiser products.

Why would a company want to do this? There are a few possible reasons. For one, it’s a way of encouraging consumers to create a community around a product. It could be a way for innovation with the product to continue, unchecked by the firm’s own limits on resources. For certain, it’s an awesome feel-good way for a company to empower their own users. Whatever the reason these products exist, it’s the digital maker who comes out ahead. They have more affordable tools, materials, and resources to create their own customised products and possibly learn a thing or two along the way.

With maker-friendly, hackable products, being a creator and a consumer aren’t mutually exclusive. In fact, you’re probably getting the best of both worlds: great products and great opportunities to make the thing you wish to see in the world.

The post Product or Project? appeared first on Raspberry Pi.

Tinkernut’s do-it-yourself Pi Zero audio HAT

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/tinkernut-diy-pi-zero-audio/

Why buy a Raspberry Pi Zero audio HAT when Tinkernut can show you how to make your own?

Adding Audio Output To The Raspberry Pi Zero – Tinkernut Workbench

The Raspberry Pi Zero W is an amazing miniature computer piece of technology. I want to turn it into an epic portable Spotify radio that displays visuals such as Album Art. So in this new series called “Tinkernut Workbench”, I show you step by step what it takes to build a product from the ground up.

Raspberry Pi Zero audio

Unlike their grown-up siblings, the Pi Zero and Zero W lack an onboard audio jack, but that doesn’t stop you from using them to run an audio output. Various audio HATs exist on the market, from Adafruit, Pimoroni and Pi Supply to name a few, providing easy audio output for the Zero. But where would the fun be in a Tinkernut video that shows you how to attach a HAT?

Tinkernut Pi Zero Audio

“Take this audio HAT, press it onto the header pins and, errr, done? So … how was your day?”

DIY Audio: Tinkernut style

For the first video in his Hipster Spotify Radio using a Raspberry Pi Tinkernut Workbench series, Tinkernut – real name Daniel Davis – goes through the steps of researching, prototyping and finishing his own audio HAT for his newly acquired Raspberry Pi Zero W.

The build utilises the GPIO pins on the Zero W, specifically pins #18 and #13. FYI, this hidden gem of information comes from the Adafruit Pi Zero PWM Audio guide. Before he can use #18 and #13, header pins need to be soldered. If the thought of soldering pins to the Pi is somewhat daunting, check out the Pimoroni Hammer Header.

Pimoroni Hammer Header for Raspberry Pi

You’re welcome.

Once complete, with Raspbian installed on the micro SD, and SSH enabled for remote access, he’s ready to start prototyping.

Ingredients

Tinkernut uses two 270 ohm resistors, two 150 ohm resistors, two 10μf electrolytic capacitors, two 0.01 μf polyester film capacitors, an audio jack and some wire. You’ll also need a breadboard for prototyping. For the final build, you’ll need a single row female pin header and some prototyping board, if you want to join in at home.

Tinkernut audio board Raspberry Pi Zero W

It should look like this…hopefully.

Once the prototype is working to run audio through to a cheap speaker (thanks to an edit of the config.txt file), the final board can be finished.
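For reference, the config.txt change described in the Adafruit Pi Zero PWM Audio guide routes the two PWM channels to GPIO 18 and GPIO 13. A sketch of the relevant line is below; double-check the guide for your board and OS version before relying on it.

# /boot/config.txt – route PWM0 to GPIO18 and PWM1 to GPIO13 for audio output
dtoverlay=pwm-2chan,pin=18,func=2,pin2=13,func2=4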

What’s next?

The audio board is just one step in the build.

Spotify is such an awesome music service. Raspberry Pi Zero is such an awesome ultra-mini computing device. Obviously, combining the two is something I must do!!! The idea here is to make something that’s stylish, portable, can play Spotify, and hopefully also display visuals such as album art.

Subscribe to Tinkernut’s YouTube channel to keep up to date with the build, and check out some of his other Raspberry Pi builds, such as his cheap 360 video camera, security camera and digital vintage camera.

Have you made your own Raspberry Pi HAT? Show it off in the comments below!

The post Tinkernut’s do-it-yourself Pi Zero audio HAT appeared first on Raspberry Pi.

Introducing DnsControl – “DNS as Code” has Arrived

Post Syndicated from Craig Peterson original http://blog.serverfault.com/2017/04/11/introducing-dnscontrol-dns-as-code-has-arrived/

DNS at Stack Overflow is… complex.  We have hundreds of DNS domains and thousands of DNS records. We have gone from running our own BIND server to hosting DNS with multiple cloud providers, and we change things fairly often. Keeping everything up to date and synced at multiple DNS providers is difficult. We built DnsControl to allow us to perform updates easily and automatically across all providers we use.

The old way

Originally, our DNS was hosted by our own BIND servers, using artisanal, hand crafted zone files. Large changes involved liberal sed usage, and every change was pretty error prone. We decided to start using cloud DNS providers for performance reasons, but those each have their own web panels, which are universally painful to use. Web interfaces rarely have any import/export functionality, and generally lack change control, history tracking, or comments. We quickly decided that web panels were not how we wanted to manage our zones. 

Introducing DnsControl

DNSControl is the system we built to manage our DNS. It permits “describe once, use anywhere” DNS management. It consists of a few key components:

  1. A Domain Specific Language (DSL) for describing domains in a single, provider-independent way.
  2. An “interpreter” application that executes the DSL and creates a standardized representation of your desired DNS state.
  3. Back-end “providers” that sync the desired state to a DNS provider.

At the time of this writing we have 9 different providers implemented, with 3 more on the way shortly. We use it to manage our domains with our own BIND servers, as well as Route 53, Google Cloud DNS, name.com, Cloudflare, and more.

A sample might look like this description of stackoverflow.com:

D("stackoverflow.com", REG_NAMEDOTCOM, DnsProvider(R53), DnsProvider(GCLOUD),
    A("@", "198.252.206.16"),
    A("blog", "198.252.206.20"),
    CNAME("chat", "chat.stackexchange.com."),
    CNAME("www", "@", TTL(3600)),
    A("meta", "198.252.206.16")
)

This is just a small, simple example. The DSL is a fully-featured way to express your DNS config. It is actually just JavaScript with some helpful functions. We have an examples page with more examples of the power of the language.
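As a sketch of what that buys you, here is a hypothetical zone that reuses the identifiers from the sample above (REG_NAMEDOTCOM, R53 and GCLOUD are assumed to be defined elsewhere in the config) and keeps a shared IP address in an ordinary JavaScript variable:

// One variable reused by several records – change it once to repoint all of them.
var WEB_IP = "198.252.206.16";

D("example.com", REG_NAMEDOTCOM, DnsProvider(R53), DnsProvider(GCLOUD),
    A("@", WEB_IP),
    A("meta", WEB_IP),
    CNAME("www", "@", TTL(3600))
)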

Running “dnscontrol preview” with this input will show what updates would be needed to bring DNS providers up to date with the new, desired, configuration. “dnscontrol push” will actually make the changes.

This allows us to manage our DNS configuration as code. Storing it this way has a bunch of advantages:

  • We can use variables to store common IP addresses or repeated data. We can make complicated changes, like failing-over services between data centers, by changing a single variable. We can activate or deactivate our CDN, which involves thousands of record changes, by commenting or uncommenting a single line of code.
  • We are not locked into any single provider, since the automation can sync to any of them. Keeping records synchronized between different cloud providers requires no manual steps.
  • We store our DNS config in git. Our build server runs all changes. We have central logging, access control, and history for our DNS changes. We’re trying to apply DevOps best practices to an area that has not seen those benefits so much yet.

I think the biggest benefit to this tool though is the freedom it has given us with our DNS.  It has allowed us to:

  • Switch providers with no fear of breaking things. We have changed CDNs or DNS providers at least 4 times in the last two years, and it has never been scary at all.
  • Dual-host our DNS with multiple providers simultaneously. The tool keeps them in sync for us.
  • Test fail-over procedures before an emergency happens. We are confident we can point DNS at our secondary datacenter easily, and we can quickly switch providers if one is being DDOSed.

DNS configuration is often difficult and error-prone.  We hope DnsControl makes it easy and more reliable. It has for us.

Some resources:

Online Piracy Can Boost Comic Book Sales, Research Finds

Post Syndicated from Ernesto original https://torrentfreak.com/online-piracy-can-boost-comic-book-sales-research-finds/

Research into online piracy comes in all shapes and sizes, with equally mixed results. Often the main question is whether piracy hurts legitimate revenue streams.

In recent years we have seen a plethora of studies and most are focused on the effects on movies, TV-shows and music revenues. But what about comic books?

Manga in particular has traditionally been very popular on file-sharing networks and sites. There are dozens of large sites dedicated to these comics, which are downloaded in their millions.

According to the anti-piracy group CODA, which represents Japanese comic publishers, piracy losses overseas are estimated to be double the size of overseas legal revenue.

With this in mind, Professor Tatsuo Tanaka of the Faculty of Economics at Keio University decided to look more closely at how piracy interacts with legal sales. In a natural experiment, he examined how the availability of pirated comic books affected revenue.

The research uses a massive takedown campaign conducted by CODA in 2015, which directly impacted the availability of many pirated comics on various download sites, to see how this affected sales of 3,360 comic book volumes.

Interestingly, the results show that decreased availability of pirated comics doesn’t always help sales. In fact, for comics that no longer release new volumes, the effect is reversed.

“Piracy decreases sales of ongoing comics, but it increases sales of completed comics,” Professor Tanaka writes.

“To put this another way, displacement effect is dominant for ongoing comics, and advertisement effect is dominant for completed comics,” he adds.

For these finished comic series, the promotional element weighs heavier. According to the Professor, this suggests that piracy can effectively be seen as a form of advertising.

“Since completed comics series have already ended, and publishers no longer do any promotion for them, consumers almost forget completed comics. We can interpret that piracy reminds consumers of past comics and stimulates sales.”

The question that remains is whether the overall effect on the industry is positive or negative. The current study provided no answer to this effect, as it’s unknown how big the sales share is for ongoing versus completed comics, but future research could look into this.

Professor Tanaka stresses that there is an important policy implication of his findings. Since piracy doesn’t affect all sales the same (it’s heterogeneous), anti-piracy strategies may have to be adapted.

“If the effect of piracy is heterogeneous, it is not the best solution to shut down the piracy sites but to delete harmful piracy files selectively if possible,” Professor Tanaka adds.

“In this case, deleting piracy files of ongoing comics only is the first best strategy for publishers regardless of whether the total effect is positive or negative, because the availability of piracy files of completed comics is beneficial to both publishers and consumers,” he adds.

The research shows once again that piracy is a complex phenomenon that can have a positive or negative impact depending on the context. This isn’t limited to comics of course, as previous studies have shown similar effects in the movie and music industries.

The full paper titled The Effects of Internet Book Piracy: The Case of Japanese Comics is available here (pdf).

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Implementing Serverless Manual Approval Steps in AWS Step Functions and Amazon API Gateway

Post Syndicated from Bryan Liston original https://aws.amazon.com/blogs/compute/implementing-serverless-manual-approval-steps-in-aws-step-functions-and-amazon-api-gateway/


Ali Baghani, Software Development Engineer

A common use case for AWS Step Functions is a task that requires human intervention (for example, an approval process). Step Functions makes it easy to coordinate the components of distributed applications as a series of steps in a visual workflow called a state machine. You can quickly build and run state machines to execute the steps of your application in a reliable and scalable fashion.

In this post, I describe a serverless design pattern for implementing manual approval steps. You can use a Step Functions activity task to generate a unique token that can be returned later indicating either approval or rejection by the person making the decision.

Key steps to implementation

When the execution of a Step Functions state machine reaches an activity task state, Step Functions schedules the activity and waits for an activity worker. An activity worker is an application that polls for activity tasks by calling GetActivityTask. When the worker successfully calls the API action, the activity is vended to that worker as a JSON blob that includes a token for callback.

At this point, the activity task state and the branch of the execution that contains the state is paused. Unless a timeout is specified in the state machine definition, which can be up to one year, the activity task state waits until the activity worker calls either SendTaskSuccess or SendTaskFailure using the vended token. This pause is the first key to implementing a manual approval step.
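For reference, this is roughly what that callback looks like when made directly with the AWS SDK for JavaScript; the API Gateway methods configured later in this post issue the equivalent calls on the caller’s behalf. The token value is a placeholder.

'use strict';
const AWS = require('aws-sdk');
const stepfunctions = new AWS.StepFunctions();

// Approve: completes the paused activity task state successfully
stepfunctions.sendTaskSuccess({
    taskToken: 'VENDED_TASK_TOKEN',
    output: JSON.stringify({ approved: true })
}, function (err) { if (err) console.error(err); });

// Reject: fails the paused activity task state
// stepfunctions.sendTaskFailure({
//     taskToken: 'VENDED_TASK_TOKEN',
//     error: 'Rejected',
//     cause: 'Reject link was clicked.'
// }, function (err) { if (err) console.error(err); });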

The second key is the ability in a serverless environment to separate the code that fetches the work and acquires the token from the code that responds with the completion status and sends the token back, as long as the token can be shared, i.e., the activity worker in this example is a serverless application supervised by a single activity task state.

In this walkthrough, you use a short-lived AWS Lambda function invoked on a schedule to implement the activity worker, which acquires the token associated with the approval step, and prepares and sends an email to the approver using Amazon SES.

It is very convenient if the application that returns the token can directly call the SendTaskSuccess and SendTaskFailure API actions on Step Functions. This can be achieved more easily by exposing these two actions through Amazon API Gateway so that an email client or web browser can return the token to Step Functions. By combining a Lambda function that acquires the token with the application that returns the token through API Gateway, you can implement a serverless manual approval step, as shown below.

In this pattern, when the execution reaches a state that requires manual approval, the Lambda function prepares and sends an email to the user with two embedded hyperlinks for approval and rejection.

If the authorized user clicks on the approval hyperlink, the state succeeds. If the authorized user clicks on the rejection link, the state fails. You can also choose to set a timeout for approval and, upon timeout, take action, such as resending the email request using retry/catch conditions in the activity task state.

Employee promotion process

As an example pattern use case, you can design a simple employee promotion process which involves a single task: getting a manager’s approval through email. When an employee is nominated for promotion, a new execution starts. The name of the employee and the email address of the employee’s manager are provided to the execution.

You’ll use the design pattern to implement the manual approval step, and SES to send the email to the manager. After acquiring the task token, the Lambda function generates and sends an email to the manager with embedded hyperlinks to URIs hosted by API Gateway.

In this example, I have administrative access to my account, so that I can create IAM roles. Moreover, I have already registered my email address with SES, so that I can send emails with the address as the sender/recipient. For detailed instructions, see Send an Email with Amazon SES.

Here is a list of what you do:

  1. Create an activity
  2. Create a state machine
  3. Create and deploy an API
  4. Create an activity worker Lambda function
  5. Test that the process works

Create an activity

In the Step Functions console, choose Tasks and create an activity called ManualStep.

Remember to keep the ARN of this activity at hand.

Create a state machine

Next, create the state machine that models the promotion process on the Step Functions console. Use StatesExecutionRole-us-east-1, the default role created by the console. Name the state machine PromotionApproval, and use the following code. Remember to replace the value for Resource with your activity ARN.

{
  "Comment": "Employee promotion process!",
  "StartAt": "ManualApproval",
  "States": {
    "ManualApproval": {
      "Type": "Task",
      "Resource": "arn:aws:states:us-east-1:ACCOUNT_ID:activity:ManualStep",
      "TimeoutSeconds": 3600,
      "End": true
    }
  }
}
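If you prefer to create the state machine from code instead of the console, a minimal sketch with the AWS SDK for JavaScript (the account ID and role ARN are placeholders) might be:

'use strict';
const AWS = require('aws-sdk');
const stepfunctions = new AWS.StepFunctions({ region: 'us-east-1' });

stepfunctions.createStateMachine({
    name: 'PromotionApproval',
    definition: JSON.stringify({
        Comment: 'Employee promotion process!',
        StartAt: 'ManualApproval',
        States: {
            ManualApproval: {
                Type: 'Task',
                Resource: 'arn:aws:states:us-east-1:ACCOUNT_ID:activity:ManualStep',
                TimeoutSeconds: 3600,
                End: true
            }
        }
    }),
    roleArn: 'arn:aws:iam::ACCOUNT_ID:role/StatesExecutionRole-us-east-1'
}, function (err, data) {
    if (err) { console.error(err); } else { console.log('Created ' + data.stateMachineArn); }
});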

Create and deploy an API

Next, create and deploy public URIs for calling the SendTaskSuccess or SendTaskFailure API action using API Gateway.

First, navigate to the IAM console and create the role that API Gateway can use to call Step Functions. Name the role APIGatewayToStepFunctions, choose Amazon API Gateway as the role type, and create the role.

After the role has been created, attach the managed policy AWSStepFunctionsFullAccess to it.

In the API Gateway console, create a new API called StepFunctionsAPI. Create two new resources under the root (/) called succeed and fail, and for each resource, create a GET method.

You now need to configure each method. Start with the /fail GET method and configure it with the following values:

  • For Integration type, choose AWS Service.
  • For AWS Service, choose Step Functions.
  • For HTTP method, choose POST.
  • For Region, choose your region of interest instead of us-east-1. (For a list of regions where Step Functions is available, see AWS Region Table.)
  • For Action Type, enter SendTaskFailure.
  • For Execution, enter the APIGatewayToStepFunctions role ARN.

To be able to pass the taskToken through the URI, navigate to the Method Request section, and add a URL Query String parameter called taskToken.

Then, navigate to the Integration Request section and add a Body Mapping Template of type application/json to inject the query string parameter into the body of the request. Accept the change suggested by the security warning. This sets the body pass-through behavior to When there are no templates defined (Recommended). The following code does the mapping:

{
   "cause": "Reject link was clicked.",
   "error": "Rejected",
   "taskToken": "$input.params('taskToken')"
}

When you are finished, choose Save.

Next, configure the /succeed GET method. The configuration is very similar to the /fail GET method. The only difference is for Action: choose SendTaskSuccess, and set the mapping as follows:

{
   "output": "\"Approve link was clicked.\"",
   "taskToken": "$input.params('taskToken')"
}

The last step on the API Gateway console after configuring your API actions is to deploy them to a new stage called respond. You can test your API by choosing the Invoke URL links under either of the GET methods. Because no token is provided in the URI, a ValidationException message should be displayed.

Create an activity worker Lambda function

In the Lambda console, create a Lambda function with a CloudWatch Events Schedule trigger using a blank function blueprint for the Node.js 4.3 runtime. The rate entered for Schedule expression is the poll rate for the activity. This should be above the rate at which the activities are scheduled by a safety margin.

The safety margin accounts for the possibility of lost tokens, retried activities, and polls that happen while no activities are scheduled. For example, if you expect 3 promotions to happen in a certain week, you can schedule the Lambda function to run 4 times a day during that week. Alternatively, a single Lambda function can poll for multiple activities, either in parallel or in series. For this example, use a rate of one time per minute but do not enable the trigger yet.

Next, create the Lambda function ManualStepActivityWorker using the following Node.js 4.3 code. The function receives the taskToken, employee name, and manager’s email from Step Functions. It embeds the information into an email, and sends the email to the manager.


'use strict';
console.log('Loading function');
const aws = require('aws-sdk');
const stepfunctions = new aws.StepFunctions();
const ses = new aws.SES();
exports.handler = (event, context, callback) => {
    
    var taskParams = {
        activityArn: 'arn:aws:states:us-east-1:ACCOUNT_ID:activity:ManualStep'
    };
    
    stepfunctions.getActivityTask(taskParams, function(err, data) {
        if (err) {
            console.log(err, err.stack);
            context.fail('An error occurred while calling getActivityTask.');
        } else {
            if (data === null || !data.taskToken) {
                // No activities scheduled (GetActivityTask returns an empty taskToken when the poll times out)
                context.succeed('No activities received after 60 seconds.');
            } else {
                var input = JSON.parse(data.input);
                var emailParams = {
                    Destination: {
                        ToAddresses: [
                            input.managerEmailAddress
                            ]
                    },
                    Message: {
                        Subject: {
                            Data: 'Your Approval Needed for Promotion!',
                            Charset: 'UTF-8'
                        },
                        Body: {
                            Html: {
                                Data: 'Hi!<br />' +
                                    input.employeeName + ' has been nominated for promotion!<br />' +
                                    'Can you please approve:<br />' +
                                    'https://API_DEPLOYMENT_ID.execute-api.us-east-1.amazonaws.com/respond/succeed?taskToken=' + encodeURIComponent(data.taskToken) + '<br />' +
                                    'Or reject:<br />' +
                                    'https://API_DEPLOYMENT_ID.execute-api.us-east-1.amazonaws.com/respond/fail?taskToken=' + encodeURIComponent(data.taskToken),
                                Charset: 'UTF-8'
                            }
                        }
                    },
                    Source: input.managerEmailAddress,
                    ReplyToAddresses: [
                            input.managerEmailAddress
                        ]
                };
                    
                ses.sendEmail(emailParams, function (err, data) {
                    if (err) {
                        console.log(err, err.stack);
                        context.fail('Internal Error: The email could not be sent.');
                    } else {
                        console.log(data);
                        context.succeed('The email was successfully sent.');
                    }
                });
            }
        }
    });
};

In the Lambda function handler and role section, for Role, choose Create a new role, LambdaManualStepActivityWorkerRole.

Add two policies to the role: one to allow the Lambda function to call the GetActivityTask API action by calling Step Functions, and one to send an email by calling SES. The result should look as follows:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    },
    {
      "Effect": "Allow",
      "Action": "states:GetActivityTask",
      "Resource": "arn:aws:states:*:*:activity:ManualStep"
    },
    {
      "Effect": "Allow",
      "Action": "ses:SendEmail",
      "Resource": "*"
    }
  ]
}

In addition, as the GetActivityTask API action performs long-polling with a timeout of 60 seconds, increase the timeout of the Lambda function to 1 minute 15 seconds. This allows the function to wait for an activity to become available, and gives it extra time to call SES to send the email. For all other settings, use the Lambda console defaults.

After this, you can create your activity worker Lambda function.

Test the process

You are now ready to test the employee promotion process.

In the Lambda console, enable the ManualStepPollSchedule trigger on the ManualStepActivityWorker Lambda function.

In the Step Functions console, start a new execution of the state machine with the following input:

{ "managerEmailAddress": "[email protected]", "employeeName" : "Jim" } 
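You can also start the execution programmatically rather than through the console; a minimal sketch (the state machine ARN and the input values are placeholders) could be:

'use strict';
const AWS = require('aws-sdk');
const stepfunctions = new AWS.StepFunctions({ region: 'us-east-1' });

stepfunctions.startExecution({
    stateMachineArn: 'arn:aws:states:us-east-1:ACCOUNT_ID:stateMachine:PromotionApproval',
    input: JSON.stringify({ managerEmailAddress: 'manager@example.com', employeeName: 'Jim' })
}, function (err, data) {
    if (err) { console.error(err); } else { console.log('Started execution ' + data.executionArn); }
});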

Within a minute, you should receive an email with links to approve or reject Jim’s promotion. Choosing one of those links should succeed or fail the execution.

Summary

In this post, you created a state machine containing an activity task with Step Functions, an API with API Gateway, and a Lambda function to dispatch the approval/failure process. Your Step Functions activity task generated a unique token that was returned later indicating either approval or rejection by the person making the decision. Your Lambda function acquired the task token by polling the activity task, and then generated and sent an email to the manager for approval or rejection with embedded hyperlinks to URIs hosted by API Gateway.

If you have questions or suggestions, please comment below.

US and KickassTorrents Go Head to Head in Court

Post Syndicated from Ernesto original https://torrentfreak.com/us-and-kickasstorrents-go-head-to-head-in-court-170202/

kickasstorrents_500x500This week KickassTorrents’ alleged owner Artem Vaulin asked the Illinois District Court to dismiss the criminal indictment and set him free.

The fundamental flaw of the case, according to defense lawyer Ira Rothken, is that torrent files themselves are not copyrighted content.

In addition, he argued that the secondary copyright infringement claims would fail as these are non-existent under criminal law.

District Court Judge John Lee previously questioned the evidence in the case and according to Rothken, it is certainly not enough to keep his client behind bars. This is also what he told the court during the hearing this week, stressing that torrents themselves are not copyrighted.

“We believe that the indictment against Artem Vaulin in the KAT torrent files case is defective and should be dismissed. Torrent files are not content files. The reproduction and distribution of torrent files are not a crime,” Rothken tells TF.

“If a third party uses torrent files to infringe it is after they leave the KAT site behind and such conduct is too random, inconsistent, and attenuated to impose criminal liability on Mr. Vaulin. The government cannot use the civil judge-made law in Grokster as a theory in a criminal case.”

Furthermore, Rothken argued that the US indictment is flawed because it fails to allege an actual criminal copyright infringement anywhere in the world, the United States included. The defense likened KickassTorrents to general search engines such as Google instead.

On the other side of the aisle stood US Department of Justice prosecutor Devlin Su. He urged the court to wait for the extradition hearing in Poland before ruling on the request, noting that Vaulin should come to the US voluntarily if he wanted to speed things up.

According to the prosecution, KickassTorrents operated as a piracy flea market, with an advertising revenue of about $12.5 million to $22.3 million. Comparing it with Google is nonsense, Su argued.

“Google is not dedicated to uploading and distributing copyrighted works,” Law360 quotes the prosecutor.

It is now up to the Illinois District Court to decide how to move forward. The defense is hoping for an outright dismissal, while the U.S. wants to move forward.

Meanwhile, over in Poland, Vaulin remains in custody after he was denied bail. Facing severe health issues, the Ukrainian was transferred from Polish prison to a local hospital a few weeks ago, where he remains under heavy guard.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

2016: The Year In Tech, And A Sneak Peek Of What’s To Come

Post Syndicated from Peter Cohen original https://www.backblaze.com/blog/2016-year-tech-sneak-peek-whats-come/

2016 is safely in our rear-view mirrors. It’s time to take a look back at the year that was and see what technology had the biggest impact on consumers and businesses alike. We also have an eye to 2017 to see what the future holds.

AI and machine learning in the cloud

Truly sentient computers and robots are still the stuff of science fiction (and the premise of one of 2016’s most promising new SF TV series, HBO’s Westworld). Neural networks are nothing new, but 2016 saw huge strides in artificial intelligence and machine learning, especially in the cloud.

Google, Amazon, Apple, IBM, Microsoft and others are developing cloud computing infrastructures designed especially for AI work. It’s this technology that’s underpinning advances in image recognition technology, pattern recognition in cybersecurity, speech recognition, natural language interpretation and other advances.

Microsoft’s newly-formed AI and Research Group is finding ways to get artificial intelligence into Microsoft products like its Bing search engine and Cortana natural language assistant. Some of these efforts, while well-meaning, still need refinement: Early in 2016 Microsoft launched Tay, an AI chatbot designed to mimic the natural language characteristics of a teenage girl and learn from interacting with Twitter users. Microsoft had to shut Tay down after Twitter users exploited vulnerabilities that caused Tay to begin spewing really inappropriate responses. But it paves the way for future efforts that blur the line between man and machine.

Finance, energy, climatology – anywhere you find big data sets you’re going to find uses for machine learning. On the consumer end it can help your grocery app guess what you might want or need based on your spending habits. Financial firms use machine learning to help predict customer credit scores by analyzing profile information. One of the most intriguing uses of machine learning is in security: Pattern recognition helps systems predict malicious intent and figure out where exploits will come from.

Meanwhile we’re still waiting for Rosie the Robot from the Jetsons. And flying cars. So if Elon Musk has any spare time in 2017, maybe he can get on that.

AR Games

Augmented Reality (AR) games have been around for a good long time – ever since smartphone makers put cameras on them, game makers have been toying with the mix of real life and games.

AR games took a giant step forward with a game released in 2016 that you couldn’t get away from, at least for a little while. We’re talking about Pokémon GO, of course. Niantic, makers of another AR game called Ingress, used the framework they built for that game to power Pokémon GO. Kids, parents, young, old, it seemed like everyone with an iPhone that could run the game caught wild Pokémon, hatched eggs by walking, and battled each other in Pokémon gyms.

For a few weeks, anyway.

Technical glitches, problems with scale and limited gameplay value ultimately hurt Pokémon GO’s longevity. Today the game only garners a fraction of the public interest it did at peak. It continues to be successful, albeit not at the stratospheric pace it first set.

Niantic, the game’s developer, was able to tie together several factors to bring such an explosive and – if you’ll pardon the overused euphemism – disruptive – game to bear. One was its previous work with a game called Ingress, another AR-enhanced game that uses geomap data. In fact, Pokémon GO uses the same geomap data as Ingress, so Niantic had already done a huge amount of legwork needed to get Pokémon GO up and running. Niantic cleverly used Google Maps data to form the basis of both games, relying on already-identified public landmarks and other locations tagged by Ingress players (Ingress has been around since 2011).

Then, of course, there’s the Pokémon connection – an intensely meaningful gaming property that’s been popular with generations of video games and cartoon watchers since the 1990s. The dearth of Pokémon-branded games on smartphones meant an instant explosion of popularity upon Pokémon GO’s release.

2016 also saw the introduction of several new virtual reality (VR) headsets designed for home and mobile use. Samsung Gear VR and Google Daydream View made a splash. As these products continue to make consumer inroads, we’ll see more games push the envelope of what you can achieve with VR and AR.

Hybrid Cloud

Hybrid Cloud services combine public cloud storage (like B2 Cloud Storage) or public compute (like Amazon Web Services) with a private cloud platform. Specialized content and file management software glues it all together, making the experience seamless for the user.

Businesses get the instant access and speed they need to get work done, with the ability to fall back on on-demand cloud-based resources when scale is needed. B2’s hybrid cloud integrations include OpenIO, which helps businesses maintain data storage on-premise until it’s designated for archive and stored in the B2 cloud.

The cost of entry and usage of Hybrid Cloud services have continued to fall. For example, small and medium-sized organizations in the post production industry are finding Hybrid Cloud storage is now a viable strategy in managing the large amounts of information they use on a daily basis. This strategy is enabled by the low cost of B2 Cloud Storage that provides ready access to cloud-stored data.

There are practical deployment and scale issues that have kept Hybrid Cloud services from being used widespread in the largest enterprise environments. Small to medium businesses and vertical markets like Media & Entertainment have found promising, economical opportunities to use it, which bodes well for the future.

Inexpensive 3D printers

3D printing, once a rarified technology, has become increasingly commoditized over the past several years. That’s been in part thanks to the “Maker Movement:” Thousands of folks all around the world who love to tinker and build. XYZprinting is out in front of makers and others with its line of inexpensive desktop da Vinci printers.

The da Vinci Mini is a tabletop model aimed at home users which starts at under $300. You can download and tweak thousands of 3D models to build toys, games, art projects and educational items. They’re built using spools of biodegradable, non-toxic plastics derived from corn starch which dispense sort of like the bobbin on a sewing machine. The da Vinci Mini works with Macs and PCs and can connect via USB or Wi-Fi.

DIY Drones

Quadcopter drones have been fun tech toys for a while now, but the new trend we saw in 2016 was “do it yourself” models. One standout was Flybrix, which combines lightweight drone motors with LEGO building bricks. Flybrix was so successful that it sold out of inventory for the 2016 holiday season and is backlogged with orders into the new year.

Each Flybrix kit comes with the motors, LEGO building blocks, cables and gear you need to build your own quad, hex or octocopter drone (as well as a cheerful-looking LEGO pilot to command the new vessel). A downloadable app for iOS or Android lets you control your creation. A deluxe kit includes a handheld controller so you don’t have to tie up your phone.

If you already own a 3D printer like the da Vinci Mini, you’ll find plenty of model files available for download and modification so you can print your own parts, though you’ll probably need help from one of the many maker sites to figure out what else you’ll need for aerial flight and control.

5D Glass Storage

Research at the University of Southampton may yield the next big leap in optical storage technology meant for long-term archival. The boffins at the Optoelectronics Research Centre have developed a new data storage technique that embeds information in glass “nanostructures” on a storage disc the size of a U.S. quarter.

A Blu-ray Disc can hold 50 GB, but one of the new 5D glass storage discs – only the size of a U.S. quarter – can hold 360 TB, roughly 7,200 times more. Think of it as a super-stable, supercharged version of a CD. Not only is the data inscribed on far smaller structures within the glass, it’s also encoded in the size and orientation of those structures as well as their position in three dimensions – hence “5D.”

An upside to this is the absence of bit rot: the glass medium is extremely stable, with a predicted shelf life measured in billions of years. The downside is that it’s still a write-once medium, so it’s intended for long-term storage.

This tech is still years away from practical use, but it took a big step forward in 2016 when the University announced the development of a practical information encoding scheme to use with it.

Smart Home Tech

Are you ready to talk to your house to tell it to do things? If you’re not already, you probably will be soon. Google Home is a $129 voice-activated speaker powered by the Google Assistant. You can use it for everything from streaming music and video to a nearby TV to reading your calendar or to-do list. You can also tell it to operate other supported devices like the Nest smart thermostat and Philips Hue lights.

Amazon has its own similar wireless speaker product, the Echo, powered by Amazon’s Alexa assistant. Amazon has differentiated its Echo line with the Echo Dot – a hockey-puck-sized device that connects to a speaker you already own – so Amazon customers can begin to outfit their connected homes for less than $50.

Apple’s HomeKit isn’t a speaker like the Amazon Echo or Google Home – it’s a software framework. You use the Home app on your iOS 10-equipped iPhone or iPad to connect and configure supported devices, and Siri, Apple’s own intelligent assistant, to control them from any supported Apple device. HomeKit turns on lights, turns up the thermostat, operates switches and more.

Smart home tech has been coming in fits and starts for a while – the Nest smart thermostat is already in its third generation, for example. But 2016 was the year we finally saw the “Internet of things” coalescing into a smart home that we can control through voice and gestures in a … well, smart way.
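
For a peek at the plumbing these assistants drive behind the scenes, here’s a rough Python sketch that toggles a Philips Hue light through the Hue bridge’s local HTTP API. The bridge address and API username are placeholders from your own setup – a voice command like “turn on the living room light” ultimately boils down to a call like this.

# Rough sketch: toggling a Philips Hue light via the bridge's local HTTP API.
# The bridge IP and API username are placeholders for your own setup.
import requests

BRIDGE_IP = "192.168.1.2"            # placeholder: your Hue bridge's LAN address
API_USERNAME = "your-hue-api-user"   # placeholder: generated by your own bridge

def set_light(light_id: int, on: bool, brightness: int = 254) -> None:
    url = f"http://{BRIDGE_IP}/api/{API_USERNAME}/lights/{light_id}/state"
    requests.put(url, json={"on": on, "bri": brightness}, timeout=5)

set_light(1, on=True, brightness=200)    # roughly what "turn on the living room light" becomes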

Welcome To The Future

It’s 2017 – welcome to our brave new world. While it’s anyone’s guess what the future holds, there are at least a few tech trends that are pretty safe bets. They include:

  • Internet of Things: More smart, connected devices are coming online in the home and at work every day, and this trend will accelerate in 2017, with more and more devices requiring some form of Internet connectivity to work. Expect to see a lot more appliances, devices, and accessories that use the APIs promoted by Google, Amazon, and Apple to let you control everything in your life with your voice and a smart speaker setup.
  • Blockchain security: Blockchain is the digital ledger technology that makes Bitcoin work. Its distributed design and validation system make it possible to verify that no one has tampered with the records, which makes it well-suited for applications beyond cryptocurrency – like making sure your smart thermostat (see above) hasn’t been hacked. Expect 2017 to be the year we see more mainstream acceptance, use, and development of blockchain technology by financial institutions, the creation of new private blockchain networks, and improved usability aimed at making blockchain easier for regular consumers to use. Blockchain-based voting is here too. Given all this movement, it wouldn’t surprise us to see government regulators take a much deeper interest in blockchain, either. (For a toy illustration of the chained-hash idea that makes tampering detectable, see the sketch just after this list.)
  • 5G: Verizon is field-testing 5G on its wireless network, which it says will deliver speeds 30-50 times faster than 4G LTE. We’re still a few years away from wide-scale 5G deployment, but expect to hear a lot more about it from Verizon and the other wireless players in 2017.
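
As promised above, here’s a toy Python sketch of the chained-hash idea that makes a blockchain-style ledger tamper-evident. It’s deliberately stripped down – no network, no consensus, no proof-of-work – and the “thermostat” records are made up purely for illustration.

# Toy sketch of the tamper-evidence idea behind a blockchain-style ledger:
# each record carries the hash of the previous one, so altering any earlier
# entry breaks every hash that follows it. No network, consensus, or mining here.
import hashlib
import json

def block_hash(payload: dict) -> str:
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

def append_block(chain: list, data: dict) -> None:
    prev = chain[-1]["hash"] if chain else "0" * 64
    block = {"data": data, "prev_hash": prev}
    block["hash"] = block_hash(block)        # hash covers data + previous hash
    chain.append(block)

def verify(chain: list) -> bool:
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else "0" * 64
        recomputed = block_hash({"data": block["data"], "prev_hash": block["prev_hash"]})
        if block["prev_hash"] != expected_prev or block["hash"] != recomputed:
            return False
    return True

ledger: list = []
append_block(ledger, {"device": "thermostat", "event": "set to 68F"})
append_block(ledger, {"device": "thermostat", "event": "set to 70F"})
print(verify(ledger))                          # True
ledger[0]["data"]["event"] = "set to 90F"      # tamper with an earlier record
print(verify(ledger))                          # False – the chain no longer validates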

Your Predictions?

Enough of our bloviation. Let’s open the floor to you. What do you think were the biggest technology trends in 2016? What’s coming in 2017 that has you the most excited? Let us know in the comments!

The post 2016: The Year In Tech, And A Sneak Peek Of What’s To Come appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

China Shuts Down 290 Websites in Piracy Crackdown

Post Syndicated from Andy original https://torrentfreak.com/china-shuts-down-290-websites-in-piracy-crackdown-161224/

On July 12, China’s State Copyright Administration and four other departments launched “Jian Wang 2016”, a program designed to crack down on Internet-based intellectual property infringement.

According to the government, JW2016 targeted the “unauthorized illegal spread” of film and television works, news and other digital literature in order to protect the rights and interests of rightsholders. The program also aimed to further regulate online music and cloud storage services.

The cloud storage impact was felt immediately, with many providers choosing to “voluntarily” close down in the face of government allegations of illegal activity. In October, one of the largest, Qihoo 360, said it would cease offering accounts to private citizens due to the service being used to spread pirated content and other “illegal information” which inflicted “huge harm on society”.

In a statement on the closure, the government said that Qihoo 360 will wipe all user data by February 2017, a move which reflects how much importance the “360 group of companies” attaches to the protection of copyright works.

This week, China’s National Copyright Administration announced new successes achieved by JW2016 during a five-month period. According to the department, the authorities handled 514 cases of online copyright infringement between July and November. Fines equal to almost $467,000 were handed down.

Others received harsher treatment. According to the government, a total of 290 websites said to have engaged in Internet piracy were shut down. None of the closed sites are named in China’s official announcement.

“The State Copyright Administration has also supervised four batches of a total of 31 cases of copyright infringement, granting subsidies to local cases of more than 1.5 million yuan ($216,000),” the Administration said.

“At home and abroad, Jian Wang 2016 has had a very good effect. The initial results of copyright management on the Internet have greatly improved the environment for copyright and laid good foundations for further action.”

While China says it’s making progress on the copyright enforcement front, that hasn’t stopped it from being criticized by the United States.

In this week’s “Out-of-Cycle Review of Notorious Markets”, the United States Trade Representative (USTR) mentioned China in connection with a number of sites offering either pirate or counterfeit content, including the little-known-in-the-West ‘BeeVideo’.

“BeeVideo is an application that facilitates the viewing of allegedly infringing movies and television shows on smart TVs through set-top boxes, and on mobile devices,” the USTR said.

“The app is available through the BeeVideo.tv website portal. BeeVideo has been downloaded more than 12 million times and once downloaded allegedly provides unlimited unauthorized access to infringing content. The developer and operator of BeeVideo is allegedly based in China.”

The USTR also called out China over Nanjing Imperiosus, a company that allegedly provides domain name registration services to around 2,300 illegal pharmacies. In a comment Thursday, the EFF said that while there may be issues with the sites themselves, domain registrars don’t host any content.

“It’s true that domain names can sometimes point to content deemed unlawful, but so too, ironically, does the Notorious Markets List—as well as this blog post, for that matter,” the EFF said.

“Enforcing content laws against intermediaries who merely point to unlawful information is a never-ending and misdirected quest, in which freedom of expression is an inevitable casualty.”

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Amazon Aurora Update – PostgreSQL Compatibility

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-aurora-update-postgresql-compatibility/

Just two years ago (it seems like yesterday), I introduced you to Amazon Aurora in my post Amazon Aurora – New Cost-Effective MySQL-Compatible Database Engine for Amazon RDS. In that post I told you how the RDS team took a fresh, unconstrained look at the relational database model and explained how they built a relational database for the cloud.

The feedback that we have received from our customers since then has been heart-warming.  Customers love the MySQL compatibility, the focus on high availability, and the built-in encryption. They count on the fact that Aurora is built around fault-tolerant, self-healing storage that allows them to scale from 10 GB all the way up to 64 TB without pre-provisioning. They know that Aurora makes six copies of their data across three Availability Zones and backs it up to Amazon Simple Storage Service (S3) without impacting performance or availability. As they scale, they know that they can create up to 15 low-latency read replicas that draw from common storage. To learn more about how our customers are using Aurora in world-scale production environments, take some time to read our Amazon Aurora Testimonials.

Of course, customers are always asking for more, and we do our best to understand their needs and to oblige. Here is a look back at some recent updates that were made in response to specific feedback from customers:

And Now, PostgreSQL Compatibility
In addition to the feature-level feedback, we received many requests for additional database compatibility. At the top of the list was compatibility with PostgreSQL. This open source database has been under continuous development for 20 years and has found a home in many enterprises and startups. Customers like the enterprise features (similar to those offered by SQL Server and Oracle), performance benefits, and the geospatial objects associated with PostgreSQL.  They would love to have access to these capabilities while also taking advantage of all that Aurora has to offer.

Today we are launching a preview of Amazon Aurora PostgreSQL-Compatible Edition. It offers all of the benefits that I listed above, including high durability, high availability, and the ability to quickly create and deploy read replicas. Here are some of the things you will love about it:

Performance – Aurora delivers up to 2x the performance of PostgreSQL running in traditional environments.

Compatibility – Aurora is fully compatible with the open source version of PostgreSQL (version 9.6.1). On the stored procedure side, we are planning to support Perl, pgSQL, Tcl, and JavaScript (via the V8 JavaScript engine). We are also planning to support all of the PostgreSQL features and extensions that are supported in Amazon RDS for PostgreSQL.
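
Because the preview speaks the same wire protocol as PostgreSQL, existing client code should connect to an Aurora endpoint just as it would to any other PostgreSQL server. Here’s a minimal sketch using the psycopg2 driver – the endpoint, database name, and credentials are placeholders.

# Minimal sketch: connecting to an Aurora PostgreSQL-compatible endpoint with the
# standard psycopg2 driver, exactly as you would for any other PostgreSQL server.
# The endpoint, database, user, and password below are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="my-aurora-cluster.cluster-abc123xyz.us-east-1.rds.amazonaws.com",
    port=5432,
    dbname="mydb",
    user="masteruser",
    password="supersecret",
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])
conn.close()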

Cloud Native – Aurora takes full advantage of the fact that it is running within AWS. Here are some of the touch points:

Here’s how you access all of this from the RDS Console. You start by selecting the PostgreSQL-compatible option:

Then you choose your database instance type, decide on Multi-AZ deployment, name your database instance, and set up a user name & password:
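
If you’d rather script those steps than click through the console, a sketch along these lines using the boto3 RDS API should be roughly equivalent. The identifiers, instance class, and credentials are placeholders, and the “aurora-postgresql” engine name is an assumption on our part for the preview.

# Hypothetical sketch: creating an Aurora PostgreSQL-compatible cluster and its
# primary instance with boto3 instead of the console. All identifiers, the instance
# class, and the credentials are placeholders; the engine name is assumed.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_db_cluster(
    DBClusterIdentifier="my-aurora-pg",
    Engine="aurora-postgresql",            # assumed engine name for the preview
    MasterUsername="masteruser",
    MasterUserPassword="supersecret",
)

rds.create_db_instance(
    DBInstanceIdentifier="my-aurora-pg-instance-1",
    DBClusterIdentifier="my-aurora-pg",
    Engine="aurora-postgresql",
    DBInstanceClass="db.r4.large",         # placeholder instance class
)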

We are making a preview of PostgreSQL compatibility for Amazon Aurora available in the US East (Northern Virginia) Region today and you can sign up now for access!

A Quick Comparison
My colleagues David Wein and Grant McAlister ran some tests that compared the performance of PostgreSQL compatibility for Amazon Aurora against PostgreSQL 9.6.1. The database servers were run on m4.16xlarge instances and the test clients were run on c4.8xlarge instances.

PostgreSQL was run on 45K Provisioned IOPS of storage, consisting of three 15K IOPS EBS volumes striped into one logical volume and topped off with an ext4 file system. They enabled WAL compression and aggressive autovacuum, both of which improve the performance of PostgreSQL on the workloads that they tested.

David & Grant ran the standard PostgreSQL pgbench benchmarking tool, using a scaling factor of 2000 (which creates a roughly 30 GiB database) and several different client counts. Each data point ran for one hour, with the database recreated before each run. The graph below shows the results:

David also shared the final seconds of one of his runs:

progress: 3597.0 s, 39048.4 tps, lat 26.075 ms stddev 9.883
progress: 3598.0 s, 38047.7 tps, lat 26.959 ms stddev 10.197
progress: 3599.0 s, 38111.1 tps, lat 27.009 ms stddev 10.257
progress: 3600.0 s, 34371.7 tps, lat 29.363 ms stddev 14.468
transaction type: 
scaling factor: 2000
query mode: prepared
number of clients: 1024
number of threads: 1024
duration: 3600 s
number of transactions actually processed: 137508938
latency average = 26.800 ms
latency stddev = 19.222 ms
tps = 38192.805529 (including connections establishing)
tps = 38201.099738 (excluding connections establishing)
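
For reference, the parameters in that output map onto pgbench invocations roughly like the ones below, wrapped here in a small Python driver purely for illustration. The host, user, and database are placeholders; the flags mirror the reported run (scale factor 2000, 1024 clients and threads, a one-hour duration, prepared statements).

# Rough reconstruction of the benchmark parameters shown above, wrapped in a small
# Python driver for illustration. Host, user, and database are placeholders, and
# credentials are assumed to come from ~/.pgpass or the PGPASSWORD environment variable.
import subprocess

HOST, USER, DB = "my-cluster-endpoint", "masteruser", "pgbench_db"

# One-time initialization: -i builds the tables, -s 2000 yields a roughly 30 GiB data set.
subprocess.run(["pgbench", "-i", "-s", "2000", "-h", HOST, "-U", USER, DB], check=True)

# The measured run: 1024 clients, 1024 threads, 3600 seconds, prepared statements.
# -P 1 prints the per-second "progress:" lines like the ones quoted above.
subprocess.run(
    ["pgbench", "-c", "1024", "-j", "1024", "-T", "3600", "-M", "prepared",
     "-P", "1", "-h", HOST, "-U", USER, DB],
    check=True,
)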

They also shared a per-second throughput graph that covered the last 40 minutes of a similar run:

As you can see, Amazon Aurora delivered higher throughput than PostgreSQL, with about 1/3 of the jitter (standard deviations of 1395 TPS and 5081 TPS, respectively).

David and Grant are now collecting data for a more detailed post that they plan to publish in early 2017.

Coming Soon – Performance Insights
We are also working on a new tool that is designed to help you to understand database performance at a very detailed level. You will be able to look inside of each query and learn more about how your database handles it. Here’s a sneak preview screen shot:

You will be able to access the new Performance Insights as part of the preview. I’ll have more details and a full tour later.

Jeff;