Tag Archives: KBA

McAfee Security Experts Weigh-in Weirdly With “Fresh Kodi Warning”

Post Syndicated from Andy original https://torrentfreak.com/mcafee-security-experts-weigh-in-weirdly-with-fresh-kodi-warning-180311/

Over the past several years, the last couple in particular, piracy has stormed millions of homes around the world.

From being a widespread but still fairly geeky occupation among torrenters, movie and TV show piracy can now be achieved by anyone with the ability to click a mouse or push a button on a remote control. Much of this mainstream interest can be placed at the feet of the Kodi media player.

An entirely legal platform in its own right, Kodi can be augmented with third-party add-ons that enable users to access an endless supply of streaming media. As such, piracy-configured Kodi installations are operated by an estimated 26 million people, according to the MPAA.

This popularity has led to much interest from tabloid newspapers in the UK which, for reasons best known to them, choose to both promote and demonize Kodi almost every week. While writing about news events is clearly par for the course, when one considers some of the reports, their content, and what inspired them, something doesn’t seem right.

This week The Express, which has published many overly sensational stories about Kodi in recent times, published another. The title – as always – promised something special.

Sounds like big news….

Reading the text, however, reveals nothing new whatsoever. The piece simply rehashes some of the historic claims that have been leveled at Kodi that can easily apply to any Internet-enabled software or system. But beyond that, some of its content is pretty weird.

The piece is centered on comments from two McAfee security experts – Chief Scientist Raj Samani and Chief Consumer Security Evangelist Gary Davis. It’s unclear whether The Express approached them for comment (if they did, there is no actual story for McAfee to comment on) or whether McAfee offered the comments and The Express built a story around them. Either way, here’s a taster.

“Kodi has been pretty open about the fact that it’s a streaming site but my view has always been if I use Netflix I know that I’m not going to get any issues, if I use Amazon I’m not going to get any issues,” Samani told the publication.

Ok, stop right there. Kodi admits that it’s a streaming site? Really? Kodi is a piece of software. It’s a media player. It can do many things but Kodi is not a streaming site and no one at Kodi has ever labeled it otherwise. To think that neither McAfee nor the publication caught that one is a bit embarrassing.

The argument that Samani was trying to make is that services like Netflix and Amazon are generally more reliable than third-party sources and there are few people out there who would argue with that.

“Look, ultimately you’ve got to do the research and you’ve got to decide if it’s right for you but personally I don’t use [Kodi] and I know full well that by not using [Kodi] I’m not going to get any issues. If I pay for the service I know exactly what I’m going to get,” he said.

But unlike his colleague who doesn’t use Kodi, Gary Davis has more experience.

McAfee’s Chief Consumer Security Evangelist admits to having used Kodi in the past but more recently decided not to use it when the security issues apparently got too much for him.

“I did use [Kodi] but turned it off as I started getting worried about some of the risks,” he told The Express.

“You may search for something and you may get what you are looking for but you may get something that you are not looking for and that’s where the problem lies with Kodi.”

This idea, that people search for a movie or TV show yet get something else, is bewildering to most experienced Kodi users. If this was indeed the case, on any large scale, people wouldn’t want to use it anymore. That’s clearly not the case.

Also, incorrect content appearing is not the kind of security threat that the likes of McAfee tend to be worried about. However, Davis suggests things can get worse.

“I’m not saying they’ve done anything wrong but if somebody is able to embed code to turn on a microphone or other things or start sending data to a place it shouldn’t go,” he said.

The sentence appears to have some words missing and struggles to make sense, but the suggestion is that someone’s Kodi installation could be corrupted to the point that an attacker could hijack the user’s microphone.

We are not aware of anything like that ever happening via Kodi. There are instances where similar things have happened in entirely different contexts, but that seems neither here nor there. By the same logic, perhaps everyone should stop using Windows?

The big question is why these ‘scary’ Kodi non-stories keep getting published and why experts are prepared to weigh in on them.

It would be too easy to quickly put it down to some anti-piracy agenda, even though there are plenty of signs that anti-piracy groups have been habitually feeding UK tabloids with information on that front. Indeed, a source at a UK news outlet (that no longer publishes such stories) told TF that they were often prompted to write stories about Kodi and streaming in general, none with a positive spin.

But if it was as simple as that, how does that explain another story run in The Express this week heralding the launch of Kodi’s ‘Leia’ alpha release?

If Kodi is so bad as to warrant an article telling people to avoid it FOREVER on one day, why is it good enough to be promoted on another? It can only come down to the number of clicks – but the clickbait headline should’ve given that away at the start.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Playboy’s Copyright Lawsuit Threatens Online Expression, Boing Boing Argues

Post Syndicated from Ernesto original https://torrentfreak.com/playboys-copyright-lawsuit-threatens-online-expression-boing-boing-argues-180202/

Early 2016, Boing Boing co-editor Xeni Jardin published an article in which she linked to an archive of every Playboy centerfold image published up to that point.

“Kind of amazing to see how our standards of hotness, and the art of commercial erotic photography, have changed over time,” Jardin commented.

While the linked material undoubtedly appealed to many readers, Playboy itself took offense to the fact that infringing copies of their work were being shared in public. While Boing Boing didn’t upload or store the images in question, the publisher filed a complaint.

Playboy accused the blog’s parent company Happy Mutants of various counts of copyright infringement, claiming that it exploited their playmates’ images for commercial purposes.

Last month Boing Boing responded to the allegations with a motion to dismiss. The case should be thrown out, it argued, noting that linking to infringing material for the purpose of reporting and commentary, is not against the law.

This prompted Playboy to fire back, branding Boing Boing a “clickbait” site. Playboy informed the court that the popular blog profits off the work of others and has no fair use defense.

Before the California District Court decides on the matter, Boing Boing took the opportunity to reply to Playboy’s latest response. According to the defense, Playboy’s case is an attack on people’s freedom of expression.

“Playboy claims this is an important case. It is partially correct: if the Court allows this case to go forward, it will send a dangerous message to everyone engaged in ordinary online commentary,” Boing Boing’s reply reads.

Referencing a previous Supreme Court decision, the blog says that the Internet democratizes access to speech, with websites as a form of modern-day pamphlets.

Links to source materials posted by third parties give these “pamphlets” more weight as they allow readers to form their own opinion on the matter, Boing Boing argues. If the court upholds Playboy’s arguments, however, this will become a risky endeavor.

“Playboy, however, would apparently prefer a world in which the ‘pamphleteer’ must ask for permission before linking to primary sources, on pain of expensive litigation,” the defense writes.

“This case merely has to survive a motion to dismiss to launch a thousand more expensive lawsuits, chilling a broad variety of lawful expression and reporting that merely adopts the common practice of linking to the material that is the subject of the report.”

The defense says that there are several problems with Playboy’s arguments. Among other things, Boing Boing argues that it did nothing to cause the unauthorized posting of Playboy’s work on Imgur and YouTube.

Another key argument is that linking to copyright-infringing material should be considered fair use, since it was for purposes of criticism, commentary, and news reporting.

“Settled precedent requires dismissal, both because Boing Boing did not induce or materially contribute to any copyright infringement and, in the alternative, because Boing Boing engaged in fair use,” the defense writes.

Instead of going after Boing Boing for contributory infringement, Playboy could actually try to uncover the people who shared the infringing material, they argue. There is nothing that prevents them from doing so.

After hearing the arguments from both sides it is now up to the court to decide how to proceed. Given what’s at stake, the eventual outcome in this case is bound to set a crucial precedent.

A copy of Boing Boing’s reply is available here (pdf).


Playboy Brands Boing Boing a “Clickbait” Site With No Fair Use Defense

Post Syndicated from Andy original https://torrentfreak.com/playboy-brands-boing-boing-a-clickbait-site-with-no-fair-use-defense-180126/

Early 2016, Boing Boing co-editor Xeni Jardin posted an article in which she linked to an archive containing every Playboy centerfold image to date.

“Kind of amazing to see how our standards of hotness, and the art of commercial erotic photography, have changed over time,” Jardin noted.

While Boing Boing had nothing to do with the compilation, uploading, or storing of the Imgur-based archive, Playboy took exception to the popular blog linking to the album.

Noting that Jardin had referred to the archive uploader as a “wonderful person”, the adult publication responded with a lawsuit (pdf), claiming that Boing Boing had commercially exploited its copyrighted images.

Last week, with assistance from the Electronic Frontier Foundation, Boing Boing parent company Happy Mutants filed a motion to dismiss in which it defended its right to comment on and link to copyrighted content without that constituting infringement.

“This lawsuit is frankly mystifying. Playboy’s theory of liability seems to be that it is illegal to link to material posted by others on the web — an act performed daily by hundreds of millions of users of Facebook and Twitter, and by journalists like the ones in Playboy’s crosshairs here,” the company wrote.

EFF Senior Staff Attorney Daniel Nazer weighed in too, arguing that since Boing Boing’s reporting and commenting is protected by copyright’s fair use doctrine, the “deeply flawed” lawsuit should be dismissed.

Now, just a week later, Playboy has fired back. Opposing Happy Mutants’ request for the Court to dismiss the case, the company cites the now-famous Perfect 10 v. Amazon/Google case from 2007, in which Perfect 10 tried to prevent Google from facilitating access to infringing images.

Playboy highlights the court’s finding that Google could have been held contributorily liable – if it had knowledge that Perfect 10 images were available using its search engine, could have taken simple measures to prevent further damage, but failed to do so.

Turning to Boing Boing’s conduct, Playboy says that the company knew it was linking to infringing content, could have taken steps to prevent that, but failed to do so. It then launches an attack on the site itself, offering disparaging comments concerning its activities and business model.

“This is an important case. At issue is whether clickbait sites like Happy Mutants’ Boing Boing weblog — a site designed to attract viewers and encourage them to click on links in order to generate advertising revenue — can knowingly find, promote, and profit from infringing content with impunity,” Playboy writes.

“Clickbait sites like Boing Boing are not known for creating original content. Rather, their business model is based on ‘collecting’ interesting content created by others. As such, they effectively profit off the work of others without actually creating anything original themselves.”

Playboy notes that while sites like Boing Boing are within their rights to leverage works created by others, courts in the US and overseas have ruled that knowingly linking to infringing content is unacceptable.

Even given these conditions, Playboy argues, Happy Mutants and the EFF now want the Court to dismiss the case so that sites are free to “not only encourage, facilitate, and induce infringement, but to profit from those harmful activities.”

Claiming that Boing Boing’s only reason for linking to the infringing album was to “monetize the web traffic that over fifty years of Playboy photographs would generate”, Playboy insists that the site and parent company Happy Mutants was properly charged with copyright infringement.

Playboy also dismisses Boing Boing’s argument that a link to infringing content cannot result in liability due to the link having both infringing and substantial non-infringing uses.

First citing the Betamax case, which found that maker Sony could not be held liable for infringement because its video recorders had substantial non-infringing uses, Playboy counters with the Grokster decision, which held that a distributor of a product could be liable for infringement, if there was an intent to encourage or support infringement.

“In this case, Happy Mutants’ offending link — which does nothing more than support infringing content — is good for nothing but promoting infringement and there is no legitimate public interest in its unlicensed availability,” Playboy notes.

In its motion to dismiss, Happy Mutants also argued that unless Playboy could identify users who “in fact downloaded — rather than simply viewing — the material in question,” the case should be dismissed. However, Playboy rejects the argument, claiming it is based on an erroneous interpretation of the law.

Citing the Grokster decision once more, the adult publisher notes that the Supreme Court found that someone infringes contributorily when they intentionally induce or encourage direct infringement.

“The argument that contributory infringement only lies where the defendant’s actions result in further infringement ignores the ‘or’ and collapses ‘inducing’ and ‘encouraging’ into one thing when they are two distinct things,” Playboy writes.

As for Boing Boing’s four classic fair use arguments, the publisher describes these as “extremely weak” and proceeds to hit them one by one.

In respect of the purpose and character of the use, Playboy discounts Boing Boing’s position that the aim of its post was to show “how our standards of hotness, and the art of commercial erotic photography, have changed over time.” The publisher argues that is the exact same purpose of Playboy magazine, while highlighting its publication Playboy: The Complete Centerfolds, 1953-2016.

Moving on to the second factor of fair use – the nature of the copyrighted work – Playboy notes that an entire album of artwork is involved, rather than just a single image.

On the third factor, concerning the amount and substantiality of the original work used, Playboy argues that in order to publish an opinion on how “standards of hotness” had developed over time, there was no need to link to all of the pictures in the archive.

“Had only representative images from each decade, or perhaps even each year, been taken, this would be a very different case — but Happy Mutants cannot dispute that it knew it was linking to an illegal library of ‘Every Playboy Playmate Centerfold Ever’ since that is what it titled its blog post,” Playboy notes.

Finally, when considering the effect of the use upon the potential market for or value of the copyrighted work, Playboy says its archive of images continues to be monetized and Boing Boing’s use of infringing images jeopardizes that.

“Given that people are generally not going to pay for what is freely available, it is disingenuous of Happy Mutants to claim that promoting the free availability of infringing archives of Playboy’s work for viewing and downloading is not going to have an adverse effect on the value or market of that work,” the publisher adds.

While it appears the parties agree on very little, there is agreement on one key aspect of the case – its wider importance.

On the one hand, Playboy insists that a finding in its favor will ensure that people can’t commercially exploit infringing content with impunity. On the other, Boing Boing believes that the health of the entire Internet is at stake.

“The world can’t afford a judgment against us in this case — it would end the web as we know it, threatening everyone who publishes online, from us five weirdos in our basements to multimillion-dollar, globe-spanning publishing empires like Playboy,” the company concludes.

Playboy’s opposition to Happy Mutants’ motion to dismiss can be found here (pdf)


New Book Coming in September: "Click Here to Kill Everybody"

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/01/new_book_coming.html

My next book is still on track for a September 2018 publication. Norton is still the publisher. The title is now Click Here to Kill Everybody: Peril and Promise on a Hyperconnected Planet, which I generally refer to as CH2KE.

The table of contents has changed since I last blogged about this, and it now looks like this:

  • Introduction: Everything is Becoming a Computer
  • Part 1: The Trends
    • 1. Computers are Still Hard to Secure
    • 2. Everyone Favors Insecurity
    • 3. Autonomy and Physical Agency Bring New Dangers
    • 4. Patching is Failing as a Security Paradigm
    • 5. Authentication and Identification are Getting Harder
    • 6. Risks are Becoming Catastrophic
  • Part 2: The Solutions
    • 7. What a Secure Internet+ Looks Like
    • 8. How We Can Secure the Internet+
    • 9. Government is Who Enables Security
    • 10. How Government Can Prioritize Defense Over Offense
    • 11. What’s Likely to Happen, and What We Can Do in Response
    • 12. Where Policy Can Go Wrong
    • 13. How to Engender Trust on the Internet+
  • Conclusion: Technology and Policy, Together

Two questions for everyone.

1. I’m not really happy with the subtitle. It needs to be descriptive, to counterbalance the admittedly clickbait title. It also needs to telegraph: “everyone needs to read this book.” I’m taking suggestions.

2. In the book I need a word for the Internet plus the things connected to it plus all the data and processing in the cloud. I’m using the word “Internet+,” and I’m not really happy with it. I don’t want to invent a new word, but I need to strongly signal that what’s coming is much more than just the Internet — and I can’t find any existing word. Again, I’m taking suggestions.

Power data ingestion into Splunk using Amazon Kinesis Data Firehose

Post Syndicated from Tarik Makota original https://aws.amazon.com/blogs/big-data/power-data-ingestion-into-splunk-using-amazon-kinesis-data-firehose/

In late September, during the annual Splunk .conf, Splunk and Amazon Web Services (AWS) jointly announced that Amazon Kinesis Data Firehose now supports Splunk Enterprise and Splunk Cloud as a delivery destination. This native integration between Splunk Enterprise, Splunk Cloud, and Amazon Kinesis Data Firehose is designed to make AWS data ingestion setup seamless, while offering a secure and fault-tolerant delivery mechanism. We want to enable customers to monitor and analyze machine data from any source and use it to deliver operational intelligence and optimize IT, security, and business performance.

With Kinesis Data Firehose, customers can use a fully managed, reliable, and scalable solution for streaming data to Splunk. In this post, we tell you a bit more about the Kinesis Data Firehose and Splunk integration. We also show you how to ingest large amounts of data into Splunk using Kinesis Data Firehose.

Push vs. Pull data ingestion

Presently, customers use a combination of two ingestion patterns, primarily based on data source and volume, in addition to existing company infrastructure and expertise:

  1. Pull-based approach: Using dedicated pollers running the popular Splunk Add-on for AWS to pull data from various AWS services such as Amazon CloudWatch or Amazon S3.
  2. Push-based approach: Streaming data directly from AWS to Splunk HTTP Event Collector (HEC) by using AWS Lambda. Examples of applicable data sources include CloudWatch Logs and Amazon Kinesis Data Streams.
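At its core, the push-based approach amounts to POSTing JSON-wrapped events to the HEC endpoint. As a rough sketch (the flow-log line, source name, and endpoint details below are made up for illustration), the envelope a forwarder would build looks like this:

```python
import json

def build_hec_event(message, sourcetype="aws:cloudwatch:vpcflow", source="lambda:vpc-flow"):
    """Wrap a raw log line in the Splunk HEC event envelope."""
    return {"event": message, "sourcetype": sourcetype, "source": source}

# Hypothetical VPC flow log line
body = json.dumps(build_hec_event(
    "2 123456789012 eni-abc123de 10.0.1.5 10.0.2.8 443 49152 6 10 840 ACCEPT OK"))

# A forwarder would POST `body` to https://<splunk-host>:8088/services/collector/event
# with the header "Authorization: Splunk <hec-token>".
```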

The pull-based approach offers data delivery guarantees such as retries and checkpointing out of the box. However, it requires more operational effort to manage and orchestrate the dedicated pollers, which commonly run on Amazon EC2 instances. With this setup, you pay for the infrastructure even when it’s idle.

On the other hand, the push-based approach offers a low-latency scalable data pipeline made up of serverless resources like AWS Lambda sending directly to Splunk indexers (by using Splunk HEC). This approach translates into lower operational complexity and cost. However, if you need guaranteed data delivery then you have to design your solution to handle issues such as a Splunk connection failure or Lambda execution failure. To do so, you might use, for example, AWS Lambda Dead Letter Queues.

How about getting the best of both worlds?

Let’s go over the new integration’s end-to-end solution and examine how Kinesis Data Firehose and Splunk together expand the push-based approach into a native AWS solution for applicable data sources.

By using a managed service like Kinesis Data Firehose for data ingestion into Splunk, we provide out-of-the-box reliability and scalability. One of the pain points of the old approach was the overhead of managing the data collection nodes (Splunk heavy forwarders). With the new Kinesis Data Firehose to Splunk integration, there are no forwarders to manage or set up. Data producers (1) are configured through the AWS Management Console to drop data into Kinesis Data Firehose.

You can also create your own data producers. For example, you can drop data into a Firehose delivery stream by using Amazon Kinesis Agent, or by using the Firehose API (PutRecord(), PutRecordBatch()), or by writing to a Kinesis Data Stream configured to be the data source of a Firehose delivery stream. For more details, refer to Sending Data to an Amazon Kinesis Data Firehose Delivery Stream.
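As one sketch of a custom producer (the stream name is hypothetical, and the actual API call is shown commented out since it needs AWS credentials), a record might be prepared like this:

```python
import json

def make_firehose_record(event: dict) -> dict:
    # Firehose treats Data as an opaque blob; newline-delimiting the JSON
    # lets Splunk split the stream back into individual events.
    return {"Data": (json.dumps(event) + "\n").encode("utf-8")}

record = make_firehose_record({"action": "ACCEPT", "bytes": 840})

# With AWS credentials configured, delivery would look like:
#   import boto3
#   firehose = boto3.client("firehose")
#   firehose.put_record(DeliveryStreamName="FirehoseSplunkDeliveryStream",
#                       Record=record)
```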

You might need to transform the data before it goes into Splunk for analysis. For example, you might want to enrich it, or filter or anonymize sensitive data. You can do so using AWS Lambda. In this scenario, Kinesis Data Firehose buffers the incoming source data, sends it to the specified Lambda function (2), and then buffers the transformed data before delivering it to the Splunk cluster. Kinesis Data Firehose provides Lambda blueprints that you can use to create a Lambda function for data transformation.
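A minimal Python sketch of such a transformation function, modeled on the CloudWatch Logs Processor blueprint (the real blueprint also handles details like re-ingesting oversized batches, which are omitted here):

```python
import base64
import gzip
import json

def handler(event, context=None):
    """Decompress CloudWatch Logs payloads and return them in the
    record-transformation output model that Firehose expects:
    one output record per input recordId, with a result status."""
    output = []
    for record in event["records"]:
        payload = json.loads(gzip.decompress(base64.b64decode(record["data"])))
        if payload.get("messageType") != "DATA_MESSAGE":
            # Control messages carry no log events; drop them.
            output.append({"recordId": record["recordId"], "result": "Dropped"})
            continue
        text = "".join(e["message"] + "\n" for e in payload["logEvents"])
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(text.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```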

Systems fail all the time. Let’s see how this integration handles outside failures to guarantee data durability. In cases when Kinesis Data Firehose can’t deliver data to the Splunk Cluster, data is automatically backed up to an S3 bucket. You can configure this feature while creating the Firehose delivery stream (3). You can choose to back up all data or only the data that’s failed during delivery to Splunk.

In addition to using S3 for data backup, this Firehose integration with Splunk supports Splunk Indexer Acknowledgments to guarantee event delivery. This feature is configured on Splunk’s HTTP Event Collector (HEC) (4). It ensures that HEC returns an acknowledgment to Kinesis Data Firehose only after data has been indexed and is available in the Splunk cluster (5).

Now let’s look at a hands-on exercise that shows how to forward VPC flow logs to Splunk.

How-to guide

To process VPC flow logs, we implement the following architecture.

Amazon Virtual Private Cloud (Amazon VPC) delivers flow log files into an Amazon CloudWatch Logs group. Using a CloudWatch Logs subscription filter, we set up real-time delivery of CloudWatch Logs to a Kinesis Data Firehose stream.

Data coming from CloudWatch Logs is gzip compressed. To handle this compression, we configure a Lambda-based data transformation in Kinesis Data Firehose to decompress the data and deposit it back into the stream. Firehose then delivers the decompressed logs to the Splunk HTTP Event Collector (HEC).

If delivery to the Splunk HEC fails, Firehose deposits the logs into an Amazon S3 bucket. You can then ingest the events from S3 using an alternate mechanism such as a Lambda function.

When data reaches Splunk (Enterprise or Cloud), Splunk parsing configurations (packaged in the Splunk Add-on for Kinesis Data Firehose) extract and parse all fields. They make data ready for querying and visualization using Splunk Enterprise and Splunk Cloud.


Install the Splunk Add-on for Amazon Kinesis Data Firehose

The Splunk Add-on for Amazon Kinesis Data Firehose enables Splunk (be it Splunk Enterprise, Splunk App for AWS, or Splunk Enterprise Security) to use data ingested from Amazon Kinesis Data Firehose. Install the Add-on on all the indexers with an HTTP Event Collector (HEC). The Add-on is available for download from Splunkbase.

HTTP Event Collector (HEC)

Before you can use Kinesis Data Firehose to deliver data to Splunk, set up the Splunk HEC to receive the data. From Splunk web, go to the Setting menu, choose Data Inputs, and choose HTTP Event Collector. Choose Global Settings, ensure All tokens is enabled, and then choose Save. Then choose New Token to create a new HEC endpoint and token. When you create a new token, make sure that Enable indexer acknowledgment is checked.

When prompted to select a source type, select aws:cloudwatch:vpcflow.

Create an S3 backsplash bucket

To provide for situations in which Kinesis Data Firehose can’t deliver data to the Splunk Cluster, we use an S3 bucket to back up the data. You can configure this feature to back up all data or only the data that’s failed during delivery to Splunk.

Note: Bucket names are globally unique. Thus, you can’t use tmak-backsplash-bucket.

aws s3 create-bucket --bucket tmak-backsplash-bucket --create-bucket-configuration LocationConstraint=ap-northeast-1

Create an IAM role for the Lambda transform function

Firehose triggers an AWS Lambda function that transforms the data in the delivery stream. Let’s first create a role for the Lambda function called LambdaBasicRole.

Note: You can also set this role up when creating your Lambda function.

$ aws iam create-role --role-name LambdaBasicRole --assume-role-policy-document file://TrustPolicyForLambda.json

Here is TrustPolicyForLambda.json.

  "Version": "2012-10-17",
  "Statement": [
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      "Action": "sts:AssumeRole"


After the role is created, attach the managed Lambda basic execution policy to it.

$ aws iam attach-role-policy \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole \
  --role-name LambdaBasicRole


Create a Firehose Stream

On the AWS console, open the Amazon Kinesis service, go to the Firehose console, and choose Create Delivery Stream.

In the next section, you can specify whether you want to use an inline Lambda function for transformation. Because incoming CloudWatch Logs are gzip compressed, choose Enabled for Record transformation, and then choose Create new.

From the list of the available blueprint functions, choose Kinesis Data Firehose CloudWatch Logs Processor. This function unzips the data and places it back into the Firehose stream in compliance with the record transformation output model.

Enter a name for the Lambda function, choose Choose an existing role, and then choose the role you created earlier. Then choose Create Function.

Go back to the Firehose Stream wizard, choose the Lambda function you just created, and then choose Next.

Select Splunk as the destination, and enter your Splunk HTTP Event Collector information.

Note: Amazon Kinesis Data Firehose requires the Splunk HTTP Event Collector (HEC) endpoint to be terminated with a valid CA-signed certificate matching the DNS hostname used to connect to your HEC endpoint. You receive delivery errors if you are using a self-signed certificate.

In this example, we only back up logs that fail during delivery.

To monitor your Firehose delivery stream, enable error logging. Doing this means that you can monitor record delivery errors.

Create an IAM role for the Firehose stream by choosing Create new, or Choose. Doing this brings you to a new screen. Choose Create a new IAM role, give the role a name, and then choose Allow.

If you look at the policy document, you can see that the role gives Kinesis Data Firehose permission to publish error logs to CloudWatch, execute your Lambda function, and put records into your S3 backup bucket.

You now get a chance to review and adjust the Firehose stream settings. When you are satisfied, choose Create Stream. You get a confirmation once the stream is created and active.

Create a VPC Flow Log

To send events from Amazon VPC, you need to set up a VPC flow log. If you already have a VPC flow log you want to use, you can skip to the “Publish CloudWatch to Kinesis Data Firehose” section.

On the AWS console, open the Amazon VPC service. Then choose VPC, Your VPC, and choose the VPC you want to send flow logs from. Choose Flow Logs, and then choose Create Flow Log. If you don’t have an IAM role that allows your VPC to publish logs to CloudWatch, choose Set Up Permissions and Create new role. Use the defaults when presented with the screen to create the new IAM role.

Once active, your VPC flow log should look like the following.

Publish CloudWatch to Kinesis Data Firehose

When you generate traffic to or from your VPC, the log group is created in Amazon CloudWatch. The new log group has no subscription filter, so set up a subscription filter. Setting this up establishes a real-time data feed from the log group to your Firehose delivery stream.

At present, you have to use the AWS Command Line Interface (AWS CLI) to create a CloudWatch Logs subscription to a Kinesis Data Firehose stream. However, you can use the AWS console to create subscriptions to Lambda and Amazon Elasticsearch Service.

To allow CloudWatch to publish to your Firehose stream, you need to give it permissions.

$ aws iam create-role --role-name CWLtoKinesisFirehoseRole --assume-role-policy-document file://TrustPolicyForCWLToFireHose.json

Here is the content for TrustPolicyForCWLToFireHose.json.

  "Statement": {
    "Effect": "Allow",
    "Principal": { "Service": "logs.us-east-1.amazonaws.com" },
    "Action": "sts:AssumeRole"


Attach the policy to the newly created role.

$ aws iam put-role-policy \
    --role-name CWLtoKinesisFirehoseRole \
    --policy-name Permissions-Policy-For-CWL \
    --policy-document file://PermissionPolicyForCWLToFireHose.json

Here is the content for PermissionPolicyForCWLToFireHose.json.

        "Resource":["arn:aws:firehose:us-east-1:YOUR-AWS-ACCT-NUM:deliverystream/ FirehoseSplunkDeliveryStream"]

Finally, create a subscription filter.

$ aws logs put-subscription-filter \
   --log-group-name "/vpc/flowlog/FirehoseSplunkDemo" \
   --filter-name "Destination" \
   --filter-pattern "" \
   --destination-arn "arn:aws:firehose:us-east-1:YOUR-AWS-ACCT-NUM:deliverystream/FirehoseSplunkDeliveryStream" \
   --role-arn "arn:aws:iam::YOUR-AWS-ACCT-NUM:role/CWLtoKinesisFirehoseRole"

The preceding AWS CLI command does not return any acknowledgment when it succeeds. To validate that your CloudWatch log group is subscribed to your Firehose delivery stream, check the CloudWatch console.

As soon as the subscription filter is created, the real-time log data from the log group goes into your Firehose delivery stream. Your stream then delivers it to your Splunk Enterprise or Splunk Cloud environment for querying and visualization. The screenshot following is from Splunk Enterprise.

In addition, you can monitor and view metrics associated with your delivery stream using the AWS console.


Although our walkthrough uses VPC Flow Logs, the pattern can be used in many other scenarios. These include ingesting data from AWS IoT, other CloudWatch logs and events, and Kinesis Streams or other data sources using the Kinesis Agent or Kinesis Producer Library. We also used the Lambda blueprint Kinesis Data Firehose CloudWatch Logs Processor to transform streaming records from Kinesis Data Firehose; however, depending on your use case, you might need a different Lambda blueprint, or no record transformation at all. For an additional use case for Kinesis Data Firehose, check out the This is My Architecture video, which discusses how to securely centralize cross-account data analytics using Kinesis and Splunk.


Additional Reading

If you found this post useful, be sure to check out Integrating Splunk with Amazon Kinesis Streams and Using Amazon EMR and Hunk for Rapid Response Log Analysis and Review.

About the Authors

Tarik Makota is a solutions architect with the Amazon Web Services Partner Network. He provides technical guidance, design advice and thought leadership to AWS’ most strategic software partners. His career includes work in an extremely broad range of software development and architecture roles across ERP, financial printing, benefit delivery and administration, and financial services. He holds an M.S. in Software Development and Management from Rochester Institute of Technology.




Roy Arsan is a solutions architect in the Splunk Partner Integrations team. He has a background in product development, cloud architecture, and building consumer and enterprise cloud applications. More recently, he has architected Splunk solutions on major cloud providers, including an AWS Quick Start for Splunk that enables AWS users to easily deploy distributed Splunk Enterprise straight from their AWS console. He’s also the co-author of the AWS Lambda blueprints for Splunk. He holds an M.S. in Computer Science Engineering from the University of Michigan.




Stretch for PCs and Macs, and a Raspbian update

Post Syndicated from Simon Long original https://www.raspberrypi.org/blog/stretch-pcs-macs-raspbian-update/

Today, we are launching the first Debian Stretch release of the Raspberry Pi Desktop for PCs and Macs, and we’re also releasing the latest version of Raspbian Stretch for your Pi.

Raspberry Pi Desktop Stretch splash screen

For PCs and Macs

When we released our custom desktop environment on Debian for PCs and Macs last year, we were slightly taken aback by how popular it turned out to be. We really only created it as a result of one of those “Wouldn’t it be cool if…” conversations we sometimes have in the office, so we were delighted by the Pi community’s reaction.

Seeing how keen people were on the x86 version, we decided that we were going to try to keep releasing it alongside Raspbian, with the ultimate aim being to make simultaneous releases of both. This proved to be tricky, particularly with the move from the Jessie version of Debian to the Stretch version this year. However, we have now finished the job of porting all the custom code in Raspbian Stretch to Debian, and so the first Debian Stretch release of the Raspberry Pi Desktop for your PC or Mac is available from today.

The new Stretch releases

As with the Jessie release, you can either run this as a live image from a DVD, USB stick, or SD card or install it as the native operating system on the hard drive of an old laptop or desktop computer. Please note that installing this software will erase anything else on the hard drive — do not install this over a machine running Windows or macOS that you still need to use for its original purpose! It is, however, safe to boot a live image on such a machine, since your hard drive will not be touched by this.

We’re also pleased to announce that we are releasing the latest version of Raspbian Stretch for your Pi today. The Pi and PC versions are largely identical: as before, there are a few applications (such as Mathematica) which are exclusive to the Pi, but the user interface, desktop, and most applications will be exactly the same.

For Raspbian, this new release is mostly bug fixes and tweaks over the previous Stretch release, but there are one or two changes you might notice.

File manager

The file manager included as part of the LXDE desktop (on which our desktop is based) is a program called PCManFM, and it’s very feature-rich; there’s not much you can’t do in it. However, having used it for a few years, we felt that it was perhaps more complex than it needed to be — the sheer number of menu options and choices made some common operations more awkward than they needed to be. So to try to make file management easier, we have implemented a cut-down mode for the file manager.

Raspberry Pi Desktop Stretch - file manager

Most of the changes are to do with the menus. We’ve removed a lot of options that most people are unlikely to change, and moved some other options into the Preferences screen rather than the menus. The two most common settings people tend to change — how icons are displayed and sorted — are now options on the toolbar and in a top-level menu rather than hidden away in submenus.

The sidebar now only shows a single hierarchical view of the file system, and we’ve tidied the toolbar and updated the icons to make them match our house style. We’ve removed the option for a tabbed interface, and we’ve stomped a few bugs as well.

One final change was to make it possible to rename a file just by clicking on its icon to highlight it, and then clicking on its name. This is the way renaming works on both Windows and macOS, and it’s always seemed slightly awkward that Unix desktop environments tend not to support it.

As with most of the other changes we’ve made to the desktop over the last few years, the intention is to make it simpler to use, and to ease the transition from non-Unix environments. But if you really don’t like what we’ve done and long for the old file manager, just untick the box for Display simplified user interface and menus in the Layout page of Preferences, and everything will be back the way it was!

Raspberry Pi Desktop Stretch - preferences GUI

Battery indicator for laptops

One important feature missing from the previous release was an indication of the amount of battery life. Eben runs our desktop on his Mac, and he was becoming slightly irritated by having to keep rebooting into macOS just to check whether his battery was about to die — so fixing this was a priority!

We’ve added a battery status icon to the taskbar; this shows current percentage charge, along with whether the battery is charging, discharging, or connected to the mains. When you hover over the icon with the mouse pointer, a tooltip with more details appears, including the time remaining if the battery can provide this information.

Raspberry Pi Desktop Stretch - battery indicator

While this battery monitor is mainly intended for the PC version, it also supports the first-generation pi-top — to see it, you’ll only need to make sure that I2C is enabled in Configuration. A future release will support the new second-generation pi-top.

New PC applications

We have included a couple of new applications in the PC version. One is called PiServer — this allows you to set up an operating system, such as Raspbian, on the PC which can then be shared by a number of Pi clients networked to it. It is intended to make it easy for classrooms to have multiple Pis all running exactly the same software, and for the teacher to have control over how the software is installed and used. PiServer is quite a clever piece of software, and it’ll be covered in more detail in another blog post in December.

We’ve also added an application which allows you to easily use the GPIO pins of a Pi Zero connected via USB to a PC in applications using Scratch or Python. This makes it possible to run the same physical computing projects on the PC as you do on a Pi! Again, we’ll tell you more in a separate blog post this month.

Both of these applications are included as standard on the PC image, but not on the Raspbian image. You can run them on a Pi if you want — both can be installed from apt.

How to get the new versions

New images for both Raspbian and Debian versions are available from the Downloads page.

It is possible to update existing installations of both Raspbian and Debian versions. For Raspbian, this is easy: just open a terminal window and enter

sudo apt-get update
sudo apt-get dist-upgrade

Updating Raspbian on your Raspberry Pi


It is slightly more complex for the PC version, as the previous release was based around Debian Jessie. You will need to edit the files /etc/apt/sources.list and /etc/apt/sources.list.d/raspi.list, using sudo to do so. In both files, change every occurrence of the word “jessie” to “stretch”. When that’s done, do the following:

sudo apt-get update 
sudo dpkg --force-depends -r libwebkitgtk-3.0-common
sudo apt-get -f install
sudo apt-get dist-upgrade
sudo apt-get install python3-thonny
sudo apt-get install sonic-pi=2.10.0~repack-rpt1+2
sudo apt-get install piserver
sudo apt-get install usbbootgui

At several points during the upgrade process, you will be asked if you want to keep the current version of a configuration file or to install the package maintainer’s version. In every case, keep the existing version, which is the default option. The update may take an hour or so, depending on your network connection.

As with all software updates, there is the possibility that something may go wrong during the process, which could lead to your operating system becoming corrupted. Therefore, we always recommend making a backup first.

Enjoy the new versions, and do let us know any feedback you have in the comments or on the forums!

The post Stretch for PCs and Macs, and a Raspbian update appeared first on Raspberry Pi.

‘Netflix’ Takedown Request Targets “Stranger Things” Subreddit (Update)

Post Syndicated from Ernesto original https://torrentfreak.com/netflix-takedown-request-targets-stranger-things-subreddit-171126/

Netflix offers a great selection of movies and TV-shows and dozens of millions of people can’t go a week without it.

Netflix is seen as an alternative to piracy. However, since Netflix’s priorities are shifting more to the production of original content, piracy is also a problem.

The streaming service now has its own anti-piracy unit and works with third-party vendors to remove unauthorized content from the Internet. This includes links to their shows in Google’s search results.

While most requests are legitimate, a recent takedown notice targeting “Stranger Things,” was a bit off. Tucked in between various pirate sites, we spotted articles from news sites Express and The Wrap.

(Update: The notice in question appears to be fake/fraudulent, see update below. This is potentially an even bigger problem.)


The Express article has an obvious clickbait title aimed to attract freeloaders: “Stranger Things season 2 streaming – How to watch Stranger Things online for FREE in UK.”

While there are no references to infringing content in the piece, it’s at least understandable that Netflix’s anti-piracy partner was confused by it. The Wrap article, however, doesn’t even hint at anything piracy related.

That’s not all though. Netflix’s takedown request also lists the “Stranger Things” subreddit. This community page has nearly a quarter million followers and explicitly forbids any pirated content. Still, Netflix wanted it removed from Google’s search results.

Stranger Things subreddit

To give Netflix the benefit of doubt, it’s always possible that a link to pirated content slipped through at the time the notice was sent. But, if that was the case they should have at least targeted the link to the full Reddit post as well.

The more likely scenario is that there was some sort of hiccup in the automated takedown software, or perhaps a human error of some kind. Stranger things have happened.

The good news is that Google came to the rescue. After reviewing the takedown notice, the three mentioned links were discarded. This means that the subreddit is still available in Google’s search results. For now.

Reddit itself is also quite skilled at spotting faulty takedown requests. While it’s unknown whether they were contacted directly by Netflix’s anti-piracy partner, the company rejects more than half of all DMCA takedown requests it receives.

Update: A spokesman from IP Arrow, the company listed as the sender, says they have nothing to do with the takedown notice. This suggests that some third party unrelated to IP Arrow or Netflix may have submitted it.

IP Arrow will ask Google to look into it. Strange things are clearly happening here.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

New – Amazon EC2 Elastic GPUs for Windows

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/new-ec2-elastic-gpus-for-windows/

Today we’re excited to announce the general availability of Amazon EC2 Elastic GPUs for Windows. An Elastic GPU is a GPU resource that you can attach to your Amazon Elastic Compute Cloud (EC2) instance to accelerate the graphics performance of your applications. Elastic GPUs come in medium (1GB), large (2GB), xlarge (4GB), and 2xlarge (8GB) sizes and are lower cost alternatives to using GPU instance types like G3 or G2 (for OpenGL 3.3 applications). You can use Elastic GPUs with many instance types allowing you the flexibility to choose the right compute, memory, and storage balance for your application. Today you can provision elastic GPUs in us-east-1 and us-east-2.

Elastic GPUs start at just $0.05 per hour for an eg1.medium. A nickel an hour. If we attach that Elastic GPU to a t2.medium ($0.065/hour) we pay a total of less than 12 cents per hour for an instance with a GPU. Previously, the cheapest graphical workstation (G2/3 class) cost 76 cents per hour. That’s over an 80% reduction in the price for running certain graphical workloads.
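A quick check of the arithmetic in that paragraph, using the hourly prices as quoted in this post (current pricing may differ):

```python
# Hourly prices (USD) as quoted in the post.
eg1_medium = 0.05   # Elastic GPU, eg1.medium: "a nickel an hour"
t2_medium = 0.065   # t2.medium instance

total = eg1_medium + t2_medium   # 0.115 -> "less than 12 cents per hour"
g2_class = 0.76                  # cheapest G2/3-class graphical workstation

reduction = 1 - total / g2_class  # ~0.849, consistent with "over an 80% reduction"
print(f"total: ${total:.3f}/hr, reduction: {reduction:.1%}")
```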

When should I use Elastic GPUs?

Elastic GPUs are best suited for applications that require a small or intermittent amount of additional GPU power for graphics acceleration and support OpenGL. Elastic GPUs support up to and including the OpenGL 3.3 API standards with expanded API support coming soon.

Elastic GPUs are not part of the hardware of your instance. Instead they’re attached through an elastic GPU network interface in your subnet which is created when you launch an instance with an Elastic GPU. The image below shows how Elastic GPUs are attached.

Since Elastic GPUs are network attached it’s important to provision an instance with adequate network bandwidth to support your application. It’s also important to make sure your instance security group allows traffic on port 2007.

Any application that can use the OpenGL APIs can take advantage of Elastic GPUs so Blender, Google Earth, SIEMENS SolidEdge, and more could all run with Elastic GPUs. Even Kerbal Space Program!

Ok, now that we know when to use Elastic GPUs and how they work, let’s launch an instance and use one.

Using Elastic GPUs

First, we’ll navigate to the EC2 console and click Launch Instance. Next we’ll select a Windows AMI such as “Microsoft Windows Server 2016 Base”, followed by an instance type. Then we’ll make sure we open the “Elastic GPU” section and allocate an eg1.medium (1GB) Elastic GPU.

We’ll also include some userdata in the advanced details section. We’ll write a quick PowerShell script to download and install our Elastic GPU software.

Start-Transcript -Path "C:\egpu_install.log" -Append
(new-object net.webclient).DownloadFile('http://ec2-elasticgpus.s3-website-us-east-1.amazonaws.com/latest', 'C:\egpu.msi')
Start-Process "msiexec.exe" -Wait -ArgumentList "/i C:\egpu.msi /qn /L*v C:\egpu_msi_install.log"
[Environment]::SetEnvironmentVariable("Path", $env:Path + ";C:\Program Files\Amazon\EC2ElasticGPUs\manager\", [EnvironmentVariableTarget]::Machine)
Restart-Computer -Force

This software sends all OpenGL API calls to the attached Elastic GPU.

Next, we’ll double-check that our security group exposes TCP port 2007 to the VPC so the Elastic GPU can connect to the instance. Finally, we’ll click Launch and wait for the instance and Elastic GPU to provision. The best way to do this is to create a separate security group that you can attach to the instance.

You can see an animation of the launch procedure below.

Alternatively we could have launched on the AWS CLI with a quick call like this:

$ aws ec2 run-instances --elastic-gpu-specification Type=eg1.2xlarge \
--image-id ami-1a2b3c4d \
--subnet subnet-11223344 \
--instance-type r4.large \
--security-groups "default" "elasticgpu-sg"

then we could have followed the Elastic GPU software installation instructions here.

We can now see our Elastic GPU is humming along and attached by checking out the Elastic GPU status in the taskbar.

We welcome any feedback on the service and you can click on the Feedback link in the bottom left corner of the GPU Status Box to let us know about your experience with Elastic GPUs.

Elastic GPU Demonstration

Ok, so we have our instance provisioned and our Elastic GPU attached. My teammates here at AWS wanted me to talk about the amazingly wonderful 3D applications you can run, but when I learned about Elastic GPUs the first thing that came to mind was Kerbal Space Program (KSP), so I’m going to run a quick test with that. After all, if you can’t launch Jebediah Kerman into space then what was the point of all of that software? I’ve downloaded KSP and added the launch parameter of -force-opengl to make sure we’re using OpenGL to do our rendering. Below you can see my poor attempt at building a spaceship – I used to build better ones. It looks pretty smooth considering we’re going over a network with a lossy remote desktop protocol.

I’d show a picture of the rocket launch but I didn’t even make it off the ground before I experienced a rapid unscheduled disassembly of the rocket. Back to the drawing board for me.

In the mean time I can check my Amazon CloudWatch metrics and see how much GPU memory I used during my brief game.

Partners, Pricing, and Documentation

To continue to build out great experiences for our customers, our 3D software partners like ANSYS and Siemens are looking to take advantage of the OpenGL APIs on Elastic GPUs, and are currently certifying Elastic GPUs for their software. You can learn more about our partnerships here.

You can find information on Elastic GPU pricing here. You can find additional documentation here.

Now, if you’ll excuse me I have some virtual rockets to build.


Raspbian Stretch has arrived for Raspberry Pi

Post Syndicated from Simon Long original https://www.raspberrypi.org/blog/raspbian-stretch/

It’s now just under two years since we released the Jessie version of Raspbian. Those of you who know that Debian run their releases on a two-year cycle will therefore have been wondering when we might be releasing the next version, codenamed Stretch. Well, wonder no longer – Raspbian Stretch is available for download today!

Disney Pixar Toy Story Raspbian Stretch Raspberry Pi

Debian releases are named after characters from Disney Pixar’s Toy Story trilogy. In case, like me, you were wondering: Stretch is a purple octopus from Toy Story 3. Hi, Stretch!

The differences between Jessie and Stretch are mostly under-the-hood optimisations, and you really shouldn’t notice any differences in day-to-day use of the desktop and applications. (If you’re really interested, the technical details are in the Debian release notes here.)

However, we’ve made a few small changes to our image that are worth mentioning.

New versions of applications

Version 3.0.1 of Sonic Pi is included – this includes a lot of new functionality in terms of input/output. See the Sonic Pi release notes for more details of exactly what has changed.

Raspbian Stretch Raspberry Pi

The Chromium web browser has been updated to version 60, the most recent stable release. This offers improved memory usage and more efficient code, so you may notice it running slightly faster than before. The visual appearance has also been changed very slightly.

Raspbian Stretch Raspberry Pi

Bluetooth audio

In Jessie, we used PulseAudio to provide support for audio over Bluetooth, but integrating this with the ALSA architecture used for other audio sources was clumsy. For Stretch, we are using the bluez-alsa package to make Bluetooth audio work with ALSA itself. PulseAudio is therefore no longer installed by default, and the volume plugin on the taskbar will no longer start and stop PulseAudio. From a user point of view, everything should still work exactly as before – the only change is that if you still wish to use PulseAudio for some other reason, you will need to install it yourself.

Better handling of other usernames

The default user account in Raspbian has always been called ‘pi’, and a lot of the desktop applications assume that this is the current user. This has been changed for Stretch, so now applications like Raspberry Pi Configuration no longer assume this to be the case. This means, for example, that the option to automatically log in as the ‘pi’ user will now automatically log in with the name of the current user instead.
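The general idea behind this change can be illustrated in a few lines of Python (a sketch of the principle only, not the actual Raspberry Pi Configuration code): rather than hardcoding ‘pi’, an application asks the system who the current user is.

```python
import getpass
import os

# Ask the system for the current user instead of assuming it is "pi".
user = getpass.getuser()
home = os.path.expanduser("~")

print(f"configuring autologin for {user} (home directory: {home})")
```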

One other change is how sudo is handled. By default, the ‘pi’ user is set up with passwordless sudo access. We are no longer assuming this to be the case, so now desktop applications which require sudo access will prompt for the password rather than simply failing to work if a user without passwordless sudo uses them.

Scratch 2 SenseHAT extension

In the last Jessie release, we added the offline version of Scratch 2. While Scratch 2 itself hasn’t changed for this release, we have added a new extension to allow the SenseHAT to be used with Scratch 2. Look under ‘More Blocks’ and choose ‘Add an Extension’ to load the extension.

This works with either a physical SenseHAT or with the SenseHAT emulator. If a SenseHAT is connected, the extension will control that in preference to the emulator.

Raspbian Stretch Raspberry Pi

Fix for Broadpwn exploit

A couple of months ago, a vulnerability was discovered in the firmware of the BCM43xx wireless chipset which is used on Pi 3 and Pi Zero W; this potentially allows an attacker to take over the chip and execute code on it. The Stretch release includes a patch that addresses this vulnerability.

There is also the usual set of minor bug fixes and UI improvements – I’ll leave you to spot those!

How to get Raspbian Stretch

As this is a major version upgrade, we recommend using a clean image; these are available from the Downloads page on our site as usual.

Upgrading an existing Jessie image is possible, but is not guaranteed to work in every circumstance. If you wish to try upgrading a Jessie image to Stretch, we strongly recommend taking a backup first – we can accept no responsibility for loss of data from a failed update.

To upgrade, first modify the files /etc/apt/sources.list and /etc/apt/sources.list.d/raspi.list. In both files, change every occurrence of the word ‘jessie’ to ‘stretch’. (Both files will require sudo to edit.)
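One way to make that substitution non-interactively is with sed. The sketch below runs against a scratch copy first (the repository line is just an illustrative example); once you’re happy with the result, the same command can be applied with sudo to the real files, ideally after taking a backup.

```shell
# Try the jessie -> stretch substitution on a scratch copy before
# touching the real files. (Example repository line for illustration only.)
printf 'deb http://mirrordirector.raspbian.org/raspbian/ jessie main contrib non-free rpi\n' \
    > /tmp/sources.list.demo

# s/jessie/stretch/g replaces every occurrence; -i edits the file in place
sed -i 's/jessie/stretch/g' /tmp/sources.list.demo
cat /tmp/sources.list.demo

# The equivalent edit on the real files would be:
# sudo sed -i 's/jessie/stretch/g' /etc/apt/sources.list /etc/apt/sources.list.d/raspi.list
```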

Then open a terminal window and execute

sudo apt-get update
sudo apt-get -y dist-upgrade

Answer ‘yes’ to any prompts. There may also be a point at which the install pauses while a page of information is shown on the screen – hold the ‘space’ key to scroll through all of this and then hit ‘q’ to continue.

Finally, if you are not using PulseAudio for anything other than Bluetooth audio, remove it from the image by entering

sudo apt-get -y purge pulseaudio*

The post Raspbian Stretch has arrived for Raspberry Pi appeared first on Raspberry Pi.

Game of Thrones Pirates Arrested For Leaking Episode Early

Post Syndicated from Andy original https://torrentfreak.com/game-of-thrones-pirates-arrested-for-leaking-episode-early-170814/

Over the past several years, Game of Thrones has become synonymous with fantastic drama and storytelling on the one hand, and Internet piracy on the other. It’s the most pirated TV show in history, hands down.

With the new season well underway, another GoT drama began to unfold early August when the then-unaired episode “The Spoils of War” began to circulate on various file-sharing and streaming sites. The leak only trumped the official release by a few days, but that didn’t stop people downloading in droves.

As previously reported, the leaked episode stated that it was “For Internal Viewing Only” at the top of the screen and on the bottom right sported a “Star India Pvt Ltd” watermark. The company commented shortly after.

“We take this breach very seriously and have immediately initiated forensic investigations at our and the technology partner’s end to swiftly determine the cause. This is a grave issue and we are taking appropriate legal remedial action,” a spokesperson said.

Now, just ten days later, that investigation has already netted its first victims. Four people have reportedly been arrested in India for leaking the episode before it aired.

“We investigated the case and have arrested four individuals for unauthorized publication of the fourth episode from season seven,” Deputy Commissioner of Police Akbar Pathan told AFP.

The report indicates that a complaint was filed by a Mumbai-based company that was responsible for storing and processing the TV episodes for an app. It has been named locally as Prime Focus Technologies, which markets itself as a Netflix “Preferred Vendor”.

It’s claimed that at least some of the men had access to login credentials for Game of Thrones episodes which were then abused for the purposes of leaking.

Local media identified the men as Bhaskar Joshi, Alok Sharma and Abhishek Ghadiyal, who were employed by Prime Focus, and Mohamad Suhail, a former employee, who was responsible for leaking the episode onto the Internet.

All of the men were based in Bangalore and were interrogated “throughout the night” at their workplace on August 11. Star India welcomed the arrests and thanked the authorities for their swift action.

“We are deeply grateful to the police for their swift and prompt action. We believe that valuable intellectual property is a critical part of the development of the creative industry and strict enforcement of the law is essential to protecting it,” the company said in a statement.

“We at Star India and Novi Digital Entertainment Private Limited stand committed and ready to help the law enforcement agencies with any technical assistance and help they may require in taking the investigation to its logical conclusion.”

The men will be held in custody until August 21 while investigations continue.


Build a Healthcare Data Warehouse Using Amazon EMR, Amazon Redshift, AWS Lambda, and OMOP

Post Syndicated from Ryan Hood original https://aws.amazon.com/blogs/big-data/build-a-healthcare-data-warehouse-using-amazon-emr-amazon-redshift-aws-lambda-and-omop/

In the healthcare field, data comes in all shapes and sizes. Despite efforts to standardize terminology, some concepts (e.g., blood glucose) are still often depicted in different ways. This post demonstrates how to convert an openly available dataset called MIMIC-III, which consists of de-identified medical data for about 40,000 patients, into an open source data model known as the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). It describes the architecture and steps for analyzing data across various disconnected sources of health datasets so you can start applying Big Data methods to health research.

Note: If you arrived at this page looking for more info on the movie Mimic 3: Sentinel, you might not enjoy this post.

OMOP overview

The OMOP CDM helps standardize healthcare data and makes it easier to analyze outcomes at a large scale. The CDM is gaining a lot of traction in the health research community, which is deeply involved in developing and adopting a common data model. Community resources are available for converting datasets, and there are software tools to help unlock your data after it’s in the OMOP format. The great advantage of converting data sources into a standard data model like OMOP is that it allows for streamlined, comprehensive analytics and helps remove the variability associated with analyzing health records from different sources.

OMOP ETL with Apache Spark

Observational Health Data Sciences and Informatics (OHDSI) provides the OMOP CDM in a variety of formats, including Apache Impala, Oracle, PostgreSQL, and SQL Server. (See the OHDSI Common Data Model repo in GitHub.) In this scenario, the data is moved to AWS to take advantage of the unbounded scale of Amazon EMR and serverless technologies, and the variety of AWS services that can help make sense of the data in a cost-effective way—including Amazon Machine Learning, Amazon QuickSight, and Amazon Redshift.

This example demonstrates an architecture that can be used to run SQL-based extract, transform, load (ETL) jobs to map any data source to the OMOP CDM. It uses MIMIC ETL code provided by Md. Shamsuzzoha Bayzid. The code was modified to run in Amazon Redshift.

Getting access to the MIMIC-III data

Before you can retrieve the MIMIC-III data, you must request access on the PhysioNet website, which is hosted on Amazon S3 as part of the Amazon Web Services (AWS) Public Dataset Program. However, you don’t need access to the MIMIC-III data to follow along with this post.

Solution architecture and loading process

The following diagram shows the architecture that is used to convert the MIMIC-III dataset to the OMOP CDM.

The data conversion process includes the following steps:

  1. The entire infrastructure is spun up using an AWS CloudFormation template. This includes the Amazon EMR cluster, Amazon SNS topics/subscriptions, an AWS Lambda function and trigger, and AWS Identity and Access Management (IAM) roles.
  2. The MIMIC-III data is read in via an Apache Spark program that is running on Amazon EMR. The files are registered as tables in Spark so that they can be queried by Spark SQL.
  3. The transformation queries are located in a separate Amazon S3 location, which is read in by Spark and executed on the newly registered tables to convert the data into OMOP form.
  4. The data is then written to a staging S3 location, where it is ready to be copied into Amazon Redshift.
  5. As each file is loaded in OMOP form into S3, the Spark program sends a message to an SNS topic that signifies that the load completed successfully.
  6. After that message is pushed, it triggers a Lambda function that consumes the message and executes a COPY command from S3 into Amazon Redshift for the appropriate table.
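Steps 5 and 6 above can be sketched in a few lines. The function below only builds the COPY statement from the SNS payload; the message shape, the table-from-key naming convention, and the IAM role shown are all assumptions for illustration, not the post's actual contract, and the database call itself is omitted:

```python
import json

def build_copy_command(sns_message, iam_role):
    """Derive the target OMOP table and build a Redshift COPY command from an
    SNS payload. Assumes (hypothetically) that the Spark job publishes JSON
    such as {"bucket": "omop-staging", "key": "omop/person/part-00000.csv"},
    where the second-to-last path component names the OMOP table."""
    payload = json.loads(sns_message)
    table = payload["key"].split("/")[-2]
    s3_path = "s3://{}/{}".format(payload["bucket"], payload["key"])
    return "COPY {} FROM '{}' IAM_ROLE '{}' CSV".format(table, s3_path, iam_role)

# A Lambda handler would read event["Records"][0]["Sns"]["Message"], build the
# command, and execute it against Amazon Redshift (connection code omitted).
msg = json.dumps({"bucket": "omop-staging", "key": "omop/person/part-00000.csv"})
print(build_copy_command(msg, "arn:aws:iam::123456789012:role/RedshiftCopyRole"))
```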

This architecture provides a scalable way to use various healthcare sources and convert them to OMOP format, where the only changes needed are in the SQL transformation files. The transformation logic is stored in an S3 bucket and is completely de-coupled from the Apache Spark program that runs on EMR and converts the data into OMOP form. This makes the transformation code portable and allows the Spark jar to be reused if other data sources are added—for example, electronic health records (EHR), billing systems, and other research datasets.

Note: For larger files, you might experience the five-minute timeout limitation in Lambda. In that scenario you can use AWS Step Functions to split the file and load it one piece at a time.
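As a rough illustration of the splitting idea (an assumption, not the post's actual Step Functions code), the byte ranges for piecewise loading could be computed as follows; a real CSV splitter would also have to respect record boundaries rather than cutting mid-row:

```python
def byte_ranges(object_size, chunk_size):
    """Split an object of object_size bytes into inclusive (start, end)
    ranges suitable for ranged S3 GETs; each piece could then be staged and
    loaded as its own Step Functions state. Sizes here are illustrative."""
    ranges = []
    start = 0
    while start < object_size:
        end = min(start + chunk_size, object_size) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

print(byte_ranges(10_000_000, 4_000_000))
```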

Scaling the solution

The transformation code runs in a Spark container that can scale out based on how you define your EMR cluster. There are no single points of failure. As your data grows, your infrastructure can grow without requiring any changes to the underlying architecture.

If you add more data sources, such as EHRs and other research data, the high-level view of the ETL would look like the following:

In this case, the loads of the different systems are completely independent. If the EHR load is four times the size that you expected and uses all the resources, it has no impact on the Research Data or HR System loads because they are in separate containers.

You can scale your EMR cluster based on the size of the data that you anticipate. For example, you can have a 50-node cluster in your container for loading EHR data and a 2-node cluster for loading the HR System. This design lets you scale resources to match what you consume, rather than paying for expensive infrastructure that sits idle.

The only code that is unique to each execution is any diffs between the CloudFormation templates (e.g., cluster size and SQL file locations) and the transformation SQL that resides in S3 buckets. The Spark jar that is executed as an EMR step is reused across all three executions.

Upgrading versions

In this architecture, upgrading the versions of Amazon EMR, Apache Hadoop, or Spark requires a one-time change to one line of code in the CloudFormation template:

"EMRC2SparkBatch": {
      "Type": "AWS::EMR::Cluster",
      "Properties": {
        "Applications": [
            "Name": "Hadoop"
            "Name": "Spark"
        "Instances": {
          "MasterInstanceGroup": {
            "InstanceCount": 1,
            "InstanceType": "m3.xlarge",
            "Market": "ON_DEMAND",
            "Name": "Master"
          "CoreInstanceGroup": {
            "InstanceCount": 1,
            "InstanceType": "m3.xlarge",
            "Market": "ON_DEMAND",
            "Name": "Core"
          "TerminationProtected": false
        "Name": "EMRC2SparkBatch",
        "JobFlowRole": { "Ref": "EMREC2InstanceProfile" },
          "ServiceRole": {
                    "Ref": "EMRRole"
        "ReleaseLabel": "emr-5.0.0",
        "VisibleToAllUsers": true      

Note that this example pins the release to emr-5.0.0 so that it runs Spark 2.0.0 rather than Spark 2.1.0, because Spark 2.1.0 does not support nulls in CSV files.

You can also select the version in the Release list in the General Configuration section of the EMR console:

The data sources all have different CloudFormation templates, so you can upgrade one data source at a time or upgrade them all together. As long as the reusable Spark jar is compatible with the new version, none of the transformation code has to change.

Executing queries on the data

After all the data is loaded, it’s easy to tear down the CloudFormation stack so you don’t pay for resources that aren’t being used:

CloudFormationManager cf = new CloudFormationManager();
cf.deleteStack(stackName); // hypothetical teardown call; see the AWS Labs repo for the actual method

This includes the EMR cluster, Lambda function, SNS topics and subscriptions, and temporary IAM roles that were created to push the data to Amazon Redshift. The S3 buckets that contain the raw MIMIC-III data and the data in OMOP form remain because they existed outside the CloudFormation stack.

You can now connect to the Amazon Redshift cluster and start executing queries on the ten OMOP tables that were created, as shown in the following example:

select *
from drug_exposure
limit 100;

OMOP analytics tools

For information about open source analytics tools that are built on top of the OMOP model, visit the OHDSI Software page.

The following are examples of data visualizations provided by Achilles, an open source visualization tool for OMOP.


This post demonstrated how to convert MIMIC-III data into OMOP form using data tools that are built for scale and flexibility. It compared the architecture against a traditional data warehouse and showed how this design scales by mixing a scale-out technology with EMR and a serverless technology with Lambda. It also showed how you can lower your costs by using CloudFormation to create your data pipeline infrastructure. And by tearing down the stack after the data is loaded, you don’t pay for idle servers.

You can find all the code in the AWS Labs GitHub repo with detailed, step-by-step instructions on how to load the data from MIMIC-III to OMOP using this design.

If you have any questions or suggestions, please add them below.

About the Author

Ryan Hood is a Data Engineer for AWS. He works on big data projects leveraging the newest AWS offerings. In his spare time, he enjoys watching the Cubs win the World Series and attempting to sous-vide anything he can find in his refrigerator.












ISP Blocks Pirate Bay But Vows to Fight Future Blocking Demands

Post Syndicated from Andy original https://torrentfreak.com/isp-blocks-pirate-bay-but-vows-to-fight-future-blocking-demands-170301/

Two weeks ago, after almost three years of legal battles, Universal Music, Sony Music, Warner Music, Nordisk Film and the Swedish Film Industry finally achieved their dream of blocking a ‘pirate’ site.

The Patent and Market Court ordered Bredbandsbolaget, the ISP at the center of the action, to block The Pirate Bay and another defunct site, Swefilmer. A few hours ago the provider barred its subscribers from accessing them, just ahead of the Court deadline.

This pioneering legal action will almost certainly open the floodgates to similar demands in the future, but if content providers think that Bredbandsbolaget will roll over and give up, they have another thing coming.

In a statement announcing that it had complied with the orders of the court, the ISP said that despite having good reasons to appeal, it had not been allowed to do so. The provider adds that it finds it unreasonable that any provider should have to block content following pressure from private interests, so it will fight all future requests.

“We are now forced to contest any future blocking demands. It is the only way for us and other Internet operators to ensure that private players should not have the last word regarding the content that should be accessible on the Internet,” Bredbandsbolaget said.

Noting that the chances of contesting a precedent-setting ruling are “small or non-existent”, the ISP added that not all providers will have the resources to fight, if they are targeted next. Fighting should be the aim though, since there are problems with the existing court order.

According to Bredbandsbolaget, the order requires it to block 100 domain names. However, the ISP says that during the trial it was not determined whether they all lead to illegal sites. In fact, some of the domains actually point to sites that are either fully legal or non-operational.

For example, in tests conducted by TF this morning the domain bay.malk.rocks led to a Minecraft forum, fattorrents.ws and magnetsearch.net/org were dead, piratewiki.info had expired, torrentdr.com was parked and ViceTorrent.com returned error 404. Also, Swefilmer.com returned a placeholder and SweHD.com was parked and for sale.

“What domains should be blocked or not blocked is therefore reliant on rightsholders’ sincerity, infallibility and the ability to make proportionate assessments,” Bredbandsbolaget warns.

“It is still unclear which body receives questions and complaints if an operator is required to mistakenly block a domain.”

In the wake of the blocking ruling two weeks ago, two other major ISPs in Sweden indicated that they too would put up a fight against blocking demands.

Bahnhof slammed the decision to block The Pirate Bay, describing the effort as signaling the “death throes” of the copyright industry.

Telia was more moderate but said it has no intention of blocking The Pirate Bay, unless it is forced to do so by law.

The full list of domains that were blocked this morning is as follows:


Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Run Mixed Workloads with Amazon Redshift Workload Management

Post Syndicated from Suresh Akena original https://aws.amazon.com/blogs/big-data/run-mixed-workloads-with-amazon-redshift-workload-management/

Mixed workloads run batch and interactive workloads (short-running and long-running queries or reports) concurrently to support business needs or demand. Typically, managing and configuring mixed workloads requires a thorough understanding of access patterns, how the system resources are being used and performance requirements.

It’s common for mixed workloads to have some processes that require higher priority than others. Sometimes, this means a certain job must complete within a given SLA. Other times, this means you only want to prevent a non-critical reporting workload from consuming too many cluster resources at any one time.

Without workload management (WLM), each query is prioritized equally, which can cause a person, team, or workload to consume excessive cluster resources for a process which isn’t as valuable as other more business-critical jobs.

This post provides guidelines on common WLM patterns and shows how you can use WLM query insights to optimize configuration in production workloads.

Workload concepts

You can use WLM to define the separation of business concerns and to prioritize the different types of concurrently running queries in the system:

  • Interactive: Software that accepts input from humans as it runs. Interactive software includes most popular programs, such as BI tools or reporting applications.
    • Short-running, read-only user queries, such as a Tableau dashboard query with low latency requirements.
    • Long-running, read-only user queries, such as a complex structured report that aggregates the last 10 years of sales data.
  • Batch: Execution of a job series in a server program without manual intervention (non-interactive); that is, the execution of a series of programs on a set, or “batch,” of inputs rather than on a single input (which would instead be a custom job).
    • Batch queries include bulk INSERT, UPDATE, and DELETE transactions, for example, ETL or ELT programs.

Amazon Redshift Workload Management

Amazon Redshift is a fully managed, petabyte-scale, columnar, massively parallel data warehouse that offers scalability, security, and high performance. Amazon Redshift provides an industry-standard JDBC/ODBC driver interface, which allows customers to connect their existing business intelligence tools and re-use existing analytics queries.

Amazon Redshift is a good fit for any type of analytical data model, for example, star and snowflake schemas, or simple de-normalized tables.

Managing workloads

Amazon Redshift Workload Management allows you to manage workloads of various sizes and complexity for specific environments. Parameter groups contain WLM configuration, which determines how many query queues are available for processing and how queries are routed to those queues. The default parameter group settings are not configurable. Create a custom parameter group to modify the settings in that group, and then associate it with your cluster. The following settings can be configured:

  • How many queries can run concurrently in each queue
  • How much memory is allocated among the queues
  • How queries are routed to queues, based on criteria such as the user who is running the query or a query label
  • Query timeout settings for a queue

When the user runs a query, WLM assigns the query to the first matching queue and executes rules based on the WLM configuration. For more information about WLM query queues, concurrency, user groups, query groups, timeout configuration, and queue hopping capability, see Defining Query Queues. For more information about the configuration properties that can be changed dynamically, see WLM Dynamic and Static Configuration Properties.
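As a simplified mental model of that first-match routing (a sketch, not Redshift's internal implementation; the queue definitions below are illustrative), the assignment works like this:

```python
def route_query(user_group, query_group, queues):
    """Return the name of the first queue whose user_groups or query_groups
    match the query, falling back to the default queue -- a toy model of
    WLM's first-matching-queue assignment."""
    for q in queues:
        if user_group in q.get("user_groups", ()) or query_group in q.get("query_groups", ()):
            return q["name"]
    return "default"

# Hypothetical configuration: ETL users in a long-running queue, BI users and
# dashboard-labeled queries in a short-running queue, everything else default.
queues = [
    {"name": "long_etl", "user_groups": ["etl_user"]},
    {"name": "short_bi", "user_groups": ["bi_user"], "query_groups": ["dashboard"]},
]
print(route_query("bi_user", None, queues))
print(route_query("analyst", None, queues))
```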

For example, the WLM configuration in the following screenshot has three queues to support ETL, BI, and other users. ETL jobs are assigned to the long-running queue and BI queries to the short-running queue. Other user queries are executed in the default queue.


Guidelines on WLM optimal cluster configuration

1. Separate the business concerns and run queries independently from each other

Create independent queues to support different business processes, such as dashboard queries and ETL. For example, creating a separate queue for one-time queries would be a good solution so that they don’t block more important ETL jobs.

Additionally, because faster queries typically use a smaller amount of memory, you can set a low percentage for WLM memory percent to use for that one-time user queue or query group.

2. Rotate the concurrency and memory allocations based on the access patterns (if applicable)

In traditional data management, ETL jobs pull the data from the source systems in a specific batch window, transform, and then load the data into the target data warehouse. In this approach, you can allocate more concurrency and memory to the BI_USER group and very limited resources to ETL_USER during business hours. After hours, you can dynamically allocate or switch the resources to ETL_USER without rebooting the cluster so that heavy, resource-intensive jobs complete very quickly.

Note: The example AWS CLI command is shown on several lines for demonstration purposes. Actual commands should not have line breaks and must be submitted as a single line. The following JSON configuration requires escaped quotes.


To change WLM settings dynamically, AWS recommends a scheduled Lambda function or scheduled data pipeline (ShellCmd).
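A scheduled Lambda function could build the wlm_json_configuration parameter value along these lines. The queue names, user groups, and numbers below are illustrative assumptions, and the actual modify_cluster_parameter_group API call is omitted:

```python
import json

def wlm_parameter(etl_slots, bi_slots, etl_mem, bi_mem):
    """Build a wlm_json_configuration parameter value that splits concurrency
    and memory between hypothetical etl_user and bi_user groups, with a small
    default queue. Returns the JSON string that the Redshift parameter expects."""
    config = [
        {"user_group": ["etl_user"], "query_concurrency": etl_slots,
         "memory_percent_to_use": etl_mem},
        {"user_group": ["bi_user"], "query_concurrency": bi_slots,
         "memory_percent_to_use": bi_mem},
        {"query_concurrency": 2},  # default queue
    ]
    return json.dumps(config)

# After business hours, shift resources toward ETL; a scheduled Lambda would
# pass this string to the modify_cluster_parameter_group API (call omitted).
print(wlm_parameter(etl_slots=8, bi_slots=2, etl_mem=70, bi_mem=20))
```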

3. Use queue hopping to optimize or support mixed workload (ETL and BI workload) continuously

WLM queue hopping allows read-only queries (BI_USER queries) to move from one queue to another without being cancelled completely. For example, as shown in the following screenshot, you can create two queues—one with a 60-second timeout for interactive queries and another with no timeout for batch queries—and add the same user group, BI_USER, to each queue. WLM automatically re-routes any timed-out BI_USER queries from the interactive queue to the batch queue and restarts them.


In this example, the ETL workload does not block the BI workload queries and the BI workload is eventually classified as batch, so that long-running, read-only queries do not block the execution of quick-running queries from the same user group.

4. Increase the slot count temporarily for resource-intensive ETL or batch queries

Amazon Redshift writes intermediate results to disk to help prevent out-of-memory errors, but disk I/O can degrade performance. The following query shows whether any active queries are currently running on disk:

SELECT query, label, is_diskbased FROM svv_query_state
WHERE is_diskbased = 't';

Query results:

query | label        | is_diskbased
1025  | hash tbl=142 | t

Typically, hashes, aggregates, and sort operators are likely to write data to disk if the system doesn’t have enough memory allocated for query processing. To fix this issue, allocate more memory to the query by temporarily increasing the number of query slots that it uses. For example, a queue with a concurrency level of 4 has 4 slots. When the slot count is set to 4, a single query uses the entire available memory of that queue. Note that assigning several slots to one query consumes the concurrency and blocks other queries from being able to run.
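The slot arithmetic can be illustrated with a toy calculation (the queue size and slot counts below are hypothetical, and this is a simplified model rather than Redshift's exact memory accounting):

```python
def query_memory_mb(queue_memory_mb, concurrency, slot_count):
    """Simplified WLM memory model: a queue's memory is divided evenly among
    its slots (concurrency), and a query receives slot_count slots' worth."""
    per_slot = queue_memory_mb / concurrency
    return per_slot * slot_count

# A queue with 8000 MB and 4 slots: one slot vs. claiming all four slots.
print(query_memory_mb(8000, 4, 1))  # one slot's share
print(query_memory_mb(8000, 4, 4))  # the whole queue's memory
```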

In the following example, I set the slot count to 4 before running the query and then reset the slot count back to 1 after the query finishes.

set wlm_query_slot_count to 4;

select
	p_brand,
	p_type,
	p_size,
	count(distinct ps_suppkey) as supplier_cnt
from
	partsupp,
	part
where
	p_partkey = ps_partkey
	and p_brand <> 'Brand#21'
	and p_type not like 'LARGE POLISHED%'
	and p_size in (26, 40, 28, 23, 17, 41, 2, 20)
	and ps_suppkey not in (
		select s_suppkey
		from supplier
		where s_comment like '%Customer%Complaints%'
	)
group by
	p_brand,
	p_type,
	p_size
order by
	supplier_cnt desc,
	p_brand,
	p_type,
	p_size;

set wlm_query_slot_count to 1; -- after query completion, resetting slot count back to 1

Note: The above TPC-H benchmark query is used for illustration purposes only.

Example insights from WLM queries

The following example queries can help answer questions you might have about your workloads:

  • What is the current query queue configuration? What is the number of query slots and the timeout defined for each queue?
  • How many queries are executed, queued, and executing per query queue?
  • What does my workload look like for each query queue per hour? Do I need to change my configuration based on the load?
  • How is my existing WLM configuration working? Which query queues should be optimized to meet the business demand?

WLM configures query queues according to internally-defined WLM service classes. The terms queue and service class are often used interchangeably in the system tables.

Amazon Redshift creates several internal queues according to these service classes along with the queues defined in the WLM configuration. Each service class has a unique ID. Service classes 1-4 are reserved for system use and the superuser queue uses service class 5. User-defined queues use service class 6 and greater.
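The service class ranges above can be captured in a small helper, useful when post-processing system-table output (a convenience sketch, not part of Redshift itself):

```python
def service_class_kind(sc):
    """Classify a WLM service class ID per the ranges described above:
    1-4 are reserved for system use, 5 is the superuser queue, and 6 and
    greater are user-defined queues."""
    if 1 <= sc <= 4:
        return "system"
    if sc == 5:
        return "superuser"
    if sc >= 6:
        return "user-defined"
    raise ValueError("unknown service class: %d" % sc)

print([service_class_kind(s) for s in (3, 5, 6, 9)])
```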

Query: Existing WLM configuration

Run the following query to check the existing WLM configuration. Four queues are configured, and each queue is assigned a number. In the results, the queue number maps to a service class (Queue #1 => ETL_USER => service class 6), with the evictable flag set to false (no query timeout defined).

select service_class, num_query_tasks, evictable, eviction_threshold, name
from stv_wlm_service_class_config
where service_class > 5;

The query above provides information about the current WLM configuration. You can automate this query using Lambda to send notifications to the operations team whenever the WLM configuration changes.

Query: Queue state

Run the following query to monitor the state of the queues, the memory allocation for each queue and the number of queries executed in each queue. The query provides information about the custom queues and the superuser queue.

select config.service_class, config.name
, trim (class.condition) as description
, config.num_query_tasks as slots
, config.max_execution_time as max_time
, state.num_queued_queries queued
, state.num_executing_queries executing
, state.num_executed_queries executed
from stv_wlm_classification_config class
, stv_wlm_service_class_config config
, stv_wlm_service_class_state state
where
class.action_service_class = config.service_class
and class.action_service_class = state.service_class
and config.service_class > 5
order by config.service_class;


Service class 9 is not being used in the above results, which would allow you to configure the minimum possible resources (concurrency and memory) for the default queue. Service class 6, etl_group, has executed more queries, so you may want to configure or re-assign more memory and concurrency for this group.

Query: After the last cluster restart

The following query shows the number of queries that are either executing or have completed executing by service class after the last cluster restart.

select service_class, num_executing_queries,  num_executed_queries
from stv_wlm_service_class_state
where service_class >5
order by service_class;

Service class 9 is not being used in the above results. Service class 6, etl_group, has executed more queries than any other service class. You may want to configure more memory and concurrency for this group to speed up query processing.

Query: Hourly workload for each WLM query queue

The following query returns the hourly workload for each WLM query queue. Use this query to fine-tune WLM queues that contain too many or too few slots, resulting in WLM queuing or unutilized cluster memory. You can copy this query (wlm_apex_hourly.sql) from the amazon-redshift-utils GitHub repo.

WITH
        -- Replace STL_SCAN in generate_dt_series with another table which has > 604800 rows if STL_SCAN does not
        generate_dt_series AS (select sysdate - (n * interval '1 second') as dt from (select row_number() over () as n from stl_scan limit 604800)),
        apex AS (SELECT iq.dt, iq.service_class, iq.num_query_tasks, count(iq.slot_count) as service_class_queries, sum(iq.slot_count) as service_class_slots
                FROM
                (select gds.dt, wq.service_class, wscc.num_query_tasks, wq.slot_count
                FROM stl_wlm_query wq
                JOIN stv_wlm_service_class_config wscc ON (wscc.service_class = wq.service_class AND wscc.service_class > 4)
                JOIN generate_dt_series gds ON (wq.service_class_start_time <= gds.dt AND wq.service_class_end_time > gds.dt)
                WHERE wq.userid > 1 AND wq.service_class > 4) iq
        GROUP BY iq.dt, iq.service_class, iq.num_query_tasks),
        maxes as (SELECT apex.service_class, trunc(apex.dt) as d, date_part(h,apex.dt) as dt_h, max(service_class_slots) max_service_class_slots
                        from apex group by apex.service_class, apex.dt, date_part(h,apex.dt))
SELECT apex.service_class, apex.num_query_tasks as max_wlm_concurrency, maxes.d as day, maxes.dt_h || ':00 - ' || maxes.dt_h || ':59' as hour, MAX(apex.service_class_slots) as max_service_class_slots
FROM apex
JOIN maxes ON (apex.service_class = maxes.service_class AND apex.service_class_slots = maxes.max_service_class_slots)
GROUP BY  apex.service_class, apex.num_query_tasks, maxes.d, maxes.dt_h
ORDER BY apex.service_class, maxes.d, maxes.dt_h;

For the purposes of this post, the results are broken down by service class.


In the above results, service class #6 is consistently utilized, using up to 8 slots over 24 hours. Based on these numbers, no change is required for this service class at this point.


Service class #7 can be optimized based on the above results. Two observations to note:

  • 6am-3pm or 6pm-6am (next day): The maximum number of slots used is 3. There is an opportunity to rotate concurrency and memory allocation based on these access patterns. For more information about how to rotate resources dynamically, see the guidelines section earlier in the post.
  • 3pm-6pm: Peak is observed during this period. You can leave the existing configuration during this time.


Amazon Redshift is a powerful, fully managed data warehouse that can offer significantly increased performance and lower cost in the cloud. Using the WLM feature, you can ensure that different users and processes running on the cluster receive the appropriate amount of resources to maximize performance and throughput.

If you have questions or suggestions, please leave a comment below.


About the Author

Suresh Akena is a Senior Big Data/IT Transformation architect for AWS Professional Services. He works with enterprise customers to provide leadership on large-scale data strategies, including migration to the AWS platform and big data and analytics projects, and helps them optimize and improve time to market for data-driven applications when using AWS. In his spare time, he likes to play with his 8- and 3-year-old daughters and watch movies.








Introducing PIXEL

Post Syndicated from Simon Long original https://www.raspberrypi.org/blog/introducing-pixel/

It was just over two years ago when I walked into Pi Towers for the first time. I only had the vaguest idea of what I was going to be doing, but on the first day Eben and I sat down and played with the Raspbian desktop for half an hour, then he asked me “do you think you can make it better?”


Bear in mind that at this point I’d barely ever used Linux or Xwindows, never mind made any changes to them, so when I answered “hmmm – I think so”, it was with rather more confidence than I actually felt. It was obvious that there was a lot that could be done in terms of making it a better experience for the user, and I had spent many years working in user interface design in previous jobs. But I had no idea where to start in terms of changing Raspbian. I clearly had a bit of a learning curve in front of me…

Well, that was two years ago, and I’ve learnt an awful lot since then. It’s actually surprisingly easy to hack about with the LXDE desktop once you get your head around what all the bits do, and since then I’ve been slowly chipping away at the bits that I felt would most benefit from tweaking. Stuff has slowly been becoming more and more like my original concept for the desktop; with the latest changes, I think the desktop has reached the point where it’s a complete product in its own right and should have its own name. So today, we’re announcing the release of the PIXEL desktop, which will ship with the Foundation’s Raspbian image from now on.



One of the things I said (at least partly in jest) to my colleagues in those first few weeks was that I’d quite like to rename the desktop environment once it was a bit more Pi-specific, and I had the name “pixel” in my mind about two weeks in. It was a nice reminder of my days learning to program in BASIC on the Sinclair ZX81; nowadays, everything from your TV to your phone has pixels on it, but back then it was a uniquely “computer-y” word and concept. I also like crosswords and word games, and once it occurred to me that “pixel” could be made up from the initials of words like Pi and Xwindows, the name stuck in my head and never quite went away. So PIXEL it is, which now officially stands for “Pi Improved Xwindows Environment, Lightweight”.

What’s new?

The latest set of changes are almost entirely to do with the appearance of the desktop; there are some functional changes and a few new applications, about which more below, but this is mostly about making things look nicer.

The first thing you’ll notice on rebooting is that the trail of cryptic boot messages has (mostly) gone, replaced by a splash screen. One feature which has frequently been requested is an obvious version number for our Raspbian image, and this can now be seen at the bottom-right of the splash image. We’ll update this whenever we release a new version of the image, so it should hopefully be slightly easier to know exactly what version you’re running in future.


I should mention that the code for the splash screen has been carefully written and tested, and should not slow down the Pi’s boot process; the time to go from powering on to the desktop appearing is identical, whether the splash is shown or not.

Desktop pictures

Once the desktop appears, the first thing you’ll notice is the rather stunning background image. We’re very fortunate in that Greg Annandale, one of the Foundation’s developers, is also a very talented (and very well-travelled) photographer, and he has kindly allowed us to use some of his work as desktop pictures for PIXEL. There are 16 images to choose from; you can find them in /usr/share/pixel-wallpaper/, and you can use the Appearance Settings application to choose which one you prefer. Do have a look through them, as Greg’s work is well worth seeing! If you’re curious, the EXIF data in each image will tell you where it was taken.





You’ll also notice that the icons on the taskbar, menu, and file manager have had a makeover. Sam Alder and Alex Carter, the guys responsible for all the cartoons and graphics you see on our website, have been sweating blood over these for the last few months, with Eben providing a watchful eye to make sure every pixel was exactly the right colour! We wanted something that looked businesslike enough to be appropriate for those people who use the Pi desktop for serious work, but with just a touch of playfulness, and Sam and Alex did a great job. (Some of the icons you don’t see immediately are even nicer; it’s almost worth installing some education or engineering applications just so those categories appear in the menu…)


Speaking of icons, the default is now not to show icons in individual application menus. These always made menus look a bit crowded, and didn’t really offer any improvement in usability, not least because it wasn’t always that obvious what the icon was supposed to represent… The menus look cleaner and more readable as a result, since the lack of visual clutter now makes them easier to use.

Finally on the subject of icons, in the past if your Pi was working particularly hard, you might have noticed some yellow and red squares appearing in the top-right corner of the screen, which were indications of overtemperature or undervoltage. These have now been replaced with some new symbols that make it a bit more obvious what’s actually happening; there’s a lightning bolt for undervoltage, and a thermometer for overtemperature.


If you open a window, you’ll see that the window frame design has now changed significantly. The old window design always looked a bit dated compared to what Apple and Microsoft are now shipping, so I was keen to update it. Windows now have a subtle curve on the corners, a cleaner title bar with new close / minimise / maximise icons, and a much thinner frame. One reason the frame was quite thick on the old windows was so that the grab handles for resizing were big enough to find with the mouse. To avoid this problem, the grab handles now extend slightly outside the window; if you hold the mouse pointer just outside the window which has focus, you’ll see the pointer change to show the handle.



Steve Jobs said that one thing he was insistent on about the Macintosh was that its typography was good, and it’s true that using the right fonts makes a big difference. We’ve been using the Roboto font in the desktop for the last couple of years; it’s a nice-looking modern font, and it hasn’t changed for this release. However, we have made it look better in PIXEL by including the Infinality font rendering package. This is a library of tweaks and customisations that optimises how fonts are mapped to pixels on the screen; the effect is quite subtle, but it does give a noticeable improvement in some places.


Most people have their Pi set up to automatically log in when the desktop starts, as this is the default setting for a new install. For those who prefer to log in manually each time, the login screen has been redesigned to visually match the rest of the desktop; you now see the login box (known as the “greeter”) over your chosen desktop design, with a seamless transition from greeter to desktop.


Wireless power switching

One request we have had in the past is to be able to shut off WiFi and/or Bluetooth completely, particularly on Pi 3. There are now options in the WiFi and Bluetooth menus to turn off the relevant devices. These work on the Pi 3’s onboard wireless hardware; they should also work on most external WiFi and Bluetooth dongles.

You can also now disconnect from an associated wireless access point by clicking on its entry in the WiFi menu.

New applications

There are a couple of new applications now included in the image.

RealVNC have ported their VNC server and viewer applications to Pi, and they are now integrated with the system. To enable the server, select the option on the Interfaces tab in Raspberry Pi Configuration; you’ll see the VNC menu appear on the taskbar, and you can then log in to your Pi and control it remotely from a VNC viewer.

The RealVNC viewer is also included – you can find it from the Internet section of the Applications menu – and it allows you to control other RealVNC clients, including other Pis. Have a look here on RealVNC’s site for more information.


Please note that if you already use xrdp to remotely access your Pi, this conflicts with the RealVNC server, so you shouldn’t install both at once. If you’re updating an existing image, don’t run the sudo apt-get install realvnc-vnc-server line in the instructions below. If you want to use xrdp on a clean image, first uninstall the RealVNC server with sudo apt-get purge realvnc-vnc-server before installing xrdp. (If the above paragraph means nothing to you, then you probably aren’t using xrdp, so you don’t have to worry about any of it!)

Also included is the new SenseHAT emulator, which was described in a blog post a couple of weeks ago; have a look here for all the details.



There are updates for a number of the built-in applications; these are mostly tweaks and bug fixes, but there have been improvements made to Scratch and Node-RED.

One more thing…

We’ve been shipping the Epiphany web browser for the last couple of years, but it’s now starting to show its age. So for this release (and with many thanks to Gustav Hansen from the forums for his invaluable help with this), we’re including an initial release of Chromium for the Pi. This uses the Pi’s hardware to accelerate playback of streaming video content.


We’ve preinstalled a couple of extensions; the uBlock Origin adblocker should hopefully keep intrusive adverts from slowing down your browsing experience, and the h264ify extension forces YouTube to serve videos in a format which can be accelerated by the Pi’s hardware.

Chromium is a much more demanding piece of software than Epiphany, but it runs well on Pi 2 and Pi 3; it can struggle slightly on the Pi 1 and Pi Zero, but it’s still usable. (Epiphany is still installed in case you find it useful; launch it from the command line by typing “epiphany-browser”.)

How do I get it?

The Raspbian + PIXEL image is available from the Downloads page on our website now.

To update an existing Jessie image, type the following at the command line:

sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get install -y rpi-chromium-mods
sudo apt-get install -y python-sense-emu python3-sense-emu
sudo apt-get install -y python-sense-emu-doc realvnc-vnc-viewer

and then reboot.

If you don’t use xrdp and would like to use the RealVNC server to remotely access your Pi, type the following:

sudo apt-get install -y realvnc-vnc-server

As always, your feedback on the new release is very welcome; feel free to let us know what you think in the comments or on the forums.

The post Introducing PIXEL appeared first on Raspberry Pi.

TPBClean: A ‘Safe for Work’ Pirate Bay Without Porn

Post Syndicated from Ernesto original https://torrentfreak.com/tpbclean-safe-work-pirate-bay-without-porn-160925/

Over the years, regular Pirate Bay visitors have seen plenty of scarcely dressed girls being featured on the site.

The XXX category has traditionally been one of the largest, and TPB’s ads can show quite a bit of flesh as well. While there are many people who see this adult themed content as a feature, not everyone appreciates it.

This didn’t go unnoticed by “MrClean,” a developer with quite a bit of experience when it comes to torrent proxy sites.

He was recently confronted with the issue when several Indian programmers he tried to hire for torrent-related projects refused to work on a site that listed porn and other nudity.

“Over fifty percent of those contacted refused to work on the project, not for copyright related reasons, but because they didn’t want to work on a project that had ANY links with adult content,” MrClean tells TF.

Apparently, there is a need for pornless torrent sites, so “MrClean” decided to make a sanitized torrent site for these proper folks. This is how the TPBClean proxy site was born.

“Since TPB is now the biggest torrent site again, I figured there might be people with similar feelings toward adult content that would appreciate a clean version of the bay,” MrClean explains.

TPBClean is a direct proxy of all Pirate Bay torrents, minus those in the porn categories. In addition, the site has a customized look, without any ads.

“Although many TPB users don’t mind tits in their face while searching the top 100, TPBClean users will have a more ‘Safe For Work’ experience,” MrClean notes.

People who try searching for XXX content get the following response instead: “Any Explicit/Adult Content has been Removed 🙂”. We did some test searches, and it appears to work quite well.

XXX filtered


The Pirate Bay team can usually appreciate creative initiatives, although they might find it hard to believe that anyone would be interested in a torrent site without porn.

In any case, MrClean says that he doesn’t mean to do any harm to the original Pirate Bay, which he full-heartedly supports. And thus far the public response has mainly been positive as well.

“Finally a PirateBay you can browse with your granny!” Pirate Bay proxy portal UKBay tweeted excitedly.


Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

‘Will Trump Shut Down The Pirate Bay?’

Post Syndicated from Ernesto original https://torrentfreak.com/will-trump-shut-pirate-bay-160911/

No, Trump personally can’t and won’t shut down The Pirate Bay. Period.

Excuse me for the clickbait title and the strange intro, but since it’s the topic of this opinion piece, I thought it was warranted.

Here’s what’s going on.

The torrent community is in turmoil after the shutdowns of KAT and Torrentz. We’ve written about this extensively, but there’s a rather frustrating side-effect that we haven’t discussed so far.

For some reason there’s a slew of news sites, prominently featured in search engines and on social media, that keep spreading fear and panic about a looming Pirate Bay shutdown.

These publications take every piece of file-sharing related news, often sourced from TorrentFreak, and rewrite it in a way that suggests the world’s number one torrent site may disappear, or is already gone.

Here are just a few headlines I’ve seen over the past few days. Click on the links at your own risk.

    • Pirate Bay, Extra Torrent Shutting Down; Fans In Search For Best Torrent Alternative (link)
    • The Pirate Bay (TPB) Shut Down Imminent After Service Partner Faces Piracy Lawsuit (link)
    • Goodbye The Pirate Bay? Cloudflare Under Fire For Helping TPB, Terror Groups (link)
    • The Pirate Bay to shut down soon? (link)
    • The Pirate Bay Shut Down Rumors: Once Site Goes Down, US Library Of Congress Might Be The Next Piracy Haven (link)
    • Pirate Bay to Shut Down…
    • The Pirate Bay To Shut Down Soon As Excipio Starts To Shoot And Kill Torrent Sites? (link)
    • TPB Now Leads The Pack Of Torrent Sites, But Might Shut Down Soon? List Of Top Torrent Sites Inside (link)
    • The Pirate Bay (TPB), KickassTorrents, Torrentz Shut Down: US Library of Congress As Next Alternative? (link)

    These reports have absolutely nothing to do with an apparent Pirate Bay shutdown, of course.

    The last one, for example, bizarrely connects concerns the RIAA has about access to digital works at the Library of Congress to the potential demise of TPB, which is pure nonsense.

    Pirate Bay Declared Dead


    Many other articles follow the same format, writing nonsensical trash such as the following:

    “Other torrent sites such as TorrentFreak is not happy with the growing population of The Pirate Bay but they do appreciate the role that TPB is playing in the world of torrent sites.”

    The quote above comes from The Parent Herald, which also suggests that copyright trolls plan to fine The Pirate Bay. Clearly, they have not read the TorrentFreak article on the topic, which they’re quoting, or they simply don’t understand it.

    Might Shut Down Soon?


    So why are these “news” sites reporting this type of doom and gloom? The short answer is ad views. The clickbait articles are shared on social media, appear in Google news and in search results.

    The latter can bring in thousands of views. If people Google for “The Pirate Bay,” these headlines are featured as “news” and beg to be clicked on, generating revenue for the sites in question. For the very same reason you’ll see numerous articles about KAT and Torrentz alternatives.

    Click, Click, Click


    Why are we complaining about this? Well, these news reports are picked up by other sites and shared among thousands of people. At TorrentFreak we do our best to report news as accurately as possible, and these clickbait articles go directly against this, often using our name.

    We have addressed the clickbait issue in the past but in recent months it has gotten much worse.

    While there are many different sites guilty of this practice, we recently stumbled upon a ring of related publications that all belong to the same company. They carry names such as Parent Herald, iSports Times, University Herald, and Mobile&Apps, and share a similar layout and design.

    The owner in question, according to the copyright statement, is the New York based company IQ Adnet, which is… surprise surprise, an ad network that specializes in premium digital and native advertising. That explains everything.

    There’s not much we can do about this, unfortunately, besides telling people what’s really going on and venting our frustration every now and then.

    In the meantime, we’ll be waiting for these sites to pick up the Trump angle, which shouldn’t take long.

    For the record. At TorrentFreak we don’t use pay per view ads, partly to get rid of the pageview obsession. This means that the clickbait title we used for this article doesn’t bring in any extra money.

    Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

    The latest update to Raspbian

    Post Syndicated from Simon Long original https://www.raspberrypi.org/blog/another-update-raspbian/

    No exciting new hardware announcement to tie it to this time, but we’ve just released a new version of our Raspbian image with some (hopefully) useful features. Read on for all the details of what has changed…


    When the Pi 3 launched back in February, we’d not had time to do much in terms of getting access to the new onboard Bluetooth hardware. There was a working software stack, but the UI was non-existent.

    I’d hoped to be able to use one of the existing Linux Bluetooth UIs, but on trying them all, none were really what I was looking for in terms of usability and integration with the look and feel of the desktop. I really didn’t want to write one from scratch, but that ended up being what I did, which meant a fun few weeks trying to make head or tail of the mysteries of BlueZ and D-Bus. After a few false starts, I finally got something I felt was usable, and so there is now a Bluetooth plugin for the lxpanel taskbar.


    On the taskbar, to the left of the network icon, there is now a Bluetooth icon. Clicking this opens a menu which allows you to make the Pi discoverable by other devices, or to add or remove a Bluetooth device. Selecting the ‘Add Device…’ option opens a window which will gradually populate with any discoverable Bluetooth devices which are in range – just select the one you want to pair with and press the ‘Pair’ button.


    You will then be guided through the pairing procedure, the nature of which depends on the device. With many devices (such as mice or speakers), pairing is entirely automatic and requires no user interaction; on others you may be asked to enter a code or to confirm that a code displayed on a remote device matches that shown on the Pi. Follow the prompts, and (all being well), you should be rewarded with a dialog telling you that pairing was successful.

    Paired devices are listed at the end of the Bluetooth menu – these menu entries can be used to connect or disconnect a paired device. To remove a pairing completely, use the ‘Remove Device…’ option in the menu.

    Bluetooth support is limited at this stage; you can pair with pretty much anything, but you can only usefully connect to devices which support either the Human Interface Device or Audio Sink services – in other words, mice, keyboards and other UI devices, and speakers and headsets.

    Devices should reconnect after a reboot or on powering up your Pi, but bear in mind that keyboards and mice may need you to press a key or click the mouse button to wake them from sleep when first used after a power-up.

    The Bluetooth UI should also work with an external Bluetooth dongle on platforms other than Pi 3 – I’ve successfully tested it with a Targus dongle on all the earlier platforms.

    Bluetooth audio

    The UI now supports the use of Bluetooth speakers and headsets for audio output, with a few caveats, about which more below.

    To connect an audio device, you pair it as described above – it will then be listed in the audio device menu, accessible by right-clicking the speaker icon on the taskbar.


    Selecting a Bluetooth device from the audio device menu will cause it to be selected as the default audio output device – there will be a few seconds’ pause while the connection is established. You can then use the volume control on the taskbar to control it, as for standard wired audio devices.

    There is one issue with the support for Bluetooth audio, however. Due to the way the Bluetooth stack has been written, Bluetooth devices do not appear to the system as standard ALSA audio devices – they require the use of an intermediate audio layer called PulseAudio. The PulseAudio magic is all built into the UI – you don’t need to worry about setting it up – but the problem is that not all applications are able to send audio to the PulseAudio interface, and therefore cannot output audio over Bluetooth.

    Most applications work just fine – videos and music work in the Epiphany and Iceweasel browsers, as does the command-line mplayer music player and the vlc media player. But at present neither Scratch nor Sonic Pi can output audio over Bluetooth – we are working with the authors of these programs to address this and are hopeful that both can be made compatible, so please bear with us!

    The use of PulseAudio has one other effect that may cause issues for a small number of users – specifically, if you are already using PulseAudio for anything other than interfacing with Bluetooth devices. This plugin will automatically stop the PulseAudio service whenever a standard ALSA device is selected. If you are using PulseAudio for your own purposes, it would be best to remove the volumealsa plugin from the taskbar completely to avoid this – just right-click anywhere on the taskbar, choose ‘Add/Remove Panel Items’, and remove the “Volume Control (ALSA)” item from the list.

    SD card copier

    One query which comes up a lot on the forums is about the best way to back up your Pi. People also want to know how to migrate their Raspbian install to a new SD card which is larger or smaller than the one they are using at the moment. This has been difficult with the command-line tools that we’ve recommended in the past, so there is now a new application to help with this, and you’ll find it in the menu under ‘Accessories’.


    The SD Card Copier application will copy Raspbian from one card to another – that’s pretty much all it does – but there are several useful things that you can do as a result. To use it, you will need a USB SD card writer.

    To take a common example: what if you want to back up your existing Raspbian installation? Put a blank SD card in your USB card writer and plug it into your Pi, and then launch SD Card Copier. In the ‘Copy From Device’ box, select “Internal SD Card”, and then select the USB card writer in the ‘Copy To Device’ box (where it will probably be the only device listed). Press ‘Start’, watch the messages on the screen and wait – in ten or fifteen minutes, you should have a clone of your current installation on the new SD card. You can test it by putting the newly-copied card into the Pi’s SD card slot and booting it; it should boot and look exactly the same as your original installation, with all your data and applications intact.

    You can run directly from the backup, but if you want to recover your original card from your backup, simply reverse the process – boot your Pi from the backup card, put the card to which you want to restore into the SD card writer, and repeat the process above.
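Under the hood, this kind of whole-card backup is just a raw block copy, which you can sketch with dd; note that the SD Card Copier additionally adjusts partitions to fit the destination card, which a plain dd does not. The filenames below are dummy stand-ins (a real copy would read from device nodes such as /dev/mmcblk0), so the sketch can be run harmlessly:

```shell
# Create a small fake "source card" and copy it block-for-block,
# then verify the two images are identical.
dd if=/dev/urandom of=source-card.img bs=1K count=256 2>/dev/null
dd if=source-card.img of=backup-card.img bs=4K 2>/dev/null
cmp source-card.img backup-card.img && echo "copy verified"
```

Restoring is the same operation with source and destination swapped, which is exactly what the application does for you.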

    The program does not restrict you to only copying to a card the same size as the source; you can copy to a larger card if you are running out of space on your existing one, or even to a smaller card (as long as it has enough space to store all your files – the program will warn you if there isn’t enough space). It has been designed to work with Raspbian and NOOBS images; it may work with other OSes or custom card formats, but this can’t be guaranteed.

    The only restriction is that you cannot write to the internal SD card reader, as that would overwrite the OS you are actually running, which would cause bad things to happen.

    Please also bear in mind that everything on the destination card will be overwritten by this program, so do make sure you’ve got nothing you want to keep on the destination card before you hit Start!


    This image includes the pigpio library from abyz.co.uk – this provides a unified way of accessing the Pi’s GPIO pins from Python, C and other languages. It removes the need to use sudo in programs which want to access the GPIOs, and as a result Scratch now runs sudo-less for everyone.


    One of the tools which is really useful for professional programmers is a good text editor – the simple editor provided with LXDE is fine for small tasks, but not really suitable for serious work.


    The image now includes the Geany editor, which is much better suited to big projects – it offers features like syntax highlighting, automatic indentation and management of multiple files. There’s good online help built into the program itself, or have a look at the Geany website.

    New versions of applications

    There are new versions of many of the standard programs included in the image, including Scratch, Sonic Pi, Node-RED, BlueJ and PyPy. Please see the relevant individual websites or changelists for details of what has changed in each of these.

    New kernel

    The Linux kernel has been upgraded to version 4.4. This change should have no noticeable effect for most users, but it does force the use of device tree; if you’ve been hacking about with your Raspbian install, particularly in terms of installing new hardware, you may find reading this forum post useful.
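With device tree, extra hardware is enabled through overlay and parameter lines in /boot/config.txt rather than by hand-loading kernel modules. A couple of illustrative entries (these names are the standard ones shipped with Raspbian; see /boot/overlays/README for the full list):

```
# /boot/config.txt
dtoverlay=w1-gpio    # enable the single-wire (1-wire) interface
dtparam=i2c_arm=on   # enable the I2C bus
dtparam=spi=on       # enable SPI
```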


    There are a lot of small user interface tweaks throughout the system which you may notice. Some of these include:

    • A new Shutdown Options dialog


    • The Mouse and Keyboard Settings dialog now allows you to set the delay between double-clicks of the mouse button


    • The Raspberry Pi Configuration dialog now allows you to enable or disable the single-wire interface, and to enable or disable remote access to the pigpio daemon


    • Right-clicking the Wastebasket icon on the desktop now gives the option to empty the wastebasket


    • The keyboard shortcut Ctrl-Alt-T can now be used to open a Terminal window

    Finally, there are a couple of setup-related features:

    • When flashing a new Raspbian image, the file system will automatically be expanded to use all the space on the card when it is first booted.

    • If a wpa_supplicant.conf file is placed into the /boot/ directory, this will be moved to the /etc/wpa_supplicant/ directory the next time the system is booted, overwriting the network settings; this allows a WiFi configuration to be preloaded onto a card from a Windows or other machine that can only see the boot partition.
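A minimal example of the sort of wpa_supplicant.conf file this expects (the ssid and psk values are placeholders, and the country code should match your own):

```
country=GB
ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1

network={
    ssid="YourNetworkName"
    psk="YourPassphrase"
}
```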

    There are also a host of fixes for minor bugs in various parts of the system, and some general cleaning-up of themes and text.

    How do I get it?

    A full image and a NOOBS installer are available from the Downloads page on this website.

    If you are running the current Jessie image, it can be updated to the new version by running

    sudo apt-get update
    sudo apt-get dist-upgrade
    sudo apt-get install piclone geany usb-modeswitch

    As ever, your feedback on the new release is very welcome – feel free to comment here or in the forums.

    The post The latest update to Raspbian appeared first on Raspberry Pi.

    Apple did not invent emoji

    Post Syndicated from Eevee original https://eev.ee/blog/2016/04/12/apple-did-not-invent-emoji/

    I love emoji. I love Unicode in general. I love seeing plain text become more expressive and more universal.

    But, Internet, I’ve noticed a worrying trend. Both popular media and a lot of tech circles tend to assume that “emoji” de facto means Apple’s particular font.

    I have some objections.

    A not so brief history of emoji

    The Unicode Technical Report on emoji also goes over some of this.

    Emoji are generally traced back to the Japanese mobile carrier NTT DoCoMo, which in February 1999 released a service called i-mode which powered a line of wildly popular early smartphones. Its messenger included some 180 small pixel-art images you could type as though they were text, because they were text, encoded using unused space in Shift JIS.

    (Quick background, because I’d like this to be understandable by a general audience: computers only understand numbers, not text, so we need a “character set” that lists all the characters you can type and what numbers represent them. So in ASCII, for example, a capital “A” is passed around as the number 65. Computers always deal with bytes, which can go up to 255, but ASCII only lists characters up to 127 — so everything from 128 to 255 is just unused space. Shift JIS is Japan’s equivalent to ASCII, and had a lot more unused space, and that’s where early emoji were put.)
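That mapping from numbers to characters is easy to poke at directly. A quick Python sketch of the ASCII side (the 0xF9 byte is just an arbitrary example from the 128-255 range):

```python
# Text is numbers under the hood: ASCII maps "A" to 65 and only
# defines characters up to 127.
print(ord("A"))             # 65
print("A".encode("ascii"))  # b'A'

# Bytes 128-255 are outside ASCII; decoding them as ASCII fails,
# which is exactly the "unused space" early emoji squatted in.
try:
    bytes([0xF9]).decode("ascii")
except UnicodeDecodeError as err:
    print("not ASCII:", err.reason)
```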

    Naturally, other carriers added their own variations. Naturally, they used different sets of images, but often in a different order, so the same character might be an apple on one phone and a banana on another. They came up with tables for translating between carriers, but that wouldn’t help if your friend tried to send you an image that your phone just didn’t have. And when these characters started to leak outside of Japan, they had no hope whatsoever of displaying as anything other than garbage.

    This is kind of like how Shift JIS is mostly compatible with ASCII, except that for some reason it has the yen sign ¥ in place of the ASCII backslash (\), producing hilarious results. Also, this is precisely the problem that Unicode was invented to solve.

    I’ll get back to all this in a minute, but something that’s left out of emoji discussions is that the English-speaking world was developing a similar idea. As far as I can tell, we got our first major exposure to graphical emoticons with the release of AIM 4.0 circa May 2000 and these infamous “smileys”:

    Pixellated, 4-bit smileys from 2000

    Even though AIM was a closed network where there was little risk of having private characters escape, these were all encoded as ASCII emoticons. That simple smiley on the very left would be sent as :-) and turned into an image on your friend’s computer, which meant that if you literally typed :-) in a message, it would still render graphically. Rather than being an extension to regular text, these images were an enhancement of regular text, showing a graphical version of something the text already spelled out. A very fancy ligature.
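The substitution trick those clients used amounts to a string-replacement pass over each message. A toy sketch (the emoticon mapping and the image markup here are invented for illustration):

```python
import re

# Toy version of what AIM-style clients did: scan message text for
# ASCII emoticons and swap in a graphic.
SMILEYS = {
    ":-)": "<img alt=':-)' src='smile.gif'>",
    ":-(": "<img alt=':-(' src='frown.gif'>",
}
PATTERN = re.compile("|".join(re.escape(s) for s in SMILEYS))

def render_smileys(text):
    # Keep the original ASCII in the alt text: the graphic is an
    # enhancement of the text, not a replacement for it.
    return PATTERN.sub(lambda m: SMILEYS[m.group(0)], text)

print(render_smileys("hi :-)"))  # hi <img alt=':-)' src='smile.gif'>
```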

    Little ink has been spilled over this, but those humble 4-bit graphics became a staple of instant messaging, by which I mean everyone immediately ripped them off. ICQ, MSN Messenger, Yahoo! Messenger, Pidgin (then Gaim), Miranda, Trillian… I can’t name a messenger since 2003 that didn’t have smileys included. All of them still relied on the same approach of substituting graphics for regular ASCII sequences. That had made sense for AIM’s limited palette of faces, but during its heyday MSN Messenger included 67 graphics, most of them not faces. If you sent a smiling crescent moon to someone who had the graphics disabled (or used an alternative client), all they’d see was a mysterious (S).

    So while Japan is generally credited as the source of emoji, the US was quite busy making its own mess of things.

    Anyway, Japan had this mess of several different sets of emoji in common use, being encoded in several different incompatible ways. That’s exactly the sort of mess Unicode exists to sort out, so in mid-2007, several Google employees (one of whom was the co-founder of the Unicode Consortium, which surely helped) put together a draft proposal for adding the emoji to Unicode. The idea was to combine all the sets, drop any duplicates, and add to Unicode whatever wasn’t already there.

    (Unicode is intended as a unification of all character sets. ASCII has \, Shift JIS has ¥, but Unicode has both — so an English speaker and a Japanese speaker can both use both characters without getting confused, as long as they’re using Unicode. And so on, for thousands of characters in dozens of character sets. Part of the problem with sending the carriers’ emoji to American computers was that the US was pretty far along in shifting everything to use Unicode, but the emoji simply didn’t exist in Unicode. Obvious solution: add them!)
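The unification is easy to see from code: once everything is Unicode, the backslash, the yen sign, and an emoji are all just code points with well-defined UTF-8 encodings. A quick Python check:

```python
# One character set to rule them all: ASCII's backslash, Shift JIS's
# yen sign, and an emoji all coexist as ordinary Unicode code points.
for ch in ["\\", "\u00a5", "\U0001f600"]:
    print(f"U+{ord(ch):04X}", ch.encode("utf-8"))
# U+005C b'\\'
# U+00A5 b'\xc2\xa5'
# U+1F600 b'\xf0\x9f\x98\x80'
```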

    Meanwhile, the iPhone launched in Japan in 2008. iOS 2.2, released in November, added the first implementation of emoji — but using SoftBank’s invented encoding, since they were only on one carrier and the characters weren’t yet in Unicode. A couple Apple employees jumped on the bandwagon around that time and coauthored the first official proposal, published in January 2009. Unicode 6.0, the first version to include emoji, was released in October 2010.

    iPhones worldwide gained the ability to use its emoji (now mapped to Unicode) with the release of iOS 5.0 in October 2011.

    Android didn’t get an emoji font at all until version 4.3, in July 2013. I’m at a loss for why, given that Google had proposed emoji in the first place, and Android had been in Japan since the HTC Magic in May 2009. It was even on NTT DoCoMo, the carrier that first introduced emoji! What the heck, Google.

    The state of things

    Consider this travesty of an article from last week. This Genius Theory Will Change the Way You Use the “Pink Lady” Emoji:

    Unicode, creators of the emoji app, call her the “Information Desk Person.”

    Oh, dear. Emoji aren’t an “app”, Unicode didn’t create them, and the person isn’t necessarily female. But the character is named “Information Desk Person”, so at least that part is correct.

    It’s non-technical clickbait, sure. But notice that neither “Apple” nor the names of any of its platforms appear in the text. As far as this article and author are concerned, emoji are Apple’s presentation of them.

    I see also that fileformat.info is now previewing emoji using Apple’s font. Again, there’s no mention of Apple that I can find here; even the page that credits the data and name sources doesn’t mention Apple. The font is even called “Apple Color Emoji”, so you’d think that might show up somewhere.

    Telegram and WhatsApp both use Apple’s font for emoji on every platform; you cannot use your system font. Slack lets you choose, but defaults to Apple’s font. (I objected to Android Telegram’s jarring use of a non-native font; the sole developer explained simply that they like Apple’s font more, and eventually shut down the issue tracker to stop people from discussing it further.)

    The latest revision of Google’s emoji font even made some questionable changes, seemingly just for the sake of more closely resembling Apple’s font. I’ll get into that a bit later, but suffice to say, even Google is quietly treating Apple’s images as a de facto standard.

    The Unicode Consortium will now let you “adopt” a character. If you adopt an emoji, the certificate they print out for you uses Apple’s font.

    It’s a little unusual that this would happen when Android has been more popular than the iPhone almost everywhere, even since iOS first exposed its emoji keyboard worldwide. Also given that Apple’s font is not freely-licensed (so you’re not actually allowed to use it in your project), whereas Google’s whole font family is. And — full disclosure here — quite a few of them look to me like they came from a disquieting uncanny valley populated by plastic people.


    Granted, the iPhone did have a 20-month head start at exposing the English-speaking world to emoji. Plus there’s that whole thing where Apple features are mysteriously assumed to be the first of their kind. I’m not entirely surprised that Apple’s font is treated as canonical; I just have some objections.

    Some objections

    I’m writing this in a terminal that uses Source Code Pro. You’re (probably) reading it on the web in Merriweather. Miraculously, you still understand what all the letters mean, even though they appear fairly differently.

    Emoji are text, just like the text you’re reading now, not too different from those goofy :-) smileys in AIM. They’re often displayed with colorful graphics, but they’re just ideograms, similar to Egyptian hieroglyphs (which are also in Unicode). It’s totally okay to write them a little differently sometimes.

    This is the only reason emoji are in Unicode at all — the only reason we have a universal set of little pictures. If they’d been true embedded images, there never would have been any reason to turn them into characters.

    Having them as text means we can use them anywhere we can use text — there’s no need to hunt down a graphic and figure out how to embed it. You want to put emoji in filenames, in source code, in the titlebar of a window? Sure thing — they’re just text.
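For instance, dropping an emoji into a filename needs no special handling at all (assuming a filesystem and locale that accept UTF-8 names, which any modern system does):

```python
import tempfile
from pathlib import Path

# Emoji are just characters, so they work anywhere text does,
# filenames included.
with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "notes 📝.txt"
    p.write_text("emoji are text", encoding="utf-8")
    recovered = p.read_text(encoding="utf-8")
    print(p.name, "->", recovered)
```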

    Treating emoji as though they are a particular set of graphics rather defeats the point. At best, it confuses people’s understanding of what the heck is going on here, and I don’t much like that.

    I’ve encountered people who genuinely believed that Apple’s emoji were some kind of official standard, and anyone deviating from them was somehow wrong. I wouldn’t be surprised if a lot of lay people believed Apple invented emoji. I can hardly blame them, when we have things like World Emoji Day, based on the date on Apple’s calendar glyph. This is not a good state of affairs.

    Along the same lines, nothing defines an emoji, as I’ve mentioned before. Whether a particular character appears as a colored graphic is purely a property of the fonts you have installed. You could have a font that rendered all English text in sparkly purple letters, if you really wanted to. Or you could have a font that rendered emoji as simple black-and-white outlines like other characters — which is in fact what I have.

    Well… that was true, but mere weeks before that post was published, the Unicode Consortium published a list of characters with a genuine “Emoji” property.

    But, hang on. That list isn’t part of the actual Unicode database; it’s part of a “technical report”, which is informative only. In fact, if you look over the Unicode Technical Report on emoji, you may notice that the bulk of it is merely summarizing what’s being done in the wild. It’s not saying what you must do, only what’s already been done. The very first sentence even says that it’s about interoperability.

    If that doesn’t convince you, consider that the list of “emoji” characters includes # and *. Yes, the ASCII characters on a regular qwerty keyboard. I don’t think this is a particularly good authoritative reference.

    Speaking of which, the same list also contains ©, ®, and ™ — and Twitter’s font has glyphs for all three of them: ©, ®, ™. They aren’t used on web Twitter, but if you naïvely dropped twemoji into your own project, you’d see these little superscript characters suddenly grow to fit large full-width squares. (Worse, all three of them are a single solid color, so they’ll be unreadable on a dark background.) There’s an excellent reason for this, believe it or not: Shift JIS doesn’t contain any of these characters, so the Japanese carriers faked it by including them as emoji.
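    That last claim is easy to check with Python's strict shift_jis codec (a sketch using the stdlib codec, not any carrier variant):

    ```python
    # JIS X 0208, which the shift_jis codec implements, has no ©, ®, or ™;
    # that's why the Japanese carriers had to fake them as emoji.
    for ch in "©®™":
        try:
            ch.encode("shift_jis")
            raise AssertionError(f"{ch!r} unexpectedly encodable")
        except UnicodeEncodeError:
            pass  # expected: not representable in Shift JIS
    ```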

    Anyway, the technical report proper is a little more nuanced, breaking emoji into a few coarse groups based on who implements them. (Observe that it uses Apple’s font for all 1282 example emoji.)

    I care about all this because I see an awful lot of tech people link this document as though it were a formal specification, which leads to a curious cycle.

    1. Apple does a thing with emoji.
    2. Because Apple is a major vendor, the thing it did is added to the technical report.
    3. Other people look at the report, believe it to be normative, and also do Apple’s thing because it’s “part of Unicode”.
    4. (Wow, Apple did this first again! They’re so ahead of the curve!)

    After I wrote the above list, I accidentally bumbled upon this page from emojipedia, which states:

    In addition to emojis approved in Unicode 8.0 (mid-2015), iOS 9.1 also includes emoji versions of characters all the way back to Unicode 1.1 (1993) that have retroactively been deemed worthy of emoji presentation by the Unicode Consortium.

    That’s flat-out wrong. The Unicode Consortium has never deemed characters worthy of “emoji presentation” — it’s written reports about the characters that vendors like Apple have given colored glyphs. This paragraph congratulates Apple for having an emoji font that covers every single character Apple decided to put in their emoji font!

    This is a great segue into what happened with Google’s recent update to its own emoji font.

    Google’s emoji font changes

    Android 6.0.1 was released in December 2015, and contained a long-overdue update to its emoji font, Noto Color Emoji. It added newly-defined emoji like 🌭 U+1F32D HOT DOG and 🦄 U+1F984 UNICORN FACE, so, that was pretty good.

    ZWJ sequences

    How is this a segue, you ask? Well, see, there are these curious chimeras called ZWJ sequences — effectively new emoji created by mashing multiple emoji together with a special “glue” character in the middle. Apple used (possibly invented?) this mechanism to create “diverse” versions of several emoji like 💏 U+1F48F KISS. The emoji for two women kissing looks like a single image, but it’s actually written as seven characters: woman + heart + kiss + woman with some glue between them. It’s a lot like those AIM smileys, only not ASCII under the hood.
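    To make the "seven characters" concrete, here's the sequence spelled out in Python (codepoints as described above; whether it collapses into one image depends entirely on the font):

    ```python
    ZWJ = "\u200d"  # ZERO WIDTH JOINER, the "glue" character

    # woman + glue + heart + glue + kiss mark + glue + woman
    kiss_ww = "\U0001f469" + ZWJ + "\u2764" + ZWJ + "\U0001f48b" + ZWJ + "\U0001f469"

    assert len(kiss_ww) == 7  # seven codepoints behind a single apparent glyph
    ```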

    So, that’s fine, it makes sense, I guess. But then Apple added a new chimera emoji: a speech bubble with an eyeball in it, written as eye + speech bubble. It turned out to be some kind of symbol related to an anti-bullying campaign, dreamed up in conjunction with the Ad Council (?!). I’ve never seen it used and never heard about this campaign outside of being a huge Unicode nerd.

    Lo and behold, it appeared in the updated font. And Twitter’s font. And Emoji One.

    Is this how we want it to work? Apple is free to invent whatever it wants by mashing emoji together, and everyone else treats it as canonical, with no resistance whatsoever? Apple gets to deliberately circumvent the Unicode character process?

    Apple appreciated the symbol, too. “When we first asked about bringing this emoji to the official Apple keyboard, they told us it would take at least a year or two to get it through and approved under Unicode,” says Wittmark. The company found a way to fast-track it, she says, by combining two existing emoji.

    Maybe this is truly a worthy cause. I don’t know. All I know is that Apple added a character (designed by an ad agency) basically on a whim, and now it’s enshrined forever in Unicode documents. There doesn’t seem to be any real incentive for them to not do this again. I can’t wait for apple + laptop to become the MacBook Pro™ emoji.

    (On the other hand, I can absolutely get behind ninja cat.)

    Gender diversity

    I take issue with using this mechanism for some of the “diverse” emoji as well. I didn’t even realize the problem until Google copied Apple’s implementation.

    The basic emoji in question are 💏 U+1F48F KISS and 💑 U+1F491 COUPLE WITH HEART. The emoji technical report contains the following advice, emphasis mine:

    Some multi-person groupings explicitly indicate gender: MAN AND WOMAN HOLDING HANDS, TWO MEN HOLDING HANDS, TWO WOMEN HOLDING HANDS. Others do not: KISS, COUPLE WITH HEART, FAMILY (the latter is also non-specific as to the number of adult and child members). While the default representation for the characters in the latter group should be gender-neutral, implementations may desire to provide (and users may desire to have available) multiple representations of each of these with a variety of more-specific gender combinations.

    This reinforces the document’s general advice about gender which comes down to: if the name doesn’t explicitly reference gender, the image should be gender-neutral. Makes sense.

    Here’s how 💏 U+1F48F KISS and 💑 U+1F491 COUPLE WITH HEART look, before and after the font update.

    Pictured: straight people, ruining everything

    Before, both images were gender-agnostic blobs. Now, with the increased “diversity”, you can choose from various combinations of genders… but the genderless version is gone. The default — what you get from the single characters on their own, without any chimera gluing stuff — is heteromance.

    In fact, almost every major font does this for both KISS and COUPLE WITH HEART, save for Microsoft’s. (HTC’s KISS doesn’t, but only because it doesn’t show people at all.)

    Google’s font has changed from “here are two people” to “heterosexuals are the default, but you can use some other particular combinations too”. This isn’t a step towards diversity; this is a step backwards. It also violates the advice in the very document that’s largely based on “whatever Apple and Google are doing”, which is confounding.

    Sometimes, Apple is wrong

    It also highlights another problem with treating Apple’s font as canonical, which is that Apple is occasionally wrong. I concede that “wrong” is a fuzzy concept here, but I think “surprising, given the name of the character” is a reasonable definition.

    In that sense, everyone but Microsoft is wrong about 💏 U+1F48F KISS and 💑 U+1F491 COUPLE WITH HEART, since neither character mentions gender.

    You might expect 🙌 U+1F64C PERSON RAISING BOTH HANDS IN CELEBRATION and 🙏 U+1F64F PERSON WITH FOLDED HANDS to depict people, but Apple only shows a pair of hands for both of them. This is particularly bad with PERSON WITH FOLDED HANDS, which just looks like a high five. Almost every other font has followed suit (CELEBRATION, FOLDED HANDS). Google used to get this right, but changed it with the update.

    Celebration changed to pat-a-cake, for some reason

    👿 U+1F47F IMP suggests, er, an imp, especially since it’s right next to other “monster” characters like 👾 U+1F47E ALIEN MONSTER and 👹 U+1F479 JAPANESE OGRE. Apple appears to have copied its own 😈 U+1F608 SMILING FACE WITH HORNS from the emoticons block and changed the smile to a frown, producing something I would never guess is meant to be an imp. Google followed suit, just like most other fonts, resulting in the tragic loss of one of my favorite Noto glyphs and the only generic representation of a demon.

    This is going to wreak havoc on all my tweets about Doom

    👯 U+1F46F WOMAN WITH BUNNY EARS suggests a woman. Apple has two, for some reason, though that hasn’t been copied quite as much.

    ⬜ U+2B1C WHITE LARGE SQUARE needs a little explanation. Before Unicode contained any emoji (several of which are named with explicit colors), quite a few character names used “black” to mean “filled” and “white” to mean “empty”, referring to how the character would look when printed in black ink on white paper. “White large square” really means the outline of a square, in contrast to ⬛ U+2B1B BLACK LARGE SQUARE, which is solid. Unfortunately, both of these characters somehow ended up in virtually every emoji font, despite not being in the original lists of Japanese carriers’ emoji… and everyone gets it wrong, save for Microsoft. Every single font shows a solid square colored white. Except Google, who colors it blue. And Facebook, who has some kind of window frame, which it colors black for the BLACK glyph.
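    The black-means-filled naming convention is visible in the character database itself; a quick check with Python's stdlib:

    ```python
    import unicodedata

    # "WHITE" in the name means "outline", not the color white.
    assert unicodedata.name("\u2b1c") == "WHITE LARGE SQUARE"
    assert unicodedata.name("\u2b1b") == "BLACK LARGE SQUARE"
    ```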

    When Apple screws up and doesn’t fix it, everyone else copies their screw-up for the sake of compatibility — and as far as I can tell, the only time Apple has ever changed emoji is for the addition of skin tones and when updating images of their own products. We’re letting Apple set a de facto standard for the appearance of text, even when they’re incorrect, because… well, I’m not even sure why.

    Hand gestures

    Returning briefly to the idea of diversity, Google also updated the glyphs for its dozen or so “hand gesture” emoji:

    Hmm I wonder where they got the inspiration for these

    They used to be pink outlines with a flat white fill, but now are a more realistic flat style with the same yellow as the blob faces and shading. This is almost certainly for the sake of supporting the skin tone modifiers later, though Noto doesn’t actually support them yet.

    The problem is, the new ones are much harder to tell apart at a glance! The shadows are very subtle, especially at small sizes, so they might as well all be yellow splats.

    I always saw the old glyphs as abstract symbols, rather than a crop of a person, even a cartoony person. That might be because I’m white as hell, though. I don’t know. If people of color generally saw them the same way, it seems a shame to have made them all less distinct.

    It’s not like the pink and white style would’ve prevented Noto from supporting skin tones in the future, either. Nothing says an emoji with a skin tone has to look exactly like the same emoji without one. The font could easily use the more abstract symbols by default, and switch to this more realistic style when combined with a skin tone.
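    Mechanically, a skin tone is just a modifier codepoint appended after the base emoji, so nothing forces the two renderings to share a style. A sketch (the character names are from Unicode; the variable names are mine):

    ```python
    THUMBS_UP = "\U0001f44d"  # THUMBS UP SIGN
    TYPE_4 = "\U0001f3fd"     # EMOJI MODIFIER FITZPATRICK TYPE-4

    toned = THUMBS_UP + TYPE_4
    assert len(toned) == 2        # base + modifier, two separate codepoints
    assert toned[0] == THUMBS_UP  # the abstract base character is still there
    ```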


    And finally, some kind of tragic accident has made 💩 U+1F4A9 PILE OF POO turn super goofy and grow a face.

    What even IS that now?

    Why? Well, you see, Apple’s has a face. And so does almost everyone else’s, now.

    I looked at the original draft proposal for this one, and SoftBank (the network the iPhone first launched on in Japan) also had a face for this character, whereas KDDI did not. So the true origin is probably just that one particular carrier happened to strike a deal to carry the iPhone first.

    Interop and confusion

    I’m sure the rationale for many of these changes was to reduce confusion when Android and iOS devices communicate. I’m sure plenty of people celebrated the changes on those grounds.

    I was subscribed to several Android Telegram issues about emoji before the issue tracker was shut down, so I got a glimpse into how people feel about this. One person was particularly adamant that in general, the recipient should always see exactly the same image that the sender chose. Which sounds… like it’s asking for embedded images. Which Telegram supports. So maybe use those instead?

    I grew up on the Internet, in a time when ^_^ looked terrible in mIRC’s default font of Fixedsys but just fine in PIRCH98. Some people used MS Comic Chat, which would try to encode actions in a way that looked like annoying noise to everyone else. Abbreviations were still a novelty, so you might not know what “ttfn” means.

    Somehow, we all survived. We caught on, we asked for clarification, we learned the rules, and life went on. All human communication is ambiguous, so it baffles me when people bring up “there’s more than one emoji font” as though it spelled the end of civilization. Someone might read what you wrote and interpret it differently than you intended? Damn, that is definitely a new and serious problem that we have no idea how to handle.

    To me, this is how it would’ve sounded in 1998:

    A: ^_^
    B: Wow, that looks totally goofy over here. I’m using mIRC.
    A: Oh, I see the problem. Every IRC client should use Arial, like PIRCH does.

    That is, after all, the usual subtext: every font should just copy whatever Apple does. Let’s not.

    Look, science!

    Conveniently for me, someone just did a study on this. Here’s what I found most interesting:

    Overall, we found that if you send an emoji across platform boundaries (e.g., an iPhone to a Nexus), the sender and the receiver will differ by about 2.04 points on average on our -5 to 5 sentiment scale. However, even within platforms, the average difference is 1.88 points.

    In other words, people still interpret the same exact glyph differently — just like people sometimes interpret the same words differently.

    The gap between same-glyph and different-glyph is a mere 0.16 points on a 10-point scale, or 1.6%. The paper still concludes that the designs should move closer together, and sure, they totally should — towards what the characters describe.

    To underscore that idea, note the summary page discusses U+1F601 😁 GRINNING FACE WITH SMILING EYES across five different fonts. Surely this should express something positive, right? Grins are positive, smiling eyes are positive; this might be the most positive face in Unicode. Indeed, every font was measured as expressing a very positive emotion, except Apple’s, which was apparently controversial but averaged out to slightly negative. Looking at the various renderings, I can totally see how Apple’s might be construed as a grimace.

    So in the name of interoperability, what should font vendors do here? Push Apple (and Twitter and Facebook, by the look of it) to change their glyph? Or should everyone else change, so we end up in a world where two-thirds of people think “grinning face with smiling eyes” is expressing negativity?

    A diversion: fonts

    Perhaps the real problem here is font support itself.

    You can’t install fonts or change default fonts on either iOS or Android (sans root). That Telegram developer who loves Apple’s emoji should absolutely be able to switch their Android devices to use Apple’s font… but that’s impossible.

    It’s doubly impossible because of a teensy technical snag. You see,

    • Apple added support for embedding PNG images in an OpenType font to OS X and iOS.

    • Google added support for embedding PNG images in an OpenType font to FreeType, the font rendering library used on Linux and Android. But they did it differently from Apple.

    • Microsoft added support for color layers in OpenType, so all of its emoji are basically several different monochrome vector images colored and stacked together. It’s actually an interesting approach — it makes the font smaller, it allows pieces to be reused between characters, and it allows the same emoji to be rendered in different palettes on different background colors almost for free.

    • Mozilla went way out into the weeds and added support for embedding SVG in OpenType. If you’re using Firefox, please enjoy these animated emoji. Those are just the letter “o” in plain text — try highlighting or copy/pasting it. The animation is part of the font. (I don’t know whether this mechanism can adapt to the current font color, but these particular soccer balls do not.)

    We have four separate ways to create an emoji font, all of them incompatible, none of them standard (yet? I think?). You can’t even make one set of images and save it as four separate fonts, because they’re all designed very differently: Apple and Google only support regular PNG images, Microsoft only supports stacked layers of solid colors, and Mozilla is ridiculously flexible but still prefers vectors. Apple and Google control the mobile market, so they’re likely to win in the end, which seems a shame since their approaches are the least flexible in terms of size and color and other text properties.

    I don’t think most people have noticed this, partly because even desktop operating systems don’t have an obvious way to change the emoji font (so who would think to try?), and partly because emoji mostly crop up on desktops via web sites which can quietly substitute images (like Twitter and Slack do). It’s not a situation I’d like to see become permanent, though.

    Consider, if you will, that making an emoji font is really hard — there are over 1200 high-resolution images to create, if you want to match Apple’s font. If you used any web forums or IM clients ten years ago, you’re probably also aware that most smiley packs are pretty bad. If you’re stuck on a platform where the default emoji font just horrifies you (for example), surely you’d like to be able to change the font system-wide.

    Disconnecting the fonts from the platforms would actually make it easier to create a new emoji font, because the ability to install more than one side-by-side means that no one font would need to cover everything. You could make a font that provides all the facial expressions, and let someone else worry about the animals. Or you could make a font that provides ZWJ sequences for every combination of an animal face and a facial expression. (Yes, please.) Or you could make a font that turns names of Pokémon into ligatures, so e-e-v-e-e displays as (eevee icon), similar to how Sans Bullshit Sans works.

    But no one can do any of this, so long as there’s no single extension that works everywhere.

    (Also, for some reason, I’ve yet to get Google’s font to work anywhere in Linux. I’m sure there are some fascinating technical reasons, but the upshot is that Google’s browser doesn’t support Google’s emoji font using Google’s FreeType patch that implements Google’s own font extension. It’s been like this for years, and there’s been barely any movement on it, leaving Linux as the only remotely-major platform that can’t seem to natively render color emoji glyphs — even though Android can.)


    Some miscellaneous thoughts:

    • I’m really glad that emoji have forced more developers to actually handle Unicode correctly. Having to deal with commonly-used characters outside of ASCII is a pretty big kick in the pants already, but most emoji are also in Plane 1, which means they don’t fit in a single JavaScript “character” — an issue that would otherwise be really easy to overlook. 💩 is two “characters” in JavaScript, for example.

    • On the other hand, it’s a shame that the rise of emoji keyboards hasn’t necessarily made the rest of Unicode accessible. There are still plenty of common symbols, like ♫, that I can only type on my phone using the Japanese keyboard. I do finally have an input method on my desktop that lets me enter characters by name, which is nice. We’ve certainly improved since the olden days, when you just had to memorize that Alt0233 produced an é… or, wait, maybe English Windows users still have to do that.

    • Breadth of font support is still a problem outside of emoji, and in a plaintext environment there’s just no way to provide any fallback. Google’s Noto font family aspires to have full coverage — it’s named for “no tofu”, referring to the small boxes that often appear for undisplayable characters — but there are still quite a few gaps. Also, on Android, a character that you don’t have a font for just doesn’t appear at all, with no indication you’re missing anything. That’s one way to get no tofu, I guess.

    • Brands™ running ad campaigns revolving around emoji are probably the worst thing. Hey, if we had a standard way to make colored fonts, then Guinness could’ve just released a font with a darker 🍺 U+1F37A BEER MUG and 🍻 U+1F37B CLINKING BEER MUGS, rather than running a ridiculous ad campaign asking Unicode to add a stout emoji.

    • If you’re on a platform that doesn’t ship with an emoji font, you should really really get Symbola. It covers a vast swath of Unicode with regular old black-and-white vector glyphs, usually using the example glyphs from Unicode’s own documents.

    • The plural is “emoji”, dangit. ∎

    Top 10 Performance Tuning Techniques for Amazon Redshift

    Post Syndicated from Ian Meyers original https://blogs.aws.amazon.com/bigdata/post/Tx31034QG0G3ED1/Top-10-Performance-Tuning-Techniques-for-Amazon-Redshift

    Ian Meyers is a Principal Solutions Architect with Amazon Web Services

    Zach Christopherson, an Amazon Redshift Database Engineer, contributed to this post

    Amazon Redshift is a fully managed, petabyte scale, massively parallel data warehouse that offers simple operations and high performance. Customers use Amazon Redshift for everything from accelerating existing database environments that are struggling to scale, to ingestion of web logs for big data analytics use cases. Amazon Redshift provides an industry standard JDBC/ODBC driver interface, which allows customers to connect their existing business intelligence tools and re-use existing analytics queries.

    Amazon Redshift can run any type of data model, from a production transaction system third-normal-form model, to star and snowflake schemas, or simple flat tables. As customers adopt Amazon Redshift, they must consider its architecture in order to ensure that their data model is correctly deployed and maintained by the database. This post takes you through the most common issues that customers find as they adopt Amazon Redshift, and gives you concrete guidance on how to address each. If you address each of these items, you should be able to achieve optimal performance of queries and be able to scale effectively to meet customer demand.

    Issue #1: Incorrect column encoding

    Amazon Redshift is a column-oriented database, which means that rather than organising data on disk by rows, data is stored by column, and rows are extracted from column storage at runtime. This architecture is particularly well suited to analytics queries on tables with a large number of columns, where most queries only access a subset of all possible dimensions and measures. Amazon Redshift is able to only access those blocks on disk that are for columns included in the SELECT or WHERE clause, and doesn’t have to read all table data to evaluate a query. Data stored by column should also be encoded (see Choosing a Column Compression Type in the Amazon Redshift Database Developer Guide), which means that it is heavily compressed to offer high read performance. This further means that Amazon Redshift doesn’t require the creation and maintenance of indexes: every column is almost like its own index, with just the right structure for the data being stored.

    Running an Amazon Redshift cluster without column encoding is not considered a best practice, and customers find large performance gains when they ensure that column encoding is optimally applied. To determine if you are deviating from this best practice, run the following query to determine if any tables have NO column encoding applied:

    SELECT database, schema || '.' || "table" AS "table", encoded, size
    FROM svv_table_info
    WHERE encoded='N'
    ORDER BY 2;

    Afterward, review the tables and columns which aren’t encoded by running the following query:

    SELECT trim(n.nspname || '.' || c.relname) AS "table", trim(a.attname) AS "column", format_type(a.atttypid, a.atttypmod) AS "type",
    format_encoding(a.attencodingtype::integer) AS "encoding", a.attsortkeyord AS "sortkey"
    FROM pg_namespace n, pg_class c, pg_attribute a
    WHERE n.oid = c.relnamespace
    AND c.oid = a.attrelid
    AND a.attnum > 0
    AND NOT a.attisdropped
    AND n.nspname NOT IN ('information_schema','pg_catalog','pg_toast')
    AND format_encoding(a.attencodingtype::integer) = 'none'
    AND c.relkind='r'
    AND a.attsortkeyord != 1
    ORDER BY n.nspname, c.relname, a.attnum;

    If you find that you have tables without optimal column encoding, then use the Amazon Redshift Column Encoding Utility on AWS Labs GitHub to apply encoding. This command line utility uses the ANALYZE COMPRESSION command on each table. If encoding is required, it generates a SQL script which creates a new table with the correct encoding, copies all the data into the new table, and then transactionally renames the new table to the old name while retaining the original data. (Please note that the first column in a compound sort key should not be encoded, and is not encoded by this utility.)

    Issue #2 – Skewed table data

    Amazon Redshift is a distributed, shared-nothing database architecture where each node in the cluster stores a subset of the data. When a table is created, you decide whether to spread the data evenly among nodes (the default), or to place data on nodes on the basis of one of the columns. By choosing columns for distribution that are commonly joined together, you can minimize the amount of data transferred over the network during the join. This can significantly increase performance on these types of queries.

    The selection of a good distribution key is the topic of many AWS articles, including Choose the Best Distribution Style; see a definitive guide to distribution and sorting of star schemas in the Optimizing for Star Schemas and Interleaved Sorting on Amazon Redshift blog post. In general, a good distribution key should exhibit the following properties:

    High cardinality – There should be a large number of unique data values in the column relative to the number of nodes in the cluster.

    Uniform distribution/low skew – Each unique value in the distribution key should occur in the table an even number of times. This allows Amazon Redshift to put the same number of records on each node in the cluster.

    Commonly joined – The columns in a distribution key should be those that you usually join to other tables. If you have many possible columns that fit this criterion, then you may choose the column that joins to the largest table.

    A skewed distribution key results in nodes not working equally hard as each other on query execution, requiring unbalanced CPU or memory, and ultimately only running as fast as the slowest node.
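    The effect is easy to simulate outside the database. This sketch is plain Python, not Redshift code, and the four-slice model is a deliberate simplification; it hash-distributes rows by two candidate keys and counts the rows per slice:

    ```python
    from collections import Counter

    NUM_SLICES = 4

    def rows_per_slice(keys):
        """Count how many rows land on each slice under hash distribution."""
        return Counter(hash(k) % NUM_SLICES for k in keys)

    # High-cardinality key (e.g. an order ID): a near-even spread.
    orders = rows_per_slice(f"order-{i}" for i in range(100_000))
    assert len(orders) == NUM_SLICES
    assert max(orders.values()) < 2 * min(orders.values())

    # Low-cardinality, skewed key (e.g. a country code): a couple of slices do all the work.
    countries = rows_per_slice(["US"] * 90_000 + ["CA"] * 10_000)
    assert len(countries) <= 2
    assert max(countries.values()) >= 90_000
    ```

    In the second case, one slice holds at least nine times the data of any other, so every query against that table runs at the pace of the overloaded slice.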

    If skew is a problem, you typically see that node performance is uneven on the cluster. Use one of the admin scripts in the Amazon Redshift Utils GitHub repository, such as table_inspector.sql, to see how data blocks in a distribution key map to the slices and nodes in the cluster.

    If you find that you have tables with skewed distribution keys, then consider changing the distribution key to a column that exhibits high cardinality and uniform distribution. Evaluate a candidate column as a distribution key by creating a new table using CTAS:

    CREATE TABLE my_test_table DISTKEY (<column name>) AS SELECT * FROM <table name>;

    Run the table_inspector.sql script against the table again to analyze data skew.

    If there is no good distribution key in any of your records, you may find that moving to EVEN distribution works better, due to the lack of a single node being a hotspot. For small tables, you can also use DISTSTYLE ALL to place table data onto every node in the cluster.

    Issue #3 – Queries not benefiting from sort keys

    Amazon Redshift tables can have a sort key column identified, which acts like an index in other databases but which does not incur a storage cost as with other platforms (for more information, see Choosing Sort Keys). A sort key should be created on those columns which are most commonly used in WHERE clauses. If you have a known query pattern, then COMPOUND sort keys give the best performance; if end users query different columns equally, then use an INTERLEAVED sort key.

    To determine which tables don’t have sort keys, and how often they have been queried, run the following query:

    SELECT database, table_id, schema || '.' || "table" AS "table", size, nvl(s.num_qs,0) num_qs
    FROM svv_table_info t
    LEFT JOIN (SELECT tbl, COUNT(distinct query) num_qs
    FROM stl_scan s
    WHERE s.userid > 1
    AND s.perm_table_name NOT IN ('Internal Worktable','S3')
    GROUP BY tbl) s ON s.tbl = t.table_id
    WHERE t.sortkey1 IS NULL
    ORDER BY 5 desc;

    You can run a tutorial that walks you through how to address unsorted tables in the Amazon Redshift Developer Guide. You can also take advantage of another GitHub admin script that recommends sort keys based on query activity. Bear in mind that queries evaluated against a sort key column must not apply a SQL function to the sort key; instead, ensure that you apply the functions to the compared values so that the sort key is used. This is commonly found on TIMESTAMP columns that are used as sort keys.

    Issue #4 – Tables without statistics or which need vacuum

    Amazon Redshift, like other databases, requires statistics about tables and the composition of data blocks being stored in order to make good decisions when planning a query (for more information, see Analyzing Tables). Without good statistics, the optimiser may make suboptimal or incorrect choices about the order in which to access tables, or how to join datasets together.

    The ANALYZE Command History topic in the Amazon Redshift Developer Guide supplies queries to help you address missing or stale statistics, and you can also simply run the missing_table_stats.sql admin script to determine which tables are missing stats, or the statement below to determine tables that have stale statistics:

    SELECT database, schema || '.' || "table" AS "table", stats_off
    FROM svv_table_info
    WHERE stats_off > 5
    ORDER BY 2;

    In Amazon Redshift, data blocks are immutable. When rows are DELETED or UPDATED, they are simply logically deleted (flagged for deletion) but not physically removed from disk. Updates result in a new block being written with new data appended. Both of these operations cause the previous version of the row to continue consuming disk space and continue being scanned when a query scans the table. As a result, table storage space is increased and performance degraded due to otherwise avoidable disk I/O during scans. A VACUUM command recovers the space from deleted rows and restores the sort order.

    To address issues with tables with missing or stale statistics or where vacuum is required, run another AWS Labs utility, Analyze & Vacuum Schema. This ensures that you always keep up-to-date statistics, and only vacuum tables that actually need reorganisation.

    Issue #5 – Tables with very large VARCHAR columns

    During processing of complex queries, intermediate query results might need to be stored in temporary blocks. These temporary tables are not compressed, so unnecessarily wide columns consume excessive memory and temporary disk space, which can affect query performance. For more information, see Use the Smallest Possible Column Size.

    Use the following query to generate a list of tables that should have their maximum column widths reviewed:

    SELECT database, schema || '.' || "table" AS "table", max_varchar
    FROM svv_table_info
    WHERE max_varchar > 150
    ORDER BY 2;

    After you have a list of tables, identify which of their varchar columns are wide, and then determine the true maximum width of each, using the following query:

    SELECT max(len(rtrim(column_name)))
    FROM table_name;

    In some cases, you may have large VARCHAR type columns because you are storing JSON fragments in the table, which you then query with JSON functions. If you query the top running queries for the database using the top_queries.sql admin script, pay special attention to SELECT * queries which include the JSON fragment column. If end users query these large columns but don’t actually execute JSON functions against them, consider moving them into another table that only contains the primary key column of the original table and the JSON column.

    If you find that the table has columns that are wider than necessary, then you need to re-create a version of the table with appropriate column widths by performing a deep copy.
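    One way to perform that deep copy, sketched with hypothetical names and a notes column right-sized from the max(len(...)) check above:

```sql
-- Re-create the table with appropriate column widths
create table my_table_new (
    id bigint,
    notes varchar(256) encode lzo  -- right-sized from the true maximum width
);

insert into my_table_new
select id, notes
from my_table;

-- Swap the tables once the copy is verified
alter table my_table rename to my_table_old;
alter table my_table_new rename to my_table;
```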

    Issue #6 – Queries waiting on queue slots

    Amazon Redshift runs queries using a queuing system known as workload management (WLM). You can define up to 8 queues to separate workloads from each other, and set the concurrency on each queue to meet your overall throughput requirements.

    In some cases, the queue to which a user or query has been assigned is completely busy and a user’s query must wait for a slot to open. During this time, the system is not executing the query at all, which is a sign that you may need to increase concurrency.

    First, you need to determine if any queries are queuing, using the queuing_queries.sql admin script. Review the maximum concurrency that your cluster has needed in the past with wlm_apex.sql, down to an hour-by-hour historical analysis with wlm_apex_hourly.sql. Keep in mind that increasing concurrency allows more queries to run, but they share the same memory allocation (unless you increase it). You may find that by increasing concurrency, some queries must use temporary disk storage to complete, which is also sub-optimal, as we’ll see next.
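    You can also check for currently queued queries directly. A sketch against the STV_WLM_QUERY_STATE system table (column names are from that table's reference documentation):

```sql
-- Queries currently waiting for a WLM slot, longest waits first
select query, service_class, slot_count, state, queue_time
from stv_wlm_query_state
where state like 'Queued%'
order by queue_time desc;
```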

    Issue #7 – Queries that are disk-based

    If a query isn’t able to completely execute in memory, it may need to use disk-based temporary storage for parts of an explain plan. The additional disk I/O slows down the query, and can be addressed by increasing the amount of memory allocated to a session (for more information, see WLM Dynamic Memory Allocation).

    To determine if any queries have been writing to disk, use the following query:

    SELECT q.query, trim(q.cat_text)
    FROM (SELECT query,
                 replace(listagg(text, ' ') WITHIN GROUP (ORDER BY sequence), '\n', ' ') AS cat_text
          FROM stl_querytext
          WHERE userid > 1
          GROUP BY query) q
    JOIN (SELECT DISTINCT query
          FROM svl_query_summary
          WHERE is_diskbased = 't'
            AND (label LIKE 'hash%' OR label LIKE 'sort%' OR label LIKE 'aggr%')
            AND userid > 1) qs
      ON qs.query = q.query;

    Based on the user or the queue assignment rules, you can increase the amount of memory given to the selected queue to prevent queries needing to spill to disk to complete. You can also increase the WLM_QUERY_SLOT_COUNT (http://docs.aws.amazon.com/redshift/latest/dg/r_wlm_query_slot_count.html) for the session from the default of 1 up to the maximum concurrency for the queue. As outlined in Issue #6, this may result in other queries queueing, so use it with care.
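    A minimal sketch of raising the slot count for a single session:

```sql
set wlm_query_slot_count to 3;  -- this session now gets 3 slots' worth of queue memory

-- run the memory-intensive query here

set wlm_query_slot_count to 1;  -- return the session to the default
```

    Because the extra slots are held for the duration of the setting, reset it as soon as the heavy query finishes.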

    Issue #8 – Commit queue waits

    Amazon Redshift is designed for analytics queries, rather than transaction processing. The cost of COMMIT is relatively high, and excessive use of COMMIT can result in queries waiting for access to a commit queue.

    If you are committing too often on your database, you will start to see waits on the commit queue increase, which can be viewed with the commit_stats.sql admin script. This script shows the largest queue length and queue time for queries run in the past two days. If you have queries that are waiting on the commit queue, then look for sessions that are committing multiple times per session, such as ETL jobs that are logging progress or inefficient data loads.
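    One common fix is to wrap the steps of an ETL batch in a single transaction so they share one commit. A sketch with hypothetical staging and target table names:

```sql
begin;

-- Upsert pattern: remove rows that will be replaced, then insert the new batch
delete from target_table
using stage_table
where target_table.id = stage_table.id;

insert into target_table
select * from stage_table;

delete from stage_table;  -- note: TRUNCATE would force an implicit commit here

commit;  -- a single commit for the whole batch
```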

    Issue #9 – Inefficient data loads

    Amazon Redshift best practices suggest the use of the COPY command to perform data loads. This command uses all compute nodes in the cluster to load data in parallel, from sources such as Amazon S3, Amazon DynamoDB, Amazon EMR HDFS file systems, or any SSH connection.

    When performing data loads, you should compress the files to be loaded whenever possible; Amazon Redshift supports both GZIP and LZO compression. It is more efficient to load a large number of small files than one large one, and the ideal file count is a multiple of the slice count. The number of slices per node depends on the node size of the cluster. For example, each DS1.XL compute node has two slices, and each DS1.8XL compute node has 16 slices. By ensuring that the number of files is a multiple of the number of slices, you can be confident that COPY will use cluster resources evenly and complete as quickly as possible.
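    A sketch of such a load, assuming a hypothetical bucket and IAM role, with the input pre-split into gzip files that share a common key prefix:

```sql
-- For a cluster with 4 slices, split the input into 4 (or 8, 12, ...) gzip files
copy my_table
from 's3://my-bucket/load/part_'  -- matches part_0.gz, part_1.gz, ...
iam_role 'arn:aws:iam::123456789012:role/MyRedshiftRole'  -- hypothetical role ARN
gzip
delimiter '|';
```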

    An anti-pattern is to insert data directly into Amazon Redshift, with single record inserts or the use of a multi-value INSERT statement, which allows up to 16 MB of data to be inserted at one time. These are leader node–based operations, and can create significant performance bottlenecks by maxing out the leader node CPU or memory.

    Issue #10 – Inefficient use of Temporary Tables

    Amazon Redshift provides temporary tables, which are like normal tables except that they are only visible within a single session. When the user disconnects the session, the tables are automatically deleted. Temporary tables can be created using the CREATE TEMPORARY TABLE syntax, or by issuing a query SELECT … INTO #TEMP_TABLE. The CREATE TABLE statement gives you complete control over the definition of the temporary table, while the SELECT … INTO and CREATE TABLE AS (CTAS) forms use the input data to determine column names, sizes, and data types, and use default storage properties.

    These default storage properties may cause issues if not carefully considered. Amazon Redshift’s default table structure is to use EVEN distribution with no column encoding. This is a sub-optimal data structure for many types of queries, and if you are using select/into syntax you cannot set the column encoding or distribution and sort keys.

    It is highly recommended that you convert all select/into syntax to use the CREATE statement. This ensures that your temporary tables have column encoding and are distributed in a fashion that is sympathetic to the other entities that are part of the workflow. For example, to convert a statement that uses:

    select column_a, column_b into #my_temp_table from my_table;

    You would first analyze the temporary table for optimal column encoding:
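    The command for this step is omitted in the text above; assuming the temporary table already exists from a prior select/into run, it would look like:

```sql
analyze compression my_temp_table;  -- reports a suggested encoding for each column
```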

    And then convert the select/into statement to:

    create temporary table my_temp_table(
    column_a varchar(128) encode lzo,
    column_b char(4) encode bytedict)
    distkey (column_a) -- Assuming you intend to join this table on column_a
    sortkey (column_b); -- Assuming you are sorting or grouping by column_b

    insert into my_temp_table select column_a, column_b from my_table;

    You may also wish to analyze statistics on the temporary table, if it is used as a join table for subsequent queries:

    analyze my_temp_table;

    This way, you retain the functionality of using temporary tables but control data placement on the cluster through distkey assignment and take advantage of the columnar nature of Amazon Redshift through use of Column Encoding.

    Tip: Using explain plan alerts

    The last tip is to use diagnostic information from the cluster during query execution. This is stored in an extremely useful view called STL_ALERT_EVENT_LOG. Use the perf_alert.sql admin script to diagnose issues that the cluster has encountered over the last seven days. This is an invaluable resource in understanding how your cluster develops over time.


    Amazon Redshift is a powerful, fully managed data warehouse that can offer significantly increased performance and lower cost in the cloud. While Amazon Redshift can run any type of data model, you can avoid possible pitfalls that might decrease performance or increase cost, by being aware of how data is stored and managed. Run a simple set of diagnostic queries for common issues and ensure that you get the best performance possible.

    If you have questions or suggestions, please leave a comment below.

    UPDATE: This blog post has been translated into Japanese:


