
New – Amazon EC2 Elastic GPUs for Windows

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/new-ec2-elastic-gpus-for-windows/

Today we’re excited to announce the general availability of Amazon EC2 Elastic GPUs for Windows. An Elastic GPU is a GPU resource that you can attach to your Amazon Elastic Compute Cloud (EC2) instance to accelerate the graphics performance of your applications. Elastic GPUs come in medium (1GB), large (2GB), xlarge (4GB), and 2xlarge (8GB) sizes and are lower-cost alternatives to using GPU instance types like G3 or G2 (for OpenGL 3.3 applications). You can use Elastic GPUs with many instance types, giving you the flexibility to choose the right compute, memory, and storage balance for your application. Today you can provision Elastic GPUs in us-east-1 and us-east-2.

Elastic GPUs start at just $0.05 per hour for an eg1.medium. A nickel an hour. If we attach that Elastic GPU to a t2.medium ($0.065/hour) we pay a total of less than 12 cents per hour for an instance with a GPU. Previously, the cheapest graphical workstation (G2/3 class) cost 76 cents per hour. That’s over an 80% reduction in the price for running certain graphical workloads.

When should I use Elastic GPUs?

Elastic GPUs are best suited for applications that require a small or intermittent amount of additional GPU power for graphics acceleration and support OpenGL. Elastic GPUs support up to and including the OpenGL 3.3 API standard, with expanded API support coming soon.

Elastic GPUs are not part of the hardware of your instance. Instead, they’re attached through an elastic GPU network interface in your subnet, which is created when you launch an instance with an Elastic GPU. The image below shows how Elastic GPUs are attached.

Since Elastic GPUs are network-attached, it’s important to provision an instance with adequate network bandwidth to support your application. It’s also important to make sure your instance security group allows traffic on port 2007.
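
For example, here’s a minimal sketch of opening that port from the AWS CLI; the security group ID and CIDR range below are placeholders for your instance’s security group and your VPC’s address range.

# Allow inbound traffic on TCP 2007 from the VPC so the Elastic GPU can
# reach the instance. The group ID and CIDR below are hypothetical.
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 2007 \
    --cidr 172.31.0.0/16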

Any application that can use the OpenGL APIs can take advantage of Elastic GPUs, so Blender, Google Earth, SIEMENS SolidEdge, and more could all run with Elastic GPUs. Even Kerbal Space Program!

Ok, now that we know when to use Elastic GPUs and how they work, let’s launch an instance and use one.

Using Elastic GPUs

First, we’ll navigate to the EC2 console and click Launch Instance. Next, we’ll select a Windows AMI such as “Microsoft Windows Server 2016 Base” and choose an instance type. Then we’ll expand the “Elastic GPU” section and allocate an eg1.medium (1GB) Elastic GPU.

We’ll also include some user data in the Advanced Details section: a quick PowerShell script that downloads and installs the Elastic GPU software.


<powershell>
# Log the install for troubleshooting
Start-Transcript -Path "C:\egpu_install.log" -Append
# Download the latest Elastic GPU installer and run it silently
(new-object net.webclient).DownloadFile('http://ec2-elasticgpus.s3-website-us-east-1.amazonaws.com/latest', 'C:\egpu.msi')
Start-Process "msiexec.exe" -Wait -ArgumentList "/i C:\egpu.msi /qn /L*v C:\egpu_msi_install.log"
# Add the Elastic GPU manager to the system PATH, then reboot to complete the installation
[Environment]::SetEnvironmentVariable("Path", $env:Path + ";C:\Program Files\Amazon\EC2ElasticGPUs\manager\", [EnvironmentVariableTarget]::Machine)
Restart-Computer -Force
</powershell>

This software sends all OpenGL API calls to the attached Elastic GPU.

Next, we’ll double-check that our security group exposes TCP port 2007 to our VPC so the Elastic GPU can connect to the instance; the easiest way to do this is to create a separate security group that you can attach to the instance. Finally, we’ll click Launch and wait for the instance and Elastic GPU to provision.

You can see an animation of the launch procedure below.

Alternatively, we could have launched from the AWS CLI with a quick call like this:

$ aws ec2 run-instances --elastic-gpu-specification Type=eg1.2xlarge \
    --image-id ami-1a2b3c4d \
    --subnet-id subnet-11223344 \
    --instance-type r4.large \
    --security-groups "default" "elasticgpu-sg"

Then we could have followed the Elastic GPU software installation instructions here.

We can now see that our Elastic GPU is attached and humming along by checking the Elastic GPU Status icon in the taskbar.
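
If you prefer the command line, you can also confirm the attachment with a quick describe call; a sketch:

# Lists each Elastic GPU in the region with its state and the instance
# it is attached to.
aws ec2 describe-elastic-gpus --region us-east-1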

We welcome any feedback on the service; you can click the Feedback link in the bottom-left corner of the GPU Status Box to let us know about your experience with Elastic GPUs.

Elastic GPU Demonstration

Ok, so we have our instance provisioned and our Elastic GPU attached. My teammates here at AWS wanted me to talk about the amazingly wonderful 3D applications you can run, but when I learned about Elastic GPUs the first thing that came to mind was Kerbal Space Program (KSP), so I’m going to run a quick test with that. After all, if you can’t launch Jebediah Kerman into space then what was the point of all of that software? I’ve downloaded KSP and added the launch parameter of -force-opengl to make sure we’re using OpenGL to do our rendering. Below you can see my poor attempt at building a spaceship – I used to build better ones. It looks pretty smooth considering we’re going over a network with a lossy remote desktop protocol.

I’d show a picture of the rocket launch but I didn’t even make it off the ground before I experienced a rapid unscheduled disassembly of the rocket. Back to the drawing board for me.

In the meantime, I can check my Amazon CloudWatch metrics and see how much GPU memory I used during my brief game.

Partners, Pricing, and Documentation

To continue to build out great experiences for our customers, our 3D software partners like ANSYS and Siemens are looking to take advantage of the OpenGL APIs on Elastic GPUs, and are currently certifying Elastic GPUs for their software. You can learn more about our partnerships here.

You can find information on Elastic GPU pricing here. You can find additional documentation here.

Now, if you’ll excuse me I have some virtual rockets to build.

Randall

Raspbian Stretch has arrived for Raspberry Pi

Post Syndicated from Simon Long original https://www.raspberrypi.org/blog/raspbian-stretch/

It’s now just under two years since we released the Jessie version of Raspbian. Those of you who know that Debian run their releases on a two-year cycle will therefore have been wondering when we might be releasing the next version, codenamed Stretch. Well, wonder no longer – Raspbian Stretch is available for download today!


Debian releases are named after characters from Disney Pixar’s Toy Story trilogy. In case, like me, you were wondering: Stretch is a purple octopus from Toy Story 3. Hi, Stretch!

The differences between Jessie and Stretch are mostly under-the-hood optimisations, and you really shouldn’t notice any differences in day-to-day use of the desktop and applications. (If you’re really interested, the technical details are in the Debian release notes here.)

However, we’ve made a few small changes to our image that are worth mentioning.

New versions of applications

Version 3.0.1 of Sonic Pi is included; it adds a lot of new functionality in terms of input/output. See the Sonic Pi release notes for more details of exactly what has changed.


The Chromium web browser has been updated to version 60, the most recent stable release. This offers improved memory usage and more efficient code, so you may notice it running slightly faster than before. The visual appearance has also been changed very slightly.


Bluetooth audio

In Jessie, we used PulseAudio to provide support for audio over Bluetooth, but integrating this with the ALSA architecture used for other audio sources was clumsy. For Stretch, we are using the bluez-alsa package to make Bluetooth audio work with ALSA itself. PulseAudio is therefore no longer installed by default, and the volume plugin on the taskbar will no longer start and stop PulseAudio. From a user point of view, everything should still work exactly as before – the only change is that if you still wish to use PulseAudio for some other reason, you will need to install it yourself.
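
If you do want it back, reinstalling the standard package is a one-liner:

sudo apt-get install -y pulseaudio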

Better handling of other usernames

The default user account in Raspbian has always been called ‘pi’, and a lot of the desktop applications assume that this is the current user. This has been changed for Stretch, so now applications like Raspberry Pi Configuration no longer assume this to be the case. This means, for example, that the option to automatically log in as the ‘pi’ user will now automatically log in with the name of the current user instead.

One other change is how sudo is handled. By default, the ‘pi’ user is set up with passwordless sudo access. We are no longer assuming this to be the case, so now desktop applications which require sudo access will prompt for the password rather than simply failing to work if a user without passwordless sudo uses them.

Scratch 2 SenseHAT extension

In the last Jessie release, we added the offline version of Scratch 2. While Scratch 2 itself hasn’t changed for this release, we have added a new extension to allow the SenseHAT to be used with Scratch 2. Look under ‘More Blocks’ and choose ‘Add an Extension’ to load the extension.

This works with either a physical SenseHAT or with the SenseHAT emulator. If a SenseHAT is connected, the extension will control that in preference to the emulator.


Fix for Broadpwn exploit

A couple of months ago, a vulnerability was discovered in the firmware of the BCM43xx wireless chipset which is used on Pi 3 and Pi Zero W; this potentially allows an attacker to take over the chip and execute code on it. The Stretch release includes a patch that addresses this vulnerability.

There is also the usual set of minor bug fixes and UI improvements – I’ll leave you to spot those!

How to get Raspbian Stretch

As this is a major version upgrade, we recommend using a clean image; these are available from the Downloads page on our site as usual.

Upgrading an existing Jessie image is possible, but is not guaranteed to work in every circumstance. If you wish to try upgrading a Jessie image to Stretch, we strongly recommend taking a backup first – we can accept no responsibility for loss of data from a failed update.

To upgrade, first modify the files /etc/apt/sources.list and /etc/apt/sources.list.d/raspi.list. In both files, change every occurrence of the word ‘jessie’ to ‘stretch’. (Both files will require sudo to edit.)
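
If you’d rather not edit the files by hand, the same change can be made with sed; double-check both files afterwards:

sudo sed -i 's/jessie/stretch/g' /etc/apt/sources.list
sudo sed -i 's/jessie/stretch/g' /etc/apt/sources.list.d/raspi.list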

Then open a terminal window and execute

sudo apt-get update
sudo apt-get -y dist-upgrade

Answer ‘yes’ to any prompts. There may also be a point at which the install pauses while a page of information is shown on the screen – hold the ‘space’ key to scroll through all of this and then hit ‘q’ to continue.

Finally, if you are not using PulseAudio for anything other than Bluetooth audio, remove it from the image by entering

sudo apt-get -y purge pulseaudio*


Game of Thrones Pirates Arrested For Leaking Episode Early

Post Syndicated from Andy original https://torrentfreak.com/game-of-thrones-pirates-arrested-for-leaking-episode-early-170814/

Over the past several years, Game of Thrones has become synonymous with fantastic drama and storytelling on the one hand, and Internet piracy on the other. It’s the most pirated TV show in history, hands down.

With the new season well underway, another GoT drama began to unfold in early August, when the then-unaired episode “The Spoils of War” began to circulate on various file-sharing and streaming sites. The leak only trumped the official release by a few days, but that didn’t stop people downloading in droves.

As previously reported, the leaked episode stated that it was “For Internal Viewing Only” at the top of the screen and on the bottom right sported a “Star India Pvt Ltd” watermark. The company commented shortly after.

“We take this breach very seriously and have immediately initiated forensic investigations at our and the technology partner’s end to swiftly determine the cause. This is a grave issue and we are taking appropriate legal remedial action,” a spokesperson said.

Now, just ten days later, that investigation has already netted its first victims. Four people have reportedly been arrested in India for leaking the episode before it aired.

“We investigated the case and have arrested four individuals for unauthorized publication of the fourth episode from season seven,” Deputy Commissioner of Police Akbar Pathan told AFP.

The report indicates that a complaint was filed by a Mumbai-based company that was responsible for storing and processing the TV episodes for an app. It has been named locally as Prime Focus Technologies, which markets itself as a Netflix “Preferred Vendor”.

It’s claimed that at least some of the men had access to login credentials for Game of Thrones episodes which were then abused for the purposes of leaking.

Local media identified the men as Bhaskar Joshi, Alok Sharma and Abhishek Ghadiyal, who were employed by Prime Focus, and Mohamad Suhail, a former employee, who was responsible for leaking the episode onto the Internet.

All of the men were based in Bangalore and were interrogated “throughout the night” at their workplace on August 11. Star India welcomed the arrests and thanked the authorities for their swift action.

“We are deeply grateful to the police for their swift and prompt action. We believe that valuable intellectual property is a critical part of the development of the creative industry and strict enforcement of the law is essential to protecting it,” the company said in a statement.

“We at Star India and Novi Digital Entertainment Private Limited stand committed and ready to help the law enforcement agencies with any technical assistance and help they may require in taking the investigation to its logical conclusion.”

The men will be held in custody until August 21 while investigations continue.


Build a Healthcare Data Warehouse Using Amazon EMR, Amazon Redshift, AWS Lambda, and OMOP

Post Syndicated from Ryan Hood original https://aws.amazon.com/blogs/big-data/build-a-healthcare-data-warehouse-using-amazon-emr-amazon-redshift-aws-lambda-and-omop/

In the healthcare field, data comes in all shapes and sizes. Despite efforts to standardize terminology, some concepts (e.g., blood glucose) are still often depicted in different ways. This post demonstrates how to convert an openly available dataset called MIMIC-III, which consists of de-identified medical data for about 40,000 patients, into an open source data model known as the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). It describes the architecture and steps for analyzing data across various disconnected sources of health datasets so you can start applying Big Data methods to health research.

Note: If you arrived at this page looking for more info on the movie Mimic 3: Sentinel, you might not enjoy this post.

OMOP overview

The OMOP CDM helps standardize healthcare data and makes it easier to analyze outcomes at a large scale. The CDM is gaining a lot of traction in the health research community, which is deeply involved in developing and adopting a common data model. Community resources are available for converting datasets, and there are software tools to help unlock your data after it’s in the OMOP format. The great advantage of converting data sources into a standard data model like OMOP is that it allows for streamlined, comprehensive analytics and helps remove the variability associated with analyzing health records from different sources.

OMOP ETL with Apache Spark

Observational Health Data Sciences and Informatics (OHDSI) provides the OMOP CDM in a variety of formats, including Apache Impala, Oracle, PostgreSQL, and SQL Server. (See the OHDSI Common Data Model repo in GitHub.) In this scenario, the data is moved to AWS to take advantage of the unbounded scale of Amazon EMR and serverless technologies, and the variety of AWS services that can help make sense of the data in a cost-effective way—including Amazon Machine Learning, Amazon QuickSight, and Amazon Redshift.

This example demonstrates an architecture that can be used to run SQL-based extract, transform, load (ETL) jobs to map any data source to the OMOP CDM. It uses MIMIC ETL code provided by Md. Shamsuzzoha Bayzid. The code was modified to run in Amazon Redshift.

Getting access to the MIMIC-III data

Before you can retrieve the MIMIC-III data, you must request access on the PhysioNet website, which is hosted on Amazon S3 as part of the Amazon Web Services (AWS) Public Dataset Program. However, you don’t need access to the MIMIC-III data to follow along with this post.

Solution architecture and loading process

The following diagram shows the architecture that is used to convert the MIMIC-III dataset to the OMOP CDM.

The data conversion process includes the following steps:

  1. The entire infrastructure is spun up using an AWS CloudFormation template (see the CLI sketch after this list). This includes the Amazon EMR cluster, Amazon SNS topics/subscriptions, an AWS Lambda function and trigger, and AWS Identity and Access Management (IAM) roles.
  2. The MIMIC-III data is read in via an Apache Spark program that is running on Amazon EMR. The files are registered as tables in Spark so that they can be queried by Spark SQL.
  3. The transformation queries are located in a separate Amazon S3 location, which is read in by Spark and executed on the newly registered tables to convert the data into OMOP form.
  4. The data is then written to a staging S3 location, where it is ready to be copied into Amazon Redshift.
  5. As each file is loaded in OMOP form into S3, the Spark program sends a message to an SNS topic that signifies that the load completed successfully.
  6. After that message is pushed, it triggers a Lambda function that consumes the message and executes a COPY command from S3 into Amazon Redshift for the appropriate table.
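
To illustrate step 1, a minimal sketch of launching such a stack from the AWS CLI might look like the following; the stack name and template file are hypothetical placeholders, and CAPABILITY_IAM is needed because the template creates IAM roles.

# Stack name and template file are hypothetical; substitute the template
# from the AWS Labs repo for this post.
aws cloudformation create-stack \
    --stack-name mimic-omop-etl \
    --template-body file://mimic-omop-etl.template \
    --capabilities CAPABILITY_IAM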

This architecture provides a scalable way to use various healthcare sources and convert them to OMOP format, where the only changes needed are in the SQL transformation files. The transformation logic is stored in an S3 bucket and is completely de-coupled from the Apache Spark program that runs on EMR and converts the data into OMOP form. This makes the transformation code portable and allows the Spark jar to be reused if other data sources are added—for example, electronic health records (EHR), billing systems, and other research datasets.

Note: For larger files, you might experience the five-minute timeout limitation in Lambda. In that scenario you can use AWS Step Functions to split the file and load it one piece at a time.

Scaling the solution

The transformation code runs in a Spark container that can scale out based on how you define your EMR cluster. There are no single points of failure. As your data grows, your infrastructure can grow without requiring any changes to the underlying architecture.

If you add more data sources, such as EHRs and other research data, the high-level view of the ETL would look like the following:

In this case, the loads of the different systems are completely independent. If the EHR load is four times the size that you expected and uses all the resources, it has no impact on the Research Data or HR System loads because they are in separate containers.

You can scale your EMR cluster based on the size of the data that you anticipate. For example, you can have a 50-node cluster in your container for loading EHR data and a 2-node cluster for loading the HR System. This design helps you scale the resources based on what you consume, as opposed to expensive infrastructure sitting idle.

The only code that is unique to each execution is any diffs between the CloudFormation templates (e.g., cluster size and SQL file locations) and the transformation SQL that resides in S3 buckets. The Spark jar that is executed as an EMR step is reused across all three executions.
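
Because the jar is reusable, onboarding another data source largely amounts to submitting the same jar as a new EMR step with different arguments. A rough sketch, in which the cluster ID, main class, and S3 paths are all hypothetical placeholders:

# The cluster ID, main class, and S3 paths below are placeholders.
aws emr add-steps \
    --cluster-id j-XXXXXXXXXXXXX \
    --steps 'Type=Spark,Name=OmopEtlStep,ActionOnFailure=CONTINUE,Args=[--class,com.example.OmopEtl,s3://my-bucket/jars/omop-etl.jar,s3://my-bucket/sql/,s3://my-bucket/staging/]'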

Upgrading versions

In this architecture, upgrading the versions of Amazon EMR, Apache Hadoop, or Spark requires a one-time change to one line of code in the CloudFormation template:

"EMRC2SparkBatch": {
      "Type": "AWS::EMR::Cluster",
      "Properties": {
        "Applications": [
          {
            "Name": "Hadoop"
          },
          {
            "Name": "Spark"
          }
        ],
        "Instances": {
          "MasterInstanceGroup": {
            "InstanceCount": 1,
            "InstanceType": "m3.xlarge",
            "Market": "ON_DEMAND",
            "Name": "Master"
          },
          "CoreInstanceGroup": {
            "InstanceCount": 1,
            "InstanceType": "m3.xlarge",
            "Market": "ON_DEMAND",
            "Name": "Core"
          },
          "TerminationProtected": false
        },
        "Name": "EMRC2SparkBatch",
        "JobFlowRole": { "Ref": "EMREC2InstanceProfile" },
          "ServiceRole": {
                    "Ref": "EMRRole"
                  },
        "ReleaseLabel": "emr-5.0.0",
        "VisibleToAllUsers": true      
}
    }

Note that this example uses a slightly older EMR release so that it can run Spark 2.0.0 instead of Spark 2.1.0, because Spark 2.1.0 does not support nulls in CSV files.

You can also select the version in the Release list in the General Configuration section of the EMR console:

The data sources all have different CloudFormation templates, so you can upgrade one data source at a time or upgrade them all together. As long as the reusable Spark jar is compatible with the new version, none of the transformation code has to change.
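
For example, a hedged sketch of pushing an edited template to just one data source’s stack (the names are again placeholders):

# Hypothetical stack and template for a single data source.
aws cloudformation update-stack \
    --stack-name ehr-omop-etl \
    --template-body file://ehr-omop-etl.template \
    --capabilities CAPABILITY_IAM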

Executing queries on the data

After all the data is loaded, it’s easy to tear down the CloudFormation stack so you don’t pay for resources that aren’t being used:

CloudFormationManager cf = new CloudFormationManager(); 
cf.terminateStack(stack);    

This includes the EMR cluster, Lambda function, SNS topics and subscriptions, and temporary IAM roles that were created to push the data to Amazon Redshift. The S3 buckets that contain the raw MIMIC-III data and the data in OMOP form remain because they existed outside the CloudFormation stack.
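
If you aren’t using the helper class from the repo, the same teardown can be done from the AWS CLI; for example, assuming the hypothetical stack name used earlier:

# Removes the stack and everything CloudFormation created for it; the
# S3 buckets outside the stack are untouched.
aws cloudformation delete-stack --stack-name mimic-omop-etl
aws cloudformation wait stack-delete-complete --stack-name mimic-omop-etl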

You can now connect to the Amazon Redshift cluster and start executing queries on the ten OMOP tables that were created, as shown in the following example:

select *
from drug_exposure
limit 100;

OMOP analytics tools

For information about open source analytics tools that are built on top of the OMOP model, visit the OHDSI Software page.

The following are examples of data visualizations provided by Achilles, an open source visualization tool for OMOP.

Conclusion

This post demonstrated how to convert MIMIC-III data into OMOP form using data tools that are built for scale and flexibility. It compared the architecture against a traditional data warehouse and showed how this design scales by mixing a scale-out technology with EMR and a serverless technology with Lambda. It also showed how you can lower your costs by using CloudFormation to create your data pipeline infrastructure. And by tearing down the stack after the data is loaded, you don’t pay for idle servers.

You can find all the code in the AWS Labs GitHub repo with detailed, step-by-step instructions on how to load the data from MIMIC-III to OMOP using this design.

If you have any questions or suggestions, please add them below.


About the Author

Ryan Hood is a Data Engineer for AWS. He works on big data projects leveraging the newest AWS offerings. In his spare time, he enjoys watching the Cubs win the World Series and attempting to sous-vide anything he can find in his refrigerator.

 

 


ISP Blocks Pirate Bay But Vows to Fight Future Blocking Demands

Post Syndicated from Andy original https://torrentfreak.com/isp-blocks-pirate-bay-but-vows-to-fight-future-blocking-demands-170301/

Two weeks ago, after almost three years of legal battles, Universal Music, Sony Music, Warner Music, Nordisk Film and the Swedish Film Industry finally achieved their dream of blocking a ‘pirate’ site.

The Patent and Market Court ordered Bredbandsbolaget, the ISP at the center of the action, to block The Pirate Bay and another defunct site, Swefilmer. A few hours ago the provider barred its subscribers from accessing them, just ahead of the Court deadline.

This pioneering legal action will almost certainly open the floodgates to similar demands in the future, but if content providers think that Bredbandsbolaget will roll over and give up, they have another thing coming.

In a statement announcing that it had complied with the orders of the court, the ISP said that despite having good reasons to appeal, it had not been allowed to do so. The provider adds that it finds it unreasonable that any provider should have to block content following pressure from private interests, so it will fight all future requests.

“We are now forced to contest any future blocking demands. It is the only way for us and other Internet operators to ensure that private players should not have the last word regarding the content that should be accessible on the Internet,” Bredbandsbolaget said.

Noting that the chances of contesting a precedent-setting ruling are “small or non-existent”, the ISP added that not all providers will have the resources to fight, if they are targeted next. Fighting should be the aim though, since there are problems with the existing court order.

According to Bredbandsbolaget, the order requires it to block 100 domain names. However, the ISP says that during the trial it was not determined whether they all lead to illegal sites. In fact, it appears that some of the domains actually point to sites that are either fully legal or non-operational.

For example, in tests conducted by TF this morning the domain bay.malk.rocks led to a Minecraft forum, fattorrents.ws and magnetsearch.net/org were dead, piratewiki.info had expired, torrentdr.com was parked and ViceTorrent.com returned error 404. Also, Swefilmer.com returned a placeholder and SweHD.com was parked and for sale.

“What domains should be blocked or not blocked is therefore reliant on rightsholders’ sincerity, infallibility and the ability to make proportionate assessments,” Bredbandsbolaget warns.

“It is still unclear which body receives questions and complaints if an operator is required to mistakenly block a domain.”

In the wake of the blocking ruling two weeks ago, two other major ISPs in Sweden indicated that they too would put up a fight against blocking demands.

Bahnhof slammed the decision to block The Pirate Bay, describing the effort as signaling the “death throes” of the copyright industry.

Telia was more moderate but said it has no intention of blocking The Pirate Bay, unless it is forced to do so by law.



The full list of domains that were blocked this morning is as follows:

thepiratebay.se
thepiratebay.org
accesspiratebay.com
ahoy.one
bay.malk.rocks
baymirror.date
baymirror.win
bayproxy.date
bayproxy.pw
fastpiratebay.co.uk
fattorrents.ws
gameofbay.org
ikwilthepiratebay.org
kuiken.co
magnetsearch.net
magnetsearch.org
pbp.rocks
pbproxy.com
piraattilahti.net
pirate.trade
piratebay.click
piratebayblocked.com
piratebayproxy.tf
piratebays.co.uk
piratehole.com
pirateportal.xyz
pirateproxies.info
pirateproxies.net
pirate-proxy.info
pirateproxy.online
pirateproxy.wf
pirateproxy.vip
pirateproxy.yt
pirateproxybay.tech
pirates.pw
piratesbay.pe
piratetavern.net
piratetavern.org
piratewiki.info
proxypirate.pw
proxytpb.nl
thebay.tv
thehiddenbay.xyz
thenewbay.org
thepbproxy.website
thepiratebay.ar.com
thepiratebay.bypassed.live
thepiratebay.bypassed.red
thepiratebay.bypassed.video
thepiratebay.casa
thepiratebay.immunicity.live
thepiratebay.immunicity.video
thepiratebay.immunicity.red
thepiratebay.je
thepiratebay.lv
thepiratebay.mg
thepiratebay.red
thepiratebay.run
thepiratebay.skillproxy.com
thepiratebay.skillproxy.net
thepiratebay.skillproxy.org
thepiratebay.unblockthis.net
torrentdr.com
thepiratebay.uk.net
thepiratebay.unblocked.rocks
thepiratebay.unblocked.video
thepiratebay.unblockerproxy.xyz
thepiratebay-proxy.com
thepirateproxy.co
thepirateproxy.info
thepirateproxy.website
thepirateproxybay.xyz
theproxy.pw
theproxybay.pw
tpb.dashitz.com
tpb.patatje.eu
tpb.portalimg.com
tpb.proxyduck.co
tpb.retro.black
tpb.vrelk.com
tpbay.co
tpbmirror.us
tpbpro.xyz
tpbproxy.cc
tpbproxy.pw
tpbproxy.website
tproxy.pro
ukpirate.click
ukpirate.org
ukpirateproxy.xyz
unblockbay.com
unblockthepiratebay.net
unblockthepiratebay.org
urbanproxy.eu
vicetorrent.com
battleit.ee/tpb
thepiratebay.gg
bayproxy.org
thepirateproxybay.site
bayproxy.net
swefilmer.com
www.swefilmer.com
swehd.com
www.swehd.com


Run Mixed Workloads with Amazon Redshift Workload Management

Post Syndicated from Suresh Akena original https://aws.amazon.com/blogs/big-data/run-mixed-workloads-with-amazon-redshift-workload-management/

Mixed workloads run batch and interactive workloads (short-running and long-running queries or reports) concurrently to support business needs or demand. Typically, managing and configuring mixed workloads requires a thorough understanding of access patterns, how system resources are being used, and performance requirements.

It’s common for mixed workloads to have some processes that require higher priority than others. Sometimes, this means a certain job must complete within a given SLA. Other times, this means you only want to prevent a non-critical reporting workload from consuming too many cluster resources at any one time.

Without workload management (WLM), each query is prioritized equally, which can cause a person, team, or workload to consume excessive cluster resources for a process which isn’t as valuable as other more business-critical jobs.

This post provides guidelines on common WLM patterns and shows how you can use WLM query insights to optimize configuration in production workloads.

Workload concepts

You can use WLM to define the separation of business concerns and to prioritize the different types of concurrently running queries in the system:

  • Interactive: Software that accepts input from humans as it runs. Interactive software includes most popular programs, such as BI tools or reporting applications.
    • Short-running, read-only user queries such as Tableau dashboard query with low latency requirements.
    • Long-running, read-only user queries such as a complex structured report that aggregates the last 10 years of sales data.
  • Batch: Execution of a series of jobs in a server program without manual intervention (non-interactive); the programs run on a set or “batch” of inputs rather than on a single input.
    • Batch queries include bulk INSERT, UPDATE, and DELETE transactions, for example, ETL or ELT programs.

Amazon Redshift Workload Management

Amazon Redshift is a fully managed, petabyte scale, columnar, massively parallel data warehouse that offers scalability, security and high performance. Amazon Redshift provides an industry standard JDBC/ODBC driver interface, which allows customers to connect their existing business intelligence tools and re-use existing analytics queries.

Amazon Redshift is a good fit for any type of analytical data model, for example, star and snowflake schemas, or simple de-normalized tables.

Managing workloads

Amazon Redshift Workload Management allows you to manage workloads of various sizes and complexity for specific environments. Parameter groups contain WLM configuration, which determines how many query queues are available for processing and how queries are routed to those queues. The default parameter group settings are not configurable. Create a custom parameter group to modify the settings in that group, and then associate it with your cluster. The following settings can be configured:

  • How many queries can run concurrently in each queue
  • How much memory is allocated among the queues
  • How queries are routed to queues, based on criteria such as the user who is running the query or a query label
  • Query timeout settings for a queue

When the user runs a query, WLM assigns the query to the first matching queue and executes rules based on the WLM configuration. For more information about WLM query queues, concurrency, user groups, query groups, timeout configuration, and queue hopping capability, see Defining Query Queues. For more information about the configuration properties that can be changed dynamically, see WLM Dynamic and Static Configuration Properties.

For example, the WLM configuration in the following screenshot has three queues to support ETL, BI, and other users. ETL jobs are assigned to the long-running queue and BI queries to the short-running queue. Other user queries are executed in the default queue.

Redshift_Workload_1

Guidelines on WLM optimal cluster configuration

1. Separate the business concerns and run queries independently from each other

Create independent queues to support different business processes, such as dashboard queries and ETL. For example, creating a separate queue for one-time queries would be a good solution so that they don’t block more important ETL jobs.

Additionally, because faster queries typically use a smaller amount of memory, you can set a low percentage for WLM memory percent to use for that one-time user queue or query group.

2. Rotate the concurrency and memory allocations based on the access patterns (if applicable)

In traditional data management, ETL jobs pull the data from the source systems in a specific batch window, transform, and then load the data into the target data warehouse. In this approach, you can allocate more concurrency and memory to the BI_USER group and very limited resources to ETL_USER during business hours. After hours, you can dynamically allocate or switch the resources to ETL_USER without rebooting the cluster so that heavy, resource-intensive jobs complete very quickly.

Note: The example AWS CLI command is shown on several lines for demonstration purposes. Actual commands should not have line breaks and must be submitted as a single line. The following JSON configuration requires escaped quotes.

Redshift_Workload_2

To change WLM settings dynamically, AWS recommends a scheduled Lambda function or scheduled data pipeline (ShellCmd).
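
As a rough sketch of what such a dynamic change might look like (the parameter group name and queue values below are illustrative only, and the backslashes are just shell line continuations for readability):

# Shift concurrency and memory toward ETL_USER after business hours.
# "wlm-demo-params" and the queue settings are hypothetical.
aws redshift modify-cluster-parameter-group \
    --parameter-group-name wlm-demo-params \
    --parameters '[{"ParameterName":"wlm_json_configuration","ParameterValue":"[{\"user_group\":[\"etl_user\"],\"query_concurrency\":10,\"memory_percent_to_use\":70},{\"user_group\":[\"bi_user\"],\"query_concurrency\":4,\"memory_percent_to_use\":20},{\"query_concurrency\":2,\"memory_percent_to_use\":10}]"}]'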

3. Use queue hopping to optimize or support mixed workload (ETL and BI workload) continuously

WLM queue hopping allows read-only queries (BI_USER queries) to move from one queue to another queue without cancelling them completely. For example, as shown in the following screenshot, you can create two queues—one with a 60-second timeout for interactive queries and another with no timeout for batch queries—and add the same user group, BI_USER, to each queue. WLM automatically re-routes any BI_USER queries that time out in the interactive queue to the batch queue and restarts them.

Redshift_Workload_3

In this example, the ETL workload does not block the BI workload queries and the BI workload is eventually classified as batch, so that long-running, read-only queries do not block the execution of quick-running queries from the same user group.

4. Increase the slot count temporarily for resource-intensive ETL or batch queries

Amazon Redshift writes intermediate results to the disk to help prevent out-of-memory errors, but the disk I/O can degrade the performance. The following query shows if any active queries are currently running on disk:

SELECT query, label, is_diskbased FROM svv_query_state
WHERE is_diskbased = 't';

Query results:

 query | label        | is_diskbased
-------+--------------+--------------
 1025  | hash tbl=142 | t

Typically, hashes, aggregates, and sort operators are likely to write data to disk if the system doesn’t have enough memory allocated for query processing. To fix this issue, allocate more memory to the query by temporarily increasing the number of query slots that it uses. For example, a queue with a concurrency level of 4 has 4 slots. When the slot count is set to 4, a single query uses the entire available memory of that queue. Note that assigning several slots to one query consumes the concurrency and blocks other queries from being able to run.

In the following example, I set the slot count to 4 before running the query and then reset the slot count back to 1 after the query finishes.

set wlm_query_slot_count to 4;
select
	p_brand,
	p_type,
	p_size,
	count(distinct ps_suppkey) as supplier_cnt
from
	partsupp,
	part
where
	p_partkey = ps_partkey
	and p_brand <> 'Brand#21'
	and p_type not like 'LARGE POLISHED%'
	and p_size in (26, 40, 28, 23, 17, 41, 2, 20)
	and ps_suppkey not in (
		select
			s_suppkey
		from
			supplier
		where
			s_comment like '%Customer%Complaints%'
	)
group by
	p_brand,
	p_type,
	p_size
order by
	supplier_cnt desc,
	p_brand,
	p_type,
	p_size;

set wlm_query_slot_count to 1; -- after query completion, resetting slot count back to 1

Note: The above TPC dataset query is used for illustration purposes only.

Example insights from WLM queries

The following example queries can help answer questions you might have about your workloads:

  • What is the current query queue configuration? What is the number of query slots and the timeout defined for each queue?
  • How many queries are executed, queued, and executing per query queue?
  • What does my workload look like for each query queue per hour? Do I need to change my configuration based on the load?
  • How is my existing WLM configuration working? Which query queues should be optimized to meet the business demand?

WLM configures query queues according to internally-defined WLM service classes. The terms queue and service class are often used interchangeably in the system tables.

Amazon Redshift creates several internal queues according to these service classes along with the queues defined in the WLM configuration. Each service class has a unique ID. Service classes 1-4 are reserved for system use and the superuser queue uses service class 5. User-defined queues use service class 6 and greater.

Query: Existing WLM configuration

Run the following query to check the existing WLM configuration. Four queues are configured, and each queue is assigned a number. In the query results, the queue number maps to a service_class (Queue #1 => ETL_USER => service class 6), with the evictable flag set to false (no query timeout defined).

select service_class, num_query_tasks, evictable, eviction_threshold, name
from stv_wlm_service_class_config
where service_class > 5;
Redshift_Workload_4

The query above provides information about the current WLM configuration. This query can be automated using Lambda to send notifications to the operations team whenever there is a change to WLM.

Query: Queue state

Run the following query to monitor the state of the queues, the memory allocation for each queue and the number of queries executed in each queue. The query provides information about the custom queues and the superuser queue.

select config.service_class, config.name
, trim (class.condition) as description
, config.num_query_tasks as slots
, config.max_execution_time as max_time
, state.num_queued_queries queued
, state.num_executing_queries executing
, state.num_executed_queries executed
from
STV_WLM_CLASSIFICATION_CONFIG class,
STV_WLM_SERVICE_CLASS_CONFIG config,
STV_WLM_SERVICE_CLASS_STATE state
where
class.action_service_class = config.service_class
and class.action_service_class = state.service_class
and config.service_class > 5
order by config.service_class;

Redshift_Workload_5

Service class 9 is not being used in the above results, so you could configure the minimum possible resources (concurrency and memory) for the default queue. Service class 6, etl_group, has executed more queries, so you may want to configure or re-assign more memory and concurrency for this group.

Query: After the last cluster restart

The following query shows the number of queries that are either executing or have completed executing by service class after the last cluster restart.

select service_class, num_executing_queries,  num_executed_queries
from stv_wlm_service_class_state
where service_class >5
order by service_class;

Redshift_Workload_6
Service class 9 is not being used in the above results. Service class 6, etl_group, has executed more queries than any other service class. You may want to configure more memory and concurrency for this group to speed up query processing.

Query: Hourly workload for each WLM query queue

The following query returns the hourly workload for each WLM query queue. Use this query to fine-tune WLM queues that contain too many or too few slots, resulting in WLM queuing or unutilized cluster memory. You can copy this query (wlm_apex_hourly.sql) from the amazon-redshift-utils GitHub repo.

WITH
        -- Replace STL_SCAN in generate_dt_series with another table which has > 604800 rows if STL_SCAN does not
        generate_dt_series AS (select sysdate - (n * interval '1 second') as dt from (select row_number() over () as n from stl_scan limit 604800)),
        apex AS (SELECT iq.dt, iq.service_class, iq.num_query_tasks, count(iq.slot_count) as service_class_queries, sum(iq.slot_count) as service_class_slots
                FROM
                (select gds.dt, wq.service_class, wscc.num_query_tasks, wq.slot_count
                FROM stl_wlm_query wq
                JOIN stv_wlm_service_class_config wscc ON (wscc.service_class = wq.service_class AND wscc.service_class > 4)
                JOIN generate_dt_series gds ON (wq.service_class_start_time <= gds.dt AND wq.service_class_end_time > gds.dt)
                WHERE wq.userid > 1 AND wq.service_class > 4) iq
        GROUP BY iq.dt, iq.service_class, iq.num_query_tasks),
        maxes as (SELECT apex.service_class, trunc(apex.dt) as d, date_part(h,apex.dt) as dt_h, max(service_class_slots) max_service_class_slots
                        from apex group by apex.service_class, apex.dt, date_part(h,apex.dt))
SELECT apex.service_class, apex.num_query_tasks as max_wlm_concurrency, maxes.d as day, maxes.dt_h || ':00 - ' || maxes.dt_h || ':59' as hour, MAX(apex.service_class_slots) as max_service_class_slots
FROM apex
JOIN maxes ON (apex.service_class = maxes.service_class AND apex.service_class_slots = maxes.max_service_class_slots)
GROUP BY  apex.service_class, apex.num_query_tasks, maxes.d, maxes.dt_h
ORDER BY apex.service_class, maxes.d, maxes.dt_h;

For the purposes of this post, the results are broken down by service class.

Redshift_Workload_7

In the above results, service class #6 seems to be utilized consistently, using up to 8 slots over 24 hours. Looking at these numbers, no change is required for this service class at this point.

Redshift_Workload_8

Service class #7 can be optimized based on the above results. Two observations to note:

  • 6am-3pm or 6pm-6am (next day): The maximum number of slots used is 3. There is an opportunity to rotate concurrency and memory allocation based on these access patterns. For more information about how to rotate resources dynamically, see the guidelines section earlier in the post.
  • 3pm-6pm: Peak is observed during this period. You can leave the existing configuration during this time.

Summary

Amazon Redshift is a powerful, fully managed data warehouse that can offer significantly increased performance and lower cost in the cloud. Using the WLM feature, you can ensure that different users and processes running on the cluster receive the appropriate amount of resources to maximize performance and throughput.

If you have questions or suggestions, please leave a comment below.

 


About the Author

Suresh Akena is a Senior Big Data/IT Transformation Architect for AWS Professional Services. He works with enterprise customers to provide leadership on large-scale data strategies, including migration to the AWS platform and big data and analytics projects, and helps them optimize and improve time to market for data-driven applications on AWS. In his spare time, he likes to play with his 8- and 3-year-old daughters and watch movies.

 

 



Introducing PIXEL

Post Syndicated from Simon Long original https://www.raspberrypi.org/blog/introducing-pixel/

It was just over two years ago when I walked into Pi Towers for the first time. I only had the vaguest idea of what I was going to be doing, but on the first day Eben and I sat down and played with the Raspbian desktop for half an hour, then he asked me “do you think you can make it better?”


Bear in mind that at this point I’d barely ever used Linux or Xwindows, never mind made any changes to them, so when I answered “hmmm – I think so”, it was with rather more confidence than I actually felt. It was obvious that there was a lot that could be done in terms of making it a better experience for the user, and I spent many years working in user interface design in previous jobs. But I had no idea where to start in terms of changing Raspbian. I clearly had a bit of a learning curve in front of me…

Well, that was two years ago, and I’ve learnt an awful lot since then. It’s actually surprisingly easy to hack about with the LXDE desktop once you get your head around what all the bits do, and since then I’ve been slowly chipping away at the bits that I felt would most benefit from tweaking. Stuff has slowly been becoming more and more like my original concept for the desktop; with the latest changes, I think the desktop has reached the point where it’s a complete product in its own right and should have its own name. So today, we’re announcing the release of the PIXEL desktop, which will ship with the Foundation’s Raspbian image from now on.


PIXEL?

One of the things I said (at least partly in jest) to my colleagues in those first few weeks was that I’d quite like to rename the desktop environment once it was a bit more Pi-specific, and I had the name “pixel” in my mind about two weeks in. It was a nice reminder of my days learning to program in BASIC on the Sinclair ZX81; nowadays, everything from your TV to your phone has pixels on it, but back then it was a uniquely “computer-y” word and concept. I also like crosswords and word games, and once it occurred to me that “pixel” could be made up from the initials of words like Pi and Xwindows, the name stuck in my head and never quite went away. So PIXEL it is, which now officially stands for “Pi Improved Xwindows Environment, Lightweight”.

What’s new?

The latest set of changes are almost entirely to do with the appearance of the desktop; there are some functional changes and a few new applications, about which more below, but this is mostly about making things look nicer.

The first thing you’ll notice on rebooting is that the trail of cryptic boot messages has (mostly) gone, replaced by a splash screen. One feature which has frequently been requested is an obvious version number for our Raspbian image, and this can now be seen at the bottom-right of the splash image. We’ll update this whenever we release a new version of the image, so it should hopefully be slightly easier to know exactly what version you’re running in future.


I should mention that the code for the splash screen has been carefully written and tested, and should not slow down the Pi’s boot process; the time to go from powering on to the desktop appearing is identical, whether the splash is shown or not.

Desktop pictures

Once the desktop appears, the first thing you’ll notice is the rather stunning background image. We’re very fortunate in that Greg Annandale, one of the Foundation’s developers, is also a very talented (and very well-travelled) photographer, and he has kindly allowed us to use some of his work as desktop pictures for PIXEL. There are 16 images to choose from; you can find them in /usr/share/pixel-wallpaper/, and you can use the Appearance Settings application to choose which one you prefer. Do have a look through them, as Greg’s work is well worth seeing! If you’re curious, the EXIF data in each image will tell you where it was taken.
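
For example, assuming you install exiftool first (it isn’t part of the image), you can pull the location tags out from the command line:

sudo apt-get install -y libimage-exiftool-perl
exiftool -GPSLatitude -GPSLongitude /usr/share/pixel-wallpaper/*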


Icons

You’ll also notice that the icons on the taskbar, menu, and file manager have had a makeover. Sam Alder and Alex Carter, the guys responsible for all the cartoons and graphics you see on our website, have been sweating blood over these for the last few months, with Eben providing a watchful eye to make sure every pixel was exactly the right colour! We wanted something that looked businesslike enough to be appropriate for those people who use the Pi desktop for serious work, but with just a touch of playfulness, and Sam and Alex did a great job. (Some of the icons you don’t see immediately are even nicer; it’s almost worth installing some education or engineering applications just so those categories appear in the menu…)


Speaking of icons, the default is now not to show icons in individual application menus. These always made menus look a bit crowded, and didn’t really offer any improvement in usability, not least because it wasn’t always that obvious what the icon was supposed to represent… As a result, the lack of visual clutter makes the menus look cleaner and easier to read and use.

Finally on the subject of icons, in the past if your Pi was working particularly hard, you might have noticed some yellow and red squares appearing in the top-right corner of the screen, which were indications of overtemperature or undervoltage. These have now been replaced with some new symbols that make it a bit more obvious what’s actually happening; there’s a lightning bolt for undervoltage, and a thermometer for overtemperature.

Windows

If you open a window, you’ll see that the window frame design has now changed significantly. The old window design always looked a bit dated compared to what Apple and Microsoft are now shipping, so I was keen to update it. Windows now have a subtle curve on the corners, a cleaner title bar with new close / minimise / maximise icons, and a much thinner frame. One reason the frame was quite thick on the old windows was so that the grab handles for resizing were big enough to find with the mouse. To avoid this problem, the grab handles now extend slightly outside the window; if you hold the mouse pointer just outside the window which has focus, you’ll see the pointer change to show the handle.


Fonts

Steve Jobs said that one thing he was insistent on about the Macintosh was that its typography was good, and it’s true that using the right fonts makes a big difference. We’ve been using the Roboto font in the desktop for the last couple of years; it’s a nice-looking modern font, and it hasn’t changed for this release. However, we have made it look better in PIXEL by including the Infinality font rendering package. This is a library of tweaks and customisations that optimises how fonts are mapped to pixels on the screen; the effect is quite subtle, but it does give a noticeable improvement in some places.

Login

Most people have their Pi set up to automatically log in when the desktop starts, as this is the default setting for a new install. For those who prefer to log in manually each time, the login screen has been redesigned to visually match the rest of the desktop; you now see the login box (known as the “greeter”) over your chosen desktop design, with a seamless transition from greeter to desktop.


Wireless power switching

One request we have had in the past is to be able to shut off WiFi and/or Bluetooth completely, particularly on Pi 3. There are now options in the WiFi and Bluetooth menus to turn off the relevant devices. These work on the Pi 3’s onboard wireless hardware; they should also work on most external WiFi and Bluetooth dongles.

You can also now disconnect from an associated wireless access point by clicking on its entry in the WiFi menu.

New applications

There are a couple of new applications now included in the image.

RealVNC have ported their VNC server and viewer applications to Pi, and they are now integrated with the system. To enable the server, select the option on the Interfaces tab in Raspberry Pi Configuration; you’ll see the VNC menu appear on the taskbar, and you can then log in to your Pi and control it remotely from a VNC viewer.

The RealVNC viewer is also included – you can find it from the Internet section of the Applications menu – and it allows you to control other RealVNC clients, including other Pis. Have a look here on RealVNC’s site for more information.


Please note that if you already use xrdp to remotely access your Pi, this conflicts with the RealVNC server, so you shouldn’t install both at once. If you’re updating an existing image, don’t run the sudo apt-get install realvnc-vnc-server line in the instructions below. If you want to use xrdp on a clean image, first uninstall the RealVNC server with sudo apt-get purge realvnc-vnc-server before installing xrdp. (If the above paragraph means nothing to you, then you probably aren’t using xrdp, so you don’t have to worry about any of it!)

Also included is the new SenseHAT emulator, which was described in a blog post a couple of weeks ago; have a look here for all the details.


Updates

There are updates for a number of the built-in applications; these are mostly tweaks and bug fixes, but there have been improvements made to Scratch and Node-RED.

One more thing…

We’ve been shipping the Epiphany web browser for the last couple of years, but it’s now starting to show its age. So for this release (and with many thanks to Gustav Hansen from the forums for his invaluable help with this), we’re including an initial release of Chromium for the Pi. This uses the Pi’s hardware to accelerate playback of streaming video content.


We’ve preinstalled a couple of extensions; the uBlock Origin adblocker should hopefully keep intrusive adverts from slowing down your browsing experience, and the h264ify extension forces YouTube to serve videos in a format which can be accelerated by the Pi’s hardware.

Chromium is a much more demanding piece of software than Epiphany, but it runs well on Pi 2 and Pi 3; it can struggle slightly on the Pi 1 and Pi Zero, but it’s still usable. (Epiphany is still installed in case you find it useful; launch it from the command line by typing “epiphany-browser”.)

How do I get it?

The Raspbian + PIXEL image is available from the Downloads page on our website now.

To update an existing Jessie image, type the following at the command line:

sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get install -y rpi-chromium-mods
sudo apt-get install -y python-sense-emu python3-sense-emu
sudo apt-get install -y python-sense-emu-doc realvnc-vnc-viewer

and then reboot.

If you don’t use xrdp and would like to use the RealVNC server to remotely access your Pi, type the following:

sudo apt-get install -y realvnc-vnc-server

As always, your feedback on the new release is very welcome; feel free to let us know what you think in the comments or on the forums.


TPBClean: A ‘Safe for Work’ Pirate Bay Without Porn

Post Syndicated from Ernesto original https://torrentfreak.com/tpbclean-safe-work-pirate-bay-without-porn-160925/

Over the years, regular Pirate Bay visitors have seen plenty of scarcely dressed girls being featured on the site.

The XXX category has traditionally been one of the largest, and TPB’s ads can show quite a bit of flesh as well. While there are many people who see this adult themed content as a feature, not everyone appreciates it.

This didn’t go unnoticed by “MrClean,” a developer with quite a bit of experience when it comes to torrent proxy sites.

He was recently confronted with the issue when several Indian programmers he tried to hire for torrent related projects refused to work on a site that listed porn and other nudity.

“Over fifty percent of those contacted refused to work on the project, not for copyright related reasons, but because they didn’t want to work on a project that had ANY links with adult content,” MrClean tells TF.

Apparently, there is a need for pornless torrent sites, so “MrClean” decided to make a sanitized torrent site for these proper folks. This is how the TPBClean proxy site was born.

“Since TPB is now the biggest torrent site again, I figured there might be people with similar feelings toward adult content that would appreciate a clean version of the bay,” MrClean explains.

TPBClean is a direct proxy of all Pirate Bay torrents, minus those in the porn categories. In addition, the site has a customized look, without any ads.

“Although many TPB users don’t mind tits in their face while searching the top 100, TPBClean users will have a more ‘Safe For Work’ experience,” MrClean notes.

People who try searching for XXX content get the following response instead: “Any Explicit/Adult Content has been Removed 🙂” We did some test searches and it appears to work quite well.

XXX filtered

The Pirate Bay team can usually appreciate creative expression, although they might find it hard to believe that anyone would be interested in a torrent site without porn.

In any case, MrClean says that he doesn’t mean to do any harm to the original Pirate Bay, which he wholeheartedly supports. And thus far the public response has mainly been positive as well.

“Finally a PirateBay you can browse with your granny!” Pirate Bay proxy portal UKBay tweeted excitedly.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

‘Will Trump Shut Down The Pirate Bay?’

Post Syndicated from Ernesto original https://torrentfreak.com/will-trump-shut-pirate-bay-160911/

No, Trump personally can’t and won’t shut down The Pirate Bay. Period.

Excuse me for the clickbait title and the strange intro, but since it’s the topic of this opinion piece, I thought it was warranted.

Here’s what’s going on.

The torrent community is in turmoil after the shutdowns of KAT and Torrentz. We’ve written about this extensively, but there’s a rather frustrating side-effect that we haven’t discussed so far.

For some reason there’s a slew of news sites, prominently featured in search engines and on social media, that keep spreading fear and panic about a looming Pirate Bay shutdown.

These publications take every piece of file-sharing related news, often sourced from TorrentFreak, and rewrite it in a way that suggests the world’s number one torrent site may disappear, or is already gone.

Here are just a few headlines I’ve seen over the past few days. Click on the links at your own risk.

    • Pirate Bay, Extra Torrent Shutting Down; Fans In Search For Best Torrent Alternative (link)
    • The Pirate Bay (TPB) Shut Down Imminent After Service Partner Faces Piracy Lawsuit (link)
    • Goodbye The Pirate Bay? Cloudflare Under Fire For Helping TPB, Terror Groups (link)
    • The Pirate Bay to shut down soon? (link)
    • The Pirate Bay Shut Down Rumors: Once Site Goes Down, US Library Of Congress Might Be The Next Piracy Haven (link)
    • Pirate Bay to Shut Down…

    • The Pirate Bay To Shut Down Soon As Excipio Starts To Shoot And Kill Torrent Sites? (link)
    • TPB Now Leads The Pack Of Torrent Sites, But Might Shut Down Soon? List Of Top Torrent Sites Inside (link)
    • The Pirate Bay (TPB), KickassTorrents, Torrentz Shut Down: US Library of Congress As Next Alternative? (link)
    These reports have absolutely nothing to do with an actual Pirate Bay shutdown, of course.

    The last one, for example, bizarrely connects concerns the RIAA has about access to digital works at the Library of Congress, to the potential demise of TPB, which is pure nonsense.

    Pirate Bay Declared Dead

    Many other articles follow the same format, writing nonsensical trash such as the following:

    “Other torrent sites such as TorrentFreak is not happy with the growing population of The Pirate Bay but they do appreciate the role that TPB is playing in the world of torrent sites.”

    The quote above comes from The Parent Herald, which also suggests that copyright trolls plan to fine The Pirate Bay. Clearly, they have not read the TorrentFreak article on the topic, which they’re quoting, or they simply don’t understand it.

    Might Shut Down Soon?

    So why are these “news” sites reporting this type of doom and gloom? The short answer is ad views. The clickbait articles are shared on social media, appear in Google news and in search results.

    The latter can bring in thousands of views. If people Google for “The Pirate Bay,” these headlines are featured as “news” and beg to be clicked on, generating revenue for the sites in question. For the very same reason you’ll see numerous articles about KAT and Torrentz alternatives.

    Click, Click, Click

    Why are we complaining about this? Well, these news reports are picked up by other sites and shared among thousands of people. At TorrentFreak we do our best to report news as accurately as possible, and these clickbait articles go directly against this, often using our name.

    We have addressed the clickbait issue in the past but in recent months it has gotten much worse.

    While there are many different sites guilty of this practice, we recently stumbled upon a ring of related publications that all belong to the same company. They carry names such as Parent Herald, iSports Times, University Herald, and Mobile&Apps, and share a similar layout and design.

    The owner in question, according to the copyright statement, is the New York based company IQ Adnet, which is… surprise surprise, an ad network that specializes in premium digital and native advertising. That explains everything.

    There’s not much we can do about this, unfortunately, besides telling people what’s really going on and venting our frustration every now and then.

    In the meantime, we’ll be waiting for these sites to pick up the Trump angle, which shouldn’t take long.

    For the record: at TorrentFreak we don’t use pay-per-view ads, partly to get rid of the pageview obsession. This means that the clickbait title we used for this article doesn’t bring in any extra money.

    Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

    The latest update to Raspbian

    Post Syndicated from Simon Long original https://www.raspberrypi.org/blog/another-update-raspbian/

    No exciting new hardware announcement to tie it to this time, but we’ve just released a new version of our Raspbian image with some (hopefully) useful features. Read on for all the details of what has changed…

    Bluetooth

    When the Pi 3 launched back in February, we’d not had time to do much in terms of getting access to the new onboard Bluetooth hardware. There was a working software stack, but the UI was non-existent.

    I’d hoped to be able to use one of the existing Linux Bluetooth UIs, but on trying them all, none were really what I was looking for in terms of usability and integration with the look and feel of the desktop. I really didn’t want to write one from scratch, but that ended up being what I did, which meant a fun few weeks trying to make head or tail of the mysteries of BlueZ and D-Bus. After a few false starts, I finally got something I felt was usable, and so there is now a Bluetooth plugin for the lxpanel taskbar.

    On the taskbar, to the left of the network icon, there is now a Bluetooth icon. Clicking this opens a menu which allows you to make the Pi discoverable by other devices, or to add or remove a Bluetooth device. Selecting the ‘Add Device…’ option opens a window which will gradually populate with any discoverable Bluetooth devices which are in range – just select the one you want to pair with and press the ‘Pair’ button.

    You will then be guided through the pairing procedure, the nature of which depends on the device. With many devices (such as mice or speakers), pairing is entirely automatic and requires no user interaction; on others you may be asked to enter a code or to confirm that a code displayed on a remote device matches that shown on the Pi. Follow the prompts, and (all being well), you should be rewarded with a dialog telling you that pairing was successful.

    Paired devices are listed at the end of the Bluetooth menu – these menu entries can be used to connect or disconnect a paired device. To remove a pairing completely, use the ‘Remove Device…’ option in the menu.

    Bluetooth support is limited at this stage; you can pair with pretty much anything, but you can only usefully connect to devices which support either the Human Interface Device or Audio Sink services – in other words, mice, keyboards and other UI devices, and speakers and headsets.

    Devices should reconnect after a reboot or on powering up your Pi, but bear in mind that keyboards and mice may need you to press a key or click the mouse button to wake them from sleep when first used after a power-up.

    The Bluetooth UI should also work with an external Bluetooth dongle on platforms other than Pi 3 – I’ve successfully tested it with a Targus dongle on all the earlier platforms.

    Bluetooth audio

    The UI now supports the use of Bluetooth speakers and headsets for audio output, with a few caveats, about which more below.

    To connect an audio device, you pair it as described above – it will then be listed in the audio device menu, accessible by right-clicking the speaker icon on the taskbar.

    Selecting a Bluetooth device from the audio device menu will cause it to be selected as the default audio output device – there will be a few seconds’ pause while the connection is established. You can then use the volume control on the taskbar to control it, as for standard wired audio devices.

    There is one issue with the support for Bluetooth audio, however. Due to the way the Bluetooth stack has been written, Bluetooth devices do not appear to the system as standard ALSA audio devices – they require the use of an intermediate audio layer called PulseAudio. The PulseAudio magic is all built into the UI – you don’t need to worry about setting it up – but the problem is that not all applications are able to send audio to the PulseAudio interface, and therefore cannot output audio over Bluetooth.

    Most applications work just fine – videos and music work in the Epiphany and Iceweasel browsers, as does the command-line mplayer music player and the vlc media player. But at present neither Scratch nor Sonic Pi can output audio over Bluetooth – we are working with the authors of these programs to address this and are hopeful that both can be made compatible, so please bear with us!

    The use of PulseAudio has one other effect that may cause issues for a small number of users – specifically, if you are already using PulseAudio for anything other than interfacing with Bluetooth devices. This plugin will automatically stop the PulseAudio service whenever a standard ALSA device is selected. If you are using PulseAudio for your own purposes, it would be best to remove the volumealsa plugin from the taskbar completely to avoid this – just right-click anywhere on the taskbar, choose ‘Add/Remove Panel Items’, and remove the “Volume Control (ALSA)” item from the list.

    SD card copier

    One query which comes up a lot on the forums is about the best way to back up your Pi. People also want to know how to migrate their Raspbian install to a new SD card which is larger or smaller than the one they are using at the moment. This has been difficult with the command-line tools that we’ve recommended in the past, so there is now a new application to help with this, and you’ll find it in the menu under ‘Accessories’.

    The SD Card Copier application will copy Raspbian from one card to another – that’s pretty much all it does – but there are several useful things that you can do as a result. To use it, you will need a USB SD card writer.

    To take a common example: what if you want to back up your existing Raspbian installation? Put a blank SD card in your USB card writer and plug it into your Pi, and then launch SD Card Copier. In the ‘Copy From Device’ box, select “Internal SD Card”, and then select the USB card writer in the ‘Copy To Device’ box (where it will probably be the only device listed). Press ‘Start’, watch the messages on the screen and wait – in ten or fifteen minutes, you should have a clone of your current installation on the new SD card. You can test it by putting the newly-copied card into the Pi’s SD card slot and booting it; it should boot and look exactly the same as your original installation, with all your data and applications intact.

    You can run directly from the backup, but if you want to recover your original card from your backup, simply reverse the process – boot your Pi from the backup card, put the card to which you want to restore into the SD card writer, and repeat the process above.

    The program does not restrict you to only copying to a card the same size as the source; you can copy to a larger card if you are running out of space on your existing one, or even to a smaller card (as long as it has enough space to store all your files – the program will warn you if there isn’t enough space). It has been designed to work with Raspbian and NOOBS images; it may work with other OSes or custom card formats, but this can’t be guaranteed.

    The only restriction is that you cannot write to the internal SD card reader, as that would overwrite the OS you are actually running, which would cause bad things to happen.

    Please also bear in mind that everything on the destination card will be overwritten by this program, so do make sure you’ve got nothing you want to keep on the destination card before you hit Start!

    pigpio

    This image includes the pigpio library from abyz.co.uk – this provides a unified way of accessing the Pi’s GPIO pins from Python, C and other languages. It removes the need to use sudo in programs which want to access the GPIOs, and as a result Scratch now runs sudo-less for everyone.

    Geany

    One of the tools which is really useful for professional programmers is a good text editor – the simple editor provided with LXDE is fine for small tasks, but not really suitable for serious work.

    The image now includes the Geany editor, which is much better suited to big projects – it offers features like syntax highlighting, automatic indentation and management of multiple files. There’s good online help built into the program itself, or have a look at the Geany website.

    New versions of applications

    There are new versions of many of the standard programs included in the image, including Scratch, Sonic Pi, Node-RED, BlueJ and PyPy. Please see the relevant individual websites or changelists for details of what has changed in each of these.

    New kernel

    The Linux kernel has been upgraded to version 4.4. This change should have no noticeable effect for most users, but it does force the use of device tree; if you’ve been hacking about with your Raspbian install, particularly in terms of installing new hardware, you may find reading this forum post useful.

    Tweaks

    There are a lot of small user interface tweaks throughout the system which you may notice. Some of these include:

    • A new Shutdown Options dialog

    • The Mouse and Keyboard Settings dialog now allows you to set the delay between double-clicks of the mouse button

    • The Raspberry Pi Configuration dialog now allows you to enable or disable the single-wire interface, and to enable or disable remote access to the pigpio daemon

    • Right-clicking the Wastebasket icon on the desktop now gives the option to empty the wastebasket

    • The keyboard shortcut Ctrl-Alt-T can now be used to open a Terminal window

    Finally, there are a couple of setup-related features:

    • When flashing a new Raspbian image, the file system will automatically be expanded to use all the space on the card when it is first booted.

    • If a wpa_supplicant.conf file is placed into the /boot/ directory, this will be moved to the /etc/wpa_supplicant/ directory the next time the system is booted, overwriting the network settings; this allows a Wifi configuration to be preloaded onto a card from a Windows or other machine that can only see the boot partition.

    There are also a host of fixes for minor bugs in various parts of the system, and some general cleaning-up of themes and text.

    How do I get it?

    A full image and a NOOBS installer are available from the Downloads page on this website.

    If you are running the current Jessie image, it can be updated to the new version by running

    sudo apt-get update
    sudo apt-get dist-upgrade
    sudo apt-get install piclone geany usb-modeswitch

    As ever, your feedback on the new release is very welcome – feel free to comment here or in the forums.

    The post The latest update to Raspbian appeared first on Raspberry Pi.

    Apple did not invent emoji

    Post Syndicated from Eevee original https://eev.ee/blog/2016/04/12/apple-did-not-invent-emoji/

    I love emoji. I love Unicode in general. I love seeing plain text become more expressive and more universal.

    But, Internet, I’ve noticed a worrying trend. Both popular media and a lot of tech circles tend to assume that “emoji” de facto means Apple’s particular font.

    I have some objections.

    A not so brief history of emoji

    The Unicode Technical Report on emoji also goes over some of this.

    Emoji are generally traced back to the Japanese mobile carrier NTT DoCoMo, which in February 1999 released a service called i-mode which powered a line of wildly popular early smartphones. Its messenger included some 180 small pixel-art images you could type as though they were text, because they were text, encoded using unused space in Shift JIS.

    (Quick background, because I’d like this to be understandable by a general audience: computers only understand numbers, not text, so we need a “character set” that lists all the characters you can type and what numbers represent them. So in ASCII, for example, a capital “A” is passed around as the number 65. Computers always deal with bytes, which can go up to 255, but ASCII only lists characters up to 127 — so everything from 128 to 255 is just unused space. Shift JIS is Japan’s equivalent to ASCII, and had a lot more unused space, and that’s where early emoji were put.)

    Naturally, other carriers added their own variations. Naturally, they used different sets of images, but often in a different order, so the same character might be an apple on one phone and a banana on another. They came up with tables for translating between carriers, but that wouldn’t help if your friend tried to send you an image that your phone just didn’t have. And when these characters started to leak outside of Japan, they had no hope whatsoever of displaying as anything other than garbage.

    This is kind of like how Shift JIS is mostly compatible with ASCII, except that for some reason it has the yen sign ¥ in place of the ASCII backslash (\), producing hilarious results. Also, this is precisely the problem that Unicode was invented to solve.

    I’ll get back to all this in a minute, but something that’s left out of emoji discussions is that the English-speaking world was developing a similar idea. As far as I can tell, we got our first major exposure to graphical emoticons with the release of AIM 4.0 circa May 2000 and these infamous “smileys”:

    Pixellated, 4-bit smileys from 2000

    Even though AIM was a closed network where there was little risk of having private characters escape, these were all encoded as ASCII emoticons. That simple smiley on the very left would be sent as :-) and turned into an image on your friend’s computer, which meant that if you literally typed :-) in a message, it would still render graphically. Rather than being an extension to regular text, these images were an enhancement of regular text, showing a graphical version of something the text already spelled out. A very fancy ligature.

    Little ink has been spilled over this, but those humble 4-bit graphics became a staple of instant messaging, by which I mean everyone immediately ripped them off. ICQ, MSN Messenger, Yahoo! Messenger, Pidgin (then Gaim), Miranda, Trillian… I can’t name a messenger since 2003 that didn’t have smileys included. All of them still relied on the same approach of substituting graphics for regular ASCII sequences. That had made sense for AIM’s limited palette of faces, but during its heyday MSN Messenger included 67 graphics, most of them not faces. If you sent a smiling crescent moon to someone who had the graphics disabled (or used an alternative client), all they’d see was a mysterious (S).

    So while Japan is generally credited as the source of emoji, the US was quite busy making its own mess of things.

    Anyway, Japan had this mess of several different sets of emoji in common use, being encoded in several different incompatible ways. That’s exactly the sort of mess Unicode exists to sort out, so in mid-2007, several Google employees (one of whom was the co-founder of the Unicode Consortium, which surely helped) put together a draft proposal for adding the emoji to Unicode. The idea was to combine all the sets, drop any duplicates, and add to Unicode whatever wasn’t already there.

    (Unicode is intended as a unification of all character sets. ASCII has \, Shift JIS has ¥, but Unicode has both — so an English speaker and a Japanese speaker can both use both characters without getting confused, as long as they’re using Unicode. And so on, for thousands of characters in dozens of character sets. Part of the problem with sending the carriers’ emoji to American computers was that the US was pretty far along in shifting everything to use Unicode, but the emoji simply didn’t exist in Unicode. Obvious solution: add them!)

    Meanwhile, the iPhone launched in Japan in 2008. iOS 2.2, released in November, added the first implementation of emoji — but using SoftBank’s invented encoding, since they were only on one carrier and the characters weren’t yet in Unicode. A couple Apple employees jumped on the bandwagon around that time and coauthored the first official proposal, published in January 2009. Unicode 6.0, the first version to include emoji, was released in October 2010.

    iPhones worldwide gained the ability to use its emoji (now mapped to Unicode) with the release of iOS 5.0 in October 2011.

    Android didn’t get an emoji font at all until version 4.3, in July 2013. I’m at a loss for why, given that Google had proposed emoji in the first place, and Android had been in Japan since the HTC Magic in May 2009. It was even on NTT DoCoMo, the carrier that first introduced emoji! What the heck, Google.

    The state of things

    Consider this travesty of an article from last week. This Genius Theory Will Change the Way You Use the “Pink Lady” Emoji:

    Unicode, creators of the emoji app, call her the “Information Desk Person.”

    Oh, dear. Emoji aren’t an “app”, Unicode didn’t create them, and the person isn’t necessarily female. But the character is named “Information Desk Person”, so at least that part is correct.

    It’s non-technical clickbait, sure. But notice that neither “Apple” nor the names of any of its platforms appear in the text. As far as this article and author are concerned, emoji are Apple’s presentation of them.

    I see also that fileformat.info is now previewing emoji using Apple’s font. Again, there’s no mention of Apple that I can find here; even the page that credits the data and name sources doesn’t mention Apple. The font is even called “Apple Color Emoji”, so you’d think that might show up somewhere.

    Telegram and WhatsApp both use Apple’s font for emoji on every platform; you cannot use your system font. Slack lets you choose, but defaults to Apple’s font. (I objected to Android Telegram’s jarring use of a non-native font; the sole developer explained simply that they like Apple’s font more, and eventually shut down the issue tracker to stop people from discussing it further.)

    The latest revision of Google’s emoji font even made some questionable changes, seemingly just for the sake of more closely resembling Apple’s font. I’ll get into that a bit later, but suffice to say, even Google is quietly treating Apple’s images as a de facto standard.

    The Unicode Consortium will now let you “adopt” a character. If you adopt an emoji, the certificate they print out for you uses Apple’s font.

    It’s a little unusual that this would happen when Android has been more popular than the iPhone almost everywhere, even since iOS first exposed its emoji keyboard worldwide. Also given that Apple’s font is not freely-licensed (so you’re not actually allowed to use it in your project), whereas Google’s whole font family is. And — full disclosure here — quite a few of them look to me like they came from a disquieting uncanny valley populated by plastic people.

    Terrifying.

    Granted, the iPhone did have a 20-month head start at exposing the English-speaking world to emoji. Plus there’s that whole thing where Apple features are mysteriously assumed to be the first of their kind. I’m not entirely surprised that Apple’s font is treated as canonical; I just have some objections.

    Some objections

    I’m writing this in a terminal that uses Source Code Pro. You’re (probably) reading it on the web in Merriweather. Miraculously, you still understand what all the letters mean, even though they appear fairly differently.

    Emoji are text, just like the text you’re reading now, not too different from those goofy :-) smileys in AIM. They’re often displayed with colorful graphics, but they’re just ideograms, similar to Egyptian hieroglyphs (which are also in Unicode). It’s totally okay to write them a little differently sometimes.

    This the only reason emoji are in Unicode at all — the only reason we have a universal set of little pictures. If they’d been true embedded images, there never would have been any reason to turn them into characters.

    Having them as text means we can use them anywhere we can use text — there’s no need to hunt down a graphic and figure out how to embed it. You want to put emoji in filenames, in source code, in the titlebar of a window? Sure thing — they’re just text.

    Treating emoji as though they are a particular set of graphics rather defeats the point. At best, it confuses people’s understanding of what the heck is going on here, and I don’t much like that.

    I’ve encountered people who genuinely believed that Apple’s emoji were some kind of official standard, and anyone deviating from them was somehow wrong. I wouldn’t be surprised if a lot of lay people believed Apple invented emoji. I can hardly blame them, when we have things like World Emoji Day, based on the date on Apple’s calendar glyph. This is not a good state of affairs.


    Along the same lines, nothing defines an emoji, as I’ve mentioned before. Whether a particular character appears as a colored graphic is purely a property of the fonts you have installed. You could have a font that rendered all English text in sparkly purple letters, if you really wanted to. Or you could have a font that rendered emoji as simple black-and-white outlines like other characters — which is in fact what I have.

    Well… that was true, but mere weeks before that post was published, the Unicode Consortium published a list of characters with a genuine “Emoji” property.

    But, hang on. That list isn’t part of the actual Unicode database; it’s part of a “technical report”, which is informative only. In fact, if you look over the Unicode Technical Report on emoji, you may notice that the bulk of it is merely summarizing what’s being done in the wild. It’s not saying what you must do, only what’s already been done. The very first sentence even says that it’s about interoperability.

    If that doesn’t convince you, consider that the list of “emoji” characters includes # and *. Yes, the ASCII characters on a regular qwerty keyboard. I don’t think this is a particularly good authoritative reference.

    Speaking of which, the same list also contains ©, ®, and ™ — and Twitter’s font has glyphs for all three of them: ©, ®, ™. They aren’t used on web Twitter, but if you naïvely dropped twemoji into your own project, you’d see these little superscript characters suddenly grow to fit large full-width squares. (Worse, all three of them are a single solid color, so they’ll be unreadable on a dark background.) There’s an excellent reason for this, believe it or not: Shift JIS doesn’t contain any of these characters, so the Japanese carriers faked it by including them as emoji.

    Anyway, the technical report proper is a little more nuanced, breaking emoji into a few coarse groups based on who implements them. (Observe that it uses Apple’s font for all 1282 example emoji.)

    I care about all this because I see an awful lot of tech people link this document as though it were a formal specification, which leads to a curious cycle.

    1. Apple does a thing with emoji.
    2. Because Apple is a major vendor, the thing it did is added to the technical report.
    3. Other people look at the report, believe it to be normative, and also do Apple’s thing because it’s “part of Unicode”.
    4. (Wow, Apple did this first again! They’re so ahead of the curve!)

    After I wrote the above list, I accidentally bumbled upon this page from emojipedia, which states:

    In addition to emojis approved in Unicode 8.0 (mid-2015), iOS 9.1 also includes emoji versions of characters all the way back to Unicode 1.1 (1993) that have retroactively been deemed worthy of emoji presentation by the Unicode Consortium.

    That’s flat-out wrong. The Unicode Consortium has never deemed characters worthy of “emoji presentation” — it’s written reports about the characters that vendors like Apple have given colored glyphs. This paragraph congratulates Apple for having an emoji font that covers every single character Apple decided to put in their emoji font!

    This is a great segue into what happened with Google’s recent update to its own emoji font.

    Google’s emoji font changes

    Android 6.0.1 was released in December 2015, and contained a long-overdue update to its emoji font, Noto Color Emoji. It added newly-defined emoji like 🌭 U+1F32D HOT DOG and 🦄 U+1F984 UNICORN FACE, so, that was pretty good.

    ZWJ sequences

    How is this a segue, you ask? Well, see, there are these curious chimeras called ZWJ sequences — effectively new emoji created by mashing multiple emoji together with a special “glue” character in the middle. Apple used (possibly invented?) this mechanism to create “diverse” versions of several emoji like 💏 U+1F48F KISS. The emoji for two women kissing looks like a single image, but it’s actually written as seven characters: woman + heart + kiss + woman with some glue between them. It’s a lot like those AIM smileys, only not ASCII under the hood.

    So, that’s fine, it makes sense, I guess. But then Apple added a new chimera emoji: a speech bubble with an eyeball in it, written as eye + speech bubble. It turned out to be some kind of symbol related to an anti-bullying campaign, dreamed up in conjunction with the Ad Council (?!). I’ve never seen it used and never heard about this campaign outside of being a huge Unicode nerd.

    Lo and behold, it appeared in the updated font. And Twitter’s font. And Emoji One.

    Is this how we want it to work? Apple is free to invent whatever it wants by mashing emoji together, and everyone else treats it as canonical, with no resistance whatsoever? Apple gets to deliberately circumvent the Unicode character process?

    Apple appreciated the symbol, too. “When we first asked about bringing this emoji to the official Apple keyboard, they told us it would take at least a year or two to get it through and approved under Unicode,” says Wittmark. The company found a way to fast-track it, she says, by combining two existing emoji.

    Maybe this is truly a worthy cause. I don’t know. All I know is that Apple added a character (designed by an ad agency) basically on a whim, and now it’s enshrined forever in Unicode documents. There doesn’t seem to be any real incentive for them to not do this again. I can’t wait for apple + laptop to become the MacBook Pro™ emoji.

    (On the other hand, I can absolutely get behind ninja cat.)

    Gender diversity

    I take issue with using this mechanism for some of the “diverse” emoji as well. I didn’t even realize the problem until Google copied Apple’s implementation.

    The basic emoji in question are 💏 U+1F48F KISS and 💑 U+1F491 COUPLE WITH HEART. The emoji technical report contains the following advice, emphasis mine:

    Some multi-person groupings explicitly indicate gender: MAN AND WOMAN HOLDING HANDS, TWO MEN HOLDING HANDS, TWO WOMEN HOLDING HANDS. Others do not: KISS, COUPLE WITH HEART, FAMILY (the latter is also non-specific as to the number of adult and child members). While the default representation for the characters in the latter group should be gender-neutral, implementations may desire to provide (and users may desire to have available) multiple representations of each of these with a variety of more-specific gender combinations.

    This reinforces the document’s general advice about gender which comes down to: if the name doesn’t explicitly reference gender, the image should be gender-neutral. Makes sense.

    Here’s how 💏 U+1F48F KISS and 💑 U+1F491 COUPLE WITH HEART look, before and after the font update.

    Pictured: straight people, ruining everything

    Before, both images were gender-agnostic blobs. Now, with the increased “diversity”, you can choose from various combinations of genders… but the genderless version is gone. The default — what you get from the single characters on their own, without any chimera gluing stuff — is heteromance.

    In fact, almost every major font does this for both KISS and COUPLE WITH HEART, save for Microsoft’s. (HTC’s KISS doesn’t, but only because it doesn’t show people at all.)

    Google’s font has changed from “here are two people” to “heterosexuals are the default, but you can use some other particular combinations too”. This isn’t a step towards diversity; this is a step backwards. It also violates the advice in the very document that’s largely based on “whatever Apple and Google are doing”, which is confounding.

    Sometimes, Apple is wrong

    It also highlights another problem with treating Apple’s font as canonical, which is that Apple is occasionally wrong. I concede that “wrong” is a fuzzy concept here, but I think “surprising, given the name of the character” is a reasonable definition.

    In that sense, everyone but Microsoft is wrong about 💏 U+1F48F KISS and 💑 U+1F491 COUPLE WITH HEART, since neither character mentions gender.

    You might expect 🙌 U+1F64C PERSON RAISING BOTH HANDS IN CELEBRATION and 🙏 U+1F64F PERSON WITH FOLDED HANDS to depict people, but Apple only shows a pair of hands for both of them. This is particularly bad with PERSON WITH FOLDED HANDS, which just looks like a high five. Almost every other font has followed suit (CELEBRATION, FOLDED HANDS). Google used to get this right, but changed it with the update.

    Celebration changed to pat-a-cake, for some reason

    👿 U+1F47F IMP suggests, er, an imp, especially since it’s right next to other “monster” characters like 👾 U+1F47E ALIEN MONSTER and 👹 U+1F479 JAPANESE OGRE. Apple appears to have copied its own 😈 U+1F608 SMILING FACE WITH HORNS from the emoticons block and changed the smile to a frown, producing something I would never guess is meant to be an imp. Google followed suit, just like most other fonts, resulting in the tragic loss of one of my favorite Noto glyphs and the only generic representation of a demon.

    This is going to wreak havoc on all my tweets about Doom

    👯 U+1F46F WOMAN WITH BUNNY EARS suggests a woman. Apple has two, for some reason, though that hasn’t been copied quite as much.

    ⬜ U+2B1C WHITE LARGE SQUARE needs a little explanation. Before Unicode contained any emoji (several of which are named with explicit colors), quite a few character names used “black” to mean “filled” and “white” to mean “empty”, referring to how the character would look when printed in black ink on white paper. “White large square” really means the outline of a square, in contrast to ⬛ U+2B1B BLACK LARGE SQUARE, which is solid. Unfortunately, both of these characters somehow ended up in virtually every emoji font, despite not being in the original lists of Japanese carriers’ emoji… and everyone gets it wrong, save for Microsoft. Every single font shows a solid square colored white. Except Google, who colors it blue. And Facebook, who has some kind of window frame, which it colors black for the BLACK glyph.

    When Apple screws up and doesn’t fix it, everyone else copies their screw-up for the sake of compatibility — and as far as I can tell, the only time Apple has ever changed emoji is for the addition of skin tones and when updating images of their own products. We’re letting Apple set a de facto standard for the appearance of text, even when they’re incorrect, because… well, I’m not even sure why.

    Hand gestures

    Returning briefly to the idea of diversity, Google also updated the glyphs for its dozen or so “hand gesture” emoji:

    Hmm I wonder where they got the inspiration for these

    They used to be pink outlines with a flat white fill, but now are a more realistic flat style with the same yellow as the blob faces and shading. This is almost certainly for the sake of supporting the skin tone modifiers later, though Noto doesn’t actually support them yet.

    The problem is, the new ones are much harder to tell apart at a glance! The shadows are very subtle, especially at small sizes, so they might as well all be yellow splats.

    I always saw the old glyphs as abstract symbols, rather than a crop of a person, even a cartoony person. That might be because I’m white as hell, though. I don’t know. If people of color generally saw them the same way, it seems a shame to have made them all less distinct.

    It’s not like the pink and white style would’ve prevented Noto from supporting skin tones in the future, either. Nothing says an emoji with a skin tone has to look exactly like the same emoji without one. The font could easily use the more abstract symbols by default, and switch to this more realistic style when combined with a skin tone.

    💩

    And finally, some kind of tragic accident has made 💩 U+1F4A9 PILE OF POO turn super goofy and grow a face.

    What even IS that now?

    Why? Well, you see, Apple’s has a face. And so does almost everyone else’s, now.

    I looked at the original draft proposal for this one, and SoftBank (the network the iPhone first launched on in Japan) also had a face for this character, whereas KDDI did not. So the true origin is probably just that one particular carrier happened to strike a deal to carry the iPhone first.

    Interop and confusion

    I’m sure the rationale for many of these changes was to reduce confusion when Android and iOS devices communicate. I’m sure plenty of people celebrated the changes on those grounds.

    I was subscribed to several Android Telegram issues about emoji before the issue tracker was shut down, so I got a glimpse into how people feel about this. One person was particularly adamant that in general, the recipient should always see exactly the same image that the sender chose. Which sounds… like it’s asking for embedded images. Which Telegram supports. So maybe use those instead?

    I grew up on the Internet, in a time when ^_^ looked terrible in mIRC’s default font of Fixedsys but just fine in PIRCH98. Some people used MS Comic Chat, which would try to encode actions in a way that looked like annoying noise to everyone else. Abbreviations were still a novelty, so you might not know what “ttfn” means.

    Somehow, we all survived. We caught on, we asked for clarification, we learned the rules, and life went on. All human communication is ambiguous, so it baffles me when people bring up “there’s more than one emoji font” as though it spelled the end of civilization. Someone might read what you wrote and interpret it differently than you intended? Damn, that is definitely a new and serious problem that we have no idea how to handle.

    It sounds to me how this would’ve sounded in 1998:

    A: ^_^
    B: Wow, that looks totally goofy over here. I’m using mIRC.
    A: Oh, I see the problem. Every IRC client should use Arial, like PIRCH does.

    That is, after all, the usual subtext: every font should just copy whatever Apple does. Let’s not.

    Look, science!

    Conveniently for me, someone just did a study on this. Here’s what I found most interesting:

    Overall, we found that if you send an emoji across platform boundaries (e.g., an iPhone to a Nexus), the sender and the receiver will differ by about 2.04 points on average on our -5 to 5 sentiment scale. However, even within platforms, the average difference is 1.88 points.

    In other words, people still interpret the same exact glyph differently — just like people sometimes interpret the same words differently.

    The gap between same-glyph and different-glyph is a mere 0.16 points out of a 10-point scale, a mere 1.6%. The paper still concludes that the designs should move closer together, and sure, they totally should — towards what the characters describe.

    To underscore that idea, note the summary page discusses U+1F601 😁 GRINNING FACE WITH SMILING EYES across five different fonts. Surely this should express something positive, right? Grins are positive, smiling eyes are positive; this might be the most positive face in Unicode. Indeed, every font was measured as expressing a very positive emotion, except Apple’s, which was apparently controversial but averaged out to slightly negative. Looking at the various renderings, I can totally see how Apple’s might be construed as a grimace.

    So in the name of interoperability, what should font vendors do here? Push Apple (and Twitter and Facebook, by the look of it) to change their glyph? Or should everyone else change, so we end up in a world where two-thirds of people think “grinning face with smiling eyes” is expressing negativity?

    A diversion: fonts

    Perhaps the real problem here is font support itself.

    You can’t install fonts or change default fonts on either iOS or Android (sans root). That Telegram developer who loves Apple’s emoji should absolutely be able to switch their Android devices to use Apple’s font… but that’s impossible.

    It’s doubly impossible because of a teensy technical snag. You see,

    • Apple added support for embedding PNG images in an OpenType font to OS X and iOS.

    • Google added support for embedding PNG images in an OpenType font to FreeType, the font rendering library used on Linux and Android. But they did it differently from Apple.

    • Microsoft added support for color layers in OpenType, so all of its emoji are basically several different monochrome vector images colored and stacked together. It’s actually an interesting approach — it makes the font smaller, it allows pieces to be reused between characters, and it allows the same emoji to be rendered in different palettes on different background colors almost for free.

    • Mozilla went way out into the weeds and added support for embedding SVG in OpenType. If you’re using Firefox, please enjoy these animated emoji. Those are just the letter “o” in plain text — try highlighting or copy/pasting it. The animation is part of the font. (I don’t know whether this mechanism can adapt to the current font color, but these particular soccer balls do not.)

    We have four separate ways to create an emoji font, all of them incompatible, none of them standard (yet? I think?). You can’t even make one set of images and save it as four separate fonts, because they’re all designed very differently: Apple and Google only support regular PNG images, Microsoft only supports stacked layers of solid colors, and Mozilla is ridiculously flexible but still prefers vectors. Apple and Google control the mobile market, so they’re likely to win in the end, which seems a shame since their approaches are the least flexible in terms of size and color and other text properties.

    I don’t think most people have noticed this, partly because even desktop operating systems don’t have an obvious way to change the emoji font (so who would think to try?), and partly because emoji mostly crop up on desktops via web sites which can quietly substitute images (like Twitter and Slack do). It’s not a situation I’d like to see become permanent, though.

    Consider, if you will, that making an emoji font is really hard — there are over 1200 high-resolution images to create, if you want to match Apple’s font. If you used any web forums or IM clients ten years ago, you’re probably also aware that most smiley packs are pretty bad. If you’re stuck on a platform where the default emoji font just horrifies you (for example), surely you’d like to be able to change the font system-wide.

    Disconnecting the fonts from the platforms would actually make it easier to create a new emoji font, because the ability to install more than one side-by-side means that no one font would need to cover everything. You could make a font that provides all the facial expressions, and let someone else worry about the animals. Or you could make a font that provides ZWJ sequences for every combination of an animal face and a facial expression. (Yes, please.) Or you could make a font that turns names of Pokémon into ligatures, so e-e-v-e-e displays as (eevee icon), similar to how Sans Bullshit Sans works.

    But no one can do any of this, so long as there’s no single extension that works everywhere.

    (Also, for some reason, I’ve yet to get Google’s font to work anywhere in Linux. I’m sure there are some fascinating technical reasons, but the upshot is that Google’s browser doesn’t support Google’s emoji font using Google’s FreeType patch that implements Google’s own font extension. It’s been like this for years, and there’s been barely any movement on it, leaving Linux as the only remotely-major platform that can’t seem to natively render color emoji glyphs — even though Android can.)

    Appendix

    Some miscellaneous thoughts:

    • I’m really glad that emoji have forced more developers to actually handle Unicode correctly. Having to deal with commonly-used characters outside of ASCII is a pretty big kick in the pants already, but most emoji are also in Plane 1, which means they don’t fit in a single JavaScript “character” — an issue that would otherwise be really easy to overlook. 💩 is two JavaScript “characters” (a UTF-16 surrogate pair), for example.

    • On the other hand, it’s a shame that the rise of emoji keyboards hasn’t necessarily made the rest of Unicode accessible. There are still plenty of common symbols, like ♫, that I can only type on my phone using the Japanese keyboard. I do finally have an input method on my desktop that lets me enter characters by name, which is nice. We’ve certainly improved since the olden days, when you just had to memorize that Alt0233 produced an é… or, wait, maybe English Windows users still have to do that.

    • Breadth of font support is still a problem outside of emoji, and in a plaintext environment there’s just no way to provide any fallback. Google’s Noto font family aspires to have full coverage — it’s named for “no tofu”, referring to the small boxes that often appear for undisplayable characters — but there are still quite a few gaps. Also, on Android, a character that you don’t have a font for just doesn’t appear at all, with no indication you’re missing anything. That’s one way to get no tofu, I guess.

    • Brands™ running ad campaigns revolving around emoji are probably the worst thing. Hey, if we had a standard way to make colored fonts, then Guinness could’ve just released a font with a darker 🍺 U+1F37A BEER MUG and 🍻 U+1F37B CLINKING BEER MUGS, rather than running a ridiculous ad campaign asking Unicode to add a stout emoji.

    • If you’re on a platform that doesn’t ship with an emoji font, you should really really get Symbola. It covers a vast swath of Unicode with regular old black-and-white vector glyphs, usually using the example glyphs from Unicode’s own documents.

    • The plural is “emoji”, dangit. ∎

    Top 10 Performance Tuning Techniques for Amazon Redshift

    Post Syndicated from Ian Meyers original https://blogs.aws.amazon.com/bigdata/post/Tx31034QG0G3ED1/Top-10-Performance-Tuning-Techniques-for-Amazon-Redshift

    Ian Meyers is a Principal Solutions Architect with Amazon Web Services

    Zach Christopherson, an Amazon Redshift Database Engineer, contributed to this post

    Amazon Redshift is a fully managed, petabyte scale, massively parallel data warehouse that offers simple operations and high performance. Customers use Amazon Redshift for everything from accelerating existing database environments that are struggling to scale, to ingestion of web logs for big data analytics use cases. Amazon Redshift provides an industry standard JDBC/ODBC driver interface, which allows customers to connect their existing business intelligence tools and re-use existing analytics queries.

    Amazon Redshift can run any type of data model, from a production transaction system third-normal-form model, to star and snowflake schemas, or simple flat tables. As customers adopt Amazon Redshift, they must consider its architecture in order to ensure that their data model is correctly deployed and maintained by the database. This post takes you through the most common issues that customers find as they adopt Amazon Redshift, and gives you concrete guidance on how to address each. If you address each of these items, you should be able to achieve optimal performance of queries and be able to scale effectively to meet customer demand.

    Issue #1: Incorrect column encoding

    Amazon Redshift is a column-oriented database, which means that rather than organising data on disk by rows, data is stored by column, and rows are extracted from column storage at runtime. This architecture is particularly well suited to analytics queries on tables with a large number of columns, where most queries only access a subset of all possible dimensions and measures. Amazon Redshift is able to only access those blocks on disk that are for columns included in the SELECT or WHERE clause, and doesn’t have to read all table data to evaluate a query. Data stored by column should also be encoded (see Choosing a Column Compression Type in the Amazon Redshift Database Developer Guide), which means that it is heavily compressed to offer high read performance. This further means that Amazon Redshift doesn’t require the creation and maintenance of indexes: every column is almost like its own index, with just the right structure for the data being stored.

    Running an Amazon Redshift cluster without column encoding is not considered a best practice, and customers find large performance gains when they ensure that column encoding is optimally applied. To determine if you are deviating from this best practice, run the following query to determine if any tables have NO column encoding applied:

    SELECT database, schema || '.' || "table" AS "table", encoded, size
    FROM svv_table_info
    WHERE encoded = 'N'
    ORDER BY 2;

    Afterward, review the tables and columns which aren’t encoded by running the following query:

    SELECT trim(n.nspname || '.' || c.relname) AS "table",
           trim(a.attname) AS "column",
           format_type(a.atttypid, a.atttypmod) AS "type",
           format_encoding(a.attencodingtype::integer) AS "encoding",
           a.attsortkeyord AS "sortkey"
    FROM pg_namespace n, pg_class c, pg_attribute a
    WHERE n.oid = c.relnamespace
      AND c.oid = a.attrelid
      AND a.attnum > 0
      AND NOT a.attisdropped
      AND n.nspname NOT IN ('information_schema', 'pg_catalog', 'pg_toast')
      AND format_encoding(a.attencodingtype::integer) = 'none'
      AND c.relkind = 'r'
      AND a.attsortkeyord != 1
    ORDER BY n.nspname, c.relname, a.attnum;

    If you find that you have tables without optimal column encoding, then use the Amazon Redshift Column Encoding Utility on AWS Labs GitHub to apply encoding. This command line utility uses the ANALYZE COMPRESSION command on each table. If encoding is required, it generates a SQL script which creates a new table with the correct encoding, copies all the data into the new table, and then transactionally renames the new table to the old name while retaining the original data. (Please note that the first column in a compound sort key should not be encoded, and is not encoded by this utility.)
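
    For a concrete picture of what that generated script does, here is a minimal hand-written sketch of the same deep-copy pattern. The public.clicks table, its columns, and the chosen encodings are purely hypothetical; in practice, take the encodings from the ANALYZE COMPRESSION output for your own data.

    -- Hypothetical table: ask Amazon Redshift to recommend encodings first
    ANALYZE COMPRESSION public.clicks;

    -- Deep-copy into a correctly encoded table, then swap names transactionally.
    -- The first column of the compound sort key is left unencoded (RAW).
    BEGIN;

    CREATE TABLE public.clicks_encoded (
        event_time TIMESTAMP    ENCODE RAW SORTKEY,
        user_id    BIGINT       ENCODE LZO DISTKEY,
        page_url   VARCHAR(300) ENCODE LZO,
        referrer   VARCHAR(300) ENCODE LZO
    );

    INSERT INTO public.clicks_encoded SELECT * FROM public.clicks;

    ALTER TABLE public.clicks RENAME TO clicks_old;
    ALTER TABLE public.clicks_encoded RENAME TO clicks;

    COMMIT;

    Once the new table has been verified, the retained clicks_old table can be dropped to reclaim its space.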

    Issue #2 – Skewed table data

    Amazon Redshift is a distributed, shared-nothing database architecture where each node in the cluster stores a subset of the data. When a table is created, you can choose to spread its data evenly across the nodes (the default), or to assign rows to nodes based on the values in one of the columns. By choosing columns for distribution that are commonly joined together, you can minimize the amount of data transferred over the network during the join. This can significantly increase performance on these types of queries.

    The selection of a good distribution key is the topic of many AWS articles, including Choose the Best Distribution Style; see a definitive guide to distribution and sorting of star schemas in the Optimizing for Star Schemas and Interleaved Sorting on Amazon Redshift blog post. In general, a good distribution key should exhibit the following properties:

    High cardinality – There should be a large number of unique data values in the column relative to the number of nodes in the cluster.

    Uniform distribution/low skew – Each unique value in the distribution key should occur in the table about the same number of times. This allows Amazon Redshift to put roughly the same number of records on each node in the cluster.

    Commonly joined – The columns in a distribution key should be those that you usually join to other tables. If you have many possible columns that fit this criterion, then you may choose the column that joins to the largest table.

    A skewed distribution key causes some nodes to work harder than others during query execution, with unbalanced CPU and memory usage, so the cluster ultimately runs only as fast as its slowest node.

    If skew is a problem, you typically see that node performance is uneven on the cluster. Use one of the admin scripts in the Amazon Redshift Utils GitHub repository, such as table_inspector.sql, to see how data blocks in a distribution key map to the slices and nodes in the cluster.
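
    For a quick first pass before running table_inspector.sql, the skew_rows column in SVV_TABLE_INFO (the ratio of rows on the fullest slice to rows on the emptiest slice) gives a rough signal; the LIMIT used below is just an illustrative choice.

    -- Rough skew check: values well above 1 suggest a skewed distribution key
    SELECT schema || '.' || "table" AS "table", diststyle, skew_rows
    FROM svv_table_info
    WHERE skew_rows IS NOT NULL
    ORDER BY skew_rows DESC
    LIMIT 20;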

    If you find that you have tables with skewed distribution keys, then consider changing the distribution key to a column that exhibits high cardinality and uniform distribution. Evaluate a candidate column as a distribution key by creating a new table using CTAS:

    CREATE TABLE MY_TEST_TABLE DISTKEY (<COLUMN NAME>) AS SELECT * FROM <TABLE NAME>;

    Run the table_inspector.sql script against the table again to analyze data skew.

    If there is no good distribution key in any of your records, you may find that moving to EVEN distribution works better, due to the lack of a single node being a hotspot. For small tables, you can also use DISTSTYLE ALL to place table data onto every node in the cluster.
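
    Both styles can be applied directly in a CTAS; the table names below are hypothetical.

    -- Hypothetical large fact table with no suitable join column: spread rows evenly
    CREATE TABLE web_logs_even DISTSTYLE EVEN AS
    SELECT * FROM web_logs;

    -- Hypothetical small, frequently joined dimension: replicate it to every node
    CREATE TABLE dim_country_all DISTSTYLE ALL AS
    SELECT * FROM dim_country;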

    Issue #3 – Queries not benefiting from sort keys

    Amazon Redshift tables can have a sort key column identified, which acts like an index in other databases but which does not incur a storage cost as with other platforms (for more information, see Choosing Sort Keys). A sort key should be created on those columns which are most commonly used in WHERE clauses. If you have a known query pattern, then COMPOUND sort keys give the best performance; if end users query different columns equally, then use an INTERLEAVED sort key.

    To determine which tables don’t have sort keys, and how often they have been queried, run the following query:

    SELECT database, table_id, schema || '.' || "table" AS "table", size, nvl(s.num_qs,0) num_qs
    FROM svv_table_info t
    LEFT JOIN (SELECT tbl, COUNT(distinct query) num_qs
               FROM stl_scan s
               WHERE s.userid > 1
                 AND s.perm_table_name NOT IN ('Internal Worktable', 'S3')
               GROUP BY tbl) s ON s.tbl = t.table_id
    WHERE t.sortkey1 IS NULL
    ORDER BY 5 desc;

    You can run a tutorial that walks you through how to address unsorted tables in the Amazon Redshift Developer Guide. You can also take advantage of another GitHub admin script that recommends sort keys based on query activity. Bear in mind that queries evaluated against a sort key column must not apply a SQL function to the sort key; instead, ensure that you apply the functions to the compared values so that the sort key is used. This is commonly found on TIMESTAMP columns that are used as sort keys.

    Issue #4 – Tables without statistics or which need vacuum

    Amazon Redshift, like other databases, requires statistics about tables and the composition of data blocks being stored in order to make good decisions when planning a query (for more information, see Analyzing Tables). Without good statistics, the optimiser may make suboptimal or incorrect choices about the order in which to access tables, or how to join datasets together.

    The ANALYZE Command History topic in the Amazon Redshift Developer Guide supplies queries to help you address missing or stale statistics, and you can also simply run the missing_table_stats.sql admin script to determine which tables are missing stats, or the statement below to determine tables that have stale statistics:

    SELECT database, schema || ‘.’ || "table" AS "table", stats_off
    FROM svv_table_info
    WHERE stats_off > 5
    ORDER BY 2;

    In Amazon Redshift, data blocks are immutable. When rows are DELETED or UPDATED, they are simply logically deleted (flagged for deletion) but not physically removed from disk. Updates result in a new block being written with new data appended. Both of these operations cause the previous version of the row to continue consuming disk space and continue being scanned when a query scans the table. As a result, table storage space is increased and performance degraded due to otherwise avoidable disk I/O during scans. A VACUUM command recovers the space from deleted rows and restores the sort order.

    To address issues with tables with missing or stale statistics or where vacuum is required, run another AWS Labs utility, Analyze & Vacuum Schema. This ensures that you always keep up-to-date statistics, and only vacuum tables that actually need reorganisation.

    Issue #5 – Tables with very large VARCHAR columns

    During processing of complex queries, intermediate query results might need to be stored in temporary blocks. These temporary tables are not compressed, so unnecessarily wide columns consume excessive memory and temporary disk space, which can affect query performance. For more information, see Use the Smallest Possible Column Size.

    Use the following query to generate a list of tables that should have their maximum column widths reviewed:

    SELECT database, schema || ‘.’ || "table" AS "table", max_varchar
    FROM svv_table_info
    WHERE max_varchar > 150
    ORDER BY 2;

    After you have a list of tables, identify which table columns have wide varchar columns and then determine the true maximum width for each wide column, using the following query:

    SELECT max(len(rtrim(column_name)))
    FROM table_name;

    In some cases, you may have large VARCHAR type columns because you are storing JSON fragments in the table, which you then query with JSON functions. If you query the top running queries for the database using the top_queries.sql admin script, pay special attention to SELECT * queries which include the JSON fragment column. If end users query these large columns but don’t use actually execute JSON functions against them, consider moving them into another table that only contains the primary key column of the original table and the JSON column.

    If you find that the table has columns that are wider than necessary, then you need to re-create a version of the table with appropriate column widths by performing a deep copy.

    Issue #6 – Queries waiting on queue slots

    Amazon Redshift runs queries using a queuing system known as workload management (WLM). You can define up to 8 queues to separate workloads from each other, and set the concurrency on each queue to meet your overall throughput requirements.

    In some cases, the queue to which a user or query has been assigned is completely busy and a user’s query must wait for a slot to be open. During this time, the system is not executing the query at all, which is a sign that you may need to increase concurrency.

    First, you need to determine if any queries are queuing, using the queuing_queries.sql admin script. Review the maximum concurrency that your cluster has needed in the past with wlm_apex.sql, down to an hour-by-hour historical analysis with wlm_apex_hourly.sql. Keep in mind that increasing concurrency allows more queries to run, but they share the same memory allocation (unless you increase it). You may find that by increasing concurrency, some queries must use temporary disk storage to complete, which is also sub-optimal as well see next.

    Issue #7 – Queries that are disk-based

    If a query isn’t able to completely execute in memory, it may need to use disk-based temporary storage for parts of an explain plan. The additional disk I/O slows down the query, and can be addressed by increasing the amount of memory allocated to a session (for more information, see WLM Dynamic Memory Allocation).

    To determine if any queries have been writing to disk, use the following query:

    SELECT
    q.query, trim(q.cat_text)
    FROM (SELECT query, replace( listagg(text,’ ‘) WITHIN GROUP (ORDER BY sequence), ‘\n’, ‘ ‘) AS cat_text FROM stl_querytext WHERE userid>1 GROUP BY query) q
    JOIN
    (SELECT distinct query FROM svl_query_summary WHERE is_diskbased=’t’ AND (LABEL LIKE ‘hash%’ OR LABEL LIKE ‘sort%’ OR LABEL LIKE ‘aggr%’) AND userid > 1) qs ON qs.query = q.query;

    Based on the user or the queue assignment rules, you can increase the amount of memory given to the selected queue to prevent queries needing to spill to disk to complete. You can also increase the WLM_QUERY_SLOT_COUNT (http://docs.aws.amazon.com/redshift/latest/dg/r_wlm_query_slot_count.html) for the session from the default of 1 to the maximum concurrency for the queue.  As outlined in Issue #6, this may result in queueing queries, so use with care

    Issue #8 – Commit queue waits

    Amazon Redshift is designed for analytics queries, rather than transaction processing. The cost of COMMIT is relatively high, and excessive use of COMMIT can result in queries waiting for access to a commit queue.

    If you are committing too often on your database, you will start to see waits on the commit queue increase, which can be viewed with the commit_stats.sql admin script. This script shows the largest queue length and queue time for queries run in the past two days. If you have queries that are waiting on the commit queue, then look for sessions that are committing multiple times per session, such as ETL jobs that are logging progress or inefficient data loads.

    Issue #9 – Inefficient data loads

    Amazon Redshift best practices suggest the use of the COPY command to perform data loads. This API operation uses all compute nodes in the cluster to load data in parallel, from sources such as Amazon S3, Amazon DynamoDB, Amazon EMR HDFS file systems, or any SSH connection.

    When performing data loads, you should compress the files to be loaded whenever possible; Amazon Redshift supports both GZIP and LZO compression. It is more efficient to load a large number of small files than one large one, and the ideal file count is a multiple of the slice count. The number of slices per node depends on the node size of the cluster. For example, each DS1.XL compute node has two slices, and each DS1.8XL compute node has 16 slices. By ensuring you have an even number of files per slices, you can know that COPY execution will evenly use cluster resources and complete as quickly as possible.

    An anti-pattern is to insert data directly into Amazon Redshift, with single record inserts or the use of a multi-value INSERT statement, which allows up to 16 MB of data to be inserted at one time. These are leader node–based operations, and can create significant performance bottlenecks by maxing out the leader node CPU or memory.

    Issue #10 – Inefficient use of Temporary Tables

    Amazon Redshift provides temporary tables, which are like normal tables except that they are only visible within a single session. When the user disconnects the session, the tables are automatically deleted. Temporary tables can be created using the CREATE TEMPORARY TABLE syntax, or by issuing a query SELECT … INTO #TEMP_TABLE. The CREATE TABLE statement gives you complete control over the definition of the temporary table, while the SELECT … INTO and C(T)TAS commands use the input data to determine column names, sizes and data types, and uses default storage properties.

    These default storage properties may cause issues if not carefully considered. Amazon Redshift’s default table structure is to use EVEN distribution with no column encoding. This is a sub-optimal data structure for many types of queries, and if you are using select/into syntax you cannot set the column encoding or distribution and sort keys.

    It is highly recommended that you convert all select/into syntax to use the CREATE statement. This ensures that your temporary tables have column encoding and are distributed in a fashion that is sympathetic the other entities that are part of the workflow. To perform a conversion of a statement which uses:

    select column_a, column_b into #my_temp_table from my_table;

    You would analyse the temporary table for optimal column encoding:

    And then convert the select/into statement to:

    BEGIN;
    create temporary table my_temp_table(
    column_a varchar(128) encode lzo,
    column_b char(4) encode bytedict)
    distkey (column_a) — Assuming you intend to join this table on column_a
    sortkey (column_b); — Assuming you are sorting or grouping by column_b

    insert into my_temp_table select column_a, column_b from my_table;
    COMMIT;

    You may also wish to analyze statistics on the temporary table, if it is used as a join table for subsequent queries:

    analyze my_temp_table;

    This way, you retain the functionality of using temporary tables but control data placement on the cluster through distkey assignment and take advantage of the columnar nature of Amazon Redshift through use of Column Encoding.

    Tip: Using explain plan alerts

    The last tip is to use diagnostic information from the cluster during query execution. This is stored in an extremely useful view called STL_ALERT_EVENT_LOG. Use the perf_alert.sql admin script to diagnose issues that the cluster has encountered over the last seven days. This is an invaluable resource in understanding how your cluster develops over time.

    Summary

    Amazon Redshift is a powerful, fully managed data warehouse that can offer significantly increased performance and lower cost in the cloud. While Amazon Redshift can run any type of data model, you can avoid possible pitfalls that might decrease performance or increase cost, by being aware of how data is stored and managed. Run a simple set of diagnostic queries for common issues and ensure that you get the best performance possible.

    If you have questions or suggestions, please leave a comment below.

    UPDATE: This blog post has been translated into Japanese:

    ————————————

    Related

    Best Practices for Micro-Batch Loading on Amazon Redshift

    ;

    Top 10 Performance Tuning Techniques for Amazon Redshift

    Post Syndicated from Ian Meyers original https://blogs.aws.amazon.com/bigdata/post/Tx31034QG0G3ED1/Top-10-Performance-Tuning-Techniques-for-Amazon-Redshift

    Ian Meyers is a Principal Solutions Architect with Amazon Web Services.

    Zach Christopherson, an Amazon Redshift Database Engineer, contributed to this post.

    Amazon Redshift is a fully managed, petabyte-scale, massively parallel data warehouse that offers simple operations and high performance. Customers use Amazon Redshift for everything from accelerating existing database environments that are struggling to scale, to ingesting web logs for big data analytics use cases. Amazon Redshift provides an industry-standard JDBC/ODBC driver interface, which allows customers to connect their existing business intelligence tools and reuse existing analytics queries.

    Amazon Redshift can run any type of data model, from a production transaction system third-normal-form model, to star and snowflake schemas, or simple flat tables. As customers adopt Amazon Redshift, they must consider its architecture in order to ensure that their data model is correctly deployed and maintained by the database. This post takes you through the most common issues that customers find as they adopt Amazon Redshift, and gives you concrete guidance on how to address each. If you address each of these items, you should be able to achieve optimal query performance and scale effectively to meet customer demand.

    Issue #1 – Incorrect column encoding

    Amazon Redshift is a column-oriented database, which means that rather than organising data on disk by rows, it stores data by column, and rows are extracted from column storage at runtime. This architecture is particularly well suited to analytics queries on tables with a large number of columns, where most queries access only a subset of all possible dimensions and measures. Amazon Redshift reads only those blocks on disk that belong to columns included in the SELECT or WHERE clause, and doesn't have to read all table data to evaluate a query. Data stored by column should also be encoded (see Choosing a Column Compression Type in the Amazon Redshift Database Developer Guide), which means that it is heavily compressed to offer high read performance. This further means that Amazon Redshift doesn't require the creation and maintenance of indexes: every column is almost like its own index, with just the right structure for the data being stored.

    Running an Amazon Redshift cluster without column encoding is not considered a best practice, and customers find large performance gains when they ensure that column encoding is optimally applied. To determine if you are deviating from this best practice, run the following query to determine if any tables have NO column encoding applied:

    SELECT database, schema || '.' || "table" AS "table", encoded, size
    FROM svv_table_info
    WHERE encoded = 'N'
    ORDER BY 2;

    Afterward, review the tables and columns which aren’t encoded by running the following query:

    SELECT trim(n.nspname || '.' || c.relname) AS "table", trim(a.attname) AS "column",
    format_type(a.atttypid, a.atttypmod) AS "type",
    format_encoding(a.attencodingtype::integer) AS "encoding",
    a.attsortkeyord AS "sortkey"
    FROM pg_namespace n, pg_class c, pg_attribute a
    WHERE n.oid = c.relnamespace AND c.oid = a.attrelid
    AND a.attnum > 0 AND NOT a.attisdropped
    AND n.nspname NOT IN ('information_schema','pg_catalog','pg_toast')
    AND format_encoding(a.attencodingtype::integer) = 'none'
    AND c.relkind = 'r'
    AND a.attsortkeyord != 1
    ORDER BY n.nspname, c.relname, a.attnum;

    If you find that you have tables without optimal column encoding, then use the Amazon Redshift Column Encoding Utility on AWS Labs GitHub to apply encoding. This command line utility uses the ANALYZE COMPRESSION command on each table. If encoding is required, it generates a SQL script which creates a new table with the correct encoding, copies all the data into the new table, and then transactionally renames the new table to the old name while retaining the original data. (Please note that the first column in a compound sort key should not be encoded, and is not encoded by this utility.)
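    The utility is driven by the output of ANALYZE COMPRESSION, which you can also run by hand if you want to review recommendations before changing anything; the table name below is purely illustrative.

    -- Report the suggested encoding for each column, based on a sample of the table's rows.
    -- ANALYZE COMPRESSION takes an exclusive table lock, so avoid running it during
    -- heavy write activity. "lineitem" is a hypothetical table name.
    ANALYZE COMPRESSION lineitem;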

    Issue #2 – Skewed table data

    Amazon Redshift is a distributed, shared nothing database architecture where each node in the cluster stores a subset of the data. When a table is created, decide whether to spread the data evenly among nodes (default), or place data on a node on the basis of one of the columns. By choosing columns for distribution that are commonly joined together, you can minimize the amount of data transferred over the network during the join. This can significantly increase performance on these types of queries.

    The selection of a good distribution key is the topic of many AWS articles, including Choose the Best Distribution Style; see a definitive guide to distribution and sorting of star schemas in the Optimizing for Star Schemas and Interleaved Sorting on Amazon Redshift blog post. In general, a good distribution key should exhibit the following properties:

    High cardinality – There should be a large number of unique data values in the column relative to the number of nodes in the cluster.

    Uniform distribution/low skew – Each unique value in the distribution key should occur in the table an even number of times. This allows Amazon Redshift to put the same number of records on each node in the cluster.

    Commonly joined – The columns in a distribution key should be those that you usually join to other tables. If you have many possible columns that fit this criterion, then you may choose the column that joins to the largest table.

    A skewed distribution key causes some nodes to work harder than others during query execution, with unbalanced CPU and memory use; ultimately, a query runs only as fast as the slowest node.

    If skew is a problem, you typically see that node performance is uneven on the cluster. Use one of the admin scripts in the Amazon Redshift Utils GitHub repository, such as table_inspector.sql, to see how data blocks in a distribution key map to the slices and nodes in the cluster.
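    As a lighter-weight check, the SVV_TABLE_INFO system view also exposes a skew_rows column (the ratio of rows on the largest slice to rows on the smallest); the threshold of 4 below is only an illustrative cut-off:

    SELECT "table", diststyle, skew_rows
    FROM svv_table_info
    WHERE skew_rows > 4
    ORDER BY skew_rows DESC;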

    If you find that you have tables with skewed distribution keys, then consider changing the distribution key to a column that exhibits high cardinality and uniform distribution. Evaluate a candidate column as a distribution key by creating a new table using CTAS:

    CREATE TABLE MY_TEST_TABLE DISTKEY (<COLUMN NAME>) AS SELECT * FROM <TABLE NAME>;

    Run the table_inspector.sql script against the table again to analyze data skew.

    If there is no good distribution key in any of your records, you may find that moving to EVEN distribution works better, due to the lack of a single node being a hotspot. For small tables, you can also use DISTSTYLE ALL to place table data onto every node in the cluster.
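    As a sketch, with hypothetical table and column names, the three distribution choices discussed above look like this:

    -- Distribute on a high-cardinality column that is commonly joined
    CREATE TABLE sales (sale_id BIGINT, customer_id INT, amount DECIMAL(12,2))
    DISTKEY (customer_id);

    -- Spread rows evenly when no single column is a good distribution key
    CREATE TABLE web_events (event_id BIGINT, url VARCHAR(2048))
    DISTSTYLE EVEN;

    -- Replicate a small dimension table onto every node
    CREATE TABLE region (region_id INT, region_name VARCHAR(64))
    DISTSTYLE ALL;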

    Issue #3 – Queries not benefiting from sort keys

    Amazon Redshift tables can have a sort key defined, which acts like an index in other databases but, unlike indexes on other platforms, does not incur a storage cost (for more information, see Choosing Sort Keys). A sort key should be created on the columns that are most commonly used in WHERE clauses. If you have a known query pattern, then a COMPOUND sort key gives the best performance; if end users query different columns equally, then use an INTERLEAVED sort key.

    To determine which tables don’t have sort keys, and how often they have been queried, run the following query:

    SELECT database, table_id, schema || '.' || "table" AS "table", size, nvl(s.num_qs,0) num_qs
    FROM svv_table_info t
    LEFT JOIN (SELECT tbl, COUNT(distinct query) num_qs
    FROM stl_scan s
    WHERE s.userid > 1
    AND s.perm_table_name NOT IN ('Internal Worktable','S3')
    GROUP BY tbl) s ON s.tbl = t.table_id
    WHERE t.sortkey1 IS NULL
    ORDER BY 5 desc;

    You can run a tutorial in the Amazon Redshift Developer Guide that walks you through how to address unsorted tables. You can also take advantage of another GitHub admin script that recommends sort keys based on query activity. Bear in mind that queries evaluated against a sort key column must not apply a SQL function to the sort key itself; instead, apply the function to the value being compared so that the sort key can still be used to restrict the scan. This issue is most commonly seen with TIMESTAMP columns that are used as sort keys.
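    For example, assuming a hypothetical events table whose TIMESTAMP sort key is named event_time, rewrite the predicate so the sort key column is left untouched:

    -- Avoid: wrapping the sort key in a function defeats range-restricted scans
    SELECT COUNT(*) FROM events
    WHERE DATE_TRUNC('day', event_time) = '2017-06-01';

    -- Prefer: compare the raw sort key column against computed bounds
    SELECT COUNT(*) FROM events
    WHERE event_time >= '2017-06-01'
    AND event_time < '2017-06-02';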

    Issue #4 – Tables without statistics or which need vacuum

    Amazon Redshift, like other databases, requires statistics about tables and the composition of data blocks being stored in order to make good decisions when planning a query (for more information, see Analyzing Tables). Without good statistics, the optimiser may make suboptimal or incorrect choices about the order in which to access tables, or how to join datasets together.

    The ANALYZE Command History topic in the Amazon Redshift Developer Guide supplies queries to help you address missing or stale statistics, and you can also simply run the missing_table_stats.sql admin script to determine which tables are missing stats, or the statement below to determine tables that have stale statistics:

    SELECT database, schema || '.' || "table" AS "table", stats_off
    FROM svv_table_info
    WHERE stats_off > 5
    ORDER BY 2;

    In Amazon Redshift, data blocks are immutable. When rows are deleted or updated, they are logically flagged for deletion but not physically removed from disk; an UPDATE simply writes the new version of the row to new blocks. Both operations leave the previous version of the row consuming disk space and being scanned whenever a query scans the table. As a result, table storage grows and performance degrades because of otherwise avoidable disk I/O during scans. A VACUUM command recovers the space from deleted rows and restores the sort order.

    To address issues with tables with missing or stale statistics or where vacuum is required, run another AWS Labs utility, Analyze & Vacuum Schema. This ensures that you always keep up-to-date statistics, and only vacuum tables that actually need reorganisation.
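    If you prefer to handle an individual table by hand rather than through the utility, the underlying commands are straightforward (the table name is hypothetical):

    -- Refresh the optimizer statistics for the table
    ANALYZE orders;

    -- Reclaim space from deleted rows and restore the sort order
    VACUUM FULL orders;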

    Issue #5 – Tables with very large VARCHAR columns

    During processing of complex queries, intermediate query results might need to be stored in temporary blocks. These temporary tables are not compressed, so unnecessarily wide columns consume excessive memory and temporary disk space, which can affect query performance. For more information, see Use the Smallest Possible Column Size.

    Use the following query to generate a list of tables that should have their maximum column widths reviewed:

    SELECT database, schema || '.' || "table" AS "table", max_varchar
    FROM svv_table_info
    WHERE max_varchar > 150
    ORDER BY 2;

    After you have a list of tables, identify which table columns have wide varchar columns and then determine the true maximum width for each wide column, using the following query:

    SELECT max(len(rtrim(column_name)))
    FROM table_name;

    In some cases, you may have large VARCHAR columns because you are storing JSON fragments in the table, which you then query with JSON functions. If you query the top running queries for the database using the top_queries.sql admin script, pay special attention to SELECT * queries that include the JSON fragment column. If end users query these large columns but don't actually execute JSON functions against them, consider moving the JSON into a separate table that contains only the primary key of the original table and the JSON column.

    If you find that the table has columns that are wider than necessary, then you need to re-create a version of the table with appropriate column widths by performing a deep copy.
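    A minimal deep-copy sketch, assuming a hypothetical customers table whose comment column can safely be narrowed:

    BEGIN;
    CREATE TABLE customers_new (
    customer_id BIGINT,
    comment VARCHAR(200) ENCODE lzo)  -- right-sized from an over-wide declaration
    DISTKEY (customer_id);

    INSERT INTO customers_new SELECT customer_id, comment FROM customers;

    DROP TABLE customers;
    ALTER TABLE customers_new RENAME TO customers;
    COMMIT;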

    Issue #6 – Queries waiting on queue slots

    Amazon Redshift runs queries using a queuing system known as workload management (WLM). You can define up to 8 queues to separate workloads from each other, and set the concurrency on each queue to meet your overall throughput requirements.

    In some cases, the queue to which a user or query has been assigned is completely busy and a user’s query must wait for a slot to be open. During this time, the system is not executing the query at all, which is a sign that you may need to increase concurrency.

    First, determine if any queries are queuing, using the queuing_queries.sql admin script. Review the maximum concurrency that your cluster has needed in the past with wlm_apex.sql, down to an hour-by-hour historical analysis with wlm_apex_hourly.sql. Keep in mind that increasing concurrency allows more queries to run, but they all share the same memory allocation (unless you increase it). You may find that by increasing concurrency, some queries must use temporary disk storage to complete, which is also suboptimal, as we'll see in the next issue.
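    To see what is queued right now, rather than historically, you can also look at the STV_WLM_QUERY_STATE system view; the state filter below assumes queued queries report a state beginning with 'Queued':

    SELECT query, service_class, state, queue_time
    FROM stv_wlm_query_state
    WHERE state LIKE 'Queued%'
    ORDER BY queue_time DESC;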

    Issue #7 – Queries that are disk-based

    If a query isn’t able to completely execute in memory, it may need to use disk-based temporary storage for parts of an explain plan. The additional disk I/O slows down the query, and can be addressed by increasing the amount of memory allocated to a session (for more information, see WLM Dynamic Memory Allocation).

    To determine if any queries have been writing to disk, use the following query:

    SELECT q.query, trim(q.cat_text)
    FROM (SELECT query, replace(listagg(text,' ') WITHIN GROUP (ORDER BY sequence), '\n', ' ') AS cat_text
    FROM stl_querytext
    WHERE userid > 1
    GROUP BY query) q
    JOIN (SELECT distinct query
    FROM svl_query_summary
    WHERE is_diskbased = 't'
    AND (LABEL LIKE 'hash%' OR LABEL LIKE 'sort%' OR LABEL LIKE 'aggr%')
    AND userid > 1) qs ON qs.query = q.query;

    Based on the user or the queue assignment rules, you can increase the amount of memory given to the selected queue to prevent queries from needing to spill to disk to complete. You can also increase the WLM_QUERY_SLOT_COUNT (http://docs.aws.amazon.com/redshift/latest/dg/r_wlm_query_slot_count.html) for the session from the default of 1, up to the maximum concurrency for the queue. As outlined in Issue #6, this may result in queries queuing, so use it with care.
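    For example, to give a single memory-hungry statement a larger share of its queue for the current session (the slot count of 3 is illustrative and must not exceed the queue's concurrency):

    SET wlm_query_slot_count TO 3;  -- claim three slots' worth of the queue's memory
    -- run the memory-intensive statement here, for example a large CTAS or VACUUM
    SET wlm_query_slot_count TO 1;  -- return to the default for the rest of the session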

    Issue #8 – Commit queue waits

    Amazon Redshift is designed for analytics queries, rather than transaction processing. The cost of COMMIT is relatively high, and excessive use of COMMIT can result in queries waiting for access to a commit queue.

    If you are committing too often on your database, you will start to see waits on the commit queue increase, which can be viewed with the commit_stats.sql admin script. This script shows the largest queue length and queue time for queries run in the past two days. If you have queries that are waiting on the commit queue, then look for sessions that are committing multiple times per session, such as ETL jobs that are logging progress or inefficient data loads.
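    For example, an ETL job that commits after every statement can often wrap its steps in a single transaction instead; the staging and audit tables below are hypothetical:

    BEGIN;
    UPDATE etl_audit SET status = 'running' WHERE job_id = 42;
    INSERT INTO sales SELECT * FROM sales_staging;
    UPDATE etl_audit SET status = 'complete' WHERE job_id = 42;
    COMMIT;  -- one commit for the whole unit of work instead of one per statement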

    Issue #9 – Inefficient data loads

    Amazon Redshift best practices suggest the use of the COPY command to perform data loads. This command uses all compute nodes in the cluster to load data in parallel, from sources such as Amazon S3, Amazon DynamoDB, Amazon EMR HDFS file systems, or any SSH connection.

    When performing data loads, compress the files to be loaded whenever possible; Amazon Redshift supports both GZIP and LZO compression. It is more efficient to load a large number of small files than one large one, and the ideal file count is a multiple of the slice count. The number of slices per node depends on the node size of the cluster. For example, each DS1.XL compute node has two slices, and each DS1.8XL compute node has 16 slices. By making the number of files a multiple of the number of slices, you ensure that COPY spreads the work evenly across the cluster and completes as quickly as possible.
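    A sketch of such a load, assuming a hypothetical S3 prefix holding gzip-compressed, pipe-delimited files whose count is a multiple of the cluster's slice count (the bucket and IAM role ARN are placeholders):

    COPY sales
    FROM 's3://my-bucket/sales/2017-06-01/part_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole'
    GZIP
    DELIMITER '|';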

    An anti-pattern is to insert data directly into Amazon Redshift, with single record inserts or the use of a multi-value INSERT statement, which allows up to 16 MB of data to be inserted at one time. These are leader node–based operations, and can create significant performance bottlenecks by maxing out the leader node CPU or memory.

    Issue #10 – Inefficient use of Temporary Tables

    Amazon Redshift provides temporary tables, which are like normal tables except that they are only visible within a single session. When the user disconnects the session, the tables are automatically deleted. Temporary tables can be created using the CREATE TEMPORARY TABLE syntax, or by issuing a SELECT … INTO #TEMP_TABLE query. The CREATE TABLE statement gives you complete control over the definition of the temporary table, while the SELECT … INTO and C(T)TAS commands use the input data to determine column names, sizes and data types, and use default storage properties.

    These default storage properties may cause issues if not carefully considered. Amazon Redshift’s default table structure is to use EVEN distribution with no column encoding. This is a sub-optimal data structure for many types of queries, and if you are using select/into syntax you cannot set the column encoding or distribution and sort keys.

    It is highly recommended that you convert all select/into syntax to use the CREATE statement. This ensures that your temporary tables have column encoding and are distributed in a fashion that is sympathetic to the other entities that are part of the workflow. For example, to convert a statement such as:

    select column_a, column_b into #my_temp_table from my_table;

    you would first determine the optimal column encodings for the data (for example, by running ANALYZE COMPRESSION against the source table), and then convert the select/into statement to:

    BEGIN;
    create temporary table my_temp_table(
    column_a varchar(128) encode lzo,
    column_b char(4) encode bytedict)
    distkey (column_a) -- Assuming you intend to join this table on column_a
    sortkey (column_b); -- Assuming you are sorting or grouping by column_b

    insert into my_temp_table select column_a, column_b from my_table;
    COMMIT;

    You may also wish to analyze statistics on the temporary table, if it is used as a join table for subsequent queries:

    analyze my_temp_table;

    This way, you retain the functionality of using temporary tables but control data placement on the cluster through distkey assignment and take advantage of the columnar nature of Amazon Redshift through use of Column Encoding.

    Tip: Using explain plan alerts

    The last tip is to use diagnostic information from the cluster during query execution. This is stored in an extremely useful view called STL_ALERT_EVENT_LOG. Use the perf_alert.sql admin script to diagnose issues that the cluster has encountered over the last seven days. This is an invaluable resource in understanding how your cluster develops over time.
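    If you want to look at the raw alerts rather than the script's summary, a simple aggregation over the last week might look like this:

    SELECT trim(event) AS event, trim(solution) AS solution, COUNT(*) AS occurrences
    FROM stl_alert_event_log
    WHERE event_time >= dateadd(day, -7, current_date)
    GROUP BY 1, 2
    ORDER BY 3 DESC;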

    Summary

    Amazon Redshift is a powerful, fully managed data warehouse that can offer significantly increased performance and lower cost in the cloud. While Amazon Redshift can run any type of data model, you can avoid pitfalls that might decrease performance or increase cost by being aware of how data is stored and managed. Run a simple set of diagnostic queries for common issues to ensure that you get the best performance possible.

    If you have questions or suggestions, please leave a comment below.

    UPDATE: This blog post has been translated into Japanese.


    Related

    Best Practices for Micro-Batch Loading on Amazon Redshift
