
Installing and Running JobServer for Apache Spark on Amazon EMR

Post Syndicated from Derek Graeber original https://blogs.aws.amazon.com/bigdata/post/Tx1KOALKLSDHC9J/Installing-and-Running-JobServer-for-Apache-Spark-on-Amazon-EMR

Derek Graeber is a senior consultant in big data analytics for AWS Professional Services

Working with customers who are running Apache Spark on Amazon EMR, I run into the scenario where data loaded into a SparkContext can and should be shared across multiple use cases. They ask a very valid question: “Once I load the data into Spark, how can I access the data to support ad hoc queries?” Open-source APIs are available that support doing this, such as JobServer, which exposes a RESTful interface to access the context.

“Okay, how do I put JobServer on EMR?” typically follows. There are a couple of ways. Amazon EMR recently started using Apache BigTop to support a much quicker release window (among other things) for out-of-the-box application components installed on EMR such as Spark, Pig, and Hive.

In this blog post, you will learn how to install JobServer on EMR using a bootstrap action (BA) derived from the JobServer GitHub repository. Then we’ll run JobServer using a sample dataset. To learn how to compile JobServer and install it on your Spark cluster, see the JobServer readme file for EMR. All referenced code, including the BA, Spark code, and commands, is located in this project’s GitHub repository.

Background and setup

For this approach, we assume that you have a working knowledge of Apache Spark running on EMR and can create a cluster with configurations using either the AWS Management Console or the AWS Command Line Interface (AWS CLI). For this exercise, we will define a cluster size and use the publicly available airline flights dataset on Amazon S3. This data is in Parquet format. We will create an Amazon EMR 4.7.1 cluster consisting of:

  • One r3.xlarge master instance
  • Five r3.xlarge core instances

Note that this cluster setup is completely arbitrary; it is the one I typically use for proof-of-concept work, and it is not optimized. (Optimization is outside the scope of this post.) You can modify your cluster nodes or Spark job configuration as you wish. For reference, see this AWS Big Data Blog post.

If you haven’t read the readme file for both JobServer and the EMR-JobServer configurations, do that before proceeding. Then get the project in GitHub and explore. The project is laid out in a typical Maven structure with additional directories for the configurations and BA.

Next, look at the version information in the JobServer readme file and determine the version of JobServer you’d like to use based on the version of Spark you are using. In this example, we are using Amazon EMR 4.7.1, which supports Spark 1.6.1. Thus, based on the readme, we will need version 0.6.2 (v0.6.2 branch) of JobServer. Make a note of this for later.

As you read the readme for EMR-JobServer, you’ll see there are two configuration files to be aware of:

  1. emr.sh – this file defines parameters related to your Spark installation. For our example, these points apply:

    1. You only need to modify the SPARK_VERSION value.
    2. We will use the emr_v1.6.1.sh file provided in this blog post’s sample code.
  2. emr.conf – this file defines the Spark runtime configuration used by JobServer and some contexts that can be created at startup. For our example, these points apply:

    1. We are creating a pre-defined spark-sql context for Hive.
    2. Reviewing the context definitions under the job-server-extras module of the JobServer GitHub project helps a lot in understanding these context factories.
    3. We will use the emr_contexts.conf file provided in this blog post’s sample code.
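For orientation, here is a minimal sketch of what the edited files might contain, assuming the v0.6.2 templates; the context name and factory below are illustrative, drawn from the job-server-extras module mentioned above:

```shell
# Illustrative fragment of emr.sh: per the notes above, SPARK_VERSION is the
# only value that needs to change for an EMR 4.7.1 / Spark 1.6.1 cluster.
SPARK_VERSION=1.6.1

# emr.conf is HOCON, not shell; sketched here as comments. A pre-defined
# Hive-backed SQL context might look something like:
#   contexts {
#     hive-context {
#       context-factory = spark.jobserver.context.HiveContextFactory
#     }
#   }
```

The actual files provided with this post (emr_v1.6.1.sh and emr_contexts.conf) are authoritative; this sketch only shows where the two settings discussed above live.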

You can find copies of these configuration files in the GitHub project under <project-root>/jobserver-configs. Review and become familiar with them. We will be staging them on Amazon S3 for the cluster creation later on.

We also provide two BAs and an EMR configuration sample under <project-root>/BA:

  1. full_install_jobserver_BA.sh – this BA installs all necessary build components on your cluster, gets the project from GitHub, compiles, and creates a JobServer distribution. When installation is complete, this BA deploys JobServer and also puts the .tar file for the compiled code onto S3 for reuse.
  2. existing_build_jobserver_BA.sh – this BA looks for a precompiled distribution of JobServer in S3 and deploys that onto the cluster.
  3. configurations.json – this sample EMR configuration is provided for illustration purposes. Here, we’re setting the Spark cluster to use the maximumResourceAllocation option.

Why two BAs? Once you determine the version of JobServer you want to use and begin to use it extensively, the overhead of installing the build frameworks and compiling the source code becomes redundant and time-consuming. Because you have already built the distro and it is available on Amazon S3, you can save time by reusing it and streamlining the cluster creation process. This approach also keeps your EMR cluster cleaner, because SBT and Git are never installed. However, this is just a matter of preference. We will walk through both approaches.

Compile and install JobServer from GitHub

In this section, we will work with full_install_jobserver_BA.sh, a bash script that performs the necessary tasks. We will put the BAs, EMR configurations, and JobServer configurations onto S3 for ease of access. For this to work, you’ll need to define which bucket on S3 you’d like to use as a staging area for the builds.

At the top of full_install_jobserver_BA.sh are the variables you need to configure:

  • JOBSERVER_VERSION – the GitHub version of JobServer you settled on (v0.6.2 in our case)
  • EMR_SH_FILE – the emr.sh configuration file you created earlier, staged on Amazon S3 (use the full object key)
  • EMR_CONF_FILE – the emr.conf configuration file you created earlier, staged on Amazon S3 (use the full object key)
  • S3_BUILD_ARCHIVE_DIR – the location in Amazon S3 where you want the compiled distribution to live after the cluster is created for reuse
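After editing, the top of full_install_jobserver_BA.sh might look like the following sketch; the bucket and object paths are placeholders you must replace with your own:

```shell
# Hypothetical values for the configurable variables at the top of
# full_install_jobserver_BA.sh; only JOBSERVER_VERSION reflects a value
# from this post (v0.6.2, matching Spark 1.6.1).
JOBSERVER_VERSION="v0.6.2"
EMR_SH_FILE="s3://my-bucket/jobserver-configs/emr.sh"
EMR_CONF_FILE="s3://my-bucket/jobserver-configs/emr_contexts.conf"
S3_BUILD_ARCHIVE_DIR="s3://my-bucket/jobserver-builds"
```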

Once you’ve defined these parameters in the BA, upload the following to S3: the BA (full_install_jobserver_BA.sh), the configuration for Amazon EMR (configurations.json), and the JobServer configurations (emr.sh and emr_contexts.conf in this case). I typically use the AWS CLI to move files to S3. Below are the CLI commands for this upload.

> aws s3 cp <localDir>full_install_jobserver_BA.sh s3://<my-bucket>/<my-object-path>/full_install_jobserver_BA.sh

> aws s3 cp <localDir>configurations.json s3://<my-bucket>/<my-object-path>/configurations.json

> aws s3 cp <localDir>emr.sh s3://<my-bucket>/<my-object-path>/emr.sh

> aws s3 cp <localDir>emr_contexts.conf s3://<my-bucket>/<my-object-path>/emr_contexts.conf

For this example, we’ll streamline the process to create a cluster and highlight only the pertinent areas using the AWS console:

  1. In the Advanced Options section, choose EMR 4.7.1, Hive, Hadoop, and Spark 1.6.1.
  2. Select the Load JSON from S3 option, and browse to the configurations.json file you staged.
  3. Define your cluster size and counts as described earlier.
  4. Choose Additional Options, Bootstrap Actions, Custom Action, and then Configure and Add.
  5. Under Script Location, browse to your BA (full_install_jobserver_BA.sh) in S3.
  6. Make sure you have access to an Amazon EC2 key pair, because you will need to use Secure Shell (SSH) to get onto the master node later.
  7. Create your cluster.

You can find a sample AWS CLI command for this cluster under <project-root>/commands/commands-cluster.txt for reference.
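If you prefer the AWS CLI over the console, the steps above might translate into a create-cluster call along these lines; the cluster name, key pair name, bucket, and object paths are placeholders, and the command in commands-cluster.txt is authoritative:

```shell
# Hypothetical CLI equivalent of the console steps above; replace every
# placeholder before running. This requires AWS credentials and incurs cost.
aws emr create-cluster \
  --name "spark-jobserver-demo" \
  --release-label emr-4.7.1 \
  --applications Name=Hadoop Name=Hive Name=Spark \
  --instance-groups \
    InstanceGroupType=MASTER,InstanceCount=1,InstanceType=r3.xlarge \
    InstanceGroupType=CORE,InstanceCount=5,InstanceType=r3.xlarge \
  --configurations https://s3.amazonaws.com/my-bucket/my-object-path/configurations.json \
  --bootstrap-actions Path=s3://my-bucket/my-object-path/full_install_jobserver_BA.sh \
  --ec2-attributes KeyName=my-key-pair \
  --use-default-roles
```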

The cluster takes around 15 minutes to create because the BA installs build tools, compiles the code into a .tar file, and then extracts the .tar file onto the server. The BA also copies the .tar file to S3 for reuse.

When the cluster is in the waiting state, you can verify the build:

  • Use SSH to get onto the master node and verify that /mnt/lib/spark-jobserver exists.
  • Verify that job-server-$JOBSERVER_VERSION.tar.gz is in your S3 bucket.

To get familiar with the distribution, explore /mnt/lib/spark-jobserver.

Install JobServer from Amazon S3

In this section, we will work with the existing_build_jobserver_BA.sh, a bash script that performs the necessary tasks. As with the other BA, there are configuration values you need to edit:

  • EMR_SH_FILE – set this value to the full path and name of the emr.sh configuration file you created earlier, staged on Amazon S3
  • EMR_CONF_FILE – set this value to the full path and name of the emr.conf configuration file you created earlier, staged on Amazon S3
  • S3_FULL_PATH_TAR – set this value to the location in Amazon S3 where the compiled distribution is located (from the full_install_jobserver_BA.sh BA)

Make sure the files are on S3 by running the following CLI commands.

> aws s3 cp <localDir>existing_build_jobserver_BA.sh s3://<my-bucket>/<my-object-path>/existing_build_jobserver_BA.sh

> aws s3 cp <localDir>configurations.json s3://<my-bucket>/<my-object-path>/configurations.json

> aws s3 cp <localDir>emr.sh s3://<my-bucket>/<my-object-path>/emr.sh

> aws s3 cp <localDir>emr_contexts.conf s3://<my-bucket>/<my-object-path>/emr_contexts.conf

Next, create the cluster as described earlier, but this time reference existing_build_jobserver_BA.sh. Make sure that the compiled version of the jobserver .tar file matches the EMR and Spark versions you are using.

This cluster should take the normal amount of time to create (in my case, under 10 minutes). Your time will vary based on your setup and options, but this BA reduces the creation time compared to full_install_jobserver_BA.sh. The installation is also located at /mnt/lib/spark-jobserver.

Run JobServer

Now that we have the cluster created, we will do the following:

  1. Use the FlightsBatch job in yarn-client mode through spark-submit as a reference to baseline the flight data sample in Spark.
  2. Start JobServer with a predefined Hive context in SparkSQL.
  3. Load the flight dataset into that JobServer context.
  4. Run ad hoc queries by using the RESTful interface of JobServer.

We will be doing this on the master node, so make sure you have a SOCKS proxy enabled so that you can access the Spark and JobServer UIs. This AWS Big Data Blog post covers the finer points of spark-submit. If you’re not running this cluster in production, you can instead open port 8090 on your master node security group; consult your security representative to verify that this approach is acceptable.
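For reference, one common way to set up that SOCKS proxy is an SSH dynamic tunnel; the key file name, local port, and DNS name below are placeholders, and you would pair the tunnel with a browser proxy tool such as FoxyProxy:

```shell
# Open a dynamic (SOCKS) tunnel to the EMR master node; 8157 is an arbitrary
# local port choice. Requires your EC2 key pair and the master public DNS.
ssh -i ~/my-key-pair.pem -N -D 8157 hadoop@<master-node-dns>
```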

Once you have the proxy enabled and have reached the master node by using SSH, be aware of these locations:

  • /mnt/lib/spark-jobserver – the location where JobServer is installed and where you will start and stop JobServer
  • /mnt/var/log/spark-jobserver – the location of the logs for JobServer; it’s very helpful to tail these logs

Next, create the .jar file for the jobs we will execute; you will need Maven to build it. This .jar file contains the traditional batch Spark job (com.amazonaws.proserv.blog.FlightsBatch) and also the JobServer jobs that implement the proper interfaces (com.amazonaws.proserv.blog.FlightsSql.scala):

> mvn clean package

Put the .jar file (jobserverEmr-1.0.jar) onto your master node; we will use it for our JobServer execution. In this example, we put it in /mnt/lib/spark-jobserver. We will skip the actual benchmarking execution (FlightsBatch), but you will find sample spark-submit commands, and all commands we run on the master node, at <project-root>/commands/commands-flights.txt. For this size of cluster without optimization, it takes around 7 minutes to load the data into the SparkSQL context. Once the data is there, SQL execution is quick, which is the point of using JobServer.
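One way to stage the .jar on the master node is to copy it over SSH; the key file and DNS name below are placeholders:

```shell
# Copy the Maven-built artifact next to the JobServer installation on the
# master node; replace the key file and DNS name with your own.
scp -i ~/my-key-pair.pem target/jobserverEmr-1.0.jar \
  hadoop@<master-node-dns>:/mnt/lib/spark-jobserver/
```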

Loading JobServer

Now, let’s start JobServer with our predefined context.

> cd /mnt/lib/spark-jobserver

> ./server-start.sh

Verify that the server started by browsing to http://<master-node-dns>:8090 and confirming that hive-context has been created on the Contexts tab. The image below shows the JobServer UI with preloaded content.



Now, let’s add our .jar file so JobServer has it available for execution. We will do this by using the REST interface. I use curl, but you can use any REST-compliant client.

> curl --data-binary @jobserverEmr-1.0.jar localhost:8090/jars/flightsample

Verify in the JobServer UI that the .jar was successfully loaded by checking the Jars tab. The image below shows the JobServer UI with available jars.

Now we can load the data into hive-context and send ad hoc queries. We will separate these two steps. Typically, I open a second SSH window just to tail the logs; that is, I have two SSH sessions open at the same time:

> tail -200f /mnt/var/log/spark-jobserver/spark-job-server.log

First, load the data by using the RESTful interface:

> curl -d "loc = \"s3://us-east-1.elasticmapreduce.samples/flightdata/input\"" 'localhost:8090/jobs?appName=flightsample&classPath=com.amazonaws.proserv.blog.FlightsHiveLoad&context=hive-context'

Spark is now aware of the data and the directed acyclic graph (DAG), but the data hasn’t been acted on yet, so it hasn’t been loaded into the context. To do that, we run the first ad hoc query to perform an action that will load the data into an RDD:

> curl -d "sql = \"SELECT origin, count(*) AS total_departures FROM flights WHERE year >= '2000' GROUP BY origin ORDER BY total_departures DESC LIMIT 10\"" 'localhost:8090/jobs?appName=flightsample&classPath=com.amazonaws.proserv.blog.FlightsHiveTest&context=hive-context'

At this point, the benchmark comes in handy. We know that loading the entire dataset takes approximately 7 minutes, so we are not going to wait for the response to this first query. In the JobServer UI, we can see our initial request in the Running Jobs category. In the image below, you can see the JobServer UI with FlightsHiveLoad executed and the FlightsHiveTest SQL executing, in Running state.

Once this query is in the Completed Jobs category, we can execute the same query (or other queries) and wait for the response because the data is already loaded in memory.

For example, if we execute the same query we used to load the data with a timeout and a sync flag, we will get a response very quickly:

> curl -d "sql = \"SELECT origin, count(*) AS total_departures FROM flights WHERE year >= '2000' GROUP BY origin ORDER BY total_departures DESC LIMIT 10\"" 'localhost:8090/jobs?appName=flightsample&classPath=com.amazonaws.proserv.blog.FlightsHiveTest&context=hive-context&sync=true&timeout=150'


  "result": ["[ATL,5683237]", "[ORD,5063315]", "[DFW,4402632]", "[LAX,3318774]", "[DEN,3038788]", "[PHX,2801223]", "[IAH,2701763]", "[LAS,2313143]", "[DTW,2126781]", "[SFO,2115094]"]


> curl -d "sql = \"SELECT origin, dest, count(*) AS total_flights FROM flights WHERE year >= '2000' GROUP BY origin, dest ORDER BY total_flights DESC LIMIT 10\"" 'localhost:8090/jobs?appName=flightsample&classPath=com.amazonaws.proserv.blog.FlightsHiveTest&context=hive-context&sync=true&timeout=150'


  "result": ["[LAX,LAS,192576]", "[LAS,LAX,189801]", "[SFO,LAX,189702]", "[LAX,SFO,187680]", "[SAN,LAX,171133]", "[LAX,SAN,171059]", "[PHX,LAX,164053]", "[LAX,PHX,162193]", "[LGA,ORD,161917]", "[ORD,LGA,159844]"]


You can now issue ad hoc queries on the data currently in memory and correlate the requests by using the JobServer UI. The image below shows the JobServer UI with multiple ad hoc queries executed.


In this blog post, you learned how to use a bootstrap action to install Spark JobServer on EMR, both by compiling fully from source code in GitHub and by using a precompiled distribution. We then demonstrated how to start JobServer with a predefined Hive context in SparkSQL and warm that context with the flights dataset publicly available in Amazon S3. Finally, we walked through how to pose ad hoc queries by using the RESTful JobServer interface to get synchronous responses without incurring the overhead of reloading the data into memory.

If you have any questions or suggestions, please leave a comment below.





Detecting When a Smartphone Has Been Compromised

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2016/07/detecting_when_.html

Andrew “bunnie” Huang and Edward Snowden have designed a smartphone case that detects unauthorized transmissions by the phone. Paper. Three news articles.

Looks like a clever design. Of course, it has to be outside the device; otherwise, it could be compromised along with the device. Note that this is still in the research design stage; there are no public prototypes.

Hot Startups on AWS – July 2016 – Depop, Nextdoor, Branch

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/hot-startups-on-aws-july-2016/

Today I would like to introduce a very special guest blogger! My daughter Tina is a Recruiting Coordinator for the AWS team and is making her professional blogging debut with today’s post.


It’s officially summer and it’s hot! Check out this month’s hot AWS-powered startups:

  • Depop – a social mobile marketplace for artists and friends to buy and sell products.
  • Nextdoor – building stronger and safer neighborhoods through technology.
  • Branch – provides free deep linking technology for mobile app developers to gain and retain users.

Depop (UK)
In 2011, Simon Beckerman and his brother, Daniel, set out to create a social, mobile marketplace that would make buying and selling from mobile a fun and interactive experience. The Depop founders recognized that the rise of m-commerce was changing the way that consumers wanted to communicate and interact with each other. Simon, who already ran PIG Magazine and the luxury eyewear brand RetroSuperFuture, wanted to create a space where artists and creatives like himself could share, buy and sell their possessions. After launching organically in Italy, Depop moved to Shoreditch, London in 2012 to establish its headquarters and has since grown considerably with offices in London, New York, and Milan.

With over 4 million users worldwide, Depop is growing and building a community of shop owners with a passion for fashion, music, art, vintage, and lifestyle pieces. The familiar and user-friendly interface allows users to follow, like, comment, and private message with other users and shop owners. Simply download the app (Android or iOS) and you are suddenly connected to millions of unique items ready for purchase. It’s not just clothes either – you can find home décor, vintage furniture, jewelry, and more. Filtering by location allows you to personalize your feed and shop locally for even more convenience. Buyers can scroll through an endless stream of items ready for purchase and have the option to either pick up in-person or have their items shipped directly to them. Selling items is just as easy – upload a photo, write a short description, set a price, and then list your product.

Depop chose AWS in order to move fast without needing a large operations team, following a DevOps approach. They use 12 distinct AWS services including Amazon S3 and Amazon CloudFront for image hosting, and Auto Scaling to deal with the unpredictable and fairly large changes in traffic throughout the day. Depop’s developers are able to support their own services in production without needing to call on a dedicated operations team.

Check out Depop’s Blog to keep up with the artists using the app!

Nextdoor (San Francisco)
Based in San Francisco, Nextdoor has helped more than 100,000 neighborhoods across the United States bring their communities closer together. In 2010, the founders of this startup were surprised to learn from a Pew research study that the majority of American adults knew only some (29%) or none (28%) of their neighbors by name. Recognizing an opportunity to bring back a sense of community to neighborhoods across the country, the idea for Nextdoor was born. Neighbors are using Nextdoor to ask questions, get to know one another, and exchange local advice and recommendations. For example, neighbors are able to help one another to:

  • Find trustworthy babysitters, plumbers, and dentists in the area.
  • Organize neighborhood events, such as garage sales and block parties.
  • Get assistance to find lost pets and missing packages.
  • Sell or give away items, like an old kitchen table or bike.
  • Report neighborhood crime and share safety concerns.

Nextdoor is also giving local agencies such as police and fire departments, and offices of emergency management the ability to connect with verified residents in their jurisdiction through a feature called Nextdoor for Public Agencies. This is incredibly beneficial for agencies to help residents with emergency preparedness, community engagement, crime prevention, and community policing. In his seminal work, Bowling Alone, Harvard Professor Robert Putnam learned that when social capital within a community is high, children do better in school, neighborhoods are safer, people prosper, the government is better, and people are happier and healthier overall. With a comprehensive list of helpful community guidelines, Nextdoor is creating stronger and safer neighborhoods with the power of technology. You can download the Nextdoor app for Android or iOS.

AWS is the foundational infrastructure for both the online services in Nextdoor’s technology stack, and all of their offline data processing and analytics systems. Nextdoor uses over 25 different AWS services (Amazon EC2, Elastic Load Balancing, Amazon CloudFront, Amazon S3, Amazon DynamoDB, Amazon Redshift, and Amazon Kinesis to name a few) to quickly prototype, develop, and deploy new features for community members. Supporting millions of users in the US, Nextdoor runs their services across four AWS Regions worldwide, and has also recently expanded to Europe. In their own words, “Amazon makes it easy for us to flexibly grow our technology footprint with predictable costs in an automated fashion.”

Branch (Palo Alto)
The idea for Branch came in May 2014 when a group of Stanford business school graduates began working together to build and launch their own mobile app. They soon realized how challenging it was to grow their app, and saw that many of their friends were running into the same difficulties. The graduates saw the potential to create a deep linking platform to help apps get discovered, retain users, and grow exponentially. Branch reached its first million users within several months after its inception, and a little over a year later had climbed to one billion users and 5,000 apps. Companies such as Pinterest, Instacart, Mint, and Redfin are partnering with Branch to improve their user experience worldwide. Over 11,000 apps use the platform today.

As the number of smartphone users continues to increase, mobile apps are providing better user experiences, higher conversions, and better retention rates than the mobile web. The issue comes when mobile developers want to link users to the content they worked so hard to create – the transition between emails, ads, referrals, and more can often lead to broken experiences.

Mobile deep links allow users to share content that is within an app. Normal web links don’t work unless apps are downloaded on a device, and even then there is no standard way to find and share content as it is specific to every app. Branch allows content within apps to be shared just as they would be on the web. For example, imagine you are shopping for a fresh pair of shoes on the mobile web. You are ready to check out, but are prompted to download the store’s app to complete your purchase. Now that you’ve downloaded the app, you are brought back to the store’s homepage and need to restart your search from the beginning. With a Branch deep link, you instead would be linked directly back to checkout once you’ve installed the app, saving time and creating an overall better user experience.

Branch has grown exponentially over the past two years, and relies heavily on AWS to scale its infrastructure. Anticipating continued growth, Branch builds and maintains most of its infrastructure services with open source tools running on Amazon EC2 instances (Amazon API Gateway, Apache Kafka, Apache Zookeeper, Kubernetes, Redis, and Aerospike), and also uses AWS services such as Elastic Load Balancing, Amazon CloudFront, Amazon Route 53, and Amazon RDS for PostgreSQL. These services allow Branch to maintain a 99.999% success rate on links with a latency of only 60 ms in the 99th percentile. To learn more about how they did this, read their recent blog post, Scaling to Billions of Requests a Day with AWS.

Tina Barr

Digital Citizens Slam Cloudflare For Enabling Piracy & Malware

Post Syndicated from Andy original https://torrentfreak.com/digital-citizens-slam-cloudflare-for-enabling-piracy-malware-160722/

For the past several years, one of the key educational strategies of entertainment industry companies has been to cast doubt on the credibility of so-called ‘pirate’ sites.

Previously there have been efforts to suggest that site operators make huge profits at the expense of artists who get nothing, but there are other recurring themes, mostly centered around fear.

One of the most prominent is that pirate sites are dangerous places to visit, with users finding themselves infected with viruses and malware while being subjected to phishing attacks.

This increasingly well-worn approach has just been revisited by consumer interest group Digital Citizens Alliance (DCA). In a new report titled ‘Enabling Malware’, the Hollywood-affiliated group calls out United States-based companies for helping pirate site operators “bait consumers and steal their personal information.”

“When you think of Internet crime, you probably imagine shadowy individuals operating in Eastern Europe, China or Russia who come up with devious plans to steal your identity, trick you into turning over financial information or peddling counterfeits or stolen content. And you would be right,” DCA begins.

“But while many online criminals are based overseas, and often beyond the reach of U.S. prosecutors, they are aided by North American technology companies that ensure that overseas operators’ lifeline to the public – their websites – are available.”

DCA has examined the malware issue on pirate sites on previous occasions but this time around their attention turns to local service providers, including hosting platform Hawk Host and CDN company Cloudflare who (in)directly provide services to pirate sites.

“Are these companies doing anything illegal? No more than the landlord of an apartment isn’t doing anything illegal by renting to a drug dealer who has sellers showing up day and night,” DCA writes.

“But just like that landlord, more often than not these companies either look the other way or just don’t want to know.”

Faced with an investigative dead-end when it comes to tracing the operators of pirate sites, DCA criticizes Cloudflare for providing a service which effectively shields the true location of such platforms.

“In order to utilize CloudFlare’s CDN, DNS, and other protection services customers have to run all of their website traffic through the CloudFlare network. The end result of doing so is masked hosting information,” DCA reports.

“Instead of the actual hosting provider, IP address, domain name server, etc., a Whois search provides the information for CloudFlare’s network.”

To illustrate its point, DCA points to a pirate domain which presents itself as the famous Putlocker site but is actually a third-party clone operating from the dubious URL, Putlockerr.ac.

“From websites such as putlockerr.ac consumers are tricked into downloading malware. For example, when a consumer clicks to watch a movie, they are sent to a new screen in which they are told their video player is out of date and they must update it. The update, Digital Citizens’ researchers found, is the malware delivery mechanism.”

There’s little doubt that some of these low-level sites are in the malware game so DCA’s research is almost certainly sound. However, just like their colleagues at the MPAA and RIAA who regularly shift responsibility to Google, DCA lays the blame on Cloudflare, a more easily pinpointed target than a pirate site operator.

Unsurprisingly, Cloudflare isn’t particularly interested in getting involved in the online content-policing business.

“CloudFlare’s service protects and accelerates websites and applications. Because CloudFlare is not a host, we cannot control or remove customer content from the Internet,” the company said in a response to the report.

In common with Google, Cloudflare also says it makes efforts to stop the spread of malware but due to the nature of its business it is unable to physically remove content from the Internet.

“CloudFlare leaves the removal of online content to law enforcement agencies and complies with any legal requests made by the authorities,” the company notes.

“If we believe that one of our customers’ websites is distributing malware, CloudFlare will post an interstitial page that warns site visitors and asks them if they would like to proceed despite the warning. This practice follows established industry norms.”

Finally, while DCA says it has the safety of Internet users at heart, its malware report misses a great opportunity. Aside from criticizing companies like Cloudflare for not doing enough, it offers zero practical anti-malware advice to consumers.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Plan Bee

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/plan-bee/

Bees are important. I find myself saying this a lot and, slowly but surely, the media seems to be coming to this realisation too. The plight of the bee is finally being brought to our attention with increasing urgency.

A colony of bees make honey

Welcome to the house of buzz.

In the UK, bee colonies are suffering mass losses. Due to the use of bee-killing fertilisers and pesticides within the farming industry, the decline of pollen-rich plants, the destruction of hives by mites, and Colony Collapse Disorder (CCD), bees are in decline at a worrying pace.

Bee Collision

When you find the perfect GIF…

One hint of a silver lining is that increasing awareness of the crisis has led to a rise in the number of beekeeping hobbyists. As getting your hands on some bees is now as simple as ordering a box from the internet, keeping bees in your garden is a much less daunting venture than it once was. 

Taking this one step further, beekeepers are now using tech to monitor the conditions of their bees, improving conditions for their buzzy workforce while also recording data which can then feed into studies attempting to lessen the decline of the bee.

WDLabs recently donated a PiDrive to the Honey Bee Gardens Project in order to help beekeeper David Ammons and computer programmer Graham Total create The Hive Project, an electric beehive colony that monitors real-time bee data.

Electric Bee Hive

The setup records colony size, honey production, and bee health to help combat CCD.

Colony Collapse Disorder (CCD) is decidedly mysterious. Colonies hit by the disease seem to simply disappear. The hive itself often remains completely intact, full of honey at the perfect temperature, but… no bees. Dead or alive, the bees are nowhere to be found.

To try to combat this phenomenon, the electric hive offers 24/7 video coverage of the inner hive, while tracking the conditions of the hive population.

Bee bringing pollen into the hive

This is from the first live day of our instrumented beehive. This was the only bee we spotted all day that brought any pollen into the hive.

Ultimately, the team aim for the data to be crowdsourced, enabling researchers and keepers to gain the valuable information needed to fight CCD via a network of electric hives. While many people blame the aforementioned pollen decline and chemical influence for the rise of CCD, without the empirical information gathered from builds such as The Hive Project, the source of the problem, and therefore the solution, can’t be found.

Bee making honey

It has been brought to our attention that the picture here previously was of a wasp doing bee things. We have swapped it out for a bee.



Ammons and Total researched existing projects around the use of digital tech within beekeeping, and they soon understood that a broad analysis of bee conditions didn’t exist. While many were tracking hive weight, temperature, or honey population, there was no system in place for integrating such data collection into one place. This realisation spurred them on further.

“We couldn’t find any one project that took a broad overview of the whole area. Even if we don’t end up being the people who implement it, we intend to create a plan for a networked system of low-cost monitors that will assist both research and commercial beekeeping.”

With their mission statement firmly in place, the duo looked toward the Raspberry Pi as the brain of their colony. The device was small enough to fit within the hive without disruption, yet powerful enough to monitor multiple factors while also using the Pi Camera Module to record all video to the 314GB storage of the Western Digital PiDrive.

Data recorded by The Hive Project is vital to the survival of the bee, the growth of colony population, and an understanding of the conditions of the hive in changing climates. These are issues which affect us all. The honey bee is responsible for approximately 80% of pollination in the UK, and is essential to biodiversity. Here, I should hand over to a ‘real’ bee to explain more about the importance of bee-ing…

Bee Movie – Devastating Consequences – HD

Barry doesn’t understand why all the bees aren’t happy. Then, Vanessa shows Barry the devastating consequences of the bees being triumphant in their lawsuit against the human race.


The post Plan Bee appeared first on Raspberry Pi.

EFF Lawsuit Takes on DMCA Section 1201: Research and Technology Restrictions Violate the First Amendment

Post Syndicated from jake original http://lwn.net/Articles/695118/rss

The Electronic Frontier Foundation (EFF) has announced that it is suing the US government over provisions in the Digital Millennium Copyright Act (DMCA). The suit has been filed on behalf of Andrew “bunnie” Huang, who has a blog post describing the reasons behind the suit. The EFF also explained why these DMCA provisions should be ruled unconstitutional:
“These provisions—contained in Section 1201 of the DMCA—make it unlawful for people to get around the software that restricts access to lawfully-purchased copyrighted material, such as films, songs, and the computer code that controls vehicles, devices, and appliances. This ban applies even where people want to make noninfringing fair uses of the materials they are accessing.

Ostensibly enacted to fight music and movie piracy, Section 1201 has long served to restrict people’s ability to access, use, and even speak out about copyrighted materials—including the software that is increasingly embedded in everyday things. The law imposes a legal cloud over our rights to tinker with or repair the devices we own, to convert videos so that they can play on multiple platforms, remix a video, or conduct independent security research that would reveal dangerous security flaws in our computers, cars, and medical devices. It criminalizes the creation of tools to let people access and use those materials.”

Canadian Man Behind Popular ‘Orcus RAT’

Post Syndicated from BrianKrebs original https://krebsonsecurity.com/2016/07/canadian-man-is-author-of-popular-orcus-rat/

Far too many otherwise intelligent and talented software developers these days apparently think they can get away with writing, selling and supporting malicious software and then couching their commerce as a purely legitimate enterprise. Here’s the story of how I learned the real-life identity of a Canadian man who’s laboring under that same illusion as proprietor of one of the most popular and affordable tools for hacking into someone else’s computer.

Earlier this week I heard from Daniel Gallagher, a security professional who occasionally enjoys analyzing new malicious software samples found in the wild. Gallagher said he and members of @malwrhunterteam and @MalwareTechBlog recently got into a Twitter fight with the author of Orcus RAT, a tool they say was explicitly designed to help users remotely compromise and control computers that don’t belong to them.

A still frame from a Youtube video demonstrating Orcus RAT’s keylogging ability to steal passwords from Facebook and other sites.

The author of Orcus — a person going by the nickname “Ciriis Mcgraw” a.k.a. “Armada” on Twitter and other social networks — claimed that his RAT was in fact a benign “remote administration tool” designed for use by network administrators and not a “remote access Trojan” as critics charged. Gallagher and others took issue with that claim, pointing out that they were increasingly encountering computers that had been infected with Orcus unbeknownst to the legitimate owners of those machines.

The malware researchers noted another reason that Mcgraw couldn’t so easily distance himself from how his clients used the software: He and his team are providing ongoing technical support and help to customers who have purchased Orcus and are having trouble figuring out how to infect new machines or hide their activities online.

What’s more, the range of features and plugins supported by Armada, they argued, go well beyond what a system administrator would look for in a legitimate remote administration client like Teamviewer, including the ability to launch a keylogger that records the victim’s every computer keystroke, as well as a feature that lets the user peek through a victim’s Web cam and disable the light on the camera that alerts users when the camera is switched on.

A new feature of Orcus announced July 7 lets users configure the RAT so that it evades digital forensics tools used by malware researchers, including an anti-debugger and an option that prevents the RAT from running inside of a virtual machine.

Other plugins offered directly from Orcus’s tech support page (PDF) and authored by the RAT’s support team include a “survey bot” designed to “make all of your clients do surveys for cash;” a “USB/.zip/.doc spreader,” intended to help users “spread a file of your choice to all clients via USB/.zip/.doc macros;” a “Virustotal.com checker” made to “check a file of your choice to see if it had been scanned on VirusTotal;” and an “Adsense Injector,” which will “hijack ads on pages and replace them with your Adsense ads and disable adblocker on Chrome.”


Gallagher said he was so struck by the guy’s “smugness” and sheer chutzpah that he decided to look closer at any clues that Ciriis Mcgraw might have left behind as to his real-world identity and location. Sure enough, he found that Ciriis Mcgraw also has a Youtube account under the same name, and that a video Mcgraw posted in July 2013 pointed to a 33-year-old security guard from Toronto, Canada.

Gallagher noticed that the video — a bystander recording on the scene of a police shooting of a Toronto man — included a link to the domain policereview[dot]info. A search of the registration records attached to that Web site name shows that the domain was registered to a John Revesz in Toronto and to the email address john.revesz@gmail.com.

A reverse WHOIS lookup ordered from Domaintools.com shows the same john.revesz@gmail.com address was used to register at least 20 other domains, including “thereveszfamily.com,” “johnrevesz.com,” “revesztechnologies[dot]com,” and — perhaps most tellingly — “lordarmada.info”.

Johnrevesz[dot]com is no longer online, but this cached copy of the site from the indispensable archive.org includes his personal résumé, which states that John Revesz is a network security administrator whose most recent job in that capacity was as an IT systems administrator for TD Bank. Revesz’s LinkedIn profile indicates that for the past year at least he has served as a security guard for GardaWorld International Protective Services, a private security firm based in Montreal.

Revesz’s CV also says he’s the owner of the aforementioned Revesz Technologies, but it’s unclear whether that business actually exists; the company’s Web site currently redirects visitors to a series of sites promoting spammy and scammy surveys, come-ons and giveaways.


Contacted by KrebsOnSecurity, Revesz seemed surprised that I’d connected the dots, but beyond that did not try to disavow ownership of the Orcus RAT.

“Profit was never the intentional goal, however with the years of professional IT networking experience I have myself, knew that proper correct development and structure to the environment is no free venture either,” Revesz wrote in reply to questions about his software. “Utilizing my 15+ years of IT experience I have helped manage Orcus through its development.”

Revesz continued:

“As for your legalities question.  Orcus Remote Administrator in no ways violates Canadian laws for software development or sale.  We neither endorse, allow or authorize any form of misuse of our software.  Our EULA [end user license agreement] and TOS [terms of service] is very clear in this matter. Further we openly and candidly work with those prudent to malware removal to remove Orcus from unwanted use, and lock out offending users which may misuse our software, just as any other company would.”

Revesz said none of the aforementioned plugins were supported by Orcus and that all were developed by third-party developers, adding that “Orcus will never allow implementation of such features, and or plugins would be outright blocked on our part.”

In an apparent contradiction to that claim, plugins that allow Orcus users to disable the Webcam light on a computer running the software, and one that enables the RAT to be used as a “stresser” to knock sites and individual users offline, are available directly from Orcus Technologies’ Github page.

Revesz also offers a service to help people cover their tracks online. Using his alter ego “Armada” on the hacker forum Hackforums[dot]net, he sells a “bulletproof dynamic DNS service” that promises not to keep records of customer activity.

Dynamic DNS services allow users to have Web sites hosted on servers that frequently change their Internet addresses. This type of service is useful for people who want to host a Web site on a home-based Internet address that may change from time to time, because dynamic DNS services can be used to easily map the domain name to the user’s new Internet address whenever it happens to change.
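Reduced to its essentials, a dynamic DNS service is just a mutable name-to-address table that the customer rewrites whenever his address changes. The toy model below is mine, not any real provider’s API (real services expose the update as an authenticated HTTP endpoint and serve the records with very short TTLs), but it shows why the name keeps working while the address behind it moves:

```python
# A toy in-memory model of what a dynamic DNS service does: the domain
# name stays constant while the address it resolves to is re-pointed
# whenever the client's IP changes. Hostnames and addresses are
# illustrative (RFC 5737 documentation ranges).

class DynamicDNS:
    def __init__(self):
        self.records = {}          # hostname -> current IP

    def update(self, hostname, ip):
        """Called by the client whenever its public IP changes."""
        self.records[hostname] = ip

    def resolve(self, hostname):
        """What the rest of the Internet sees when it looks up the name."""
        return self.records.get(hostname)

dns = DynamicDNS()
dns.update("home.example.net", "203.0.113.10")   # initial address
assert dns.resolve("home.example.net") == "203.0.113.10"

# The home connection gets a new address; the name follows it.
dns.update("home.example.net", "198.51.100.7")
assert dns.resolve("home.example.net") == "198.51.100.7"
```

That same mutability is what makes abuse attractive: a table entry can be re-pointed faster than defenders can chase the addresses behind it.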


Unfortunately, these dynamic DNS providers are extremely popular in the attacker community, because they allow bad guys to keep their malware and scam sites up even when researchers manage to track the attacking IP address and convince the ISP responsible for that address to disconnect the malefactor. In such cases, dynamic DNS allows the owner of the attacking domain to simply re-route the attack site to another Internet address that he controls.

Free dynamic DNS providers tend to report or block suspicious or outright malicious activity on their networks, and may well share evidence about the activity with law enforcement investigators. In contrast, Armada’s dynamic DNS service is managed solely by him, and he promises in his ad on Hackforums that the service — to which he sells subscriptions of various tiers for between $30-$150 per year — will not log customer usage or report anything to law enforcement.

According to writeups by Kaspersky Lab and Heimdal Security, Revesz’s dynamic DNS service has been seen used in connection with malicious botnet activity by another RAT known as Adwind.  Indeed, Revesz’s service appears to involve the domain “nullroute[dot]pw”, which is one of 21 domains registered to a “Ciriis Mcgraw,” (as well as orcus[dot]pw and orcusrat[dot]pw).

I asked Gallagher (the researcher who originally tipped me off about Revesz’s activities) whether he was persuaded at all by Revesz’s arguments that Orcus was just a tool and that Revesz wasn’t responsible for how it was used.

Gallagher said he and his malware researcher friends had private conversations with Revesz in which he seemed to acknowledge that some aspects of the RAT went too far, and promised to release software updates to remove certain objectionable functionalities. But Gallagher said those promises felt more like the actions of someone trying to cover himself.

“I constantly try to question my assumptions and make sure I’m playing devil’s advocate and not jumping the gun,” Gallagher said. “But I think he’s well aware that what he’s doing is hurting people, it’s just now he knows he’s under the microscope and trying to do and say enough to cover himself if it ever comes down to him being questioned by law enforcement.”

Detecting Spoofed Messages Using Clock Skew

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2016/07/detecting_spoof_1.html

Two researchers are working on a system to detect spoofed messages on a car’s internal network by fingerprinting the electronic control units (ECUs) that send them (the fingerprinting idea itself isn’t new).

To perform that fingerprinting, they use a weird characteristic of all computers: tiny timing errors known as “clock skew.” Taking advantage of the fact that those errors are different in every computer, including every computer inside a car, the researchers were able to assign a fingerprint to each ECU based on its specific clock skew. The CIDS’ device then uses those fingerprints to differentiate between the ECUs, and to spot when one ECU impersonates another, like when a hacker corrupts the vehicle’s radio system to spoof messages that are meant to come from a brake pedal or steering system.

Paper: “Fingerprinting Electronic Control Units for Vehicle Intrusion Detection,” by Kyong-Tak Cho and Kang G. Shin.

Abstract: As more software modules and external interfaces are getting added on vehicles, new attacks and vulnerabilities are emerging. Researchers have demonstrated how to compromise in-vehicle Electronic Control Units (ECUs) and control the vehicle maneuver. To counter these vulnerabilities, various types of defense mechanisms have been proposed, but they have not been able to meet the need of strong protection for safety-critical ECUs against in-vehicle network attacks. To mitigate this deficiency, we propose an anomaly-based intrusion detection system (IDS), called Clock-based IDS (CIDS). It measures and then exploits the intervals of periodic in-vehicle messages for fingerprinting ECUs. The thus-derived fingerprints are then used for constructing a baseline of ECUs’ clock behaviors with the Recursive Least Squares (RLS) algorithm. Based on this baseline, CIDS uses Cumulative Sum (CUSUM) to detect any abnormal shifts in the identification errors — a clear sign of intrusion. This allows quick identification of in-vehicle network intrusions with a low false-positive rate of 0.055%. Unlike state-of-the-art IDSs, if an attack is detected, CIDS’s fingerprinting of ECUs also facilitates a root-cause analysis: identifying which ECU mounted the attack. Our experiments on a CAN bus prototype and on real vehicles have shown CIDS to be able to detect a wide range of in-vehicle network attacks.
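As a rough illustration of the idea (a toy sketch, not the CIDS implementation: the paper fits the baseline with Recursive Least Squares, where a plain least-squares fit is used here, and every number below is made up), the skew estimate is just the slope of a sender’s accumulated timing offset, and the detector is a one-sided cumulative sum over the resulting errors:

```python
# Toy clock-skew fingerprinting plus a CUSUM change detector.
# A periodic sender's messages drift at a rate set by its clock skew;
# a sudden change in that rate is a sign someone else is transmitting.

def estimate_skew(arrival_times, period):
    """Least-squares slope of accumulated timing offset vs. time.

    Each message should arrive `period` seconds after the previous one;
    the accumulated deviation grows linearly, with slope equal to the
    sender's clock skew (seconds of drift per second).
    """
    offsets, acc = [], 0.0
    for i in range(1, len(arrival_times)):
        acc += (arrival_times[i] - arrival_times[i - 1]) - period
        offsets.append(acc)
    xs = arrival_times[1:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(offsets) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, offsets))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

def cusum_alarm(errors, drift=0.001, threshold=0.01):
    """Return the index where the one-sided cumulative sum of
    identification errors crosses `threshold`, else None."""
    s = 0.0
    for i, e in enumerate(errors):
        s = max(0.0, s + e - drift)
        if s > threshold:
            return i
    return None

# A sender with 100 ppm skew, messages nominally every 10 ms:
times = [i * 0.010 * (1 + 100e-6) for i in range(200)]
print(round(estimate_skew(times, 0.010), 6))   # ~0.0001 (100 ppm)

# Quiet identification errors, then a shift when a spoofer takes over:
print(cusum_alarm([0.0] * 50 + [0.005] * 10))  # alarms at index 52
```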

Carbanak Gang Tied to Russian Security Firm?

Post Syndicated from BrianKrebs original https://krebsonsecurity.com/2016/07/carbanak-gang-tied-to-russian-security-firm/

Among the more plunderous cybercrime gangs is a group known as “Carbanak,” Eastern European hackers blamed for stealing more than a billion dollars from banks. Today we’ll examine some compelling clues that point to a connection between the Carbanak gang’s staging grounds and a Russian security firm that claims to work with some of the world’s largest brands in cybersecurity.

The Carbanak gang derives its name from the banking malware used in countless high-dollar cyberheists. The gang is perhaps best known for hacking directly into bank networks using poisoned Microsoft Office files, and then using that access to force bank ATMs into dispensing cash. Russian security firm Kaspersky Lab estimates that the Carbanak Gang has likely stolen upwards of USD $1 billion — but mostly from Russian banks.

Image: Kaspersky

Image: Kaspersky

I recently heard from security researcher Ron Guilmette, an anti-spam crusader whose sleuthing has been featured on several occasions on this site and in the blog I wrote for The Washington Post. Guilmette said he’d found some interesting commonalities in the original Web site registration records for a slew of sites that all have been previously responsible for pushing malware known to be used by the Carbanak gang.

For example, the domains “weekend-service[dot]com” “coral-trevel[dot]com” and “freemsk-dns[dot]com” all were documented by multiple security firms as distribution hubs for Carbanak crimeware. Historic registration or “WHOIS” records maintained by Domaintools.com for all three domains contain the same phone and fax numbers for what appears to be a Xicheng Co. in China — 1066569215 and 1066549216, each preceded by either a +86 (China’s country code) or +01 (USA). Each domain record also includes the same contact address: “williamdanielsen@yahoo.com“.

According to data gathered by ThreatConnect, a threat intelligence provider [full disclosure: ThreatConnect is an advertiser on this blog], at least 484 domains were registered to the williamdanielsen@yahoo.com address or to one of 26 other email addresses that listed the same phone numbers and Chinese company.  “At least 304 of these domains have been associated with a malware plugin [that] has previously been attributed to Carbanak activity,” ThreatConnect told KrebsOnSecurity.

Going back to those two phone numbers, 1066569215 and 1066549216; at first glance they appear to be sequential, but closer inspection reveals they differ slightly in the middle. Among the very few domains registered to those Chinese phone numbers that haven’t been seen launching malware is a Web site called “cubehost[dot]biz,” which according to records was registered in Sept. 2013 to a 28-year-old Artem Tveritinov of Perm, Russia.

Cubehost[dot]biz is a dormant site, but it appears to be the sister property to a Russian security firm called Infocube (also spelled “Infokube”). The InfoKube web site — infokube.ru — is also registered to Mr. Tveritinov of Perm, Russia; there are dozens of records in the WHOIS history for infokube.ru, but only the oldest, original record from 2011 contains the email address atveritinov@gmail.com. 

That same email address was used to register a four-year-old profile account at the popular Russian social networking site Vkontakte for Artyom “LioN” Tveritinov from Perm, Russia. The “LioN” bit is an apparent reference to an Infokube anti-virus product by the same name.

Mr. Tveritinov is quoted as “the CEO of InfoKub” in a press release from FalconGaze, a Moscow-based data security firm that partnered with InfoKube to implement “data protection and employee monitoring” at a Russian commercial research institute. InfoKube’s own press releases say the company also has been hired to develop “a system to protect information from unauthorized access” undertaken for the City of Perm, Russia, and for consulting projects relating to “information security” undertaken for and with the State Ministry of Interior of Russia.

The company’s Web site claims that InfoKube partners with a variety of established security firms — including Symantec and Kaspersky. The latter confirmed InfoKube was “a very minor partner” of Kaspersky’s, mostly involved in systems integration. Zyxel, another partner listed on InfoKube’s partners page, said it had no partners named InfoKube. Slovakia-based security firm ESET said “Infokube is not and has never been a partner of ESET in Russia.”

Presented with Guilmette’s findings, I was keen to ask Mr. Tveritinov how the phone and fax numbers for a Chinese entity whose phone number has become synonymous with cybercrime came to be copied verbatim into Cubehost’s Web site registration records. I sent requests for comment to Mr. Tveritinov via email and through his Vkontakte page.

Initially, I received a friendly reply from Mr. Tveritinov via email expressing curiosity about my inquiry, and asking how I’d discovered his email address. In the midst of composing a more detailed follow-up reply, I noticed that the Vkontakte social networking profile that Tveritinov had maintained regularly since April 2012 was being permanently deleted before my eyes. Tveritinov’s profile page and photos actually disappeared from the screen I had up on one monitor as I was in the process of composing an email to him in the other.

Not long after Tveritinov’s Vkontakte page was deleted, I heard from him via email. Ignoring my question about the sudden disappearance of his social media account, Tveritinov said he never registered cubehost.biz and that his personal information was stolen and used in the registration records for cubehost.biz.

“Our company never did anything illegal, and conducts all activities according to the laws of Russian Federation,” Tveritinov said in an email. “Also, it’s quite stupid to use our own personal data to register domains to be used for crimes, as [we are] specialists in the information security field.”

Turns out, InfoKube/Cubehost also runs an entire swath of Internet addresses managed by Petersburg Internet Network (PIN) Ltd., an ISP in Saint Petersburg, Russia that has a less-than-stellar reputation for online badness.

For example, many of the aforementioned domain names that security firms have conclusively tied to Carbanak distribution (e.g., freemsk-dns[dot]com) are hosted in Internet address space assigned to Cubehost. A search of the RIPE registration records for that address block turns up a physical address in Ras al Khaimah, an emirate of the United Arab Emirates (UAE) that has sought to build a reputation as a tax shelter and a place where it is easy to create completely anonymous offshore companies. The same listing says abuse complaints about Internet addresses in that address block should be sent to “info@cubehost.biz.”

This PIN hosting provider in St. Petersburg has achieved a degree of notoriety in its own right and is probably worthy of additional scrutiny given its reputation as a haven for all kinds of online ne’er-do-wells. In fact, Doug Madory, director of Internet analysis at Internet performance management firm Dyn, has referred to the company as “…perhaps the leading contender for being named the Mos Eisley of the Internet” (a clever reference to the spaceport full of alien outlaws in the 1977 movie Star Wars).

Madory explained that PIN’s hard-won bad reputation stems from the ISP’s documented propensity for absconding with huge chunks of Internet address blocks that don’t actually belong to it, and then re-leasing that purloined Internet address space to spammers and other Internet miscreants.

For his part, Guilmette points to a decade’s worth of other nefarious activity going on at the Internet address space apparently assigned to Tveritinov and his company. For example, in 2013 Microsoft seized a bunch of domains parked there that were used as controllers for Citadel online banking malware, and all of those domains had the same “Xicheng Co.” data in their WHOIS records.  A Sept. 2011 report on the security blog dynamoo.com notes several domains with that Xicheng Co. WHOIS information showing up in online banking heists powered by the Sinowal banking Trojan way back in 2006.

“If Mr. Tveritinov, has either knowledge of, or direct involvement in even a fraction of the criminal goings-on within his address block, then the possibility that he may perhaps also have a role in other and additional criminal enterprises… including perhaps even the Carbanak cyber banking heists… becomes all the more plausible and probable,” Guilmette said.

It remains unclear to what extent the Carbanak gang is still active. Last month, authorities in Russia arrested 50 people allegedly tied to the organized cybercrime group, whose members reportedly hail from Russia, China, Ukraine and other parts of Europe. The action was billed as the biggest ever crackdown on financial hackers in Russia.

Weekly roundup: doing better

Post Syndicated from Eevee original https://eev.ee/dev/2016/07/17/weekly-roundup-doing-better/

July is themeless.

I’m doing better!

  • art: Daily Pokémon continue, perhaps a bit too sporadically to be called “daily” but whatever. Also a sunset painting that came out really cool, damn. And this painting of a new Sun/Moon Pokémon. And a lot of other doodling.

    I’ve been trying out a bunch of Krita brushes, and I’m not quite happy with any of them, but I’m getting enough of a feel for what I like to start making my own.

  • twitter: I finally automated @leafeon_brands, my blocklist of advertisers. It now automatically blocks ads shown to my primary account.

  • blog: I wrote some stuff about color, which somehow took way longer than I’d expected — I was hoping for a day or two, and it feels like it took most of the week. That wraps up my June posts, so, er, I really gotta get moving on July.

  • book: I had a book idea that seems to have a lot more staying power than the last one or two, and I did a bunch of research and planning for it. I’ll talk about it later, when I have something to show.

  • flora: I was dragged into fixing shipping for the Floraverse store, which thankfully mostly worked itself out before it became too much of a nightmare.

I’m working on two posts at the moment, so I should be able to catch up soon. As soon as I can, I want to find some blocks of time to experiment with art, work on Runed Awakening and this book idea, and spruce up veekun.

Some stuff about color

Post Syndicated from Eevee original https://eev.ee/blog/2016/07/16/some-stuff-about-color/

I’ve been trying to paint more lately, which means I have to actually think about color. Like an artist, I mean. I’m okay at thinking about color as a huge nerd, but I’m still figuring out how to adapt that.

While I work on that, here is some stuff about color from the huge nerd perspective, which may or may not be useful or correct.


Hues are what we usually think of as “colors”, independent of how light or dim or pale they are: general categories like purple and orange and green.

Strictly speaking, a hue is a specific wavelength of light. I think it’s really weird to think about light as coming in a bunch of wavelengths, so I try not to think about the precise physical mechanism too much. Instead, here’s a rainbow.

rainbow spectrum

These are all the hues the human eye can see. (Well, the ones this image and its colorspace and your screen can express, anyway.) They form a nice spectrum, which wraps around so the two red ends touch.

(And here is the first weird implication of the physical interpretation: purple is not a real color, in the sense that there is no single wavelength of light that we see as purple. The actual spectrum runs from red to blue; when we see red and blue simultaneously, we interpret it as purple.)

The spectrum is divided by three sharp lines: yellow, cyan, and magenta. The areas between those lines are largely dominated by red, green, and blue. These are the two sets of primary colors, those hues from which any others can be mixed.

Red, green, and blue (RGB) make up the additive primary colors, so named because they add light on top of black. LCD screens work exactly this way: each pixel is made up of three small red, green, and blue rectangles. It’s also how the human eye works, which is fascinating but again a bit too physical.

Cyan, magenta, and yellow are the subtractive primary colors, which subtract light from white. This is how ink, paint, and other materials work. When you look at an object, you’re seeing the colors it reflects, which are the colors it doesn’t absorb. A red ink reflects red light, which means it absorbs green and blue light. Cyan ink only absorbs red, and yellow ink only absorbs blue; if you mix them, you’ll get ink that absorbs both red and blue, and thus will appear green. A pure black is often included to make CMYK; mixing all three colors would technically get you black, but it might be a bit muddy and would definitely use three times as much ink.
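That arithmetic is easy to check numerically. In the idealized model below (my simplification; real pigments mix far less cleanly), an ink is a CMY triple giving how much of each light channel it absorbs, and mixing two inks combines their absorption channel by channel:

```python
# Additive vs. subtractive mixing, in the idealized model: an ink's
# CMY triple says how much red, green, and blue light it absorbs, and
# what's reflected is whatever isn't absorbed.

def cmy_to_rgb(c, m, y):
    """What's left after the ink absorbs its share of white light."""
    return (1 - c, 1 - m, 1 - y)

def mix_inks(a, b):
    """Crude ink mixing: each channel absorbs as much as the stronger
    ink does. (Real pigments are messier; this is the textbook model.)"""
    return tuple(max(x, y) for x, y in zip(a, b))

cyan   = (1, 0, 0)   # absorbs red only
yellow = (0, 0, 1)   # absorbs blue only

green_ink = mix_inks(cyan, yellow)   # absorbs both red and blue
print(cmy_to_rgb(*green_ink))        # -> (0, 1, 0): reflects only green
```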

The great kindergarten lie

Okay, you probably knew all that. What confused me for the longest time was how no one ever mentioned the glaring contradiction with what every kid is taught in grade school art class: that the primary colors are red, blue, and yellow. Where did those come from, and where did they go?

I don’t have a canonical answer for that, but it does make some sense. Here’s a comparison: the first spectrum is a full rainbow, just like the one above. The second is the spectrum you get if you use red, blue, and yellow as primary colors.

a full spectrum of hues, labeled with color names that are roughly evenly distributed
a spectrum of hues made from red, blue, and yellow

The color names come from xkcd’s color survey, which asked a massive number of visitors to give freeform names to a variety of colors. One of the results was a map of names for all the fully-saturated colors, providing a rough consensus for how English speakers refer to them.

The first wheel is what you get if you start with red, green, and blue — but since we’re talking about art class here, it’s really what you get if you start with cyan, magenta, and yellow. The color names are spaced fairly evenly, save for blue and green, which almost entirely consume the bottom half.

The second wheel is what you get if you start with red, blue, and yellow. Red has replaced magenta, and blue has replaced cyan, so neither color appears on the wheel — red and blue are composites in the subtractive model, and you can’t make primary colors like cyan or magenta out of composite colors.

Look what this has done to the distribution of names. Pink and purple have shrunk considerably. Green is half its original size and somewhat duller. Red, orange, and yellow now consume a full half of the wheel.

There’s a really obvious advantage here, if you’re a painter: people are orange.

Yes, yes, we subdivide orange into a lot of more specific colors like “peach” and “brown”, but peach is just pale orange, and brown is just dark orange. Everyone, of every race, is approximately orange. Sunburn makes you redder; fear and sickness make you yellower.

People really like to paint other people, so it makes perfect sense to choose primary colors that easily mix to make people colors.

Meanwhile, cyan and magenta? When will you ever use those? Nothing in nature remotely resembles either of those colors. The true color wheel is incredibly, unnaturally bright. The reduced color wheel is much more subdued, with only one color that stands out as bright: yellow, the color of sunlight.

You may have noticed that I even cheated a little bit. The blue in the second wheel isn’t the same as the blue from the first wheel; it’s halfway between cyan and blue, a tertiary color I like to call azure. True pure blue is just as unnatural as true cyan; azure is closer to the color of the sky, which is reflected as the color of water.
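For the curious, that halfway point falls straight out of hue-wheel arithmetic: cyan sits at 180 degrees, pure blue at 240, so azure lands at 210. Here’s a small sketch of the usual HSV-style sector formula (my own helper, not from any library):

```python
# Fully saturated, full-value RGB color at a given hue angle, using the
# standard six-sector HSV construction.

def hue_to_rgb(h):
    """h is a hue angle in degrees (0-360)."""
    c = 255                                    # chroma at full sat/value
    x = round(c * (1 - abs((h / 60) % 2 - 1))) # ramping middle channel
    sector = int(h // 60) % 6
    return [(c, x, 0), (x, c, 0), (0, c, x),
            (0, x, c), (x, 0, c), (c, 0, x)][sector]

print(hue_to_rgb(180))  # cyan  -> (0, 255, 255)
print(hue_to_rgb(240))  # blue  -> (0, 0, 255)
print(hue_to_rgb(210))  # azure -> (0, 128, 255)
```

Azure really is blue with the green channel turned halfway up, which is why it reads as softer and more sky-like than pure blue.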

People are orange. Sunlight is yellow. Dirt and rocks and wood are orange. Skies and oceans are blue. Blush and blood and sunburn are red. Sunsets are largely red and orange. Shadows are blue, the opposite of yellow. Plants are green, but in sun or shade they easily skew more blue or yellow.

All of these colors are much easier to mix if you start with red, blue, and yellow. It may not match how color actually works, but it’s a useful approximation for humans. (Anyway, where will you find dyes that are cyan or magenta? Blue is hard enough.)

I’ve actually done some painting since I first thought about this, and would you believe they sell paints in colors other than bright red, blue, and yellow? You can just pick whatever starting colors you want and the whole notion of “primary” goes a bit out the window. So maybe this is all a bit moot.

More on color names

The way we name colors fascinates me.

A “basic color term” is a single, unambiguous, very common name for a group of colors. English has eleven: red, orange, yellow, green, blue, purple, black, white, gray, pink, and brown.

Of these, orange is the only tertiary hue; brown is the only name for a specifically low-saturation color; pink and grey are the only names for specifically light shades. I can understand grey — it’s handy to have a midpoint between black and white — but the other exceptions are quite interesting.

Looking at the first color wheel again, “blue” and “green” together consume almost half of the spectrum. That seems reasonable, since they’re both primary colors, but “red” is relatively small; large chunks of it have been eaten up by its neighbors.

Orange is a tertiary color in either RGB or CMYK: it’s a mix of red and yellow, a primary and secondary color. Yet we ended up with a distinct name for it. I could understand if this were to give white folks’ skin tones their own category, similar to the reasons for the RBY art class model, but we don’t generally refer to white skin as “orange”. So where did this color come from?

Sometimes I imagine a parallel universe where we have common names for other tertiary colors. How much richer would the blue/green side of the color wheel be if “chartreuse” or “azure” were basic color terms? Can you even imagine treating those as distinct colors, not just variants of green or blue? That’s exactly how we treat orange, even though it’s just a variant of red.

I can’t speak to whether our vocabulary truly influences how we perceive or think (and that often-cited BBC report seems to have no real source). But for what it’s worth, I’ve been trying to think of “azure” as distinct for a few years now, and I’ve had a much easier time dealing with blues in art and design. Giving the cyan end of blue a distinct and common name has given me an anchor, something to arrange thoughts around.

Come to think of it, yellow is an interesting case as well. A decent chunk of the spectrum was ultimately called “yellow” in the xkcd map; here’s that chunk zoomed in a bit.

full range of xkcd yellows

How much of this range would you really call yellow, rather than green (or chartreuse!) or orange? Yellow is a remarkably specific color: mixing it even slightly with one of its neighbors loses some of its yellowness, and darkening it moves it swiftly towards brown.

I wonder why this is. When we see a yellowish-orange, are we inclined to think of it as orange because it looks like orange under yellow sunlight? Is it because yellow is between red and green, and the red and green receptors in the human eye pick up on colors that are very close together?

Most human languages develop their color terms in a similar order, with a split between blue and green often coming relatively late in a language’s development. Of particular interest to me is that orange and pink are listed as a common step towards the end — I’m really curious as to whether that happens universally and independently, or it’s just influence from Western color terms.

I’d love to see a list of the basic color terms in various languages, but such a thing is proving elusive. There’s a neat map of how many colors exist in various languages, but it doesn’t mention what the colors are. It’s easy enough to find a list of colors in various languages, like this one, but I have no idea whether they’re basic in each language. Note also that this chart only has columns for English’s eleven basic colors, even though Russian and several other languages have a twelfth basic term for azure. The page even mentions this, but doesn’t include a column for it, which seems ludicrous in an “omniglot” table.

The only language I know many color words in is Japanese, so I went delving into some of its color history. It turns out to be a fascinating example, because you can see how the color names developed right in the spelling of the words.

See, Japanese has a couple different types of words that function like adjectives. Many of the most common ones end in -i, like kawaii, and can be used like verbs — we would translate kawaii as “cute”, but it can function just as well as “to be cute”. I’m under the impression that -i adjectives trace back to Old Japanese, and new ones aren’t created any more.

That’s really interesting, because to my knowledge, only five Japanese color names are in this form: kuroi (black), shiroi (white), akai (red), aoi (blue), and kiiroi (yellow). So these are, necessarily, the first colors the language could describe. If you compare to the chart showing progression of color terms, this is the bottom cell in column IV: white, red, yellow, green/blue, and black.

A great many color names are compounds with iro, “color” — for example, chairo (brown) is cha (tea) + iro. Of the five basic terms above, kiiroi is almost of that form, but unusually still has the -i suffix. (You might think that shiroi contains iro, but shi is a single character distinct from i. kiiroi is actually written with the kanji for iro.) It’s possible, then, that yellow was the latest of these five words — and that would give Old Japanese words for white, red/yellow, green/blue, and black, matching the most common progression.

Skipping ahead some centuries, I was surprised to learn that midori, the word for green, was only promoted to a basic color fairly recently. It’s existed for a long time and originally referred to “greenery”, but it was considered to be a shade of blue (ao) until the Allied occupation after World War II, when teaching guidelines started to mention a blue/green distinction. (I would love to read more details about this, if you have any; the West’s coming in and adding a new color is a fascinating phenomenon, and I wonder what other substantial changes were made to education.)

Japanese still has a number of compound words that use ao (blue!) to mean what we would consider green: aoshingou is a green traffic light, aoao means “lush” in a natural sense, aonisai is a greenhorn (presumably from the color of unripe fruit), aojiru is a drink made from leafy vegetables, and so on.

This brings us to at least six basic colors, the fairly universal ones: black, white, red, yellow, blue, and green. What others does Japanese have?

From here, it’s a little harder to tell. I’m not exactly fluent and definitely not a native speaker, and resources aimed at native English speakers are more likely to list colors familiar to English speakers. (I mean, until this week, I never knew just how common it was for aoi to mean green, even though midori as a basic color is only about as old as my parents.)

I do know two curious standouts: pinku (pink) and orenji (orange), both English loanwords. I can’t be sure that they’re truly basic color terms, but they sure do come up a lot. The thing is, Japanese already has names for these colors: momoiro (the color of peach — flowers, not the fruit!) and daidaiiro (the color of, um, an orange). Why adopt loanwords for concepts that already exist?

I strongly suspect, but cannot remotely qualify, that pink and orange weren’t basic colors until Western culture introduced the idea that they could be — and so the language adopted the idea and the words simultaneously. (A similar thing happened with grey, natively haiiro and borrowed as guree, but in my limited experience even the loanword doesn’t seem to be very common.)

Based on the shape of the words and my own unqualified guesses of what counts as “basic”, the progression of basic colors in Japanese seems to be:

  1. black, white, red (+ yellow), blue (+ green) — Old Japanese
  2. yellow — later Old Japanese
  3. brown — sometime in the past millennium
  4. green — after WWII
  5. pink, orange — last few decades?

And in an effort to put a teeny bit more actual research into this, I searched the Leeds Japanese word frequency list (drawn from websites, so modern Japanese) for some color words. Here’s the rank of each. Word frequency is generally such that the actual frequency of a word is inversely proportional to its rank — so a word in rank 100 is twice as common as a word in rank 200. The five -i colors are split into both noun and adjective forms, so I’ve included an adjusted rank that you would see if they were counted as a single word, using ab / (a + b).

  • white: 1010 ≈ 1959 (as a noun) + 2083 (as an adjective)
  • red: 1198 ≈ 2101 (n) + 2790 (adj)
  • black: 1253 ≈ 2017 (n) + 3313 (adj)
  • blue: 1619 ≈ 2846 (n) + 3757 (adj)
  • green: 2710
  • yellow: 3316 ≈ 6088 (n) + 7284 (adj)
  • orange: 4732 (orenji), n/a (daidaiiro)
  • pink: 4887 (pinku), n/a (momoiro)
  • purple: 6502 (murasaki)
  • grey: 8472 (guree), 10848 (haiiro)
  • brown: 10622 (chairo)
  • gold: 12818 (kin’iro)
  • silver: n/a (gin’iro)
  • navy: n/a (kon)

n/a” doesn’t mean the word is never used, only that it wasn’t in the top 15,000.

I’m not sure where the cutoff is for “basic” color terms, but it’s interesting to see where the gaps lie. I’m especially surprised that yellow is so far down, and that purple (which I hadn’t even mentioned here) is as high as it is. Also, green is above yellow, despite having been a basic color for less than a century! Go, green.

For comparison, in American English:

  • black: 254
  • white: 302
  • red: 598
  • blue: 845
  • green: 893
  • yellow: 1675
  • brown: 1782
  • golden: 1835
  • gray: 1949
  • pink: 2512
  • orange: 3171
  • purple: 3931
  • silver: n/a
  • navy: n/a

Don’t read too much into the actual ranks; the languages and corpuses are both very different.

Color models

There are numerous ways to arrange and identify colors, much as there are numerous ways to identify points in 3D space. There are also benefits and drawbacks to each model, but I’m often most interested in how much sense the model makes to me as a squishy human.

RGB is the most familiar to anyone who does things with computers — it splits a color into its red, green, and blue channels, and measures the amount of each from “none” to “maximum”. (HTML sets this range as 0 to 255, but you could just as well call it 0 to 1, or -4 to 7600.)

RGB has a couple of interesting problems. Most notably, it’s kind of difficult to read and write by hand. You can sort of get used to how it works, though I’m still not particularly great at it. I keep in mind these rules:

  1. The largest channel is roughly how bright the color is.

    This follows pretty easily from the definition of RGB: it’s colored light added on top of black. The maximum amount of every color makes white, so less than the maximum must be darker, and of course none of any color stays black.

  2. The smallest channel is how pale (desaturated) the color is.

    Mixing equal amounts of red, green, and blue will produce grey. So if the smallest channel is green, you can imagine “splitting” the color between a grey (green, green, green), and the leftovers (red – green, 0, blue – green). Mixing grey with a color will of course make it paler — less saturated, closer to grey — so the bigger the smallest channel, the greyer the color.

  3. Whatever’s left over tells you the hue.

It might be time for an illustration. Consider the color (50%, 62.5%, 75%). The brightness is “capped” at 75%, the largest channel; the desaturation is 50%, the smallest channel. Here’s what that looks like.

illustration of the color (50%, 62.5%, 75%) split into three chunks of 50%, 25%, and 25%

Cutting out the grey and the darkness leaves a chunk in the middle of actual differences between the colors. Note that I’ve normalized it to (0%, 50%, 100%), which is the percentage of that small middle range. Removing the smallest and largest channels will always leave you with a middle chunk where at least one channel is 0% and at least one channel is 100%. (Or it’s grey, and there is no middle chunk.)

The odd one out is green at 50%, so the hue of this color is halfway between cyan (green + blue) and blue. That hue is… azure! So this color is a slightly darkened and fairly dull azure. (The actual amount of “greyness” is the smallest relative to the largest, so in this case it’s about ⅔ grey, or about ⅓ saturated.) Here’s that color.

a slightly darkened, fairly dull azure
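The three rules can be written down as a little function. This is my own illustration of the decomposition described above, not any standard library routine:

```python
def split_rgb(r, g, b):
    """Split an RGB color (channels 0-1) into the intuitive parts:
    the brightness cap (largest channel), the grey floor (smallest
    channel), and the normalized 'hue chunk' left in the middle."""
    mx, mn = max(r, g, b), min(r, g, b)
    if mx == mn:
        return mx, mn, None  # pure grey: no hue chunk at all
    hue_chunk = tuple((c - mn) / (mx - mn) for c in (r, g, b))
    return mx, mn, hue_chunk

# The azure example from the text: (50%, 62.5%, 75%)
brightness, greyness, chunk = split_rgb(0.50, 0.625, 0.75)
print(brightness)  # 0.75 — brightness capped at 75%
print(greyness)    # 0.5  — 50% desaturation
print(chunk)       # (0.0, 0.5, 1.0) — halfway between cyan and blue: azure
```

As promised, the middle chunk always has at least one channel at 0% and one at 100%; the odd one out tells you the hue.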

This is a bit of a pain to do in your head all the time, so why not do it directly?

HSV is what you get when you directly represent colors as hue, saturation, and value. It’s often depicted as a cylinder, with hue represented as an angle around the color wheel: 0° for red, 120° for green, and 240° for blue. Saturation ranges from grey to a fully-saturated color, and value ranges from black to, er, the color. The azure above is (210°, ⅓, ¾) in HSV — 210° is halfway between 180° (cyan) and 240° (blue), ⅓ is the saturation measurement mentioned before, and ¾ is the largest channel.

It’s that hand-waved value bit that gives me trouble. I don’t really know how to intuitively explain what value is, which makes it hard to modify value to make the changes I want. I feel like I should have a better grasp of this after a year and a half of drawing, but alas.

I prefer HSL, which uses hue, saturation, and lightness. Lightness ranges from black to white, with the unperturbed color in the middle. Here’s lightness versus value for the azure color. (Its lightness is ⅝, the average of the smallest and largest channels.)

comparison of lightness and value for the azure color

The lightness just makes more sense to me. I can understand shifting a color towards white or black, and the color in the middle of that bar feels related to the azure I started with. Value looks almost arbitrary; I don’t know where the color at the far end comes from, and it just doesn’t seem to have anything to do with the original azure.
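Python’s standard colorsys module implements both conversions, so the azure example is easy to check. One quirk to watch for: hue comes back as a fraction of a full turn (so 210° is 210/360), and rgb_to_hls returns its results in the order hue, lightness, saturation.

```python
import colorsys

r, g, b = 0.50, 0.625, 0.75  # the azure from the text

h, s, v = colorsys.rgb_to_hsv(r, g, b)
print(h * 360, s, v)   # ≈ 210.0, 0.333…, 0.75 — matches (210°, ⅓, ¾)

# Note the return order: (hue, lightness, saturation), not (h, s, l).
h, l, s = colorsys.rgb_to_hls(r, g, b)
print(l)               # 0.625 — the average of the smallest and largest channels
```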

I’d hoped Wikipedia could clarify this for me. It tells me value is the same thing as brightness, but the mathematical definition on that page matches the definition of intensity from the little-used HSI model. I looked up lightness instead, and the first sentence says it’s also known as value. So lightness is value is brightness is intensity, but also they’re all completely different.

Wikipedia also says that HSV is sometimes known as HSB (where the “B” is for “brightness”), but I swear I’ve only ever seen HSB used as a synonym for HSL. I don’t know anything any more.

Oh, and in case you weren’t confused enough, the definition of “saturation” is different in HSV and HSL. Good luck!

Wikipedia does have some very nice illustrations of HSV and HSL, though, including depictions of them as a cone and double cone.

(Incidentally, you can use HSL directly in CSS now — there are hsl() and hsla() CSS3 functions which evaluate as colors. Combining these with Sass’s scale-color() function makes it fairly easy to come up with decent colors by hand, without having to go back and forth with an image editor. And I can even sort of read them later!)

An annoying problem with all of these models is that the idea of “lightness” is never quite consistent. Even in HSL, a yellow will appear much brighter than a blue with the same saturation and lightness. You may even have noticed in the RGB split diagram that I used dark red and green text, but light blue — the pure blue is so dark that a darker blue on top is hard to read! Yet all three colors have the same lightness in HSL, and the same value in HSV.

Clearly neither of these definitions of lightness or brightness or whatever is really working. There’s a thing called luminance, which is a weighted sum of the red, green, and blue channels that puts green as a whopping ten times brighter than blue. It tends to reflect how bright colors actually appear.
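The weights make this concrete. Here’s a sketch using the Rec. 709 coefficients (the ones associated with sRGB-era video; other standards use slightly different weights, and strictly speaking they apply to linear RGB, not gamma-encoded values):

```python
def rec709_luminance(r, g, b):
    """Relative luminance of a linear RGB color, Rec. 709 weights."""
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

print(rec709_luminance(0, 1, 0))  # green: 0.7152
print(rec709_luminance(0, 0, 1))  # blue:  0.0722 — roughly a tenth of green
```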

Unfortunately, luminance and related values are only used in fairly obscure color models, like YUV and Lab. I don’t mean “obscure” in the sense that nobody uses them, but rather that they’re very specialized and not often seen outside their particular niches: YUV is very common in video encoding, and Lab is useful for serious photo editing.

Lab is pretty interesting, since it’s intended to resemble how human vision works. It’s designed around the opponent process theory, which states that humans see color in three pairs of opposites: black/white, red/green, and yellow/blue. The idea is that we perceive color as somewhere along these axes, so a redder color necessarily appears less green — put another way, while it’s possible to see “yellowish green”, there’s no such thing as a “yellowish blue”.

(I wonder if that explains our affection for orange: we effectively perceive yellow as a fourth distinct primary color.)

Lab runs with this idea, making its three channels be lightness (but not the HSL lightness!), a (green to red), and b (blue to yellow). The neutral points for a and b are at zero, with green/blue extending in the negative direction and red/yellow extending in the positive direction.

Lab can express a whole bunch of colors beyond RGB, meaning they can’t be shown on a monitor, or even represented in most image formats. And you now have four primary colors in opposing pairs. That all makes it pretty weird, and I’ve actually never used it myself, but I vaguely aspire to do so someday.

I think those are all of the major ones. There’s also XYZ, which I think is some kind of master color model. Of course there’s CMYK, which is used for printing, but it’s effectively just the inverse of RGB.

With that out of the way, now we can get to the hard part!


I called RGB a color model: a way to break colors into component parts.

Unfortunately, RGB alone can’t actually describe a color. You can tell me you have a color (0%, 50%, 100%), but what does that mean? 100% of what? What is “the most blue”? More importantly, how do you build a monitor that can display “the most blue” the same way as other monitors? Without some kind of absolute reference point, this is meaningless.

A color space is a color model plus enough information to map the model to absolute real-world colors. There are a lot of these. I’m looking at Krita’s list of built-in colorspaces and there are at least a hundred, most of them RGB.

I admit I’m bad at colorspaces and have basically done my best to not ever have to think about them, because they’re a big tangled mess and hard to reason about.

For example! The RGB colorspace that almost everything will assume you’re using by default is sRGB, specifically designed to be this kind of global default. Okay, great.

Now, sRGB has gamma built in. Gamma correction means slapping an exponent on color values to skew them towards or away from black. The color is assumed to be in the range 0–1, so any positive power will produce output from 0–1 as well. An exponent greater than 1 will skew towards black (because you’re multiplying a number less than 1 by itself), whereas an exponent less than 1 will skew away from black.

What this means is that halfway between black and white in sRGB isn’t (50%, 50%, 50%), but around (73%, 73%, 73%). Here’s a great example, borrowed from this post (with numbers out of 255):

alternating black and white lines alongside gray squares of 128 and 187

Which one looks more like the alternating bands of black and white lines? Surely the one you pick is the color that’s actually halfway between black and white.

And yet, in most software that displays or edits images, interpolating white and black will give you a 50% gray — much darker than the original looked. A quick test is to scale that image down by half and see whether the result looks closer to the top square or the bottom square. (Firefox, Chrome, and GIMP get it wrong; Krita gets it right.)

The right thing to do here is convert an image to a linear colorspace before modifying it, then convert it back for display. In a linear colorspace, halfway between white and black is still 50%, but it looks like the 73% grey. This is great fun: it involves a piecewise function and an exponent of 2.4.
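That piecewise function is the published sRGB transfer function; here’s a sketch of both directions:

```python
def srgb_to_linear(c):
    """Decode one sRGB channel (0-1) to linear light."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(c):
    """Encode one linear-light channel (0-1) back to sRGB."""
    if c <= 0.0031308:
        return 12.92 * c
    return 1.055 * c ** (1 / 2.4) - 0.055

# Halfway between black and white, in linear light, encodes to ~73% sRGB:
print(round(linear_to_srgb(0.5), 3))  # 0.735

# Conversely, a 50% sRGB grey carries only about 21% of white's light:
print(round(srgb_to_linear(0.5), 3))  # 0.214
```

This is exactly why naive 50% interpolation in sRGB comes out too dark: the “halfway” value is being computed in the wrong space.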

It’s really difficult to reason about this, for much the same reason that it’s hard to grasp text encoding problems in languages with only one string type. Ultimately you still have an RGB triplet at every stage, and it’s very easy to lose track of what kind of RGB that is. Then there’s the fact that most images don’t specify a colorspace in the first place, so you can’t be entirely sure whether it’s sRGB, linear sRGB, or something else entirely; monitors can have their own color profiles; you may or may not be using a program that respects an embedded color profile; and so on. How can you ever tell what you’re actually looking at and whether it’s correct? I can barely keep track of what I mean by “50% grey”.

And then… what about transparency? Should a 50% transparent white atop solid black look like 50% grey, or 73% grey? Krita seems to leave it to the colorspace: sRGB gives the former, but linear sRGB gives the latter. Does this mean I should paint in a linear colorspace? I don’t know! (Maybe I’ll give it a try and see what happens.)

Something I genuinely can’t answer is what effect this has on HSV and HSL, which are defined in terms of RGB. Is there such a thing as linear HSL? Does anyone ever talk about this? Would it make lightness more sensible?

There is a good reason for this, at least: the human eye is better at distinguishing dark colors than light ones. I was surprised to learn that, but of course, it’s been hidden from me by sRGB, which is deliberately skewed to dedicate more space to darker colors. In a linear colorspace, a gradient from white to black would have a lot of indistinguishable light colors, but appear to have severe banding among the darks.

several different black to white gradients

All three of these are regular black-to-white gradients drawn in 8-bit color (i.e., channels range from 0 to 255). The top one is the naïve result if you draw such a gradient in sRGB: the midpoint is the too-dark 50% grey. The middle one is that same gradient, but drawn in a linear colorspace. Obviously, a lot of dark colors are “missing”, in the sense that we could see them but there’s no way to express them in linear color. The bottom gradient makes this more clear: it’s a gradient of all the greys expressible in linear sRGB.

This is the first time I’ve ever delved so deeply into exactly how sRGB works, and I admit it’s kind of blowing my mind a bit. Straightforward linear color is so much lighter, and this huge bias gives us a lot more to work with. Also, 73% being the midpoint certainly explains a few things about my problems with understanding brightness of colors.

There are other RGB colorspaces, of course, and I suppose they all make for an equivalent CMYK colorspace. YUV and Lab are families of colorspaces, though I think most people talking about Lab specifically mean CIELAB (or “L*a*b*”), and there aren’t really any competitors. HSL and HSV are defined in terms of RGB, and image data is rarely stored directly as either, so there aren’t really HSL or HSV colorspaces.

I think that exhausts all the things I know.

Real world color is also a lie

Just in case you thought these problems were somehow unique to computers. Surprise! Modelling color is hard because color is hard.

I’m sure you’ve seen the checker shadow illusion, possibly one of the most effective optical illusions, where the presence of a shadow makes a gray square look radically different than a nearby square of the same color.

Our eyes are very good at stripping away ambient light effects to tell what color something “really” is. Have you ever been outside in bright summer weather for a while, then come inside and found everything starkly blue? That’s lingering compensation for the yellow sunlight, which shifts everything slightly yellow; the opposite of yellow is blue.

Or, here, I like this. I’m sure there are more drastic examples floating around, but this is the best I could come up with. Here are some Pikachu I found via GIS.

photo of Pikachu plushes on a shelf

My question for you is: what color is Pikachu?

Would you believe… orange?

photo of Pikachu plushes on a shelf, overlaid with color swatches; the Pikachu in the background are orange

In each box, the bottom color is what I color-dropped, and the top color is the same hue with 100% saturation and 50% lightness. It’s the same spot, on the same plush, right next to each other — but the one in the background is orange, not yellow. At best, it’s brown.

What we see as “yellow in shadow” and interpret to be “yellow, but darker” turns out to be another color entirely. (The grey whistles are, likewise, slightly blue.)

Did you know that mirrors are green? You can see it in a mirror tunnel: the image gets slightly greener as it goes through the mirror over and over.

Distant mountains and other objects, of course, look bluer.

This all makes painting rather complicated, since it’s not actually about painting things the color that they “are”, but painting them in such a way that a human viewer will interpret them appropriately.

I, er, don’t know enough to really get very deep here. I really should, seeing as I keep trying to paint things, but I don’t have a great handle on it yet. I’ll have to defer to Mel’s color tutorial. (warning: big)

Blending modes

You know, those things in Photoshop.

I’ve always found these remarkably unintuitive. Most of them have names that don’t remotely describe what they do, the math doesn’t necessarily translate to useful understanding, and they’re incredibly poorly-documented. So I went hunting for some precise definitions, even if I had to read GIMP’s or Krita’s source code.

In the following, A is a starting image, and B is something being drawn on top with the given blending mode. (In the case of layers, B is the layer with the mode, and A is everything underneath.) Generally, the same operation is done on each of the RGB channels independently. Everything is scaled to 0–1, and results are generally clamped to that range.

I believe all of these treat layer alpha the same way: linear interpolation between A and the combination of A and B. If B has alpha t, and the blending mode is a function f, then the result is t × f(A, B) + (1 - t) × A.

If A and B themselves have alpha, the result is a little more complicated, and probably not that interesting. It tends to work how you’d expect. (If you’re really curious, look at the definition of BLEND() in GIMP’s developer docs.)
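That interpolation is easy to write down directly. A minimal sketch, using Multiply as the mode function f:

```python
def blend_with_alpha(a, b, t, f):
    """Composite B (with alpha t) over A using blend mode f, per channel:
    t * f(A, B) + (1 - t) * A."""
    return t * f(a, b) + (1 - t) * a

multiply = lambda a, b: a * b

# Fully opaque: plain Multiply.  Half transparent: halfway back toward A.
print(blend_with_alpha(0.8, 0.5, 1.0, multiply))  # 0.4
print(blend_with_alpha(0.8, 0.5, 0.5, multiply))  # 0.6 — halfway between 0.4 and 0.8
```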

  • Normal: B. No blending is done; new pixels replace old pixels.

  • Multiply: A × B. As the name suggests, the channels are multiplied together. This is very common in digital painting for slapping on a basic shadow or tinting a whole image.

    I think the name has always thrown me off just a bit because “Multiply” sounds like it should make things bigger and thus brighter — but because we’re dealing with values from 0 to 1, Multiply can only ever make colors darker.

    Multiplying with black produces black. Multiplying with white leaves the other color unchanged. Multiplying with a gray is equivalent to blending with black. Multiplying a color with itself squares the color, which is similar to applying gamma correction.

    Multiply is commutative — if you swap A and B, you get the same result.

  • Screen: 1 - (1 - A)(1 - B). This is sort of an inverse of Multiply; it multiplies darkness rather than lightness. It’s defined as inverting both colors, multiplying, and inverting the result. Accordingly, Screen can only make colors lighter, and is also commutative. All the properties of Multiply apply to Screen, just inverted.

  • Hard Light: Equivalent to Multiply if B is dark (i.e., less than 0.5), or Screen if B is light. There’s an additional factor of 2 included to compensate for how the range of B is split in half: Hard Light with B = 0.4 is equivalent to Multiply with B = 0.8, since 0.4 is 0.8 of the way to 0.5. Right.

    This seems like a possibly useful way to apply basic highlights and shadows with a single layer? I may give it a try.

    The math is commutative, but since B is checked and A is not, Hard Light is itself not commutative.

  • Soft Light: Like Hard Light, but softer. No, really. There are several different versions of this, and they’re all a bit of a mess, not very helpful for understanding what’s going on.

    If you graphed the effect various values of B had on a color, you’d have a straight line from 0 up to 1 (at B = 0.5), and then it would abruptly change to a straight line back down to 0. Soft Light just seeks to get rid of that crease. Here’s Hard Light compared with GIMP’s Soft Light, where A is a black to white gradient from bottom to top, and B is a black to white gradient from left to right.

    graphs of combinations of all grays with Hard Light versus Soft Light

    You can clearly see the crease in the middle of Hard Light, where B = 0.5 and it transitions from Multiply to Screen.

  • Overlay: Equivalent to either Hard Light or Soft Light, depending on who you ask. In GIMP, it’s Soft Light; in Krita, it’s Hard Light except the check is done on A rather than B. Given the ambiguity, I think I’d rather just stick with Hard Light or Soft Light explicitly.

  • Difference: abs(A - B). Does what it says on the tin. I don’t know why you would use this? Difference with black causes no change; Difference with white inverts the colors. Commutative.

  • Addition and Subtract: A + B and A - B. I didn’t think much of these until I discovered that Krita has a built-in brush that uses Addition mode. It’s essentially just a soft spraypaint brush, but because it uses Addition, painting over the same area with a dark color will gradually turn the center white, while the fainter edges remain dark. The result is a fiery glow effect, which is pretty cool. I used it manually as a layer mode for a similar effect, to make a field of sparkles. I don’t know if there are more general applications.

    Addition is commutative, of course, but Subtract is not.

  • Divide: A ÷ B. Apparently this is the same as changing the white point to 1 - B. Accordingly, the result will blow out towards white very quickly as B gets darker.

  • Dodge and Burn: A ÷ (1 - B) and 1 - (1 - A) ÷ B. Inverses in the same way as Multiply and Screen. Similar to Divide, but with B inverted — so Dodge changes the white point to B, with similar caveats as Divide. I’ve never seen either of these effects not look horrendously gaudy, but I think photographers manage to use them, somehow.

  • Darken Only and Lighten Only: min(A, B) and max(A, B). Commutative.

  • Linear Light: (2 × B + A) - 1. I think this is the same as Sai’s “Lumi and Shade” mode, which is very popular, at least in this house. It works very well for simple lighting effects, and shares the Soft/Hard Light property that darker colors darken and lighter colors lighten, but I don’t have a great grasp of it yet and don’t know quite how to explain what it does. So I made another graph:

    graph of Linear Light, with a diagonal band of shading going from upper left to bottom right

    Super weird! Half the graph is solid black or white; you have to stay in that sweet zone in the middle to get reasonable results.

    This is actually a combination of two other modes, Linear Dodge and Linear Burn, combined in much the same way as Hard Light. I’ve never encountered them used on their own, though.

  • Hue, Saturation, Value: These work as you might expect: each converts A to HSV and replaces its hue, saturation, or value with B’s.

  • Color: Uses HSL, unlike the above three. Combines B’s hue and saturation with A’s lightness.

  • Grain Extract and Grain Merge: A - B + 0.5 and A + B - 0.5. These are clearly related to film grain, somehow, but their exact use eludes me.

    I did find this example post where someone combines a photo with a blurred copy using Grain Extract and Grain Merge. Grain Extract picked out areas of sharp contrast, and Grain Merge emphasized them, which seems relevant enough to film grain. I might give these a try sometime.

Those are all the modes in GIMP (except Dissolve, which isn’t a real blend mode; also, GIMP doesn’t have Linear Light). Photoshop has a handful more. Krita has a preposterous number of other modes, no, really, it is absolutely ridiculous, you cannot even imagine.
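For concreteness, here’s a minimal sketch of the arithmetic modes above — my own illustration, not any particular program’s implementation. Channel values are floats in 0.0–1.0, A is the base layer, and B is the layer being applied:

```python
def clamp(x):
    """Clamp a channel value back into the displayable 0-1 range."""
    return max(0.0, min(1.0, x))

def difference(a, b):    return abs(a - b)            # commutative
def addition(a, b):      return clamp(a + b)          # commutative
def subtract(a, b):      return clamp(a - b)
def divide(a, b):        return clamp(a / b) if b > 0 else 1.0
def dodge(a, b):         return clamp(a / (1 - b)) if b < 1 else 1.0
def burn(a, b):          return clamp(1 - (1 - a) / b) if b > 0 else 0.0
def darken_only(a, b):   return min(a, b)             # commutative
def lighten_only(a, b):  return max(a, b)             # commutative
def grain_extract(a, b): return clamp(a - b + 0.5)
def grain_merge(a, b):   return clamp(a + b - 0.5)

# Difference with black is a no-op; with white it inverts:
assert difference(0.25, 0.0) == 0.25
assert difference(0.25, 1.0) == 0.75
```

One nice property this makes visible: Grain Extract and Grain Merge undo each other around the 0.5 midpoint, so merging B back into an extract of B recovers the original A (up to clamping).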

I may be out of things

There’s plenty more to say about color, both technically and design-wise — contrast and harmony, color blindness, relativity, dithering, etc. I don’t know if I can say any of it with any particular confidence, though, so perhaps it’s best I stop here.

I hope some of this was instructive, or at least interesting!

‘Tor and Bitcoin Hinder Anti-Piracy Efforts’

Post Syndicated from Ernesto original https://torrentfreak.com/tor-and-bitcoin-hinder-anti-piracy-efforts-160715/

To avoid enforcement efforts, pirate sites often go to extremes to hide themselves from rightsholders and authorities.

Increasingly, this also means that they use various encryption technologies to increase their resilience and anonymity.

Several of these techniques are highlighted in a new report published by the European Union Intellectual Property Office (EUIPO).

The report gives a broad overview of the business models that are used to illegally exploit intellectual property. This includes websites dedicated to counterfeit goods, but also online piracy hubs such as torrent sites and file-hosting platforms.

EUIPO hopes that mapping out these business models will help to counter the ongoing threat they face.

“The study will provide enhanced understanding to policymakers, civil society and private businesses. At the same time, it will help to identify and better understand the range of responses necessary to tackle the challenge of large scale online IPR infringements,” EUIPO notes.

According to the research, several infringing business models rely on encryption-based technologies. The Tor network and Bitcoin, for example, are repeatedly mentioned as part of this “shadow landscape”.

“It more and more relies on new encrypted technologies like the TOR browser and the Bitcoin virtual currency, which are employed by infringers of IPR to generate income and hide the proceeds of crime from the authorities,” the report reads.

According to the report, Bitcoin’s threat is that the transactions can’t be easily traced to a person or company. This is problematic, since copyright enforcement efforts are often based on a follow-the-money approach.

“There are no public records connecting Bitcoin wallet IDs with personal information of individuals. Because of these Bitcoin transactions are considered semi-anonymous,” EUIPO writes.

Similarly, sites and services that operate on the darknet, such as the Tor network, are harder to take down. Their domain names can’t be seized, for example, and darknet sites are not subject to ISP blockades.

“Through the use of TOR, a user’s Internet traffic is encrypted and routed in specific ways to achieve security and anonymity,” the report notes.

While the report doesn’t list any names, it describes various popular torrent, streaming and file-hosting sites. In one specific case, it mentions an e-book portal that operates exclusively on the darknet, generating revenue from Bitcoin donations.

Most traditional pirate sites still operate on the ‘open’ Internet. However, several sites now accept Bitcoin donations, and The Pirate Bay and KickassTorrents both have dedicated darknet addresses as well.

EUIPO is clearly worried about these developments, but the group doesn’t advocate a ban of encryption-based services as they also have legitimate purposes.

However, it signals that these and other trends should be followed with interest, as they make it harder to tackle various forms of counterfeiting and piracy online.

As part of the efforts to cut back various forms of copyright infringement, EUIPO also announced a new partnership with Europol this week. The organizations launched the Intellectual Property Crime Coordinated Coalition which aims to strengthen the fight against counterfeiting and piracy.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Nintendo Cracks Down on Pokémon Go Piracy

Post Syndicated from Ernesto original https://torrentfreak.com/nintendo-cracks-pokemon-go-piracy-160714/

The Pokémon Go game is taking the world by storm, despite the fact that it’s not yet officially released in most countries.

The game came out in Australia, New Zealand, and the United States last week, and over the past few days Germany and the UK joined in.

However, that doesn’t mean people elsewhere can’t play it yet.

As the craze spread, so did the various pirated copies, which have been downloaded millions of times already. The Internet is littered with unauthorized Pokémon Go files and guides explaining how to install the game on various platforms.

To give an indication of how massive Pokémon Go piracy is, research from Similarweb revealed that as of yesterday 6.8% of all Android devices in Canada and the Netherlands had the game installed.

In fact, it’s safe to say that unauthorized copies are more popular than the official ones, for the time being.

The APK files for Android are shared widely on torrent sites. At The Pirate Bay, for example, it’s the most shared Android game by far. Even more impressive, it also sent millions of extra daily visitors to APKmirror.com, which hosts copies of the game as well.

Most pirated Android games


Nintendo is obviously not happy with this black market distribution. Although it doesn’t seem to hurt its stock value, the company is targeting the piracy issue behind the scenes.

TorrentFreak spotted several takedown requests on behalf of Nintendo that were sent to Google Blogspot and Google Search this week. The notices list various links to pirated copies of the game, asking Google to remove them.

One of the takedown notices


Thus far the efforts have done little to stop the distribution. The files are still widely shared on torrent sites and various direct download services. The copies on APKmirror.com remain online as well.

In fact, it’s virtually impossible to stop a game that’s gone viral from being shared online. Even if it issues thousands of takedown requests, Nintendo won’t be able to catch ’em all.

Nintendo probably has good reasons to roll Pokémon Go out gradually, but the best anti-piracy strategy is obviously to make the game available worldwide as quickly as possible.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Moving the Utilization Needle with Hadoop Overcommit

Post Syndicated from mikesefanov original https://yahooeng.tumblr.com/post/147408435396


By Nathan Roberts and Jason Lowe

Hadoop was developed at Yahoo more than 10 years ago and its usage continues to demonstrate significant growth year-in and year-out. Because Hadoop is an Apache open source project, this growth is not limited to Yahoo: hundreds, if not thousands, of companies are turning to Hadoop to power their big data analytics. Below is a graph of the gigabyte-hours consumed per day on Yahoo Hadoop clusters (using 1 GB of RAM for 1 hour = 1 GB-hour). As you can see in the graph, the demand for compute resources on our clusters shows no signs of slowing down.


The resource scheduler within Hadoop (a.k.a. YARN – Yet Another Resource Negotiator) makes it easy to support new big-data applications within our Hadoop clusters, and this flexibility helps fuel the sustained growth. We’ve fully embraced YARN by supporting several new application domains like Hive_on_Tez, Pig_on_Tez, Spark, and various machine learning applications. As we find new use cases for the existing frameworks and continue adding even more frameworks, the demand for big-data compute resources will continue to grow for the foreseeable future.

20,000,000 GB-hours in this chart equate to millions of dollars per year worth of compute. Due to the size of this investment, we are constantly exploring ways to make the most efficient use of our compute hardware as possible. Over the past couple of years, there have been several achievements in this area:

  • We have developed tools that allow application owners to measure and optimize the resource utilization of their applications. As an example, one such tool compares allocated container sizes vs. the size of the MapReduce tasks running within the containers.
  • Hive and Pig have migrated to the Tez DAG execution framework which is significantly more efficient than MapReduce for these workloads.
  • Several performance improvements have been made to increase the scale at which we can run Hadoop. Central daemons such as the Namenode and Resourcemanager can throttle large clusters if not operating efficiently.

And most recently,

  • We’ve introduced a unique Dynamic Overcommit feature which improves the coordination between the YARN scheduler and the actual hardware utilization within the cluster. This additional coordination allows YARN to take advantage of the fact that containers don’t always make use of the resource they’ve reserved. At the 9th Annual Hadoop Summit last month, which Yahoo co-hosted, this was the focus of one of Yahoo’s keynote addresses. We also devoted time to the subject in a breakout session. Videos of the talks can be seen here and here, respectively.

The concept of Dynamic Overcommit is simple: Employ techniques to take advantage of reserved, but unused resources in the cluster.

The graph below illustrates the opportunities that Dynamic Overcommit offers. The shaded portions illustrate times when the cluster is fully reserved while CPU and Memory utilizations are well below 100%. This is the first-order opportunity that Dynamic Overcommit immediately addresses, i.e. improve system utilization when YARN is fully utilized. Perhaps more importantly though, once the first opportunity is addressed, it will mean we can run the cluster with less overall headroom (because we’re making full-use of the available physical resource). Less headroom means an overall increase in utilization, and this is where the big wins are. Let’s say we can increase average CPU utilization from 40% to 60% – that’s a 50% increase in the amount of actual work the cluster is getting done!


How Does Dynamic Overcommit Work?

Every container running in a YARN cluster declares the amount of Memory and CPU it will need to perform its job. YARN monitors all containers to make sure they do not exceed their declaration. In the case of memory, this is a non-negotiable contract – if a container exceeds its memory allocation, it is immediately terminated.  Obviously, applications will avoid this situation; therefore, it’s essentially guaranteed that all containers will have some amount of “padding” built into their configuration. Furthermore, most containers don’t use all their resource all of the time, leading to even more unused resource. When you add up all this unused resource, it turns out to be a significant opportunity. Dynamic Overcommit takes advantage of these unused resources by safely and effectively over-booking the nodes in the cluster.

The challenge with overbooking resources is this simple question: “What happens when everyone actually wants all the resources they requested?” With the Dynamic Overcommit feature, we take a very simple, yet highly-effective approach – we overcommit nodes based on their current utilization ratio and then in the event a node runs out of a resource, we preempt the most-recently launched containers until the situation improves. One of the advantages of Hadoop is that it is designed from the ground up to deal with failures. If an individual component fails, the application frameworks know how to re-run the work somewhere else, with only minimal impact to the application. Other than having to re-run pre-empted containers, there should be no other impact to applications.
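As a rough illustration of the policy described above, the two halves of the scheme (advertising extra capacity when utilization is low, and preempting the most recently launched containers when a node runs out of a real resource) might be sketched like this. The class names, thresholds, and overcommit factor are invented for illustration; this is not Yahoo’s actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Container:
    launch_time: int
    reserved_gb: int   # what the container declared to YARN
    used_gb: int       # what it actually uses; the gap is the opportunity

@dataclass
class Node:
    physical_gb: int
    containers: list = field(default_factory=list)

    def used_gb(self):
        return sum(c.used_gb for c in self.containers)

    def advertised_gb(self, max_overcommit=1.25):
        # Advertise more memory than physically present while current
        # utilization is low, up to a configured ceiling.
        if self.used_gb() / self.physical_gb < 0.8:
            return int(self.physical_gb * max_overcommit)
        return self.physical_gb

    def preempt_if_needed(self):
        # If real usage exceeds physical capacity, terminate the most
        # recently launched containers until the node is healthy again;
        # the application frameworks re-run that work elsewhere.
        victims = []
        for c in sorted(self.containers, key=lambda c: c.launch_time, reverse=True):
            if self.used_gb() <= self.physical_gb:
                break
            self.containers.remove(c)
            victims.append(c)
        return victims
```

With these made-up numbers, a 128 GB node at low utilization advertises 160 GB, matching the adjustment shown in the diagram.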

The following diagram illustrates the basic design of Dynamic Overcommit. Notice how the node resource was adjusted from 128GB to 160GB; this is overcommit in action.



We have enabled Dynamic Overcommit in several of our Hadoop clusters, but one large research cluster in particular is where we perform most of our measurements and tuning. This section describes results from this research cluster. The shaded regions in the following graph illustrate times when this cluster is highly-utilized from a YARN perspective. Prior to overcommit being enabled, CPU and Memory utilization would have been in the 40-50% range. With overcommit, these metrics are in the 50-80% range.


But what about work lost to too much overcommit? The following graph illustrates GB-hours gained due to overcommit vs. GB-hours lost due to preemption. For the time period shown, we gained 3.3 million GB-hours at the price of 502 GB-hours lost to preemption. This demonstrates that, with the current configuration, it’s very rare for us to have to preempt containers.


What’s next for Dynamic Overcommit?

  • Sync with the Apache community on how best to get this capability into Apache. Our implementation has been very successful and is serving us well within Yahoo. Now we look forward to working with the Hadoop community to help finalize similar capabilities in Apache Hadoop. For reference, related Apache jira are listed below:

Security Effectiveness of the Israeli West Bank Barrier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2016/07/security_effect.html

Interesting analysis:

Abstract: Objectives — Informed by situational crime prevention (SCP) this study evaluates the effectiveness of the “West Bank Barrier” that the Israeli government began to construct in 2002 in order to prevent suicide bombing attacks.

Methods — Drawing on crime wave models of past SCP research, the study uses a time series of terrorist attacks and fatalities and their location in respect to the Barrier, which was constructed in different sections over different periods of time, between 1999 and 2011.

Results — The Barrier together with associated security activities was effective in preventing suicide bombings and other attacks and fatalities with little if any apparent displacement. Changes in terrorist behavior likely resulted from the construction of the Barrier, not from other external factors or events.

Conclusions — In some locations, terrorists adapted to changed circumstances by committing more opportunistic attacks that require less planning. Fatalities and attacks were also reduced on the Palestinian side of the Barrier, producing an expected “diffusion of benefits” though the amount of reduction was considerably more than in past SCP studies. The defensive roles of the Barrier and offensive opportunities it presents, are identified as possible explanations. The study highlights the importance of SCP in crime and counter-terrorism policy.

Unfortunately, the whole paper is behind a paywall.

Note: This is not a political analysis of the net positive and negative effects of the wall, just a security analysis. Of course any full analysis needs to take the geopolitics into account. The comment section is not the place for this broader discussion.

ALPHA vs. The Pro – Judgement Day

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/alpha-vs-pro-judgement-day/

Firstly, let’s set the mood. I need you to watch this video.

Go on. Stop what you’re doing and press play. I can wait…

Star Wars: The Force Awakens Trailer Top Gun


Done? How good was that, right? RIGHT?! Mmmhmm, I knew you’d like it.

Now, onto ALPHA…

I’ll set the scene.

Imagine it’s the mid-eighties. Your name is Dr. Miles Dyson and you’ve just invented the neural-net processor. You see your invention as a massive success, a gift to humanity, a major stepping stone across the treacherous waters toward world peace.

… and then Sarah Connor shoots you.


That’s Cyberdyne. This is Psibernetix. My bad. I’ll start again.

University of Cincinnati doctoral graduate Nick Ernest may not have built the neural-net processor (thankfully), but he’s definitely created something on that level. Ernest and his team at Psibernetix have created ALPHA, an AI set to be the ultimate wingman of the sky(net)… which runs on a Raspberry Pi.

Exciting, yes? Let me explain…

ALPHA is an artificial intelligence with the capability to out-manoeuvre even the most seasoned fighter pilot pro… and to prove this, ALPHA was introduced to retired U.S. Air Force pilot Col. Gene Lee in a head-to-head dogfight simulation.

When pitted against Col. Gene Lee, who now works as an instructor and Air Battle Manager for the U.S. Air Force, ALPHA repeatedly shot down the pro, never allowing Lee to get a single shot in.

“I was surprised at how aware and reactive it was. It seemed to be aware of my intentions, and reacting instantly to my changes in flight and missile deployment. It knew how to defeat the shot I was taking. It moved instantly between defensive and offensive actions as needed.”

Before ALPHA, pilots training with simulated missions against AIs would often be able to ‘trick’ the system, understanding the limitations of the technology involved to win over their virtual opponents. However, with ALPHA this was simply not the case, instead leaving Lee exhausted and thoroughly defeated by the simulations.

“I go home feeling washed out. I’m tired, drained and mentally exhausted. This may be artificial intelligence, but it represents a real challenge.”

Prior to their work alongside Col. Gene Lee, ALPHA was set up against the current AI resources used for training manned and unmanned teams as part of the Air Force research programme. Much like its sessions with Lee, ALPHA outperformed the existing programmes, repeatedly beating the AIs in various situations.

ALPHA vs. Gene Lee

Nick Ernest, David Carroll and Gene Lee vs ALPHA

In the long term, ALPHA looks set to continue to advance in the field, with additional development options, such as aerodynamic and sensor models, in the works. The aim is for ALPHA to work as an AI wingman for existing pilots. With current pilots hitting speeds of 1,500 miles per hour at altitudes of thousands of feet, ALPHA can provide response times that beat their human counterparts by miles; this would allow Unmanned Combat Aerial Vehicles (UCAVs) to defend pilots against hostile attack in the skies, while learning from enemy action.

This ability to run ALPHA on such a low-budget PC makes the possibilities for using the AI in the field all the more achievable. As confirmed by Ernest himself (we emailed him to check), the AI and its algorithms can react to the simulated flight’s events, and eventually real-life situations, with ease, using the processing power of a $35 computer.

And that, ladies and gentlemen, is incredible.

tom cruise top gun

This blog post was brought to you by the 1980s*. You’re most welcome.

*Yes, we know Terminator 2 was released in 1991. Give us some slack.

The post ALPHA vs. The Pro – Judgement Day appeared first on Raspberry Pi.

Google’s Post-Quantum Cryptography

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2016/07/googles_post-qu.html

News has been bubbling about an announcement by Google that it’s starting to experiment with public-key cryptography that’s resistant to cryptanalysis by a quantum computer. Specifically, it’s experimenting with the New Hope algorithm.

It’s certainly interesting that Google is thinking about this, and probably okay that it’s available in the Canary version of Chrome, but this algorithm is by no means ready for operational use. Secure public-key algorithms are very hard to create, and this one has not had nearly enough analysis to be trusted. Lattice-based public-key cryptosystems such as New Hope are particularly subtle — and we cryptographers are still learning a lot about how they can be broken.

Targets are important in cryptography, and Google has turned New Hope into a good one. Consider this an opportunity to advance our cryptographic knowledge, not an offer of a more-secure encryption option. And this is the right time for this area of research, before quantum computers make discrete-logarithm and factoring algorithms obsolete.

Report on the Vulnerabilities Equities Process

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2016/07/report_on_the_v.html

I have written before on the vulnerabilities equities process (VEP): the system by which the US government decides whether to disclose and fix a computer vulnerability or keep it secret and use it offensively. Ari Schwartz and Bob Knake, both former Directors for Cybersecurity Policy at the White House National Security Council, have written a report describing the process as we know it, with policy recommendations for improving it.

Basically, their recommendations are focused on improving the transparency, oversight, and accountability (three things I repeatedly recommend) of the process. In summary:

  • The President should issue an Executive Order mandating government-wide compliance with the VEP.
  • Make the general criteria used to decide whether or not to disclose a vulnerability public.
  • Clearly define the VEP.
  • Make sure any undisclosed vulnerabilities are reviewed periodically.
  • Ensure that the government has the right to disclose any vulnerabilities it purchases.
  • Transfer oversight of the VEP from the NSA to the DHS.
  • Issue an annual report on the VEP.
  • Expand Congressional oversight of the VEP.
  • Mandate oversight by other independent bodies inside the Executive Branch.
  • Expand funding for both offensive and defensive vulnerability research.

These all seem like good ideas to me. This is a complex issue, one I wrote about in Data and Goliath (pages 146-50), and one that’s only going to get more important in the Internet of Things.

News article.

Weekly roundup: short reprieve

Post Syndicated from Eevee original https://eev.ee/dev/2016/07/10/weekly-roundup-short-reprieve/

I was feeling pretty run down at the end of June. I think I wore myself out a little bit. DUMP 2, then Under Construction, then DUMP 3 (which I missed), and all the while fretting over Under Construction.

I took this past SGDQ week “off” and spent it mostly doodling. I’m a bit better now! I’m a post behind for June, a third of the way into July; don’t worry, I’ll catch up.

July doesn’t have a theme. I’ve got some stuff to do, and I’ll do it.

  • art: The 30-minute daily Pokémon continue, though not quite so “daily” for a bit there. I also made a quick birthday gift for a friend, spent a preposterous amount of time painting a hypothetical evolution, and drew an Extyrannomon for Extyrannomon. Plus a lot of doodling.

    Oh, and I put together an art more good chart for the first half of this year.

  • zdoom: My experiment with embedding Lua is a little cleaner — you can now embed a Lua script in a map and call it from a linedef (switch, etc.), making it slightly more of a real proof-of-concept. I also did some research into how to serialize the entire state of the interpreter, for the sake of quicksaves.

  • gamedev: I did a little more work on rainblob, the tiny PICO-8 platformer I started a month or so ago. It now supports multiple “rooms” and has a couple simple intro puzzles. I also wrote about 20% of a Tetris clone using pentominoes while watching SGDQ’s Tetris runs.

  • veekun: I gathered all of Bulbapedia’s Sugimori art to replace the rather low-res and incomplete collection veekun has at the moment. Not up yet, though. I looked into the current state of extracting skeletal animations, yet again, and did not find any traces of success. Alas.

Back to work this week, then!