Website blocking has become one of the favorite anti-piracy tools of the entertainment industries in recent years.
The UK is a leader on this front, with the High Court ordering local ISPs to block access to many popular file-sharing sites.
Over time the number of blocked URLs has expanded to well over 1,000, with popular torrent, streaming, and direct download sites being the main targets.
While research has shown that this approach is somewhat effective, there are plenty of options through which people can circumvent the blockades, including many reverse proxies.
Similarly, pirate sites can simply switch to a new domain name to evade the court orders, and new sites are allowed to flourish in the shadow of those that are no longer available.
This week we decided to take a look at the current pirate site landscape in the UK, with some surprising results.
As it turns out, the list of the ten most-used pirate sites in the UK includes several sites that are on the ISPs’ blocklists. In some cases the sites remain accessible on their original domain names, via the HTTPS URL.
As we’ve highlighted before, not all ISPs are able to block HTTPS traffic, which allows their subscribers to load The Pirate Bay and other blocked sites just fine.
There are also websites that intentionally help visitors to circumvent the blocks by registering new domain names. Unblocked.vip, for example, has cycled through various domain names in order to remain available.
And then there are the newcomers. 123movies.to deserves a mention here as it’s currently the most-used pirate site in the UK. With an Alexa rank of 81, it’s even one of the 100 most-visited sites in the country.
Below we’ve made an overview of the ten most-used pirate sites in the UK. Several of these are on the blocklist, with a current or previous URL. This suggests that the blocking efforts are not as effective as rightsholders would like them to be.
The conclusion is also in line with research from Italy, which suggested that site-blocking can actually be counterproductive. Similarly, a UK report revealed that it significantly boosts traffic to non-blocked websites.
While the entertainment industries still see enough value in website blocking, it’s clear that it’s not the silver bullet that will defeat piracy. And at a rate of £14,000 per site, it comes at a high cost.
The label “pirate site” applies to sites that have been classified as such by entertainment industry groups. It’s worth noting that at the time of writing, several of the sites (*) had already started redirecting to new domain names. Putlocker.is is currently down.
This column is from The MagPi issue 50. You can download a PDF of the full issue for free, or subscribe to receive the print edition in your mailbox or the digital edition on your tablet. All proceeds from the print and digital editions help the Raspberry Pi Foundation achieve its charitable goals.
Last month, the Raspberry Pi Foundation hit a major milestone by selling its ten millionth computer. Besides taking the opportunity to celebrate – and celebrate we did – it’s also a good time to reflect on the impact that the device has had over the last four and a half years. As you may know already, we don’t just make an ultra-affordable computer. Our mission is to put the power of digital making into the hands of people all over the world; the Raspberry Pi computer helps us do that.
There are many ways in which the Raspberry Pi has a positive impact on the world. It’s used in classrooms, libraries, hackspaces, research laboratories, and within the industrial environment. People of all ages use Raspberry Pi, in these contexts and others, to learn about computing and to create things with computers that we never could have imagined.
But I believe the biggest impact we’ve had was to encourage more people to experiment with computers once again. It used to be that in order to use a computer, you had to have fairly good knowledge of how it worked, and often you needed to know how to program it. Since then, computers have become much more mainstream and consumer-friendly. On the one hand, that change has had an incredible impact on our society, giving more people access to the power of computing and the internet. However, there was a trade-off. In order to make computers easier to use, they also became less ‘tinker-friendly’.
When I was a kid in the 1980s, our family had an old IBM PC in our basement that had been decommissioned from my father’s workplace. On that computer, I learned how to use the DOS prompt to work with files, I created my own menu system out of batch files, and most importantly, I learned my first ever programming language: BASIC.
I feel very lucky that I had access to that computer. That kind of early exposure had such a huge impact on my life. For years I continued to learn programming, both in school and in my own time. Even though I’ve benefited greatly from the mainstream, consumer-friendly technology that has since become available, I still use and build upon the skills that I learned as a kid on that IBM PC. Programming languages and hardware have changed a lot, but the fundamental concepts of computing have remained mostly the same.
The Next Generation
I expect that the Raspberry Pi has a very similar impact on young people today. For them, it fills the void that was left when computers became less like programmable machines and more like consumer products. I suspect that, just like with me, this impact will linger for years to come as these young people grow up and enter a workforce that’s increasingly dependent on their digital skills. And if even just a tiny bit of interest in computing is the spark, then I believe that a tinker-friendly computer like Raspberry Pi is the kindling.
Here’s where that ten million number comes into play. Admittedly, not everyone who is exposed to a Raspberry Pi will be affected by it. But even if you guess conservatively that only a small fraction of all the Raspberry Pis out in the world serve to inspire a young person, it still adds up to an incredible impact on many lives; not just right now, but for many years to come. It’s quite possible that many of tomorrow’s computer scientists and technology specialists are experimenting with a few of the first ten million Raspberry Pis right now.
As part of our ongoing plan to expand the AWS footprint, I am happy to announce that our new US East (Ohio) Region is now available. In conjunction with the existing US East (Northern Virginia) Region, AWS customers in the Eastern part of the United States have fast, low-latency access to the suite of AWS infrastructure services.
The Region supports all sizes of C4, D2, I2, M4, R3, T2, and X1 instances. As is the case with all of our newer Regions, instances must be launched within a Virtual Private Cloud (read Virtual Private Clouds for Everyone to learn more).
Well Connected
Here are some round-trip network metrics that you may find interesting (all names are airport codes, as is apparently customary in the networking world; all times are +/- 2 ms):
12 ms to IAD (home of the US East (Northern Virginia) Region).
Also on the networking front, we have agreed to work together with Ohio State University to provide AWS Direct Connect access to OARnet. This 100-gigabit network connects colleges, schools, medical research hospitals, and state government across Ohio. This connection provides local teachers, students, and researchers with a dedicated, high-speed network connection to AWS.
14 Regions, 38 Availability Zones, and Counting
Today’s launch of this 3-AZ Region expands our global footprint to a grand total of 14 Regions and 38 Availability Zones. We are also getting ready to open up a second AWS Region in China, along with other new AWS Regions in Canada, France, and the UK.
Since there’s been some industry-wide confusion about the difference between Regions and Availability Zones of late, I think it is important to understand the differences between these two terms. Each Region is a physical location where we have one or more Availability Zones or AZs. Each Availability Zone, in turn, consists of one or more data centers, each with redundant power, networking, and connectivity, all housed in separate facilities. Having two or more AZs in each Region gives you the ability to run applications that are more highly available, fault tolerant, and durable than would be the case if you were limited to a single AZ.
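As a toy illustration of why multiple AZs matter, the sketch below spreads replicas of an application across a Region’s AZs so that losing a single AZ never takes down every copy. The AZ names are illustrative (real AZ-to-name mappings vary per account), not an authoritative listing:

```python
# Toy model of the Region / Availability Zone hierarchy described above.
# A Region contains one or more AZs; each AZ is an independent failure domain.
REGIONS = {
    "us-east-1": ["us-east-1a", "us-east-1b", "us-east-1c"],  # N. Virginia
    "us-east-2": ["us-east-2a", "us-east-2b", "us-east-2c"],  # Ohio (3 AZs at launch)
}

def spread_replicas(region, count):
    """Assign `count` replicas round-robin across a Region's AZs so that
    the loss of any single AZ never removes every replica at once."""
    azs = REGIONS[region]
    return [azs[i % len(azs)] for i in range(count)]
```

With three AZs, four replicas land in three distinct zones, which is exactly the fault-tolerance property a single-AZ deployment cannot offer.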
Around the office, we sometimes play with analogies that can serve to explain the difference between the two terms. My favorites are “Hotels vs. hotel rooms” and “Apple trees vs. apples.” So, pick your analogy, but be sure that you know what it means!
Virtual reality (VR) 360° videos are the next frontier of how we engage with and consume content. Unlike a traditional scenario in which a person views a screen in front of them, VR places the user inside an immersive experience. A viewer is “in” the story, and not on the sidelines as an observer.
Ivan Sutherland, widely regarded as the father of computer graphics, laid out the vision for virtual reality in his famous 1965 speech, “The Ultimate Display.” In it, he said, “You shouldn’t think of a computer screen as a way to display information, but rather as a window into a virtual world that could eventually look real, sound real, move real, interact real, and feel real.”
Over the years, significant advancements have been made to bring reality closer to that vision. With the advent of headgear capable of rendering 3D spatial audio and video, realistic sound and visuals can be virtually reproduced, delivering immersive experiences to consumers.
When it comes to entertainment and sports, streaming in VR has become the new 4K HEVC/UHD of 2016. This has been accelerated by the release of new camera capture hardware like GoPro and streaming capabilities such as 360° video streaming from Facebook and YouTube. Yahoo streams lots of engaging sports, finance, news, and entertainment video content to tens of millions of users. The chance to produce and stream such content in 360° VR presents a unique opportunity for Yahoo to offer new types of engagement, and to bring users a sense of depth and visceral presence.
While this experience is not yet live in our products, it is an area we are actively exploring. In this blog post, we take a look at what’s involved in building an end-to-end VR streaming workflow for both Live and Video on Demand (VOD). Our experiments and research go from camera rig setup, to video stitching, to encoding, to the eventual rendering of videos in video players on desktops and VR headsets. We also discuss challenges yet to be solved and the opportunities they present in streaming VR.
1. The Workflow
Yahoo’s video platform has a workflow that is used internally to enable streaming to an audience of tens of millions with the click of a few buttons. During experimentation, we enhanced this same proven platform and set of APIs to build a complete 360°/VR experience. The diagram below shows the end-to-end workflow for streaming 360°/VR that we built on Yahoo’s video platform.
Figure 1: VR Streaming Workflow at Yahoo
1.1. Capturing 360° video
In order to capture a virtual reality video, you need access to a 360°-capable video camera. Such a camera uses either fish-eye lenses or has an array of wide-angle lenses to collectively cover a 360 (θ) by 180 (ϕ) sphere as shown below.
Though it sounds simple, there is a real challenge in capturing a scene in 3D 360° as most of the 360° video cameras offer only 2D 360° video capture.
In initial experiments, we tried capturing 3D video using pairs of side-by-side cameras for the left and right eyes, arranged in a spherical shape. However, this required too many cameras; instead, we use view interpolation in the stitching step to create virtual cameras.
Another important consideration with 360° video is the number of axes the camera is capturing video with. In traditional 360° video that is captured using only a single axis (what we refer to as horizontal video), a user can turn their head from left to right. But this camera setup does not support a user tilting their head at 90°.
To achieve true 3D in our setup, we went with 6-12 GoPro cameras with a 120° field of view (FOV) arranged in a ring, plus an additional camera each on top and bottom, with each one outputting 2.7K at 30 FPS.
1.2. Stitching 360° video
Because a 360° view is a spherical video, the surface of this sphere needs to be projected onto a planar surface in 2D so that video encoders can process it. There are two popular layouts:
Equirectangular layout: This is the most widely-used format in computer graphics to represent spherical surfaces in a rectangular form with an aspect ratio of 2:1. This format has redundant information at the poles which means some pixels are over-represented, introducing distortions at the poles compared to the equator (as can be seen in the equirectangular mapping of the sphere below).
Figure 2: Equirectangular Layout 
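The equirectangular mapping itself is simple to sketch. The snippet below is a simplified illustration (the frame dimensions are assumed for the example, not Yahoo’s actual values) that maps a viewing direction to pixel coordinates in a 2:1 frame; note how, near the poles, every longitude collapses toward the same pixel row, which is exactly the redundancy described above:

```python
def equirect_pixel(theta, phi, width=3840, height=1920):
    """Map a viewing direction to equirectangular pixel coordinates.

    theta: longitude in degrees, in [-180, 180)
    phi:   latitude in degrees, in [-90, 90] (90 = north pole)
    The 2:1 frame dimensions are illustrative. Near phi = +/-90, many
    (theta, phi) directions land on nearly the same row of pixels,
    which is the over-representation at the poles noted in the text.
    """
    x = (theta + 180.0) / 360.0 * width
    y = (90.0 - phi) / 180.0 * height
    return int(x) % width, min(int(y), height - 1)
```

For example, looking straight ahead at the equator (0°, 0°) lands at the center of the frame, while the north pole maps to the top row regardless of longitude.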
CubeMap layout: CubeMap layout is a format that has also been used in computer graphics. It contains six individual 2D textures that map to six sides of a cube. The figure below is a typical cubemap representation. In a cubemap layout, the sphere is projected onto six faces and the images are folded out into a 2D image, so pieces of a video frame map to different parts of a cube, which leads to extremely efficient compact packing. Cubemap layouts require about 25% fewer pixels compared to equirectangular layouts.
Figure 3: CubeMap Layout 
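The face-selection step of a cubemap lookup can be sketched in a few lines: the face a view direction lands on is simply the axis with the largest absolute component. This is a minimal illustration of the projection idea, not the actual filter used in the pipeline:

```python
def cube_face(x, y, z):
    """Pick which of the six cubemap faces a 3D view direction (x, y, z)
    lands on: the face corresponds to the axis with the largest absolute
    component, signed by that component's direction."""
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        return "+x" if x > 0 else "-x"
    if ay >= az:
        return "+y" if y > 0 else "-y"
    return "+z" if z > 0 else "-z"
```

Once the face is known, the remaining two components are divided by the dominant one to find the 2D texture coordinate within that face.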
In our setup, we experimented with two stitching software packages. One was Vahana VR, and the other was a modified version of the open-source Surround360 technology that works with a GoPro rig. Both output equirectangular panoramas for the left and the right eye. Here are the steps involved in stitching together a 360° image:
Raw frame image processing: Converts uncompressed raw video data to RGB. This involves several steps, starting with black-level adjustment, then applying demosaicing algorithms to figure out the RGB color components of each pixel based on the surrounding pixels. It also involves gamma correction, color correction, and anti-vignetting (undoing the reduction in brightness at the image periphery). Finally, this stage applies sharpening and noise-reduction algorithms to enhance the image and suppress noise.
Calibration: During the calibration step, the stitching software takes steps to avoid vertical parallax while stitching overlapping portions from adjacent cameras in the rig. The purpose is to align everything in the scene, so that both eyes see every point at the same vertical coordinate. This step essentially matches the key points in images among adjacent camera pairs, using computer vision algorithms for feature detection such as Binary Robust Invariant Scalable Keypoints (BRISK) and AKAZE.
Optical Flow: During stitching, optical flow is used to create virtual cameras that cover the gaps between adjacent real cameras and provide an interpolated view. The optical flow algorithm finds the pattern of apparent motion of image objects between two consecutive frames caused by movement of the object or camera. We use OpenCV algorithms to compute the optical flow.
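As a rough illustration of what flow estimation does, here is a toy brute-force block matcher in plain Python. The production pipeline uses OpenCV’s dense optical flow algorithms; this sketch only recovers a single global shift between two tiny grayscale frames:

```python
def estimate_shift(prev, curr, max_shift=3):
    """Toy motion estimation: find the (dx, dy) displacement that best
    aligns two small grayscale frames (lists of lists of intensities),
    by minimizing the mean squared difference over overlapping pixels.
    Dense optical flow (e.g. OpenCV) computes a per-pixel version of
    this idea; here one vector describes the whole frame."""
    h, w = len(prev), len(prev[0])
    best, best_err = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err = n = 0
            for y in range(h):
                for x in range(w):
                    sy, sx = y + dy, x + dx
                    if 0 <= sy < h and 0 <= sx < w:
                        err += (prev[y][x] - curr[sy][sx]) ** 2
                        n += 1
            err /= n
            if err < best_err:
                best_err, best = err, (dx, dy)
    return best
```

Feeding it a gradient image and a copy shifted one pixel to the right recovers the (1, 0) motion vector; in the stitching pipeline, per-pixel vectors like this are what let the software synthesize views from virtual camera positions between the real ones.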
Below are the frames produced by the GoPro camera rig:
Figure 4: Individual frames from 12-camera rig
Figure 5: Stitched frame output with PtGui
Figure 6: Stitched frame with barrel distortion using Surround360
Figure 7: Stitched frame after removing barrel distortion using Surround360
To get the full depth in stereo, the rig is set up so that i = r * sin(FOV/2 – 360/n), where:
i = IPD/2, where IPD is the inter-pupillary distance between the eyes.
r = radius of the rig.
FOV = field of view of the GoPro cameras, 120 degrees.
n = number of cameras, which is 12 in our setup.
Given that the IPD is normally 6.4 cm, i should be greater than 3.2 cm. This implies that with a 12-camera setup, the radius of the rig comes to 14 cm. Usually, with more cameras it is easier to avoid black stripes.
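The rig geometry above can be checked numerically. This small sketch plugs the stated values into the formula:

```python
import math

def half_ipd(r_cm, fov_deg=120.0, n=12):
    """i = r * sin(FOV/2 - 360/n), with angles in degrees: the half
    inter-pupillary distance the ring geometry reproduces for a rig
    of radius r_cm."""
    angle = math.radians(fov_deg / 2.0 - 360.0 / n)
    return r_cm * math.sin(angle)

# With the values from the text (r = 14 cm, 120-degree FOV, 12 cameras):
# angle = 60 - 30 = 30 degrees, so i = 14 * sin(30 deg) = 7.0 cm,
# comfortably above the required 3.2 cm (= IPD/2).
```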
For a truly immersive experience, users expect 4K (3840 x 2160) resolution at 60 frames per second (FPS) or higher. Given that typical HMDs have a FOV of 120 degrees, a full 360° video needs a resolution of at least 12K (11520 x 6480). 4K streaming needs a bandwidth of 25 Mbps. So for 12K resolution, this effectively translates to > 75 Mbps, and even more for higher framerates. However, the average Wi-Fi connection in the US has a bandwidth of 15 Mbps.
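The 75 Mbps figure follows from scaling the 4K reference bitrate by the jump in horizontal resolution. A sketch of that back-of-the-envelope arithmetic (a simplification: real codec bitrates do not scale perfectly linearly with resolution):

```python
def bandwidth_estimate_mbps(target_width, ref_width=3840, ref_mbps=25.0):
    """Back-of-the-envelope estimate: take 4K (3840 pixels wide) at
    25 Mbps as the reference and scale with horizontal resolution.
    Scaling with total pixel count instead would give an even larger
    number, so 75 Mbps for 12K is best read as a lower bound."""
    return ref_mbps * target_width / ref_width
```

For a 12K (11520-wide) frame this yields 75 Mbps, five times the stated 15 Mbps average US Wi-Fi bandwidth.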
One way to address the bandwidth issue is by reducing the resolution of areas that are out of the field of view. Spatial sub-sampling is used during transcoding to produce multiple viewport-specific streams. Each viewport-specific stream has high resolution in a given viewport and low resolution in the rest of the sphere.
On the player side, we can modify traditional adaptive streaming logic to take the field of view into account. Depending on the video, if the user moves their head around a lot, it could result in multiple buffer fetches and rebuffering. Ideally, this works best in videos where the excessive motion happens in one field of view at a time and does not span multiple fields of view simultaneously. This work is still in an experimental stage.
The default output format of both the Surround360 and Vahana VR stitching software is equirectangular. To reduce the size further, we pass it through a cubemap filter transform integrated into ffmpeg for an additional pixel reduction of ~25%.
At the end of the above steps, the stitching pipeline produces high-resolution stereo 3D panoramas, which are then ingested into the existing Yahoo video transcoding pipeline to produce multi-bitrate HLS streams.
1.3. Adding a stitching step to the encoding pipeline
Live – To prepare for multi-bitrate streaming over the Internet, a live 360° video-stitched stream in RTMP is ingested into Yahoo’s video platform. A live Elemental encoder was used to re-encode and package the live input into multiple bitrates for adaptive streaming on any device (iOS, Android, browser, Windows, Mac, etc.).
Video on Demand – The existing Yahoo video transcoding pipeline was used to package multi-bitrate HLS streams from raw equirectangular MP4 source videos.
1.4. Rendering 360° video into the player
The spherical video stream is delivered to the Yahoo player in multiple bitrates. As a user changes their viewing angle, different portions of the frame are shown, presenting a 360° immersive experience. There are two types of VR players currently supported at Yahoo:
VR Display Capabilities: Attributes indicating position support, orientation support, and whether an external display is present.
VR Layer: Contains the HTML5 canvas element which is presented by the VR Display when its submitFrame method is called. It also contains attributes defining the left-bound and right-bound textures within the source canvas for presenting to each eye.
VREyeParameters: Holds the information required to correctly render a scene for a given eye. For each eye, it has an offset: the distance from the midpoint of the user’s eyes to the center of that eye, which is half of the interpupillary distance (IPD). In addition, it maintains the current FOV of the eye, and the recommended renderWidth and renderHeight of each eye’s viewport.
getVRDisplays: Returns a list of VRDisplay HMDs accessible to the browser.
For web devices that support only monoscopic rendering, like desktop browsers without an HMD, the player creates a single PerspectiveCamera object specifying the FOV and aspect ratio. As the device’s requestAnimationFrame callback fires, it renders new frames. As part of rendering a frame, it first calculates the projection matrix for the FOV and sets the X (user’s right), Y (up), and Z (behind the user) coordinates of the camera position.
For devices that support stereoscopic rendering, like mobile phones in a Samsung Gear VR, the webvr player creates two PerspectiveCamera objects, one for the left eye and one for the right eye. Each PerspectiveCamera queries the VR device capabilities to get eye parameters like the FOV, renderWidth, and renderHeight every time a frame needs to be rendered, at the native refresh rate of the HMD. The key difference between stereoscopic and monoscopic rendering is the perceived sense of depth that the user experiences, as the video frames, separated by an offset, are rendered by separate canvas elements to each individual eye.
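The projection-matrix step can be sketched language-agnostically. The Python below builds the standard row-major perspective matrix from the same FOV, aspect, near, and far parameters a PerspectiveCamera takes; it is a generic textbook formulation, not the player’s actual code:

```python
import math

def perspective_matrix(fov_deg, aspect, near=0.1, far=1000.0):
    """Row-major 4x4 perspective projection from a vertical field of
    view (degrees) and aspect ratio, the same inputs a PerspectiveCamera
    constructor takes. f is the focal scale: 1 / tan(fov / 2)."""
    f = 1.0 / math.tan(math.radians(fov_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]
```

A wider FOV shrinks f, so more of the spherical video fits into the viewport; each eye’s camera simply uses its own FOV and viewport dimensions.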
Cardboard VR – Google provides a VR SDK for both iOS and Android. This simplifies common VR tasks like lens-distortion correction, spatial audio, head tracking, and stereoscopic side-by-side rendering. For iOS, we integrated Cardboard VR functionality into our Yahoo Video SDK, so that users can watch stereoscopic 3D videos on iOS using Google Cardboard.
With all the pieces in place, and experimentation done, we were able to successfully do a 360° live streaming of an internal company-wide event.
Figure 8: 360° Live streaming of Yahoo internal event
In addition to demonstrating our live streaming capabilities, we are also experimenting with showing 360° VOD videos produced with a GoPro-based camera rig. Here is a screenshot of one of the 360° videos being played in the Yahoo player.
Figure 9: Yahoo Studios produced 360° VOD content in the Yahoo Player
3. Challenges and Opportunities
3.1. Enormous amounts of data
As we alluded to in the video processing section of this post, delivering 4K resolution videos for each eye for each FOV at a high frame-rate remains a challenge. While FOV-adaptive streaming does reduce the size by providing high resolution streams separately for each FOV, providing an impeccable 60 FPS or more viewing experience still requires a lot more data than the current internet pipes can handle. Some of the other possible options which we are closely paying attention to are:
Compression efficiency with HEVC and VP9 – New codecs like HEVC and VP9 have the potential to provide significant compression gains. Open-source HEVC codecs like x265 have shown a 40% compression performance gain compared to the currently ubiquitous H.264/AVC codec. Likewise, Google’s VP9 codec has shown similar 40% compression performance gains. The key challenges are hardware decoding support and browser support. But with Apple and Microsoft very much behind HEVC, and Firefox and Chrome already supporting VP9, we believe most browsers will support HEVC or VP9 within a year.
Using 10 bit color depth vs 8 bit color depth – Traditional monitors support 8 bpc (bits per channel) for displaying images. Given each pixel has 3 channels (RGB), 8 bpc maps to 256x256x256 color/luminosity combinations to represent 16 million colors. With 10 bit color depth, you have the potential to represent even more colors. But the biggest stated advantage of using 10 bit color depth is with respect to compression during encoding even if the source only uses 8 bits per channel. Both x264 and x265 codecs support 10 bit color depth, with ffmpeg already supporting encoding at 10 bit color depth.
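The color-combination arithmetic above is easy to verify:

```python
def color_combinations(bits_per_channel, channels=3):
    """Number of representable color/luminosity combinations for an
    RGB pixel with the given bit depth per channel."""
    return (2 ** bits_per_channel) ** channels

# 8 bpc: 256 * 256 * 256 = 16,777,216 -- the "16 million colors" above.
# 10 bpc: 1024^3, i.e. 64x as many combinations.
```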
3.2. Six degrees of freedom
With current camera rig workflows, users viewing the streams through an HMD can achieve three degrees of freedom (DoF), i.e., the ability to look up/down, turn clockwise/anti-clockwise, and swivel. But you still can’t get a different perspective when you move inside the scene, i.e., move forward/backward. Until now, this true six-DoF immersive VR experience has only been possible in CG VR games. In video streaming, light-field-technology-based video cameras produced by Lytro are the first to capture light field volume data from all directions. But light-field-based videos require an order of magnitude more data than traditional fixed-FOV, fixed-IPD, fixed-lens camera rigs like GoPro. As bandwidth problems are resolved via better compression and better networks, achieving true immersion should be possible.
VR streaming is an emerging medium and with the addition of 360° VR playback capability, Yahoo’s video platform provides us a great starting point to explore the opportunities in video with regard to virtual reality. As we continue to work to delight our users by showing immersive video content, we remain focused on optimizing the rendering of high-quality 4K content in our players. We’re looking at building FOV-based adaptive streaming capabilities and better compression during delivery. These capabilities, and the enhancement of our webvr player to play on more HMDs like HTC Vive and Oculus Rift, will set us on track to offer streaming capabilities across the entire spectrum. At the same time, we are keeping a close watch on advancements in supporting spatial audio experiences, as well as advancements in the ability to stream volumetric lightfield videos to achieve true six degrees of freedom, with the aim of realizing the true potential of VR.
Glossary – VR concepts:
VR – Virtual reality, commonly referred to as VR, is an immersive computer-simulated reality experience that places viewers inside an experience. It “transports” viewers from their physical reality into a closed virtual reality. VR usually requires a headset device that takes care of sights and sounds, while the most-involved experiences can include external motion tracking, and sensory inputs like touch and smell. For example, when you put on VR headgear you suddenly start feeling immersed in the sounds and sights of another universe, like the deck of the Star Trek Enterprise. Though you remain physically at your place, VR technology is designed to manipulate your senses in a manner that makes you truly feel as if you are on that ship, moving through the virtual environment and interacting with the crew.
360 degree video – A 360° video is created with a camera system that simultaneously records all 360 degrees of a scene. It is a flat equirectangular video projection that is morphed into a sphere for playback on a VR headset. A standard world map is an example of equirectangular projection, which maps the surface of the world (sphere) onto orthogonal coordinates.
Spatial Audio – Spatial audio gives the creator the ability to place sound around the user. Unlike traditional mono/stereo/surround audio, it responds to head rotation in sync with video. While listening to spatial audio content, the user receives a real-time binaural rendering of an audio stream.
FOV – A human can naturally see about 170 degrees of viewable area (field of view). Most consumer-grade head-mounted displays (HMDs), like the Oculus Rift and HTC Vive, currently display 90 to 120 degrees.
Monoscopic video – In a monoscopic video, both eyes see a single flat image or video file. A common camera setup involves six cameras filming six different fields of view. Stitching software is then used to form a single equirectangular video. The maximum output resolution for 2D monoscopic videos on Gear VR is 3840×1920 at 30 frames per second.
Presence – Presence is a kind of immersion where the low-level systems of the brain are tricked to such an extent that they react just as they would to non-virtual stimuli.
Latency – The time between when you move your head and when you see physical updates on the screen. Acceptable latency ranges from about 11 ms (for games) to 20 ms (for watching 360° VR videos).
Head Tracking – There are two forms:
Positional tracking – movements and related translations of your body, e.g. swaying side to side.
Traditional head tracking – left, right, up, down, and roll, like a clock’s rotation.
This is a guest post from Troy Washburn, Sr. DevOps Manager @ Rent-A-Center, Inc., and Ashay Chitnis, Flux7 architect.
Rent-A-Center in their own words: Rent-A-Center owns and operates more than 3,000 rent-to-own retail stores for name-brand furniture, electronics, appliances and computers across the US, Canada, and Puerto Rico.
Rent-A-Center (RAC) wanted to roll out an ecommerce platform that would support the entire online shopping workflow using SAP’s Hybris platform. The goal was to implement a cloud-based solution with a cluster of Hybris servers which would cater to online web-based demand.
The challenge: to run the Hybris clusters in a microservices architecture. A microservices approach has several advantages including the ability for each service to scale up and down to meet fluctuating changes in demand independently. RAC also wanted to use Docker containers to package the application in a format that is easily portable and immutable. There were four types of containers necessary for the architecture. Each corresponded to a particular service:
1. Apache: Received requests from the external Elastic Load Balancing load balancer. Apache was used to set certain rewrite and proxy HTTP rules.
2. Hybris: An external Tomcat was the frontend for the Hybris platform.
3. Solr Master: A product indexing service for quick lookup.
4. Solr Slave: Replication of the master cache to directly serve product searches from Hybris.
To deploy the containers in a microservices architecture, RAC and AWS consultants at Flux7 started by launching Amazon ECS resources with AWS CloudFormation templates. Running containers on ECS requires the use of three primary resources: clusters, services, and task definitions. Each container refers to its task definition for the container properties, such as CPU and memory. And, each of the above services stored its container images in Amazon ECR repositories.
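To make the task-definition piece concrete, here is a sketch of the kind of container properties such a task definition carries. The family name, port, and resource sizes are illustrative, not RAC’s actual values:

```python
def hybris_task_definition(image_uri, cpu=1024, memory=2048):
    """Sketch of an ECS task definition for one of the services above.
    Each container type (Apache, Hybris, Solr master/slave) would get a
    definition like this, pointing at its image in an Amazon ECR
    repository. All names and sizes here are illustrative."""
    return {
        "family": "hybris",
        "containerDefinitions": [
            {
                "name": "hybris",
                "image": image_uri,  # an Amazon ECR repository URI
                "cpu": cpu,          # CPU units reserved for the container
                "memory": memory,    # hard memory limit in MiB
                "essential": True,
                "portMappings": [{"containerPort": 8080, "protocol": "tcp"}],
            }
        ],
    }
```

Registering it would be a single `boto3.client("ecs").register_task_definition(**task_def)` call; the cluster’s services then reference the registered family.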
This post describes the architecture that we created and implemented.
At first glance, scaling on ECS can seem confusing. But the Flux7 philosophy is that complex systems only work when they are a combination of well-designed simple systems that break the problem down into smaller pieces. The key insight that helped us design our solution was understanding that there are two very different scaling operations happening. The first is the scaling up of individual tasks in each service and the second is the scaling up of the cluster of Amazon EC2 instances.
During implementation, Service Auto Scaling was released by the AWS team and so we researched how to implement task scaling into the existing solution. As we were implementing the solution through AWS CloudFormation, task scaling needed to be done the same way. However, the new scaling feature was not available for implementation through CloudFormation and so the natural course was to implement it using AWS Lambda–backed custom resources.
A corresponding Lambda function is implemented in Node.js 4.3, and automatic scaling happens by monitoring the CPUUtilization Amazon CloudWatch metric. The ECS policies below are registered with CloudWatch alarms that are triggered when specific thresholds are crossed. Similarly, by using the MemoryUtilization CloudWatch metric, ECS scaling can be made to scale in and out as well.
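As an illustration, the parameters for such a scale-out alarm might look like the following. The threshold and resource names are illustrative; the namespace and dimensions are the standard ones for the per-service ECS metrics:

```python
def cpu_scale_out_alarm(cluster, service, threshold=75.0):
    """Sketch of CloudWatch alarm parameters for the ECS CPUUtilization
    metric: fire when average CPU stays above `threshold` percent for
    three consecutive one-minute periods. The cluster/service names and
    threshold are illustrative, not RAC's actual configuration."""
    return {
        "AlarmName": f"{cluster}-{service}-cpu-high",
        "Namespace": "AWS/ECS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "Statistic": "Average",
        "Period": 60,
        "EvaluationPeriods": 3,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }
```

In practice these parameters would be passed to CloudWatch’s put_metric_alarm, with an AlarmActions entry pointing at the scaling policy to invoke.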
Scaling ECS services and EC2 instances automatically
The key to understanding cluster scaling is to start by understanding the problem. We are no longer running a homogeneous workload in a simple environment. We have a cluster hosting a heterogeneous workload with different requirements and different demands on the system.
This clicked for us after we phrased the problem as, “Make sure the cluster has enough capacity to launch ‘x’ more instances of a task.” This led us to realize that we were no longer looking at an overall average resource utilization problem, but rather a discrete bin packing problem.
The problem is inherently more complex. (Anyone remember from algorithms class how the discrete knapsack problem is NP-hard, while the continuous knapsack problem can be solved in polynomial time? Same thing.) So we have to check, for each individual instance, whether a particular task can be scheduled on it, and if the cluster can’t fit the required number of tasks, we need to allocate more instance capacity.
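The bin-packing check can be sketched as a first-fit pass over per-instance free capacity. This toy version (not Flux7’s actual Lambda code) also shows why averaging cluster utilization is not enough:

```python
def can_place(tasks_needed, task_cpu, task_mem, instances):
    """First-fit check: can `tasks_needed` more copies of a task (with
    the given CPU and memory reservations) be scheduled on the cluster?
    `instances` is a list of (free_cpu, free_mem) per EC2 instance.
    Crucially, each task must fit entirely on some single instance;
    adequate *average* free capacity across the cluster is not enough."""
    free = [list(inst) for inst in instances]
    placed = 0
    for _ in range(tasks_needed):
        for slot in free:
            if slot[0] >= task_cpu and slot[1] >= task_mem:
                slot[0] -= task_cpu
                slot[1] -= task_mem
                placed += 1
                break
    return placed == tasks_needed
```

For example, two instances with 300 free CPU units each have 600 in total, yet a task reserving 400 units fits on neither; that is the discrete-vs-continuous distinction in action, and the signal to add instance capacity.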
To ensure that ECS scaling always has enough resources to scale out and has just enough resources after scaling in, it was necessary that the Auto Scaling group scales according to three criteria:
1. ECS task count in relation to the host EC2 instance count in a cluster
2. Memory reservation
3. CPU reservation
We implemented the first criterion for the Auto Scaling group. Instead of using the default scaling abilities, we set up scaling in and out with Lambda functions that were triggered periodically by a combination of AWS::Lambda::Permission and AWS::Events::Rule resources, as we wanted specific criteria for scaling.
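The first criterion can be illustrated with a small decision function of the kind a periodically triggered Lambda might run. The ratio thresholds here are hypothetical tuning knobs, not values from the actual deployment.

```python
# Illustrative sketch of the first scaling criterion: compare the
# number of running ECS tasks to the number of container instances and
# decide whether the Auto Scaling group should grow or shrink.

def desired_instance_delta(task_count, instance_count,
                           max_tasks_per_instance=8,
                           min_tasks_per_instance=2):
    """Return +1 to scale out, -1 to scale in, 0 to hold steady."""
    if instance_count == 0:
        return 1                      # always need at least one host
    ratio = task_count / instance_count
    if ratio > max_tasks_per_instance:
        return 1                      # hosts are overloaded
    if ratio < min_tasks_per_instance and instance_count > 1:
        return -1                     # hosts are underutilized
    return 0

print(desired_instance_delta(40, 4))  # ratio 10 > 8  -> 1 (scale out)
print(desired_instance_delta(5, 4))   # ratio 1.25 < 2 -> -1 (scale in)
print(desired_instance_delta(20, 4))  # ratio 5       -> 0 (hold)
```

In a real deployment, the scheduled Lambda would fetch the two counts from the ECS API, feed them to a decision like this, and adjust the Auto Scaling group's desired capacity accordingly.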
Future versions of this piece of code will incorporate the other two criteria along with the ability to use CloudWatch alarms to trigger scaling.
By using advanced ECS features like Service Auto Scaling in conjunction with Lambda to meet RAC’s business requirements, RAC and Flux7 were able to Dockerize SAP Hybris in production for the first time ever.
Further, ECS and CloudFormation give users the ability to implement robust solutions while still providing the ability to roll back in case of failures. With ECS as a backbone technology, RAC has been able to deploy a Hybris setup with automatic scaling, self-healing, one-click deployment, CI/CD, and PCI compliance consistent with the company’s latest technology guidelines and meeting the requirements of their newly-formed culture of DevOps and extreme agility.
If you have any questions or suggestions, please comment below.
Welcome to a new Backblaze blog feature we’re calling “Tech To Track.” Each installment gives us a chance to round up links we think are particularly cool or interesting and present them here with a bit of commentary.
Unlimited cloud storage for your photos…if you have the right phone
Google’s new Pixel smartphone has Android fans excited. The sleek design, AMOLED screen and 12.3 megapixel camera (with an 8MP front-facing camera) are all tentpole features. The best part is that it doesn’t catch on fire like the Samsung Galaxy Note 7. But where to put all those pictures, especially when you can shoot 4K video? “Smart Storage” is Google’s answer. Google offers free storage for full-resolution photos and videos you shoot with the Pixel. Don’t let unlimited cloud storage make you complacent, though! Keep a local backup of those pictures, then back up that hard drive with Backblaze.
Finding spots in your home or office where your Wi-Fi doesn’t work? Does your internet connection slow way down or get unreliable in the bedroom or the family room? Wi-Fi “mesh” networks are an increasingly popular way to overcome such issues. Instead of one big Wi-Fi network router, smaller devices work together to broadcast a stronger network. Eero, Luma, Ubiquiti, and Linksys already play in this space. Now Google’s muscling in. Google Wifi comes with “Network Assist” technology to get you the best speed available. Better yet, it’s manageable through an Android and iOS smartphone app. Look for it to be available in the U.S. in time for the holidays starting at $199, or a three-pack for $299 – a lot cheaper than Eero’s offering.
Spinning hard disk drives continue to be the price per gigabyte leaders, but don’t expect that to last forever. Solid state drives (SSDs) continue to plummet in price as more companies and consumers demand the storage technology. Samsung says that by 2020, it’ll offer 512GB SSDs that will cost the same as a 1TB hard drive. We’re looking forward to the day we can build out a Backblaze Storage Pod with SSDs and not break the bank.
CERN is running out of disk space for data from the LHC
The Large Hadron Collider is running out of disk space. The world’s biggest particle accelerator is operating more efficiently and with better reliability than CERN scientists expected. That means a lot more data to collect. CERN researchers have short-changed themselves on the amount of storage they need to store and analyze all that data. Hey, CERN, give us a call – we’d be happy to help with B2 Cloud Storage!
Graham Burke can be accused of many things but moderating his words is certainly not one of them.
The outspoken co-chief of media company Village Roadshow has been front and center of many of Australia’s movie piracy battles and has authored some of their most controversial comments.
Speaking at the 71st Australian International Movie Convention today, Burke continued the trend. He launched a fresh attack on Internet piracy, accusing pirate site operators of terrible crimes and site users of undermining the livelihoods of creators.
“Nothing is more important or urgent, as every day that passes tens of thousands of our movies are stolen and it is a devastating contagious plague,” a copy of Burke’s speech obtained by The Australian (subscription) reads.
According to the Village Roadshow chief, the main problem is the sites that facilitate this “theft”, which are not only extremely dangerous places to visit but are run by equally dangerous people.
“We are sending our kids to very dangerous online neighborhoods — the pirates are not good guys,” Burke said.
“These aren’t roguish, basement-dwelling computer geeks — these are the same type of people that sell heroin.”
Describing pirate site operators as often having connections to “organised, international crime syndicates”, Burke warned that they only care about revenue, making “tens of millions blitzing our kids with [high-risk] advertising.”
Interestingly, Burke said that nearly three-quarters of people acknowledge that piracy is theft but noted that many downloaders are unaware that what they are doing is “wrong” because government inaction means that “dangerous” pirate sites are still open for business.
“In our research we repeatedly come across people who have not been told [piracy is wrong and is theft], and assume from continued practice, that it is socially and legally acceptable, and that it does no harm or that their individual activity won’t make any difference,” he said.
“People wouldn’t go into a 7-Eleven and swipe a Mars bar. People are fundamentally honest and fundamentally decent.”
But with site-blocking and making more content legally available only part of the solution, the Village Roadshow chief says his company has decided that taking action against the public is now required. Repeat infringers, Burke says, will now be subjected to legal action.
“We are planning to pursue our legal rights to protect our copyright by suing repeat infringers — not for a king’s ransom but akin to the penalty for parking a car in a loading zone,” ABC reports.
“If the price of an act of thievery is set at say AUS$300 (US$228), we believe most people will think twice.”
While it’s too early to estimate exactly how many Aussie pirates might be caught up in the dragnet, it’s fair to say the numbers could be considerable. Mad Max: Fury Road, a Village Roadshow-produced movie, is said to have been illegally downloaded 3.5 million times. Australia has a population of around 23.5 million.
However, the age group of people said to be carrying out much of the pirating presents a problem. Burke says that piracy among adults has dropped in the past year due to the availability of services such as Netflix. However, the growing threat appears to come from a much younger age group.
“There has been some decline in piracy amongst Australian adults in the last year and part of this is due to new streaming services … which demonstrates that when product is legally available, this is a critical factor,” Burke said.
“However, before we get too comfortable by this decline in total piracy, the emphasis on movies is worse and illegal online activity of 12 to 17-year-old Australians has almost doubled since last year — with a whopping 31 per cent pirating movies.”
And therein lies the dilemma. While Burke thinks that fines might be the answer to further reducing piracy among the adult population, he’s going to have a crisis on his hands if he starts targeting his big problem group – children. Kids can be sued in Australia, but that sounds like a horrible proposition that will only undermine the campaign’s goals.
Whoever his company ‘fines’ or goes on to sue, Burke says the money accrued will go back into education campaigns to further reduce piracy. It’s a model previously employed by the RIAA, who eventually abandoned the strategy.
What if a hard drive could tell you it was going to fail before it actually did? Is that possible? Each day Backblaze records the SMART stats that are reported by the 67,814 hard drives we have spinning in our Sacramento data center. SMART stands for Self-Monitoring, Analysis and Reporting Technology and is a monitoring system included in hard drives that reports on various attributes of the state of a given drive.
While we’ve looked at SMART stats before, this time we’ll dig into the SMART stats we use in determining drive failure and we’ll also look at a few other stats we find interesting.
We use Smartmontools to capture the SMART data. This is done once a day for each hard drive. We add in a few elements, such as drive model, serial number, etc. and create a row in the daily log for each drive. You can download these logs files from our website. Drives which have failed are marked as such and their data is no longer logged. Sometimes a drive will be removed from service even though it has not failed, like when we upgrade a Storage Pod by replacing 1TB drives with 4TB drives. In this case, the 1TB drive is not marked as a failure, but the SMART data will no longer be logged.
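Capturing a drive's attributes with smartmontools boils down to parsing the table that `smartctl -A` prints for each device. A minimal sketch, using an abbreviated, made-up sample of that output:

```python
# Sketch of extracting SMART raw values from `smartctl -A` output, as
# smartmontools prints it. The sample below is an abbreviated,
# illustrative capture, not real drive data.

SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       2
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
"""

def parse_smart_raw(text):
    """Map SMART attribute ID -> raw value from smartctl -A output."""
    raw = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():   # skip headers / banner lines
            raw[int(fields[0])] = int(fields[-1])
    return raw

print(parse_smart_raw(SAMPLE))  # {5: 0, 187: 2, 197: 0}
```

A daily job would run this per drive, append the drive model and serial number, and write one row to the log.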
SMART stats we use to predict Hard Drive failure
For the last few years we’ve used the following five SMART stats as a means of helping determine if a drive is going to fail.
SMART 5 – Reallocated Sectors Count
SMART 187 – Reported Uncorrectable Errors
SMART 188 – Command Timeout
SMART 197 – Current Pending Sector Count
SMART 198 – Uncorrectable Sector Count
When the RAW value for one of these five attributes is greater than zero, we have a reason to investigate. We also monitor RAID array status, Backblaze Vault array status and other Backblaze internal logs to identify potential drive problems. These tools generally only report exceptions, so on any given day the number of investigations is manageable even though we have nearly 70,000 drives.
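The triage rule is simple to express. In the sketch below we assume the five tracked attributes are SMART 5, 187, 188, 197, and 198, and the sample readings are invented:

```python
# Sketch of the daily triage rule: flag a drive for investigation when
# any of the five tracked SMART attributes has a raw value above zero.
# Attribute IDs are assumed to be 5, 187, 188, 197, 198.

TRACKED = (5, 187, 188, 197, 198)

def needs_investigation(smart_raw):
    """smart_raw maps SMART attribute ID -> raw value for one drive."""
    return any(smart_raw.get(attr, 0) > 0 for attr in TRACKED)

print(needs_investigation({5: 0, 187: 0, 197: 0}))  # False
print(needs_investigation({5: 2, 187: 0, 197: 0}))  # True
```

Missing attributes default to zero, since not every manufacturer reports every attribute.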
Let’s stay focused on SMART stats and take a look at the table below, which shows the percentage of both failed and operational drives reporting a RAW value greater than zero for each SMART stat listed.
While no single SMART stat is found in all failed hard drives, here’s what happens when we consider all five SMART stats as a group.
Operational drives with one or more of our five SMART stats greater than zero – 4.2%
Failed drives with one or more of our five SMART stats greater than zero – 76.7%
That means that 23.3% of failed drives showed no warning from the SMART stats we record. Are these stats useful? I’ll let you decide if you’d like to have a sign of impending drive failure 76.7% of the time. But before you decide, read on.
Having a given drive stat with a value that is greater than zero may mean nothing at the moment. For example, a drive may have a SMART 5 raw value of 2, meaning two drive sectors have been remapped. On its own, such a value means little until combined with other factors. The reality is it can take a fair amount of intelligence (both human and artificial) during the evaluation process to reach the conclusion that an operational drive is going to fail.
One thing that helps is when we observe multiple SMART errors. The following chart shows the incidence of having one, two, three, four, or all five of the SMART stats we track report a raw value greater than zero. To clarify, a value of 1 means that only one of the five SMART stats we track has a value greater than zero, while a value of 5 means that all five do. But before we decide that multiple errors help, let’s take a look at the correlation between these SMART stats, as seen in the chart below.
In most instances the stats have little correlation and can be considered independent. Only SMART 197 and 198 have a good correlation meaning we could consider them as “one indicator” versus two. Why do we continue to collect both SMART 197 and SMART 198? Two reasons: 1) the correlation isn’t perfect so there’s room for error, and 2) not all drive manufacturers report both attributes.
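Correlation between two attributes across the fleet can be checked with a plain Pearson computation. The per-drive raw values below are synthetic, chosen only to mimic a strongly correlated pair like SMART 197/198 next to an unrelated attribute:

```python
# Pearson correlation between SMART attributes across drives, in pure
# Python. All per-drive raw values here are synthetic.

from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

smart_197 = [0, 0, 8, 16, 40, 0, 24]   # synthetic raw values per drive
smart_198 = [0, 0, 8, 16, 40, 0, 24]   # identical -> correlation 1.0
smart_5   = [0, 2, 0, 0, 1, 0, 0]      # unrelated attribute

print(round(pearson(smart_197, smart_198), 2))  # 1.0
print(abs(pearson(smart_197, smart_5)) < 0.5)   # True (weak correlation)
```

A near-1.0 coefficient is what justifies treating two attributes as "one indicator"; a value near zero means each attribute carries independent signal.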
How does understanding the correlation, or lack thereof, of these SMART stats help us? Let’s say a drive reported a SMART 5 raw value of 10 and a SMART 197 raw value of 20. From that we could conclude the drive is deteriorating and should be scheduled for replacement. Whereas if the same drive had a SMART 197 raw value of 5 and a SMART 198 raw value of 20 and no other errors, we might hold off on replacing the drive while awaiting more data, such as the frequency with which the errors occur.
So far it might sound like we will fail a hard drive if we just observe enough SMART values that are greater than zero, but we also have to factor time into the equation. The SMART stats we track, with the exception of SMART 197, are cumulative in nature, meaning we need to consider the time period over which the errors were reported.
For example, let’s start with a hard drive that jumps from zero to 20 Reported Uncorrectable Errors (SMART 187) in one day. Compare that to a second drive which has a count of 60 SMART 187 errors, with one error occurring on average once a month over a five year period. Which drive is a better candidate for failure?
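Normalizing by time makes the comparison explicit. A tiny sketch using the two hypothetical drives from the paragraph above:

```python
# Factor time into the evaluation: compare the error *rate* of the two
# example drives rather than the raw count. Numbers mirror the
# hypothetical drives described in the text.

def errors_per_day(error_count, days_in_service):
    return error_count / days_in_service

drive_a = errors_per_day(20, 1)          # 20 errors in a single day
drive_b = errors_per_day(60, 5 * 365)    # 60 errors over five years

print(drive_a)                 # 20.0
print(round(drive_b, 3))       # 0.033
print(drive_a > drive_b)       # True: drive A is the likelier failure
```

Despite having a third of the lifetime count, drive A's burst of errors makes it the far stronger failure candidate.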
Another stat to consider: SMART 189 – High Fly Writes
This is a stat we’ve been reviewing to see if it will join our current list of five SMART stats we use today. This stat is the cumulative count of the number of times the recording head “flies” outside its normal operating range. Below we list the percentage of operational and failed drives where the SMART 189 raw value is greater than zero.
Failed Drives: 47.0%
Operational Drives: 16.4%
The false positive percentage of operational drives having a greater-than-zero value may at first glance seem to render this stat meaningless. But what if I told you that for most of the operational drives with SMART 189 errors, those errors were distributed fairly evenly over a long period of time, for example one error a week on average for 52 weeks? In addition, what if I told you that many of the failed drives with this error had a similar number of errors, but distributed over a much shorter period of time, for example 52 errors over a one-week period? Suddenly SMART 189 looks very interesting for predicting failure by looking for clusters of High Fly Writes over a short period of time. We are currently researching the use of SMART 189 to determine if we can define a useful range of rates at which errors occur.
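The clustering idea can be tested with a sliding window over the days on which errors were recorded. The 30-day window and the event lists below are hypothetical, not Backblaze's production thresholds:

```python
# Sketch of the clustering idea for SMART 189: rather than the lifetime
# count, look for many High Fly Writes inside a short sliding window.
# The window length and day-offset event lists are hypothetical.

def max_errors_in_window(event_days, window_days=30):
    """event_days: sorted day offsets on which an error was recorded.
    Returns the largest number of errors seen in any window."""
    best, start = 0, 0
    for end in range(len(event_days)):
        while event_days[end] - event_days[start] >= window_days:
            start += 1
        best = max(best, end - start + 1)
    return best

steady = list(range(0, 364, 7))            # one error a week for a year
burst = [100 + d // 8 for d in range(52)]  # 52 errors within about a week

print(max_errors_in_window(steady))  # 5: never more than 5 in 30 days
print(max_errors_in_window(burst))   # 52: a suspicious cluster
```

Both drives have comparable lifetime counts, but only the burst pattern looks like an impending failure under this rule.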
SMART 12 – Power Cycles
Is it better to turn off your computer when you are not using it, or should you leave it on? The debate has raged since the first personal computers hit the market in the ’80s. On one hand, turning off a computer “saves” the components inside and saves a little on your electricity bill. On the other hand, the shut-down/start-up process is tough on the components, especially the hard drive.
Will analyzing the SMART 12 data finally allow us to untie this Gordian knot?
Let’s compare the number of power cycles (SMART 12) of failed drives versus operational drives.
Failed Drives were power cycled on average: 27.7 times
Operational Drives were power cycled on average: 10.2 times
At first blush, it would seem we should keep our systems running, as the failed drives had about 170% more power cycles than drives that have not failed. Alas, I don’t think we can declare victory just yet. First, we don’t power cycle our drives very often. On average, drives get power-cycled about once every couple of months. That’s not quite the same as turning off your computer every night. Second, we didn’t factor in the age range of the drives. To do that we’d need a lot more data points to get results we could rely on. That means, sadly, we don’t have enough data to reach a conclusion.
Perhaps one of our stat-geek readers will be able to tease out a conclusion regarding power cycles. Regardless, everyone is invited to download and review our hard drive stats data including the SMART stats for each drive. If you find anything interesting let us know.
Every few years, a researcher replicates a security study by littering USB sticks around an organization’s grounds and waiting to see how many people pick them up and plug them in, causing the autorun function to install innocuous malware on their computers. These studies are great for making security professionals feel superior. The researchers get to demonstrate their security expertise and use the results as “teachable moments” for others. “If only everyone was more security aware and had more security training,” they say, “the Internet would be a much safer place.”
Enough of that. The problem isn’t the users: it’s that we’ve designed our computer systems’ security so badly that we demand the user do all of these counterintuitive things. Why can’t users choose easy-to-remember passwords? Why can’t they click on links in emails with wild abandon? Why can’t they plug a USB stick into a computer without facing a myriad of viruses? Why are we trying to fix the user instead of solving the underlying security problem?
Traditionally, we’ve thought about security and usability as a trade-off: a more secure system is less functional and more annoying, and a more capable, flexible, and powerful system is less secure. This “either/or” thinking results in systems that are neither usable nor secure.
Our industry is littered with examples. First: security warnings. Despite researchers’ good intentions, these warnings just inure people to them. I’ve read dozens of studies about how to get people to pay attention to security warnings. We can tweak their wording, highlight them in red, and jiggle them on the screen, but nothing works because users know the warnings are invariably meaningless. They don’t see “the certificate has expired; are you sure you want to go to this webpage?” They see, “I’m an annoying message preventing you from reading a webpage. Click here to get rid of me.”
Next: passwords. It makes no sense to force users to generate passwords for websites they only log in to once or twice a year. Users realize this: they store those passwords in their browsers, or they never even bother trying to remember them, using the “I forgot my password” link as a way to bypass the system completely — effectively falling back on the security of their e-mail account.
And finally: phishing links. Users are free to click around the Web until they encounter a link to a phishing website. Then everyone wants to know how to train the user not to click on suspicious links. But you can’t train users not to click on links when you’ve spent the past two decades teaching them that links are there to be clicked.
We must stop trying to fix the user to achieve security. We’ll never get there, and research toward those goals just obscures the real problems. Usable security does not mean “getting people to do what we want.” It means creating security that works, given (or despite) what people do. It means security solutions that deliver on users’ security goals without — as the 19th-century Dutch cryptographer Auguste Kerckhoffs aptly put it — “stress of mind, or knowledge of a long series of rules.”
I’ve been saying this for years. Security usability guru (and one of the guest editors of this issue) M. Angela Sasse has been saying it even longer. People — and developers — are finally starting to listen. Many security updates happen automatically so users don’t have to remember to manually update their systems. Opening a Word or Excel document inside Google Docs isolates it from the user’s system so they don’t have to worry about embedded malware. And programs can run in sandboxes that don’t compromise the entire computer. We’ve come a long way, but we have a lot further to go.
“Blame the victim” thinking is older than the Internet, of course. But that doesn’t make it right. We owe it to our users to make the Information Age a safe place for everyone — not just those with “security awareness.”
This essay previously appeared in the Sep/Oct issue of IEEE Security & Privacy.
The trust of our users is ISRG’s most critical asset. Transparency regarding legal requests is an important part of making sure our users can trust us, and to that end we will be publishing reports twice annually. Reports will be published three months after the period covered in order to allow us time to research all requests and orders received during the period.
Automatically identifying that an image is not suitable/safe for work (NSFW), including offensive and adult images, is an important problem which researchers have been trying to tackle for decades. Since images and user-generated content dominate the Internet today, filtering NSFW images becomes an essential component of Web and mobile applications. With the evolution of computer vision, improved training data, and deep learning algorithms, computers are now able to automatically classify NSFW image content with greater precision.
Defining NSFW material is subjective and the task of identifying these images is non-trivial. Moreover, what may be objectionable in one context can be suitable in another. For this reason, the model we describe below focuses only on one type of NSFW content: pornographic images. The identification of NSFW sketches, cartoons, text, images of graphic violence, or other types of unsuitable content is not addressed with this model.
To the best of our knowledge, there is no open source model or algorithm for identifying NSFW images. In the spirit of collaboration and with the hope of advancing this endeavor, we are releasing our deep learning model that will allow developers to experiment with a classifier for NSFW detection, and provide feedback to us on ways to improve the classifier.
Our general purpose Caffe deep neural network model (Github code) takes an image as input and outputs a probability (i.e., a score between 0 and 1) which can be used to detect and filter NSFW images. Developers can use this score to filter images beyond a suitable threshold chosen from an ROC curve for their specific use case, or use this signal to rank images in search results.
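Consuming the score might look like the following sketch. The filenames, scores, and the 0.8 threshold are all made up for illustration; a real threshold should come from an ROC curve on your own data.

```python
# Sketch of consuming the model's output: given image -> NSFW score
# mappings (scores here are invented), flag images at or above a
# chosen threshold and rank each group by score.

scores = {"a.jpg": 0.02, "b.jpg": 0.95, "c.jpg": 0.40, "d.jpg": 0.88}

THRESHOLD = 0.8  # illustrative; derive yours from an ROC curve

flagged = sorted((name for name, s in scores.items() if s >= THRESHOLD),
                 key=lambda name: -scores[name])   # most NSFW first
safe = sorted((name for name, s in scores.items() if s < THRESHOLD),
              key=lambda name: scores[name])       # most SFW first

print(flagged)  # ['b.jpg', 'd.jpg']
print(safe)     # ['a.jpg', 'c.jpg']
```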
Convolutional Neural Network (CNN) architectures and tradeoffs
In recent years, CNNs have become very successful in image classification problems. Since 2012, new CNN architectures have continuously improved the accuracy of the standard ImageNet classification challenge. Some of the major breakthroughs include AlexNet (2012), GoogLeNet, VGG (2014), and Residual Networks (2015). These networks have different tradeoffs in terms of runtime, memory requirements, and accuracy. The main indicators for runtime and memory requirements are:
Flops or connections – The number of connections in a neural network determines the number of compute operations during a forward pass, which is proportional to the runtime of the network while classifying an image.
Parameters – The number of parameters in a neural network determines the amount of memory needed to load the network.
Ideally we want a network with minimum flops and minimum parameters, which would achieve maximum accuracy.
Training a deep neural network for NSFW classification
We train the models using a dataset of positive (i.e. NSFW) images and negative (i.e. SFW – suitable/safe for work) images. We are not releasing the training images or other details due to the nature of the data, but instead we open source the output model which can be used for classification by a developer.
We use the Caffe deep learning library and CaffeOnSpark; the latter is a powerful open source framework for distributed learning that brings Caffe deep learning to Hadoop and Spark clusters for training models (Big shout out to Yahoo’s CaffeOnSpark team!).
While training, the images were resized to 256×256 pixels, horizontally flipped for data augmentation, and randomly cropped to 224×224 pixels before being fed to the network. For training residual networks, we used scale augmentation as described in the ResNet paper to avoid overfitting. We evaluated various architectures to experiment with the tradeoffs of runtime vs. accuracy.
MS_CTC – This architecture was proposed in Microsoft’s constrained time cost paper. It improves on AlexNet in terms of speed and accuracy while maintaining a combination of convolutional and fully-connected layers.
SqueezeNet – This architecture introduces the fire module, which contains layers to squeeze and then expand the input data blob. This reduces the number of parameters while keeping ImageNet accuracy as good as AlexNet’s, and the memory requirement is only 6MB.
VGG – This architecture has 13 conv layers and 3 FC layers.
GoogLeNet – GoogLeNet introduces inception modules and has 20 convolutional layer stages. It also attaches auxiliary loss functions to intermediate layers to tackle the problem of vanishing gradients in deep networks.
ResNet-50 – ResNets use shortcut connections to solve the vanishing gradient problem. We used the 50-layer residual network released by the authors.
ResNet-50-thin – This model was generated using our pynetbuilder tool and replicates the Residual Network paper’s 50-layer network (with half the number of filters in each layer). You can find more details on how the model was generated and trained here.
Tradeoffs of different architectures: accuracy vs number of flops vs number of params in network.
The deep models were first pre-trained on the ImageNet 1000-class dataset. For each network, we replace the last layer (FC1000) with a 2-node fully-connected layer. Then we fine-tune the weights on the NSFW dataset. Note that we keep the learning rate multiplier for the last FC layer at 5 times the multiplier of the other layers being fine-tuned. We also tune the hyperparameters (step size, base learning rate) to optimize the performance.
We observe that the performance of the models on NSFW classification tasks is related to the performance of the pre-trained model on ImageNet classification tasks, so if we have a better pretrained model, it helps in fine-tuned classification tasks. The graph below shows the relative performance on our held-out NSFW evaluation set. Please note that the false positive rate (FPR) at a fixed false negative rate (FNR) shown in the graph is specific to our evaluation dataset, and is shown here for illustrative purposes. To use the models for NSFW filtering, we suggest that you plot the ROC curve using your dataset and pick a suitable threshold.
Comparison of performance of models on Imagenet and their counterparts fine-tuned on NSFW dataset.
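The threshold-picking suggestion above can be sketched without any plotting library: sweep candidate thresholds over labeled validation scores and keep the lowest one whose false positive rate meets a target. The score/label lists are synthetic; a real evaluation set would be far larger.

```python
# Sketch of picking an operating threshold from labeled validation
# scores (label 1 = NSFW, label 0 = SFW). All data here is synthetic.

def fpr_at(threshold, scores, labels):
    """False positive rate: fraction of SFW images flagged as NSFW."""
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    return sum(s >= threshold for s in negatives) / len(negatives)

def pick_threshold(scores, labels, max_fpr=0.1):
    """Lowest threshold whose FPR is at or below the target."""
    for t in sorted(set(scores)):          # ascending candidates
        if fpr_at(t, scores, labels) <= max_fpr:
            return t
    return 1.0

val_scores = [0.05, 0.10, 0.30, 0.62, 0.70, 0.85, 0.90, 0.97]
val_labels = [0,    0,    0,    0,    1,    1,    1,    1]

t = pick_threshold(val_scores, val_labels, max_fpr=0.0)
print(t)                                  # 0.7: no SFW image scores this high
print(fpr_at(t, val_scores, val_labels))  # 0.0
```

Sweeping `max_fpr` over a range of values traces out the ROC operating points, which is exactly the per-dataset curve the post recommends plotting.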
We are releasing the thin ResNet 50 model, since it provides a good tradeoff in terms of accuracy, and the model is lightweight in terms of runtime (takes < 0.5 sec on CPU) and memory (~23 MB). Please refer to our Git repository for instructions and usage of our model. We encourage developers to try the model for their NSFW filtering use cases. For any questions or feedback about the performance of the model, we encourage creating an issue and we will respond ASAP.
Results can be improved by fine-tuning the model for your dataset or use case. If you achieve improved performance or you have trained an NSFW model with a different architecture, we encourage you to contribute to the model or share the link on our description page.
Disclaimer: The definition of NSFW is subjective and contextual. This model is a general purpose reference model, which can be used for the preliminary filtering of pornographic images. We do not provide guarantees of accuracy of output, rather we make this available for developers to explore and enhance as an open source project.
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep residual learning for image recognition.” arXiv preprint arXiv:1512.03385 (2015).
Simonyan, Karen, and Andrew Zisserman. “Very deep convolutional networks for large-scale image recognition.” arXiv preprint arXiv:1409.1556 (2014).
Iandola, Forrest N., Matthew W. Moskewicz, Khalid Ashraf, Song Han, William J. Dally, and Kurt Keutzer. “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size.” arXiv preprint arXiv:1602.07360 (2016).
He, Kaiming, and Jian Sun. “Convolutional neural networks at constrained time cost.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5353-5360. 2015.
Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. “Going deeper with convolutions.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9. 2015.
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “ImageNet classification with deep convolutional neural networks.” In Advances in Neural Information Processing Systems, pp. 1097-1105. 2012.
Research into online piracy comes in all shapes and sizes, often with equally mixed results. The main question is often whether piracy is hurting sales.
New research conducted by economists from the European Commission’s Directorate-General for Internal Market, Industry, Entrepreneurship and SMEs, tries to find answers for the movie industry.
For a new paper titled “Movie Piracy and Displaced Sales in Europe,” the researchers conducted a large-scale survey among 30,000 respondents from six countries, documenting their movie consumption patterns.
Using statistical models and longitudinal data, they were able to estimate how piracy affects legal sales and if this differs from country to country.
Perhaps unsurprisingly, the findings show that not every pirated movie is a lost sale. Instead, for every hundred films that are first viewed from a pirated source, 37 paid viewings are ‘lost’.
This results in a displacement rate of 0.37, which is still a high number, of course, also when compared to previous research.
It’s worth noting that in some cases piracy actually has a beneficial effect. This is true for movies that people have seen more than twice.
“Interestingly, we found evidence of a sampling effect: for movies that are seen more than twice, first unpaid consumption slightly increases paid second consumption,” the researchers write.
However, the sampling effect doesn’t outweigh the loss in sales. Overall the researchers estimate that online piracy leads to a significant loss in revenue for the movie industry.
“Using a back-of-the-envelope calculation, we show that this implies that unpaid movie viewings reduced movie sales in Europe by about 4.4% during the sample period,” they write.
This negative effect is driven by a relatively small group of consumers. Roughly 20% of the respondents with the highest movie consumption are responsible for 94% of lost movie sales. Or put differently, the most avid film fans pirate the most.
Interestingly, there are large between-country differences too. In Germany online movie piracy results in ‘only’ a 1.65% loss, this figure is 10.41% for Spain. The UK (2.89%), France (5.73%), Poland (7.21%) and Sweden (7.65%) rank somewhere in between.
According to the researchers, their findings can help policymakers to decide what the best anti-piracy enforcement strategies are. In addition, changes between countries could help to evaluate existing and future measures and inspire future research.
“The estimates that we provide can help policy makers to assess the efficient use of public resources to be spent on copyright enforcement of movies.”
“In particular, since we find that virtually all the lost sales of movies are due to a very small group of individuals, most damages of movie piracy could therefore potentially be prevented with well targeted policies,” the researchers conclude.
Interesting research from Sasha Romanosky at RAND:
Abstract: In 2013, the US President signed an executive order designed to help secure the nation’s critical infrastructure from cyberattacks. As part of that order, he directed the National Institute for Standards and Technology (NIST) to develop a framework that would become an authoritative source for information security best practices. Because adoption of the framework is voluntary, it faces the challenge of incentivizing firms to follow along. Will frameworks such as that proposed by NIST really induce firms to adopt better security controls? And if not, why? This research seeks to examine the composition and costs of cyber events, and attempts to address whether or not there exist incentives for firms to improve their security practices and reduce the risk of attack. Specifically, we examine a sample of over 12 000 cyber events that include data breaches, security incidents, privacy violations, and phishing crimes. First, we analyze the characteristics of these breaches (such as causes and types of information compromised). We then examine the breach and litigation rate, by industry, and identify the industries that incur the greatest costs from cyber events. We then compare these costs to bad debts and fraud within other industries. The findings suggest that public concerns regarding the increasing rates of breaches and legal actions may be excessive compared to the relatively modest financial impact to firms that suffer these events. Specifically, we find that the cost of a typical cyber incident in our sample is less than $200 000 (about the same as the firm’s annual IT security budget), and that this represents only 0.4% of their estimated annual revenues.
The result is that it often makes business sense to underspend on cybersecurity and just pay the costs of breaches:
Romanosky analyzed 12,000 incident reports and found that a typical incident accounts for only 0.4 per cent of a company’s annual revenues. That compares to billing fraud, which averages 5 per cent, or retail shrinkage (ie, shoplifting and insider theft), which accounts for 1.3 per cent of revenues.
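The comparison is easy to reproduce. Here is a toy calculation using the percentages cited above and a hypothetical revenue figure (this is an illustration, not RAND’s model):

```python
# Loss categories as a share of annual revenue, using the percentages
# quoted above. The revenue figure is hypothetical.
revenue = 100_000_000  # assumed annual revenue, in dollars

loss_rates = {
    "cyber incidents": 0.004,    # ~0.4% of revenue
    "billing fraud": 0.05,       # ~5%
    "retail shrinkage": 0.013,   # ~1.3%
}

losses = {name: revenue * rate for name, rate in loss_rates.items()}
for name, cost in sorted(losses.items(), key=lambda kv: -kv[1]):
    print(f"{name:16s} ${cost:>12,.0f}")
```

On these numbers, a firm’s expected cyber losses are an order of magnitude smaller than its expected billing-fraud losses, which is the core of the underspending argument.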
As for reputational damage, Romanosky found that it was almost impossible to quantify. He spoke to many executives and none of them could give a reliable metric for how to measure the PR cost of a public failure of IT security systems.
He also noted that a data incident typically has little long-term effect on a company’s stock price. Under the circumstances, it doesn’t make a lot of business sense to invest heavily in cyber security.
What’s being left out of these costs are the externalities. Yes, the costs to a company of a cyberattack are low to them, but there are often substantial additional costs borne by other people. The way to look at this is not to conclude that cybersecurity isn’t really a problem, but instead that there is a significant market failure that governments need to address.
NetSeer significantly reduces costs, improves the reliability of its real-time ad-bidding cluster, and delivers 100-millisecond response times using AWS. The company offers online solutions that help advertisers and publishers match search queries and web content to relevant ads. NetSeer runs its bidding cluster on AWS, taking advantage of Amazon EC2 Spot Fleet Instances.
New York Public Library migrated its fractured IT environment—older technology and legacy computing—to a modernized platform on AWS. The New York Public Library has been a provider of free books, information, ideas, and education for more than 17 million patrons a year. Using Amazon EC2, Elastic Load Balancing, Amazon RDS, and Auto Scaling, NYPL is able to build scalable, repeatable systems quickly at a fraction of the cost.
MakerBot uses AWS to understand what its customers need, and to go to market faster with new and innovative products. MakerBot is a desktop 3-D printing company with more than 100,000 customers using its 3-D printers. MakerBot uses Matillion ETL for Amazon Redshift to process data from a variety of sources in a fast and cost-effective way.
University of Maryland, College Park uses the AWS cloud to create a stable, secure, and modern technical environment for its students and staff while ensuring compliance. The University of Maryland is a public research university located in the city of College Park, Maryland, and is the flagship institution of the University System of Maryland. The university used AWS to migrate all of its data centers to the cloud, and uses Amazon WorkSpaces to give students access to software anytime, anywhere, and on any device.
By Francisco Perez-Sorrosal, Ohad Shacham, Kostas Tsioutsiouliklis, and Edward Bortnikov
We are proud to announce that Omid (“Hope” in Persian), Yahoo’s transaction manager for HBase, has been accepted as an Apache Incubator project. Yahoo has been a long-time contributor to the Apache community in the Hadoop ecosystem, including HBase, YARN, Storm, and Pig. Our acceptance as an Apache Incubator project is another step forward following the success of ZooKeeper and BookKeeper, which were born at Yahoo and graduated to top-level Apache projects.
These days, most NoSQL databases, including HBase, do not provide the OLTP support available in traditional relational databases, forcing the applications running on top of them to trade transactional support for greater agility and scalability. However, transactions are essential in many applications using NoSQL datastores as the main source of data, for example, in incremental content processing systems. Omid enables these applications to benefit from the best of both worlds: the scalability provided by NoSQL datastores, such as HBase, and the concurrency and atomicity provided by transaction processing systems.
Omid provides a high-performance ACID transactional framework with Snapshot Isolation guarantees on top of HBase, and is able to scale to thousands of clients triggering transactions on application data. It’s one of the few open-source transactional frameworks that can scale beyond 100K transactions per second on mid-range hardware while incurring minimal impact on datastore access latency.
At its core, Omid utilizes a lock-free approach to support multiple concurrent clients. Its design relies on a centralized conflict detection component called Transaction Status Oracle (TSO), which efficiently resolves write-set collisions among concurrent transactions. Another important benefit is that Omid does not require any modification of the underlying key-value datastore – HBase in this case. Moreover, the recently-added high-availability algorithm eliminates the single point of failure represented by the TSO in those deployments that require a higher degree of dependability. Last but not least, the API is very simple – mimicking the transaction manager APIs in the relational world: begin, commit, rollback – and the client and server configuration processes have been simplified to help both application developers and system administrators.
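To make the TSO’s role concrete, here is a deliberately simplified sketch of centralized write-set conflict detection under snapshot isolation. This is an illustrative toy in Python, not Omid’s actual (Java) implementation: a transaction may commit only if none of the keys it wrote were committed by another transaction after its start timestamp.

```python
# Toy snapshot-isolation conflict check, in the spirit of Omid's TSO.
# Illustrative only; the real component is far more involved.

class ToyTSO:
    def __init__(self):
        self._clock = 0         # logical timestamp counter
        self._last_commit = {}  # key -> commit timestamp of its last writer

    def begin(self):
        """Hand out a start timestamp: the transaction's snapshot."""
        self._clock += 1
        return self._clock

    def try_commit(self, start_ts, write_set):
        """Commit iff no key in write_set was committed after start_ts."""
        for key in write_set:
            if self._last_commit.get(key, 0) > start_ts:
                return None  # write-write conflict: abort
        self._clock += 1
        commit_ts = self._clock
        for key in write_set:
            self._last_commit[key] = commit_ts
        return commit_ts

tso = ToyTSO()
t1 = tso.begin()
t2 = tso.begin()
assert tso.try_commit(t1, {"row:a"}) is not None  # t1 commits first
assert tso.try_commit(t2, {"row:a"}) is None      # t2 conflicts and aborts
t3 = tso.begin()
assert tso.try_commit(t3, {"row:a"}) is not None  # started after t1's commit
```

Because the check is a handful of hash-map lookups on a single component, it can be made lock-free and very fast, which is how a centralized oracle avoids becoming the bottleneck.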
Efforts toward growing the community have already been underway in the last few months. Apache Hive contributors from Hortonworks expressed interest in storing Hive metadata in HBase using Omid, and this led to a fruitful collaboration that resulted in Omid now supporting HBase 1.x versions. Omid could also be used as the transaction manager in other SQL abstraction layers on top of HBase such as Apache Phoenix, or as the transaction coordinator in distributed systems, such as the Apache DistributedLog project and Pulsar, a distributed pub-sub messaging platform recently open-sourced by Yahoo.
Since its inception in 2011 at Yahoo Research, Omid has matured to operate at Web scale in a production environment. For example, since 2014 Omid has been used at Yahoo – along with other Hadoop technologies – to power our incremental content ingestion platform for search and personalization products. In this role, Omid is serving millions of transactions per day over HBase data.
We have decided to move the Omid project to “the Apache Way” because we think it is the next logical step after battle-testing the project in production at Yahoo and open-sourcing the code on Yahoo’s public GitHub in 2012. (The Omid GitHub repository currently has 269 stars and 101 forks, and colleagues in the open source community asked us to release it as an Apache Incubator project.) As we aim to form a larger Omid community outside Yahoo, we think that the Apache Software Foundation is the perfect umbrella to achieve this. We invite the Apache community to contribute by providing patches, reviewing code, proposing new features or improvements, and giving talks at conferences such as Hadoop Summit, HBaseCon, ApacheCon, etc. under the Apache rules.
We see Omid being recognized as an Apache Incubator Project as the first step in growing a vibrant community around this technology. We are confident that contributors in the Apache community will add more features to Omid and further enhance the current performance and latency. Stay tuned to @ApacheOmid on Twitter!
Neural networks are good at identifying faces, even if they’re blurry:
In a paper released earlier this month, researchers at UT Austin and Cornell University demonstrate that faces and objects obscured by blurring, pixelation, and a recently-proposed privacy system called P3 can be successfully identified by a neural network trained on image datasets — in some cases at a more consistent rate than humans.
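The underlying reason is simple: pixelation is a deterministic transform, so a model can match obscured images against templates obscured the same way. The toy below (hypothetical 4x4 “images”, not the paper’s actual method) shows a nearest-neighbor match succeeding entirely in the pixelated domain:

```python
# Toy illustration of why pixelation is weak obscuration: block-averaging
# is deterministic, so identity matching can be done on pixelated data.

def pixelate(img, block=2):
    """Average each block x block tile of a square image (list of rows)."""
    n = len(img)
    out = []
    for i in range(0, n, block):
        row = []
        for j in range(0, n, block):
            tile = [img[i + di][j + dj]
                    for di in range(block) for dj in range(block)]
            row.append(sum(tile) / len(tile))
        out.append(row)
    return out

def distance(a, b):
    """Sum of squared per-pixel differences between two images."""
    return sum((x - y) ** 2
               for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Two distinct hypothetical "faces", and a slightly noisy photo of face_a:
face_a = [[9, 9, 0, 0], [9, 9, 0, 0], [0, 0, 9, 9], [0, 0, 9, 9]]
face_b = [[0, 0, 9, 9], [0, 0, 9, 9], [9, 9, 0, 0], [9, 9, 0, 0]]
query  = [[8, 9, 1, 0], [9, 8, 0, 1], [1, 0, 9, 8], [0, 1, 8, 9]]

# Match the pixelated query against pixelated enrollment templates:
templates = {"a": pixelate(face_a), "b": pixelate(face_b)}
best = min(templates, key=lambda k: distance(pixelate(query), templates[k]))
print(best)  # the pixelated query still matches identity "a"
```

A trained neural network does the same thing far more robustly, which is why the researchers could defeat blurring, pixelation, and P3 without ever inverting the obscuration.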
“We argue that humans may no longer be the ‘gold standard’ for extracting information from visual data,” the researchers write. “Recent advances in machine learning based on artificial neural networks have led to dramatic improvements in the state of the art for automated image recognition. Trained machine learning models now outperform humans on tasks such as object recognition and determining the geographic location of an image.”
Two weeks ago, the International Federation of the Phonographic Industry published research which claimed that half of 16 to 24-year-olds use stream-ripping tools to copy music from sites like YouTube.
The industry group said that the problem of stream-ripping has become so serious that in volume terms it had overtaken downloading from ‘pirate’ sites. Given today’s breaking news, the timing of the report was no coincidence.
Earlier today in a California District Court, a huge coalition of recording labels sued the world’s largest YouTube ripping site. UMG Recordings, Capitol Records, Warner Bros, Sony Music, Arista Records, Atlantic Records and several others claim that YouTube-MP3 (YTMP3), owner Philip Matesanz, and Does 1-10 have infringed their rights.
“YTMP3 rapidly and seamlessly removes the audio tracks contained in videos streamed from YouTube that YTMP3’s users access, converts those audio tracks to an MP3 format, copies and stores them on YTMP3’s servers, and then distributes copies of the MP3 audio files from its servers to its users in the United States, enabling its users to download those MP3 files to their computers, tablets, or smartphones,” the complaint reads.
The labels allege that YouTube-MP3 is one of the most popular sites in the entire world and that, as a result, its owner, the German company PMD Technologies UG, is profiting handsomely from their intellectual property.
“Defendants are depriving Plaintiffs and their recording artists of the fruits of their labor, Defendants are profiting from the operation of the YTMP3 website. Through the promise of illicit delivery of free music, Defendants have attracted millions of users to the YTMP3 website, which in turn generates advertising revenues for Defendants,” the labels add.
And it’s very clear that the labels mean business. YouTube-MP3 is being sued for direct, contributory, vicarious and inducement of copyright infringement, plus circumvention of technological measures.
Among other things, the labels are also demanding a preliminary and permanent injunction forbidding the Defendants from further infringing their rights. They also want YouTube-MP3’s domain name to be surrendered.
“This is a coordinated action to protect the rights of artists and labels from the blatant infringements of YouTube-mp3, the world’s single-largest ‘stream ripping’ site,” says IFPI Chief Executive Frances Moore.
“Music companies and digital services today offer fans more options than ever before to listen to music legally, when and where they want to do so – over hundreds of services with scores of millions of tracks – all while compensating artists and labels. Stream ripping sites should not be allowed to jeopardize this.”
Cary Sherman, the Chairman and CEO of the Recording Industry Association of America (RIAA) says that YouTube-MP3 is making money on the back of their business and needs to be stopped.
“This site is raking in millions on the backs of artists, songwriters and labels. We are doing our part, but everyone in the music ecosystem who says they believe that artists should be compensated for their work has a role to play,” Sherman says.
“It should not be so easy to engage in this activity in the first place, and no stream ripping site should appear at the top of any search result or app chart.”
BPI Chief Executive Geoff Taylor says that it’s time for web services and related companies to stop supporting similar operations.
“It’s time to stop illegal sites like this building huge fortunes by ripping off artists and labels. Fans have access now to a fantastic range of legal music streaming services, but they can only exist if we take action to tackle the online black market,” Taylor says.
“We hope that responsible advertisers, search engines and hosting providers will also reflect on the ethics of supporting sites that enrich themselves by defrauding creators.”
TorrentFreak contacted YouTube-MP3 owner Philip Matesanz for comment but at the time of publication we were yet to receive a response.