Tag Archives: Technical

TVAddons: A Law Firm is Not Spying on Our Kodi Users

Post Syndicated from Andy original https://torrentfreak.com/tvaddons-a-law-firm-is-not-spying-on-our-kodi-users-170918/

A few months ago, TVAddons was without doubt the leading repository for third-party Kodi addons.

During March, the platform had 40 million unique users connected to the site’s servers, together transferring a petabyte of addons and updates.

In June, however, things started to fall apart. After news broke that the site was being sued in a federal court in Texas, TVAddons disappeared. It was assumed these events were connected but it later transpired the platform was being sued in Canada as well, and that was the true reason for the downtime.

While it’s easy to be wise after the event, in hindsight it might’ve been better for the platform to go public about the Canadian matter quite a bit sooner than it did. Of course, there are always legal considerations that prevent early disclosure, but when popular sites disappear into a black hole, two plus two can quickly equal five when fed through the web’s rumor machine.

Things weren’t helped in July when it was discovered that the site’s former domains had been handed over to a Canada-based law firm. Again, no official explanation was forthcoming and again, people became concerned.

If this had been a plaintiff’s law firm, people would’ve had good reason to worry, since it would have been technically possible to spy on TVAddons’ users. However, as the truth began to filter out and court papers became available, it soon became crystal clear that simply wasn’t the case.

The bottom line, which is backed up by publicly available court papers, is that the law firm holding the old TVAddons domains is not the law firm suing TVAddons. Instead, it was appointed by the court to hold TVAddons’ property until the Canadian lawsuit is brought to a conclusion, whenever that might be.

“They have a legal obligation to protect our property at all cost, and prevent anyone (especially the law firm who is suing us) from gaining access to them,” says TVAddons.

“The law firm who is holding them is doing nothing more than protecting our property until the time that it will finally be returned after the appeal takes place.”

Unfortunately, assurances provided by TVAddons and information published by the court itself haven’t been enough to stop some people fearing the worst. While the facts have plenty of support on Twitter and Facebook, there also appears to be an element who would like to see TVAddons fail in its efforts to re-establish itself.

Only time will tell who will win that battle but in the meantime, TVAddons has tried to cover all the bases in an update post on its blog.


Can an Army of Bitcoin “Bounty Hunters” Deter Pirates?

Post Syndicated from Ernesto original https://torrentfreak.com/can-an-army-of-bitcoin-bounty-hunters-deter-pirates-170917/

When we first heard of the idea to use Bitcoin bounties to track down pirated content online, we scratched our heads.

Snitching on copyright infringers is not a new concept, but the idea of instant cash rewards through cryptocurrency was quite novel.

In theory, it’s pretty straightforward. Content producers can add a unique identifying watermark into movies, eBooks, or other digital files before they’re circulated. When these somehow leak to the public, the bounty hunters use the watermark to claim their Bitcoin, alerting the owner in the process.

This helps to spot leaks early on, even on networks where automated tools don’t have access, and identify the source at the same time.

Two years have passed and it looks like the idea was no fluke. Custos, the South African company that owns the technology, has various copyright holders on board and recently announced a new partnership with book publisher Erudition Digital.

With help from anti-piracy outfit Digimarc, the companies will add identifying watermarks to eBook releases, counting on the bounty hunters to keep an eye out for leaks. These bounty hunters don’t have to be anti-piracy experts. On the contrary, pirates are more than welcome to help out.

“The Custos approach is revolutionary in that it attacks the economy of piracy by targeting uploaders rather than downloaders, turning downloaders into an early detection network,” the companies announced a few days ago.

“The result is pirates turn on one another, sowing seeds of distrust amongst their communities. As a result, the Custos system is capable of penetrating hard-to-reach places such as the dark web, peer-to-peer networks, and even email.”



Devon Weston, Director of Market Development for Digimarc Guardian, believes that this approach is the next level in anti-piracy efforts. It complements the automated detection tools that have been available in the past by providing access to hard-to-reach places.

“Together, this suite of products represents the next generation in technical measures against eBook piracy,” Weston commented on the partnership.

TorrentFreak reached out to Custos COO Fred Lutz to find out what progress the company has made in recent years. We were informed that they have been protecting thousands of copies every month, ranging from pre-release movie content to eBooks.

At the moment the company works with a selected group of “bounty hunters,” but they plan to open the extraction tool to the public in the near future, so everyone can join in.

“So far we have carefully seeded the free bounty extractor tool in relevant communities with great success. However, in the next phase, we will open the bounty hunting to the general public. We are just careful not to grow the bounty hunting community faster than the number of bounties in the wild require,” Lutz tells us.

The Bitcoin bounties themselves vary in size based on the specific use case. For a movie screener, they are typically anything between $10 and $50. However, for the most sensitive content, they can be $100 or more.

“We can also adjust the bounty over time based on the customer’s needs. A low-quality screener that was very sensitive prior to cinematic release does not require as large a bounty after cam-rips become available,” Lutz notes.

Thus far, roughly 50 Bitcoin bounties have been claimed. Some of these were planted by Custos themselves, as an incentive for the bounty hunters. Not a very high number, but that doesn’t mean that it’s not working.

“While this number might seem a bit small compared to the number of copies we protect, our aim is first and foremost not to detect leaks, but to pose a credible threat of quick detection and being caught.”

People who receive content protected by Custos are made aware of the watermarks, which may make them think twice about sharing it. If that’s the case, then it’s having an effect without any bounties being claimed.

The question remains how many people will actively help to spot bounties. The success of the system largely depends on volunteers, and not all pirates are eager to rat on the people that provide free content.

On the other hand, there’s also room to abuse the system. In theory, people could leak their own eBooks, claim the bounty, and say they lost their e-reader. That would be fraud, of course, but since the bounties are in Bitcoin this isn’t easy to prove.

That brings us to the final question. What happens if a claimed bounty identifies a leaker? Custos admits that this alone isn’t enough evidence to pursue a legal case, but the measures that are taken in response are up to the copyright holders.

“A claim of a bounty is never a sufficient legal proof of piracy, instead, it is an invaluable first piece of evidence on which a legal case could be built if the client so requires. Legal prosecution is definitely not always the best approach to dealing with leaks,” Lutz says.

Time will tell if the Bitcoin bounty approach works…


Kodi ‘Trademark Troll’ Has Interesting Views on Co-Opting Other People’s Work

Post Syndicated from Andy original https://torrentfreak.com/kodi-trademark-troll-has-interesting-views-on-co-opting-other-peoples-work-170917/

The Kodi team, operating under the XBMC Foundation, announced last week that a third-party had registered the Kodi trademark in Canada and was using it for their own purposes.

That person was Geoff Gavora, who had previously been in communication with the Kodi team, expressing how important the software was to his sales.

“We had hoped, given the positive nature of his past emails, that perhaps he was doing this for the benefit of the Foundation. We learned, unfortunately, that this was not the case,” XBMC Foundation President Nathan Betzen said.

According to the Kodi team, Gavora began delisting Amazon ads placed by companies selling Kodi-enabled products, based on infringement of Gavora’s trademark rights.

“[O]nly Gavora’s hardware can be sold, unless those companies pay him a fee to stay on the store,” Betzen explained.

Predictably, Gavora’s move is being viewed as highly controversial, not least since he’s effectively claiming licensing rights in Canada over what should be a free and open source piece of software. TF obtained one of the notices Amazon sent to a seller of a Kodi-enabled device in Canada, following a complaint from Gavora.

Take down Kodi from Amazon, or pay Gavora

So who is Geoff Gavora and what makes him tick? Thanks to a 2016 interview with Ali Salman of the Rapid Growth Podcast, we have a lot of information from the horse’s mouth.

It all began in 2011, when Gavora began jailbreaking Apple TVs, loading them with XBMC, and selling them to friends.

“I did it as a joke, for beer money from my friends,” Gavora told Salman.

“I’d do it for $25 to $50 and word of mouth spread that I was doing this so we could load on this media center to watch content and online streams from it.”

Intro to the interview with Ali Salman

Soon, however, word of mouth caused the business to grow wings, Gavora claims.

“So they started telling people and I start telling people it’s $50, and then I got so busy so I start telling people it’s $75. I’m getting too busy with my work and with this. And it got to the point where I was making more jailbreaking these Apple TVs than I was at my career, and I wasn’t very happy at my career at that time.”

Jailbreaking was supposed to be a side thing to tide Gavora over until another job came along, but he had a problem – he didn’t come from a technical background. Nevertheless, what Gavora did have was a background in marketing and with a decent knowledge of how to succeed in customer service, he majored on that front.

Gavora had come to learn that while people wanted his devices, they weren’t very good at operating XBMC (Kodi’s former name) which he’d loaded onto them. With this in mind, he began offering web support and phone support via a toll-free line.

“I started receiving calls from New York, Dallas, and then Australia, Hong Kong. Everyone around the world was calling me and saying ‘we hear there’s some kid in Calgary, some young child, who’s offering tech support for the Apple TV’,” Gavora said.

But with things apparently going well, a wrench was soon thrown into the works when Apple released the third variant of its Apple TV and Gavora was unable to jailbreak it. This prompted him to market his own Linux-based set-top device and his business, Raw-Media, grew from there.

While it seems likely that so-called ‘Raw Boxes’ were doing reasonably well with consumers, what was the secret of their success? Podcast host Salman asked Gavora for his ‘networking party 10-second pitch’, and the Canadian was happy to oblige.

“I get this all the time actually. I basically tell people that I sell a box that gives them free TV and movies,” he said.

This was met with laughter from the host, to which Gavora added, “That’s sort of the three-second pitch and everyone’s like ‘Oh, tell me more’.”

“Who doesn’t like free TV, come on?” Salman responded. “Yeah exactly,” Gavora said.

The image below, taken from a January 2016 YouTube unboxing video, shows one of the products sold by Gavora’s company.

Raw-Media Kodi Box packaging (note Kodi logo)

Bearing in mind the offer of free movies and TV, the tagline on the box, “Stop paying for things you don’t want to watch, watch more free tv!” initially looks quite provocative. That being said, both the device and Kodi are perfectly capable of playing plenty of legal content from free sources, so there’s no problem there.

What is surprising, however, is that the unboxing video shows the device being booted up, apparently already loaded with infamous third-party Kodi addons including PrimeWire, Genesis, Icefilms, and Navi-X.

The unboxing video showing the Kodi setup

Given that Gavora has registered the Kodi trademark in Canada and prints the official logo on his packaging, this runs counter to the official Kodi team’s aggressive stance towards boxes ready-configured with what they categorize as banned addons. Matters are compounded when one visits the product support site.

As seen in the image below, Raw-Media devices are delivered with a printed card in the packaging informing people where to get the after-sales services Gavora says he built his business upon. The cards advise people to visit No-Issue.ca, a site set up to offer text and video-based support to set-top box buyers.

No-Issue.ca (which is hosted on the same server as raw-media.ca and claimed officially as a sister site here) now redirects to No-Issue.is, as per a 2016 announcement. It has a fairly bland forum but the connected tutorial videos, found on No Issue’s YouTube channel, offer a lot more spice.

Registered under Gavora’s online nickname Gombeek (which is also used on the official Kodi forums), the channel is full of videos detailing how to install and use a wide range of addons.

The No-issue YouTube Channel tutorials

But while supplying tutorial videos is one thing, providing the actual software addons is another. Surprisingly, No-Issue does that too. Filed away under the URL http://solved.no-issue.is/ is a Kodi repository which distributes a wide range of addons, including many that specialize in infringing content, according to the Kodi team.

The No-Issue repository

A source familiar with Raw-Media’s devices informs TF that they’re no longer delivered with addons installed. However, tools hosted on No-Issue.is automate the installation process for the customer, with unlisted YouTube Videos (1,2) providing the instructions.

XBMC Foundation President Nathan Betzen says that situation isn’t ideal.

“If that really is his repo it is disappointing to see that Gavora is charging a fee or outright preventing the sale of boxes with Kodi installed that do not include infringing add-ons, while at the same time he is distributing boxes himself that do include the infringing add-ons like this,” Betzen told TF.

While the legality of this type of service is yet to be properly tested in Canada and may yet emerge as entirely permissible under local law, Gavora himself previously described his business as operating in a gray area.

“If I could go back in time four years, I would’ve been more aggressive in the beginning because there was a lot of uncertainty being in a gray market business about how far I could push it,” he said.

“I really shouldn’t say it’s a gray market because everything I do is completely above board, I just felt it was more gray market so I was a bit scared,” he added.

But, legality aside (which will be determined in due course through various cases 1,2), the situation is still problematic when it comes to the Kodi trademark.

The official Kodi team indicate they don’t want to be associated with any kind of questionable addon or even tutorials for the same. Nevertheless, several of the addons installed by No-Issue (including PrimeWire, cCloud TV, Genesis, Icefilms, MoviesHD, MuchMovies and Navi-X, to name a few), are present on the Kodi team’s official ban list.

The fact remains, however, that Gavora successfully registered the trademark in Canada (one month later it was transferred to a brand new company at the same address), and Kodi now have no control over the situation in the country, short of a settlement or some kind of legal action.

Kodi matters aside, though, we get more insight into Gavora’s attitudes towards intellectual property after learning that he studied gemology and jewelry at school. He’s a long-standing member of jewelry discussion forum Ganoksin.com (his profile links to Gavora.com, a domain Gavora owns, as per information supplied by Amazon).

Things get particularly topical in a 2006 thread titled “When your work gets ripped”. The original poster asked how people feel when their jewelry work gets copied and Gavora made his opinions known.

“I think that what most people forget to remember is that when a piece from Tiffany’s or Cartier is ripped off or copied they don’t usually just copy the work, they will stamp it with their name as well,” Gavora said.

“This is, in fact, fraud and they are deceiving clients into believing they are purchasing genuine Tiffany’s or Cartier pieces. The client is in fact more interested in purchasing from an artist than they are the piece. Laying claim to designs (unless a symbol or name is involved) is outrageous.”

Unless that ‘design’ is called Kodi, of course, then it’s possible to claim it as your own through an administrative process and begin demanding licensing fees from the public. That being said, Gavora does seem to flip back and forth a little, later suggesting that being copied is sometimes ok.

“If someone copies your design and produces it under their own name, I think one should be honored and revel in the fact that your design is successful and has caused others to imitate it and grow from it,” he wrote.

“I look forward to the day I see one of my original designs copied, that is the day I will know my design is a success.”

From their public statements, this opinion isn’t shared by the Kodi team in respect of their product. Despite the Kodi name, software and logo being all their own work, they now find themselves having to claw back rights in Canada, in order to keep the product free in the region. For now, however, that seems like a difficult task.

TorrentFreak wrote to Gavora and asked him why he felt the need to register the Kodi trademark, but we received no response. That means we didn’t get the chance to ask him why he’s taking down Amazon listings for other people’s devices, or about something else that came up in the podcast.

“My biggest weakness, I guess, is that I’m too ethical about how I do my business,” he said, referring to how he deals with customers.

Only time will tell how that philosophy will affect Gavora’s attitudes to trademarks and people’s desire not to be charged for using free, open source software.


Australian Government Wants ISPs to Adopt Anti-Piracy Code

Post Syndicated from Ernesto original https://torrentfreak.com/australian-government-want-isps-to-adopt-anti-piracy-code-170915/

Australia has been struggling to find an adequate response to online piracy for several years, but progress has been slow.

While pirate site blockades are in effect now, an earlier plan to implement a three-strikes anti-piracy regime failed.

Despite this setback, Australian legislators are still determined to tackle widespread copyright infringement. The most recent effort comes through an overhaul of the country’s copyright regulations, with a new proposal (pdf) to adopt a voluntary anti-piracy code.

The code would require carriage service providers, including Internet providers, to agree on a joint anti-piracy strategy. The voluntary code should be supported by “broad consensus” and include technical measures that are “used to protect and identify copyright material.”

The proposal further stresses that the anti-piracy measures should be “non-discriminatory.” They also shouldn’t impose “substantial costs” on the service providers or “substantial burdens on their systems or networks.”

The code proposal

The description of the code is quite broad and can include a wide variety of measures, including a new iteration of the “strikes” scheme where copyright holders report copyright infringements. A website blocking agreement, which avoids costly court procedures, is also among the options.

An accompanying consultation paper published by the Government stresses that any monitoring measures to track infringements should not interfere with the technology used at the originating sites, ZDNet notes.

While the Government pushes copyright holders and ISPs to come to a voluntary agreement, the failed “three strikes” negotiations suggest that this will be easier said than done.

At the time, the Australasian Music Publishers Association (AMPAL) noted that merely warning users did not go far enough. Instead, they recommended a system where ISPs themselves would implement monitoring and filtering technology to stop piracy.

It appears, however, that extensive monitoring and filtering on the ISPs’ networks goes beyond the scope of the proposed regulations. After all, that would be quite costly and place a significant burden on the ISPs.

The proposed regulations are not limited to the anti-piracy code but also specify how Internet providers should process takedown notices, among other things.

Before any changes are implemented or negotiations begin, the Government is first inviting various stakeholders to share their views. While it doesn’t intend to change the main outline, the Government welcomes suggestions to simplify the current proposal where possible.


timeShift(GrafanaBuzz, 1w) Issue 13

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2017/09/15/timeshiftgrafanabuzz-1w-issue-13/

It’s been a busy week here at Grafana Labs – Grafana 4.5 is now available! We’ve made a lot of enhancements and added new features in this release, so be sure and check out the release blog post to see the full changelog. The GrafanaCon EU CFP is officially open so please don’t forget to submit your topic. We’re looking for technical and non-technical talks of all sizes.


Latest Release

Grafana v4.5 is available for download. The new Grafana 4.5 release includes major improvements to the query editors for Prometheus, Elasticsearch and MySQL.
View the changelog.

Download Grafana 4.5 Now


From the Blogosphere

Percona Live Europe Featured Talks: Visualize Your Data with Grafana Featuring Daniel Lee: The folks from Percona sat down with Grafana Labs Software Developer Daniel Lee to discuss his upcoming talk at PerconaLive Europe 2017, Dublin, and how data can drive better decision making for your business. Get your tickets now, and use code: SeeMeSpeakPLE17 for 10% off!

Register Now

Performance monitoring with ELK / Grafana: This article walks you through setting up the ELK stack to monitor webpage load time, but switches out Kibana for Grafana so you can visualize data from other sources right next to this performance data.

ESXi Lab Series: Aaron created a video mini-series about implementing both offensive and defensive security in an ESXi Lab environment. Parts four and five focus on monitoring with Grafana, but you’ll probably want to start with part one.

Raspberry Pi Monitoring with Grafana: We’ve been excited to see more and more articles about Grafana from Raspberry Pi users. This article helps you install and configure Grafana, and also touches on what monitoring is and why it’s important.


Grafana Plugins

This week we were busy putting the finishing touches on the new release, but we do have an update to the Gnocchi data source plugin to announce, and a new annotation plugin that works with any data source. Install or update plugins on an on-prem instance using the Grafana-cli, or with one click on Hosted Grafana.
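For reference, installing or updating a plugin from the command line looks roughly like the following; the plugin ID below is a placeholder, so substitute the ID shown on the plugin’s page at grafana.com:

# Install a plugin by its ID (placeholder shown), then restart the grafana-server service
grafana-cli plugins install <plugin-id>

# Or update everything that is already installed
grafana-cli plugins update-all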

NEW PLUGIN

Simple Annotations – Frustrated with using a data source that doesn’t support annotations? This is a simple annotation plugin for Grafana that works with any data source!

Install Now

UPDATED PLUGIN

Gnocchi Data Source – The latest release adds a reaggregation feature. Gnocchi can pre-compute aggregations of timeseries (for example, the mean every 10 minutes for one year), and the plugin now lets you (re)aggregate those timeseries, since the stored timeseries have already been aggregated. A big shout out to sileht for adding new features to the Gnocchi plugin.

Update Now


GrafanaCon EU Call for Papers is Open

Have a big idea to share? A shorter talk or a demo you’d like to show off? We’re looking for technical and non-technical talks of all sizes.

I’d Like to Speak at GrafanaCon


Tweet of the Week

We scour Twitter each week to find an interesting/beautiful dashboard and show it off! #monitoringLove

Awesome – really looking forward to seeing updates as you get to 1.0!

We Need Your Help

We’re conducting an experiment and need your help. Do you have a graph that you love because the data is beautiful or because the graph provides interesting information? Please get in touch. Tweet or send us an email with a screenshot, and we’ll tell you about the experiment.

Be Part of the Experiment


Grafana Labs is Hiring!

We are passionate about open source software and thrive on tackling complex challenges to build the future. We ship code from every corner of the globe and love working with the community. If this sounds exciting, you’re in luck – WE’RE HIRING!

Check out our Open Positions


What do you think?

We’re always interested in how we can improve our weekly roundups. Submit a comment on this article below, or post something at our community forum. Help us make these roundups better and better!

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.

AWS Partner Webinar Series – September & October 2017

Post Syndicated from Sara Rodas original https://aws.amazon.com/blogs/aws/aws-partner-webinar-series-september-october-2017/

The wait is over. September and October’s Partner Webinars have officially arrived! In case you missed the intro last month, the AWS Partner Webinar Series is a selection of live and recorded presentations covering a broad range of topics at varying technical levels and scale. A little different from our AWS Online TechTalks, each AWS Partner Webinar is hosted by an AWS solutions architect and an AWS Competency Partner who has successfully helped customers evaluate and implement the tools, techniques, and technologies of AWS.

 

 

September & October Partner Webinars:

 

SAP Migration
Velocity: How EIS Reduced Costs by 20% and Optimized SAP by Leveraging the Cloud
September 19, 2017 | 10:00 AM PDT

 

Mactores: SAP on AWS: How UCT is Experiencing Better Performance on AWS While Saving 60% in Infrastructure Costs with Mactores
September 19, 2017 | 1:00 PM PDT

 

Accenture: Reduce Operating Costs and Accelerate Efficiency by Migrating Your SAP Applications to AWS with Accenture
September 20, 2017 | 10:00 AM PDT

 

Capgemini: Accelerate your SAP HANA Migration with Capgemini & AWS FAST
September 21, 2017 | 10:00 AM PDT

 

Salesforce
Salesforce IoT: Monetize your IOT Investment with Salesforce and AWS
September 27, 2017 | 10:00 am PDT

 

Salesforce Heroku: Build Engaging Applications with Salesforce Heroku and AWS
October 10, 2017 | 10:00 AM PDT

 

Windows Migration
Cascadeo: How a National Transportation Software Provider Migrated a Mission-Critical Test Infrastructure to AWS with Cascadeo
September 26, 2017 | 10:00 AM PDT

 

Datapipe: Optimize App Performance and Security by Managing Microsoft Workloads on AWS with Datapipe
September 27, 2017 | 10:00 AM PDT

 

Datavail: Datavail Accelerates AWS Adoption for Sony DADC New Media Solutions
September 28, 2017 | 10:00 AM PDT

 

Life Sciences

SAP, Deloitte & Turbot: Life Sciences Compliance on AWS
October 4, 2017 | 10:00 AM PDT

 

Healthcare

AWS, ClearData & Cloudticity: Healthcare Compliance on AWS 
October 5, 2017 | 10:00 AM PDT

 

Storage

N2WS: Learn How Goodwill Industries Ensures 24/7 Data Availability on AWS
October 10, 2017 | 8:00 AM PDT

 

Big Data

Zoomdata: Taking Complexity Out of Data Science with AWS and Zoomdata
October 10, 2017 | 10:00 AM PDT

 

Attunity: Cardinal Health: Moving Data to AWS in Real-Time with Attunity 
October 11, 2017 | 11:00 AM PDT

 

Splunk: How TrueCar Gains Actionable Insights with Splunk Cloud
October 18, 2017 | 9:00 AM PDT

Prime Day 2017 – Powered by AWS

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/prime-day-2017-powered-by-aws/

The third annual Prime Day set another round of records for global orders, topping Black Friday and Cyber Monday, making it the biggest day in Amazon retail history. Over the course of the 30-hour event, tens of millions of Prime members purchased things like Echo Dots, Fire tablets, programmable pressure cookers, espresso machines, rechargeable batteries, and much more! July 11th also set a record for the number of new Prime memberships, as people signed up in order to take advantage of hundreds of thousands of deals. Amazon customers shopped online and made heavy use of the Amazon App, with mobile orders more than doubling from last Prime Day.

Powered by AWS
Last year I told you about How AWS Powered Amazon’s Biggest Day Ever, and shared what the team had learned with regard to preparation, automation, monitoring, and thinking big. All of those lessons still apply and you can read that post to learn more. Preparation for this year’s Prime Day (which started just days after Prime Day 2016 wrapped up) started by collecting and sharing best practices and identifying areas for improvement, proceeding to implementation and stress testing as the big day approached. Two of the best practices involve auditing and GameDay:

Auditing – This is a formal way for us to track preparations, identify risks, and to track progress against our objectives. Each team must respond to a series of detailed technical and operational questions that are designed to help them determine their readiness. On the technical side, questions could revolve around time to recovery after a database failure, including the all-important check of the TTL (time to live) for the CNAME. Operational questions address schedules for on-call personnel, points of contact, and ownership of services & instances.

GameDay – This practice (which I believe originated with former Amazonian Jesse Robbins), is intended to validate all of the capacity planning & preparation and to verify that all of the necessary operational practices are in place and work as expected. It introduces simulated failures and helps to train the team to identify and quickly resolve issues, building muscle memory in the process. It also tests failover and recovery capabilities, and can expose latent defects that are lurking under the covers. GameDays help teams to understand scaling drivers (page views, orders, and so forth) and gives them an opportunity to test their scaling practices. To learn more, read Resilience Engineering: Learning to Embrace Failure or watch the video: GameDay: Creating Resiliency Through Destruction.

Prime Day 2017 Metrics
So, how did we do this year?

The AWS teams checked their dashboards and log files, and were happy to share their metrics with me. Here are a few of the most interesting ones:

Block Storage – Use of Amazon Elastic Block Store (EBS) grew by 40% year-over-year, with aggregate data transfer jumping to 52 petabytes (a 50% increase) for the day and total I/O requests rising to 835 million (a 30% increase). The team told me that they loved the elasticity of EBS, and that they were able to ramp down on capacity after Prime Day concluded instead of being stuck with it.

NoSQL Database – Amazon DynamoDB requests from Alexa, the Amazon.com sites, and the Amazon fulfillment centers totaled 3.34 trillion, peaking at 12.9 million per second. According to the team, the extreme scale, consistent performance, and high availability of DynamoDB let them meet the needs of Prime Day without breaking a sweat.

Stack Creation – Nearly 31,000 AWS CloudFormation stacks were created for Prime Day in order to bring additional AWS resources on line.

API Usage – AWS CloudTrail processed over 50 billion events and tracked more than 419 billion calls to various AWS APIs, all in support of Prime Day.

Configuration Tracking – AWS Config generated over 14 million Configuration items for AWS resources.

You Can Do It
Running an event that is as large, complex, and mission-critical as Prime Day takes a lot of planning. If you have an event of this type in mind, please take a look at our new Infrastructure Event Readiness white paper. Inside, you will learn how to design and provision your applications to smoothly handle planned scaling events such as product launches or seasonal traffic spikes, with sections on automation, resiliency, cost optimization, event management, and more.

Jeff;

 

AWS Earns Department of Defense Impact Level 5 Provisional Authorization

Post Syndicated from Chris Gile original https://aws.amazon.com/blogs/security/aws-earns-department-of-defense-impact-level-5-provisional-authorization/


The Defense Information Systems Agency (DISA) has granted the AWS GovCloud (US) Region an Impact Level 5 (IL5) Department of Defense (DoD) Cloud Computing Security Requirements Guide (CC SRG) Provisional Authorization (PA) for six core services. This means that AWS’s DoD customers and partners can now deploy workloads for Controlled Unclassified Information (CUI) exceeding IL4 and for unclassified National Security Systems (NSS).

We have supported sensitive Defense community workloads in the cloud for more than four years, and this latest IL5 authorization is complementary to our FedRAMP High Provisional Authorization that covers 18 services in the AWS GovCloud (US) Region. Our customers now have the flexibility to deploy any range of IL 2, 4, or 5 workloads by leveraging AWS’s services, attestations, and certifications. For example, when the US Air Force needed compute scale to support the Next Generation GPS Operational Control System Program, they turned to AWS.

In partnership with a certified Third Party Assessment Organization (3PAO), an independent validation was conducted to assess both our technical and nontechnical security controls to confirm that they meet the DoD’s stringent CC SRG standards for IL5 workloads. Effective immediately, customers can begin leveraging the IL5 authorization for the following six services in the AWS GovCloud (US) Region:

AWS has been a long-standing industry partner with DoD, federal-agency customers, and private-sector customers to enhance cloud security and policy. We continue to collaborate on the DoD CC SRG, Defense Acquisition Regulation Supplement (DFARS) and other government requirements to ensure that policy makers enact policies to support next-generation security capabilities.

In an effort to reduce the authorization burden of our DoD customers, we’ve worked with DISA to port our assessment results into a format easily ingestible by the Enterprise Mission Assurance Support Service (eMASS) system. Additionally, we undertook a separate effort to empower our industry partners and customers to efficiently solve their compliance, governance, and audit challenges by launching the AWS Customer Compliance Center, a portal providing a breadth of AWS-specific compliance and regulatory information.

We look forward to providing sustained cloud security and compliance support at scale for our DoD customers and adding additional services within the IL5 authorization boundary. See AWS Services in Scope by Compliance Program for updates. To request access to AWS’s DoD security and authorization documentation, contact AWS Sales and Business Development. For a list of frequently asked questions related to AWS DoD SRG compliance, see the AWS DoD SRG page.

To learn more about the announcement in this post, tune in for the AWS Automating DoD SRG Impact Level 5 Compliance in AWS GovCloud (US) webinar on October 11, 2017, at 11:00 A.M. Pacific Time.

– Chris Gile, Senior Manager, AWS Public Sector Risk & Compliance

 

 

KinoX / Movie4K Admin Detained in Kosovo After Three-Year Manhunt

Post Syndicated from Andy original https://torrentfreak.com/kinox-movie4k-admin-detained-in-kosovo-after-three-year-manhunt-170912/

In June 2011, police across Europe carried out the largest anti-piracy operation the region had ever seen. Their target was massive streaming portal Kino.to and several affiliates with links to Spain, France and the Netherlands.

With many sites demonstrating phoenix-like abilities these days, it didn’t take long for a replacement to appear.

Replacement platform KinoX soon attracted a large fanbase and with that almost immediate attention from the authorities. In October 2014, Germany-based investigators acting on behalf of the Attorney General carried out raids in several regions of the country looking for four main suspects.

One raid, focused on a village near the northern city of Lübeck, targeted two brothers, then aged 21 and 25. The pair, who were said to have lived with their parents, were claimed to be the main operators of Kinox.to and another large streaming site, Movie4K.to. Although two other men were arrested elsewhere in Germany, the brothers couldn’t be found.

This was to be no ordinary manhunt by the police. In addition to accusing the brothers of copyright infringement and tax evasion, authorities indicated they were wanted for fraud, extortion, and arson too. The suggestion was that they’d targeted a vehicle owned by a pirate competitor, causing it to “burst into flames”.

The brothers were later named as Kastriot and Kreshnik Selimi. Kreshnik, 21, was born in Sweden in 1992, while Kastriot, 25, was born in Kosovo in 1989. Both later became German citizens.

With authorities piling on the charges, the pair were accused of being behind not only KinoX and Movie4K, but also other hosting and sharing platforms including BitShare, Stream4k.to, Shared.sx, Mygully.com and Boerse.sx.

Now, almost three years later, German police are one step closer to getting their men. According to a Handelsblatt report via Tarnkappe, Kreshnik Selimi has been detained by authorities.

The now 24-year-old suspect reportedly handed himself in to the German embassy located in Pristina, the capital of Kosovo. The location of the arrest isn’t really a surprise. Older brother Kastriot previously published a picture on Instagram which appeared to show a ticket in his name destined for Kosovo from Zurich in Switzerland.

But while Kreshnik’s arrest reportedly took place in July, there’s still no news of Kastriot. The older brother is still on the run, maybe in Kosovo, or by now, potentially anywhere else in the world.

While his whereabouts remain a mystery, the other puzzle faced by German authorities is the status of the two main sites the brothers were said to maintain.

Despite all the drama and unprecedented allegations of violence and other serious offenses, both Movie4K and KinoX remain stubbornly online, apparently oblivious to the action.

There have been consequences for people connected to the latter, however.

In December 2015, Arvit O (aka “Pedro”) who handled technical issues on KinoX, was sentenced to 40 months in prison for his involvement in the site.

Arvit O, who made a partial confession, was found guilty of copyright infringement by the District Court of Leipzig. The then 29-year-old admitted to infringing 2,889 works. The Court also found that he hacked the computers of two competitors in order to improve Kinox’s market share.


Disabling Intel Hyper-Threading Technology on Amazon EC2 Windows Instances

Post Syndicated from Brian Beach original https://aws.amazon.com/blogs/compute/disabling-intel-hyper-threading-technology-on-amazon-ec2-windows-instances/

In a prior post, Disabling Intel Hyper-Threading on Amazon Linux, I investigated how the Linux kernel enumerates CPUs. I also discussed the options to disable Intel Hyper-Threading (HT Technology) in Amazon Linux running on Amazon EC2.

In this post, I do the same for Microsoft Windows Server 2016 running on EC2 instances. I begin with a quick review of HT Technology and the reasons you might want to disable it. I also recommend that you take a moment to review the prior post for a more thorough foundation.

HT Technology

HT Technology makes a single physical processor appear as multiple logical processors. Each core in an Intel Xeon processor has two threads of execution. Most of the time, these threads can progress independently; one thread executing while the other is waiting on a relatively slow operation (for example, reading from memory) to occur. However, the two threads do share resources and occasionally one thread is forced to wait while the other is executing.

There are a few unique situations where disabling HT Technology can improve performance. One example is high performance computing (HPC) workloads that rely heavily on floating point operations. In these rare cases, it can be advantageous to disable HT Technology. For the overwhelming majority of workloads, however, you should leave it enabled. I recommend that you test with and without HT Technology enabled, and only disable threads if you are sure it will improve performance.

Exploring HT Technology on Microsoft Windows

Here’s how Microsoft Windows enumerates CPUs. As before, I am running these examples on an m4.2xlarge. I also chose to run Windows Server 2016, but you can walk through these exercises on any version of Windows. Remember that the m4.2xlarge has eight vCPUs, and each vCPU is a thread of an Intel Xeon core. Therefore, the m4.2xlarge has four cores, each of which runs two threads, resulting in eight vCPUs.

Windows does not have a built-in utility to examine CPU configuration, but you can download the Sysinternals coreinfo utility from Microsoft’s website. This utility provides useful information about the system CPU and memory topology. For this walkthrough, you enumerate the individual CPUs, which you can do by running coreinfo -c. For example:

C:\Users\Administrator >coreinfo -c

Coreinfo v3.31 - Dump information on system CPU and memory topology
Copyright (C) 2008-2014 Mark Russinovich
Sysinternals - www.sysinternals.com

Logical to Physical Processor Map:
**------ Physical Processor 0 (Hyperthreaded)
--**---- Physical Processor 1 (Hyperthreaded)
----**-- Physical Processor 2 (Hyperthreaded)
------** Physical Processor 3 (Hyperthreaded)

As you can see from the output, the coreinfo utility displays a table where each row is a physical core and each column is a logical CPU. In other words, the two asterisks on the first line indicate that CPU 0 and CPU 1 are the two threads in the first physical core. Therefore, my m4.2xlarge has four physical processors and each processor has two threads, resulting in eight total CPUs, just as expected.

It is interesting to note that Windows Server 2016 enumerates CPUs in a different order than Linux. Remember from the prior post that Linux enumerated the first thread in each core, followed by the second thread in each core. You can see from the output earlier that Windows Server 2016 enumerates both threads in the first core, then both threads in the second core, and so on. The diagram below shows the relationship of CPUs to cores and threads in both operating systems.
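The diagram itself isn’t reproduced here, but based on the description above, the mapping for an eight-vCPU instance such as the m4.2xlarge works out as follows:

                      Core 0    Core 1    Core 2    Core 3
Windows Server 2016   0, 1      2, 3      4, 5      6, 7
Amazon Linux          0, 4      1, 5      2, 6      3, 7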

In the Linux post, I disabled CPUs 4–7, leaving one thread per core, and effectively disabling HT Technology. You can see from the diagram that you must disable the odd-numbered threads (that is, 1, 3, 5, and 7) to achieve the same result in Windows. Here’s how to do that.

Disabling HT Technology on Microsoft Windows

In Linux, you can globally disable CPUs dynamically. In Windows, there is no direct equivalent that I could find, but there are a few alternatives.

First, you can disable CPUs using the msconfig.exe tool. If you choose Boot, Advanced Options, you have the option to set the number of processors. In the example below, I limit my m4.2xlarge to four CPUs. Restart for this change to take effect.

Unfortunately, Windows does not disable hyperthreaded CPUs first and then real cores, as Linux does. As you can see in the following output, coreinfo reports that my m4.2xlarge now has two physical cores running four hyperthreads, after rebooting. Msconfig.exe is useful for disabling cores, but it does not allow you to disable HT Technology.

Note: If you have been following along, you can re-enable all your CPUs by unselecting the Number of processors check box and rebooting your system.

 

C:\Users\Administrator >coreinfo -c

Coreinfo v3.31 - Dump information on system CPU and memory topology
Copyright (C) 2008-2014 Mark Russinovich
Sysinternals - www.sysinternals.com

Logical to Physical Processor Map:
**-- Physical Processor 0 (Hyperthreaded)
--** Physical Processor 1 (Hyperthreaded)

While you cannot disable HT Technology systemwide, Windows does allow you to associate a particular process with one or more CPUs. Microsoft calls this "processor affinity". To see an example, use the following steps.

  1. Launch an instance of Notepad.
  2. Open Windows Task Manager and choose Processes.
  3. Open the context (right click) menu on notepad.exe and choose Set Affinity….

This brings up the Processor Affinity dialog box.

As you can see, all the CPUs are allowed to run this instance of notepad.exe. You can uncheck a few CPUs to exclude them. Windows is smart enough to allow any scheduled operations to continue to completion on excluded CPUs. It then saves their state at the next scheduling event, and resumes those operations on another CPU. To ensure that only one thread in each core is able to run a process, you uncheck every other CPU. This effectively disables HT Technology for this process. For example:

Of course, this can be tedious when you have a large number of cores. Remember that the x1.32xlarge has 128 CPUs. Luckily, you can set the affinity of a running process from PowerShell using the Get-Process cmdlet. For example:

PS C:\> (Get-Process -Name 'notepad').ProcessorAffinity = 0x55;

The ProcessorAffinity attribute takes a bitmask in hexadecimal format. 0x55 in hex is equivalent to 01010101 in binary. Think of the binary encoding as 1=enabled and 0=disabled. This is slightly confusing because the bits read from right to left: CPU 0 is the rightmost bit and CPU 7 is the leftmost bit. Therefore, 01010101 means that the first thread in each core is enabled, just as it was in the diagram earlier.

The calculator built into Windows includes a “programmer view” that helps you convert from hexadecimal to binary. In addition, the ProcessorAffinity attribute is a 64-bit number. Therefore, you can only configure processor affinity on systems with up to 64 CPUs. At the moment, only the x1.32xlarge has more than 64 vCPUs.
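If you would rather not use the calculator, PowerShell can do the conversion and build the mask for you. The following is a minimal sketch, assuming the eight-vCPU layout described above (the -shl operator requires PowerShell 5.0 or later):

# Build the "first thread of each core" mask for a 4-core instance (bits 0, 2, 4, 6)
PS C:\> $mask = 0
PS C:\> foreach ($core in 0..3) { $mask = $mask -bor (1 -shl (2 * $core)) }
PS C:\> '0x{0:X}' -f $mask
0x55

# Convert a mask to binary to confirm which CPUs it enables (read right to left)
PS C:\> [Convert]::ToString($mask, 2)
1010101

# Apply it to a running process, as in the example above
PS C:\> (Get-Process -Name 'notepad').ProcessorAffinity = $mask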

In the preceding examples, you changed the processor affinity of a running process. Sometimes, you want to start a process with the affinity already configured. You can do this using the start command. The start command includes an affinity flag that takes a hexadecimal number like the PowerShell example earlier.

C:\Users\Administrator>start /affinity 55 notepad.exe

It is interesting to note that a child process inherits the affinity from its parent. For example, the following commands create a batch file that launches Notepad, then start the batch file with the affinity set. If you examine the instance of Notepad launched by the batch file, you see that the affinity has been applied to it as well.

C:\Users\Administrator>echo notepad.exe > test.bat
C:\Users\Administrator>start /affinity 55 test.bat

This means that you can set the affinity of your task scheduler, and any tasks that the scheduler starts inherit the affinity. So, you can disable every other thread when you launch the scheduler and effectively disable HT Technology for all of the tasks as well. Be sure to test this point, however, as some schedulers override the normal inheritance behavior and explicitly set processor affinity when starting a child process.

Conclusion

While the Windows operating system does not allow you to disable logical CPUs, you can set processor affinity on individual processes. You also learned that Windows Server 2016 enumerates CPUs in a different order than Linux. Therefore, you can effectively disable HT Technology by restricting a process to every other CPU. Finally, you learned how to set affinity of both new and running processes using Task Manager, PowerShell, and the start command.

Note: this technical approach has nothing to do with control over software licensing, or licensing rights, which are sometimes linked to the number of “CPUs” or “cores.” For licensing purposes, those are legal terms, not technical terms. This post did not cover anything about software licensing or licensing rights.

If you have questions or suggestions, please comment below.

Pirate Sites and the Dying Art of Customer Service

Post Syndicated from Andy original https://torrentfreak.com/pirate-sites-and-the-dying-art-of-customer-service-170803/

Consumers of products and services in the West are now more educated than ever before. They often research before making a purchase and view follow-up assistance as part of the package. Indeed, many companies live and die on the levels of customer support they’re able to offer.

In this ultra-competitive world, we send faulty technology items straight back to the store, cancel our unreliable phone providers, and switch to new suppliers for the sake of a few dollars, pounds or euros per month. But does this demanding environment translate to the ‘pirate’ world?

It’s important to remember that when the first waves of unauthorized platforms appeared after the turn of the century, content on the Internet was firmly established as being ‘free’. When people first fired up KaZaA, LimeWire, or the few fledgling BitTorrent portals, few could believe their luck. Nevertheless, the fact that there was no charge for content was quickly accepted as the standard.

That’s a position that continues today but for reasons that are not entirely clear, some users of pirate sites treat the availability of such platforms as some kind of right, holding them to the same standards of service that they would their ISP, for example.

One only has to trawl the comments section on The Pirate Bay to see hundreds of examples of people criticizing the quality of uploaded movies, the fact that a software crack doesn’t work, or that some anonymous uploader failed to deliver the latest album quickly enough. That’s aside from the continual complaints screamed on various external platforms which bemoan the site’s downtime record.

For people who recall the sheer joy of finding a working Suprnova mirror for a few minutes almost 15 years ago, this attitude is somewhat baffling. Back then, people didn’t go ballistic when a site went down, they savored the moment when enthusiastic volunteers brought it back up. There was a level of gratefulness that appears somewhat absent today, in a new world where free torrent and streaming sites are suddenly held to the same standards as Comcast or McDonalds.

But while a cultural change among users has definitely taken place over the years, the way sites communicate with their users has taken a hit too. Despite the advent of platforms including Twitter and Facebook, the majority of pirate site operators today have a tendency to leave their users completely in the dark when things go wrong, leading to speculation and concern among grateful and entitled users alike.

So why does The Pirate Bay’s blog stay completely unattended these days? Why do countless sites let dust gather on Twitter accounts that last made an announcement in 2012? And why don’t site operators announce scheduled downtime in advance or let people know what’s going on when the unexpected happens?

“Honestly? I don’t have the time anymore. I also care less than I did,” one site operator told TF.

“11 years of doing this shit is enough to grind anybody down. It’s something I need to do but not doing it makes no difference either. People complain in any case. Then if you start [informing people] again they’ll want it always. Not happening.”

Rather less complimentary was the operator of a large public site. He told us that two decades ago relationships between operators and users were good but have been getting worse ever since.

“Users of pirate content 20 years ago were highly technical. 10 years ago they were somewhat technical. Right now they are fucking watermelon head puppets. They are plain stupid,” he said.

“Pirate sites don’t have customers. They have users. The definition of a customer, when related to the web, is a person that actually buys a service. Since pirates sites don’t sell services (I’m talking about public ones) they have no customers.”

Another site operator told us that his motivations for not interacting with users are based on the changing legal environment, which has become steadily and markedly worse, year upon year.

“I’m not enjoying being open like before. I used to chat keenly with the users, on the site and IRC [Internet Relay Chat] but i’m keeping my distance since a long time ago,” he told us.

“There have always been risks but now I lock everything down. I’m not using Facebook in any way personally or for the site and I don’t need the dramas of Twitter. Everytime you engage on there, problems arise with people wanting a piece of you. Some of the staff use it but I advise the contrary where possible.”

Interested in where the boundaries lie, we asked a couple of sites whether they should be doing more to keep users informed and if that should be considered a ‘customer service’ obligation these days.

“This is not Netflix and i’m not the ‘have a nice day’ guy from McDonalds,” one explained.

“If people want Netflix help then go to Netflix. There’s two of us here doing everything and I mean everything. We’re already in a pinch so spending time to answer every retarded question from kids is right out.”

Our large public site operator agreed, noting that users complain about the craziest things, including why they don’t have enough space on a drive to download, why a movie that’s out in 2020 hasn’t been uploaded yet, and why they can’t log in – when they haven’t even opened an account yet.

While the responses aren’t really a surprise given the ‘free’ nature of the sites and the volume of visitors, things don’t get any better when moving up (we use the term loosely) to paid ‘pirate’ services.

Last week, one streaming platform in particular had an absolute nightmare with what appeared to be technical issues. Nevertheless, some of its users, despite only paying a few pounds per month, demanded their pound of flesh from the struggling service.

One, who raised the topic on Reddit, was advised to ask for his money back for the trouble caused. It raised a couple of eyebrows.

“Put in a ticket and ask [for a refund], morally they should,” the user said.

The use of the word “morally” didn’t sit well with some observers, one of which couldn’t understand how the word could possibly be mentioned in the context of a pirate paying another pirate money, for a pirate service that had broken down.

“Wait let me get this straight,” the critic said. “You want a refund for a gray market service. It’s like buying drugs off the corner only to find out it’s parsley. Do you go back to the dealer and demand a refund? You live and you learn bud. [Shaking my head] at people in here talking about it being morally responsible…too funny.”

It’s not clear when pirate sites started being held to the same standards as regular commercial entities but from anecdotal evidence at least, the problem appears to be getting worse. That being said and from what we’ve heard, users can stop holding their breath waiting for deluxe customer service – it’s not coming anytime soon.

“There’s no way to monetize support,” one admin concludes.


Implement Continuous Integration and Delivery of Apache Spark Applications using AWS

Post Syndicated from Luis Caro Perez original https://aws.amazon.com/blogs/big-data/implement-continuous-integration-and-delivery-of-apache-spark-applications-using-aws/

When you develop Apache Spark–based applications, you might face some additional challenges when dealing with continuous integration and deployment pipelines, such as the following common issues:

  • Applications must be tested on real clusters using automation tools (live tests).
  • Any user or developer must be able to easily deploy and use different versions of both the application and the infrastructure, in order to debug, experiment on, and test different functionality.
  • Infrastructure needs to be evaluated and tested along with the application that uses it.

In this post, we walk you through a solution that implements a continuous integration and deployment pipeline supported by AWS services. The pipeline offers the following workflow:

  • Deploy the application to a QA stage after a commit is performed to the source code.
  • Perform a unit test using Spark local mode.
  • Deploy to a dynamically provisioned Amazon EMR cluster and test the Spark application on it.
  • Update the application as an AWS Service Catalog product version, allowing a user to deploy any version (commit) of the application on demand.

Solution overview

The following diagram shows the pipeline workflow.

The solution uses AWS CodePipeline, which allows users to orchestrate and automate the build, test, and deploy stages for application source code. The solution consists of a pipeline that contains the following stages:

  • Source: Both the Spark application source code and the AWS CloudFormation template file for deploying the application are committed to version control. In this example, we use AWS CodeCommit. For an example of the application source code, see the provided zip archive.
  • Build: In this stage, you use Apache Maven both to generate the application .jar binaries and to execute all of the application unit tests that end with *Spec.scala. In this example, we use AWS CodeBuild, which can run the unit tests because they are designed to use Spark local mode (a minimal local-mode test sketch follows this list).
  • QADeploy: In this stage, the .jar file built previously is deployed using the CloudFormation template included with the application source code. All the resources are created in this stage, such as networks, EMR clusters, and so on. 
  • LiveTest: In this stage, you use Apache Maven to execute all the application tests that end with *SpecLive.scala. The tests submit EMR steps to the cluster created as part of the QADeploy stage. The tests verify that the steps ran successfully and check their results.
  • LiveTestApproval: This stage is included in case a pipeline administrator approval is required to deploy the application to the next stages. The pipeline pauses in this stage until an administrator manually approves the release. 
  • QACleanup: In this stage, you use an AWS Lambda function to delete the CloudFormation stack deployed as part of the QADeploy stage (a sketch of such a function follows this list). The function does not affect any resources other than those deployed in that stage.
  • DeployProduct: In this stage, you use a Lambda function that creates or updates an AWS Service Catalog product and portfolio. Every time the pipeline releases a change to the application, the AWS Service Catalog product gets a new version, with the commit of the change as the version description. 
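
The sample project’s unit tests are written in Scala and run through Maven, but the idea of a Spark local-mode test is easy to see in a few lines. The following is a minimal, hypothetical PySpark analogue (it is not part of the sample application) that runs entirely inside the build environment, with no EMR cluster required:

# test_wordcount.py -- run with pytest; requires the pyspark package.
from pyspark.sql import SparkSession

def test_word_count_in_local_mode():
    # local[*] runs Spark inside the test process, so no cluster is needed.
    spark = (SparkSession.builder
             .master("local[*]")
             .appName("unit-test")
             .getOrCreate())
    try:
        df = spark.createDataFrame([("a",), ("b",), ("a",)], ["word"])
        counts = {row["word"]: row["count"]
                  for row in df.groupBy("word").count().collect()}
        assert counts == {"a": 2, "b": 1}
    finally:
        spark.stop()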

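The cleanup and Service Catalog Lambda functions ship with the sample template, so you do not have to write them yourself. Purely as an illustration of the pattern, a CodePipeline-invoked Lambda function that tears down the QA stack might look roughly like the sketch below; it assumes the stack name arrives through the action’s UserParameters, and it is not the code used by the template.

import boto3

cloudformation = boto3.client("cloudformation")
codepipeline = boto3.client("codepipeline")

def handler(event, context):
    # CodePipeline invokes Lambda with a 'CodePipeline.job' payload.
    job = event["CodePipeline.job"]
    try:
        # Assumption: the QA stack name is passed as the action's UserParameters.
        stack_name = job["data"]["actionConfiguration"]["configuration"]["UserParameters"]
        cloudformation.delete_stack(StackName=stack_name)  # asynchronous delete
        codepipeline.put_job_success_result(jobId=job["id"])
    except Exception as exc:
        codepipeline.put_job_failure_result(
            jobId=job["id"],
            failureDetails={"type": "JobFailed", "message": str(exc)})
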
Try it out!

Use the provided sample template to get started using this solution. This template creates the pipeline described earlier with all of its stages. It performs an initial commit of the sample Spark application in order to trigger the first release change. To deploy the template, use the following AWS CLI command:

aws cloudformation create-stack  --template-url https://s3.amazonaws.com/aws-bigdata-blog/artifacts/sparkAppDemoForPipeline/emrSparkpipeline.yaml --stack-name emr-spark-pipeline --capabilities CAPABILITY_NAMED_IAM

After the template finishes creating resources, you see the pipeline name on the stack Outputs tab. After that, open the AWS CodePipeline console and select the newly created pipeline.

After a couple of minutes, AWS CodePipeline detects the initial commit applied by the CloudFormation stack and starts the first release.

You can watch how the pipeline goes through the Build, QADeploy, and LiveTest stages until it finally reaches the LiveTestApproval stage.

At this point, you can check the results of the test in the log files of the Build and LiveTest stage jobs on AWS CodeBuild. If you check the CloudFormation console, you see that a new template has been deployed as part of the QADeploy stage.

You can also visit the EMR console and view how the LiveTest stage submitted steps to the EMR cluster.

After performing the review, manually approve the revision on the LiveTestApproval stage by using the AWS CodePipeline console.

After the revision is approved, the pipeline proceeds to use a Lambda function that destroys the resources deployed in the QADeploy stage. Finally, it creates or updates a product and portfolio in AWS Service Catalog. After the final stage of the pipeline is complete, you can check that the product is created successfully on the AWS Service Catalog console.

You can check the product versions and notice that the first version is the initial commit performed by the CloudFormation template.

You can proceed to share the created portfolio with any users in your AWS account and allow them to deploy any version of the Spark application. You can also perform a commit on the AWS CodeCommit repository. The pipeline is triggered automatically and repeats the pipeline execution to deploy a new version of the product.

To destroy all of the resources created by the stack, first make sure that any stacks deployed through AWS Service Catalog or the QADeploy stage have been destroyed. Then, delete the pipeline stack using the following AWS CLI command:

aws cloudformation delete-stack --stack-name emr-spark-pipeline

Conclusion

You can use the sample template and Spark application shared in this post and adapt them for the specific needs of your own application. The pipeline can have as many stages as needed and it can be used to automatically deploy to AWS Service Catalog or a production environment using CloudFormation.

If you have questions or suggestions, please comment below.


Additional Reading

Learn how to implement authorization and auditing on Amazon EMR using Apache Ranger.

About the Authors

Luis Caro is a Big Data Consultant for AWS Professional Services. He works with our customers to provide guidance and technical assistance on big data projects, helping them improve the value of their solutions when using AWS.

Samuel Schmidt is a Big Data Consultant for AWS Professional Services. He works with our customers to provide guidance and technical assistance on big data projects, helping them improve the value of their solutions when using AWS.

Torrent Sites Suffer DDoS Attacks and Other Trouble

Post Syndicated from Ernesto original https://torrentfreak.com/torrent-sites-suffer-ddos-attacks-and-other-trouble-170901/

It’s not uncommon for torrent sites to suffer downtime due to technical issues. That happens pretty much every day.

But when close to a dozen large sites go offline, people start to ask questions.

This is exactly what happened this week. As reported previously, The Pirate Bay was hard to reach earlier, after a surge of traffic and a subsequent DDoS attack overloaded its servers. And they were not alone.

TorrentFreak spoke to several torrent site admins who noticed an increase in suspicious traffic which slowed down or toppled their sites, at least temporarily. While most have recovered, some sites remain offline today.

TorrentProject.se, one of the most used torrent search engines, has been down for nearly three days now. The site currently shows a “403 Forbidden” error message. Whether this is a harmless technical issue, the result of a DDoS attack, or worse, is unknown.

TorrentFreak reached out to the owner of the site but we have yet to hear back.

403 error

Another site that appears to be in trouble is WorldWideTorrents. This site, which was started after the KAT shutdown last year, is home to many comic book fans. However, over the past few days the site has become unresponsive.

Based on WHOIS data, the site’s domain name has been suspended. The name servers were changed to “suspended-domain.com,” which means that it’s unlikely to be reinstated. WorldWideTorrents will reportedly return with a new domain but which one is currently unknown.

Popular uploaders on the site such as Nemesis43, meanwhile, are still active on other sites.

WorldWideTorrents Whois

Then there’s also Isohunt.to, which has been unresponsive for over a week. The search engine, which launched in 2013 less than two weeks after isoHunt.com shut down, has now itself vanished.

With no word from the operators, we can only speculate what happened. The site has seen a sharp decline in traffic over the past year, so it could be that they simply lost interest.

Isohunt.to is not responding

Those who now search for IsoHunt on Google are instead pointed to isohunts.to, which is a scam site advising users to download a “binary client,” which is little more than an ad.

The above shows that the torrent ecosystem remains vulnerable. DDoS attacks and domain issues are nothing new, but after the shutdown of KAT, Torrentz, Extratorrent, and other giants, the remaining sites have to carry a larger burden.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Hard Drive Stats for Q2 2017

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hard-drive-failure-stats-q2-2017/

Backblaze Drive Stats Q2 2017

In this update, we’ll review the Q2 2017 and lifetime hard drive failure rates for all our current drive models. We also look at how our drive migration strategy is changing the drives we use and we’ll check in on our enterprise class drives to see how they are doing. Along the way we’ll share our observations and insights and as always we welcome your comments and critiques.

Since our last report for Q1 2017, we have added 635 additional hard drives to bring us to the 83,151 drives we’ll focus on. In Q1 we added over 10,000 new drives to the mix, so adding just 635 in Q2 seems “odd.” In fact, we added 4,921 new drives and retired 4,286 old drives as we migrated from lower density drives to higher density drives. We cover more about migrations later on, but first let’s look at the Q2 quarterly stats.

Hard Drive Stats for Q2 2017

We’ll begin our review by looking at the statistics for the period of April 1, 2017 through June 30, 2017 (Q2 2017). This table includes 17 different 3 ½” drive models that were operational during the indicated period, ranging in size from 3 to 8 TB.

Quarterly Hard Drive Failure Rates for Q2 2017

When looking at the quarterly numbers, remember to look for those drives with at least 50,000 drive days for the quarter. That works out to about 550 drives running the entire quarter. That’s a good sample size. If the sample size is below that, the failure rates can be skewed based on a small change in the number of drive failures.

As noted previously, we use the quarterly numbers to look for trends. So this time we’ve included a trend indicator in the table. The “Q2Q Trend” column is short for quarter-to-quarter trend, i.e. last quarter to this quarter. We can add, change, or delete trend columns depending on community interest. Let us know what you think in the comments.

Good Migrations

In Q2 we continued with our data migration program. For us, a drive migration means we intentionally remove a good drive from service and replace it with another drive. Drives that are removed via migrations are not counted as failed. Once they are removed they stop accumulating drive hours and other stats in our system.

There are three primary drivers for our migration program.

  1. Increase Storage Density – For example, in Q3 we replaced 3 TB drives with 8 TB drives, more than doubling the amount of storage in a given Storage Pod for the same footprint. The cost of electricity was nominally more with the 8 TB drives, but the increase in density more than offset the additional cost. For those interested you can read more about the cost of cloud storage here.
  2. Backblaze Vaults – Our Vault architecture has proven to be more cost effective over the past two years than using stand-alone Storage Pods. A major goal of the migration program is to have the entire Backblaze cloud deployed on the highly efficient and resilient Backblaze Vault architecture.
  3. Balancing the Load – With our Phoenix data center online and accepting data, we have migrated some systems to the Phoenix DC. Don’t worry, we didn’t put your data on a truck and drive it to Phoenix. We simply built new systems there and transferred the data from our Northern California DC. In the process, we are gaining valuable insights as we move towards being able to replicate data between the two data centers.

During Q2 we migrated the data on 155 systems, giving nearly 30 petabytes of data a new, more durable, place to call home. There are still 644 individual Storage Pods (Storage Pod Classics, as we call them) left to migrate to the Backblaze Vault architecture.

Just in case you don’t know, a Backblaze Vault is a logical collection of 20 beefy Storage Pods (not Classics). Using our own Reed-Solomon erasure coding library, data is spread out across the 20 Pods into 17 data shards and 3 parity shards. The data and parity shards of each arriving data blob can be stored on different Storage Pods in a given Backblaze Vault.

Lifetime Hard Drive Failure Rates for Current Drives

The table below shows the failure rates for the hard drive models we had in service as of June 30, 2017. This is over the period beginning in April 2013 and ending June 30, 2017. If you are interested in the hard drive failure rates for all the hard drives we’ve used over the years, please refer to our 2016 hard drive review.

Cumulative Hard Drive Failure Rates

Enterprise vs Consumer Drives

We added 3,595 enterprise-class 8 TB drives in Q2, bringing our total to 6,054 drives. You may be tempted to compare the failure rates of the 8 TB enterprise drive (model: ST8000NM0055) to the consumer 8 TB drive (model: ST8000DM002), and conclude the enterprise drives fail at a higher rate. Let’s not jump to that conclusion yet, as the average operational age of the enterprise drives is only 2.11 months.

There are some insights we can gain from the current data. The enterprise drives have 363,282 drive days and an annualized failure rate of 1.61%. If we look back at our data, we find that as of Q3 2016, the 8 TB consumer drives had 422,263 drive days with an annualized failure rate of 1.60%. That means that when both drive models had a similar number of drive days, they had nearly the same annualized failure rate. There are no conclusions to be made here, but the observation is worth considering as we gather data for our comparison.

Next quarter, we should have enough data to compare the 8 TB drives, but by then the 8 TB drives could be “antiques.” In the next week or so, we’ll be installing 12 TB hard drives in a Backblaze Vault. Each 60-drive Storage Pod in the Vault would have 720 TB of storage available and a 20-pod Backblaze Vault would have 14.4 petabytes of raw storage.

Better Late Than Never

Sorry for being a bit late with the hard drive stats report this quarter. We were ready to go last week, then this happened. Some folks here thought that was more important than our Q2 Hard Drive Stats. Go figure.

Drive Stats at the Storage Developers Conference

We will be presenting at the Storage Developers Conference in Santa Clara on Monday September 11th at 8:30am. We’ll be reviewing our drive stats along with some interesting observations from the SMART stats we also collect. The conference is the leading event for technical discussions and education on the latest storage technologies and standards. Come join us.

The Data For This Review

If you are interested in the data from the two tables in this review, you can download an Excel spreadsheet containing the two tables. Note: the domain for this download will be f001.backblazeb2.com.

You also can download the entire data set we use for these reports from our Hard Drive Test Data page. You can download and use this data for free for your own purposes. All we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone. It is free.
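
As a starting point, here is a short, hypothetical example of computing annualized failure rates per model from the daily CSV files in that data set. It assumes the day-level layout described on the data page (date, serial_number, model, capacity_bytes, failure, plus SMART columns) and a local folder name of your own choosing:

import glob
import pandas as pd

# One row per operational drive per day, so the row count per model equals drive days.
frames = [pd.read_csv(path, usecols=["date", "model", "failure"])
          for path in glob.glob("drive_stats_q2_2017/*.csv")]  # hypothetical folder name
daily = pd.concat(frames, ignore_index=True)

stats = daily.groupby("model").agg(drive_days=("failure", "size"),
                                   failures=("failure", "sum"))
# Annualized failure rate: failures divided by drive days, scaled to a year.
stats["afr_pct"] = stats["failures"] / stats["drive_days"] * 365 * 100
print(stats.sort_values("afr_pct", ascending=False).head(10))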

Good luck, and let us know if you find anything interesting.

The post Hard Drive Stats for Q2 2017 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

VMware Cloud on AWS – Now Available

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/vmware-cloud-on-aws-now-available/

Last year I told you about the work that we are doing with our friends at VMware to build the VMware Cloud on AWS. As I shared at the time, this is a native, fully-managed offering that runs the VMware SDDC stack directly on bare-metal AWS infrastructure that maintains the elasticity and security customers have come to expect. This allows you to benefit from the scalability and resiliency of AWS, along with the networking and system-level hardware features that are fundamental parts of our security-first architecture.

VMware Cloud on AWS allows you to take advantage of what you already know and own. Your existing skills, your investment in training, your operational practices, and your investment in software licenses remain relevant and applicable when you move to the public cloud. As part of that move you can forget about building & running data centers, modernizing hardware, and scaling to meet transient or short-term demand. You can also take advantage of a long list of AWS compute, database, analytics, IoT, AI, security, mobile, deployment and application services.

Initial Availability
After incorporating feedback from many customers and partners in our Early Access beta program, today at VMworld, VMware and Amazon announced the initial availability of VMware Cloud on AWS. This service is initially available in the US West (Oregon) region through VMware and members of the VMware Partner Network. It is designed to support popular use cases such as data center extension, application development & testing, and application migration.

This offering is sold, delivered, supported, and billed by VMware. It supports custom-sized VMs, runs any OS that is supported by VMware, and makes use of single-tenant bare-metal AWS infrastructure so that you can bring your Windows Server licenses to the cloud. Each SDDC (Software-Defined Data Center) consists of 4 to 16 instances, each with 36 cores, 512 GB of memory, and 15.2 TB of NVMe storage. Clusters currently run in a single AWS Availability Zone (AZ) with support in the works for clusters that span AZs. You can spin up an entire VMware SDDC in a couple of hours, and scale host capacity up and down in minutes.

The NSX networking platform (powered by the AWS Elastic Networking Adapter running at up to 25 Gbps) supports multicast traffic, separate networks for management and compute, and IPSec VPN tunnels to on-premises firewalls, routers, and so forth.

Here’s an overview to show you how all of the parts fit together:

The VMware and third-party management tools (vCenter Server, PowerCLI, the vRealize Suite, and code that calls the vSphere API) that you use today will work just fine when you build a hybrid VMware environment that combines your existing on-premises resources and those that you launch in AWS. This hybrid environment will use a new VMware Hybrid Linked Mode to create a single, unified view of your on-premises and cloud resources. You can use familiar VMware tools to manage your applications, without having to purchase any new or custom hardware, rewrite applications, or modify your operating model.

Your applications and your code can access the full range of AWS services (the database, analytical, and AI services are a good place to start). Use of these services is billed separately and you’ll need to create an AWS account.

Learn More at VMworld
If you are attending VMworld in Las Vegas, please be sure to check out some of the 90+ AWS sessions.

Also, be sure to stop by booth #300 and say hello to my colleagues from the AWS team.

In the Works
Our teams have come a long way since last year, but things are just getting revved up!

VMware and AWS are continuing to invest to enable support for new capabilities and use cases, such as application migration, data center expansion, and application test and development. Work is under way to add additional AWS regions, support more use cases such as disaster recovery and data center consolidation, add certifications, and enable even deeper integration with AWS services.

Jeff;

Live Mayweather v McGregor Streams Will Thrive On Torrents Tonight

Post Syndicated from Andy original https://torrentfreak.com/live-mayweather-v-mcgregor-streams-will-thrive-on-torrents-tonight-170826/

Tonight, August 26, at the T-Mobile Arena in Las Vegas, Floyd Mayweather Jr. will finally meet UFC lightweight champion Conor McGregor in what is being billed as the biggest fight in boxing history.

Although tickets for inside the arena are still available for those with a lot of money to burn, most fans will be viewing on a screen of some kind, whether that’s in a cinema, sports bar, or at home in front of a TV.

The fight will be available on Showtime in the United States but the promoters also say they’ve done their best to make it accessible to millions of people in dozens of countries, with varying price tags dependent on region. Nevertheless, due to generally high prices, it’s likely that untold thousands around the world will attempt to watch the fight without paying.

That will definitely be possible. Although Showtime has won a pre-emptive injunction to stop some sites offering the fight, many hundreds of others are likely to fill in the gaps, offering generally lower-quality streams to the eager masses. Whether all of these sites will be able to cope with what could be unprecedented demand will remain to be seen, but there is one method that will thrive under the pressure.

Torrent technology is best known for offering content after it’s aired, whether that’s the latest episode of Game of Thrones or indeed a recording of the big fight scheduled for the weekend. However, what most ‘point-and-click’ file-sharers won’t know is that there’s a torrent-based technology that offers live sporting events week in, week out.

Without going into too many technical details, AceStream / Ace Player HD is a torrent-powered media player built on top of the ever-popular VLC. It’s available on Windows, Android and Linux, costs nothing to install, and is incredibly easy to use.

Where regular torrent clients handle both .torrent files and magnet links, AceStream relies on an AceStream Content ID to find streams to play instead. This ID is a hash value (similar to one seen in magnet links, but prefaced with ‘acestream://’) which relates to the stream users want to view.

Once found, these can be copied to the user’s clipboard and pasted into the ‘Open Ace Stream Content ID’ section of the player’s file menu. Click ‘play’ and it’s done – it really is that simple.

AceStream is simplicity itself

Of course, any kind of content – both authorized and unauthorized – can be streamed and shared using AceStream and there are hundreds of live channels available, some in very high quality, 24/7. Inevitably, however, there’s quite an emphasis on premium content from sports broadcasters around the world, with fresh links to content shared on a daily basis.

The screenshot below shows a typical AceStream Content ID indexing site, with channels on the left, AceStream Content IDs in the center, plus language and then stream speed on the far right. (Note: TF has redacted the links since many will still be live at time of publication)

A typical AceStream Content ID listing

While streams of most major TV channels are relatively easy to find, specialist channels showing PPV events are a little bit more difficult to discover. For those who know where to look, however, the big fight will be only a cut-and-paste away and in much better quality than that found on most web-based streaming portals.

All that being said, for torrent enthusiasts the magic lies in the ability of the technology to adapt to surging demand. While websites and streams wilt under the load Saturday night, it’s likely that AceStream streams will thrive under the pressure, with viewers (downloaders/streamers) also becoming distributors (uploaders) to others watching the event unfold.

With this in mind, it’s worth noting that while AceStream is efficient and resilient, using it to watch infringing content is illegal in most regions, since simultaneous uploading also takes place. Still, that’s unlikely to frighten away enthusiasts, who will already be aware of the risks and behind a VPN.

Ace Streams do have an Achilles heel though. Unlike a regular torrent swarm, where the initial seeder can disappear once a full copy of the movie or TV show is distributed around other peers, AceStreams are completely reliant on the initial stream seeder at all times. If he or she disappears, the live stream dies and it is all over. For this reason, people looking to stream often have a couple of extra stream hashes standing by.

But for big fans (who also have the money to spend, of course), the decision to pirate rather than pay is one not to be taken lightly. The fight will be a huge spectacle that will probably go down in history as the biggest combat sports event of all time. If streams go down early, that moment will be gone forever, so forget telling your kids about the time you watched McGregor knock out Mayweather in Round Two.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

From Data Lake to Data Warehouse: Enhancing Customer 360 with Amazon Redshift Spectrum

Post Syndicated from Dylan Tong original https://aws.amazon.com/blogs/big-data/from-data-lake-to-data-warehouse-enhancing-customer-360-with-amazon-redshift-spectrum/

Achieving a 360° view of your customer has become increasingly challenging as companies embrace omni-channel strategies, engaging customers across websites, mobile, call centers, social media, physical sites, and beyond. The promise of a web where online and physical worlds blend makes understanding your customers more challenging, but also more important. Businesses that are successful in this medium have a significant competitive advantage.

The big data challenge requires the management of data at high velocity and volume. Many customers have identified Amazon S3 as a great data lake solution that removes the complexities of managing a highly durable, fault-tolerant data lake infrastructure at scale, economically.

AWS data services substantially lessen the heavy lifting of adopting technologies, allowing you to spend more time on what matters most—gaining a better understanding of customers to elevate your business. In this post, I show how a recent Amazon Redshift innovation, Redshift Spectrum, can enhance a customer 360 initiative.

Customer 360 solution

A successful customer 360 view benefits from using a variety of technologies to deliver different forms of insights. These could range from real-time analysis of streaming data from wearable devices and mobile interactions to historical analysis that requires interactive, on-demand queries on billions of transactions. In some cases, insights can only be inferred through AI via deep learning. Finally, the value of your customer data and insights can’t be fully realized until they are operationalized at scale and readily accessible by fleets of applications. Companies are leveraging AWS for the breadth of services that cover these domains to drive their data strategy.

A number of AWS customers stream data from various sources into a S3 data lake through Amazon Kinesis. They use Kinesis and technologies in the Hadoop ecosystem like Spark running on Amazon EMR to enrich this data. High-value data is loaded into an Amazon Redshift data warehouse, which allows users to analyze and interact with data through a choice of client tools. Redshift Spectrum expands on this analytics platform by enabling Amazon Redshift to blend and analyze data beyond the data warehouse and across a data lake.

The following diagram illustrates the workflow for such a solution.

This solution delivers value by:

  • Reducing complexity and time to value for deeper insights. For instance, an existing data model in Amazon Redshift may provide insights across dimensions such as customer, geography, time, and product on metrics from sales and financial systems. Down the road, you may gain access to streaming data sources like customer-care call logs and website activity that you want to blend in with the sales data on the same dimensions to understand how web and call center experiences may be correlated with sales performance. Redshift Spectrum can join these dimensions in Amazon Redshift with data in S3 to allow you to quickly gain new insights, and avoid the slow and more expensive alternative of fully integrating these sources with your data warehouse.
  • Providing an additional avenue for optimizing costs and performance. In cases like call logs and clickstream data where volumes could be many TBs to PBs, storing the data exclusively in S3 yields significant cost savings. Interactive analysis on massive datasets may now be economically viable in cases where data was previously analyzed periodically through static reports generated by inexpensive batch processes. In some cases, you can improve the user experience while simultaneously lowering costs. Spectrum is powered by a large-scale infrastructure external to your Amazon Redshift cluster, and excels at scanning and aggregating large volumes of data. For instance, your analysts may be performing data discovery on customer interactions across millions of consumers over years of data across various channels. On this large dataset, certain queries could be slow if you didn’t have a large Amazon Redshift cluster. Alternatively, you could use Redshift Spectrum to achieve a better user experience with a smaller cluster.

Proof of concept walkthrough

To make evaluation easier for you, I’ve conducted a Redshift Spectrum proof-of-concept (PoC) for the customer 360 use case. For those who want to replicate the PoC, the instructions, AWS CloudFormation templates, and public data sets are available in the GitHub repository.

The remainder of this post is a journey through the project, observing best practices in action, and learning how you can achieve business value. The walkthrough involves:

  • An analysis of performance data from the PoC environment involving queries that demonstrate blending and analysis of data across Amazon Redshift and S3. Observe that great results are achievable at scale.
  • Guidance by example on query tuning, design, and data preparation to illustrate the optimization process. This includes tuning a query that combines clickstream data in S3 with customer and time dimensions in Amazon Redshift, and aggregates ~1.9 B out of 3.7 B+ records in under 10 seconds with a small cluster!
  • Guidance and measurements to help assess deciding between two options: accessing and analyzing data exclusively in Amazon Redshift, or using Redshift Spectrum to access data left in S3.

Stream ingestion and enrichment

The focus of this post isn’t stream ingestion and enrichment on Kinesis and EMR, but be mindful of performance best practices on S3 to ensure good streaming and query performance:

  • Use random object keys: The data files provided for this project are prefixed with SHA-256 hashes to prevent hot partitions. This is important to ensure optimal request rates, because S3 has to support PUT requests from the incoming stream as well as queries from large Amazon Redshift clusters, which can issue a large number of parallel GET requests (a short upload sketch follows this list).
  • Micro-batch your data stream: S3 isn’t optimized for small random write workloads, so your datasets should be micro-batched into large files. For instance, the provided “parquet-1” dataset batches more than 7 million records per file. The optimal file size for Redshift Spectrum is usually in the 100 MB to 1 GB range.
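
To make the first point concrete, here is a small, hypothetical sketch of how one micro-batched file could be written under a hash-randomized key, mirroring the SHA-256 prefixes used by the PoC datasets. The bucket name and key layout are placeholders rather than the PoC’s actual layout:

import hashlib
import io
import boto3

s3 = boto3.client("s3")

def upload_microbatch(batch: bytes, customer: int, year_month: int) -> str:
    """Write one large batch file under a hash-prefixed key to avoid hot partitions."""
    logical_key = f"clickstream/customer={customer}/visitYearMonth={year_month}/batch.csv"
    prefix = hashlib.sha256(logical_key.encode()).hexdigest()[:8]
    key = f"{prefix}/{logical_key}"
    s3.upload_fileobj(io.BytesIO(batch), "my-clickstream-bucket", key)  # placeholder bucket
    return key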

If you have an edge case that may pose scalability challenges, AWS would love to hear about it. For further guidance, talk to your solutions architect.

Environment

The project consists of the following environment:

  • Amazon Redshift cluster: 4 X dc1.large
  • Data:
    • Time and customer dimension tables are stored on all Amazon Redshift nodes (ALL distribution style):
      • The data originates from the DWDATE and CUSTOMER tables in the Star Schema Benchmark
      • The customer table contains attributes for 3 million customers.
      • The time data is at the day-level granularity, and spans 7 years, from the start of 1992 to the end of 1998.
    • The clickstream data is stored in an S3 bucket, and serves as a fact table.
      • Various copies of this dataset in CSV and Parquet format have been provided, for reasons to be discussed later.
      • The data is a modified version of the uservisits dataset from AMPLab’s Big Data Benchmark, which was generated by Intel’s Hadoop benchmark tools.
      • Changes were minimal, so that existing test harnesses for this test can be adapted:
        • Increased the 751,754,869-row dataset 5X to 3,758,774,345 rows.
        • Added surrogate keys to support joins with customer and time dimensions. These keys were distributed evenly across the entire dataset to represent user visits from six customers over seven years.
        • Values for the visitDate column were replaced to align with the 7-year timeframe, and the added time surrogate key.

Queries across the data lake and data warehouse 

Imagine a scenario where a business analyst plans to analyze clickstream metrics like ad revenue over time and by customer, market segment and more. The example below is a query that achieves this effect: 

The query part highlighted in red retrieves clickstream data in S3, and joins the data with the time and customer dimension tables in Amazon Redshift through the part highlighted in blue. The query returns the total ad revenue for three customers over the last three months, along with info on their respective market segment.

Unfortunately, this query takes around three minutes to run, and doesn’t enable the interactive experience that you want. However, there’s a number of performance optimizations that you can implement to achieve the desired performance.

Performance analysis

Two key utilities provide visibility into Redshift Spectrum:

  • EXPLAIN
    Provides the query execution plan, which includes info around what processing is pushed down to Redshift Spectrum. Steps in the plan that include the prefix S3 are executed on Redshift Spectrum. For instance, the plan for the previous query has the step “S3 Seq Scan clickstream.uservisits_csv10”, indicating that Redshift Spectrum performs a scan on S3 as part of the query execution.
  • SVL_S3QUERY_SUMMARY
    Statistics for Redshift Spectrum queries are stored in this table. While the execution plan presents cost estimates, this table stores actual statistics for past query runs.

You can get the statistics of your last query by inspecting the SVL_S3QUERY_SUMMARY table with the condition (query = pg_last_query_id()). Inspecting the previous query reveals that the entire dataset of nearly 3.8 billion rows was scanned to retrieve less than 66.3 million rows. Improving scan selectivity in your query could yield substantial performance improvements.
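
As an illustration, these statistics can be pulled programmatically in the same session that ran the query. The sketch below assumes a psycopg2 connection to your cluster and a query saved in a local file; the connection details and file name are placeholders, and the column names follow the SVL_S3QUERY_SUMMARY documentation:

import psycopg2

# Placeholder connection details for your own cluster.
conn = psycopg2.connect(host="my-cluster.example.us-east-1.redshift.amazonaws.com",
                        port=5439, dbname="dev", user="analyst", password="...")

with conn, conn.cursor() as cur:
    cur.execute(open("customer360_query.sql").read())  # hypothetical file holding the query under test
    cur.fetchall()

    # Same session, so pg_last_query_id() still refers to the query above.
    cur.execute("""
        SELECT segment, step, s3_scanned_rows, s3query_returned_rows
        FROM svl_s3query_summary
        WHERE query = pg_last_query_id()
        ORDER BY segment, step
    """)
    for segment, step, scanned, returned in cur.fetchall():
        print(f"segment {segment}, step {step}: {scanned:,} rows scanned, {returned:,} returned")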

Partitioning

Partitioning is a key means to improving scan efficiency. In your environment, the data and tables have already been organized, and configured to support partitions. For more information, see the PoC project setup instructions. The clickstream table was defined as:

CREATE EXTERNAL TABLE clickstream.uservisits_csv10
…
PARTITIONED BY(customer int4, visitYearMonth int4)

The entire 3.8 billion-row dataset is organized as a collection of large files where each file contains data exclusive to a particular customer and month in a year. This allows you to partition your data into logical subsets by customer and year/month. With partitions, the query engine can target a subset of files (a registration sketch follows the list below):

  • Only for specific customers
  • Only data for specific months
  • A combination of specific customers and year/months

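Partition metadata has to be registered with the external table before the engine can prune by it; the PoC setup scripts take care of this for you. Purely as a sketch, generating the ALTER TABLE statements for the six customers and 84 months might look like the following, where the S3 location is a placeholder rather than the PoC’s actual layout:

# Generate ADD PARTITION statements for 6 customers x 84 months (1992-01 through 1998-12).
BUCKET = "s3://my-clickstream-bucket/uservisits_csv10"  # placeholder location

statements = []
for customer in range(1, 7):
    for year in range(1992, 1999):
        for month in range(1, 13):
            year_month = year * 100 + month
            statements.append(
                "ALTER TABLE clickstream.uservisits_csv10 "
                f"ADD PARTITION (customer={customer}, visitYearMonth={year_month}) "
                f"LOCATION '{BUCKET}/customer={customer}/visitYearMonth={year_month}/';"
            )

print(len(statements), "partitions")  # 6 customers x 84 months = 504
print(statements[0])
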
You can use partitions in your queries. Instead of joining your customer data on the surrogate customer key (that is, c.c_custkey = uv.custKey), use the partition key “customer”:

SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, SUM(uv.adRevenue)
…
ON c.c_custkey = uv.customer
…
ORDER BY c.c_name, c.c_mktsegment, uv.yearMonthKey  ASC

This query should run approximately twice as fast as the previous query. If you look at the statistics for this query in SVL_S3QUERY_SUMMARY, you see that only half the dataset was scanned. This is expected because your query is on three out of six customers on an evenly distributed dataset. However, the scan is still inefficient, and you can benefit from using your year/month partition key as well:

SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, SUM(uv.adRevenue)
…
ON c.c_custkey = uv.customer
…
ON uv.visitYearMonth = t.d_yearmonthnum
…
ORDER BY c.c_name, c.c_mktsegment, uv.visitYearMonth ASC

All joins between the tables are now using partitions. Upon reviewing the statistics for this query, you should observe that Redshift Spectrum scans and returns the exact number of rows, 66,270,117. If you run this query a few times, you should see execution time in the range of 8 seconds, which is a 22.5X improvement on your original query!

Predicate pushdown and storage optimizations 

Previously, I mentioned that Redshift Spectrum performs processing through large-scale infrastructure external to your Amazon Redshift cluster. It is optimized for performing large scans and aggregations on S3. In fact, Redshift Spectrum may even out-perform a medium size Amazon Redshift cluster on these types of workloads with the proper optimizations. There are two important variables to consider for optimizing large scans and aggregations:

  • File size and count. As a general rule, use files 100 MB-1 GB in size, as Redshift Spectrum and S3 are optimized for reading objects of this size. However, the number of files a query operates on is directly correlated with the parallelism achievable by that query. There is an inverse relationship between file size and count: the bigger the files, the fewer files there are for the same dataset. Consequently, there is a trade-off between optimizing for object read performance and the amount of parallelism achievable on a particular query. Large files are best for large scans, as the query likely operates on a sufficiently large number of files. For queries that are more selective and that operate on fewer files, you may find that smaller files allow for more parallelism.
  • Data format. Redshift Spectrum supports various data formats. Columnar formats like Parquet can sometimes lead to substantial performance benefits by providing compression and more efficient I/O for certain workloads. Generally, formats like Parquet should be used for query workloads involving large scans and high attribute selectivity. Again, there are trade-offs, as formats like Parquet require more compute power to process than plaintext. For queries on smaller subsets of data, the I/O efficiency benefit of Parquet is diminished; at some point, Parquet may perform the same as, or slower than, plaintext. Latency, compression rates, and the trade-off between user experience and cost should drive your decision.

To help illustrate how Redshift Spectrum performs on these large aggregation workloads, run a basic query that aggregates the entire ~3.7 billion record dataset on Redshift Spectrum, and compare that with running the query exclusively on Amazon Redshift:

SELECT uv.custKey, COUNT(uv.custKey)
FROM <your clickstream table> as uv
GROUP BY uv.custKey
ORDER BY uv.custKey ASC

For the Amazon Redshift test case, the clickstream data is loaded, and distributed evenly across all nodes (even distribution style) with optimal column compression encodings prescribed by Amazon Redshift’s ANALYZE COMPRESSION command.

The Redshift Spectrum test case uses a Parquet data format with each file containing all the data for a particular customer in a month. This results in files mostly in the range of 220-280 MB, and in effect, is the largest file size for this partitioning scheme. If you run tests with the other datasets provided, you see that this data format and size is optimal and out-performs others by ~60X. 

Performance differences will vary depending on the scenario. The important takeaway is to understand the testing strategy and the workload characteristics where Redshift Spectrum is likely to yield performance benefits. 

The following chart compares the query execution time for the two scenarios. The results indicate that you would have to pay for 12 X DC1.Large nodes to get performance comparable to using a small Amazon Redshift cluster that leverages Redshift Spectrum. 

Chart showing simple aggregation on ~3.7 billion records

So you’ve validated that Spectrum excels at performing large aggregations. Could you benefit by pushing more work down to Redshift Spectrum in your original query? It turns out that you can, by making the following modification:

The clickstream data is stored at a day-level granularity for each customer while your query rolls up the data to the month level per customer. In the earlier query that uses the year/month partition key, you optimized the query so that it only scans and retrieves the data required, but the day-level data is still sent back to your Amazon Redshift cluster for joining and aggregation. The query shown here pushes aggregation work down to Redshift Spectrum, as indicated by the query plan:

In this query, Redshift Spectrum aggregates the clickstream data to the month level before it is returned to the Amazon Redshift cluster and joined with the dimension tables. This query should complete in about 4 seconds, which is roughly twice as fast as only using the partition key. The speed increase is evident upon reviewing the SVL_S3QUERY_SUMMARY table:

  • Bytes scanned is 21.6X less because of the Parquet data format.
  • Only 90 records are returned back to the Amazon Redshift cluster as a result of the push-down, instead of ~66.2 million, leading to substantially less join overhead, and about 530 MB less data sent back to your cluster.
  • No adverse change in average parallelism.

Assessing the value of Amazon Redshift vs. Redshift Spectrum

At this point, you might be asking yourself, why would I ever not use Redshift Spectrum? Well, you still get additional value for your money by loading data into Amazon Redshift, and querying in Amazon Redshift vs. querying S3.

In fact, it turns out that the last version of our query runs even faster when executed exclusively in native Amazon Redshift, as shown in the following chart:

Chart comparing Amazon Redshift vs. Redshift Spectrum with pushdown aggregation over 3 months of data

As a general rule, queries that aren’t dominated by I/O and that involve multiple joins are better optimized in native Amazon Redshift. For instance, the performance difference between running the partition key query entirely in Amazon Redshift versus with Redshift Spectrum is twice as large as that of the pushdown aggregation query, partly because the former case benefits more from better join performance.

Furthermore, the variability in latency in native Amazon Redshift is lower. For use cases where you have tight performance SLAs on queries, you may want to consider using Amazon Redshift exclusively to support those queries.

On the other hand, when you perform large scans, you could benefit from the best of both worlds: higher performance at lower cost. For instance, imagine that you wanted to enable your business analysts to interactively discover insights across a vast amount of historical data. In the example below, the pushdown aggregation query is modified to analyze seven years of data instead of three months:

SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, uv.totalRevenue
…
WHERE customer <= 3 and visitYearMonth >= 199201
… 
FROM dwdate WHERE d_yearmonthnum >= 199201) as t
…
ORDER BY c.c_name, c.c_mktsegment, uv.visitYearMonth ASC

This query requires scanning and aggregating nearly 1.9 billion records. As shown in the chart below, Redshift Spectrum substantially speeds up this query. A large Amazon Redshift cluster would have to be provisioned to support this use case. With the aid of Redshift Spectrum, you could use an existing small cluster, keep a single copy of your data in S3, and benefit from economical, durable storage while only paying for what you use via the pay per query pricing model.

Chart comparing Amazon Redshift vs. Redshift Spectrum with pushdown aggregation over 7 years of data

Summary

Redshift Spectrum lowers the time to value for deeper insights on customer data queries spanning the data lake and data warehouse. It can enable interactive analysis on datasets in cases that weren’t economically practical or technically feasible before.

There are cases where you can get the best of both worlds from Redshift Spectrum: higher performance at lower cost. However, there are still latency-sensitive use cases where you may want native Amazon Redshift performance. For more best practice tips, see the 10 Best Practices for Amazon Redshift post.

Please visit the Amazon Redshift Spectrum PoC Environment Github page. If you have questions or suggestions, please comment below.

Additional Reading

Learn more about how Amazon Redshift Spectrum extends data warehousing out to exabytes – no loading required.


About the Author

Dylan Tong is an Enterprise Solutions Architect at AWS. He works with customers to help drive their success on the AWS platform through thought leadership and guidance on designing well architected solutions. He has spent most of his career building on his expertise in data management and analytics by working for leaders and innovators in the space.

ROI is not a cybersecurity concept

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/08/roi-is-not-cybersecurity-concept.html

In the cybersecurity community, much time is spent trying to speak the language of business in order to communicate our problems to business leaders. One way we do this is by trying to adapt the concept of “return on investment” or “ROI” to explain why they need to spend more money. Stop doing this. It’s nonsense. ROI is a concept pushed by vendors in order to justify why you should pay money for their snake oil security products. Don’t play the vendor’s game.

The correct concept is simply “risk analysis”. Here’s how it works.

List out all the risks. For each risk, calculate:

  • How often it occurs.
  • How much damage it does.
  • How to mitigate it.
  • How effective the mitigation is (reduces chance and/or cost).
  • How much the mitigation costs.

If you have risk of something that’ll happen once-per-day on average, costing $1000 each time, then a mitigation costing $500/day that reduces likelihood to once-per-week is a clear win for investment.
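
The arithmetic behind that judgment is simple enough to write down. A minimal sketch using the numbers above (all figures are per day):

incident_cost = 1000.0        # damage per occurrence
baseline_rate = 1.0           # occurrences per day, before mitigation
mitigated_rate = 1.0 / 7.0    # once per week, after mitigation
mitigation_cost = 500.0       # cost of the mitigation per day

baseline_loss = baseline_rate * incident_cost                        # 1000.00
residual = mitigated_rate * incident_cost + mitigation_cost          # ~642.86
print(f"expected net benefit: ${baseline_loss - residual:.2f}/day")  # ~$357.14/day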

Now, ROI should in theory fit directly into this model. If you are paying $500/day to reduce that risk, I could use ROI to show you hypothetical products that will …

  • …reduce the remaining risk to once-per-month for an additional $10/day.
  • …replace that $500/day mitigation with a $400/day mitigation.

But this is never done. Companies don’t have a sophisticated enough risk matrix to plug in ROI numbers and reduce cost/risk. Instead, ROI is a calculation done standalone by a vendor pimping a product, or by a security engineer building an empire within the company.

If you haven’t done risk analysis to begin with (and almost none of you have), then ROI calculations are pointless.

But there are further problems. This is risk analysis as done in industries like oil and gas, which have inanimate risk. Almost all their risks are due to accidental failures, like in the Deepwater Horizon incident. In our industry, cybersecurity, risks are animate: they come from hackers. Our risk models are based on trying to guess what hackers might do.

An example of this problem is when our drug company jacks up the price of an HIV drug, Anonymous hackers will break in and dump all our financial data, and our CFO will go to jail. A lot of our risks come not from the technical side, but from the whims and fads of the hacker community.

Another example is when some Google researcher finds a vuln in WordPress, and our website gets hacked by that three months from now. We have to forecast not only what hackers can do now, but what they might be able to do in the future.

Finally, there is this problem with cybersecurity that we really can’t distinguish between pesky and existential threats. Take ransomware. A lot of large organizations have just gotten accustomed to wiping a few workers’ machines every day and restoring from backups. It’s a small, pesky problem of little consequence. Then one day a ransomware infection gets domain admin privileges and takes down the entire business for several weeks, as happened after #nPetya. Inevitably our risk models always come down on the high side of estimates, with us claiming that all threats are existential, when in fact most companies continue to survive major breaches.

These difficulties with risk analysis leads us to punting on the problem altogether, but that’s not the right answer. No matter how faulty our risk analysis is, we still have to go through the exercise.

One model of how to do this calculation is architecture. We know we need a certain number of toilets per building, even without doing ROI on the value of such toilets. The same is true for a lot of security engineering. We know we need firewalls, encryption, and OWASP hardening, even without a specific calculation. Passwords and session cookies need to go across SSL. That’s the starting point from which we analyze risks and mitigations, such as what we need beyond SSL.

So stop using “ROI”, or worse, the abomination “ROSI”. Start doing risk analysis.