
Looming Net Neutrality Repeal Sparks BitTorrent Throttling Fears

Post Syndicated from Ernesto original https://torrentfreak.com/looming-net-neutrality-repeal-sparks-bittorrent-throttling-fears-171123/

Ten years ago we uncovered that Comcast was systematically slowing down BitTorrent traffic to ease the load on its network.

The Comcast case ignited a broad discussion about net neutrality and provided the setup for the FCC’s Open Internet Order, which came into effect three years later.

This Open Internet Order then became the foundation of the net neutrality regulation that was adopted in 2015 and still applies today. The big change compared to the earlier attempt was that ISPs could now be regulated as common carriers under Title II.

These rules set a clear standard that prohibits ISPs from blocking or throttling “lawful” traffic and from engaging in paid prioritization. However, this may soon be over, as the FCC is determined to repeal them.

FCC head Ajit Pai recently told Reuters that the current rules are too restrictive and hinder competition and innovation, which is ultimately not in the best interests of consumers.

“The FCC will no longer be in the business of micromanaging business models and preemptively prohibiting services and applications and products that could be pro-competitive,” Pai said. “We should simply set rules of the road that let companies of all kinds in every sector compete and let consumers decide who wins and loses.”

This week the FCC released its final repeal draft (pdf), which was met with fierce resistance from the public and various large tech companies. They fear that, if the current net neutrality rules disappear, throttling and ‘fast lanes’ for some services will become commonplace.

This could also mean that BitTorrent traffic could become a target once again, with it being blocked or throttled across many networks, as The Verge just pointed out.

Blocking BitTorrent traffic would indeed become much easier if current net neutrality safeguards were removed. However, the FCC believes that the current “no-throttling rules are unnecessary to prevent the harms that they were intended to thwart,” such as blocking entire file transfer protocols.

Instead, the FCC notes that antitrust law, FTC enforcement of ISP commitments, and consumer expectations will prevent any unwelcome blocking. This, it points out, is also why ISPs adopted no-blocking policies even when they were not required to.

Indeed, when the DC Circuit Court of Appeals struck down the Open Internet Order in 2014, Comcast was quick to assure subscribers that it had no plans to start throttling torrents again. Still, that offers no guarantees for the future.

The FCC goes on to mention that the current net neutrality rules don’t prevent selective blocking. ISPs can already bypass them by offering “curated services,” which allows them to filter content on viewpoint grounds. Edge providers block content over “viewpoints” too, the FCC notes, citing Cloudflare’s termination of The Daily Stormer.

Net neutrality supporters see these explanations as weak excuses and have less trust in the self-regulating capacity of the ISP industry than the FCC does, calling for last-minute protests to stop the repeal.

For now it appears, however, that the FCC is unlikely to change its course, as Ars Technica reports.

While net neutrality concerns are legitimate, not that much will change for BitTorrent users.

As we’ve highlighted in the past, blocking pirate sites is already an option under the current rules. The massive copyright loophole made sure of that. Targeting all torrent traffic is even an option, in theory.

If net neutrality is indeed repealed next month, blocking or throttling BitTorrent traffic across the entire network will become easier, no doubt. For now, however, there are no signs that any ISPs plan to do so.

If they do, we will know soon enough. The FCC will require ISPs to be transparent under the new plan. They will have to disclose network management practices, blocking efforts, commercial prioritization, and the like.


Well-Architected Lens: Focus on Specific Workload Types

Post Syndicated from Philip Fitzsimons original https://aws.amazon.com/blogs/architecture/well-architected-lens-focus-on-specific-workload-types/

Customers have been building their innovations on AWS for over 11 years. During that time, our solutions architects have conducted tens of thousands of architecture reviews for our customers. In 2012 we created the “Well-Architected” initiative to share with you best practices for building in the cloud, and started publishing them in 2015. We recently released an updated Framework whitepaper, and a new Operational Excellence Pillar whitepaper to reflect what we learned from working with customers every day. Today, we are pleased to announce a new concept called a “lens” that allows you to focus on specific workload types from the well-architected perspective.

A well-architected review looks at a workload from a general technology perspective, which means it can’t provide workload-specific advice. For example, there are additional best practices when you are building high-performance computing (HPC) or serverless applications. Therefore, we created the concept of a lens to focus on what is different for those types of workloads.

In each lens, we document common scenarios we see — specific to that workload — providing reference architectures and a walkthrough. The lens also provides design principles to help you understand how to architect these types of workloads for the cloud, and questions for assessing your own architecture.

Today, we are releasing two lenses:

Well-Architected: High-Performance Computing (HPC) Lens
Well-Architected: Serverless Applications Lens

We expect to create more lenses over time, and evolve them based on customer feedback.

Philip Fitzsimons, Leader, AWS Well-Architected Team

Ares Kodi Project Calls it Quits After Hollywood Cease & Desist

Post Syndicated from Andy original https://torrentfreak.com/ares-kodi-project-calls-it-quits-after-hollywood-cease-desist-171117/

This week has been particularly bad for those involved in the Kodi addon scene. Following cease-and-desist notices from the MPA-led anti-piracy coalition Alliance for Creativity and Entertainment, several addon developers and repositories shut down.

With Columbia, Disney, Paramount, Twentieth Century Fox, Universal, Warner, Netflix, Amazon and Sky TV all lined up for war, the third-party developers had little choice but to quit. One of those affected was the leader of the hugely popular Ares Project, which quietly disappeared mid-week.

The Ares Wizard was an extremely popular and important piece of software which allowed people to switch Kodi builds, install third-party addons, install popular repositories, change system settings, and carry out backups. It’s installed on huge numbers of machines worldwide but it will soon fall into disrepair.

The mighty Ares Wizard in action

“[This week] I was subject to a hand-delivered notice to cease-and-desist from MPA & ACE,” Ares Project leader Tekto informs TorrentFreak.

“Given the notice, we obviously shut down the repo and wizard as requested.”

The news that Ares Project is done and never coming back will be a huge blow to the community. The project just celebrated its second birthday and has grown exponentially since it first arrived on the scene.

“Ares Project started in Oct 2015. Originally it was to be a tool to set up the video cache on Kodi correctly. However, many ideas were thrown into the pot and it became a wee bit more; such as a wizard to install community provided builds, common addons and few other tweaks and options,” Tekto says.

“For my own part I started blogging earlier that year as part of a longer-term goal to be self-funding. I always disliked seeing begging bowls out to support ‘server’ costs, many of which were cheap £5-10 per month servers that were used to gain £100s in donations.

“The blog, via affiliate links and ads, could and would provide the funds to cover our hosting costs without resorting to begging for money every weekend.”

Intrigued by this first wave of actions by ACE in Europe, TorrentFreak asked for a copy of the MPA/ACE cease-and-desist notice but unfortunately, Tekto flat-out refused. All he would tell us is that he’d agreed not to give out any copies or screenshots and that he was adhering to that 100%.

That only leaves speculation as to what grounds the MPA/ACE cited for closing the project but to be fair, it doesn’t take much thought to find a direct comparison. Earlier this year, in the BREIN v Filmspeler case, the European Court of Justice (ECJ) ruled that selling “fully-loaded” Kodi boxes amounted to illegally communicating copyrighted content to the public.

With that in mind, it doesn’t take much of a leap to see how this ruling could also apply to someone distributing “fully-loaded” Kodi software builds or addons via a website. It had previously been considered a legal gray area, of course, and it was in that space that the Ares team believed it operated. After all, it took ECJ clarification for local courts in the Netherlands to be satisfied with the legal position.

“There was never any question that what we were doing was illegal. We didn’t and never have hosted any content, we always prevented discussions about illegal paid services, and never sold any devices, pre-loaded or otherwise. That used to be enough to occupy the ‘gray’ area which meant we were safe to develop our applications. That changed in 2017 as we were to discover,” Tekto notes.

Up until this week, things had been going extremely well for Ares, which was apparently oblivious to how the earlier ECJ ruling might affect its operation. In mid-2016, the group moved to its own support forum that attracted 100,000 signed-up members and 300,000 visitors every month.

“This was quite an achievement in terms of viral marketing but ultimately this would become part of our downfall,” Tekto says.

“The recent innovation of the ‘basket driven’ Ares Portal system seems to have triggered the legal move to shut the project down completely. This simple system gave access to hundreds of add-ons. The system removed the need for builds, blogs and YouTubers – you just shopped on the site for addons and then installed them to your device with a simple 6 digit code.”

While Ares and Tekto still didn’t believe they were doing anything illegal (addons were linked, not hosted) it is now pretty clear to them that the previous gray area has been well and truly closed, at least as far as the MPA/ACE alliance is concerned. And with that in mind, the show is over. Done. Finished.

“We are not criminals or malicious hackers, we weren’t even careful about hiding our identities. You couldn’t meet a more ordinary bunch of folks in truth,” he says.

“There was never any question we would close our doors if what we were doing crossed any boundaries of legality. So with the notice served on us, we are closing our doors and removing all our websites and applications. It’s a sad day in many ways, but nobody wants to be facing court or a potential custodial sentence, for what is essentially a hobby.”

Finally, Tekto says that others like him might want to consider their positions carefully, before they too get a knock at the door. In the meantime, he gives thanks to the project’s supporters, who have remained loyal over the past two years.

“It just leaves me to thank our users for their support and step away from the Kodi scene,” he concludes.


Backing Up the Modern Enterprise with Backblaze for Business

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/endpoint-backup-solutions/

Endpoint backup diagram

Organizations of all types and sizes need reliable and secure backup. Whether they have as few as 3 or as many as 300,000 computer users, an organization’s computer data is a valuable business asset that needs to be protected.

Modern organizations are changing how they work and where they work, which brings new challenges to making sure that the company’s data assets are not only available, but secure. Larger organizations have IT departments that are prepared to address these needs, but in smaller and newer organizations the challenge often falls on office management, who might not be as prepared or knowledgeable to face a work environment undergoing dramatic changes.

Whether small or large, local or world-wide, for-profit or non-profit, organizations need a backup strategy and solution that matches the new ways of working in the enterprise.

The Enterprise Has Changed, and So Has Data Use

More and more, organizations are working in the cloud. These days organizations can operate just fine without their own file servers, database servers, mail servers, or other IT infrastructure that used to be standard for all but the smallest organization.

The reality is that for most organizations, though, it’s a hybrid work environment, with a combination of cloud-based and PC and Macintosh-based applications. Legacy apps aren’t going away any time soon. They will be with us for a while, with their accompanying data scattered amongst all the desktops, laptops and other endpoints in corporate headquarters, home offices, hotel rooms, and airport waiting areas.

In addition, the modern workforce likely combines regular full-time employees, remote workers, contractors, and sometimes interns, volunteers, and other temporary workers who also use company IT assets.

The Modern Enterprise Brings New Challenges for IT

These changes in how enterprises work present a problem for anyone tasked with making sure that data — no matter who uses it or where it lives — is adequately backed up. Cloud-based applications, when properly used and managed, can be adequately backed up, provided that users are connected to the internet and data transfers occur regularly — which is not always the case. But what about the data on the laptops, desktops, and devices used by remote employees, contractors, or just employees whose work keeps them on the road?

The organization’s backup solution must address all the needs of the modern organization or enterprise using both cloud and PC and Mac-based applications, and not be constrained by employee or computer location.

A Ten-Point Checklist for the Modern Enterprise for Backing Up

What should the modern enterprise look for when evaluating a backup solution?

1) Easy to deploy to workers’ computers

Whether installed by the computer user or an IT person locally or remotely, the backup solution must be easy to implement quickly with minimal demands on the user or administrator.

2) Fast and unobtrusive client software

Backups should happen in the background by efficient (native) PC and Macintosh software clients that don’t consume valuable processing power or take memory away from applications the user needs.

3) Easy to configure

The backup solutions must be easy to configure for both the user and the IT professional. Ease-of-use means less time to deploy, configure, and manage.

4) Defaults to backing up all valuable data

By default, the solution backs up commonly used files and folders or directories, including desktops. Some backup solutions are difficult and intimidating because they require that the user choose what needs to be backed up, often missing files and folders/directories that contain valuable data.

5) Works automatically in the background

Backups should happen automatically, no matter where the computer is located. The computer user, especially the remote or mobile one, shouldn’t be required to attach cables or drives, or remember to initiate backups. A working solution backs up automatically without requiring action by the user or IT administrator.

6) Data restores are fast and easy

Whether it’s a single file, directory, or an entire system that must be restored, a user or IT sysadmin needs to be able to restore backed up data as quickly as possible. In cases of large restores to remote locations, the ability to send a restore via physical media is a must.

7) No limitations on data

Throttling, caps, and data limits complicate backups and require guesses about how much storage space will be needed.

8) Safe & Secure

Organizations require that their data is secure during all phases of initial upload, storage, and restore.

9) Easy-to-manage

The backup solution needs to provide a clear and simple web management interface for all functions. Designing for ease-of-use leads to efficiency in management and operation.

10) Affordable and transparent pricing

Backup costs should be predictable, understandable, and without surprises.

Two Scenarios for the Modern Enterprise

Enterprises exist in many forms and types, but wanting to meet the above requirements is common across all of them. Below, we take a look at two common scenarios showing how enterprises face these challenges. Three case studies are available that provide more information about how Backblaze customers have succeeded in these environments.

Enterprise Profile 1

The needs of a smaller enterprise differ from those of larger, established organizations. This organization likely doesn’t have anyone who is devoted full-time to IT. The job of on-boarding new employees and getting them set up with a computer likely falls upon an executive assistant or office manager. This person might give new employees a checklist with the software and account information and let them handle setting up the computer themselves.

Organizations in this profile need solutions that are easy to install and require little to no configuration. Backblaze, by default, backs up all user data, which lets the organization be secure in knowing all the data will be backed up to the cloud — including files left on the desktop. Combined with Backblaze’s unlimited data policy, organizations have a truly “set it and forget it” platform.

Customizing Groups To Meet Teams’ Needs

The Groups feature of Backblaze for Business allows an organization to decide whether an individual client’s computer will be Unmanaged (backups and restores under the control of the worker), or Managed, in which an administrator can monitor the status and frequency of backups and handle restores should they become necessary. One group for the entire organization might be adequate at this stage, but the organization has the option to add additional groups as it grows and needs more flexibility and control.

The organization, of course, has the choice of managing and monitoring users using Groups. With Backblaze’s Groups, organizations can set user-based access rules, which allows the administrator to create restores for lost files or entire computers on an employee’s behalf, to centralize billing for all client computers in the organization, and to redeploy a recovered computer or new computer with the backed up data.

Restores

In this scenario, the decision has been made to let each user manage her own backups, including restores, if necessary, of individual files or entire systems. If a restore of a file or system is needed, the restore process is easy enough for the user to handle it by herself.

Case Study 1

Read about how PagerDuty uses Backblaze for Business in a mixed enterprise of cloud and desktop/laptop applications.

PagerDuty Case Study

In a common approach, the employee can retrieve an accidentally deleted file or an earlier version of a document on her own. The Backblaze for Business interface is easy to navigate and was designed with feedback from thousands of customers over the course of a decade.

In the event of a lost, damaged, or stolen laptop, administrators of Managed Groups can initiate the restore, which could be in the form of a download of a restore ZIP file from the web management console, or the overnight shipment of a USB drive directly to the organization or user.

Enterprise Profile 2

This profile is for an organization with a full-time IT staff. When a new worker joins the team, the IT staff is tasked with configuring the computer and delivering it to the new employee.

Backblaze for Business Groups

Case Study 2

Global charitable organization charity: water uses Backblaze for Business to back up workers’ and volunteers’ laptops as they travel to developing countries in their efforts to provide clean and safe drinking water.

charity: water Case Study

This organization can take advantage of additional capabilities in Groups. A Managed Group makes sense in an organization with a geographically dispersed work force as it lets IT ensure that workers’ data is being regularly backed up no matter where they are. Billing can be company-wide or assigned to individual departments or geographical locations. The organization has the choice of how to divide the organization into Groups (location, function, subsidiary, etc.) and whether the Group should be Managed or Unmanaged. Using Managed Groups might be suitable for most of the organization, but there are exceptions in which sensitive data might dictate using an Unmanaged Group, such as could be the case with HR, the executive team, or finance.

Deployment

By Invitation Email, Link, or Domain

Backblaze for Business allows a number of options for deploying the client software to workers’ computers. Client installation is fast and easy on both Windows and Macintosh, so sending email invitations to users or automatically enrolling users by domain or invitation link is a common approach.

By Remote Deployment

IT might choose to remotely and silently deploy Backblaze for Business across specific Groups or the entire organization. An administrator can silently deploy the Backblaze backup client via the command-line, or use common RMM (Remote Monitoring and Management) tools such as Jamf and Munki.

Restores

Case Study 3

Read about how Bright Bear Technology Solutions, an IT Managed Service Provider (MSP), uses the Groups feature of Backblaze for Business to manage customer backups and restores, deploy Backblaze licenses to their customers, and centralize billing for all their client-based backup services.

Bright Bear Case Study

Some organizations are better equipped to manage or assist workers when restores become necessary. Individual users will be pleased to discover they can roll back files to an earlier version if they wish, but IT will likely manage any complete system restore that involves reconfiguring a computer after a repair or requisitioning an entirely new system when needed.

This organization might choose to retain a client’s entire computer backup for archival purposes, using Backblaze B2 as the cloud storage solution. This is another advantage of having a cloud storage provider that combines both endpoint backup and cloud object storage among its services.

The Next Step: Server Backup & Data Archiving with B2 Cloud Storage

As organizations grow, they have increased needs for cloud storage beyond Macintosh and PC data backup. Backblaze’s object cloud storage, Backblaze B2, provides low-cost storage and archiving of records, media, and server data that can grow with the organization’s size and needs.

B2 Cloud Storage is available through the same Backblaze management console as Backblaze Computer Backup. This means that Admins have one console for billing, monitoring, deployment, and role provisioning. B2 is priced at 1/4 the cost of Amazon S3, or $0.005 per month per gigabyte (which equals $5/month per terabyte).

Why Modern Enterprises Chose Backblaze

Backblaze for Business

Businesses and organizations select Backblaze for Business for backup because Backblaze is designed to meet the needs of the modern enterprise. Backblaze customers are part of a platform that has a 10+ year track record of innovation and over 400 petabytes of customer data already under management.

Backblaze’s backup model is proven through head-to-head comparisons to back up data that other backup solutions overlook in their default configurations — including valuable files that are needed after an accidental deletion, theft, or computer failure.

Backblaze is the only enterprise-level backup company that provides TOTP (Time-based One-time Password) via both SMS and an authenticator app to all accounts at no incremental charge. At just $50/year/computer, Backblaze is affordable for any size of enterprise.

Modern Enterprises can Meet The Challenge of The Changing Data Environment

With the right backup solution and strategy, the modern enterprise will be prepared to ensure that its data is protected from accident, disaster, or theft, whether that data sits in one office or is dispersed among many locations and remote and mobile employees.

Backblaze for Business is an affordable solution that enables organizations to meet the evolving data demands facing the modern enterprise.


Just in Case You Missed It: Catching Up on Some Recent AWS Launches

Post Syndicated from Tara Walker original https://aws.amazon.com/blogs/aws/just-in-case-you-missed-it-catching-up-on-some-recent-aws-launches/

So many launches and cloud innovations that you simply may not believe. To catch up on some service launches and features, this post is a round-up of some cool releases that happened this summer and through the end of September.

The launches and features I want to share with you today are:

  • AWS IAM for Authenticating Database Users for RDS MySQL and Amazon Aurora
  • Amazon SES Reputation Dashboard
  • Amazon SES Open and Click Tracking Metrics
  • Serverless Image Handler by the Solutions Builder Team
  • AWS Ops Automator by the Solutions Builder Team

Let’s dive in, shall we!

AWS IAM for Authenticating Database Users for RDS MySQL and Amazon Aurora

Wished you could manage access to your Amazon RDS database instances and clusters using AWS IAM? Well, wish no longer. Amazon RDS has launched the ability for you to use IAM to manage database access for Amazon RDS for MySQL and Amazon Aurora DB.

What I like most about this new service feature is that it’s very easy to get started. To enable database user authentication using IAM, you simply select the Enable IAM DB Authentication checkbox when creating, modifying, or restoring your DB instance or cluster. You can enable IAM access using the RDS console, the AWS CLI, or the Amazon RDS API.

After configuring the database for IAM authentication, client applications authenticate to the database engine by providing temporary security credentials generated by the IAM Security Token Service. These credentials can be used instead of providing a password to the database engine.
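
As a rough sketch of that flow with boto3 and PyMySQL (the endpoint, user name, database, and CA bundle path below are placeholders, not values from the announcement), generating a token and using it in place of a password might look like this:

```python
import boto3
import pymysql  # any MySQL driver that supports TLS will do

# Placeholder values -- substitute your own endpoint, user, database, and region.
DB_HOST = "mydb.123456789012.us-east-1.rds.amazonaws.com"
DB_USER = "iam_app_user"      # a database user created for IAM authentication
DB_NAME = "appdb"
REGION = "us-east-1"

rds = boto3.client("rds", region_name=REGION)

# The token is a short-lived credential (roughly 15 minutes) used instead of a password.
token = rds.generate_db_auth_token(
    DBHostname=DB_HOST, Port=3306, DBUsername=DB_USER, Region=REGION
)

# TLS is required for IAM authentication; the CA bundle path here is an assumption --
# download the RDS certificate bundle and point to wherever you store it.
connection = pymysql.connect(
    host=DB_HOST,
    user=DB_USER,
    password=token,
    db=DB_NAME,
    port=3306,
    ssl={"ca": "rds-combined-ca-bundle.pem"},
)
```

Because the token expires quickly, it is generated per connection rather than stored alongside application configuration.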

You can learn more about using IAM to provide targeted permissions and authentication to MySQL and Aurora by reviewing the Amazon RDS user guide.

Amazon SES Reputation Dashboard

In order to help Amazon Simple Email Service customers utilize best practice guidelines for sending email, I am thrilled to announce we launched the Reputation Dashboard to provide comprehensive reporting on email sending health. To aid in proactively managing emails being sent, customers now have visibility into overall account health, sending metrics, and compliance or enforcement status.

The Reputation Dashboard will provide the following information:

  • Account status: A description of your account health status.
    • Healthy – No issues are currently impacting your account.
    • Probation – Your account is on probation; the issues causing probation must be resolved to prevent suspension.
    • Pending end of probation decision – Your account is on probation; an Amazon SES team member must review your account before further action.
    • Shutdown – Your account has been shut down. No email can be sent using Amazon SES.
    • Pending shutdown – Your account is on probation and the issues causing probation are unresolved.
  • Bounce Rate: The percentage of emails sent that have bounced, along with bounce rate status messages.
  • Complaint Rate: The percentage of emails sent that recipients have reported as spam, along with complaint rate status messages.
  • Notifications: Messages about other account reputation issues.
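
The dashboard itself is a console view, but if you want similar bounce and complaint rates programmatically, a rough approximation can be computed from the long-standing SES sending statistics API. This is only an illustrative sketch, not part of the new dashboard feature:

```python
import boto3

ses = boto3.client("ses", region_name="us-east-1")  # region is an assumption

# Each data point covers a 15-minute sending window over roughly the last two weeks.
points = ses.get_send_statistics()["SendDataPoints"]

attempts = sum(p["DeliveryAttempts"] for p in points)
bounces = sum(p["Bounces"] for p in points)
complaints = sum(p["Complaints"] for p in points)

if attempts:
    print("Bounce rate:    {:.2%}".format(bounces / attempts))
    print("Complaint rate: {:.2%}".format(complaints / attempts))
```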

Amazon SES Open and Click Tracking Metrics

Another exciting feature recently added to Amazon SES is support for Email Open and Click Tracking Metrics. With this feature, SES customers can now track when an email they’ve sent has been opened and when links within the email have been clicked. Using this SES feature will allow you to better track email campaign engagement and effectiveness.

How does this work?

When using the email open tracking feature, SES adds a transparent, miniature image to the emails that you choose to track. When the email is opened, the mail client loads that tracking image, which triggers an open event with Amazon SES. For email click (link) tracking, links in emails and/or email templates are replaced with a custom link. When the custom link is clicked, a click event is recorded in SES and the custom link redirects the email user to the link destination of the original email.

You can take advantage of the new open tracking and click tracking features by creating a new configuration set or altering an existing configuration set within SES. After choosing Amazon SNS, Amazon CloudWatch, or Amazon Kinesis Firehose as the AWS service to receive the open and click metrics, you only need to select that configuration set when sending to enable these new features for any emails you want to send.
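
As a minimal sketch of that setup with boto3 (the configuration set name, addresses, and message tag below are made up for illustration), you might create a configuration set that publishes open and click events to CloudWatch and then reference it when sending:

```python
import boto3

ses = boto3.client("ses", region_name="us-east-1")  # region is an assumption

CONFIG_SET = "campaign-tracking"  # hypothetical configuration set name

ses.create_configuration_set(ConfigurationSet={"Name": CONFIG_SET})

# Publish open and click events as CloudWatch metrics, dimensioned by a message tag.
ses.create_configuration_set_event_destination(
    ConfigurationSetName=CONFIG_SET,
    EventDestination={
        "Name": "cw-open-click",
        "Enabled": True,
        "MatchingEventTypes": ["open", "click"],
        "CloudWatchDestination": {
            "DimensionConfigurations": [
                {
                    "DimensionName": "campaign",
                    "DimensionValueSource": "messageTag",
                    "DefaultDimensionValue": "none",
                }
            ]
        },
    },
)

# Any email sent with this configuration set (from a verified sender) is now tracked.
ses.send_email(
    Source="sender@example.com",
    Destination={"ToAddresses": ["recipient@example.com"]},
    Message={
        "Subject": {"Data": "Hello"},
        "Body": {"Html": {"Data": '<p>Visit <a href="https://example.com">our site</a></p>'}},
    },
    Tags=[{"Name": "campaign", "Value": "fall-launch"}],
    ConfigurationSetName=CONFIG_SET,
)
```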

AWS Solutions: Serverless Image Handler & AWS Ops Automator

The AWS Solution Builder team has been hard at work helping to make it easier for you all to find answers to common architectural questions to aid in building and running applications on AWS. You can find these solutions on the AWS Answers page. Two new solutions released earlier this fall on AWS Answers are Serverless Image Handler and the AWS Ops Automator.

Serverless Image Handler was developed to provide a solution to help customers dynamically process, manipulate, and optimize the handling of images on the AWS Cloud. The solution combines Amazon CloudFront for caching, AWS Lambda to dynamically retrieve images and make image modifications, and an Amazon S3 bucket to store images. Additionally, the Serverless Image Handler leverages the open source image-processing suite, Thumbor, for additional image manipulation, processing, and optimization.
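
The packaged solution wraps Thumbor; purely as an illustrative stand-in, here is a rough sketch of such a Lambda handler using Pillow instead, with an assumed bucket name and query-string request shape, that fetches an image from S3, resizes it, and returns it for API Gateway and CloudFront to cache:

```python
import base64
import io

import boto3
from PIL import Image  # stand-in only; the actual solution uses Thumbor

s3 = boto3.client("s3")
BUCKET = "my-image-bucket"  # hypothetical bucket name


def handler(event, context):
    # Assumed request shape: GET /images?key=photo.jpg&width=300
    params = event.get("queryStringParameters") or {}
    key = params["key"]
    width = int(params.get("width", 300))

    # Fetch the original from S3 and resize it, preserving aspect ratio.
    body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    image = Image.open(io.BytesIO(body)).convert("RGB")
    image.thumbnail((width, width))

    buffer = io.BytesIO()
    image.save(buffer, format="JPEG")

    # API Gateway returns the binary payload, which CloudFront then caches.
    return {
        "statusCode": 200,
        "isBase64Encoded": True,
        "headers": {"Content-Type": "image/jpeg"},
        "body": base64.b64encode(buffer.getvalue()).decode("utf-8"),
    }
```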

The AWS Ops Automator solution helps you automate manual tasks, such as snapshot scheduling, using time-based or event-based triggers. It provides a framework for automated tasks and includes task audit trails, logging, resource selection, scaling, concurrency handling, task completion handling, and API request retries. The solution includes the following AWS services:

  • AWS CloudFormation: templates that launch the core framework of microservices and the solution-generated task configurations
  • Amazon DynamoDB: a table that stores task configuration data defining the event triggers and resources, and saves the results of actions and any errors
  • Amazon CloudWatch Logs: provides logging to track warning and error messages
  • Amazon SNS: a topic that sends the solution’s logging information to a subscribed email address

Have fun exploring and coding.

Tara

[$] From lab to libre software: how can academic software research become open source?

Post Syndicated from jake original https://lwn.net/Articles/737217/rss

Academics generate enormous amounts of software, some of which inspires
commercial innovations in networking and other areas. But little academic
software gets released to the public and even less enters common use. Is
some vast “dark matter” being overlooked in the academic community? Would
the world benefit from academics turning more of their software into free
and open projects?

Blockchain? It’s All Greek To Me…

Post Syndicated from Bozho original https://techblog.bozho.net/blockchain-its-all-greek-to-me/

The blockchain hype is huge, the ICO craze (“Coindike”) is generating millions if not billions of “funding” for businesses that claim to revolutionize basically anything.

I’ve been following all of that for a while. I got my first (and only) Bitcoin several years ago, I know how the technology works, I’ve implemented the data structure part, I’ve tried (with varying success) to install an Ethereum wallet since almost as soon as Ethereum appeared, and I’ve read and subscribed to newsletters about dozens of projects and new cryptocurrencies, including storj.io, siacoin, namecoin, etc. I would say I’m at least above average in terms of knowledge on how cryptocurrencies, blockchain, smart contracts, the EVM, and proof-of-whatever operate. And I’ve voiced my concerns about the technology in general.

Now it’s rant time.

I’ve been reading whitepapers of various projects, I’ve been to various meetups and talks, I’ve been reading the professed future applications of the blockchain, and I have to admit – it’s all Greek to me. I have no clue what these people are talking about. And why would all of that make any sense. I still think I’m not clever enough to understand the upcoming revolution, but there’s also a cynical side of me that says “this is all a scam”.

Why “X on the blockchain” somehow makes it magical and superior to a good old centralized solution? No, spare me the cliches about “immutable ledger”, “lack of central authority” and the likes. These are the phrases that a person learns after reading literally one article about blockchain. Have you actually written anything apart from a complex-sounding whitepaper or a hello-world smart contract? Do you really know how the overlay network works, how the economic incentives behind that network work, how all the cryptography works? Maybe there are many, many people that indeed know that and they know it better than me and are thus able to imagine the business case behind “X on the blockchain”.

I can’t. I can’t see why it would be useful to abandon a centralized database that you can query in dozens of ways, test easily and scale trivially in favour of a clunky write-only, low-throughput, hard-to-debug privacy nightmare that is any public blockchain. And how do you imagine gaining a substantial userbase with an ecosystem where the Windows client for the 2nd most popular blockchain (Ethereum) has been so buggy that I (a software engineer) couldn’t get it to work and sync the whole chain? And why would building a website on top of that clunky, user-unfriendly database have any benefit over a centralized competitor?

Do we all believe that the huge datacenters with guaranteed power backups, regular hardware and network checks, regular backups and, overall, guaranteed redundancy will somehow be beaten by a few thousand machines hosting a software that has the sole purpose of guaranteeing integrity? Bitcoin has 10 thousand nodes. Ethereum has 22 thousand nodes. And while these nodes are probably very well GPU-equipped, they aren’t supercomputers. Amazon’s AWS has a million servers. How’s that for a comparison? And why would anyone take seriously 22 thousand non-servers. Or even 220 thousand, if we believe in some inevitable growth.

Don’t get me wrong, the technology is really cool. The way tamper-evident data structures (hash chains) were combined with a consensus algorithm, an overlay network and a financial incentive is really awesome. When you add a distributed execution environment, it gets even cooler. But is it suitable for literally everything? I fail to see how.
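
As a reminder of how small that tamper-evident core actually is, a minimal hash chain (none of the consensus, networking, or incentive layers) fits in a few lines of Python:

```python
import hashlib
import json


def block_hash(block):
    # Hash the block's contents, including the previous block's hash,
    # so changing any earlier block invalidates every later one.
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, data):
    prev = chain[-1]["hash"] if chain else "0" * 64
    block = {"index": len(chain), "data": data, "prev_hash": prev}
    block["hash"] = block_hash(block)
    chain.append(block)


def verify(chain):
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else "0" * 64
        body = {k: v for k, v in block.items() if k != "hash"}
        if block["prev_hash"] != expected_prev or block["hash"] != block_hash(body):
            return False
    return True


chain = []
append_block(chain, "alice pays bob 1")
append_block(chain, "bob pays carol 1")
print(verify(chain))          # True
chain[0]["data"] = "tampered"
print(verify(chain))          # False -- tampering is evident
```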

I’m sure I’m missing something. The fact that many of those whitepapers sound increasingly like Greek to me might hint that I’m just a dumb developer and those enlightened people are really onto something huge. I guess time will tell.

But I happen to be living in a country that saw a transition to capitalism in the years of my childhood. And there were a lot of scams and ponzi schemes that people believed in. Because they didn’t know how capitalism works, how the market works. I’m seeing some similarities – we have no idea how the digital realm really works, and so a lot of scams are bound to appear, until we as a society learn the basics.

Until then – enjoy your ICO, enjoy your tokens, enjoy your big-player competitor with practically the same business model, only on a worse database.

And I hope that after the smoke of hype and fraud clears, we’ll be able to enjoy the true benefits of the blockchain innovation.


Introducing Gluon: a new library for machine learning from AWS and Microsoft

Post Syndicated from Ana Visneski original https://aws.amazon.com/blogs/aws/introducing-gluon-a-new-library-for-machine-learning-from-aws-and-microsoft/

Post by Dr. Matt Wood

Today, AWS and Microsoft announced Gluon, a new open source deep learning interface which allows developers to more easily and quickly build machine learning models, without compromising performance.

Gluon Logo

Gluon provides a clear, concise API for defining machine learning models using a collection of pre-built, optimized neural network components. Developers who are new to machine learning will find this interface more familiar to traditional code, since machine learning models can be defined and manipulated just like any other data structure. More seasoned data scientists and researchers will value the ability to build prototypes quickly and utilize dynamic neural network graphs for entirely new model architectures, all without sacrificing training speed.

Gluon is available in Apache MXNet today, a forthcoming Microsoft Cognitive Toolkit release, and in more frameworks over time.

Neural Networks vs Developers
Machine learning with neural networks (including ‘deep learning’) has three main components: data for training, a neural network model, and an algorithm which trains the neural network. You can think of the neural network in a similar way to a directed graph; it has a series of inputs (which represent the data), which connect to a series of outputs (the prediction), through a series of connected layers and weights. During training, the algorithm adjusts the weights in the network based on the error in the network output. This is the process by which the network learns; it is a memory- and compute-intensive process which can take days.
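
As a toy illustration of that weight-adjustment loop, stripped of any framework, here is a single weight fitted by gradient descent on a squared-error loss:

```python
# Fit y = w * x to a tiny dataset by repeatedly nudging w against the error gradient.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # the underlying relationship is y = 2x

w = 0.0
learning_rate = 0.05

for step in range(100):
    # Gradient of the mean squared error with respect to the single weight w.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= learning_rate * grad

print(round(w, 3))  # converges toward 2.0
```

Real networks do the same thing across millions of weights at once, which is where the memory and compute cost comes from.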

Deep learning frameworks such as Caffe2, Cognitive Toolkit, TensorFlow, and Apache MXNet are, in part, an answer to the question ‘how can we speed this process up?’ Just like query optimizers in databases, the more a training engine knows about the network and the algorithm, the more optimizations it can make to the training process (for example, it can infer what needs to be re-computed on the graph based on what else has changed, and skip the unaffected weights to speed things up). These frameworks also provide parallelization to distribute the computation process, and reduce the overall training time.

However, in order to achieve these optimizations, most frameworks require the developer to do some extra work: specifically, by providing a formal definition of the network graph, up-front, and then ‘freezing’ the graph, and just adjusting the weights.

The network definition, which can be large and complex with millions of connections, usually has to be constructed by hand. Not only are deep learning networks unwieldy, but they can be difficult to debug and it’s hard to re-use the code between projects.

The resulting complexity can be difficult for beginners and is a time-consuming task for more experienced researchers. At AWS, we’ve been experimenting with some ideas in MXNet around new, flexible, more approachable ways to define and train neural networks. Microsoft is also a contributor to the open source MXNet project and was interested in some of these same ideas. Based on this, we got talking, and found we had a similar vision: to use these techniques to reduce the complexity of machine learning, making it accessible to more developers.

Enter Gluon: dynamic graphs, rapid iteration, scalable training
Gluon introduces four key innovations.

  1. Friendly API: Gluon networks can be defined using a simple, clear, concise code – this is easier for developers to learn, and much easier to understand than some of the more arcane and formal ways of defining networks and their associated weighted scoring functions.
  2. Dynamic networks: the network definition in Gluon is dynamic: it can bend and flex just like any other data structure. This is in contrast to the more common, formal, symbolic definition of a network which the deep learning framework has to effectively carve into stone in order to optimize computation during training. Dynamic networks are easier to manage, and with Gluon, developers can easily ‘hybridize’ between these fast symbolic representations and the more friendly, dynamic ‘imperative’ definitions of the network and algorithms.
  3. The algorithm can define the network: the model and the training algorithm are brought much closer together. Instead of separate definitions, the algorithm can adjust the network dynamically during definition and training. Not only does this mean that developers can use standard programming loops, and conditionals to create these networks, but researchers can now define even more sophisticated algorithms and models which were not possible before. They are all easier to create, change, and debug.
  4. High performance operators for training: which makes it possible to have a friendly, concise API and dynamic graphs, without sacrificing training speed. This is a huge step forward in machine learning. Some frameworks bring a friendly API or dynamic graphs to deep learning, but these previous methods all incur a cost in terms of training speed. As with other areas of software, abstraction can slow down computation since it needs to be negotiated and interpreted at run time. Gluon can efficiently blend together a concise API with the formal definition under the hood, without the developer having to know about the specific details or to accommodate the compiler optimizations manually.

The team here at AWS, and our collaborators at Microsoft, couldn’t be more excited to bring these improvements to developers through Gluon. We’re already seeing quite a bit of excitement from developers and researchers alike.

Getting started with Gluon
Gluon is available today in Apache MXNet, with support coming for the Microsoft Cognitive Toolkit in a future release. We’re also publishing the front-end interface and the low-level API specifications so it can be included in other frameworks in the fullness of time.

You can get started with Gluon today. Fire up the AWS Deep Learning AMI with a single click and jump into one of 50 fully worked, notebook examples. If you’re a contributor to a machine learning framework, check out the interface specs on GitHub.
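
To give a flavor of what that looks like in code, here is a small sketch using the Gluon API in Apache MXNet; the network and the random stand-in data are made up for illustration and are not taken from the notebook examples:

```python
import mxnet as mx
from mxnet import autograd, gluon, nd

# A small feed-forward network, defined like any other Python object.
net = gluon.nn.Sequential()
with net.name_scope():
    net.add(gluon.nn.Dense(64, activation="relu"))
    net.add(gluon.nn.Dense(10))
net.initialize(mx.init.Xavier())

loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), "sgd", {"learning_rate": 0.1})

# Random stand-in data: 32 examples, 100 features, 10 classes.
data = nd.random.normal(shape=(32, 100))
label = nd.array([i % 10 for i in range(32)])

for epoch in range(5):
    with autograd.record():      # the graph is built by simply running the code
        loss = loss_fn(net(data), label)
    loss.backward()
    trainer.step(batch_size=data.shape[0])
    print(epoch, loss.mean().asscalar())
```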

-Dr. Matt Wood

5 Reasons Why AWS Leads the Cloud Market

Post Syndicated from Chris De Santis original https://www.anchor.com.au/blog/2017/10/5-reasons-aws-leads-cloud/

There is no doubt that in the cloud computing market, there is a lot of competition, but there is also a clear market leader. Amazon Web Services (AWS) leads the charge among other web services from similar tech giants such as Microsoft, IBM, and Google, but how did they get there and what’s taking so long for someone of the likes of Google to knock them off their pedestal?


Credit: Synergy Research Group

Recent research from Synergy Research shows that Amazon has a seemingly unbeatable lead. John Dinsdale, chief analyst at Synergy Research, told TechCrunch that, on paper, AWS is too far ahead of any competitor trying to gain a short-term advantage. The reason behind their spectacular lead is simple:

AWS was first.

If you start the race before everyone else and keep at the pace they’re running, you’re going to win, and that’s exactly what Amazon are doing. Yet, instead of sitting on their colossal market share like a throne, they’re continuing to rapidly innovate and differentiate.

Dinsdale continues to explain that AWS does five things continuously that allow it to stay on top of the cloud market:

  1. Invests considerable amounts in infrastructure
  2. Expands its fleet of services
  3. Executes it all well
  4. Grows its business with enterprises
  5. Has the full long-term backing of Amazon

What can we take from this?

Well, according to Dinsdale, the Amazon formula involves:

  • Investing in your innovation
  • Constantly broadening your product/service range
  • Performing with minimal error
  • Aiming for high-profile customers
  • Securing stable funding and support


Things Go Better With Step Functions

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/things-go-better-with-step-functions/

I often give presentations on Amazon’s culture of innovation, and start out with a slide that features a revealing quote from Amazon founder Jeff Bezos.

I love to sit down with our customers to learn how we have empowered their creativity and the pursuit of their dreams. Earlier this year I chatted with Patrick from The Coca-Cola Company in order to learn how they used AWS Step Functions and other AWS services to support the Coke.com Vending Pass program. This program includes drink rewards earned by purchasing products at vending machines equipped to support mobile payments using the Coca-Cola Vending Pass. Participants swipe their NFC-enabled phones to complete an Apple Pay or Android Pay purchase, identifying themselves to the vending machine and earning credit towards future free vending purchases in the process.

After the swipe, a combination of SNS topics and AWS Lambda functions initiated a pair of calls to some existing backend code to count the vending points and update the participant’s record. Unfortunately, the backend code was slow to react and had some timing dependencies, leading to missing updates that had the potential to confuse Vending Pass participants. The initial solution to this issue was very simple: modify the Lambda code to include a 90 second delay between the two calls. This solved the problem, but ate up process time for no good reason (billing for the use of Lambda functions is based on the duration of the request, in 100 ms intervals).

In order to make their solution more cost-effective, the team turned to AWS Step Functions, building a very simple state machine. As I wrote in an earlier blog post, Step Functions coordinate the components of distributed applications and microservices at scale, using visual workflows that are easy to build.

Coke built a very simple state machine to simplify their business logic and reduce their costs. Yours can be equally simple, or it can make use of other Step Functions features such as sequential and parallel execution and the ability to make decisions and choose alternate states. The Coke state machine looks like this:

The FirstState and the SecondState states (Task states) call the appropriate Lambda functions while Step Functions implements the 90 second delay (a Wait state). This modification simplified their logic and reduced their costs. Here’s how it all fits together:
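
In Amazon States Language, a state machine of that shape is only a few lines. The sketch below uses placeholder Lambda ARNs and a placeholder IAM role (not Coca-Cola’s actual resources) and creates it with boto3:

```python
import json

import boto3

definition = {
    "Comment": "Two Lambda calls separated by a 90 second pause",
    "StartAt": "FirstState",
    "States": {
        "FirstState": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:FirstFunction",
            "Next": "Wait90Seconds",
        },
        "Wait90Seconds": {"Type": "Wait", "Seconds": 90, "Next": "SecondState"},
        "SecondState": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:SecondFunction",
            "End": True,
        },
    },
}

sfn = boto3.client("stepfunctions", region_name="us-east-1")
sfn.create_state_machine(
    name="vending-pass-sketch",        # hypothetical name
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsRole",  # placeholder role
)
```

The Wait state replaces the 90 second sleep inside Lambda, so the delay no longer accrues Lambda execution charges.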

 

What’s Next
This initial success led them to take a closer look at serverless computing and to consider using it for other projects. Patrick told me that they have already seen a boost in productivity and developer happiness. Developers no longer need to wait for servers to be provisioned, and can now (as Jeff says) unleash their creativity and pursue their dreams. They expect to use Step Functions to improve the scalability, functionality, and reliability of their applications, going far beyond the initial use for the Coca-Cola Vending Pass. For example, Coke has built a serverless solution for publishing nutrition information to their food service partners using Lambda, Step Functions, and API Gateway.

Patrick and his team are now experimenting with machine learning and artificial intelligence. They built a prototype application to analyze a stream of photos from Instagram and extract trends in tastes and flavors. The application (built as a quick, one-day prototype) made use of Lambda, Amazon DynamoDB, Amazon API Gateway, and Amazon Rekognition and was, in Patrick’s words, a “big win and an enabler.”

In order to build serverless applications even more quickly, the development team has created an internal CI/CD reference architecture that builds on the Serverless Application Framework. The architecture includes a guided tour of Serverless and some boilerplate code to access internal services and assets. Patrick told me that this model allows them to easily scale promising projects from “a guy with a computer” to an entire development team.

Patrick will be on stage at AWS re:Invent next to my colleague Tim Bray. To meet them in person, be sure to attend SRV306 – State Machines in the Wild! How Customers Use AWS Step Functions.

Jeff;

Porn Copyright Trolls Terrify 60-Year-Old But Age Shouldn’t Matter

Post Syndicated from Andy original https://torrentfreak.com/porn-copyright-trolls-terrify-60-year-old-but-age-shouldnt-matter-171002/

Of all the anti-piracy tactics deployed over the years, the one that has proven most controversial is so-called copyright-trolling.

The idea is that rather than take content down, copyright holders make use of its online availability to watch people who are sharing that material while gathering their IP addresses.

From there it’s possible to file a lawsuit to obtain that person’s identity but these days they’re more likely to short-cut the system, by asking ISPs to forward notices with cash settlement demands attached.

When subscribers receive these demands, many feel compelled to pay. However, copyright trolls are cunning beasts, and while they initially ask for payment for a single download, they very often have several other claims up their sleeves. Once people have paid one, others come out of the woodwork.

That’s what appears to have happened to a 60-year-old Canadian woman called ‘Debra’. In an email sent via her ISP, she was contacted by local anti-piracy outfit Canipre, who accused her of downloading and sharing porn. With threats that she could be ‘fined’ up to CAD$20,000 for her alleged actions, she paid the company $257.40, despite claiming her innocence.

Of course, at this point the company knew her name and address and this week the company contacted her again, accusing her of another five illegal porn downloads alongside demands for more cash.

“I’m not sleeping,” Debra told CBC. “I have depression already and this is sending me over the edge.”

If the public weren’t so fatigued by this kind of story, people in Debra’s position might get more attention and more help, but they don’t. To be absolutely brutal, the only reason why this story is getting press is due to a few factors.

Firstly, we’re talking here about a woman accused of downloading porn. While far from impossible, it’s at least statistically less likely than if it was a man. Secondly, Debra is 60 years old. That doesn’t preclude her from being Internet savvy but it does tip the odds in her favor somewhat. Thirdly, Debra suffers from depression and claims she didn’t carry out those downloads.

On the balance of probabilities, on which these cases live or die, she sounds believable. Had she been a 20-year-old man, however, few people would believe ‘him’ and this is exactly the environment companies like Canipre, Rightscorp, and similar companies bank on.

Debra says she won’t pay the additional fines but Canipre is adamant that someone in her house pirated the porn, despite her husband not being savvy enough to download. The important part here is that Debra says she did not commit an offense and, with all the technology in the world, Canipre cannot prove that she did.

“How long is this going to terrorize me?” Debra says. “I’m a good Canadian citizen.”

But Debra isn’t on her own and she’s positively sprightly compared to Christine McMillan, who last year, at the age of 86, was accused of illegally downloading the zombie game Metro 2033. Again, those accusations came from Canipre and while the case eventually went quiet, you can safely bet the company backed off.

So who is to blame for situations like Debra’s and Christine’s? It’s a difficult question.

Clearly, copyright holders feel they’re within their rights to try and claw back compensation for their perceived losses but they already have a legal system available to them, if they want to use it. Instead, however, in Canada they’re abusing the so-called notice-and-notice system, which requires ISPs to forward infringement notices from copyright holders to subscribers.

The government knows there is a problem. Law professor Michael Geist previously obtained a government report, which expresses concern over the practice. Its summary is shown below.

Advice summary

While the notice-and-notice regime requires ISPs to forward educational copyright infringement notices, most ISPs complain that companies like Canipre add on cash settlement demands.

“Internet intermediaries complain…that the current legislative framework does not expressly prohibit this practice and that they feel compelled to forward on such notices to their subscribers when they receive them from copyright holders,” recent advice to the Minister of Innovation, Science and Economic Development reads.

That being said, there’s nothing stopping ISPs from passing on the educational notices as required by law but insisting that all demands for cash payments are removed. It’s a position that could even get support from the government, if enough pressure was applied.

“The sending of such notices could lead to abuses, given that consumers may be pressured into making payments even in situations where they have not engaged in any acts that violate copyright laws,” government advice notes.

Given the growing problem, it appears that ISPs have the power here so maybe it’s time they protected their customers. In the meantime, consumers have responsibilities too, not only by refraining from infringing copyright, but by becoming informed of their rights.

“[T]here is no legal obligation to pay any settlement offered by a copyright owner, and the regime does not impose any obligations on a subscriber who receives a notice, including no obligation to contact the copyright owner or the Internet intermediary,” government advice notes.

Hopefully, in future, people won’t have to be old or ill to receive sympathy for being wrongly accused and threatened in their own homes. But until then, people should pressure their ISPs to do more while staying informed.


TVAddons and ZemTV Operators Named in US Lawsuit

Post Syndicated from Ernesto original https://torrentfreak.com/tvaddons-and-zemtv-operators-named-in-us-lawsuit-170926/

Earlier this year, American satellite and broadcast provider Dish Network targeted two well-known players in the third-party Kodi add-on ecosystem.

In a complaint filed in a federal court in Texas, add-on ZemTV and the TVAddons library were accused of copyright infringement. As a result, both are facing up to $150,000 for each offense.

Initially, the true identities of the defendants were unknown and they were listed as John Does, but an amended complaint that was submitted yesterday reveals their alleged names and hometowns.

The Texas court previously granted subpoenas which allowed Dish to request information from the defendants’ accounts on services including Amazon, Github, Google, Twitter, Facebook and PayPal, which likely helped with the identification.

According to Dish, ZemTV was developed by Shahjahan Durrani, who’s based in London, UK. He allegedly controlled and maintained the addon, which was used to stream infringing broadcasts of Dish content.

“Durrani developed the ZemTV add-on and managed and operated the ZemTV service. Durrani used the aliases ‘Shani’ and ‘Shani_08′ to communicate with users of the ZemTV service,” the complaint reads.

The owner and operator of TVAddons is listed as Adam Lackman, who resides in Montreal, Canada. This doesn’t really come as a surprise, since Lackman is publicly listed as TVAddons’ owner on Linkedin and was previously named in a Canadian lawsuit.

While both defendants are named, the allegations against them haven’t changed substantially. Both face copyright infringement charges and potentially risk millions of dollars in damages.

Durrani directly infringed Dish’s copyrights by making the streams available, the plaintiffs note. Lackman subsequently profited from this and failed to take any action in response.

“Lackman had the legal right and actual ability to supervise and control this infringing activity because Lackman made the ZemTV add-on, which is necessary to access the ZemTV service, available for download on his websites.

“Lackman refused to take any action to stop the infringement of DISH’s exclusive rights in the programs transmitted through the ZemTV service,” the complaint adds.

TorrentFreak spoke to a TVAddons representative who refutes the copyright infringement allegations. The website sees itself as a platform for user-generated content and cites the DMCA’s safe harbor as a defense.

“TV ADDONS is not a piracy site, it’s a platform for developers of open source add-ons for the Kodi media center. As a community platform filled with user-generated content, we have always acted in accordance with the law and swiftly complied whenever we received a DMCA takedown notice.”

The representative states that it will be very difficult for them to defend themselves against a billion dollar company with unlimited resources, but hopes that the site will prevail.

The new TVAddons

After the original TVAddons.ag domain was seized in the Canadian lawsuit, the site returned at TVaddons.co. However, hundreds of allegedly infringing add-ons are no longer listed.

The site previously relied on the DMCA to shield it from liability but apparently, that wasn’t enough. As a result, they now check all submitted add-ons carefully.

“Since complying with the law is clearly not enough to prevent frivolous legal action from being taken against you, we have been forced to implement a more drastic code vetting process,” the TVAddons representative says.

If it’s not entirely clear that an add-on is properly licensed, it won’t be submitted for the time being. This hampers innovation, according to TVAddons, and threatens many communities that rely on user-generated content.

“When you visit any given web site, how can you be certain that every piece of media you see is licensed by the website displaying it? You can assume, but it’s very difficult to be certain. That’s why the DMCA is critical to the existence of online communities.”

Now that both defendants have been named the case will move forward. This may eventually lead to an in-depth discovery process where Dish will try to find more proof that both were knowingly engaging in infringing activity.

Durrani and Lackman, on the other hand, will try to prove their innocence.

A copy of the amended complaint is available here (pdf).

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Backblaze’s Upgrade Guide for macOS High Sierra

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/macos-high-sierra-upgrade-guide/

High Sierra

Apple introduced macOS 10.13 “High Sierra” at its 2017 Worldwide Developers Conference in June. On Tuesday, we learned we don’t have long to wait — the new OS will be available on September 25. It’s a free upgrade, and millions of Mac users around the world will rush to install it.

We understand. A new OS from Apple is exciting. But please, before you upgrade, we want to remind you to back up your Mac. You want your data to be safe from unexpected problems that could happen during the upgrade. We do, too. To make that easier, Backblaze offers this macOS High Sierra upgrade guide.

Why Upgrade to macOS 10.13 High Sierra?

High Sierra, as the name suggests, is a follow-on to the previous macOS, Sierra. Its major focus is strengthening the base OS with significant improvements to the file system, video, graphics, and virtual/augmented reality that will support new capabilities in the future.

But don’t despair; there also are outward improvements that will be readily apparent to everyone when they boot the OS for the first time. We’ll cover both the inner and outer improvements coming in this new OS.

Under the Hood of High Sierra

APFS (Apple File System)

Apple has been rolling out its new file system for a while now. It's already in iOS; now High Sierra brings APFS to the Mac. Apple touts APFS as a new file system optimized for Flash/SSD storage and featuring strong encryption, better and faster file handling, safer copying and moving of files, and other improved file system fundamentals.

We went into detail about the enhancements and improvements that APFS has over the previous file system, HFS+, in an earlier post. Many of these improvements, including enhanced performance, security and reliability of data, will provide immediate benefits to users, while others provide a foundation for future storage innovations and will require work by Apple and third parties to support in their products and services.

Most of us won’t notice these improvements, but we’ll benefit from better, faster, and safer file handling, which I think all of us can appreciate.

Video

High Sierra includes High Efficiency Video Encoding (HEVC, aka H.265), which preserves better detail and color while also introducing improved compression over H.264 (MPEG-4 AVC). Even existing Macs will benefit from the HEVC software encoding in High Sierra, but newer Mac models include HEVC hardware acceleration for even better performance.

MacBook Pro

Metal 2

macOS High Sierra introduces Metal 2, the next-generation of Apple’s Metal graphics API that was launched three years ago. Apple claims that Metal 2 provides up to 10x better performance in key areas. It provides near-direct access to the graphics processor (GPU), enabling the GPU to take control over key aspects of the rendering pipeline. Metal 2 will enhance the Mac’s capability for machine learning, and is the technology driving the new virtual reality platform on Macs.

audio video editor screenshot

Virtual Reality

We’re about to see an explosion of virtual reality experiences on both the Mac and iOS thanks to High Sierra and iOS 11. Content creators will be able to use apps like Final Cut Pro X, Epic Unreal 4 Editor, and Unity Editor to create fully immersive worlds that will revolutionize entertainment and education and have many professional uses, as well.

Users will want the new iMac with Retina 5K display or the upcoming iMac Pro to enjoy them, or any supported Mac paired with the latest external GPU and VR headset.

iMac and HTC virtual reality player

Outward Improvements

Siri

Siri logo

Expect a more natural voice from Siri in High Sierra. She or he will be less robotic, with greater expression and use of intonation in speech. Siri will also learn more about your preferences in things like music, helping you choose music that fits your taste and putting together playlists expressly for you. Expect Siri to be able to answer your questions about music-related trivia, as well.

Siri:  what does “scaramouche” refer to in the song Bohemian Rhapsody?

Photos

HD MacBook Pro screenshot

Photos has been redesigned with a new layout and new tools. A redesigned Edit view includes new tools for fine-tuning color and contrast and making adjustments within a defined color range. Some fun elements for creating special effects and memories also have been added. Photos now works with external apps such as Photoshop and Pixelmator. Compatibility with third-party extensions adds printing and publishing services to help get your photos out into the world.

Safari

Safari logo

Apple claims that Safari in High Sierra is the world’s fastest desktop browser, outperforming Chrome and other browsers in a range of benchmark tests. They’ve also added autoplay blocking for those pesky videos that play without your permission and tracking blocking to help protect your privacy.

Can My Mac Run macOS High Sierra 10.13?

All Macs introduced in mid 2010 or later are compatible. MacBook and iMac computers introduced in late 2009 are also compatible. You’ll need OS X 10.7.5 “Lion” or later installed, along with at least 2 GB RAM and 8.8 GB of available storage to manage the upgrade.
Some features of High Sierra require an internet connection or an Apple ID. You can check to see if your Mac is compatible with High Sierra on Apple’s website.

Conquering High Sierra — What Do I Do Before I Upgrade?

Back Up That Mac!

It’s always smart to back up before you upgrade the operating system or make any other crucial changes to your computer. Upgrading your OS is a major change to your computer, and if anything goes wrong…well, you don’t want that to happen.

iMac backup screenshot

We recommend the 3-2-1 Backup Strategy to make sure your data is safe. What does that mean? Have three copies of your data. There’s the “live” version on your Mac, a local backup (Time Machine, another copy on a local drive or other computer), and an offsite backup like Backblaze. No matter what happens to your computer, you’ll have a way to restore the files if anything goes wrong. Need help understanding how to back up your Mac? We have you covered with a handy Mac backup guide.

Check for App and Driver Updates

This is when it helps to do your homework. Check with app developers or device manufacturers to find out whether their apps and devices have updates to work with High Sierra. Visit their websites or use the Check for Updates feature built into most apps (often found in the File or Help menus).

If you’ve downloaded apps through the Mac App Store, make sure to open them and click on the Updates button to download the latest updates.

Updating can be hit or miss when you’ve installed apps that didn’t come from the Mac App Store. To make it easier, visit the MacUpdate website. MacUpdate tracks changes to thousands of Mac apps.


Will Backblaze work with macOS High Sierra?

Yes. We’ve taken care to ensure that Backblaze works with High Sierra. We’ve already enhanced our Macintosh client to report the space available on an APFS container and we plan to add additional support for APFS capabilities that enhance Backblaze’s capabilities in the future.

Of course, we’ll watch Apple’s release carefully for any last minute surprises. We’ll officially offer support for High Sierra once we’ve had a chance to thoroughly test the release version.


Set Aside Time for the Upgrade

Depending on the speed of your Internet connection and your computer, upgrading to High Sierra will take some time. You’ll be able to use your Mac straightaway after answering a few questions at the end of the upgrade process.

If you’re going to install High Sierra on multiple Macs, a time-and-bandwidth-saving tip came from a Backblaze customer who suggested copying the installer from your Mac’s Applications folder to a USB Flash drive (or an external drive) before you run it. The installer routinely deletes itself once the upgrade process is completed, but if you grab it before that happens you can use it on other computers.

Where Do I get High Sierra?

Apple says that High Sierra will be available on September 25. Like other Mac operating system releases, Apple offers macOS 10.13 High Sierra for download from the Mac App Store, which is included on the Mac. As long as your Mac is supported and running OS X 10.7.5 “Lion” (released in 2012) or later, you can download and run the installer. It’s free. Thank you, Apple.

Better to be Safe than Sorry

Back up your Mac before doing anything to it, and make Backblaze part of your 3-2-1 backup strategy. That way your data is secure. Even if you have to roll back after an upgrade, or if you run into other problems, your data will be safe and sound in your backup.

Tell us How it Went

Are you getting ready to install High Sierra? Still have questions? Let us know in the comments. Tell us how your update went and what you like about the new release of macOS.

And While You’re Waiting for High Sierra…

While you’re waiting for Apple to release High Sierra on September 25, you might want to check out these other posts about using your Mac and Backblaze.

The post Backblaze’s Upgrade Guide for macOS High Sierra appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

New UK IP Crime Report Reveals Continued Focus on ‘Pirate’ Kodi Boxes

Post Syndicated from Andy original https://torrentfreak.com/new-uk-ip-crime-report-reveals-continued-focus-on-pirate-kodi-boxes-170908/

The UK’s Intellectual Property Office has published its annual IP Crime Report, spanning the period 2016 to 2017.

It covers key events in the copyright and trademark arenas and is presented with input from the police and trading standards, plus private entities such as the BPI, Premier League, and Federation Against Copyright Theft, to name a few.

The report begins with an interesting statistic. Despite claims that many millions of UK citizens regularly engage in some kind of infringement, figures from the Ministry of Justice indicate that just 47 people were found guilty of offenses under the Copyright, Designs and Patents Act during 2016. That’s down on the 69 found guilty in the previous year.

Despite this low conviction rate, 15% of all internet users aged 12+ are reported to have consumed at least one item of illegal content between March and May 2017. Figures supplied by the Industry Trust for IP indicate that 19% of adults watch content via various IPTV devices – often referred to as set-top, streaming, Android, or Kodi boxes.

“At its cutting edge IP crime is innovative. It exploits technological loopholes before they become apparent. IP crime involves sophisticated hackers, criminal financial experts, international gangs and service delivery networks. Keeping pace with criminal innovation places a burden on IP crime prevention resources,” the report notes.

The report covers a broad range of IP crime, from counterfeit sportswear to foodstuffs, but our focus is obviously on Internet-based infringement. Various contributors cover the aspects of online activity that affect them, including music industry group BPI.

“The main online piracy threats to the UK recorded music industry at present are from BitTorrent networks, linking/aggregator sites, stream-ripping sites, unauthorized streaming sites and cyberlockers,” the BPI notes.

The BPI’s website blocking efforts have been closely reported, with 63 infringing sites blocked to date via various court orders. However, the BPI reports that more than 700 related URLs, IP addresses, and proxy sites/ proxy aggregators have also been rendered inaccessible as part of the same action.

“Site blocking has proven to be a successful strategy as the longer the blocks are in place, the more effective they are. We have seen traffic to these sites reduce by an average of 70% or more,” the BPI reports.

While prosecutions against music pirates are a fairly rare event in the UK, the Crown Prosecution Service (CPS) Specialist Fraud Division highlights that their most significant prosecution of the past 12 months involved a prolific music uploader.

As first revealed here on TF, Wayne Evans was an uploader not only on KickassTorrents and The Pirate Bay, but also some of his own sites. Known online as OldSkoolScouse, Evans reportedly cost the UK’s Performing Rights Society more than £1m in a single year. He was sentenced in December 2016 to 12 months in prison.

While Evans has been free for some time already, the CPS places particular emphasis on the importance of the case, “since it provided sentencing guidance for the Copyright, Designs and Patents Act 1988, where before there was no definitive guideline.”

The CPS says the case was useful on a number of fronts. Despite illegal distribution of content being difficult to investigate and piracy losses proving tricky to quantify, the court found that deterrent sentences are appropriate for the kinds of offenses Evans was accused of.

The CPS notes that various factors affect the severity of such sentences, not least the length of time the unlawful activity has persisted and particularly if it has done so after the service of a cease and desist notice. Other factors include the profit made by defendants and/or the loss caused to copyright holders “so far as it can accurately be calculated.”

Importantly, however, the CPS says that beyond issues of personal mitigation and timely guilty pleas, a jail sentence is probably going to be the outcome for others engaging in this kind of activity in future. That’s something for torrent and streaming site operators and their content uploaders to consider.

“[U]nless the unlawful activity of this kind is very amateur, minor or short-lived, or in the absence of particularly compelling mitigation or other exceptional circumstances, an immediate custodial sentence is likely to be appropriate in cases of illegal distribution of copyright infringing articles,” the CPS concludes.

But while a music-related trial provided the highlight of the year for the CPS, the online infringement world is still dominated by the rise of streaming sites and the now omnipresent “fully-loaded Kodi Box” – set-top devices configured to receive copyright-infringing live TV and VOD.

In the IP Crime Report, the Intellectual Property Office references a former US Secretary of Defense to describe the emergence of the threat.

“The echoes of Donald Rumsfeld’s famous aphorism concerning ‘known knowns’ and ‘known unknowns’ reverberate across our landscape perhaps more than any other. The certainty we all share is that we must be ready to confront both ‘known unknowns’ and ‘unknown unknowns’,” the IPO writes.

“Not long ago illegal streaming through Kodi Boxes was an ‘unknown’. Now, this technology updates copyright infringement by empowering TV viewers with the technology they need to subvert copyright law at the flick of a remote control.”

While the set-top box threat has grown in recent times, the report highlights the important legal clarifications that emerged from the BREIN v Filmspeler case, which found itself before the European Court of Justice.

As widely reported, the ECJ determined that the selling of piracy-configured devices amounts to a communication to the public, something which renders their sale illegal. However, in a submission by PIPCU, the Police Intellectual Property Crime Unit, box sellers are said to cast a keen eye on the legal situation.

“Organised criminals, especially those in the UK who distribute set-top boxes, are aware of recent developments in the law and routinely exploit loopholes in it,” PIPCU reports.

“Given recent judgments on the sale of pre-programmed set-top boxes, it is now unlikely criminals would advertise the devices in a way which is clearly infringing by offering them pre-loaded or ‘fully loaded’ with apps and addons specifically designed to access subscription services for free.”

With sellers beginning to clean up their advertising, it seems likely that detection will become more difficult than when selling was considered a gray area. While that will present its own issues, PIPCU still sees problems on two fronts – a lack of clear legislation and a perception of support for ‘pirate’ devices among the public.

“There is no specific legislation currently in place for the prosecution of end users or sellers of set-top boxes. Indeed, the general public do not see the usage of these devices as potentially breaking the law,” the unit reports.

“PIPCU are currently having to try and ‘shoehorn’ existing legislation to fit the type of criminality being observed, such as conspiracy to defraud (common law) to tackle this problem. Cases are yet to be charged and results will be known by late 2017.”

Whether these prosecutions will be effective remains to be seen, but PIPCU’s comments suggest an air of caution set to a backdrop of box-sellers’ tendency to adapt to legal challenges.

“Due to the complexity of these cases it is difficult to substantiate charges under the Fraud Act (2006). PIPCU have convicted one person under the Serious Crime Act (2015) (encouraging or assisting s11 of the Fraud Act). However, this would not be applicable unless the suspect had made obvious attempts to encourage users to use the boxes to watch subscription only content,” PIPCU notes, adding;

“The selling community is close knit and adapts constantly to allow itself to operate in the gray area where current legislation is unclear and where they feel they can continue to sell ‘under the radar’.”

More generally, pirate sites as a whole are still seen as a threat. As reported last month, the current anti-piracy narrative is that pirate sites represent a danger to their users. As a result, efforts are underway to paint torrent and streaming sites as risky places to visit, with users allegedly exposed to malware and other malicious content. The scare strategy is supported by PIPCU.

“Unlike the purchase of counterfeit physical goods, consumers who buy unlicensed content online are not taking a risk. Faulty copyright doesn’t explode, burn or break. For this reason the message as to why the public should avoid copyright fraud needs to be re-focused.

“A more concerted attempt to push out a message relating to malware on pirate websites, the clear criminality and the links to organized crime of those behind the sites are crucial if public opinion is to be changed,” the unit advises.

But while the changing of attitudes is desirable for pro-copyright entities, PIPCU says that winning over the public may not prove to be an easy battle. It was given a small taste of backlash itself, after taking action against the operator of a pirate site.

“The scale of the problem regarding public opinion of online copyright crime is evidenced by our own experience. After PIPCU executed a warrant against the owner of a streaming website, a tweet about the event (read by 200,000 people) produced a reaction heavily weighted against PIPCU’s legitimate enforcement action,” PIPCU concludes.

In summary, it seems likely that more effort will be expended during the next 12 months to target the set-top box threat, but there doesn’t appear to be an abundance of confidence in existing legislation to tackle all but the most egregious offenders. That being said, a line has now been drawn in the sand – if the public is prepared to respect it.

The full IP Crime Report 2016-2017 is available here (pdf).

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

AWS Hot Startups – August 2017

Post Syndicated from Tina Barr original https://aws.amazon.com/blogs/aws/aws-hot-startups-august-2017/

There’s no doubt about it – Artificial Intelligence is changing the world and how it operates. Across industries, organizations from startups to Fortune 500s are embracing AI to develop new products, services, and opportunities that are more efficient and accessible for their consumers. From driverless cars to better preventative healthcare to smart home devices, AI is driving innovation at a fast rate and will continue to play a more important role in our everyday lives.

This month we’d like to highlight startups using AI solutions to help companies grow. We are pleased to feature:

  • SignalBox – a simple and accessible deep learning platform to help businesses get started with AI.
  • Valossa – an AI video recognition platform for the media and entertainment industry.
  • Kaliber – innovative applications for businesses using facial recognition, deep learning, and big data.

SignalBox (UK)

In 2016, SignalBox founder Alain Richardt was hearing the same comments being made by developers, data scientists, and business leaders. They wanted to get into deep learning but didn’t know where to start. Alain saw an opportunity to commodify and apply deep learning by providing a platform that does the heavy lifting with an easy-to-use web interface, blueprints for common tasks, and just a single-click to productize the models. With SignalBox, companies can start building deep learning models with no coding at all – they just select a data set, choose a network architecture, and go. SignalBox also offers step-by-step tutorials, tips and tricks from industry experts, and consulting services for customers that want an end-to-end AI solution.

SignalBox offers a variety of solutions that are being used across many industries for energy modeling, fraud detection, customer segmentation, insurance risk modeling, inventory prediction, real estate prediction, and more. Existing data science teams are using SignalBox to accelerate their innovation cycle. One innovative UK startup, Energi Mine, recently worked with SignalBox to develop deep networks that predict anomalous energy consumption patterns and do time series predictions on energy usage for businesses with hundreds of sites.

SignalBox uses a variety of AWS services including Amazon EC2, Amazon VPC, Amazon Elastic Block Store, and Amazon S3. The ability to rapidly provision EC2 GPU instances has been a critical factor in their success – both in terms of keeping their operational expenses low, as well as speed to market. The Amazon API Gateway has allowed for operational automation, giving SignalBox the ability to control its infrastructure.

To learn more about SignalBox, visit here.

Valossa (Finland)

As students at the University of Oulu in Finland, the Valossa founders spent years doing research in the computer science and AI labs. During that time, the team witnessed how the world was moving beyond text, with video playing a greater role in day-to-day communication. This spawned an idea to use technology to automatically understand what an audience is viewing and share that information with a global network of content producers. Since 2015, Valossa has been building next generation AI applications to benefit the media and entertainment industry and is moving beyond the capabilities of traditional visual recognition systems.

Valossa’s AI is capable of analyzing any video stream. The AI studies a vast array of data within videos and converts that information into descriptive tags, categories, and overviews automatically. Basically, it sees, hears, and understands videos like a human does. The Valossa AI can detect people, visual and auditory concepts, key speech elements, and labels explicit content to make moderating and filtering content simpler. Valossa’s solutions are designed to provide value for the content production workflow, from media asset management to end-user applications for content discovery. AI-annotated content allows online viewers to jump directly to their favorite scenes or search specific topics and actors within a video.

Valossa leverages AWS to deliver the industry’s first complete AI video recognition platform. Using Amazon EC2 GPU instances, Valossa can easily scale their computation capacity based on customer activity. High-volume video processing with GPU instances provides the necessary speed for time-sensitive workflows. The geo-located Availability Zones in EC2 allow Valossa to bring resources close to their customers to minimize network delays. Valossa also uses Amazon S3 for video ingestion and to provide end-user video analytics, which makes managing and accessing media data easy and highly scalable.

To see how Valossa works, check out www.WhatIsMyMovie.com or enable the Alexa Skill, Valossa Movie Finder. To try the Valossa AI, sign up for free at www.valossa.com.

Kaliber (San Francisco, CA)

Serial entrepreneurs Ray Rahman and Risto Haukioja founded Kaliber in 2016. The pair had previously worked in startups building smart cities and online privacy tools, and teamed up to bring AI to the workplace and change the hospitality industry. Our world is designed to appeal to our senses – stores and warehouses have clearly marked aisles, products are colorfully packaged, and we use these designs to differentiate one thing from another. We tell each other apart by our faces, and previously that was something only humans could measure or act upon. Kaliber is using facial recognition, deep learning, and big data to create solutions for business use. Markets and companies that aren’t typically associated with cutting-edge technology will be able to use their existing camera infrastructure in a whole new way, making them more efficient and better able to serve their customers.

Computer video processing is rapidly expanding, and Kaliber believes that video recognition will extend to far more than security cameras and robots. Using the clients’ network of in-house cameras, Kaliber’s platform extracts key data points and maps them to actionable insights using their machine learning (ML) algorithm. Dashboards connect users to the client’s BI tools via the Kaliber enterprise APIs, and managers can view these analytics to improve their real-world processes, taking immediate corrective action with real-time alerts. Kaliber’s Real Metrics are aimed at combining the power of image recognition with ML to ultimately provide a more meaningful experience for all.

Kaliber uses many AWS services, including Amazon Rekognition, Amazon Kinesis, AWS Lambda, Amazon EC2 GPU instances, and Amazon S3. These services have been instrumental in helping Kaliber meet the needs of enterprise customers in record time.

Learn more about Kaliber here.

Thanks for reading and we’ll see you next month!

-Tina

 

Choosing a Backup Provider (An Intro to Backblaze)

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/an-intro-to-backblaze/

Backblaze storage pods

Hi! We’re Backblaze — a backup and cloud storage company in sunny San Mateo, California. We’ve been in business since 2007, have a great track record, and have been on a mission to make backing up simple, inexpensive, and unobtrusive.

This post hopes to serve as an introduction to Backblaze for folks that might not be familiar with us. If you’re an avid reader already, you’ll note that we’ve written about many of these stories before. We won’t be offended if you tune back in for the next post. For everyone else, we thought we’d give you a look at who we are, how we’ve remained committed to unlimited backup, and why we think you should give us a shot.

A Bit About our Background

“We never had deep VC pockets to burn cash. If we were unsustainable, we would have gone out of business 9 years ago.” — Gleb Budman, Backblaze CEO and cofounder

Backblaze just turned 10 years old (thanks for the birthday wishes), and we have a solid track record as a successful company. Backblaze was started by five founders who went without salaries for two years until they got the company profitable. That’s an accomplishment in and of itself. A decade later, we’ve “only” raised $5.3 Million in funding. Don’t get us wrong, $5M is a lot of money, but we do think it shows that we run a responsible company by providing industry leading backup solutions at fair prices.

Backblaze is Committed To Customers & Unlimited Data Backup

Since 2007, many companies have come into the backup space. Many of those, at some point or another, offered an unlimited data storage plan. In 2017, Backblaze stands alone as the remaining player offering truly unlimited data backup.

What is “truly unlimited?” To us, that means getting our customers backed up as quickly as possible — with no limits on file types or sizes. While there are other backup companies out there, few of them if any, offer unlimited services at a flat rate. Many force customers to choose between service tiers, leading to confusion and customer apprehension about how much data they have now, or will have later. By contrast, we are focused on making Backblaze easy to use, and easy to understand.

At Backblaze, backup means running efficiently in the background to get a copy of your data securely into the cloud. Because we’re truly unlimited, we operate on an “exclusion” model. That means, by default, we backup all of the user data on your computer. Of course, you can exclude anything you don’t want backed up. Other companies operate on an “inclusion” model — you need to proactively select folders and files to be backed up. Why did we choose “exclusion” over “inclusion?” Because in our model, if you do nothing, you are fully covered. The alternative may leave you forgetting that new folder you created or those important files on your desktop.

Operating under the “inclusion model” would mean we would store less data (which would reduce our costs), but we’re not interested in reducing our costs if it means leaving our customers unprotected. Because of decisions like that, we’re currently storing over 350PB of our customer data.

Recently, we released version 5.0 of our industry leading computer backup product. Among other things in that release, we introduced file sharing via URL and faster backups. Through something called auto-threading, we’ve increased the speed at which your data gets backed up. Our internal tests have us over 10x the speed of the competition. That’s how one Reddit user backed up almost one terabyte of data in fewer than 24 hours.

Not only are we committed to our Personal Backup users, but we’re also a leading destination for businesses as well. Our latest Backblaze for Business update gives businesses of any size all of the same great backup and security, while also adding an administrative console and tools through our Backblaze Groups feature.

Best of all, our Backblaze Groups feature is available to every Backblaze user, so if you're the “Head of I.T.” for your household and managing a few computers, you can manage your family's backups with Groups as well.

How We Do It

The question often comes up, “How do you do it? How can you continue offering unlimited backup in an era where most everyone else has stopped?” The answer lies in our origins — because we didn’t have a lot of cash, we had to create a sustainable business. Among other things, we created our own Storage Pods, Storage Vaults, and software. Our purpose-built infrastructure is what gives us incredibly low cloud storage costs. That same storage architecture is the basis for B2 Cloud Storage, the most affordable object storage on the planet (B2 is ¼ of the price of the offerings from Amazon, Microsoft and Google). Backblaze B2’s APIs, CLIs, and integration partners also give users the flexibility of backing up Macs, PCs, Linux, and servers their own way, if they want to take control.

We think that kind of dedication, innovation, and frugality supports our claim to be a trustworthy caretaker of your data — videos, photos, business docs, and other precious memories.

Give Us a Try!

Give us a try with our free 15-day trial. We’d love to welcome you to your new backup home.

Have questions? Sound off in the comments below! We love hearing from current customers as well as those looking to come aboard.

The post Choosing a Backup Provider (An Intro to Backblaze) appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

From Data Lake to Data Warehouse: Enhancing Customer 360 with Amazon Redshift Spectrum

Post Syndicated from Dylan Tong original https://aws.amazon.com/blogs/big-data/from-data-lake-to-data-warehouse-enhancing-customer-360-with-amazon-redshift-spectrum/

Achieving a 360° view of your customer has become increasingly challenging as companies embrace omni-channel strategies, engaging customers across websites, mobile, call centers, social media, physical sites, and beyond. The promise of a web where online and physical worlds blend makes understanding your customers more challenging, but also more important. Businesses that are successful in this medium have a significant competitive advantage.

The big data challenge requires the management of data at high velocity and volume. Many customers have identified Amazon S3 as a great data lake solution that removes the complexities of managing a highly durable, fault tolerant data lake infrastructure at scale and economically.

AWS data services substantially lessen the heavy lifting of adopting technologies, allowing you to spend more time on what matters most—gaining a better understanding of customers to elevate your business. In this post, I show how a recent Amazon Redshift innovation, Redshift Spectrum, can enhance a customer 360 initiative.

Customer 360 solution

A successful customer 360 view benefits from using a variety of technologies to deliver different forms of insights. These could range from real-time analysis of streaming data from wearable devices and mobile interactions to historical analysis that requires interactive, on demand queries on billions of transactions. In some cases, insights can only be inferred through AI via deep learning. Finally, the value of your customer data and insights can’t be fully realized until it is operationalized at scale—readily accessible by fleets of applications. Companies are leveraging AWS for the breadth of services that cover these domains, to drive their data strategy.

A number of AWS customers stream data from various sources into a S3 data lake through Amazon Kinesis. They use Kinesis and technologies in the Hadoop ecosystem like Spark running on Amazon EMR to enrich this data. High-value data is loaded into an Amazon Redshift data warehouse, which allows users to analyze and interact with data through a choice of client tools. Redshift Spectrum expands on this analytics platform by enabling Amazon Redshift to blend and analyze data beyond the data warehouse and across a data lake.

The following diagram illustrates the workflow for such a solution.

This solution delivers value by:

  • Reducing complexity and the time to value for deeper insights. For instance, an existing data model in Amazon Redshift may provide insights across dimensions such as customer, geography, time, and product on metrics from sales and financial systems. Down the road, you may gain access to streaming data sources like customer-care call logs and website activity that you want to blend in with the sales data on the same dimensions to understand how web and call center experiences may be correlated with sales performance. Redshift Spectrum can join these dimensions in Amazon Redshift with data in S3 to allow you to quickly gain new insights, and avoid the slow and more expensive alternative of fully integrating these sources with your data warehouse.
  • Providing an additional avenue for optimizing costs and performance. In cases like call logs and clickstream data where volumes could be many TBs to PBs, storing the data exclusively in S3 yields significant cost savings. Interactive analysis on massive datasets may now be economically viable in cases where data was previously analyzed periodically through static reports generated by inexpensive batch processes. In some cases, you can improve the user experience while simultaneously lowering costs. Spectrum is powered by a large-scale infrastructure external to your Amazon Redshift cluster, and excels at scanning and aggregating large volumes of data. For instance, your analysts may be performing data discovery on customer interactions across millions of consumers over years of data across various channels. On this large dataset, certain queries could be slow if you didn't have a large Amazon Redshift cluster. Alternatively, you could use Redshift Spectrum to achieve a better user experience with a smaller cluster.

Proof of concept walkthrough

To make evaluation easier for you, I’ve conducted a Redshift Spectrum proof-of-concept (PoC) for the customer 360 use case. For those who want to replicate the PoC, the instructions, AWS CloudFormation templates, and public data sets are available in the GitHub repository.

The remainder of this post is a journey through the project, observing best practices in action, and learning how you can achieve business value. The walkthrough involves:

  • An analysis of performance data from the PoC environment involving queries that demonstrate blending and analysis of data across Amazon Redshift and S3. Observe that great results are achievable at scale.
  • Guidance by example on query tuning, design, and data preparation to illustrate the optimization process. This includes tuning a query that combines clickstream data in S3 with customer and time dimensions in Amazon Redshift, and aggregates ~1.9 B out of 3.7 B+ records in under 10 seconds with a small cluster!
  • Guidance and measurements to help assess deciding between two options: accessing and analyzing data exclusively in Amazon Redshift, or using Redshift Spectrum to access data left in S3.

Stream ingestion and enrichment

The focus of this post isn’t stream ingestion and enrichment on Kinesis and EMR, but be mindful of performance best practices on S3 to ensure good streaming and query performance:

  • Use random object keys: The data files provided for this project are prefixed with SHA-256 hashes to prevent hot partitions. This is important to ensure optimal request rates to support PUT requests from the incoming stream, in addition to the large number of parallel GET requests that certain queries from large Amazon Redshift clusters could send.
  • Micro-batch your data stream: S3 isn’t optimized for small random write workloads. Your datasets should be micro-batched into large files. For instance, the “parquet-1” dataset provided batches >7 million records per file. The optimal file size for Redshift Spectrum is usually in the 100 MB to 1 GB range.

If you have an edge case that may pose scalability challenges, AWS would love to hear about it. For further guidance, talk to your solutions architect.

Environment

The project consists of the following environment:

  • Amazon Redshift cluster: 4 X dc1.large
  • Data:
    • Time and customer dimension tables are stored on all Amazon Redshift nodes (ALL distribution style):
      • The data originates from the DWDATE and CUSTOMER tables in the Star Schema Benchmark
      • The customer table contains attributes for 3 million customers.
      • The time data is at the day-level granularity, and spans 7 years, from the start of 1992 to the end of 1998.
    • The clickstream data is stored in an S3 bucket, and serves as a fact table.
      • Various copies of this dataset in CSV and Parquet format have been provided, for reasons to be discussed later.
      • The data is a modified version of the uservisits dataset from AMPLab’s Big Data Benchmark, which was generated by Intel’s Hadoop benchmark tools.
      • Changes were minimal, so that existing test harnesses for this test can be adapted:
        • Increased the 751,754,869-row dataset 5X to 3,758,774,345 rows.
        • Added surrogate keys to support joins with customer and time dimensions. These keys were distributed evenly across the entire dataset to represent user visits from six customers over seven years.
        • Values for the visitDate column were replaced to align with the 7-year timeframe, and the added time surrogate key.

Queries across the data lake and data warehouse 

Imagine a scenario where a business analyst plans to analyze clickstream metrics like ad revenue over time and by customer, market segment and more. The example below is a query that achieves this effect: 
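
The query itself isn't reproduced in this text version of the post, so here is a minimal sketch of a query with this shape. The table and column names are taken from fragments that appear later in the post; the filter values and the prettyMonthYear expression are assumptions.

SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, SUM(uv.adRevenue) AS totalRevenue
FROM clickstream.uservisits_csv10 AS uv          -- external table backed by S3
JOIN customer AS c                               -- dimension table stored in Amazon Redshift
  ON c.c_custkey = uv.custKey                    -- join on the surrogate customer key
JOIN (SELECT d_yearmonthnum,
             d_yearmonth AS prettyMonthYear      -- assumed source of the "pretty" label
      FROM dwdate) AS t                          -- time dimension stored in Amazon Redshift
  ON uv.yearMonthKey = t.d_yearmonthnum
WHERE uv.custKey <= 3                            -- three customers (assumed filter)
  AND uv.yearMonthKey >= 199810                  -- last three months of the dataset (assumed filter)
GROUP BY c.c_name, c.c_mktsegment, t.prettyMonthYear, uv.yearMonthKey
ORDER BY c.c_name, c.c_mktsegment, uv.yearMonthKey ASC;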

The scan of the clickstream data in the FROM clause runs on Redshift Spectrum against S3, and the results are joined with the time and customer dimension tables that live in Amazon Redshift. The query returns the total ad revenue for three customers over the last three months, along with info on their respective market segment.

Unfortunately, this query takes around three minutes to run, and doesn’t enable the interactive experience that you want. However, there’s a number of performance optimizations that you can implement to achieve the desired performance.

Performance analysis

Two key utilities provide visibility into Redshift Spectrum:

  • EXPLAIN
    Provides the query execution plan, which includes info around what processing is pushed down to Redshift Spectrum. Steps in the plan that include the prefix S3 are executed on Redshift Spectrum. For instance, the plan for the previous query has the step “S3 Seq Scan clickstream.uservisits_csv10”, indicating that Redshift Spectrum performs a scan on S3 as part of the query execution.
  • SVL_S3QUERY_SUMMARY
    Statistics for Redshift Spectrum queries are stored in this table. While the execution plan presents cost estimates, this table stores actual statistics for past query runs.

You can get the statistics of your last query by inspecting the SVL_S3QUERY_SUMMARY table with the condition (query = pg_last_query_id()). Inspecting the previous query reveals that the entire dataset of nearly 3.8 billion rows was scanned to retrieve less than 66.3 million rows. Improving scan selectivity in your query could yield substantial performance improvements.
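
For example, the statistics lookup might look like the following sketch; the column names follow the SVL_S3QUERY_SUMMARY system view as documented, so verify them against your Amazon Redshift version:

SELECT query,
       segment,
       elapsed,
       s3_scanned_rows,            -- rows read from S3
       s3_scanned_bytes,
       s3query_returned_rows,      -- rows returned to the Amazon Redshift cluster
       s3query_returned_bytes,
       files,
       avg_request_parallelism
FROM svl_s3query_summary
WHERE query = pg_last_query_id()   -- statistics for the most recent query
ORDER BY segment;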

Partitioning

Partitioning is a key means to improving scan efficiency. In your environment, the data and tables have already been organized, and configured to support partitions. For more information, see the PoC project setup instructions. The clickstream table was defined as:

CREATE EXTERNAL TABLE clickstream.uservisits_csv10
…
PARTITIONED BY(customer int4, visitYearMonth int4)

The entire 3.8 billion-row dataset is organized as a collection of large files where each file contains data exclusive to a particular customer and month in a year. This allows you to partition your data into logical subsets by customer and year/month. With partitions, the query engine can target a subset of files:

  • Only for specific customers
  • Only data for specific months
  • A combination of specific customers and year/months
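
Each customer/month subset is registered as a partition that points at its own S3 prefix. As a rough sketch of what that registration looks like (the bucket name and prefix layout here are placeholders, not the PoC's actual paths):

ALTER TABLE clickstream.uservisits_csv10
ADD PARTITION (customer=1, visitYearMonth=199201)                          -- one partition per customer and year/month
LOCATION 's3://your-poc-bucket/uservisits_csv10/customer=1/visitYearMonth=199201/';

In the PoC environment, this registration has already been done as part of the setup described above, so you only need it if you build the environment yourself.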

You can use partitions in your queries. Instead of joining your customer data on the surrogate customer key (that is, c.c_custkey = uv.custKey), the partition key “customer” should be used instead:

SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, SUM(uv.adRevenue)
…
ON c.c_custkey = uv.customer
…
ORDER BY c.c_name, c.c_mktsegment, uv.yearMonthKey  ASC

This query should run approximately twice as fast as the previous query. If you look at the statistics for this query in SVL_S3QUERY_SUMMARY, you see that only half the dataset was scanned. This is expected because your query is on three out of six customers on an evenly distributed dataset. However, the scan is still inefficient, and you can benefit from using your year/month partition key as well:

SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, SUM(uv.adRevenue)
…
ON c.c_custkey = uv.customer
…
ON uv.visitYearMonth = t.d_yearmonthnum
…
ORDER BY c.c_name, c.c_mktsegment, uv.visitYearMonth ASC

All joins between the tables are now using partitions. Upon reviewing the statistics for this query, you should observe that Redshift Spectrum scans and returns the exact number of rows, 66,270,117. If you run this query a few times, you should see execution time in the range of 8 seconds, which is a 22.5X improvement on your original query!
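
Putting the fragments above together, a full version of the doubly partitioned query might look like this sketch (again, the filter values and the prettyMonthYear expression are assumptions):

SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, SUM(uv.adRevenue) AS totalRevenue
FROM clickstream.uservisits_csv10 AS uv
JOIN customer AS c
  ON c.c_custkey = uv.customer                   -- join on the customer partition column
JOIN (SELECT d_yearmonthnum,
             d_yearmonth AS prettyMonthYear      -- assumed expression
      FROM dwdate) AS t
  ON uv.visitYearMonth = t.d_yearmonthnum        -- join on the year/month partition column
WHERE uv.customer <= 3                           -- assumed filters, expressed on the partition columns
  AND uv.visitYearMonth >= 199810
GROUP BY c.c_name, c.c_mktsegment, t.prettyMonthYear, uv.visitYearMonth
ORDER BY c.c_name, c.c_mktsegment, uv.visitYearMonth ASC;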

Predicate pushdown and storage optimizations 

Previously, I mentioned that Redshift Spectrum performs processing through large-scale infrastructure external to your Amazon Redshift cluster. It is optimized for performing large scans and aggregations on S3. In fact, Redshift Spectrum may even out-perform a medium size Amazon Redshift cluster on these types of workloads with the proper optimizations. There are two important variables to consider for optimizing large scans and aggregations:

  • File size and count. As a general rule, use files 100 MB-1 GB in size, as Redshift Spectrum and S3 are optimized for reading this object size. However, the number of files operating on a query is directly correlated with the parallelism achievable by a query. There is an inverse relationship between file size and count: the bigger the files, the fewer files there are for the same dataset. Consequently, there is a trade-off between optimizing for object read performance, and the amount of parallelism achievable on a particular query. Large files are best for large scans as the query likely operates on a sufficiently large number of files. For queries that are more selective and which operate on fewer files, you may find that smaller files allow for more parallelism.
  • Data format. Redshift Spectrum supports various data formats. Columnar formats like Parquet can sometimes lead to substantial performance benefits by providing compression and more efficient I/O for certain workloads. Generally, format types like Parquet should be used for query workloads involving large scans, and high attribute selectivity. Again, there are trade-offs as formats like Parquet require more compute power to process than plaintext. For queries on smaller subsets of data, the I/O efficiency benefit of Parquet is diminished. At some point, Parquet may perform the same or slower than plaintext. Latency, compression rates, and the trade-off between user experience and cost should drive your decision.

To help illustrate how Redshift Spectrum performs on these large aggregation workloads, run a basic query that aggregates the entire ~3.7 billion record dataset on Redshift Spectrum, and compared that with running the query exclusively on Amazon Redshift:

SELECT uv.custKey, COUNT(uv.custKey)
FROM <your clickstream table> as uv
GROUP BY uv.custKey
ORDER BY uv.custKey ASC

For the Amazon Redshift test case, the clickstream data is loaded, and distributed evenly across all nodes (even distribution style) with optimal column compression encodings prescribed by the Amazon Redshift’s ANALYZE command.

The Redshift Spectrum test case uses a Parquet data format with each file containing all the data for a particular customer in a month. This results in files mostly in the range of 220-280 MB, and in effect, is the largest file size for this partitioning scheme. If you run tests with the other datasets provided, you see that this data format and size is optimal and out-performs others by ~60X. 
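
The Parquet copies of the dataset are exposed to Redshift Spectrum as their own external tables. A rough sketch of such a definition under the same partitioning scheme follows; the table name, column list, types, and S3 path are assumptions for illustration rather than the PoC's actual DDL, and the column list is abridged.

CREATE EXTERNAL TABLE clickstream.uservisits_parquet1(   -- hypothetical name based on the "parquet-1" dataset
  custKey       int4,        -- surrogate key for the customer dimension
  yearMonthKey  int4,        -- surrogate key for the time dimension
  visitDate     varchar(30),
  adRevenue     float8
)
PARTITIONED BY (customer int4, visitYearMonth int4)
STORED AS PARQUET
LOCATION 's3://your-poc-bucket/uservisits_parquet1/';    -- placeholder bucket and prefix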

Performance differences will vary depending on the scenario. The important takeaway is to understand the testing strategy and the workload characteristics where Redshift Spectrum is likely to yield performance benefits. 

The following chart compares the query execution time for the two scenarios. The results indicate that you would have to pay for 12 X DC1.Large nodes to get performance comparable to using a small Amazon Redshift cluster that leverages Redshift Spectrum. 

Chart showing simple aggregation on ~3.7 billion records

So you’ve validated that Spectrum excels at performing large aggregations. Could you benefit by pushing more work down to Redshift Spectrum in your original query? It turns out that you can, by making the following modification:

The clickstream data is stored at a day-level granularity for each customer while your query rolls up the data to the month level per customer. In the earlier query that uses the day/month partition key, you optimized the query so that it only scans and retrieves the data required, but the day level data is still sent back to your Amazon Redshift cluster for joining and aggregation. The query shown here pushes aggregation work down to Redshift Spectrum as indicated by the query plan:
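
The modified query isn't reproduced in this text version of the post either, so here is a sketch of a query with this shape: the GROUP BY moves into a subquery over the external table so that the aggregation runs on Redshift Spectrum. The filter values and the prettyMonthYear expression remain assumptions.

SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, uv.totalRevenue
FROM (SELECT customer,
             visitYearMonth,
             SUM(adRevenue) AS totalRevenue      -- aggregated by Redshift Spectrum before the join
      FROM clickstream.uservisits_csv10          -- or one of the Parquet copies of the dataset
      WHERE customer <= 3
        AND visitYearMonth >= 199810             -- assumed filters on the partition columns
      GROUP BY customer, visitYearMonth) AS uv
JOIN customer AS c
  ON c.c_custkey = uv.customer
JOIN (SELECT d_yearmonthnum,
             d_yearmonth AS prettyMonthYear      -- assumed expression
      FROM dwdate) AS t
  ON uv.visitYearMonth = t.d_yearmonthnum
ORDER BY c.c_name, c.c_mktsegment, uv.visitYearMonth ASC;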

In this query, Redshift Spectrum aggregates the clickstream data to the month level before it is returned to the Amazon Redshift cluster and joined with the dimension tables. This query should complete in about 4 seconds, which is roughly twice as fast as only using the partition key. The speed increase is evident upon reviewing the SVL_S3QUERY_SUMMARY table:

  • Bytes scanned is 21.6X less because of the Parquet data format.
  • Only 90 records are returned back to the Amazon Redshift cluster as a result of the push-down, instead of ~66.2 million, leading to substantially less join overhead, and about 530 MB less data sent back to your cluster.
  • No adverse change in average parallelism.

Assessing the value of Amazon Redshift vs. Redshift Spectrum

At this point, you might be asking yourself, why would I ever not use Redshift Spectrum? Well, you still get additional value for your money by loading data into Amazon Redshift, and querying in Amazon Redshift vs. querying S3.

In fact, it turns out that the last version of our query runs even faster when executed exclusively in native Amazon Redshift, as shown in the following chart:

Chart comparing Amazon Redshift vs. Redshift Spectrum with pushdown aggregation over 3 months of data

As a general rule, queries that aren't dominated by I/O and which involve multiple joins are better optimized in native Amazon Redshift. For instance, the performance difference between running the partition key query entirely in Amazon Redshift versus with Redshift Spectrum is twice as large as that of the pushdown aggregation query, partly because the former case benefits more from better join performance.

Furthermore, the variability in latency in native Amazon Redshift is lower. For use cases where you have tight performance SLAs on queries, you may want to consider using Amazon Redshift exclusively to support those queries.

On the other hand, when you perform large scans, you could benefit from the best of both worlds: higher performance at lower cost. For instance, imagine that you wanted to enable your business analysts to interactively discover insights across a vast amount of historical data. In the example below, the pushdown aggregation query is modified to analyze seven years of data instead of three months:

SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, uv.totalRevenue
…
WHERE customer <= 3 and visitYearMonth >= 199201
… 
FROM dwdate WHERE d_yearmonthnum >= 199201) as t
…
ORDER BY c.c_name, c.c_mktsegment, uv.visitYearMonth ASC

This query requires scanning and aggregating nearly 1.9 billion records. As shown in the chart below, Redshift Spectrum substantially speeds up this query. A large Amazon Redshift cluster would have to be provisioned to support this use case. With the aid of Redshift Spectrum, you could use an existing small cluster, keep a single copy of your data in S3, and benefit from economical, durable storage while only paying for what you use via the pay per query pricing model.

Chart comparing Amazon Redshift vs. Redshift Spectrum with pushdown aggregation over 7 years of data

Summary

Redshift Spectrum lowers the time to value for deeper insights on customer data queries spanning the data lake and data warehouse. It can enable interactive analysis on datasets in cases that weren’t economically practical or technically feasible before.

There are cases where you can get the best of both worlds from Redshift Spectrum: higher performance at lower cost. However, there are still latency-sensitive use cases where you may want native Amazon Redshift performance. For more best practice tips, see the 10 Best Practices for Amazon Redshift post.

Please visit the Amazon Redshift Spectrum PoC Environment Github page. If you have questions or suggestions, please comment below.

 


Additional Reading

Learn more about how Amazon Redshift Spectrum extends data warehousing out to exabytes – no loading required.


About the Author

Dylan Tong is an Enterprise Solutions Architect at AWS. He works with customers to help drive their success on the AWS platform through thought leadership and guidance on designing well architected solutions. He has spent most of his career building on his expertise in data management and analytics by working for leaders and innovators in the space.