Tag Archives: hive

How Much Money Can Pirate Bay Make From a Cryptocoin Miner?

Post Syndicated from Ernesto original https://torrentfreak.com/how-much-money-can-pirate-bay-make-from-a-cryptocoin-miner-170924/

In recent years many pirate sites have struggled to make a decent income.

Not only are more people using ad-blockers now, but ad quality is also dropping as copyright holders actively go after this revenue source, trying to dry up the funds of pirate sites.

Last weekend The Pirate Bay tested a cryptocurrency miner to see whether that could offer a viable alternative. This created quite a bit of backlash, but there were plenty of positive comments too.

The question still remains whether the mining efforts can bring in enough money to pay all the bills.

The miner is provided by Coinhive which, at the time of writing, pays out 0.00015 XMR per 1M hashes. So how much can The Pirate Bay make from this?

To get a rough idea we did some back-of-the-envelope calculations, starting with the site’s visitor numbers.

SimilarWeb estimates that The Pirate Bay has roughly 315 million visits per month. On average, users spend five minutes on the site per “visit”. While we have reason to believe that this underestimates the site’s popularity, we’ll use it as an illustration.

We spoke to Coinhive and they estimate that a user with a mid-range laptop would have a hashrate of 30 h/s.

In Pirate Bay’s case this would translate to 30 hashes per second * 300 seconds * 315M visits = 2,835,000M hashes per month. If the miner is throttled to 30% of full speed, this drops to roughly 850,000M hashes.

If Coinhive pays out 0.00015 XMR per million hashes, TPB would get 127.5 XMR per month, which is roughly $12,000 at the moment. Since the miner doesn’t appear on all pages and because some may actively block it, this number will drop a bit further.
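
To make the arithmetic easy to rerun with your own assumptions, here is the same back-of-the-envelope estimate as a short Python sketch. Every input is an estimate quoted above, and the Monero price is an assumption backed out of the rough $12,000 figure; none of these numbers are stable.

# Back-of-the-envelope estimate of monthly Coinhive revenue for a high-traffic site.
# All inputs are rough estimates and will drift over time.
visits_per_month = 315_000_000       # SimilarWeb estimate for The Pirate Bay
seconds_per_visit = 5 * 60           # average five-minute visit
hashes_per_second = 30               # mid-range laptop, per Coinhive
throttle = 0.30                      # miner running at roughly 30% of full speed
xmr_per_million_hashes = 0.00015     # Coinhive payout at the time of writing
usd_per_xmr = 94                     # assumed Monero price (~$12,000 / 127.5 XMR)

hashes = visits_per_month * seconds_per_visit * hashes_per_second * throttle
xmr = hashes / 1_000_000 * xmr_per_million_hashes
print(f"{hashes / 1e6:,.0f}M hashes -> {xmr:.1f} XMR -> ${xmr * usd_per_xmr:,.0f} per month")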

Keep in mind that this is just an illustration using several estimated variables which may vary greatly over time. Still, it gives a broad idea of the potential.

Since The Pirate Bay tested the miner, several other sites have jumped on board as well. We’ll keep a close eye on the developments and hope we can share some real data in the future.


Are Cryptocurrency Miners The Future for Pirate Sites?

Post Syndicated from Ernesto original https://torrentfreak.com/are-cryptocurrency-miners-the-future-for-pirate-sites-170921/

Last weekend The Pirate Bay surprised friend and foe by adding a Javascript-based cryptocurrency miner to its website.

The miner utilizes CPU power from visitors to generate Monero coins for the site, providing an extra revenue source.

Initially, this caused the CPUs of visitors to max out due to a configuration error, but it was later adjusted to be less demanding. Still, there was plenty of discussion on the move, with greatly varying opinions.

Some criticized the site for “hijacking” their computer resources for personal profit, without prior warning. However, there are also people who are happy to give something back to TPB, especially if it can help the site to remain online.

Aside from the configuration error, there was another major mistake everyone agreed on. The Pirate Bay team should have alerted its visitors to this change beforehand, and not after the fact, as they did last weekend.

Despite the sensitivities, The Pirate Bay’s move has inspired others to follow suit. Pirate linking site Alluc.ee is one of the first. While they use the same mining service, their implementation is more elegant.

Alluc shows how many hashes are mined and the site allows users to increase or decrease the CPU load, or turn the miner off completely.

Alluc.ee miner

Putting all the controversy aside for a minute, letting visitors mine coins is a pretty ingenious idea. The Pirate Bay said it was testing the feature to see if it’s possible as a replacement for ads, which might be much needed in the future.

In recent years many pirate sites have struggled to make a decent income. Not only are more people using ad-blockers now, but ad quality is also dropping as copyright holders actively go after this revenue source, trying to dry up the funds of pirate sites. And with Chrome planning to add a default ad-blocker to its browser, the outlook is grim.

A cryptocurrency miner might alleviate this problem. That is, as long as ad-blockers don’t start to interfere with this revenue source as well.

Interestingly, this would also counter one of the main anti-piracy talking points. Increasingly, industry groups are using the “public safety” argument as a reason to go after pirate sites. They point to malicious advertisements as a great danger, hoping that this will further their calls for tougher legislation and enforcement.

If The Pirate Bay and other pirate sites can ditch the ads, they would be less susceptible to these and other anti-piracy pushes. Of course, copyright holders could still go after the miner revenues, but this might not be easy.

TorrentFreak spoke to Coinhive, the company that provides the mining service to The Pirate Bay, and they don’t seem eager to take action without a court order.

“We don’t track where users come from. We are just providing servers and a script to submit hashes for the Monero blockchain. We don’t see it as our responsibility to determine if a website is ‘valid’ and we don’t have the technical capabilities to do so,” a Coinhive representative says.

We also contacted several site owners and thus far the response has been mixed. Some like the idea and would consider adding a miner, if it doesn’t affect visitors too much. Others are more skeptical and don’t believe that the extra revenue is worth the trouble.

The Pirate Bay itself, meanwhile, has completed its test run and has removed the miner from the site. They will now analyze the results before deciding whether or not it’s “the future” for them.


The Pirate Bay Website Runs a Cryptocurrency Miner

Post Syndicated from Ernesto original https://torrentfreak.com/the-pirate-bay-website-runs-a-cryptocurrency-miner-170916/

Four years ago many popular torrent sites added an option to donate via Bitcoin. The Pirate Bay was one of the first to jump on board and still lists its address on the website.

While there’s nothing wrong with using Bitcoin as a donation tool, adding a Javascript cryptocurrency miner to a site is of a totally different order.

A few hours ago many Pirate Bay users began noticing that their CPU usage increased dramatically when they browsed certain Pirate Bay pages. Upon closer inspection, this spike appears to have been caused by a cryptocurrency miner embedded on the site.

The code in question is tucked away in the site’s footer and uses a miner provided by Coinhive. This service offers site owners the option to convert the CPU power of users into Monero coins.

The miner does indeed appear to increase CPU usage quite a bit. It is throttled at different rates (we’ve seen both 0.6 and 0.8) but the increase in resources is immediately noticeable.

The miner is not enabled site-wide. When we checked, it appeared in the search results and category listings, but not on the homepage or individual torrent pages.

There has been no official comment from the site operators on the issue (update, see below), but many users have complained about it. In the official site forums, TPB supermoderator Sid is clearly not in agreement with the site’s latest addition.

“That really is serious, so hopefully we can get some action on it quickly. And perhaps get some attention for the uploading and commenting bugs while they’re at it,” Sid writes.

Like many others, he also points out that blocking or disabling Javascript can stop the automatic mining. This can be done via browser settings or through script blocker addons such as NoScript and ScriptBlock. Alternatively, people can block the miner URL with an ad-blocker.

Whether the miner is a new and permanent tool, or perhaps triggered by an advertiser, is unknown at this point. When we hear more, this article will be updated accordingly.

Update: We were told that the miner is being tested for a short period as a new way to generate revenue. This could eventually replace the ads on the site. More info may be revealed later.


Strategies for Backing Up Windows Computers

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/strategies-for-backing-up-windows-computers/

Windows 7, Windows 8, Windows 10 logos

There’s a little company called Apple making big announcements this week, but about 45% of you are on Windows machines, so we thought it would be a good idea to devote a blog post today to Windows users and the options they have for backing up Windows computers.

We’ll be talking about the various options for backing up the Windows desktop operating systems (7, 8, and 10) and Windows servers. We’ve written previously about this topic in How to Back Up Windows, and Computer Backup Options, but we’ll be covering some new topics and ways to combine strategies in this post. So, if you’re a Windows user looking for shelter from all the Apple hoopla, welcome to our Apple Announcement Day Windows Backup Day post.

Windows laptop

First, Let’s Talk About What We Mean by Backup

This might seem to our readers like an unneeded appetizer on the way to the main course of our post, but we at Backblaze know that people often mean very different things when they use backup and related terms. Let’s start by defining what we mean when we say backup, cloud storage, sync, and archive.

Backup
A backup is an active copy of the system or files that you are using. It is distinguished from an archive, which is the storing of data that is no longer in active use. Backups fall into two main categories: file and image. File backup software will back up whichever files you designate by either letting you include files you wish backed up or by excluding files you don’t want backed up, or both. An image backup, sometimes called a disaster recovery backup or a system clone, is useful if you need to recreate your system on a new drive or computer.
The first backup generally will be a full backup of all files. After that, the backup will be incremental, meaning that only files that have been changed since the full backup will be added. Often, the software will keep changed versions of files for some period of time, so you can hold on to a number of previous revisions in case you wish to return to an earlier version of a file.
The destination for your backup could be another drive on your computer, an attached drive, network-attached storage (NAS), or the cloud.
Cloud Storage
Cloud storage vendors supply data storage just as a utility company supplies power, gas, or water. Cloud storage can be used for data backups, but it can also be used for data archives, application data, records, or libraries of photos, videos, and other media.
You contract with the service for storing any type of data, and the storage location is available to you via the internet. Cloud storage providers generally charge by some combination of data ingress, egress, and the amount of data stored.
Sync
File sync is useful for files that you wish to have access to from different places or computers, or for files that you wish to share with others. While sync has its uses, it has limitations for keeping files safe. As opposed to backup, which keeps revisions of files, sync is designed to keep two or more locations exactly the same. Sync costs are based on how much data you sync and can get expensive for large amounts of data.
Archive
A data archive is for data that is no longer in active use but needs to be saved, and may or may not ever be retrieved again. In old-style storage parlance, it is called cold storage. An archive could be stored with a cloud storage provider, or put on a hard drive or flash drive that you disconnect and put in the closet, or mail to your brother in Idaho.

What’s the Best Strategy for Backing Up?

Now that we’ve got our terminology clear, let’s talk backup strategies for Windows.

At Backblaze, we advocate the 3-2-1 strategy for safeguarding your data, which means that you should maintain three copies of any valuable data — two copies stored locally and one stored remotely. I follow this strategy at home by working on the active data on my Windows 10 desktop computer (copy one), which is backed up to a Drobo RAID device attached via USB (copy two), and backing up the desktop to Backblaze’s Personal Backup in the cloud (copy three). I also keep an image of my primary disk on a separate drive and frequently update it using Windows 10’s image tool.

I use Dropbox for sharing specific files I am working on that I might wish to have access to when I am traveling or on another computer. Once my subscription with Dropbox expires, I’ll use the latest release of Backblaze that has individual file preview with sharing built-in.

Before you decide which backup strategy will work best for your situation, you’ll need to ask yourself a number of questions. These questions include where you wish to store your backups, whether you wish to supply your own storage media, whether the backups will be manual or automatic, and whether limited or unlimited data storage will work best for you.

Strategy 1 — Back Up to a Local or Attached Drive

The first copy of the data you are working on is often on your desktop or laptop. You can create a second copy of your data on another drive or directory on your computer, or copy the data to a drive directly attached to your computer, such as via USB.

external hard drive and RAID NAS devices

Windows has built-in tools for both file and image level backup. Depending on which version of Windows you use, these tools are called Backup and Restore, File History, or Image. These tools enable you to set a schedule for automatic backups, which ensures that it is done regularly. You also have the choice to use Windows Explorer (aka File Explorer) to manually copy files to another location. Some external disk drives and USB Flash Drives come with their own backup software, and other backup utilities are available for free or for purchase.

Windows Explorer File History screenshot
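
If you’d rather script the manual copy than click through File Explorer, a few lines of Python can mirror a folder to an attached drive. This is only a sketch: the source and destination paths are placeholders, and it copies just the files that are new or have changed, in the spirit of an incremental file backup.

import shutil
from pathlib import Path

SOURCE = Path(r"C:\Users\me\Documents")   # hypothetical folder you want to protect
DEST = Path(r"E:\Backups\Documents")      # hypothetical attached USB drive

# Copy any file that is missing from the destination or newer than the copy there.
for src in SOURCE.rglob("*"):
    if src.is_file():
        dst = DEST / src.relative_to(SOURCE)
        if not dst.exists() or src.stat().st_mtime > dst.stat().st_mtime:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)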

This is a supply-your-own media solution, meaning that you need to have a hard disk or other medium available of sufficient size to hold all your backup data. When a disk becomes full, you’ll need to add a disk or swap out the full disk to continue your backups.

We’ve written previously on this strategy at Should I use an external drive for backup?

Strategy 2 — Back Up to a Local Area Network (LAN)

Computers, servers, and network-attached storage (NAS) on your local network all can be used for backing up data. Microsoft’s built-in backup tools can be used for this job, as can any utility that supports network protocols such as NFS or SMB/CIFS, which are common protocols that allow shared access to files on a network for Windows and other operating systems. There are many third-party applications available as well that provide extensive options for managing and scheduling backups and restoring data when needed.

NAS cloud

Multiple computers can be backed up to a single network-shared computer, server, or NAS, which in turn can be backed up to the cloud. This rounds out a nice backup strategy, because it covers both local and remote copies of your data. System images of multiple computers on the LAN can be included in these backups if desired.

Again, you are managing the backup media on the local network, so you’ll need to be sure you have sufficient room on the destination drives to store all your backup data.

Strategy 3 — Back Up to Detached Drive at Another Location

You may have read our recent blog post, Getting Data Archives Out of Your Closet, in which we discuss the practice of filling hard drives and storing them in a closet. Of course, to satisfy the off-site backup guideline, these drives would need to be stored in a closet that’s in a different geographical location than your main computer. If you’re willing to do all the work of copying the data to drives and transporting them to another location, this is a viable option.

stack of hard drives

The only limitation to the amount of backup data is the number of hard drives you are willing to purchase — and maybe the size of your closet.

Strategy 4 — Back Up to the Cloud

Backing up to the cloud has become a popular option for a number of reasons. Internet speeds have made moving large amounts of data possible, and not having to worry about supplying the storage media simplifies choices for users. Additionally, cloud vendors implement features such as data protection, deduplication, and encryption as part of their services that make cloud storage reliable, secure, and efficient. Unlimited cloud storage for data from a single computer is a popular option.

A backup vendor will likely provide a software client that runs on your computer and backs up your data to the cloud in the background while you’re doing other things. Backblaze Personal Backup, for example, has clients for Windows computers, Macintosh computers, and mobile apps for both iOS and Android. For restores, Backblaze users can download one or all of their files for free from anywhere in the world. Optionally, a 128 GB flash drive or 4 TB drive can be overnighted to the customer, with a refund available if the drive is returned.

Storage Pod in the cloud

Backblaze B2 Cloud Storage is an option for those who need capabilities beyond Backblaze’s Personal Backup. B2 provides cloud storage that is priced based on the amount of data the customer uses, and is suitable for long-term data storage. B2 supports integrations with NAS devices, as well as Windows, Macintosh, and Linux computers and servers.

Services such as Backblaze B2 are often called Cloud Object Storage or IaaS (Infrastructure as a Service), because they provide a complete solution for storing all types of data in partnership with vendors who integrate various solutions for working with B2. B2 has its own API (Application Programming Interface) and CLI (Command-Line Interface), but it becomes even more powerful when paired with any one of a number of data storage and management solutions provided by third parties who offer both hardware and software.

Backing Up Windows Servers

Windows Servers are popular workstations for some users, and provide needed network services for others. They also can be used to store backups from other computers on the network. They, in turn, can be backed up to attached drives or the cloud. While our Personal Backup client doesn’t support Windows servers, our B2 Cloud Storage has a number of integrations with vendors who supply software or hardware for storing data both locally and on B2. We’ve written a number of blog posts and articles that address these solutions, including How to Back Up your Windows Server with B2 and CloudBerry.

Sometimes the Best Strategy is to Mix and Match

The great thing about computers, software, and networks is that there is an endless number of ways to combine them. Our users and hardware and software partners are ingenious in configuring solutions that save data locally, copy it to an attached or network drive, and then store it to the cloud.

image of cloud backup

Among our B2 partners, Synology, CloudBerry, Archiware, QNAP, Morro Data, and GoodSync have integrations that allow their NAS devices to store and retrieve data to and from B2 Cloud Storage. For a drag-and-drop experience on the desktop, take a look at CyberDuck, MountainDuck, and Dropshare, which provide users with an easy and interactive way to store and use data in B2.

If you’d like to explore more options for combining software, hardware, and cloud solutions, we invite you to browse the integrations for our many B2 partners.

Have Questions?

Windows versions, tools, and backup terminology all can be confusing, and we know how hard it can be to make sense of all of it. If there’s something we haven’t addressed here, or if you have a question or contribution, please let us know in the comments.

And happy Windows Backup Day! (Just don’t tell Apple.)

The post Strategies for Backing Up Windows Computers appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Security updates for Monday

Post Syndicated from ris original https://lwn.net/Articles/733389/rss

Security updates have been issued by Debian (freerdp, mbedtls, tiff, and tiff3), Fedora (chromium, krb5, libstaroffice, mbedtls, mingw-libidn2, mingw-openjpeg2, openjpeg2, and rubygems), Mageia (bzr, libarchive, libgcrypt, and tcpdump), openSUSE (gdk-pixbuf, libidn2, mpg123, postgresql94, postgresql96, and xen), Slackware (bash, mariadb, and tcpdump), and SUSE (evince and kernel).

HiveMQ 3.2.7 released

Post Syndicated from The HiveMQ Team original http://www.hivemq.com/blog/hivemq-3-2-7-released/

The HiveMQ team is pleased to announce the availability of HiveMQ 3.2.7. This is a maintenance release for the 3.2 series and brings the following improvements:

  • Added logging of the replication state when a node joins a cluster
  • Fixed a reference counting issue
  • Fixed an issue that could cause a client to not receive queued messages until it reconnects again
  • Fixed an issue that could cause other subscriptions to be removed for the same client if an unsubscribe message is sent to the wildcard topic
  • Fixed an issue that caused clients to not be able to change the QoS for their subscriptions
  • Fixed a rare issue that could cause a client to not receive any messages after a successful connection
  • Fixed a cosmetic issue that caused IOExceptions to be wrongfully logged when a client disconnected
  • Fixed an issue that could cause HiveMQ to not shut down properly when a plugin triggers a shutdown via UnrecoverableExceptions
  • Fixed an issue that could cause cluster nodes to not be operational on startup
  • Fixed verbose RESTService logging
  • Fixed an issue that could cause messages to be queued although there is no need to queue them
  • Fixed an issue that could lead to incorrect subscription count metrics
  • Fixed an issue that could cause the loss of retained messages when using the stateful-cluster.sh script

You can download the new HiveMQ version here.

We strongly recommend upgrading if you are a HiveMQ 3.2.x user.

Have a great day,
The HiveMQ Team

How to Migrate All of Your Data from CrashPlan

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/how-to-migrate-your-data-from-crashplan/

Migrating from Crashplan

With CrashPlan deciding to leave the consumer backup space, ex-customers are faced with having to migrate their data to a new cloud backup service. Uploading your data from your computer to a new service is onerous enough, but one thing that seems to be getting overlooked is the potential for the files that reside in CrashPlan Central, but not on your computer, to be lost during the migration to a new provider. Here’s an overview of the migration process to make sure you don’t lose data you wish to keep.

Why would you lose files?

By default CrashPlan for Home does not delete files from CrashPlan Central (their cloud storage servers) after they are uploaded from your computer. Unless you changed your CrashPlan “Frequency and versions” settings, all of the files you uploaded are still there. This includes all the files you deleted from your computer. For example, you may have a folder of old videos that you uploaded to CrashPlan and then deleted from your computer because of space concerns. This folder of old video files is still in your CrashPlan archive. It is very likely you have files stored in CrashPlan Central that are not on your computer. Such files are now in migration limbo, and we’ll get to those files in a minute, but first…

Get Started Now

CrashPlan was kind enough to make sure that everyone will have at least 60 days from August 22nd, 2017 to transfer their data. Most people will have more time, but everyone must be migrated by the end of October 2018.

Regardless, it’s better to get started now as it can take some time to upload your data to another backup provider. The first step in migrating your files is to choose a new cloud backup provider. Let’s assume you choose Backblaze Personal Backup.

Crashplan Migration Steps

The first step is to migrate all the data that is currently on your computer to Backblaze. Once you install Backblaze on your computer, it will automatically scan your system to locate the data to upload to Backblaze. The upload will continue automatically. You can speed up or slow down how quickly Backblaze will upload files by adjusting your performance settings for your Mac or for your Windows PC. In addition, any changes and new files are automatically uploaded as well. Backblaze keeps up to 30 days’ worth of file versions and always keeps the most recent version of every data file currently on your computer.

Question — Should you remove CrashPlan from your computer before migrating to Backblaze?
Answer — No.

If your computer fails during the upload to Backblaze, you’ll still have a full backup with CrashPlan. During the upload period you may want to decrease the resources (CPU and Network) used by CrashPlan and increase the resources available to Backblaze. You can “pause” CrashPlan for up to 24 hours, but that is a manual operation and may not be practical. In any case, you’ll also need to have CrashPlan around to recover those files in migration limbo.

Saving the Files in Migration Limbo

Let’s divide this process into two major parts: recovering the files and getting them stored somewhere else.

    Recovering Files in Limbo

    1. Choose a recovery device — Right now you don’t know how many files you will need to recover, but once you know that information, you’ll need a device to hold them. We recommend that you use an external USB hard drive as your recovery device. If you believe you will only have a small number of limbo files, then a thumb drive will work.
    2. Locate the Limbo files — Open the CrashPlan App on your computer and select the “Restore” menu item on the left. As an example, you can navigate to a given folder and see the files in that folder as shown below:

    Restore files from Crashplan

    3. Click on the “Show deleted files” box as shown below to display all the files, including those that are deleted. As an example, the same files listed above are shown below, and the list now includes the deleted file IMG_6533.JPG.

    Finding deleted file in Crasphlan Central

    4. Deleted files can be visually identified via the different icon and the text shown grayed out. Navigate through your folder/directory structure and select the files you wish to recover. Yes, this can take a while. You only need to click on the deleted files, as the other files are currently still on your computer and being backed up directly to Backblaze.
    5. Make sure you change the restore location. By default this is set to “Desktop.” Click on the word “Desktop” to toggle through your options, and you’ll be able to change your restore destination to any mounted device connected to your system. As an example, we’ve chosen to restore the deleted files to the USB external drive named “Backblaze.”
    6. Click “Restore” to restore the files you have selected.

    Storing the Restored Limbo Files

    Now that you have an external USB hard drive with the recovered Limbo files, let’s get them saved to the cloud. With Backblaze you have two options. The first option is to make the Limbo files part of your Backblaze backup. You can do this in two ways.

    1. Copy the Limbo files to your computer and they will be automatically backed up to Backblaze with the rest of your files.

    – or –

    2. Connect the external USB Hard Drive to your computer and configure Backblaze to back up that device. This device should remain connected to the computer while the backup occurs, and then once every couple of weeks to make sure that nothing has changed on the hard drive.

    If neither of the above solutions works for you, the other option is to use the Backblaze B2 Cloud Storage service.

What is Backblaze B2 Cloud Storage?

B2 Cloud Storage is a service for storing files in the cloud. Files are available for download at any time, either through the API or through a browser-compatible URL. Files stored in the B2 cloud are not deleted unless you explicitly delete them. In that way it is very similar to CrashPlan. Here’s some help, if you are unsure about the difference between Backblaze Personal Backup and Backblaze B2.

There are four ways to access B2: 1) a Web GUI, 2) a Command-line interface (CLI), 3) an API, and 4) via partner integrations, such as CloudBerry, Synology, Arq, QNAP, GoodSync, and many more you can find on our B2 integrations page. Most CrashPlan users will find either the Web GUI or a partner integration to be the way to go. Note: There is an additional cost to use the B2 service, and we’ll get to that shortly.

  1. Since you already have a Backblaze account, you just have to log in to your account. Click on “My Settings” on the left hand navigation and enable B2 Cloud Storage. If you haven’t already done so you will be asked to provide a Mobile number for contact and authentication purposes.
  2. To use the B2 Web GUI, you create a B2 “bucket” and then drag-and-drop the files into the B2 bucket.
  3. You can also choose to use a B2 partner integration to store your data into B2.
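
If you’d rather script the upload than drag-and-drop files in the Web GUI, a minimal sketch using Backblaze’s b2sdk Python package might look like the following. The key ID, application key, bucket name, and file paths are placeholders you’d replace with your own.

from b2sdk.v2 import InMemoryAccountInfo, B2Api

# Authenticate against B2 with an application key created in your account.
info = InMemoryAccountInfo()
api = B2Api(info)
api.authorize_account("production", "YOUR_KEY_ID", "YOUR_APPLICATION_KEY")

# Upload one recovered file into a bucket created for the rescued Limbo files.
bucket = api.get_bucket_by_name("crashplan-limbo-files")
bucket.upload_local_file(local_file="E:/Recovered/IMG_6533.JPG",
                         file_name="limbo/IMG_6533.JPG")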

If you use B2 to store your Limbo files rescued from CrashPlan and you use Backblaze to back up your computer, you will be able to access and manage all of your data from your one Backblaze account.

What does all this cost?

If you are only going to use Backblaze Personal Backup to back up your computer, then you will pay $50/year per computer.

If you decide to combine the use of Backblaze Personal Backup and Backblaze B2, let’s assume you have 500 GB of data to back up from your computer to Backblaze. Let’s also assume you have to store 100 GB of data in Backblaze B2 that you rescued from CrashPlan limbo. Your annual cost would be:

    To back up 500 GB:

    • Backblaze Personal Backup — 1 year/1 computer — $50.00

    To archive 100 GB:

    • Backblaze B2 — 100 GB @ $0.005/GB/month for 12 months — $6.00

    The Total Annual Cost to store your CrashPlan data in Backblaze, including your recovered deleted files, is $56.00.
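
The same arithmetic as a tiny Python check, counting storage only (any download or transaction fees would come on top):

personal_backup = 50.00               # Backblaze Personal Backup, 1 year / 1 computer
b2_archive = 100 * 0.005 * 12         # 100 GB at $0.005/GB/month for 12 months = $6.00
print(f"Total annual cost: ${personal_backup + b2_archive:.2f}")   # -> Total annual cost: $56.00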

Migrating from CrashPlan to Carbonite

If you are considering migrating your CrashPlan for Home account to Carbonite, you will still have to upload your data to Carbonite. There is no automatic process to copy the files from CrashPlan to Carbonite. You will also have to recover the Limbo files we’ve been speaking about using the process we’ve outlined above. In summary, when moving from CrashPlan for Home to any other vendor you will have to reupload your data to the new vendor.

One More Option

There is one more option you can use when you move your data from CrashPlan to another cloud service. You can download all of your data from CrashPlan, including the active and deleted files, to a local computer or device such as an external USB hard drive. Then you can upload all that data to the new cloud backup provider. Of course, this will mean all that data makes two trips through your local network — down and then back up. This will take time and could be very taxing on any bandwidth limits you may have in place from your network provider.

If you have the bandwidth and the time, this can be a good option, as all your files stored in CrashPlan Central are included in your backup. But, if you have a lot of data and/or a slow internet connection, this can take a really, really long time.

Join Our Webinar for More Information

You can sign up for our upcoming webinar, “Migrating from CrashPlan for Home to Backblaze” on September 7th at 10:00 am PDT if you’d like to learn more about the migration methods we covered today. Please note, you will need to register for this webinar by either signing up for a Backblaze BrightTALK channel account or using your existing BrightTALK account.

CrashPlan Replacement

Now that you are faced with replacing your CrashPlan for Home account, don’t wait until your contract is about to run out. Give yourself at least a couple of months to make sure all the data, including the Limbo data, is safely migrated somewhere else.

Also, regardless of which option you choose for migrating your data from CrashPlan to a new cloud backup service, once everything is moved and you’ve checked to make sure you got everything, then and only then should you turn off your CrashPlan account and uninstall CrashPlan.

An Invitation

If you are a CrashPlan for Home user going through the migration to a new cloud backup service, and have ideas to help other users through the migration process, let us know in the comments. We’ll update this post with any relevant ideas from the community.

The post How to Migrate All of Your Data from CrashPlan appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Mod your Nerf gun with a Pi

Post Syndicated from Janina Ander original https://www.raspberrypi.org/blog/mod-nerf-gun-pi/

Michael Darby, who blogs at 314reactor, has created a new Raspberry Pi build, and it’s pretty darn cool. Though it’s not the first Raspberry Pi-modded Nerf gun we’ve seen, it’s definitely one of the most complex!

Nerf Gun Ammo Counter / Range Finder – Raspberry Pi

An ammo counter and range finder made from a Raspberry Pi for a Nerf Gun.

Nerf guns

Nerf guns are toy dart guns that have been on the market since the early 1990s. They are popular with kids and adults who enjoy playing paintball, laser tag, and first-person shooter video games. Michael loves Nerf guns, and he wanted to give his toy a sci-fi overhaul, making it look and function more like a gun that an avatar might use in Half-Life, Quake, or Doom.

Modding a Nerf gun

A busy and creative member of the Raspberry Pi community, Michael has previously delighted us with his Windows 98 wristwatch. Now, he has upgraded his Nerf gun with a rangefinder and an ammo counter by adding a Pi, a Pimoroni Rainbow HAT, and some sensors.

Setting up a rangefinder was straightforward. Michael fixed an ultrasonic distance sensor pointing in the direction of the gun’s barrel. Live information about how far away he is from his target is shown on the Rainbow HAT’s alphanumeric display.

View of Michael Darby's nerf gun range finder

To create an ammo counter, Michael had to follow a more circuitous route. Since he couldn’t think of a way to read out how many darts are in the Nerf gun’s magazine, he ended up counting how many darts have been shot instead. This data is collected via a proximity sensor, a device that can measure shorter distances than an ultrasonic sensor. Michael aimed the sensor towards the end of the barrel, attaching it with Blu-Tack.

View of Michael Darby's nerf gun proximity sensor

The number of shots left in the magazine is indicated by the seven LEDs above the Rainbow HAT’s alphanumeric display. The countdown works for more than seven darts, thanks to colour coding: the LEDs count down first in red, then in orange, and finally in green.

In a Python script running on the Pi, Michael has included a default number of shots per magazine. When he changes a magazine, he uses one of the HAT’s buttons as a ‘Reload’ button, resetting the counter. He has also set up the HAT so that the number of available shots can be entered manually instead.
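
Michael’s own code is linked below, but the gist of the countdown logic might look something like the sketch here. It assumes the Pimoroni rainbowhat Python library, leaves the proximity-sensor read as a placeholder (the exact sensor and wiring are Michael’s), and guesses at how the red/orange/green blocks map onto the seven LEDs.

import time
import rainbowhat as rh

MAGAZINE_SIZE = 18      # hypothetical default darts per magazine
shots_left = MAGAZINE_SIZE

def ammo_display(remaining):
    """Map the remaining darts onto the seven LEDs: the count runs down
    first in red, then in orange, then in green for the final seven."""
    if remaining <= 0:
        return 0, (0, 0, 0)
    lit = ((remaining - 1) % 7) + 1
    if remaining > 14:
        return lit, (255, 0, 0)
    if remaining > 7:
        return lit, (255, 128, 0)
    return lit, (0, 255, 0)

def show_ammo(remaining):
    lit, colour = ammo_display(remaining)
    rh.rainbow.clear()
    for i in range(lit):
        rh.rainbow.set_pixel(i, *colour)
    rh.rainbow.show()
    rh.display.print_str(str(remaining).rjust(4))   # shot count on the alphanumeric display
    rh.display.show()

@rh.touch.A.press()
def reload(channel):
    """Button A doubles as the 'Reload' button and resets the counter."""
    global shots_left
    shots_left = MAGAZINE_SIZE
    show_ammo(shots_left)

def dart_fired():
    # Placeholder: read the proximity sensor pointed at the end of the barrel
    # and return True when a dart passes. The real sensor code lives in Michael's write-up.
    return False

show_ammo(shots_left)
while True:
    if dart_fired():
        shots_left = max(shots_left - 1, 0)
        show_ammo(shots_left)
    time.sleep(0.01)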

Nerf gun modding tutorial

On Michael’s blog you will find a thorough step-by-step guide to how he created this build. He has also included his code, and links to all the components, software installation guides, and test scripts he has used. So head on over there if you’re keen to mod your own nerf gun like this, and take a look at some of his other projects while you’re there!

Michael welcomes suggestions for how to improve upon his mods, especially for how to count shots in a magazine automatically. Do you have an idea? Let us (and him) know in the comments!

Toy mods

Over the years, we’ve covered quite a few fun toy upgrades, and some that may have to be approached with caution. The Pi-powered busy board for babies, the ‘weaponized’ teddy bear, and the inevitable smart Fisher Price phone are just a few from our archives.

What’s your favourite childhood toy, and how could it be improved by the addition of a Pi? Share your ideas with us in the comments below.

The post Mod your Nerf gun with a Pi appeared first on Raspberry Pi.

Analyzing AWS Cost and Usage Reports with Looker and Amazon Athena

Post Syndicated from Dillon Morrison original https://aws.amazon.com/blogs/big-data/analyzing-aws-cost-and-usage-reports-with-looker-and-amazon-athena/

This is a guest post by Dillon Morrison at Looker. Looker is, in their own words, “a new kind of analytics platform–letting everyone in your business make better decisions by getting reliable answers from a tool they can use.” 

As the breadth of AWS products and services continues to grow, customers are able to more easily move their technology stack and core infrastructure to AWS. One of the attractive benefits of AWS is the cost savings. Rather than paying upfront capital expenses for large on-premises systems, customers can instead pay variable expenses for on-demand services. To further reduce expenses, AWS users can reserve resources for specific periods of time, and automatically scale resources as needed.

The AWS Cost Explorer is great for aggregated reporting. However, conducting analysis on the raw data using the flexibility and power of SQL allows for much richer detail and insight, and can be the better choice for the long term. Thankfully, with the introduction of Amazon Athena, monitoring and managing these costs is now easier than ever.

In this post, I walk through setting up the data pipeline for cost and usage reports, Amazon S3, and Athena, and discuss some of the most common levers for cost savings. I surface tables through Looker, which comes with a host of pre-built data models and dashboards to make analysis of your cost and usage data simple and intuitive.

Analysis with Athena

With Athena, there’s no need to create hundreds of Excel reports, move data around, or deploy clusters to house and process data. Athena uses Apache Hive’s DDL to create tables, and the Presto querying engine to process queries. Analysis can be performed directly on raw data in S3. Conveniently, AWS exports raw cost and usage data directly into a user-specified S3 bucket, making it simple to start querying with Athena quickly. This makes continuous monitoring of costs virtually seamless, since there is no infrastructure to manage. Instead, users can leverage the power of the Athena SQL engine to easily perform ad-hoc analysis and data discovery without needing to set up a data warehouse.

After the data pipeline is established, cost and usage data (the recommended billing data, per AWS documentation) provides a plethora of comprehensive information around usage of AWS services and the associated costs. Whether you need the report segmented by product type, user identity, or region, this report can be cut-and-sliced any number of ways to properly allocate costs for any of your business needs. You can then drill into any specific line item to see even further detail, such as the selected operating system, tenancy, purchase option (on-demand, spot, or reserved), and so on.

Walkthrough

By default, the Cost and Usage report exports CSV files, which you can compress using gzip (recommended for performance). There are some additional configuration options for tuning performance further, which are discussed below.

Prerequisites

If you want to follow along, you need the following resources:

Enable the cost and usage reports

First, enable the Cost and Usage report. For Time unit, select Hourly. For Include, select Resource IDs. All options are prompted in the report-creation window.

The Cost and Usage report dumps CSV files into the specified S3 bucket. Please note that it can take up to 24 hours for the first file to be delivered after enabling the report.

Configure the S3 bucket and files for Athena querying

In addition to the CSV file, AWS also creates a JSON manifest file for each cost and usage report. Athena requires that all of the files in the S3 bucket are in the same format, so we need to get rid of all these manifest files. If you’re looking to get started with Athena quickly, you can simply go into your S3 bucket and delete the manifest file manually, skip the automation described below, and move on to the next section.

To automate the process of removing the manifest file each time a new report is dumped into S3, which I recommend as you scale, there are a few additional steps. The folks at Concurrency labs wrote a great overview and set of scripts for this, which you can find in their GitHub repo.

These scripts take the data from an input bucket, remove anything unnecessary, and dump it into a new output bucket. We can utilize AWS Lambda to trigger this process whenever new data is dropped into S3, or on a nightly basis, or whatever makes most sense for your use-case, depending on how often you’re querying the data. Please note that enabling the “hourly” report means that data is reported at the hour-level of granularity, not that a new file is generated every hour.
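
Those scripts are the full treatment; purely as an illustration of the pattern, an S3-triggered Lambda handler that copies new report files into a clean output bucket and skips the JSON manifests could be as simple as the Python sketch below (bucket names are placeholders).

import urllib.parse
import boto3

s3 = boto3.client("s3")
OUTPUT_BUCKET = "my-cur-athena-bucket"   # hypothetical bucket your Athena table points at

def handler(event, context):
    """Runs on S3 ObjectCreated events fired by the raw Cost and Usage report bucket."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        if key.endswith(".json"):
            continue   # drop the manifest files so Athena only ever sees one file format
        # Copy the report file into the bucket Athena queries.
        s3.copy_object(Bucket=OUTPUT_BUCKET,
                       Key=key.split("/")[-1],
                       CopySource={"Bucket": bucket, "Key": key})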

Following these scripts, you’ll notice that we’re adding a date partition field, which isn’t necessary but improves query performance. In addition, converting data from CSV to a columnar format like ORC or Parquet also improves performance. We can automate this process using Lambda whenever new data is dropped in our S3 bucket. Amazon Web Services discusses columnar conversion at length, and provides walkthrough examples, in their documentation.

As a long-term solution, best practice is to use compression, partitioning, and conversion. However, for purposes of this walkthrough, we’re not going to worry about them so we can get up-and-running quicker.

Set up the Athena query engine

In your AWS console, navigate to the Athena service, and click “Get Started”. Follow the tutorial and set up a new database (we’ve called ours “AWS Optimizer” in this example). Don’t worry about configuring your initial table, per the tutorial instructions; we’ll be creating a new table for cost and usage analysis. Once you’ve walked through the tutorial steps, you’ll be able to access the Athena interface, and can begin running Hive DDL statements to create new tables.

One thing that’s important to note is that the Cost and Usage CSVs also contain the column headers in their first row, meaning that the column headers would be included in the dataset and any queries. For testing and quick set-up, you can remove this line manually from your first few CSV files. Long-term, you’ll want to use a script to programmatically remove this row each time a new file is dropped in S3 (every few hours typically). We’ve drafted up a sample script for ease of reference, which we run on Lambda. We utilize Lambda’s native ability to invoke the script whenever a new object is dropped in S3.
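
Our sample script isn’t reproduced here, but the core of the clean-up, assuming plain (uncompressed) CSV files as in this walkthrough, is just rewriting each new object without its first line. For example:

import boto3

s3 = boto3.client("s3")
CLEAN_BUCKET = "my-cur-athena-bucket"   # hypothetical bucket the Athena table points at

def handler(event, context):
    """Triggered when a new report CSV lands; writes a header-less copy for Athena."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        _, _, rest = body.partition(b"\n")   # drop everything up to and including the first newline
        # Write to a separate bucket so this put doesn't re-trigger the same function.
        s3.put_object(Bucket=CLEAN_BUCKET, Key=key.split("/")[-1], Body=rest)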

For cost and usage, we recommend using the DDL statement below. Since our data is in CSV format, we don’t need to use a SerDe; we can simply specify the “separatorChar, quoteChar, and escapeChar” and the structure of the files (“TEXTFILE”). Note that AWS does have an OpenCSV SerDe as well, if you prefer to use that.

 

CREATE EXTERNAL TABLE IF NOT EXISTS cost_and_usage (
identity_LineItemId String,
identity_TimeInterval String,
bill_InvoiceId String,
bill_BillingEntity String,
bill_BillType String,
bill_PayerAccountId String,
bill_BillingPeriodStartDate String,
bill_BillingPeriodEndDate String,
lineItem_UsageAccountId String,
lineItem_LineItemType String,
lineItem_UsageStartDate String,
lineItem_UsageEndDate String,
lineItem_ProductCode String,
lineItem_UsageType String,
lineItem_Operation String,
lineItem_AvailabilityZone String,
lineItem_ResourceId String,
lineItem_UsageAmount String,
lineItem_NormalizationFactor String,
lineItem_NormalizedUsageAmount String,
lineItem_CurrencyCode String,
lineItem_UnblendedRate String,
lineItem_UnblendedCost String,
lineItem_BlendedRate String,
lineItem_BlendedCost String,
lineItem_LineItemDescription String,
lineItem_TaxType String,
product_ProductName String,
product_accountAssistance String,
product_architecturalReview String,
product_architectureSupport String,
product_availability String,
product_bestPractices String,
product_cacheEngine String,
product_caseSeverityresponseTimes String,
product_clockSpeed String,
product_currentGeneration String,
product_customerServiceAndCommunities String,
product_databaseEdition String,
product_databaseEngine String,
product_dedicatedEbsThroughput String,
product_deploymentOption String,
product_description String,
product_durability String,
product_ebsOptimized String,
product_ecu String,
product_endpointType String,
product_engineCode String,
product_enhancedNetworkingSupported String,
product_executionFrequency String,
product_executionLocation String,
product_feeCode String,
product_feeDescription String,
product_freeQueryTypes String,
product_freeTrial String,
product_frequencyMode String,
product_fromLocation String,
product_fromLocationType String,
product_group String,
product_groupDescription String,
product_includedServices String,
product_instanceFamily String,
product_instanceType String,
product_io String,
product_launchSupport String,
product_licenseModel String,
product_location String,
product_locationType String,
product_maxIopsBurstPerformance String,
product_maxIopsvolume String,
product_maxThroughputvolume String,
product_maxVolumeSize String,
product_maximumStorageVolume String,
product_memory String,
product_messageDeliveryFrequency String,
product_messageDeliveryOrder String,
product_minVolumeSize String,
product_minimumStorageVolume String,
product_networkPerformance String,
product_operatingSystem String,
product_operation String,
product_operationsSupport String,
product_physicalProcessor String,
product_preInstalledSw String,
product_proactiveGuidance String,
product_processorArchitecture String,
product_processorFeatures String,
product_productFamily String,
product_programmaticCaseManagement String,
product_provisioned String,
product_queueType String,
product_requestDescription String,
product_requestType String,
product_routingTarget String,
product_routingType String,
product_servicecode String,
product_sku String,
product_softwareType String,
product_storage String,
product_storageClass String,
product_storageMedia String,
product_technicalSupport String,
product_tenancy String,
product_thirdpartySoftwareSupport String,
product_toLocation String,
product_toLocationType String,
product_training String,
product_transferType String,
product_usageFamily String,
product_usagetype String,
product_vcpu String,
product_version String,
product_volumeType String,
product_whoCanOpenCases String,
pricing_LeaseContractLength String,
pricing_OfferingClass String,
pricing_PurchaseOption String,
pricing_publicOnDemandCost String,
pricing_publicOnDemandRate String,
pricing_term String,
pricing_unit String,
reservation_AvailabilityZone String,
reservation_NormalizedUnitsPerReservation String,
reservation_NumberOfReservations String,
reservation_ReservationARN String,
reservation_TotalReservedNormalizedUnits String,
reservation_TotalReservedUnits String,
reservation_UnitsPerReservation String,
resourceTags_userName String,
resourceTags_usercostcategory String
)
    ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
      ESCAPED BY '\\'
      LINES TERMINATED BY '\n'

STORED AS TEXTFILE
    LOCATION 's3://<<your bucket name>>';

Once you’ve successfully executed the command, you should see a new table named “cost_and_usage” with the below properties. Now we’re ready to start executing queries and running analysis!

Start with Looker and connect to Athena

Setting up Looker is a quick process, and you can try it out for free here (or download from Amazon Marketplace). It takes just a few seconds to connect Looker to your Athena database, and Looker comes with a host of pre-built data models and dashboards to make analysis of your cost and usage data simple and intuitive. After you’re connected, you can use the Looker UI to run whatever analysis you’d like. Looker translates this UI to optimized SQL, so any user can execute and visualize queries for true self-service analytics.

Major cost saving levers

Now that the data pipeline is configured, you can dive into the most popular use cases for cost savings. In this post, I focus on:

  • Purchasing Reserved Instances vs. On-Demand Instances
  • Data transfer costs
  • Allocating costs over users or other Attributes (denoted with resource tags)

On-Demand, Spot, and Reserved Instances

Purchasing Reserved Instances vs On-Demand Instances is arguably going to be the biggest cost lever for heavy AWS users (Reserved Instances run up to 75% cheaper!). AWS offers three options for purchasing instances:

  • On-Demand—Pay as you use.
  • Spot (variable cost)—Bid on spare Amazon EC2 computing capacity.
  • Reserved Instances—Pay for an instance for a specific, allotted period of time.

When purchasing a Reserved Instance, you can also choose to pay all-upfront, partial-upfront, or monthly. The more you pay upfront, the greater the discount.

If your company has been using AWS for some time now, you should have a good sense of your overall instance usage on a per-month or per-day basis. Rather than paying for these instances On-Demand, you should try to forecast the number of instances you’ll need, and reserve them with upfront payments.

The total amount of usage with Reserved Instances versus overall usage with all instances is called your coverage ratio. It’s important not to confuse your coverage ratio with your Reserved Instance utilization. Utilization represents the amount of reserved hours that were actually used. Don’t worry about exceeding capacity; you can still set up Auto Scaling preferences so that more instances get added whenever your coverage or utilization crosses a certain threshold (we often see a target of 80% for both coverage and utilization among savvy customers).
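
Expressed as a quick calculation (the hour counts here are made up purely for illustration):

# Hypothetical monthly figures pulled from the cost and usage report.
total_instance_hours = 10_000        # all instance usage, regardless of purchase option
reserved_usage_hours = 7_500         # usage that was billed against Reserved Instances
reserved_hours_purchased = 8_000     # hours you actually reserved and paid for

coverage = reserved_usage_hours / total_instance_hours          # share of usage covered by RIs
utilization = reserved_usage_hours / reserved_hours_purchased   # share of reservations actually used
print(f"coverage {coverage:.0%}, utilization {utilization:.0%}")  # coverage 75%, utilization 94%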

Calculating the reserved costs and coverage can be a bit tricky with the level of granularity provided by the cost and usage report. The following query shows your total cost over the last 6 months, broken out by Reserved Instance vs other instance usage. You can substitute the cost field for usage if you’d prefer. Please note that you should only have data for the time period after the cost and usage report has been enabled (though you can opt for up to 3 months of historical data by contacting your AWS Account Executive). If you’re just getting started, this query will only show a few days.

 

SELECT 
	DATE_FORMAT(from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate),'%Y-%m') AS "cost_and_usage.usage_start_month",
	COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0) AS "cost_and_usage.total_unblended_cost",
	COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_reserved_unblended_cost",
	1.0 * (COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0)) / NULLIF((COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0)),0)  AS "cost_and_usage.percent_spend_on_ris",
	COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'Non RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_non_reserved_unblended_cost",
	1.0 * (COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'Non RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0)) / NULLIF((COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0)),0)  AS "cost_and_usage.percent_spend_on_non_ris"
FROM aws_optimizer.cost_and_usage  AS cost_and_usage

WHERE 
	(((from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) >= ((DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))) AND (from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) < ((DATE_ADD('month', 6, DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))))))
GROUP BY 1
ORDER BY 2 DESC
LIMIT 500

The resulting table should look something like the image below (I’m surfacing tables through Looker, though the same table would result from querying via command line or any other interface).

With a BI tool, you can create dashboards for easy reference and monitoring. New data is dumped into S3 every few hours, so your dashboards can update several times per day.

It’s an iterative process to understand the appropriate number of Reserved Instances needed to meet your business needs. After you’ve properly integrated Reserved Instances into your purchasing patterns, the savings can be significant. If your coverage is consistently below 70%, you should seriously consider adjusting your purchase types and opting for more Reserved instances.

Data transfer costs

One of the great things about AWS data storage is that it’s incredibly cheap. Most charges often come from moving and processing that data. There are several different prices for transferring data, broken out largely by transfers between regions and availability zones. Transfers between regions are the most costly, followed by transfers between Availability Zones. Transfers within the same region and same availability zone are free unless using elastic or public IP addresses, in which case there is a cost. You can find more detailed information in the AWS Pricing Docs. With this in mind, there are several simple strategies for helping reduce costs.

First, since costs increase when transferring data between regions, it’s wise to ensure that as many services as possible reside within the same region. The more you can localize services to one specific region, the lower your costs will be.

Second, you should maximize the data you’re routing directly within AWS services and IP addresses. Transfers out to the open internet are the most costly and least performant mechanisms of data transfers, so it’s best to keep transfers within AWS services.

Lastly, data transfers between private IP addresses are cheaper than between elastic or public IP addresses, so utilizing private IP addresses as much as possible is the most cost-effective strategy.

The following query provides a table depicting the total costs for each AWS product, broken out by transfer cost type. Substitute the “lineitem_productcode” field in the query to segment the costs by any other attribute. If you notice any unusually high spikes in cost, you’ll need to dig deeper to understand what’s driving that spike: location, volume, and so on. Drill down into specific costs by including “product_usagetype” and “product_transfertype” in your query to identify the types of transfer costs that are driving up your bill.

SELECT 
	cost_and_usage.lineitem_productcode  AS "cost_and_usage.product_code",
	COALESCE(SUM(cost_and_usage.lineitem_unblendedcost), 0) AS "cost_and_usage.total_unblended_cost",
	COALESCE(SUM(CASE WHEN REGEXP_LIKE(cost_and_usage.product_usagetype, 'DataTransfer')    THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_data_transfer_cost",
	COALESCE(SUM(CASE WHEN REGEXP_LIKE(cost_and_usage.product_usagetype, 'DataTransfer-In')    THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_inbound_data_transfer_cost",
	COALESCE(SUM(CASE WHEN REGEXP_LIKE(cost_and_usage.product_usagetype, 'DataTransfer-Out')    THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_outbound_data_transfer_cost"
FROM aws_optimizer.cost_and_usage  AS cost_and_usage

WHERE 
	(((from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) >= ((DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))) AND (from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) < ((DATE_ADD('month', 6, DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))))))
GROUP BY 1
ORDER BY 2 DESC
LIMIT 500

When moving between regions or over the open web, many data transfer costs also include the origin and destination location of the data movement. Using a BI tool with mapping capabilities, you can get a nice visual of data flows. The point at the center of the map is used to represent external data flows over the open internet.

Analysis by tags

AWS provides the option to apply custom tags to individual resources, so you can allocate costs over whatever customized segment makes the most sense for your business. For a SaaS company that hosts software for customers on AWS, for example, you might tag resources by the size of each customer. The following query uses custom tags to display the reserved, data transfer, and total cost for each AWS service, broken out by tag categories, over the last 6 months. You’ll want to replace cost_and_usage.resourcetags_customersegment and cost_and_usage.customer_segment with the name of your own customer tag field.

SELECT * FROM (
SELECT *, DENSE_RANK() OVER (ORDER BY z___min_rank) as z___pivot_row_rank, RANK() OVER (PARTITION BY z__pivot_col_rank ORDER BY z___min_rank) as z__pivot_col_ordering FROM (
SELECT *, MIN(z___rank) OVER (PARTITION BY "cost_and_usage.product_code") as z___min_rank FROM (
SELECT *, RANK() OVER (ORDER BY CASE WHEN z__pivot_col_rank=1 THEN (CASE WHEN "cost_and_usage.total_unblended_cost" IS NOT NULL THEN 0 ELSE 1 END) ELSE 2 END, CASE WHEN z__pivot_col_rank=1 THEN "cost_and_usage.total_unblended_cost" ELSE NULL END DESC, "cost_and_usage.total_unblended_cost" DESC, z__pivot_col_rank, "cost_and_usage.product_code") AS z___rank FROM (
SELECT *, DENSE_RANK() OVER (ORDER BY CASE WHEN "cost_and_usage.customer_segment" IS NULL THEN 1 ELSE 0 END, "cost_and_usage.customer_segment") AS z__pivot_col_rank FROM (
SELECT 
	cost_and_usage.lineitem_productcode  AS "cost_and_usage.product_code",
	cost_and_usage.resourcetags_customersegment  AS "cost_and_usage.customer_segment",
	COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0) AS "cost_and_usage.total_unblended_cost",
	1.0 * (COALESCE(SUM(CASE WHEN REGEXP_LIKE(cost_and_usage.product_usagetype, 'DataTransfer')    THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0)) / NULLIF((COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0)),0)  AS "cost_and_usage.percent_spend_data_transfers_unblended",
	1.0 * (COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'Non RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0)) / NULLIF((COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0)),0)  AS "cost_and_usage.unblended_percent_spend_on_ris"
FROM aws_optimizer.cost_and_usage_raw  AS cost_and_usage

WHERE 
	(((from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) >= ((DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))) AND (from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) < ((DATE_ADD('month', 6, DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))))))
GROUP BY 1,2) ww
) bb WHERE z__pivot_col_rank <= 16384
) aa
) xx
) zz
 WHERE z___pivot_row_rank <= 500 OR z__pivot_col_ordering = 1 ORDER BY z___pivot_row_rank

The resulting table in this example looks like the results below. In this example, you can tell that we’re making poor use of Reserved Instances because they represent such a small portion of our overall costs.

Again, using a BI tool to visualize these costs and trends over time makes the analysis much easier to consume and take action on.

Summary

Saving costs on your AWS spend is always an iterative, ongoing process. Hopefully with these queries alone, you can start to understand your spending patterns and identify opportunities for savings. However, this is just a peek into the many opportunities available through analysis of the Cost and Usage report. Each company is different, with unique needs and usage patterns. To achieve maximum cost savings, we encourage you to set up an analytics environment that enables your team to explore all potential cuts and slices of your usage data whenever necessary. Exploring different trends and spikes across regions, services, user types, etc. helps you gain a comprehensive understanding of your major cost levers and consistently implement new cost reduction strategies.

Note that all of the queries and analysis provided in this post were generated using the Looker data platform. If you’re already a Looker customer, you can get all of this analysis, additional pre-configured dashboards, and much more using Looker Blocks for AWS.


About the Author

Dillon Morrison leads the Platform Ecosystem at Looker. He enjoys exploring new technologies and architecting the most efficient data solutions for the business needs of his company and their customers. In his spare time, you’ll find Dillon rock climbing in the Bay Area or nose deep in the docs of the latest AWS product release at his favorite cafe (“Arlequin in SF is unbeatable!”).

AWS Summit New York – Summary of Announcements

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-summit-new-york-summary-of-announcements/

Whew – what a week! Tara, Randall, Ana, and I have been working around the clock to create blog posts for the announcements that we made at the AWS Summit in New York. Here’s a summary to help you to get started:

Amazon Macie – This new service helps you to discover, classify, and secure content at scale. Powered by machine learning and making use of Natural Language Processing (NLP), Macie looks for patterns and alerts you to suspicious behavior, and can help you with governance, compliance, and auditing. You can read Tara’s post to see how to put Macie to work; you select the buckets of interest, customize the classification settings, and review the results in the Macie Dashboard.

AWS Glue – Randall’s post (with deluxe animated GIFs) introduces you to this new extract, transform, and load (ETL) service. Glue is serverless and fully managed. As you can see from the post, Glue crawls your data, infers schemas, and generates ETL scripts in Python. You define jobs that move data from place to place, with a wide selection of transforms, each expressed as code and stored in human-readable form. Glue uses Development Endpoints and notebooks to provide you with a testing environment for the scripts you build. We also announced that Amazon Athena now integrates with AWS Glue, as do Apache Spark and Hive on Amazon EMR.

AWS Migration Hub – This new service will help you to migrate your application portfolio to AWS. My post outlines the major steps and shows you how the Migration Hub accelerates, tracks, and simplifies your migration effort. You can begin with a discovery step, or you can jump right in and migrate directly. Migration Hub integrates with tools from our migration partners and builds upon the Server Migration Service and the Database Migration Service.

CloudHSM Update – We made a major upgrade to AWS CloudHSM, making the benefits of hardware-based key management available to a wider audience. The service is offered on a pay-as-you-go basis, and is fully managed. It is open and standards compliant, with support for multiple APIs, programming languages, and cryptography extensions. CloudHSM is an integral part of AWS and can be accessed from the AWS Management Console, AWS Command Line Interface (CLI), and through API calls. Read my post to learn more and to see how to set up a CloudHSM cluster.

Managed Rules to Secure S3 Buckets – We added two new rules to AWS Config that will help you to secure your S3 buckets. The s3-bucket-public-write-prohibited rule identifies buckets that have public write access and the s3-bucket-public-read-prohibited rule identifies buckets that have global read access. As I noted in my post, you can run these rules in response to configuration changes or on a schedule. The rules make use of some leading-edge constraint solving techniques, as part of a larger effort to use automated formal reasoning about AWS.
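
For illustration, here is roughly how one of these managed rules could be enabled from the AWS CLI; the rule name and scope below are assumptions for the example, and the managed rule is referenced by its AWS source identifier.

# Enable the managed rule that flags S3 buckets with global read access.
aws configservice put-config-rule --config-rule '{
  "ConfigRuleName": "s3-bucket-public-read-prohibited",
  "Scope": {"ComplianceResourceTypes": ["AWS::S3::Bucket"]},
  "Source": {"Owner": "AWS", "SourceIdentifier": "S3_BUCKET_PUBLIC_READ_PROHIBITED"}
}'

The same pattern should apply to the write rule, using the S3_BUCKET_PUBLIC_WRITE_PROHIBITED identifier.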

CloudTrail for All Customers – Tara’s post revealed that AWS CloudTrail is now available and enabled by default for all AWS customers. As a bonus, Tara reviewed the principal benefits of CloudTrail and showed you how to review your event history and to deep-dive on a single event. She also showed you how to create a second trail, for use with Amazon CloudWatch Events.

Encryption of Data at Rest for EFS – When you create a new file system, you now have the option to select a key that will be used to encrypt the contents of the files on the file system. The encryption is done using an industry-standard AES-256 algorithm. My post shows you how to select a key and to verify that it is being used.
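
As a quick sketch of what that looks like from the AWS CLI (the creation token below is a made-up example, and omitting --kms-key-id falls back to the default EFS service key):

# Create an EFS file system with encryption of data at rest enabled.
aws efs create-file-system \
  --creation-token my-encrypted-fs \
  --encrypted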

Watch the Keynote
My colleagues Adrian Cockcroft and Matt Wood talked about these services and others on the stage, and also invited some AWS customers to share their stories. Here’s the video:

Jeff;

 

Internet Archive Blocked in 2,650 Site Anti-Piracy Sweep

Post Syndicated from Andy original https://torrentfreak.com/internet-archive-blocked-in-2650-site-anti-piracy-sweep-170810/

Reports of sites becoming mysteriously inaccessible in India have been a regular occurrence over the past several years. In many cases, sites simply stop functioning, leaving users wondering whether sites are actually down or whether there’s a technical issue.

Due to their increasing prevalence, fingers are often pointed at so-called ‘John Doe’ orders, which are handed down by the court to prevent Internet piracy. Often sweeping in nature (and in some cases pre-emptive rather than preventative), these injunctions have been known to block access to both file-sharing platforms and innocent bystanders.

Earlier this week (and again for no apparent reason), the world renowned Internet Archive was rendered inaccessible to millions of users in India. The platform, which is considered by many to be one of the Internet’s most valued resources, hosts more than 15 petabytes of data, a figure which grows on a daily basis. Yet despite numerous requests for information, none was forthcoming from authorities.

The ‘blocked’ message seen by users accessing Archive.org

Quoted by local news outlet Medianama, Chris Butler, Office Manager at the Internet Archive, said that their attempts to contact the Indian Department of Telecom (DoT) and the Ministry of Electronics and Information Technology (Meity) had proven fruitless.

Noting that the site had previously been blocked in India, Butler said they were no clearer on the reasons why the same kind of action had seemingly been taken this week.

“We have no information about why a block would have been implemented,” he said. “Obviously, we are disappointed and concerned by this situation and are very eager to understand why it’s happening and see full access restored to archive.org.”

Now, however, the mystery has been solved. The BBC says a local government agency provided a copy of a court order obtained by two Bollywood production companies who are attempting to slow down piracy of their films in India.

Issued by a local judge, the sweeping order compels local ISPs to block access to 2,650 mainly file-sharing websites, including The Pirate Bay, RARBG, the revived KickassTorrents, and hundreds of other ‘usual suspects’. However, it also includes the URL for the Internet Archive, hence the problems with accessibility this week.

The injunction, which appears to be another John Doe order as previously suspected, was granted by the High Court of the Judicature at Madras on August 2, 2017. Two film productions companies – Prakash Jah Productions and Red Chillies Entertainment – obtained the order to protect their films Lipstick Under My Burkha and Jab Harry Met Sejal.

While India-based visitors to blocked resources are often greeted with a message saying that domains have been blocked at the orders of the Department of Telecommunications, these pages never give a reason why.

This always leads to confusion, with news outlets having to pressure local government agencies to discover the reason behind the blockades. In the interests of transparency, providing a link to a copy of a relevant court order would probably benefit all involved.

A few hours ago, the Internet Archive published a statement questioning the process undertaken before the court order was handed down.

“Is the Court aware of and did it consider the fact that the Internet Archive has a well-established and standard procedure for rights holders to submit take down requests and processes them expeditiously?” the platform said.

“We find several instances of take down requests submitted for one of the plaintiffs, Red Chillies Entertainments, throughout the past year, each of which were processed and responded to promptly.

“After a preliminary review, we find no instance of our having been contacted by anyone at all about these films. Is there a specific claim that someone posted these films to archive.org? If so, we’d be eager to address it directly with the claimant.”

But while the Internet Archive appears to be the highest profile collateral damage following the ISP blocks, it isn’t the only victim. Now that the court orders have become available (1,2), it’s clear that other non-pirate entities have also been affected including news site WN.com, website hosting service Weebly, and French ISP Free.fr.

Also, in a sign that sites aren’t being checked to see if they host the movies in question, one of the orders demands that former torrent index BitSnoop is blocked. The site shut down earlier this year. The same is true for Shaanig.org.

This is not the first time that the Internet Archive has been blocked in India. In 2014/2015, Archive.org was rendered inaccessible after it was accused of hosting extremist material. In common with Google, the site copies and stores huge amounts of data, much of it in automated processes. This can leave it exposed to these kinds of accusations.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Getting Your Data into the Cloud is Just the Beginning

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/cost-data-of-transfer-cloud-storage/

Total Cloud Storage Cost

Organizations should consider not just the cost of getting their data into the cloud, but also long-term costs for storage and retrieval when deciding which cloud storage solution meets their needs.

As cloud storage has become ubiquitous, organizations large and small are joining in. For larger organizations the lure of reducing capital expenses and their associated operational costs is enticing. For smaller organizations, cloud storage often replaces an unmanageable closet full of external hard drives, thumb drives, SD cards, and other devices. With terabytes or even petabytes of data, the common challenge facing organizations, large and small, is how to get their data up to the cloud.

Transferring Data to the Cloud

The obvious solution for getting your data to the cloud is to upload your data from your internal network through the internet to the cloud storage vendor you’ve selected. Cloud storage vendors don’t charge you for uploading your data to their cloud, but you, of course, have to pay your network provider and that’s where things start to get interesting. Here are a few things to consider.

  • The initial upload: Unless you are just starting out, you will have a large amount of data you want to upload to the cloud. This could be data you wish to archive or have archived previously, for example data stored on LTO tapes or on external hard drives.
  • Pipe size: This is the amount of upload bandwidth of your network connection. This is measured in Mbps (megabits per second). Remember, your data is stored in MB (megabytes), so an upload connection of 80 Mbps will transfer no more than 10 MB of data per second and most likely a lot less.
  • Cost and caps: In some places, organizations pay a flat monthly rate for a specified level of service (speed) for internet access. In other locations, internet access is metered, or pay as you go. In either case, there can be internet service caps that limit or completely stop data transfer once you reach your contracted threshold.

One or more of these challenges has the potential to make the initial upload of your data expensive and potentially impossible. You could wait until cloud storage companies start buying up internet providers and make data upload cheap (or free with Amazon Prime!), but there is another option.

Data Transfer Devices

Given the potential challenges of using your network for the initial upload of your data to the cloud, a handful of cloud storage companies have introduced data transfer or data ingest services. Backblaze has the B2 Fireball, Amazon has Snowball (and other similar devices), and Google recently introduced their Transfer Appliance.

KLRU-TV Austin PBS uploaded their Austin City Limits musical anthology series to Backblaze using a B2 Fireball.

These services work as follows:

  • The provider sends you a portable (or somewhat portable) storage device.
  • You connect the device to your network and load some amount of data on the device over your internal network connection.
  • You return the device, loaded with your data, to the provider, who uploads your data to your cloud storage account from inside their own data center.

Data Transfer Devices Save Time

Assuming your Internet connection is a flat rate service that has no caps or limits and your organizational operations can withstand the traffic, you still may want to opt to use a data transfer service to move your data to the cloud. Why? Time. For example, if your initial data upload is 100 TB here’s how long it would take using different network upload connection speeds:

Network Speed Upload Time
10 Mbps 3 years
100 Mbps 124 days
500 Mbps 25 days
1 Gbps 12 days

This assumes you are using most of your upload connection to upload your data, which is probably not realistic if you want to stay in business. You could potentially rent a better connection or upgrade your connection permanently, both of which add to the cost of running your business.
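
As a rough cross-check on the table above, here’s a small Python sketch (not from the original post) that converts a link speed into an upload time for 100 TB. The 75% effective-utilization factor is an assumption, chosen so the results land close to the figures quoted.

# Rough upload time for a given nominal link speed, assuming 1 TB = 10**12
# bytes and that only ~75% of the link is effectively usable.
def upload_days(data_tb, link_mbps, utilization=0.75):
    bits_to_move = data_tb * 10**12 * 8
    effective_bps = link_mbps * 10**6 * utilization
    return bits_to_move / effective_bps / 86400   # 86,400 seconds in a day

for mbps in (10, 100, 500, 1000):
    print(f"{mbps:>5} Mbps: {upload_days(100, mbps):6.1f} days")

At 100 Mbps this works out to roughly 123 days, in line with the 124 days in the table.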

Speaking of cost, there is of course a charge for the data transfer service that can be summarized as follows:

  • Backblaze B2 Fireball — Up to 40 TB of data per trip for $550.00 for 30 days in use at your site.
  • Amazon Snowball — up to 50 TB of data per trip for $200.00 for 10 days use at your site, plus $15/day each day in use at your site thereafter.
  • Google Transfer Appliance — up to 100 TB of data per trip for $300.00 for 10 days use at your site, plus $10/day each day in use at your site thereafter.

These prices do not include shipping, which can range from $100 to $900 depending on shipping method, location, etc.

Both Amazon and Google have transfer devices that are larger and cost more. For comparison purposes below we’ll use the three device versions listed above.

The Real Cost of Uploading Your Data

If we stopped our review at the previous paragraph and we were prepared to load up our transfer device in 10 days or less, the clear winner would be Google. But, this leaves out two very important components of any cloud storage project; the cost of storing your data and the cost of downloading your data.

Let’s look at two examples:

Example 1 — Archive 100 TB of data:

  • Use the data transfer service move 100 TB of data to the cloud storage service.
  • Accomplish the transfer within 10 days.
  • Store that 100 TB of data for 1 year.
Service Transfer Cost Cloud Storage Total
Backblaze B2 $1,650 (3 trips) $6,000 $7,650
Google Cloud $300 (1 trip) $24,000 $24,300
Amazon S3 $400 (2 trips) $25,200 $25,600

Results:

  • Using the B2 Fireball to store data in Backblaze B2 saves you $16,650 over a one-year period versus the Google solution.
  • The payback period for using a Backblaze B2 FireBall versus a Google Transfer Appliance is less than 1 month.

Example 2 — Store and use 100 TB of data:

  • Use the data transfer service to move 100 TB of data to the cloud storage service.
  • Accomplish the transfer within 10 days.
  • Store that 100 TB of data for 1 year.
  • Add 5 TB a month (on average) to the total stored.
  • Delete 2 TB a month (on average) from the total stored.
  • Download 10 TB a month (on average) from the total stored.
Service Transfer Cost Cloud Storage Total
Backblaze B2 $1,650 (3 trips) $9,570 $11,220
Google Cloud $300 (1 trip) $39,684 $39,984
Amazon S3 $400 (2 trips) $36,114 $36,514

Results:

  • Using the B2 Fireball to store data in Backblaze B2 saves you $28,764 over a one-year period versus the Google solution.
  • The payback period for using a Backblaze B2 FireBall versus a Google Transfer Appliance is less than 1 month.

Notes:

  • All prices listed are based on list prices from the vendor websites as of the date of this blog post.
  • We are accomplishing the transfer of your data to the device within the 10 day “free” period specified by Amazon and Google.
  • We are comparing cloud storage services that have similar performance. For example, once the data is uploaded, it is readily available for download. The data is also available for access via a Web GUI, CLI, API, and/or various applications integrated with the cloud storage service. Multiple versions of files can be kept as desired. Files can be deleted any time.

To be fair, it requires Backblaze three trips to move 100 TB while it only takes 1 trip for the Google Transfer Appliance. This adds some cost to prepare, monitor, and ship three B2 Fireballs versus one Transfer Appliance. Even with that added cost, the Backblaze B2 solution will still be significantly less expensive over the one year period and beyond.
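
To make the arithmetic behind Example 1 explicit, here’s a short Python sketch of the comparison. The device fees come from the list above; the per-GB monthly storage rates ($0.005 for B2, roughly $0.021 for S3, $0.02 for Google Cloud) are the ones implied by the table and may differ from current pricing.

# Example 1: archive 100 TB (here 100,000 GB) for one year.
offers = {
    # name: (trips, device fee per trip, storage $/GB/month)
    "Backblaze B2": (3, 550, 0.005),
    "Google Cloud": (1, 300, 0.020),
    "Amazon S3":    (2, 200, 0.021),
}

gb, months = 100_000, 12
for name, (trips, fee, rate) in offers.items():
    transfer = trips * fee
    storage = gb * rate * months
    print(f"{name:>13}: ${transfer:,} transfer + ${storage:,.0f} storage = ${transfer + storage:,.0f}")

Running this reproduces the $7,650 / $24,300 / $25,600 totals shown in the Example 1 table.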

Have a Data Transfer Device Owner

Before you run out and order a transfer device, make sure the transfer process is someone’s job once the device arrives at your organization. Filling a transfer device should only take a few days, but if it is forgotten, you’ll find you’ve had the device for 2 or 3 weeks. While that’s not much of a problem with a B2 Fireball, it could start to get expensive otherwise.

Just the Beginning

As with most “new” technologies and services, you can expect other companies to jump in and provide various data ingest services. The cost will get cheaper or even free as cloud storage companies race to capture and lock up the data you have kept locally all these years. When you are evaluating cloud storage solutions, it’s best to look past the data ingest loss-leader price, and spend a few minutes to calculate the long-term cost of storing and using your data.

The post Getting Your Data into the Cloud is Just the Beginning appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Turbocharge your Apache Hive queries on Amazon EMR using LLAP

Post Syndicated from Jigar Mistry original https://aws.amazon.com/blogs/big-data/turbocharge-your-apache-hive-queries-on-amazon-emr-using-llap/

Apache Hive is one of the most popular tools for analyzing large datasets stored in a Hadoop cluster using SQL. Data analysts and scientists use Hive to query, summarize, explore, and analyze big data.

With the introduction of Hive LLAP (Low Latency Analytical Processing), the notion of Hive being just a batch processing tool has changed. LLAP uses long-lived daemons with intelligent in-memory caching to circumvent batch-oriented latency and provide sub-second query response times.

This post provides an overview of Hive LLAP, including its architecture and common use cases for boosting query performance. You will learn how to install and configure Hive LLAP on an Amazon EMR cluster and run queries on LLAP daemons.

What is Hive LLAP?

Hive LLAP was introduced in Apache Hive 2.0, which provides very fast processing of queries. It uses persistent daemons that are deployed on a Hadoop YARN cluster using Apache Slider. These daemons are long-running and provide functionality such as I/O with DataNode, in-memory caching, query processing, and fine-grained access control. And since the daemons are always running in the cluster, it saves substantial overhead of launching new YARN containers for every new Hive session, thereby avoiding long startup times.

When Hive is configured in hybrid execution mode, small and short queries execute directly on LLAP daemons. Heavy lifting (like large shuffles in the reduce stage) is performed in YARN containers that belong to the application. Resources (CPU, memory, etc.) are obtained in a traditional fashion using YARN. After the resources are obtained, the execution engine can decide which resources are to be allocated to LLAP, or it can launch Apache Tez processors in separate YARN containers. You can also configure Hive to run all the processing workloads on LLAP daemons for querying small datasets at lightning fast speeds.
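
For reference, the execution mode is controlled per session through hive.llap.execution.mode (the same setting the example later in this post uses to disable LLAP). The values shown below are a sketch based on Hive 2.x; check the documentation for your Hive version.

-- Run as much of the query as possible on LLAP daemons, falling back to
-- Tez containers for work that can't run there:
set hive.llap.execution.mode=all;

-- Force everything onto LLAP daemons (queries fail if that isn't possible):
set hive.llap.execution.mode=only;

-- Disable LLAP for this session and use plain Tez containers:
set hive.llap.execution.mode=none;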

LLAP daemons are launched under YARN management to ensure that the nodes don’t get overloaded with the compute resources of these daemons. You can use scheduling queues to make sure that there is enough compute capacity for other YARN applications to run.

Why use Hive LLAP?

With many options available in the market (Presto, Spark SQL, etc.) for doing interactive SQL  over data that is stored in Amazon S3 and HDFS, there are several reasons why using Hive and LLAP might be a good choice:

  • For those who are heavily invested in the Hive ecosystem and have external BI tools that connect to Hive over JDBC/ODBC connections, LLAP plugs in to their existing architecture without a steep learning curve.
  • It’s compatible with existing Hive SQL and other Hive tools, like HiveServer2, and JDBC drivers for Hive.
  • It has native support for security features with authentication and authorization (SQL standards-based authorization) using HiveServer2.
  • LLAP daemons are aware of the columns and records that are being processed, which enables you to enforce fine-grained access control.
  • It can use Hive’s vectorization capabilities to speed up queries, and Hive has better support for Parquet file format when vectorization is enabled.
  • It can take advantage of a number of Hive optimizations like merging multiple small files for query results, automatically determining the number of reducers for joins and groupbys, etc.
  • It’s optional and modular, so it can be turned on or off depending on the compute and resource requirements of the cluster. This lets you run other YARN applications concurrently without reserving a cluster specifically for LLAP.

How do you install Hive LLAP in Amazon EMR?

To install and configure LLAP on an EMR cluster, use the following bootstrap action (BA):

s3://aws-bigdata-blog/artifacts/Turbocharge_Apache_Hive_on_EMR/configure-Hive-LLAP.sh

This BA downloads and installs Apache Slider on the cluster and configures LLAP so that it works with EMR Hive. For LLAP to work, the EMR cluster must have Hive, Tez, and Apache Zookeeper installed.

You can pass the following arguments to the BA.

Argument Definition Default value
--instances Number of instances of LLAP daemon Number of core/task nodes of the cluster
--cache Cache size per instance 20% of physical memory of the node
--executors Number of executors per instance Number of CPU cores of the node
--iothreads Number of IO threads per instance Number of CPU cores of the node
--size Container size per instance 50% of physical memory of the node
--xmx Working memory size 50% of container size
--log-level Log levels for the LLAP instance INFO
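
For example, these arguments can be appended to the bootstrap action entry of the create-cluster call shown in the next section. The values here (three daemons with four executors each) are purely illustrative; size them to your own instance types.

--bootstrap-actions '[{"Path":"s3://aws-bigdata-blog/artifacts/Turbocharge_Apache_Hive_on_EMR/configure-Hive-LLAP.sh","Name":"Install Hive LLAP","Args":["--instances","3","--executors","4"]}]'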

LLAP example

This section describes how you can try the faster Hive queries with LLAP using the TPC-DS testbench for Hive on Amazon EMR.

Use the following AWS command line interface (AWS CLI) command to launch a 1+3 nodes m4.xlarge EMR 5.6.0 cluster with the bootstrap action to install LLAP:

aws emr create-cluster --release-label emr-5.6.0 \
--applications Name=Hadoop Name=Hive Name=Hue Name=ZooKeeper Name=Tez \
--bootstrap-actions '[{"Path":"s3://aws-bigdata-blog/artifacts/Turbocharge_Apache_Hive_on_EMR/configure-Hive-LLAP.sh","Name":"Custom action"}]' \
--ec2-attributes '{"KeyName":"<YOUR-KEY-PAIR>","InstanceProfile":"EMR_EC2_DefaultRole","SubnetId":"subnet-xxxxxxxx","EmrManagedSlaveSecurityGroup":"sg-xxxxxxxx","EmrManagedMasterSecurityGroup":"sg-xxxxxxxx"}' \
--service-role EMR_DefaultRole \
--enable-debugging \
--log-uri 's3n://<YOUR-BUCKET>/' --name 'test-hive-llap' \
--instance-groups '[{"InstanceCount":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":1}],"EbsOptimized":true},"InstanceGroupType":"MASTER","InstanceType":"m4.xlarge","Name":"Master - 1"},{"InstanceCount":3,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":1}],"EbsOptimized":true},"InstanceGroupType":"CORE","InstanceType":"m4.xlarge","Name":"Core - 2"}]' \
--region us-east-1

After the cluster is launched, log in to the master node using SSH, and do the following:

  1. Open the hive-tpcds folder:
    cd /home/hadoop/hive-tpcds/
  2. Start Hive CLI using the testbench configuration, create the required tables, and run the sample query:

    hive -i testbench.settings
    hive> source create_tables.sql;
    hive> source query55.sql;

    This sample query runs on a 40 GB dataset that is stored on Amazon S3. The dataset is generated using the data generation tool in the TPC-DS testbench for Hive.
  3. In LLAP mode, the query finished in about 47 seconds. To compare this to the execution time without LLAP, you can run the same workload using only Tez containers:
    hive> set hive.llap.execution.mode=none;
    hive> source query55.sql;


    This query finished in about 80 seconds.

The query takes almost 1.7 times longer when using just YARN containers than when running on LLAP daemons. And with every rerun of the query, you’ll notice that the execution time decreases substantially by virtue of in-memory caching by the LLAP daemons.

Conclusion

In this post, I introduced Hive LLAP as a way to boost Hive query performance. I discussed its architecture and described several use cases for the component. I showed how you can install and configure Hive LLAP on an Amazon EMR cluster and how you can run queries on LLAP daemons.

If you have questions about using Hive LLAP on Amazon EMR or would like to share your use cases, please leave a comment below.


Additional Reading

Learn how to automatically partition Hive external tables with AWS.


About the Author

Jigar Mistry is a Hadoop Systems Engineer with Amazon Web Services. He works with customers to provide architectural guidance and technical support for processing large datasets in the cloud using open-source applications. In his spare time, he enjoys going camping and exploring different restaurants in the Seattle area.

Transparency in Cloud Storage Costs

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/transparency-in-cloud-storage-costs/

cloud storage cost calculator

Backblaze’s mission is to make cloud storage that’s affordable and astonishingly easy to use. Backblaze B2 embodies that mission for those looking for an object storage solution.

Another Backblaze core value is being transparent, from releasing our Storage Pod designs to detailing our cloud storage cost of goods sold. We are an open book in the cloud storage industry. So it makes sense that opaque pricing policies that require mind-numbing calculations are a no-no for us. Our approach to pricing is to be transparent, straightforward, and predictable.

For Backblaze B2, this means that no matter how much data you have, the cost for B2 is $0.005/GB per month for data storage and $0.02/GB to download data. There are no costs to upload. We also throw in 10GB of storage and 1GB of downloads for free every month.

Cloud Storage Price Comparison

The storage industry does not share our view of making pricing transparent, or affordable. In an effort to help everyone, we’ve made a Cloud Storage Pricing Calculator, where anyone can enter in their specific use case and get pricing back for B2, S3, Azure, and GCS. We’ve also included the calculator below for those interested in trying it out.

B2 Cost Calculator

Backblaze provides this calculator as an estimate. (Interactive calculator: enter an initial upload plus monthly upload, deletion, and download volumes in GB over a chosen number of months to see the resulting storage and download costs for Backblaze B2, alongside the equivalent totals for Amazon S3, Microsoft Azure, and Google Cloud.)

* Figures are not exact and do not include the following: free first 10 GB of storage, free 1 GB of daily downloads, or $0.004/10,000 Class B transactions and $0.004/1,000 Class C transactions.

Sample storage scenarios:

Scenario 1

You have data you wish to archive, and will be adding more each month, but you don’t expect that you will be downloading or deleting any data.

Initial upload: 10,000GB
Monthly upload: 1,000GB

For twelve months, your costs would be:

Backblaze B2 $990.00
Amazon S3 $4,158.00 +420%
Microsoft Azure $4,356.00 +440%
Google Cloud $5,148.00 +520%
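
For anyone who wants to check the math, here’s a short Python sketch (not part of the original post) that reproduces these Scenario 1 figures. It assumes each month’s new 1,000 GB is billed from that month onward and uses the per-GB monthly storage rates implied by the table.

# Scenario 1: 10,000 GB initial upload plus 1,000 GB added each month,
# stored for 12 months with no downloads or deletions.
rates = {
    "Backblaze B2":    0.005,
    "Amazon S3":       0.021,
    "Microsoft Azure": 0.022,
    "Google Cloud":    0.026,
}

initial_gb, monthly_gb, months = 10_000, 1_000, 12
gb_months = sum(initial_gb + monthly_gb * m for m in range(1, months + 1))

for provider, rate in rates.items():
    print(f"{provider:>15}: ${gb_months * rate:,.2f}")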

 

Scenario 2

You wish to store data, and will be actively changing that data with uploads, downloads, and deletions.

Initial upload: 10,000GB
Monthly upload: 2,000GB
Monthly deletion: 1,000GB
Monthly download: 500GB

Your costs for 12 months would be:

Backblaze B2 $1,100.00
Amazon S3 $3,458.00 +402%
Microsoft Azure $4,656.00 +519%
Google Cloud $5,628.00 +507%

We invite you to compare our cost estimates against the competition. Here are the links to our competitors’ pricing calculators.

B2 Cloud Storage Pricing Summary

Provider          Storage ($/GB/Month)    Download ($/GB)
Backblaze B2      $0.005                  $0.02
Amazon S3         $0.021 (+420%)          $0.05+ (+250%)
Microsoft Azure   $0.022+ (+440%)         $0.05+ (+250%)
Google Cloud      $0.026 (+520%)          $0.08+ (+400%)

The Details


STORAGE
$0.005/GB/Month
How much data you have stored with Backblaze. This is calculated once a day based on the average storage of the previous 24 hours.
The first 10 GB of storage is free.

DOWNLOAD
$0.02/GB
Charged when you download files and charged when you create a Snapshot. Charged for any portion of a GB. The first 1 GB of data downloaded each day is free.

TRANSACTIONS
Class “A” transactions – Free
Class “B” transactions – $0.004 per 10,000 with 2,500 free per day.
Class “C” transactions – $0.004 per 1,000 with 2,500 free per day.
View Transactions by API Call

DATA BY MAIL
Mail us your data on a B2 Fireball – $550
Backblaze will mail your data to you by FedEx:
• USB Flash Drive – up to 110 GB – $89
• USB Hard Drive – up to 3.5TB of data – $189

PRODUCT SUPPORT
All B2 active account owners can contact Backblaze support at help.backblaze.com, where they will also find a free-to-use knowledge base of B2 advice, guides, and more. In addition, a B2 user can pay to upgrade their support plan to include phone service, 24×7 support, and more.

EVERYTHING ELSE
Free
Unlike other services, you won’t be nickeled and dimed with upload fees, file deletion charges, minimum files size requirements, and more. Everything you can possibly pay Backblaze is listed above.
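
To show how the line items above combine, here’s a rough Python sketch of a monthly B2 bill. It is not an official calculator: the daily free allowances (1 GB of downloads, 2,500 Class B and Class C transactions) are approximated as 30x monthly allowances to keep the example short.

# Approximate monthly B2 bill from the published prices above.
def monthly_b2_bill(stored_gb, downloaded_gb, class_b_calls=0, class_c_calls=0):
    storage = max(stored_gb - 10, 0) * 0.005              # first 10 GB free
    download = max(downloaded_gb - 30, 0) * 0.02          # ~1 GB/day free
    class_b = max(class_b_calls - 75_000, 0) / 10_000 * 0.004
    class_c = max(class_c_calls - 75_000, 0) / 1_000 * 0.004
    return storage + download + class_b + class_c

print(f"${monthly_b2_bill(stored_gb=5_000, downloaded_gb=200):.2f}")   # $28.35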

 

Visit our B2 Cloud Storage Pricing web page for more details.



The post Transparency in Cloud Storage Costs appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.