All posts by Andy Klein

Backblaze Hard Drive Stats Q3 2019

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/backblaze-hard-drive-stats-q3-2019/

Backblaze Drive Stats Q3 2019

As of September 30, 2019, Backblaze had 115,151 spinning hard drives spread across four data centers on two continents. Of that number, there were 2,098 boot drives and 113,053 data drives. We’ll look at the lifetime hard drive failure rates of the data drive models currently in operation in our data centers, but first we’ll cover the events that occurred in Q3 that potentially affected the drive stats for that period. As always, we’ll publish the data we use in these reports on our Hard Drive Test Data web page and we look forward to your comments.

Hard Drive Stats for Q3 2019

At this point in prior Hard Drive Stats reports we would reveal the quarterly hard drive stats table. This time we are only going to present the Lifetime Hard Drive Failure table, which you can find at the end of this report. The data we typically use to create the Q3 table may have been indirectly affected by one of our utility programs, which performs data integrity checks. While we don’t believe the long-term data is impacted, we felt you should know. Below, we dig into the particulars to explain what happened in Q3 and what we think it all means.

What is a Drive Failure?

Over the years we have stated that a drive failure occurs when a drive stops spinning, won’t stay as a member of a RAID array, or demonstrates continuous degradation over time as informed by SMART stats and other system checks. For example, a drive that reports a rapidly increasing or egregious number of media read errors is a candidate for being replaced as a failed drive. These types of errors are usually seen in the SMART stats we record as non-zero values for SMART 197 and 198 which log the discovery and correctability of bad disk sectors, typically due to media errors. We monitor other SMART stats as well, but these two are the most relevant to this discussion.

What might not be obvious is that some SMART attributes only change when specific actions occur. Using SMART 197 and 198 as examples again, these values are only affected when a read or write operation touches a disk sector whose media is damaged or otherwise won’t allow the operation. In short, SMART stats 197 and 198 that have a value of zero today will not change unless a bad sector is encountered during normal disk operations. These two SMART stats don’t cause reads and writes to occur; they only log aberrant behavior from those operations.
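To make the SMART 197/198 discussion concrete, here is a minimal sketch of that kind of check: read the raw values and flag the drive if either attribute is non-zero or has grown since the last check. This is an illustration, not the tooling we actually run; the smartmontools invocation, the JSON parsing, and the idea of comparing against a previous snapshot are assumptions made for the example.

```python
# Illustrative only: flag a drive for review based on SMART 197
# (Current Pending Sectors) and SMART 198 (Offline Uncorrectable).
import json
import subprocess

WATCHED_ATTRS = {197: "Current_Pending_Sector", 198: "Offline_Uncorrectable"}

def read_smart_raw_values(device):
    """Return {attribute_id: raw_value} using smartmontools' JSON output."""
    # smartctl's exit status is a bitmask, so parse stdout rather than
    # relying on the return code.
    out = subprocess.run(
        ["smartctl", "--json", "-A", device],
        capture_output=True, text=True,
    ).stdout
    table = json.loads(out)["ata_smart_attributes"]["table"]
    return {row["id"]: row["raw"]["value"] for row in table}

def flag_for_review(device, previous=None):
    """Return a list of reasons this drive deserves a closer look."""
    current = read_smart_raw_values(device)
    reasons = []
    for attr_id, name in WATCHED_ATTRS.items():
        value = current.get(attr_id, 0)
        if value > 0:
            reasons.append(f"{name} (SMART {attr_id}) = {value}")
        if previous is not None and value > previous.get(attr_id, 0):
            reasons.append(f"{name} (SMART {attr_id}) grew since last check")
    return reasons

# Example: print(flag_for_review("/dev/sda"))
```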

Protecting Stored Data

When a file, or group of files, arrives at a Backblaze data center, the file is divided into pieces we call shards. For more information on how shards are created and used in the Backblaze architecture, please refer to our Backblaze Vault and Backblaze Erasure Coding blog posts. For simplicity’s sake, let’s say a shard is a blob of data that resides on a disk in our system.

As each shard is stored on a hard drive, we create and store a one-way hash of the contents. For reasons ranging from media damage to bit rot to gamma rays, we check the integrity of these shards regularly by recomputing the hash and comparing it to the stored value. To recompute the shard hash value, a utility known as a shard integrity check reads the data in the shard. If there is an inconsistency between the newly computed and the stored hash values, we rebuild the shard using the other shards as described in the Backblaze Vault blog post.
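As a rough sketch of the idea (not our actual code or on-disk format), a shard integrity check boils down to hashing the shard’s bytes when it is written, storing that hash, and later re-reading the shard to recompute and compare. The choice of SHA-1 below is an assumption for illustration.

```python
# Illustrative shard integrity check: hash at write time, re-hash later.
import hashlib

def shard_hash(path, chunk_size=1 << 20):
    """Compute a SHA-1 of the shard's contents, reading 1 MB at a time."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def shard_is_intact(path, stored_hash):
    """Re-read the shard; a mismatch means it should be rebuilt from the
    other shards in its tome (see the Backblaze Vault post)."""
    return shard_hash(path) == stored_hash
```

Note that the comparison requires re-reading every byte of the shard, which is why running these checks more often increases read activity on the drive.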

Shard Integrity Checks

The shard integrity check utility runs as a utility task on each Storage Pod. In late June, we decided to increase the rate of the shard integrity checks across the data farm to cause the checks to run as often as possible on a given drive while still maintaining the drive’s performance. We increased the frequency of the shard integrity checks to account for the growing number of larger-capacity drives that had been deployed recently.

The Consequences for Drive Stats

Once we write data to a disk, that section of disk remains untouched until the data is read by the user, the data is read by the shard integrity check process to recompute the hash, or the data is deleted and written over. As a consequence, there are no updates regarding that section of disk sent to SMART stats until one of those three actions occurs. By speeding up the frequency of the shard integrity checks on a disk, the disk is read more often. Errors discovered during the read operation of the shard integrity check utility are captured by the appropriate SMART attributes. Putting together the pieces, a problem that would have been discovered in the future—under our previous shard integrity check cadence—would now be captured by the SMART stats when the process reads that section of disk today.

By increasing the shard integrity check rate, we potentially moved failures that were going to be found in the future into Q3. While discovering potential problems earlier is a good thing, it is possible that the hard drive failures recorded in Q3 could then be artificially high as future failures were dragged forward into the quarter. Given that our Annualized Failure Rate calculation is based on Drive Days and Drive Failures, potentially moving up some number of failures into Q3 could cause an artificial spike in the Q3 Annualized Failure Rates. This is what we will be monitoring over the coming quarters.
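For reference, the Annualized Failure Rate we report is computed from Drive Days and Drive Failures. A minimal version of the calculation, with made-up example numbers:

```python
# AFR = Drive Failures / (Drive Days / 365), expressed as a percentage.

def annualized_failure_rate(drive_failures, drive_days):
    """Failures per drive-year of operation, as a percentage."""
    drive_years = drive_days / 365.0
    return 100.0 * drive_failures / drive_years

# Example: 100 failures over 1,000,000 drive days is an AFR of about 3.65%.
print(annualized_failure_rate(100, 1_000_000))
```

Shifting failures into Q3 raises the numerator for the quarter without changing the denominator much, which is exactly how an artificial spike would show up.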

There are a couple of things to note as we consider the effect of the accelerated shard integrity checks on the Q3 data for Drive Stats:

  • The number of drive failures over the lifetime of a given drive model should not increase. At best we just moved the failures around a bit.
  • It is possible that the shard integrity checks did nothing to increase the number of drive failures that occurred in Q3. The quarterly failure rates didn’t vary wildly from previous quarters, but we didn’t feel comfortable publishing them at this time given the discussion above.

Lifetime Hard Drive Stats through Q3 2019

Below are the lifetime failure rates for all of our drive models in service as of September 30, 2019.
Backblaze Lifetime Hard Drive Annualized Failure Rates
The lifetime failure rate for the drive models in production rose slightly, from 1.70% at the end of Q2 to 1.73% at the end of Q3. This trivial increase would seem to indicate that the effect of the potential Q3 data issue noted above is minimal and well within a normal variation. However, we’re not satisfied that is true yet and we have a plan for making sure as we’ll see in the next section.

What’s Next for Drive Stats?

We will continue to publish our Hard Drive Stats each quarter, and next quarter we expect to include the quarterly (Q4) chart as well. For the foreseeable future, we will have a little extra work to do internally as we will be tracking two different groups of drives. One group will be the drives that “went through the wormhole,” so to speak, as they were present during the accelerated shard integrity checks. The other group will be those drives that were placed into production after the shard integrity check setting was reduced. We’ll compare these two datasets to see if there was indeed any effect of the increased shard integrity checks on the Q3 hard drive failure rates. We’ll let you know what we find in subsequent drive stats reports.

The Hard Drive Stats Data

The complete data set used to create the information used in this review is available on our Hard Drive Test Data web page. You can download and use this data for free for your own purpose. All we ask are three things: 1) You cite Backblaze as the source if you use the data, 2) You accept that you are solely responsible for how you use the data, and, 3) You do not sell this data to anyone; it is free. Good luck and let us know what you find.

As always, we look forward to your thoughts and questions in the comments.

The post Backblaze Hard Drive Stats Q3 2019 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

The Life and Times of a Backblaze Hard Drive

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/life-and-times-of-a-backblaze-hard-drive/

Seagate 12 TB hard drive

Backblaze likes to talk about hard drive failures — a lot. What we haven’t talked much about is how we deal with those failures: the daily dance of temp drives, replacement drives, and all the clones that it takes to keep over 100,000 drives healthy. Let’s go behind the scenes and take a look at that dance from the eyes of one Backblaze hard drive.

After sitting still for what seemed like forever, ZCH007BZ was on the move. ZCH007BZ, let’s call him Zach, is a Seagate 12 TB hard drive. For the last few weeks, Zach and over 6,000 friends were securely sealed inside their protective cases in the ready storage area of a Backblaze data center. Being a hard disk drive, Zach’s modest dream was to be installed in a system, spin merrily, and store data for many years to come. And now the wait was nearly over, or was it?

Hard drives in wrappers

The Life of Zach

Zach was born in a factory in Singapore and shipped to the US, eventually finding his way to Backblaze, but he didn’t know that. He had sat sealed in the dark for weeks. Now Zach and boxes of other drives were removed from their protective cases and gently stacked on a cart. Zach was near the bottom of the pile, but even he could see endless columns of beautiful red boxes stacked seemingly to the sky. “Backblaze!” one of the drives on the cart whispered. All the other drives gasped with recognition. Thank goodness the noise-cancelling headphones worn by all Backblaze Data Center Techs covered the drives’ collective excitement.

While sitting in the dark, the drives had gossiped about where they were: a data center, a distribution warehouse, a Costco, or Best Buy. Backblaze came up a few times, but that was squashed — they couldn’t be that lucky. After all, Backblaze was the only place where a drive could be famous. Before Backblaze, hard drives labored in anonymity. Occasionally, one or two would be seen in a hard drive teardown article, but even that sort of exposure had died out a couple of years ago. But Backblaze publishes everything about their drives, their model numbers, their serial numbers, heck even their S.M.A.R.T. statistics. There was a rumor that hard drives worked extra hard at Backblaze because they knew they would be in the public eye. With red Backblaze Storage Pods as far as the eye could see, Zach and friends were about to find out.

Drive with guide

The cart Zach and his friends were on glided to a stop at the production build facility. This is where Storage Pods are filled with drives and tested before being deployed. The cart stopped by the first of twenty V6.0 Backblaze Storage Pods that together would form a Backblaze Vault. At each Storage Pod station, 60 drives were unloaded from the cart. The serial number of each drive was recorded along with the Storage Pod ID and drive location in the pod. Finally, each drive was fitted with a pair of drive guides and slid into its new home as a production drive in a Backblaze Storage Pod. “Spin long and prosper,” Zach said quietly each time the lid of a Storage Pod snapped into place, covering the 60 giddy hard drives inside. The process was repeated for the remaining 19 Storage Pods, and when it was done Zach remained on the cart. He would not be installed in a production system today.

The Clone Room

Zach and the remaining drives on the cart were slowly wheeled down the hall. Bewildered, they were rolled into the clone room. “What’s a clone room?” Zach asked himself. The drives on the cart were divided into two groups, with one group being placed on the clone table and the other being placed on the test table. Zach was on the test table.

Almost as soon as Zach was placed on the test table, the DC Tech picked him up again and placed him and several other drives into a machine. He was about to get formatted. The entire formatting process only took a few minutes for Zach, as it did for all of the other drives on the test table. Zach counted 25 drives, including himself.

Still confused and a little sore from the formatting, Zach and two other drives were picked up from the bench by a different DC Tech. She recorded their vitals — serial number, manufacturer, and model — and left the clone room with all three drives on a different cart.

Dreams of a Test Drive

Luigi, Storage Pod lift

The three drives were back on the data center floor with red Storage Pods all around. The DC Tech had maneuvered Luigi, the local Storage Pod lift unit, to hold a Storage Pod she was sliding from a data center rack. The lid was opened, the tech attached a grounding clip, and then removed one of the drives in the Storage Pod. She recorded the vitals of the removed drive. While she was doing so, Zach could hear the removed drive breathlessly mumble something about media errors, but before Zach could respond, the tech picked him up, attached drive guides to his frame, and gently slid him into the Storage Pod. The tech updated her records, closed the lid, and slid the pod back into place. A few seconds later, Zach felt a jolt of electricity pass through his circuits and he and 59 other drives spun to life. Zach was now part of a production Backblaze Storage Pod.

First, Zach was introduced to the other 19 members of his tome. There are 20 drives in a tome, with each living in a separate Storage Pod. Files are divided (sharded) across these 20 drives using Backblaze’s open-sourced erasure code algorithm.

Zach’s first task was to rebuild all of the files that were stored on the drive he replaced. He’d do this by asking for pieces (shards) of all the files from the 19 other drives in his tome. He only needed 17 of the pieces to rebuild a file, but he asked everyone in case there was a problem. Rebuilding was hard work, and the other drives were often busy with reading files, performing shard integrity checks, and so on. Depending on how busy the system was, and how full the drives were, it might take Zach a couple of weeks to rebuild the files and get him up to speed with his contemporaries.
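For the curious, here is a toy demonstration of the “any 17 of 20 shards is enough” property. Our real implementation is the open-sourced erasure coding library mentioned above (written in Java); this sketch instead uses polynomial interpolation over a small prime field, one byte position at a time, purely to show why a tome can lose up to three drives and still rebuild everything.

```python
# Toy 17-of-20 erasure coding via Lagrange interpolation (illustration only;
# real implementations work in GF(2^8), not mod a prime).
P = 257  # prime larger than any byte value

def _lagrange_at(points, x):
    """Evaluate the unique polynomial through `points` at `x`, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * (x - xj)) % P
                den = (den * (xi - xj)) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def encode(data_bytes, data_shards=17, total_shards=20):
    """Turn 17 data bytes into 20 shard bytes (one byte per shard position)."""
    assert len(data_bytes) == data_shards
    points = list(enumerate(data_bytes))        # shards 0..16 carry the data as-is
    parity = [_lagrange_at(points, x) for x in range(data_shards, total_shards)]
    return list(data_bytes) + parity            # shards 17..19 are parity

def rebuild(available, data_shards=17):
    """Recover the 17 data bytes from any 17 (position, value) pairs."""
    assert len(available) >= data_shards
    points = available[:data_shards]
    return [_lagrange_at(points, x) for x in range(data_shards)]

data = list(b"17 bytes of data!")               # exactly 17 bytes
shards = encode(data)
survivors = [(i, v) for i, v in enumerate(shards) if i not in (2, 9, 19)]
assert rebuild(survivors) == data               # any 17 survivors are enough
```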

Nightmares of a Test Drive

Little did he know, but at this point, Zach was still considered a temp replacement drive. The dysfunctional drive that he replaced was making its way back to the clone room where a pair of cloning units, named Harold and Maude in this case, waited. The tech would attempt to clone the contents of the failed drive to a new drive assigned to the clone table. The primary reason for trying to clone a failed drive was recovery speed. A drive can be cloned in a couple of days, but as noted above, it can take up to a couple of weeks to rebuild a drive, especially large drives on busy systems. In short, a successful clone would speed up the recovery process.

For nearly two days straight, Zach was rebuilding. He barely had time to meet his pod neighbors, Cheryl and Carlos. Since they were not rebuilding, they had plenty of time to marvel at how hard Zach was working. He was 25% done and going strong when the Storage Pod powered down. Moments later, the pod was slid out of the rack and the lid popped open. Zach assumed that another drive in the pod had failed, when he felt the spindly, cold fingers of the tech grab him and yank firmly. He was being replaced.

Storage Pod in Backblaze data center

Zach had done nothing wrong. It was just that the clone was successful, with nearly all the files being copied from the previous drive to the smiling clone drive that was putting on Zach’s drive guides and gently being inserted into Zach’s old slot. “Goodbye,” he managed to eke out as he was placed on the cart and watched the tech bring the Storage Pod back to life. Confused, angry, and mostly exhausted, Zach quickly fell asleep.

Zach woke up just in time to see he was in the formatting machine again. The data he had worked so hard to rebuild was being ripped from his platters and replaced randomly with ones and zeroes. This happened multiple times and just as Zach was ready to scream, it stopped, and he was removed from his torture and stacked neatly with a few other drives.

After a while he looked around, and once the lights went out the stories started. Zach wasn’t alone. Several of the other temp drives had pretty much the same story; they thought they had found a home, only to be replaced by some uppity clone drive. One of the temp drives, Lin, said she had been in three different systems only to be replaced each time by a clone drive. No one wanted to believe her, but no one knew what was next either.

The Day the Clone Died

Zach found out the truth a few days later when he was selected, inspected, and injected as a temp drive into another Storage Pod. Then three days later he was removed, wiped, reformatted, and placed back in the temp pool. He began to resign himself to life as a temp drive. Not exactly glamorous, but he did get his serial number in the Backblaze Drive Stats data tables while he was a temp. That was more than the millions of other drives in the world that would forever be unknown.

On his third temp drive stint, he was barely in the pod a day when the lid opened and he was unceremoniously removed. This was the life of a temp drive, and when the lid opened on the fourth day of his fourth temp drive shift, he just closed his eyes and waited for his dream to end again. Except, this time, the tech’s hand reached past him and grabbed a drive a few slots away. That unfortunate drive had passed the night before, a full-fledged crash. Zach, like all the other drives nearby, had heard the screams.

Another temp drive Zach knew from the temp table replaced the dead drive, then the lid was closed, the pod slid back into place, and power was restored. With that, Zach doubled down on getting rebuilt — maybe if he could finish before the clone did, he could stay. What Zach didn’t know was that the clone process for the drive he had replaced had failed. This happens about half the time. Zach was home free; he just didn’t know it.

In a couple of days, Zach finished rebuilding and became a real member of a production Backblaze Storage Pod. He now spends his days storing and retrieving data, getting his bits tested by shard integrity checks, and having his S.M.A.R.T. stats logged for the Backblaze Drive Stats. His hard drive life is better than he ever dreamed.

The post The Life and Times of a Backblaze Hard Drive appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Petabytes on a Budget: 10 Years and Counting

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/petabytes-on-a-budget-10-years-and-counting/

A Decade of the Pod

This post is for all of the storage geeks out there who have followed the adventures of Backblaze and our Storage Pods over the years. The rest of you are welcome to come along for the ride.

It has been 10 years since Backblaze introduced our Storage Pod to the world. In September 2009, we announced our hulking, eye-catching, red 4U storage server equipped with 45 hard drives delivering 67 terabytes of storage for just $7,867 — that was about $0.11 a gigabyte. As part of that announcement, we open-sourced the design for what we dubbed Storage Pods, telling you and everyone like you how to build one, and many of you did.

Backblaze Storage Pod version 1 was announced on our blog with little fanfare. We thought it would be interesting to a handful of folks — readers like you. In fact, it wasn’t even called version 1, as no one had ever considered there would be a version 2, much less a version 3, 4, 4.5, 5, or 6. We were wrong. The Backblaze Storage Pod struck a chord with many IT and storage folks who were offended by having to pay a king’s ransom for a high density storage system. “I can build that for a tenth of the price,” you could almost hear them muttering to themselves. Mutter or not, we thought the same thing, and version 1 was born.

The Podfather

Tim, the “Podfather” as we know him, was the Backblaze lead in creating the first Storage Pod. He had design help from our friends at Protocase, who built the first three generations of Storage Pods for Backblaze and also spun out a company named 45 Drives to sell their own versions of the Storage Pod — that’s open source at its best. Before we decided on the version 1 design, there were a few experiments along the way:

Wooden pod
Octopod

The original Storage Pod was prototyped by building a wooden pod or two. We needed to test the software while the first metal pods were being constructed.

The Octopod was a quick and dirty response to receiving the wrong SATA cables — ones that were too long and glowed. Yes, there are holes drilled in the bottom of the pod.

Pre-1 Storage Pod
Early not-red Storage Pod

The original faceplate shown above was used on about 10 pre-1.0 Storage Pods. It was updated to the three circle design just prior to Storage Pod 1.0.

Why are Storage Pods red? When we had the first ones built, the manufacturer had a batch of red paint left over that could be used on our pods, and it was free.

Back in 2007, when we started Backblaze, there weren’t many affordable choices for storing large quantities of data. Our goal was to charge $5/month for unlimited data storage for one computer. We decided to build our own storage servers when it became apparent that, if we were to use the other solutions available, we’d have to charge a whole lot more money. Storage Pod 1.0 allowed us to store one petabyte of data for about $81,000. Today we’ve lowered that to about $35,000 with Storage Pod 6.0. When you take into account that the average amount of data per user has nearly tripled in that same time period and our price is now $6/month for unlimited storage, the math works out about the same today as it did in 2009.
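The “math works out about the same” claim can be sanity-checked with a rough sketch. The dollar-per-petabyte figures are the approximate ones quoted above; the per-user data volume is a made-up placeholder, since only the ratios matter here.

```python
# Back-of-the-envelope: hardware cost per user, then vs. now (illustrative).

def hardware_cost_per_user(data_per_user_tb, cost_per_petabyte):
    return data_per_user_tb / 1000.0 * cost_per_petabyte  # user's share of a petabyte

X = 0.5  # hypothetical average TB stored per user in 2009; only the ratio matters
then = hardware_cost_per_user(X, 81_000)       # Storage Pod 1.0 era
now = hardware_cost_per_user(3 * X, 35_000)    # ~3x the data, Storage Pod 6.0 pricing
print(then, now, now / then)  # cost per user grows ~1.3x while the price went $5 -> $6
```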

We Must Have Done Something Right

The Backblaze Storage Pod was more than just affordable data storage. Version 1.0 introduced or popularized three fundamental changes to storage design: 1) You could build a system out of commodity parts and it would work, 2) You could mount hard drives vertically and they would still spin, and 3) You could use consumer hard drives in the system. It’s hard to determine which of these three features offended and/or excited more people. It is fair to say that ten years out, things worked out in our favor, as we currently have about 900 petabytes of storage in production on the platform.

Over the last 10 years, people have warmed up to our design, or at least elements of the design. Starting with 45 Drives, multitudes of companies have worked on and introduced various designs for high density storage systems ranging from 45 to 102 drives in a 4U chassis, so today the list of high-density storage systems that use vertically mounted drives is pretty impressive:

| Company | Server | Drive Count |
|---|---|---|
| 45 Drives | Storinator S45 | 45 |
| 45 Drives | Storinator XL60 | 60 |
| Chenbro | RM43160 | 60 |
| Chenbro | RM43699 | 100 |
| Dell | DSS 7000 | 90 |
| HPE | Cloudline CL5200 | 80 |
| HPE | Cloudline CL5800 | 100 |
| NetGear | ReadyNAS 4360X | 60 |
| Newisys | NDS 4450 | 60 |
| Quanta | QuantaGrid D51PL-4U | 102 |
| Quanta | QuantaPlex T21P-4U | 70 |
| Seagate | Exos AP 4U100 | 96 |
| Supermicro | SuperStorage 6049P-E1CR60L | 60 |
| Supermicro | SuperStorage 6049P-E1CR45L | 45 |
| Tyan | Thunder SX FA100-B7118 | 100 |
| Viking Enterprise Solutions | NSS-4602 | 60 |
| Viking Enterprise Solutions | NDS-4900 | 90 |
| Viking Enterprise Solutions | NSS-41000 | 100 |
| Western Digital | Ultrastar Serv60+8 | 60 |
| Wiwynn | SV7000G2 | 72 |

Another driver in the development of some of these systems is the Open Compute Project (OCP). Formed in 2011, they gather and share ideas and designs for data storage, rack designs, and related technologies. The group is managed by The Open Compute Project Foundation as a 501(c)(6) and counts many industry luminaries in the storage business as members.

What Have We Done Lately?

In technology land, 10 years of anything is a long time. What was exciting then is expected now. And the same thing has happened to our beloved Storage Pod. We have introduced updates and upgrades over the years twisting the usual dials: cost down, speed up, capacity up, vibration down, and so on. All good things. But, we can’t fool you, especially if you’ve read this far. You know that Storage Pod 6.0 was introduced in April 2016 and quite frankly it’s been crickets ever since as it relates to Storage Pods. Three plus years of non-innovation. Why?

  1. If it ain’t broke, don’t fix it. Storage Pod 6.0 is built in the US by Equus Compute Solutions, our contract manufacturer, and it works great. Production costs are well understood, performance is fine, and the new higher density drives perform quite well in the 6.0 chassis.
  2. Disk migrations kept us busy. From Q2 2016 through Q2 2019 we migrated over 53,000 drives. We replaced 2, 3, and 4 terabyte drives with 8, 10, and 12 terabyte drives, doubling, tripling and sometimes quadrupling the storage density of a Storage Pod.
  3. Pod upgrades kept us busy. From Q2 2016 through Q1 2019, we upgraded our older V2, V3, and V4.5 Storage Pods to V6.0. Then we crushed a few of the older ones with a MegaBot and gave a bunch more away. Today there are no longer any stand-alone Storage Pods; they are all members of a Backblaze Vault.
  4. Lots of data kept us busy. In Q2 2016, we had 250 petabytes of data storage in production. Today, we have 900 petabytes. That’s a lot of data you folks gave us (thank you by the way) and a lot of new systems to deploy. The chart below shows the challenge our data center techs faced.

Petabytes Stored vs Headcount vs Millions Raised

In other words, our data center folks were really, really busy, and not interested in shiny new things. Now that we’ve hired a bunch more DC techs, let’s talk about what’s next.

Storage Pod Version 7.0 — Almost

Yes, there is a Backblaze Storage Pod 7.0 on the drawing board. Here is a short list of some of the features we are looking at:

  • Updating the motherboard
  • Upgrading the CPU and considering an AMD CPU
  • Updating the power supply units, perhaps moving to one unit
  • Upgrading from 10Gbase-T to 10GbE SFP+ optical networking
  • Upgrading the SATA cards
  • Modifying the tool-less lid design

The timeframe is still being decided, but early 2020 is a good time to ask us about it.

“That’s nice,” you say out loud, but what you are really thinking is, “Is that it? Where’s the Backblaze in all this?” And that’s where you come in.

The Next Generation Backblaze Storage Pod

We are not out of ideas, but one of the things that we realized over the years is that many of you are really clever. From the moment we open sourced the Storage Pod design back in 2009, we’ve received countless interesting, well thought out, and occasionally odd ideas to improve the design. As we look to the future, we’d be stupid not to ask for your thoughts. Besides, you’ll tell us anyway on Reddit or HackerNews or wherever you’re reading this post, so let’s just cut to the chase.

Build or Buy

The two basic choices are: We design and build our own storage servers or we buy them from someone else. Here are some of the criteria as we think about this:

  1. Cost: We’d like the cost of a storage server to be about $0.030 – $0.035 per gigabyte of storage (or less, of course). That includes the server and the drives inside. For example, using off-the-shelf Seagate 12 TB drives (model: ST12000NM0007) in a 6.0 Storage Pod costs about $0.032-$0.034/gigabyte depending on the price of the drives on a given day; see the sketch after this list.
  2. International: Now that we have a data center in Amsterdam, we need to be able to ship these servers anywhere.
  3. Maintenance: Things should be easy to fix or replace — especially the drives.
  4. Commodity Parts: Wherever possible, the parts should be easy to purchase, ideally from multiple vendors.
  5. Racks: We’d prefer to keep using 42” deep cabinets, but make a good case for something deeper and we’ll consider it.
  6. Possible Today: No DNA drives or other wistful technologies. We need to store data today, not in the year 2061.
  7. Scale: Nothing in the solution should limit the ability to scale the systems. For example, we should be able to upgrade drives to higher densities over the next 5-7 years.

Other than that there are no limitations. Any of the following acronyms, words, and phrases could be part of your proposed solution and we won’t be offended: SAS, JBOD, IOPS, SSD, redundancy, compute node, 2U chassis, 3U chassis, horizontal mounted drives, direct wire, caching layers, appliance, edge storage units, PCIe, fibre channel, SDS, etc.

The solution does not have to be a Backblaze one. As the list from earlier in this post shows, Dell, HP, and many others make high density storage platforms we could leverage. Make a good case for any of those units, or any others you like, and we’ll take a look.

What Will We Do With All Your Input?

We’ve already started by cranking up Backblaze Labs again and have tried a few experiments. Over the coming months we’ll share with you what’s happening as we move this project forward. Maybe we’ll introduce Storage Pod X or perhaps take some of those Storage Pod knockoffs for a spin. Regardless, we’ll keep you posted. Thanks in advance for your ideas and thanks for all your support over the past ten years.

The post Petabytes on a Budget: 10 Years and Counting appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Backblaze Hard Drive Stats Q2 2019

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hard-drive-stats-q2-2019/

Backblaze Drive Stats Q2 2019
In this report we’ll check in on the drive models that have been around for several years, take a look at how our 14 TB Toshiba drives are doing (spoiler alert: great), and along the way we’ll provide a handful of insights and observations from inside our storage cloud. As always, we’ll publish the data we use in these reports on our Hard Drive Test Data web page and we look forward to your comments.

Hard Drive Failure Stats for Q2 2019

At the end of Q2 2019, Backblaze was using 108,660 hard drives to store data. For our evaluation we remove from consideration those drives that were used for testing purposes and those drive models for which we did not have at least 60 drives (see why below). This leaves us with 108,461 hard drives. The table below covers what happened in Q2 2019.

Backblaze Q2 2019 Hard Drive Failure Rates

Notes and Observations

If a drive model has a failure rate of 0 percent, it means there were no drive failures of that model during Q2 2019 — lifetime failure rates are later in this report. The two drives listed with zero failures in Q2 were the 4 TB and 14 TB Toshiba models. The Toshiba 4 TB drive doesn’t have a large enough number of drives or drive days to be statistically reliable, but only one drive of that model has failed in the last three years. We’ll dig into the 14 TB Toshiba drive stats a little later in the report.

There were 199 drives (108,660 minus 108,461) that were not included in the list above because they were used as testing drives or we did not have at least 60 of a given drive model. We now use 60 drives of the same model as the minimum number when we report quarterly, yearly, and lifetime drive statistics as there are 60 drives in all newly deployed Storage Pods — older Storage Pod models had a minimum of 45.

2,000 Backblaze Storage Pods? Almost…

We currently have 1,980 Storage Pods in operation. All are version 5 or version 6, as we recently gave away nearly all of the older Storage Pods to folks who stopped by our Sacramento storage facility. Nearly all, as we have a couple in our Storage Pod museum. There are currently 544 version 5 pods, each containing 45 data drives, and 1,436 version 6 pods, each containing 60 data drives. The next time we add a Backblaze Vault, which consists of 20 Storage Pods, we will have 2,000 Backblaze Storage Pods in operation.

Goodbye Western Digital

In Q2 2019, the last of the Western Digital 6 TB drives were retired from service. The average age of the drives was 50 months. These were the last of our Western Digital branded data drives. When Backblaze was first starting out, the first data drives we deployed en masse were Western Digital Green 1 TB drives. So, it is with a bit of sadness that we see our Western Digital data drive count go to zero. We hope to see them again in the future.

WD Ultrastar 14 TB DC HC530

Hello “Western Digital”

While the Western Digital brand is gone, the HGST brand (owned by Western Digital) is going strong as we still have plenty of the HGST branded drives, about 20 percent of our farm, ranging in size from 4 to 12 TB. In fact, we added over 4,700 HGST 12 TB drives in this quarter.

This just in: rumor has it there are twenty 14 TB Western Digital Ultrastar drives being readied for deployment and testing in one of our data centers. It appears Western Digital has returned: stay tuned.

Goodbye 5 TB Drives

Back in Q1 2015, we deployed 45 Toshiba 5 TB drives. They were the only 5 TB drives we deployed as the manufacturers quickly moved on to larger capacity drives, and so did we. Yet, during their four plus years of deployment only two failed, with no failures since Q2 of 2016 — three years ago. This made it hard to say goodbye, but buying, stocking, and keeping track of a couple of 5 TB spare drives was not optimal, especially since these spares could not be used anywhere else. So yes, the Toshiba 5 TB drives were the odd ducks on our farm, but they were so good they got to stay for over four years.

Hello Again, Toshiba 14 TB Drives

We’ve mentioned the Toshiba 14 TB drives in previous reports; now we can dig in a little deeper, given that they have been deployed for almost nine months and we have some experience working with them. These drives got off to a bit of a rocky start, with six failures in the first three months of deployment. Since then, there has been only one additional failure, with no failures reported in Q2 2019. The result is that the lifetime annualized failure rate for the Toshiba 14 TB drives has decreased to a very respectable 0.78%, as shown in the lifetime table in the following section.

Lifetime Hard Drive Stats

The table below shows the lifetime failure rates for the hard drive models we had in service as of June 30, 2019. This is over the period beginning in April 2013 and ending June 30, 2019.

Backblaze Lifetime Hard Drive Annualized Failure Rates

The Hard Drive Stats Data

The complete data set used to create the information used in this review is available on our Hard Drive Test Data web page. You can download and use this data for free for your own purpose. All we ask are three things: 1) You cite Backblaze as the source if you use the data, 2) You accept that you are solely responsible for how you use the data, and, 3) You do not sell this data to anyone; it is free. Good luck and let us know if you find anything interesting.

If you just want the tables we used to create the charts in this blog post you can download the ZIP file containing the MS Excel spreadsheet.

The post Backblaze Hard Drive Stats Q2 2019 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Backblaze Hard Drive Stats Q1 2019

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/backblaze-hard-drive-stats-q1-2019/

Backblaze Drive Stats Q1 2019

As of March 31, 2019, Backblaze had 106,238 spinning hard drives in our cloud storage ecosystem spread across three data centers. Of that number, there were 1,913 boot drives and 104,325 data drives. This review looks at the Q1 2019 and lifetime hard drive failure rates of the data drive models currently in operation in our data centers and provides a handful of insights and observations along the way. In addition, we have a few questions for you to ponder near the end of the post. As always, we look forward to your comments.

Hard Drive Failure Stats for Q1 2019

At the end of Q1 2019, Backblaze was using 104,325 hard drives to store data. For our evaluation we remove from consideration those drives that were used for testing purposes and those drive models for which we did not have at least 45 drives (see why below). This leaves us with 104,130 hard drives. The table below covers what happened in Q1 2019.

Q1 2019 Hard Drive Failure Rates table

Notes and Observations

If a drive model has a failure rate of 0%, it means there were no drive failures of that model during Q1 2019. The two drives listed with zero failures in Q1 were the 4 TB and 5 TB Toshiba models. Neither has a large enough number of drive days to be statistically significant, but in the case of the 5 TB model, you have to go back to Q2 2016 to find the last drive failure we had of that model.

There were 195 drives (104,325 minus 104,130) that were not included in the list above because they were used as testing drives or we did not have at least 45 of a given drive model. We use 45 drives of the same model as the minimum number when we report quarterly, yearly, and lifetime drive statistics. The use of 45 drives is historical in nature as that was the number of drives in our original Storage Pods. Beginning next quarter that threshold will change; we’ll get to that shortly.

The Annualized Failure Rate (AFR) for Q1 is 1.56%. That’s as high as the quarterly rate has been since Q4 2017 and it’s part of an overall upward trend we’ve seen in the quarterly failure rates over the last few quarters. Let’s take a closer look.

Quarterly Trends

We noted in previous reports that using the quarterly reports is useful in spotting trends about a particular drive or even a manufacturer. Still, you need to have enough data (drive count and drive days) in each observed period (quarter) to make any analysis valid. To that end the chart below uses quarterly data from Seagate and HGST drives while leaving out Toshiba and WDC drives as we don’t have enough drives from those manufacturers over the course of the last three years.

Trends of the Quarterly Hard Drive Annualized Failure Rates by Manufacturer

Over the last three years, the trend for both Seagate and HGST annualized failure rates has improved, i.e., gone down. While Seagate has reduced their failure rate by over 50% during that time, the upward trend over the last three quarters requires some consideration. We’ll take a look at this and let you know if we find anything interesting in a future post.

Changing the Qualification Threshold

As reported over the last several quarters, we’ve been migrating from lower density 2, 3, and 4 TB drives to larger 10, 12, and 14 TB hard drives. At the same time, we have been replacing our stand-alone 45-drive Storage Pods with 60-drive Storage Pods arranged into the Backblaze Vault configuration of 20 Storage Pods per vault. In Q1, the last stand-alone 45-drive Storage Pod was retired. Therefore, using 45 drives as the threshold for qualification for our quarterly report seems antiquated. This is a good time to switch to using Drive Days as the qualification criterion. In reviewing our data, we have decided to use 5,000 Drive Days as the threshold going forward. The exception: any drive models we currently report on, such as the Toshiba 5 TB model with about 4,000 drive days each quarter, will continue to be included in our Hard Drive Stats reports.
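In code, the new qualification rule is just a filter over each quarter's per-model totals. The numbers below are illustrative, not actual quarterly figures; the carve-out keeps models we already report on, like the Toshiba 5 TB.

```python
# Illustrative qualification filter for the quarterly table.
MIN_DRIVE_DAYS = 5_000

def qualifying_models(model_stats, already_reported=()):
    """model_stats: {model: {"drive_days": int, "failures": int}}"""
    return {
        model: stats
        for model, stats in model_stats.items()
        if stats["drive_days"] >= MIN_DRIVE_DAYS or model in already_reported
    }

sample = {                                   # made-up drive day counts
    "TOSHIBA MD04ABA500V": {"drive_days": 4_050, "failures": 0},
    "ST12000NM0007": {"drive_days": 2_900_000, "failures": 120},
    "TEST-MODEL": {"drive_days": 900, "failures": 1},
}
print(qualifying_models(sample, already_reported={"TOSHIBA MD04ABA500V"}))
# Keeps the Seagate 12 TB (enough drive days) and the Toshiba 5 TB (carve-out),
# drops the model with only 900 drive days.
```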

Fewer Drives = More Data

Those of you who follow our quarterly reports might have observed that the total number of hard drives in service decreased in Q1 by 648 drives compared to Q4 2018, yet we added nearly 60 petabytes of storage. You can see what changed in the chart below.

Backblaze Cloud Storage: Drive Counts vs. Disk Space in Q1 2019 table

Lifetime Hard Drive Stats

The table below shows the lifetime failure rates for the hard drive models we had in service as of March 31, 2019. This is over the period beginning in April 2013 and ending March 31, 2019.

Backblaze Lifetime Hard Drive Failure Rates table

Predictions for the Rest of 2019

As 2019 unfolds, here are a few guesses as to what might happen over the course of the year. Let’s see what you think.

By the end of 2019, which, if any, of the following things will happen? Let us know in the comments.

  • Backblaze will continue to migrate out 4 TB drives and will have fewer than 15,000 by the end of 2019: we currently have about 35,000.
  • We will have installed at least twenty 20 TB drives for testing purposes.
  • Backblaze will go over 1 exabyte (1,000 petabytes) of available cloud storage. We are currently at about 850 petabytes of available storage.
  • We will have installed, for testing purposes, at least 1 HAMR based drive from Seagate and/or 1 MAMR drive from Western Digital.

The Hard Drive Stats Data

The complete data set used to create the information used in this review is available on our Hard Drive Test Data web page. You can download and use this data for free for your own purpose. All we ask are three things: 1) You cite Backblaze as the source if you use the data, 2) You accept that you are solely responsible for how you use the data, and, 3) You do not sell this data to anyone — it is free.

If you just want the summarized data used to create the tables and charts in this blog post you can download the ZIP file containing the MS Excel spreadsheet.

Good luck and let us know if you find anything interesting.

The post Backblaze Hard Drive Stats Q1 2019 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

An Inside Look at the Backblaze Storage Pod Museum

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/backblaze-storage-pod-museum/

image of the back of a Backblaze Storage Pod

Merriam-Webster defines a museum as “an institution devoted to the procurement, care, study, and display of objects of lasting interest or value.” With that definition in mind, we’d like to introduce the Backblaze Storage Pod Museum. While some folks think of a museum as a place of static, outdated artifacts, others realize that those artifacts can tell a story over time of experimentation, evolution, and innovation. That is certainly the case with our Storage Pods. Modesty prevents us from saying that we changed the storage industry with our Storage Pod design, so let’s say we added a lot of red to the picture.

Over the years, Larry, our data center manager, has stashed away the various versions of our Storage Pods as they were removed from service. He also kept drives, SATA cards, power supplies, cables, and more. Thank goodness. With the equipment that Larry’s pack-rat tendencies saved, and a couple of current Storage Pods we borrowed (shhhh, don’t tell Larry), we were able to start the Backblaze Storage Pod Museum. Let’s take a quick photo trip through the years.

Before Storage Pod 1.0

Before we announced Storage Pod 1.0 to the world nearly 10 years ago, we had already built about twenty or so Storage Pods. These early pods used Western Digital 1.0 TB Green drives. There were multiple prototypes, but once we went into production, we had settled on the 45-drive design with 3 rows of 15 vertically mounted drives. We ordered the first batch of ten chassis to be built and then discovered we did not spec a hole for the on/off switch. We improvised.

Storage Pod 1.0 — Petabytes on a Budget

We introduced the storage world to inexpensive cloud storage with Storage Pod 1.0. Funny thing, we didn’t refer to this innovation as version 1.0 — just a Backblaze Storage Pod. We not only introduced the Storage Pod, we also open-sourced the design, publishing the design specs, parts list, and more. People took notice. We introduced the design with Seagate 1.5 TB drives for a total of 67 TB of storage. This version also had an Intel Desktop motherboard (DG43NB) and 4 GB of memory.

Storage Pod 2.0 — More Petabytes on a Budget

Storage Pod 2.0 was basically twice the system that 1.0 was. It had twice the memory, twice the speed, and twice the storage, but it was in the same chassis with the same number of drives. All of this combined to reduce the cost per GB of the Storage Pod system over 50%: from $0.117/GB in version 1 to $0.055/GB in version 2.

Among the changes: the desktop motherboard in V1 was upgraded to a server class motherboard, we simplified things by using three four-port SATA cards, and reduced the cost of the chassis itself. In addition, we used Hitachi (HGST) 3 TB drives in Storage Pod 2.0 to double the total amount of storage to 135 TB. Over their lifetime, these HGST drives had an annualized failure rate of 0.82%, with the last of them being replaced in Q2 2017.

Storage Pod 3.0 — Good Vibrations

Storage Pod 3.0 brought the first significant chassis redesign in our efforts to make the design easier to service and provide the opportunity to use a wider variety of components. The most noticeable change was the introduction of drive lids — one for each row of 15 drives. Each lid was held in place by a pair of steel rods. The drive lids held the drives below in place and replaced the drive bands used previously. The motherboard and CPU were upgraded and we went with memory that was Supermicro certified. In addition, we added standoffs to the chassis to allow for Micro ATX motherboards to be used if desired, and we added holes where needed to allow for someone to use one or two 2.5” drives as boot drives — we use one 3.5” drive.

Storage Pod 4.0 — Direct Wire

Up through Storage Pod 3.0, Protocase helped design and then build our Storage Pods. During that time, they also designed and produced a direct wire version, which replaced the nine backplanes with direct wiring to the SATA cards. Storage Pod 4.0 was based on the direct wire technology. We deployed a small number of these systems but we fought driver problems between our software and the new SATA cards. In the end, we went back to our backplanes and Protocase continued forward with direct wire systems that they continued to deploy successfully. Conclusion: there are multiple ways you can be successful with the Storage Pod design.

Storage Pod 4.5 — Backplanes are Back

This version started with the Storage Pod 3.0 design and introduced new 5-port backplanes and upgraded to SATA III cards. Both of these parts were built on Marvell chipsets. The backplanes we previously used were being phased out, which prompted us to examine other alternatives like the direct wire pods. Now we had a ready supply of 5-port backplanes and Storage Pod 4.5 was ready to go.

We also began using Evolve Manufacturing to build these systems. They were located near Backblaze and were able to scale to meet our ever increasing production needs. In addition, they were full of great ideas on how to improve the Storage Pod design.

Storage Pod 5.0 — Evolution from the Chassis on Up

While Storage Pod 3.0 was the first chassis redesign, Storage Pod 5.0 was, to date, the most substantial. Working with Evolve Manufacturing, we examined everything down to the rivets and stand-offs, looking for a better, more cost efficient design. Driving many of the design decisions was the introduction of Backblaze B2 Cloud Storage, which was designed to run on our Backblaze Vault architecture. From a performance point of view, we upgraded the motherboard and CPU, increased memory fourfold, upgraded the networking to 10 GB on the motherboard, and moved from SATA II to SATA III. We also completely redid the drive enclosures, replacing the 15-drive clampdown lids with nine five-drive compartments with drive guides.

Storage Pod 6.0 — 60 Drives

Storage Pod 6.0 increased the number of drives from 45 to 60. We had a lot of questions when this idea was first proposed, like: would we need bigger power supplies (answer: no), more memory (no), a bigger CPU (no), or more fans (no)? We did need to redesign our SATA cable routes from the SATA cards to the backplanes, as we needed to stay under the one meter spec length for the SATA cables. We also needed to update our power cable harness and, of course, add length to the chassis to accommodate the 15 additional drives, but nothing unexpected cropped up — it just worked.

What’s Next?

We’ll continue to increase the density of our storage systems. For example, we unveiled a Backblaze Vault full of 14 TB drives in our 2018 Drive Stats report. Each Storage Pod in that vault contains 840 terabytes worth of hard drives, meaning the 20 Storage Pods that make up the Backblaze Vault bring 16.8 petabytes of storage online when the vault is activated. As higher density drives and new technologies like HAMR and MAMR are brought to market, you can be sure we’ll be testing them for inclusion in our environment.

Nearly 10 years after the first Storage Pod altered the storage landscape, the innovation continues to deliver great returns to the market. Many other companies, from 45Drives to Dell and HP, have leveraged the Storage Pod’s concepts to make affordable, high-density storage systems. We think that’s awesome.

The post An Inside Look at the Backblaze Storage Pod Museum appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Backblaze Hard Drive Stats for 2018

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hard-drive-stats-for-2018/

Backblaze Hard Drive Stats for 2018

We published our first “Hard Drive Stats” report just over 5 years ago on January 21, 2014. We titled that report “What Hard Drive Should I Buy.” In hindsight, that might have been a bit of an overreach, but we were publishing data that was basically non-existent otherwise.

Many people like our reports, some don’t, and some really don’t — and that’s fine. From the beginning, the idea was to share our experience and use our data to shine a light on the otherwise opaque world of hard disk drives. We hope you have enjoyed reading our reports and we look forward to publishing them for as long as people find them useful.

Thank you.

As of December 31, 2018, we had 106,919 spinning hard drives. Of that number, there were 1,965 boot drives and 104,954 data drives. This review looks at the hard drive failure rates for the data drive models in operation in our data centers. In addition, we’ll take a look at the new hard drive models we’ve added in 2018 including our 12 TB HGST and 14 TB Toshiba drives. Along the way we’ll share observations and insights on the data presented and we look forward to you doing the same in the comments.

2018 Hard Drive Failure Rates: What 100,000+ Hard Drives Tell Us

At the end of 2018 Backblaze was monitoring 104,954 hard drives used to store data. For our evaluation we remove from consideration those drives that were used for testing purposes and those drive models for which we did not have at least 45 drives (see why below). This leaves us with 104,778 hard drives. The table below covers what happened just in 2018.

2018 annualized hard drive failure rates

Notes and Observations

If a drive model has a failure rate of 0%, it means there were no drive failures of that model during 2018.

For 2018, the Annualized Failure Rate (AFR) stated is usually pretty solid. The exception is when a given drive model has a small number of drives (fewer than 500) and/or a small number of drive days (fewer than 50,000). In these cases, the AFR can be too wobbly to be used reliably for buying or retirement decisions.
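One way to see why a small drive count or drive day total makes the AFR wobbly is to put a confidence interval around it. The sketch below treats the failure count as a Poisson observation and uses the exact chi-squared interval; it is our illustration (and needs SciPy), not a statistic published in the report.

```python
# Illustrative 95% confidence interval for AFR, treating failures as Poisson.
from scipy.stats import chi2

def afr_confidence_interval(failures, drive_days, confidence=0.95):
    """Return (afr, lower, upper) in percent for the observed period."""
    drive_years = drive_days / 365.0
    alpha = 1.0 - confidence
    lower = 0.0 if failures == 0 else chi2.ppf(alpha / 2, 2 * failures) / 2
    upper = chi2.ppf(1 - alpha / 2, 2 * failures + 2) / 2
    afr = 100.0 * failures / drive_years
    return afr, 100.0 * lower / drive_years, 100.0 * upper / drive_years

# 2 failures in 40,000 drive days: AFR ~1.8%, but the interval spans ~0.2%-6.6%.
print(afr_confidence_interval(2, 40_000))
# 200 failures in 4,000,000 drive days: same ~1.8% AFR, much tighter interval.
print(afr_confidence_interval(200, 4_000_000))
```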

There were 176 drives (104,954 minus 104,778) that were not included in the list above. These drives were either used for testing or we did not have at least 45 drives of a given model. We use 45 drives of the same model as the minimum number when we report quarterly, yearly, and lifetime drive statistics. This is a historical number based on the number of drives needed to fill one Backblaze Storage Pod (version 5 or earlier).

The Annualized Failure Rate (AFR) for 2018 for all drive models was just 1.25%, well below the rates from previous years as we’ll discuss later on in this review.

What’s New in 2018

In 2018 the big trend was hard drive migration: replacing lower density 2, 3, and 4 TB drives, with 8, 10, 12, and in Q4, 14 TB drives. In 2018 we migrated 13,720 hard drives and we added another 13,389 hard drives as we increased our total storage from about 500 petabytes to over 750 petabytes. So in 2018, our data center techs migrated or added 75 drives a day on average, every day of the year.

Here’s a quick review of what’s new in 2018.

  • There are no more 4 TB Western Digital drives; the last of them was replaced in Q4. This leaves us with only 383 Western Digital drives remaining — all 6 TB drives. That’s 0.37% of our drive farm. We do have plenty of drives from HGST (owned by WDC), but over the years we’ve never been able to get the quantity of Western Digital drives we need at a reasonable price.
  • Speaking of HGST drives, in Q4 we added 1,200 HGST 12 TB drives (model: HUH721212ALN604). We had previously tested these drives in Q3 with no failures, so we have filled a Backblaze Vault with 1,200 drives. After about one month we’ve only had one failure, so they are off to a good start.
  • The HGST drives have a ways to go as in Q4 we also added 6,045 Seagate 12 TB drives (model: ST12000NM0007) to bring us to 31,146 of this drive model. That’s 29.7% of our drive farm.
  • Finally in Q4, we added 1,200 Toshiba 14 TB drives (model: MG07ACA14TA). These are helium-filled PMR (perpendicular magnetic recording) drives. The initial annualized failure rate (AFR) is just over 3%, which is similar to the other new models and we would expect the AFR to drop over time as the drives settle in.

Comparing Hard Drive Failure Rates Over Time

When we compare Hard Drive Stats for 2018 to previous years, two things jump out. First, the migration to larger drives, and second, the improvement in the overall annual failure rate each year. The chart below compares each of the last three years. The data for each year is inclusive of that year only.

Annualized Hard Drive Failure Rates by Year

Notes and Observations

  • In 2016 the average size of hard drives in use was 4.5 TB. By 2018 the average size had grown to 7.7 TB.
  • The 2018 annualized failure rate of 1.25% was the lowest by far of any year we’ve recorded.
  • None of the 45 Toshiba 5 TB drives (model: MD04ABA500V) has failed since Q2 2016. While the drive count is small, that’s still a pretty good run.
  • The Seagate 10 TB drives (model: ST10000NM0086) continue to impress as their AFR for 2018 was just 0.33%. That’s based on 1,220 drives and nearly 500,000 drive days, making the AFR pretty solid.

Lifetime Hard Drive Stats

While comparing the annual failure rates of hard drives over multiple years is a great way to spot trends, we also look at the lifetime annualized failure rates of our hard drives. The chart below shows the annualized failure rates of all of the drives currently in production.

Annualized Hard Drive Failure Rates for Active Drives

Hard Drive Stats Webinar

We’ll be presenting the webinar “Backblaze Hard Drive Stats for 2018” on Thursday, January 24, 2019 at 10:00 Pacific time. The webinar will dig deeper into the quarterly, yearly, and lifetime hard drive stats and include the annual and lifetime stats by drive size and manufacturer. You will need to subscribe to the Backblaze BrightTALK channel to view the webinar. Sign up today.

The Hard Drive Stats Data

The complete data set used to create the information used in this review is available on our Hard Drive Test Data page. You can download and use this data for free for your own purpose. All we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone; it is free.

If you just want the summarized data used to create the tables and charts in this blog post you can download the ZIP file containing the CSV file.

Good luck and let us know if you find anything interesting.

The post Backblaze Hard Drive Stats for 2018 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

How We Optimized Storage and Performance of Apache Cassandra at Backblaze

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/wide-partitions-in-apache-cassandra-3-11/

Guest post by Mick Semb Wever

Backblaze uses Apache Cassandra, a high-performance, scalable distributed database to help manage hundreds of petabytes of data. We engaged the folks at The Last Pickle to use their extensive experience to optimize the capabilities and performance of our Cassandra 3.11 cluster, and now they want to share their experience with a wider audience to explain what they found. We agree; enjoy!

— Andy

Wide Partitions in Apache Cassandra 3.11

by Mick Semb Wever, Consultant, The Last Pickle

Wide partitions in Cassandra can put tremendous pressure on the Java heap and garbage collector, impact read latencies, and can cause issues ranging from load shedding and dropped messages to crashed and downed nodes.

While the theoretical limit on the number of cells per partition has always been two billion cells, the reality has been quite different, as the impacts of heap pressure show. To mitigate these problems, the community has offered a standard recommendation for Cassandra users to keep partitions under 400MB, and preferably under 100MB.

However, in version 3 many improvements were made that affected how Cassandra handles wide partitions. Memtables, caches, and SSTable components were moved off-heap, the storage engine was rewritten in CASSANDRA-8099, and Robert Stupp made a number of other improvements listed under CASSANDRA-11206.

While working with Backblaze and operating a Cassandra version 3.11 cluster, we had the opportunity to test and validate how Cassandra actually handles partitions with this latest version. We will demonstrate that well designed data models can go beyond the existing 400MB recommendation without nodes crashing through heap pressure.

Below, we walk through how Cassandra writes partitions to disk in 3.11, look at how wide partitions impact read latencies, and then present our testing and verification of wide partition impacts on the cluster using the work we did with Backblaze.

The Art and Science of Writing Wide Partitions to Disk

First we need to understand what a partition is and how Cassandra writes partitions to disk in version 3.11.

Each SSTable contains a set of files, and the (-Data.db) file contains numerous partitions.

The layout of a partition in the -Data.db file has three components: a header, followed by zero or one static rows, followed by zero or more ordered Clusterable objects. Each Clusterable object in this file may be either a row or a RangeTombstone that deletes data, with each wide partition containing many Clusterable objects. For an excellent in-depth examination of this, see Aaron’s blog post Cassandra 3.x Storage Engine.

The -Index.db file stores offsets for the partitions, as well as the IndexInfo serialized objects for each partition. These indices facilitate locating the data on disk within the -Data.db file. Stored partition offsets are represented by a subclass of RowIndexEntry. This subclass is chosen by the ColumnIndex and depends on the size of the partition:

  • RowIndexEntry is used when there are no Clusterable objects in the partition, such as when there is only a static row. In this case there are no IndexInfo objects to store and so the parent RowIndexEntry class is used rather than a subclass.
  • The IndexedEntry subclass holds the IndexInfo objects in memory until the partition has finished writing to disk. It is used in partitions where the total serialized size of the IndexInfo objects is less than the column_index_cache_size_in_kb configuration setting (which defaults to 2KB).
  • The ShallowIndexEntry subclass serializes IndexInfo objects to disk as they are created and references these objects using only their position in the file. This is used in partitions where the total serialized size of the IndexInfo objects is more than the column_index_cache_size_in_kb configuration setting.

These IndexInfo objects provide a sampling of positional offsets for rows within a partition, creating an index. Each object specifies the offset at which the page starts, the first row, and the last row.

So, in general, the bigger the partition, the more IndexInfo objects need to be created when writing to disk — and if they are held in memory until the partition is fully written to disk they can cause memory pressure. This is why the column_index_cache_size_in_kb setting was added in Cassandra 3.6 and the objects are now serialized as they are created.
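To make the three cases concrete, here is a minimal Python sketch of the selection logic described above. The class names and the 2KB default mirror the terms used in this section, but this is an illustration only, not Cassandra’s actual Java implementation.

# Illustrative sketch of the RowIndexEntry subclass selection described above.
# Not Cassandra's Java code; names mirror the terms used in this post.

COLUMN_INDEX_CACHE_SIZE_KB = 2  # default column_index_cache_size_in_kb

def choose_index_entry(num_clusterables: int, serialized_indexinfo_bytes: int) -> str:
    if num_clusterables == 0:
        # Only a static row (or nothing at all): no IndexInfo objects to store.
        return "RowIndexEntry"
    if serialized_indexinfo_bytes < COLUMN_INDEX_CACHE_SIZE_KB * 1024:
        # Small partition: IndexInfo objects are held in memory until the
        # partition finishes writing to disk.
        return "IndexedEntry"
    # Wide partition: IndexInfo objects are serialized to disk as they are
    # created and referenced only by their position in the file.
    return "ShallowIndexEntry"

# A wide partition with ~100K IndexInfo objects totalling ~1.4MB:
print(choose_index_entry(100_000, int(1.4 * 1024 * 1024)))  # ShallowIndexEntry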

The relationship between partition size and the number of objects was quantified by Robert Stupp in his presentation, Myths of Big Partitions.

IndexInfo numbers from Robert Stupp

How Wide Partitions Impact Read Latencies

Cassandra’s key cache is an optimization that is enabled by default and helps to improve the speed and efficiency of the read path by reducing the amount of disk activity per read.

Each key cache entry is identified by a combination of the keyspace, table name, SSTable, and the partition key. The value of the key cache is a RowIndexEntry or one of its subclasses — either IndexedEntry or the new ShallowIndexedEntry. The size of the key cache is limited by the key_cache_size_in_mb configuration setting.

When a read operation in the storage engine gets a cache hit it avoids having to access the -Summary.db and -Index.db SSTable components, which reduces that read request’s latency. Wide partitions, however, can decrease the efficiency of this key cache optimization because fewer hot partitions will fit into the allocated cache size.

Indeed, before the ShallowIndexedEntry was added in Cassandra version 3.6, a single wide row could fill the key cache, reducing the hit rate efficiency. When applied to multiple rows, this will cause greater churn of additions and evictions of cache entries.

For example, if the IndexedEntry for a 512MB partition contains 100K+ IndexInfo objects, and if these IndexInfo objects total 1.4MB, then the key cache would only be able to hold about 140 entries.
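The arithmetic behind that figure is simple; the 200MB key cache size below is an assumption inferred from the 140-entry result, not a value stated in the post (the default key_cache_size_in_mb depends on heap size).

# Rough arithmetic behind the example above. The 200MB key cache size is an
# assumed value chosen to match the 140-entry figure; adjust for your config.
key_cache_mb = 200
index_info_mb_per_partition = 1.4  # ~100K IndexInfo objects for a 512MB partition

print(int(key_cache_mb / index_info_mb_per_partition))  # ~142 cache entries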

The introduction of ShallowIndexedEntry objects changed how the key cache can hold data. The ShallowIndexedEntry contains a list of file pointers referencing the serialized IndexInfo objects and can binary search through this list, rather than having to deserialize the entire IndexInfo objects list. Thus when the ShallowIndexedEntry is used no IndexInfo objects exist within the key cache. This increases the storage efficiency of the key cache in storing more entries, but does still require that the IndexInfo objects are binary searched and deserialized from the –Index.db file on a cache hit.

In short, on wide partitions a key cache miss still results in two additional disk reads, as it did before Cassandra 3.6, but now a key cache hit incurs a disk read to the -Index.db file where it did not before Cassandra 3.6.

Object Creation and Heap Behavior with Wide Partitions in 2.2.13 vs 3.11.3

Introducing the ShallowIndexedEntry into Cassandra version 3.6 created a measurable improvement in the performance of wide partitions. To test the effects of this and the other performance enhancements introduced in version 3, we compared how Cassandra 2.2.13 and 3.11.3 performed when one hundred thousand, one million, or ten million rows were each written to a single partition.

The results and accompanying screenshots help illustrate the impact of object creation and heap behavior when inserting rows into wide partitions. While version 2.2.13 crashed repeatedly during this test, 3.11.3 was able to write over 30 million rows to a single partition before Cassandra crashed with an Out-of-Memory error. The test and results are reproduced below.

Both Cassandra versions were started as single-node clusters with default configurations, except for the heap customization in cassandra-env.sh:

MAX_HEAP_SIZE="1G"
HEAP_NEWSIZE="600M"

In Cassandra, only the configured concurrency of memtable flushes and compactors determines how many partitions are processed by a node, and thus how much pressure is put on its heap, at any one time. Based on this known concurrency limitation, profiling can be done by inserting data into one partition against one Cassandra node with a small heap. These results extrapolate to production environments.

The tlp-stress tool inserted data in three separate profiling passes against both versions of Cassandra, creating wide partitions of one hundred thousand (100K), one million (1M), or ten million (10M) rows.

A tlp-stress profile for wide partitions was written, as no suitable profile existed. The read to write ratio used the default setting of 1:100.

The following command lines then implemented the tlp-stress tool:

# To write 100000 rows into one partition
tlp-stress run Wide --replication "{'class':'SimpleStrategy','replication_factor': 1}" -n 100K

# To write 1M rows into one partition
tlp-stress run Wide --replication "{'class':'SimpleStrategy','replication_factor': 1}" -n 1M

# To write 10M rows into one partition
tlp-stress run Wide --replication "{'class':'SimpleStrategy','replication_factor': 1}" -n 10M

Each time tlp-stress executed it was immediately followed by a command to ensure the full count of specified rows passed through the memtable flush and were written to disk:

nodetool flush

The graphs in the sections below, taken from the Apache NetBeans Profiler, illustrate how the ShallowIndexEntry in Cassandra version 3.11 avoids keeping IndexInfo objects in memory.

Notably, the IndexInfo objects are instantiated far more often, but are referenced for much shorter periods of time. The Garbage Collector is more effective at removing short-lived objects, as illustrated by the GC pause times being barely present in the Cassandra 3.11 graphs compared to Cassandra 2.2 where GC pause times overwhelm the JVM.

Wide Partitions in Cassandra 2.2

Benchmarks were against Cassandra 2.2.13

One Partition with 100K Rows (2.2.13)

The following three screenshots show the number of IndexInfo objects instantiated during the write benchmark, the number instantiated during compaction, and a heap profile.

The partition grew to be ~40MB.

Objects created during tlp-stress

screenshot of Cassandra 2.2 objects created during tlp-stress

Objects created during subsequent major compaction

screenshot of Cassandra 2.2 objects created during subsequent major compaction

Heap profiled during tlp-stress and major compaction

screenshot of Cassandra 2.2 Heap profiled during tlp-stress and major compaction

The above diagrams do not have their x-axis expanded to the full width, but still encompass the startup, stress test, flush, and compaction periods of the benchmark.

When stress testing starts with tlp-stress, the CPU Time and Surviving Generations start to climb. During this time the heap also starts to increase and decrease more frequently as it fills up and then the Garbage Collector cleans it out. In these diagrams the garbage collection intervals are easy to identify and isolate from one another.

One Partition with 1M Rows (2.2.13)

Here, the first two screenshots show the number of IndexInfo objects instantiated during the write benchmark and during the subsequent compaction process. The third screenshot shows the CPU & GC Pause Times and the heap profile from the time writes started through when the compaction was completed.

The partition grew to be ~400MB.

Already at this size, the Cassandra JVM is thrashing in garbage collection and has occasionally crashed with Out-of-Memory errors.

Objects created during tlp-stress

screenshot of Cassandra 2.2.13 Objects created during tlp-stress

Objects created during subsequent major compaction

screenshot of Cassandra 2.2.13 Objects created during subsequent major compaction

Heap profiled during tlp-stress and major compaction

screenshot of Cassandra 2.2.13 Heap profiled during tlp-stress and major compaction

The above diagrams display a longer running benchmark, with the quiet period during the startup barely noticeable on the very left-hand side of each diagram. Garbage collection intervals and oscillations in heap size are far more frequent. The GC Pause Time during the stress testing period is now consistently higher and comparable to the CPU Time. It only dissipates when the benchmark performs the flush and compaction.

One Partition with 10M Rows (2.2.13)

In this final test of Cassandra version 2.2.13, the results were difficult to reproduce reliably, as more often than not the test crashed with an Out-of-Memory error caused by GC heap pressure.

The first two screenshots show the number of IndexInfo objects instantiated during the write benchmark and during the subsequent compaction process. The third screenshot shows the GC Pause Time and the heap profile from the time writes started until compaction was completed.

The partition grew to be ~4GB.

Objects created during tlp-stress

Objects created during subsequent major compaction

Heap profiled during tlp-stress and major compaction

screenshot of Cassandra Heap profiled during tlp-stress and major compaction

The above diagrams display consistently very high GC Pause Time compared to CPU Time. Any Cassandra node under this much duress from garbage collection is not healthy. It is suffering from high read latencies, could become blacklisted by other nodes due to its lack of responsiveness, and even crash altogether from Out-of-Memory errors (as it did often during this benchmark).

Wide Partitions in Cassandra 3.11.3

Benchmarks were against Cassandra 3.11.3

In this series, the graphs demonstrate how IndexInfo objects are created either from memtable flushes or from deserialization off disk. The ShallowIndexEntry is used in Cassandra 3.11.3 when deserializing the IndexInfo objects from the -Index.db file.

Neither form of IndexInfo object resides long in the heap, and thus the GC Pause Time is barely visible in comparison to Cassandra 2.2.13, despite the additional number of IndexInfo objects created via deserialization.

One Partition with 100K Rows (3.11.3)

As with the earlier version’s test of this size, the first two screenshots show the number of IndexInfo objects instantiated during the write benchmark and during the subsequent compaction process. The third screenshot shows the CPU & GC Pause Time and the heap profile from the time writes started through when the compaction was completed.

The partition grew to be ~40MB, the same as with Cassandra 2.2.13.

Objects created during tlp-stress

screenshot of Cassandra 3.11.3 objects created during tlp-stress

Objects created during subsequent major compaction

screenshot of Cassandra 3.11.3 objects created during subsequent major compaction

Heap profiled during tlp-stress and major compaction

screenshot of Cassandra 3.11.3 Heap profiled during tlp-stress and major compaction

The diagrams above are roughly comparable to the first diagrams presented under Cassandra 2.2.13, except here the x-axis is expanded to full width. Note there are significantly more instantiated IndexInfo objects, but barely any noticeable GC Pause Time.

One Partition with 1M Rows (3.11.3)

Again, the first two screenshots show the number of IndexInfo objects instantiated during the write benchmark and during the subsequent compaction process. The third screenshot shows the CPU & GC Pause Time and the heap profile over the time writes started until the compaction was completed.

The partition grew to be ~400MB, the same as with Cassandra 2.2.13.

Objects created during tlp-stress

Objects created during subsequent major compaction

Heap profiled during tlp-stress and major compaction

The above diagrams show a wildly oscillating heap as many IndexInfo objects are created, and show many garbage collection intervals, yet the GC Pause Time remains low, if at all noticeable.

One Partition with 10M Rows (3.11.3)

Here again, the first two screenshots show the number of IndexInfo objects instantiated during the write benchmark and during the subsequent compaction process. The third screenshot shows the CPU & GC Pause Time and the heap profile over the time writes started until the compaction was completed.

The partition grew to be ~4GB, the same as with Cassandra 2.2.13.

Objects created during tlp-stress

Objects created during subsequent major compaction

Heap profiled during tlp-stress and major compaction

Unlike the corresponding 2.2.13 profile, the cluster remained stable here, just as it did when running 1M rows per partition. The above diagrams display an oscillating heap as IndexInfo objects are created, and many garbage collection intervals, yet GC Pause Time remains low, if at all noticeable.

Maximum Rows in 1GB Heap (3.11.3)

In an attempt to push Cassandra 3.11.3 to the limit, we ran a test to see how much data could be written to a single partition before Cassandra Out-of-Memory crashed.

The result was 30M+ rows, which is ~12GB of data on disk.

This is similar to the limit of 17GB of data written to a single partition as Robert Stupp found in CASSANDRA-9754 when using a 5GB Java heap.

screenshot of Cassandra 3.11.3 memory usage

What About Reads?

The following graph reruns the benchmark on Cassandra version 3.11.3 over a longer period of time with a read to write ratio of 10:1. It illustrates that reads of wide partitions do not create the heap pressure that writes do.

screenshot of Cassandra 3.11.3 read functions

Conclusion

While the 400MB community recommendation for partition size is clearly appropriate for version 2.2.13, version 3.11.3 shows that performance improvements have created a tremendous ability to handle wide partitions: they can easily be an order of magnitude larger than what was possible in earlier versions of Cassandra without nodes crashing through heap pressure.

The trade-off for better supporting wide partitions in Cassandra 3.11.3 is increased read latency as row offsets now need to be read off disk. However, modern SSDs and kernel pagecaches take advantage of larger configurations of physical memory providing enough IO improvements to compensate for the read latency trade-offs.

The improved stability, combined with relying on better hardware to absorb the read latency trade-off, allows Cassandra operators to worry less about how to store massive amounts of data across different schemas and about unexpected data growth patterns in those schemas.

Looking ahead, the custom B+ tree structures from CASSANDRA-9754 will be used to more effectively look up the deserialized row offsets and to further avoid the deserialization and instantiation of short-lived, unused IndexInfo objects.


Mick Semb Wever designs, builds, and evangelizes distributed systems, from data-driven backends using Cassandra, Hadoop, and Spark to enterprise microservices platforms.

The post How We Optimized Storage and Performance of Apache Cassandra at Backblaze appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

2018 in the Rear View Mirror

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/2018-in-the-rear-view-mirror/

2018 Year in Review

Thank you to all of our customers and friends. You’ve made 2018 a great year for Backblaze. Here’s a quick look at what we’ve been up to in the past year.

Behind the Scenes

Backblaze likes to be transparent in how we do business. Here are just a few areas where we pointed the light on ourselves:

Storage: We started the year with 500 petabytes of data storage. We’ll finish with over 750 petabytes of storage under management — next up, our first exabyte.

Durability: The durability of Backblaze B2 is eleven 9’s. Here’s how we calculated that number and what 99.999999999 really means for you.

Hard Drive Stats: We continue to publish our Hard Drive Stats reports each quarter, detailing the failure rates of the hard drives in the Backblaze data centers. Here’s the most recent report for Q3 2018 and here’s everything since we started.

$30 Million ARR: Backblaze got to $30 million annualized recurring revenue with only $3 million in funding. For some reason, some companies insist on doing the opposite.

Making Lemonade: Peer into Yev’s mind and see how he tackles the wild ride that is social media management.

Data Center PDUs: Read how one of our data center technicians solved the problem of too many cables in the wrong place in the data center.

The Startup CEO: Our CEO continued to publish a series of blog posts on the lessons learned from starting and operating Backblaze for the past 11 years.

Helium and Hard Drives: Take a look at how helium affects the hard drives we use.

People

Over this last year Backblaze hired 34 new people. Let’s welcome Janet, Morgan, John, Cheryl, Ebony, Athrea, Cameron, Skip, John, Vanna, Tina, Daniel, Jyotsna, Jack, Tim, Elliott, Josh, Steven, Victoria, Daren, Billy, Jacob, Nathan, Michele, Matt, Lin, Alex, and a few others who chose to remain unnamed. Let’s not forget our 2018 interns: Nelly, Kelly, Angie, and Colin.

Come join us; we have some great jobs open for the right people.

Backblaze people 2011
Backblaze people 2018

One thing that happens when you add people is you need a place for them to work. Sometimes that means people have to move around. Back in 2012, Yev and I staked out the marketing corner in the office, complete with a view of the back alley. It was our work home for over six years, but we recently had to say goodbye. The marketing group had grown too big for our little corner and we all had to move to a new location in the office. Sigh.

Marketing corner

Fun

MegaBot versus Backblaze: We gave a few busted Backblaze Storage Pods to a 30 foot tall robot who crushes other robots for a living. What could possibly go wrong?

Blog Noir: Reggie, the Macbook, had croaked. Could our hero use Backblaze to recover Reggie’s data from beyond the grave?

Backblaze Bling: We made a shiny version of Backblaze that has no additional features and is ridiculously more expensive, but it’s only for sale on April 1st. Be the first on your block to overpay.

Holiday Wishes: There’s a gift on the list for everyone on your shopping list.

Numbers

Behind every business there are a lot of numbers; here are a few we thought might be interesting.

11 years — Over the last 11 years, Backblaze has conducted our annual backup awareness survey looking at how often people back up their computers.

6% — In 2018, a mere 6% of the respondents backed up all the data on their computers at least once a day.

35 billion — The number of files Backblaze has restored for our Consumer Backup and Business Backup customers since we started keeping track in 2011.

876,388,286 — The number of Consumer and Business Backup files restored by our customers just in November 2018. That’s 29.2 million files per day, or 1.2 million files per hour, or 20,287 files per minute. That’s a lot of memories and musings returned to their rightful owners.

104,527 — The number of spinning hard drives in our data center, including data drives and boot drives.

1,920 — The number of Backblaze Storage Pods in use today. Nearly all are deployed in Backblaze Vaults.

Onward

2018 was a good year for Backblaze. Growth was not too fast and not too slow, but just right. We are all looking forward to 2019 and continuing to keep people’s data backed up, safe, and ready for when it’s needed.

The post 2018 in the Rear View Mirror appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

LTO Versus Cloud Storage Costs — the Math Revealed

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/lto-versus-cloud-storage/

B2 Cloud Storage $68,813 vs. LTO 8 Tape $119,873

A few months back we did a blog post titled, LTO versus Cloud Storage: Choosing the Model That Fits Your Business. In that post we presented our version of an LTO vs. B2 Cloud Storage calculator, a useful tool to determine whether or not it makes economic sense to consider using cloud storage over your LTO storage.

Rather than just saying, “trust us, it’s cheaper,” we thought it would be a good idea to show you what’s inside the model: the assumptions we used, the variables we defined, and the actual math we used to compute our answers. In fact, we’re making the underlying model available for download.

Our Model: LTO vs Cloud Storage

The LTO vs. B2 calculator that is on our website was based on a Microsoft Excel spreadsheet we built. The Excel file we’ve provided for download below is completely self-contained; there are no macros and no external data sources.

Download Excel file: Backblaze-LTO-Calculator-Public-Nov2018.xlsx

The spreadsheet is divided into multiple sections. In the first section, you enter the four values the model needs to calculate the LTO and B2 cloud storage costs. The website implementation is obviously much prettier, but the variables and math are the same as the spreadsheet. Let’s look at the remaining sections.

Entered Values Section

The second section is for organization and documentation of the data that is entered. You also can see the limits we imposed on the data elements.

One question you may have is why we limited the Daily Incremental Backup value to 10 TB. As the comment notes, that’s about as much traffic as you can cram through a 1Gbps upload connection in a 24-hour period. If you have bigger (or smaller) pipes, adjust accordingly.

Don’t use the model for one-time archives. You may be tempted to enter zeros in both the Yearly Added Data and Daily Incremental Backup fields to compare the cost of a one-time archive. The model is not designed to compare the cost of a one-time archive. It will give you an answer, but the LTO costs will be overstated by anywhere from 10%-50%. The model was designed for the typical LTO use case where data is written to tape, typically daily, based on the data backup plan.

Variables Section

The third section stores all the variable values you can play with in the model. There is a short description for each variable, but let’s review some general concepts:

Tapes — We use LTO-8 tapes, which we assume will decrease in cost by about 20% per year down to $60. Non-compressed, these tapes store 12 TB each and take about 9.5 hours to fully load. We use 24 TB for each tape, assuming 2:1 compression. If some or all of your data is comprised of video or photos, then compression cannot be used, which makes the actual tape capacity much lower and increases the cost of the LTO solution.

Tapes Used — Based on the grandfather-father-son (GFS) model and assumes you replace tapes once a year.

Maintenance — Assumes you have no spare units, so you cannot miss more than one business day for backups. You could add a spare unit and remove the maintenance or just decide it is OK to miss a day or two while the unit is being repaired.

Off-site Storage — The cost of getting your tapes off-site (and back) assuming a once a week pick-up/drop-off.

Personnel — The cost of the person doing the LTO work, and how much time per week they spend doing the LTO related work, including data restoration. The cost of a person doing the cloud storage work is calculated from this value as described in the Time Savings paragraph below.

Data Restoration — How much of your data on average you will restore each month. The model is a bit limited here in that we use an average for all time periods when downloads are typically uneven across time. You are, of course, welcome to adjust the model. One thing to remember is that you’ll want to test your restore process from time to time, so make sure you allocate resources for that task.

Time Savings — We make the assumption that you will only spend 25% of the time working with cloud storage versus managing and maintaining an LTO system, i.e. no more buying, mounting, unmounting, labeling, cataloging, packaging, reading, or writing tapes.

Model Section

The last section is where the math gets done. Don’t change specific values in this section as they all originate in previous sections. If you decide to change a formula, remember to do so across all 10 years. It is quite possible that many of these steps can be combined into more complex formulas. We break them out to try to make an already complicated calculation somewhat easier to follow. Let’s look at the major subsections.

Data Storage — This section is principally used to organize the different data types and amounts. The model does not apply any corporate data retention policies such as deleting financial records after seven years. Data that is deleted is done so solely based on the GFS backup model, for example, deleting incremental data sets after 30 days.

LTO Costs — This starts with defining the amount of data to store, then calculates the quantity of tapes needed and their costs, along with the number of drive units and their annual unit cost and annual maintenance cost. The purchase price of a tape drive unit is divided evenly over a 10-year period.

Why 10 years? The LTO consortium states it will support reading LTO tapes from two versions back and expects to release a new version every two years. If you buy an LTO-8 system in 2018, then in 2024 an LTO-11 drive will not be able to read your LTO-8 tapes. You are now using obsolete hardware. We assume your LTO-8 hardware will continue to be supported through third party vendors for at least four years (to 2028) after it goes obsolete.

We finish up with calculating the cost of the off-site storage service and finally the personnel cost of managing the system and maintaining the tape library. Other models seem to forget this cost or just assume it is the same as your cloud storage personnel costs.

Cloud Storage Costs — We start with calculating the cost to store the data. This uses the amount of data at the end of the year, versus trying to compute monthly numbers throughout the year. This overstates the total amount a bit, but simplifies the math without materially changing the results. We then calculate the cost to download the data, again using the number at the end of the period. We calculate the incremental cost of enhancing the network to send and restore cloud data. This is an incremental cost, not the total cost. Finally, we add in the personnel cost to access and check on the cloud storage system as needed.
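As a rough illustration of the cloud-side math just described, here is a minimal Python sketch. It uses the B2 rates cited in this document ($0.005/GB per month for storage, $0.01/GB for downloads), takes the year-end data size for every month as the model does, and leaves out the network and personnel line items; the example amounts are placeholders, not figures from the spreadsheet.

# Simplified sketch of the cloud storage cost calculation described above.
# Rates are the B2 prices cited in this document; data amounts are placeholders.
STORAGE_RATE_PER_GB_MONTH = 0.005   # $ per GB per month
DOWNLOAD_RATE_PER_GB = 0.01         # $ per GB downloaded

def yearly_cloud_storage_cost(data_tb_at_year_end: float, restore_tb_per_month: float) -> float:
    """Uses the year-end data size for all 12 months, as the model does."""
    storage = data_tb_at_year_end * 1000 * STORAGE_RATE_PER_GB_MONTH * 12
    downloads = restore_tb_per_month * 1000 * DOWNLOAD_RATE_PER_GB * 12
    return storage + downloads

# Example: 100 TB stored at year end, restoring 1 TB per month
print(yearly_cloud_storage_cost(100, 1))  # 6120.0 -> $6,000 storage + $120 downloads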

Result Tables — These are the totals from the LTO and cloud storage section in one place.

B2 Fireball Section

There is a small section and some variables associated with the B2 Fireball data transfer service. This service is useful to transfer large amounts of data from your organization to Backblaze. There is a cost for this service of $550 per month to rent the Fireball, plus $75 for shipping. Organizations with existing LTO libraries often don’t want to use their network bandwidth to transfer their entire library, so they end up keeping some LTO systems just to read their archived tapes. The B2 Fireball can move the data in the library quickly and let you move completely away from LTO if desired.

Summary

While we think the model is pretty good there is always room for improvement. If you have any thoughts you’d like to share, let us know in the comments. One more thing: the model is free to update and use within your organization, but if you publicize it anywhere please cite Backblaze as the original source.

The post LTO Versus Cloud Storage Costs — the Math Revealed appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Buying a Hard Drive this Holiday Season? These Tips Will Help

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hard-drive-buying-guide/

Hard drives with bows
Over the last few years we’ve shared many observations in our quarterly Hard Drive Stats reports that go beyond the hard drive failure rates. We decided to consolidate some of these additional observations into one post just in time for the holiday buying season. If you have “buy a hard drive” on your shopping list this holiday season, here is just about everything we know about hard disk drives.

First, let’s establish that we are talking about hard disk drives (HDDs) here and not solid state drives (SSDs). Here’s a Backblaze “What’s the Diff” blog post where we discuss the differences between HDD and SSD drives.

How Will You Use Your HDD?

Hard drive manufacturers build drive models for different use cases; that is, a given drive model is optimized for a given purpose. For example, a consumer drive may spin slower to save energy and provides little if any access to tools that can adjust the firmware settings on the drive. An enterprise class drive, on the other hand, is typically much faster and provides the user with access to features they can tweak to adjust performance and/or power usage.

Each drive manufacturer has their own criteria for their use cases, but in general there are five categories: consumer, NAS (network attached storage), archiving/video recording, enterprise, and more recently, data center. The different drive manufacturers have different variations on these categories, so the first thing you should do is to know what you are going to do with the drive before you start looking.

Hard Drive Recording Technologies

For a long time, the recording technology a drive manufacturer used was not important. Then SMR (shingled magnetic recording) drives appeared a couple of years ago.

Let’s explain:

PMR: Perpendicular Magnetic Recording
This is the technology inside most hard drives. With PMR, data is written to and read from circular tracks on a spinning platter.
SMR: Shingled Magnetic Recording
This type of drive overlaps recording tracks to store data at a lower cost than PMR technology. The downside occurs when data is deleted and that space is reused. If existing data overlaps the space you want to reuse, this can mean delays in writing the new data. These drives are great for archive storage (write once, read many) use cases, but if your files turn over with some regularity, stick with PMR drives.

That sounds simple, but here are two things you should know:

  1. SMR drives are often the least expensive drives available when you consider the cost per gigabyte. If you are price sensitive, you may believe you are getting a great deal, but you may be buying the wrong drive for your use case. For example, buying SMR drives for your NAS device running RAID 6 would be ugly because of all the rewrites that may be involved.
  2. It is sometimes really hard to figure out if the drive you want to buy is an SMR or PMR drive. For example, based on the cost per gigabyte, the 8TB Seagate external drive (model: STEB8000100) is one of the least expensive external drives out there right now. But, the 8TB drive inside is an SMR drive, and that fact is not obvious to the buyer. To be fair, the manufacturers try to guide buyers to the right drive for their use case, but a lot of that guiding information is lost on reseller sites such as Amazon and Newegg, where the buyer is often blinded by price.

Over the next couple of years, HAMR (heat-assisted magnetic recording) by Seagate and MAMR (microwave-assisted magnetic recording) by Western Digital will be introduced, making the drive selection process even more complicated.

What About Refurbished Drives?

Refurbished drives are hard drives that have been returned to the manufacturer and repaired in some way to make them operational. Given the cost, repairs are often limited to what can be done in the software or firmware of the failed drive. For example, the repair may consist of identifying a section of bad media on a drive platter and telling the drive to read and write around it.

Once repaired, refurbished drives are tested and often marked certified by the manufacturer, e.g. “Certified Refurbished.” Refurbished drives are typically less expensive and come with a limited warranty, often one year or less. You can decide if you want to use these types of drives in your environment.

Helium-Filled versus Air-Filled Drives

Helium-filled drives are finally taking center stage after spending years as an experimental technology. Backblaze has in part used helium-filled drives since 2015, and over the years we’ve compared helium-filled drives to air-filled drives. Here’s what we know so far.

The first commercial helium-filled drives were 6TB; the transition to helium took hold at 8TB as we started seeing helium-filled 8TB drives from every manufacturer. Today, helium-filled 12TB and 14TB drives are available at a reasonable price per terabyte.

Helium drives have two advantages over their air-filled cohorts: they create less heat and they use less power. Both of these are important in data centers, but may be less important to you, especially when you consider the primary two disadvantages: a higher cost and lack of experience. The street-price premium for a helium-filled drive is roughly 20% right now versus an air-filled drive of the same size. That premium is expected to decrease as time goes on.

While price is important, the lack of field experience with helium-filled drives may be more interesting, as these drives have only been in the field in quantity for a little over four years. That said, we have had helium-filled drives in service for 3.5 years. They are solid performers with a 1.2% annualized failure rate and show no signs of hitting the wall.

Enterprise versus Consumer Drives

In our Q2 2018 Hard Drive Stats report we delved into this topic, so let’s just summarize some of the findings below.

We have both 8TB consumer and enterprise models to compare. Both models are from Seagate. The consumer drive is model ST8000DM002 and the enterprise drive is model ST8000NM0055. The chart below, from the Q2 2018 report, shows the failure rates for each of these drive models at the same average age of all of the drives of the specified model.

Annualized Hard Drive Failure Rates by Time table

When you constrain for the average age of each of the drive models, the AFR (annualized failure rate) of the enterprise drive is consistently below that of the consumer drive for these two drive models — albeit not by much. By the way, conducting the same analysis at an average age of 15 months showed little change, with the consumer drive recording a 1.10% AFR and the enterprise drive holding at 0.97% AFR.

Whether every enterprise model is better than every corresponding consumer model is unknown, but below are a few reasons you might choose one class of drive over another:

Enterprise Class Drives

  • Longer Warranty: 5 years vs. 2 years
  • More Accessible Features, e.g., Seagate PowerChoice technology
  • Faster reads and writes

Consumer Class Drives

  • Lower Price: Up to 50% less
  • Similar annualized failure rates as enterprise drives
  • Uses less power and produces less heat

Hard Drive Failure Rates

As many of you know, each quarter Backblaze publishes our Hard Drive Stats report for the hard drives in our data centers. Here’s the lifetime chart from our most recent Q3 2018 report.

Backblaze Lifetime Hard Drive Failure Rates table

Along with the report, we also publish the data we used to create the reports. We are not alone. Let’s look at the various ways you can find hard drive failure rates for the drive you wish to purchase.

Backblaze AFR (annualized failure rate)
The failure rate of a given hard drive model based on the number of days a drive model is in use and the number of failures of that drive model. Here’s the formula:

( ( Drive Failures / ( Drive Days / 365 ) ) * 100 )
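As a small, hedged illustration, here is that formula as a Python function; the failure and drive-day counts in the example are hypothetical, not taken from a specific Backblaze table.

def annualized_failure_rate(drive_failures: int, drive_days: float) -> float:
    """Backblaze AFR: failures per drive-year, expressed as a percentage."""
    drive_years = drive_days / 365
    return (drive_failures / drive_years) * 100

# Hypothetical example: 12 failures over 500,000 drive days
print(round(annualized_failure_rate(12, 500_000), 2))  # 0.88 (%)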
MTBF (mean time between failures)
MTBF is the term some disk drive manufacturers use to quantify disk drive average failure rates. It is the average number of service hours between failures. This is similar to MTTF (mean time to failure), which is the average time to the first failure. MTBF has been superseded by AFR for some drive vendors as described below.
AFR (Seagate and Western Digital)
These manufacturers have decided to replace MTBF with AFR. Their definition of AFR is the probable percent of failures per year, based on the manufacturer’s total number of installed units of similar type. While Seagate and WD don’t give the specific formula for calculating AFR, Seagate notes that AFR is similar to MTBF and differs only in units. One way of converting MTBF to AFR can be found here.
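The linked method is not reproduced here, but one commonly used approximation, assuming 24/7 operation (8,760 power-on hours per year), is sketched below; treat the formula as an assumption rather than either vendor’s exact definition.

import math

def mtbf_to_afr_percent(mtbf_hours: float, hours_per_year: float = 8760) -> float:
    """A common MTBF-to-AFR approximation (assumed, not vendor-specified)."""
    return (1 - math.exp(-hours_per_year / mtbf_hours)) * 100

# Example: a drive rated at 1.2 million hours MTBF
print(round(mtbf_to_afr_percent(1_200_000), 2))  # ~0.73 (%)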
Comparing Backblaze AFR to the Seagate/WD AFR
The Backblaze environment is a closed system, meaning we know with a high degree of certainty the variables we need to compute the Backblaze AFR percentage. We also know most, if not all, of the mitigating factors. The Seagate/WD AFR environment is made up of potentially millions of drives in the field (home, office, mobile, etc.) where the environmental variables can be quite varied and in some cases unknown. Either of the AFR calculations can be considered as part of your evaluation if you are comfortable with how they are calculated.
CDL (component design life)
This term is used by Western Digital in their support knowledge base, although we don’t see it in their technical specifications yet. The example provided in the knowledge base article is, “The Component Design Life of the drive is 5 years and the Annualized Failure Rate is less than 0.8%.” With those two numbers you can calculate that no more than four out of 100 drives will die in a five-year period. This is really good information, but it is not readily available yet.

Which Hard Drive Do I Need?

While hard drive failure rates are interesting, we believe that our Hard Drive Stats reports are just one of the factors to consider in your hard drive buying decision. Here are some things you should think about, in no particular order:

  • Your use case
    • What you will do with the drive.
  • What size drive do you need?
    • Using it as a Time Machine backup? It should be 3-4 times the size of your internal hard drive. Using it as an archive for your photo collection? Bigger is better.
  • How long do you want the drive to last?
    • Forever is not a valid answer. We suggest starting with the warranty period and subtracting a year if you move the drive around a lot or if you fill it up and stuff it in the closet.
  • The failure rate of the drive
    • We talked about that above.
  • What your friends think
    • You might get some good advice.
  • What the community thinks
    • reddit, Hacker News, Spiceworks, etc.
  • Product reviews
    • I read them, but only to see if there is anything else worth investigating via other sources.
  • Product review sites
    • These days, many review sites on the internet are pay-to-play, although not all. Pay-to-play means the vendor pays the site either for their review or if the review leads to a sale. Sometimes, whoever pays the most gets to the top of the list. This isn’t true for all sites, but often it is really hard to tell who the good guys are. One of our favorite sites, Tom’s Hardware, has stopped doing HDD reviews, so if you have a site you trust for such reviews, share it in the comments; we’d all like to know.
  • The drive manufacturer
    • Most drive manufacturer websites provide information that can help you determine the right drive for your use case. Of course, they are also trying to sell you a drive, but the information, especially the technical specs, can be useful.

What about price? We left that out of our list as many people start and end their evaluation with just price and we wanted to mention a few other things we thought could be important. Speaking of price…

What’s a Good Price for a Hard Drive?

Below is our best guess as to what you could pay over the next couple of months for different sized internal drives. Of course, there are bound to be some great discounts on Black Friday, Cyber Monday, Hanukkah, Christmas, Kwanzaa, Boxing Day, Winter Solstice, and Festivus — to name a few holiday season reasons for a sale on hard disk drives.

Drive Size | Price | Cost per GB
1TB        | $35   | $0.035
2TB        | $50   | $0.025
3TB        | $75   | $0.025
4TB        | $100  | $0.025
6TB        | $170  | $0.028
8TB        | $250  | $0.031
10TB       | $300  | $0.030
12TB       | $380  | $0.032
14TB       | $540  | $0.039
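The cost-per-gigabyte column is simply the price divided by the capacity; the short sketch below reproduces it, assuming decimal terabytes (1 TB = 1,000 GB) as drive makers do.

# Reproduce the cost-per-GB column above (1 TB treated as 1,000 GB).
prices = {1: 35, 2: 50, 3: 75, 4: 100, 6: 170, 8: 250, 10: 300, 12: 380, 14: 540}

for size_tb, price in prices.items():
    print(f"{size_tb}TB  ${price}  ${price / (size_tb * 1000):.3f}/GB")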

How Much Do External Hard Drives Cost?

We wanted to include the same information about external hard drives, but there is just too much unclear information to feel good about doing it. While researching this topic, we came across multiple complaints about a wide variety of external drive systems containing refurbished or used drives. In reviewing the advertisements and technical specs, the fact that the HDD inside an external drive sometimes is not new often gets left off the specifications. In addition, on Amazon and similar sites, many of the complaints were from purchases made via third party sellers and not the original external drive manufacturers, so check the “by” tag before buying.

Let’s make it easy: an external hard drive should have at least a two-year warranty and be available from a trusted source. The list price for the external drive should be about 10-15% higher than the same sized internal drive. What you will actually pay, the street price, is based on supply and demand and a host of other factors. Don’t be surprised if the cost of an external drive is sometimes less than a corresponding internal drive — that’s just supply and demand at work. Following this guidance doesn’t mean the drive won’t fail, it just means you’ll have better odds at getting a good external drive for your money.

One More Thing Before You Buy

The most important thing to consider when buying a hard drive is the value of the data on the drive and what it would cost to replace that data. If you have a good backup plan and practice the 3-2-1 backup strategy, then the value of a given drive is low and limited to the time and cost it takes to replace the drive that goes bad. That’s annoying, yes, but you still have your data. In other words, if you want to get the most for your money when buying a hard drive, have a good backup plan.

The post Buying a Hard Drive this Holiday Season? These Tips Will Help appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Migrating from CrashPlan: Arq and B2

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/migrating-crashplan-arq-backup-b2/

Arq and Backblaze B2 logos on a computer screen

Many ex-CrashPlan for Home users have moved to Backblaze over the last year. We gave them a reliable, set-and-forget backup experience for the amazing price of $5/month per computer. Yet some people wanted features such as network share backup and CrashPlan’s rollback policy, and Arq Backup can provide those capabilities. So we asked Stefan Reitshamer of Arq to tell us about his solution.

— Andy

Migrating from CrashPlan
by Stefan Reitshamer, Founder, Arq Backup

CrashPlan for Home is gone — no more backups to CrashPlan and no more ability to restore from your old backups. Time to find an alternative!

Arq + Backblaze B2 = CrashPlan Home

If you’re looking for many of the same features as CrashPlan plus affordable storage, Arq + B2 cloud storage is a great option. MacWorld’s review of Arq called it “more reliable and easier to use than CrashPlan.”

Just like CrashPlan for Home, Arq lets you choose your own encryption password. Everything is encrypted before it leaves your computer, with a password that only you know.

Also just like CrashPlan for Home, Arq keeps all backups forever by default. Optionally you can tell it to “thin” your backup records from hourly to daily to weekly as they age, similar to the way Time Machine does it. And/or you can set a budget and Arq will periodically delete the oldest backup records to keep your costs under control.

With Arq you can back up whatever you want — no limits. Back up your external hard drives, network shares, etc. Arq won’t delete backups of an external drive no matter how long it’s been since you’ve connected it to your computer.

The license for Arq is a one-time cost and, if you use multiple Macs and/or PCs, one license covers all of them. The pricing for B2 storage is a fraction of the cost of other large-scale cloud storage providers — just $0.005/GB per month, and the first 10GB is free. To put that in context, that’s 1/4th the price of Amazon S3. The savings become more pronounced if/when you need to restore your files. B2 charges a flat rate of $0.01/GB for data download, and you get 1 GB of downloads free every day. By contrast, Amazon S3 has tiered pricing that starts at 9 times that of B2.
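To make those rates concrete, here is a minimal sketch of a monthly B2 bill for a personal backup; the 500GB size and 100GB restore are made-up examples, and restores that fit within the 1 GB/day free allowance are simply treated as already excluded.

# Rough monthly B2 cost using the rates above. The 500 GB backup size and the
# 100 GB restore are made-up examples; billable_restore_gb means restored data
# beyond the 1 GB/day free download allowance.
STORAGE_RATE = 0.005    # $ per GB per month
DOWNLOAD_RATE = 0.01    # $ per GB
FREE_STORAGE_GB = 10    # the first 10 GB of storage is free

def monthly_cost(stored_gb: float, billable_restore_gb: float = 0) -> float:
    storage = max(stored_gb - FREE_STORAGE_GB, 0) * STORAGE_RATE
    return storage + billable_restore_gb * DOWNLOAD_RATE

print(monthly_cost(500))        # 2.45 -> about $2.45/month to store 500 GB
print(monthly_cost(500, 100))   # 3.45 -> plus a 100 GB restore beyond the free tier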

Arq’s Advanced Features

Arq is a mature product with plenty of advanced features:

  • You can tell Arq to pause backups whenever you’re on battery.
  • You can tell Arq to pause backups during a certain time window every day.
  • You can tell Arq to keep your computer awake until it finishes the backup.
  • You can restrict which Wi-Fi networks and which network interfaces Arq uses for backup.
  • You can restrict how much bandwidth Arq uses when backing up.
  • You can configure Arq to send you email every time it finishes backing up, or only if there were errors during backup.
  • You can configure Arq to run a script before and/or after backup.
  • You can configure Arq to back up to multiple B2 accounts if you wish. Back up different folders to different B2 accounts, configure different schedules for each B2 account, etc.

Arq is fully compatible with B2. You can configure it with your B2 account ID and master application key, or you can use B2’s new application keys feature to restrict which bucket(s) Arq can write to.

Privacy and Control

With Arq and B2 storage, you keep control of your data because it’s your B2 account and your encryption password — even if an attacker got access to the B2 data they wouldn’t be able to read your encrypted files. Your backups are stored in an open, documented format. There’s even an open-source restore tool.

The post Migrating from CrashPlan: Arq and B2 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Hard Drive Stats for Q3 2018: Less is More

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/2018-hard-drive-failure-rates/

Backblaze Drive Stats Q3 2018

As of September 30, 2018, Backblaze had 99,636 spinning hard drives. Of that number, there were 1,866 boot drives and 97,770 data drives. This review looks at the quarterly and lifetime statistics for the data drive models in operation in our data centers. In addition, we’ll say goodbye to the last of our 3TB drives, hello to our new 12TB HGST drives, and we’ll explain how we have 584 fewer drives than last quarter but have added over 40 petabytes of storage. Along the way, we’ll share observations and insights on the data presented and we look forward to you doing the same in the comments.

Hard Drive Reliability Statistics for Q3 2018

At the end of Q3 2018, Backblaze was monitoring 97,770 hard drives used to store data. For our evaluation, we remove from consideration those drives that were used for testing purposes and those drive models for which we did not have at least 45 drives (see why below). This leaves us with 97,600 hard drives. The table below covers what happened in Q3 2018.

Backblaze Q3 2018 Hard Drive Failure Rates chart

Notes and Observations

  • If a drive model has a failure rate of 0%, it only means there were no drive failures of that model during Q3 2018.
  • Quarterly failure rates can be volatile, especially for models that have a small number of drives and/or a small number of Drive Days.
  • There were 170 drives (97,770 minus 97,600) that were not included in the list above because we did not have at least 45 of a given drive model. We use 45 drives of the same model as the minimum number when we report quarterly, yearly, and lifetime drive statistics.

When to Replace a Hard Drive

As noted, at the end of Q3 we had 584 fewer drives, but over 40 petabytes more storage space. We replaced 3TB, 4TB, and even a handful of 6TB drives with 3,600 new 12TB drives using the very same data center infrastructure, i.e. racks of Storage Pods. The drives we are replacing are about 4 years old, plus or minus a few months depending on how much we paid for the drive and a number of other factors. Keeping lower density drives in service when higher density drives are both available and efficiently priced does not make economic sense.

Why Drive Migration Will Continue

Over the next several years, data growth is expected to explode. Hard drives are still expected to store the bulk of that data, meaning cloud storage companies like Backblaze will have to increase capacity either by increasing existing storage density and/or by building new data centers or building out existing ones. Drive manufacturers, like Seagate and Western Digital, are looking at HDD storage densities of 40TB as early as 2023, just 5 years away. It is significantly less expensive to replace lower density operational drives in a data center than to build a new facility, or even to build out an existing facility, to house the higher density drives.

Goodbye 3TB WD Drives

For the last couple of quarters, we had 180 Western Digital 3TB drives (model: WD30EFRX) remaining — the last of our 3TB drives. In early Q3, they were removed and replaced with 12TB drives. These 3TB drives were purchased in the aftermath of the Thailand drive crisis and installed in mid-2014 and were still hard at work when we replaced them. Sometime over the next couple of years we expect to say goodbye to all of our 4TB drives and upgrade them to 14, 16, or even 20TB drives. After that it will be time to “up-density” our 6TB systems, then our 8TB systems, and so on.

Hello 12TB HGST Drives

In Q3 we added 79 HGST 12TB drives (model: HUH721212ALN604) to the farm. While 79 may seem like an unusual number of drives to add, it represents “stage 2” of our drive testing process. Stage 1 uses 20 drives, the number of hard drives in one Backblaze Vault tome. That is, there are 20 Storage Pods in a Backblaze Vault, and there is one “test” drive in each Storage Pod. This allows us to compare the performance, etc., of the test tome to the remaining 59 production tomes (which are running already-qualified drives). There are 60 tomes in each Backblaze Vault. In stage 2, we fill an entire Storage Pod with the test drives, adding 59 test drives to the one currently being tested in one of the 20 Storage Pods in a Backblaze Vault.

To date, none of the 79 HGST drives have failed, but as of September 30th they had been installed for only 9 days. Let’s see how they perform over the next few months.
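The arithmetic behind the 79-drive figure follows directly from the vault layout described above; a quick sketch using only the numbers stated in this post:

# Sanity check of the 79-drive figure using the vault layout described above.
PODS_PER_VAULT = 20     # Storage Pods in a Backblaze Vault
TOMES_PER_VAULT = 60    # tomes per vault, i.e. 60 drives in each Storage Pod

stage1 = PODS_PER_VAULT            # one test drive per pod forms one test tome
stage2_added = TOMES_PER_VAULT - 1 # fill one pod: 59 drives join its existing test drive

print(stage1 + stage2_added)       # 79 test drives in total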

A New Drive Count Leader

For the last 4 years, the drive model we’ve deployed the most has been the 4TB Seagate drive, model ST4000DM000. In Q3 we had 24,208 of this drive model, which is now only good enough for second place. The 12TB Seagate drive, model ST12000NM0007, became our new drive count leader with 25,101 drives in Q3.

Lifetime Hard Drive Reliability Statistics

While the quarterly chart presented earlier gets a lot of interest, the real test of any drive model is over time. Below is the lifetime failure rate chart for all the hard drive models in operation as of September 30th, 2018. For each model, we compute their reliability starting from when they were first installed.

Backblaze Lifetime Hard Drive Failure Rates Chart

Notes and Observations

  • The failure rates of all of the larger drives (8, 10, and 12 TB) are very good: 1.21% AFR (Annualized Failure Rate) or less. In particular, the Seagate 10TB drives, which have been in operation for over 1 year now, are performing very nicely with a failure rate of 0.48%.
  • The overall failure rate of 1.71% is the lowest we have ever achieved, besting the previous low of 1.82% from Q2 of 2018.

The Hard Drive Stats Data

The complete data set used to create the information used in this review is available on our Hard Drive Test Data page. You can download and use this data for free for your own purpose. All we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone. It is free.

If you just want the summarized data used to create the tables and charts in this blog post you can download the ZIP file containing the MS Excel spreadsheet.

Good luck and let us know if you find anything interesting.

The post Hard Drive Stats for Q3 2018: Less is More appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Backblaze B2 API Version 2 Beta is Now Open

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/backblaze-b2-api-version-2-beta-is-now-open/

cloud storage workflow image

Since B2 cloud storage was introduced nearly 3 years ago, we’ve been adding enhancements and new functionality to the B2 API, including capabilities like CORS support and lifecycle rules. Today, we’d like to introduce the beta of version 2 of the B2 API, which formalizes rules on application keys, provides a consistent structure for all API calls returning information about files, and cleans up outdated request parameters and returned data. All version 1 B2 API calls will continue to work as is, so no changes are required to existing integrations and applications.

The API Versions section of the B2 documentation on the Backblaze website provides the details on how the V1 and V2 APIs differ, but in the meantime here’s an overview into the what, why, and how of the V2 API.

What Has Changed Between the B2 Cloud Storage Version 1 and Version 2 APIs?

The most obvious difference between a V1 and V2 API call is the version number in the URL. For example:

https://apiNNN.backblazeb2.com/b2api/v1/b2_create_bucket

https://apiNNN.backblazeb2.com/b2api/v2/b2_create_bucket

In addition, the V2 API call may have different required request parameters and/or required response data. For example, the V2 version of b2_hide_file always returns accountId and bucketId, while V1 returns accountId.

The documentation for each API call will show whether there are any differences between API versions for a given API call.

No Change is Required For V1 Applications

With the introduction of V2 of the B2 API there will be V1 and V2 versions for every B2 API call. All applications using V1 API calls will continue to work with no change in behavior. In some cases, a given V2 API call will be different from its companion V1 API call as noted in the B2 API documentation. For the remaining API calls a given V1 API call and its companion V2 call will be the same, have identical parameters, return the same data, and have the same errors. This provides a B2 developer the flexibility to choose how to upgrade to the V2 API.

Obviously, if you want to use the functionality associated with a V2 API version, then you must use the V2 API call and update your code accordingly.

One last thing: beginning today, if we create a new B2 API call it will be created in the current API version (V2) and most likely will not be created in V1.

Standardizing B2 File Related API Calls

As requested by many B2 developers, the V2 API now uses a consistent structure for all API calls returning information about files. To enable this, some V2 API calls return additional fields; the B2 API documentation lists the affected calls and fields.

Restricted Application Keys

In August we introduced the ability to create restricted application keys using the B2 API. This capability gives an account owner the ability to restrict who, how, and when the data in a given bucket can be accessed. This changed the functionality of multiple B2 API calls such that a user could create a restricted application key that could break a 3rd party integration to Backblaze B2. We subsequently updated the affected V1 API calls so they could continue to work with the existing 3rd party integrations.

The V2 API fully implements the expected behavior when it comes to working with restricted application keys. The V1 API calls continue to operate as before.

Here is an example of how the V1 API and the V2 API will act differently as it relates to restricted application keys.

Set-up

  • The B2 account owner has created 2 public buckets, “Backblaze_123” and “Backblaze_456”
  • The account owner creates a restricted application key that allows the user to read the files in the bucket named “Backblaze_456”
  • The account owner uses the restricted application key in an application that uses the b2_list_buckets API call

In Version 1 of the B2 API

  • Action: The account owner uses the restricted application key (for bucket Backblaze_456) to access/list all the buckets they own (2 public buckets).
  • Result: The results returned are just for Backblaze_456 as the restricted application key is just for that bucket. Data about other buckets is not returned.

While this result may seem appropriate, the data returned did not match the question asked, i.e. list all buckets. V2 of the API ensures the data returned is responsive to the question asked.

In Version 2 of the B2 API

  • Action: The account owner uses the restricted application key (for bucket Backblaze_456) to access/list all the buckets they own (2 public buckets).
  • Result: A “401 unauthorized” error is returned, as the request for access to “all” buckets does not match the restricted application key, which is limited to bucket Backblaze_456. To achieve the desired result, the account owner can specify the name of the bucket being requested in the API call so that it matches the restricted application key, as sketched in the example below.
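
To make this concrete, here is a minimal Python sketch (using the requests library) of calling b2_list_buckets against the V2 API with a restricted application key. The endpoint and field names follow the B2 documentation; the bucket name, token values, and error handling are illustrative assumptions rather than production code.

    import requests

    def list_buckets_v2(api_url, auth_token, account_id, bucket_name=None):
        """Call b2_list_buckets (V2). With a restricted application key,
        pass the bucket name the key is scoped to; otherwise B2 returns
        a 401 unauthorized error."""
        body = {"accountId": account_id}
        if bucket_name:
            body["bucketName"] = bucket_name  # limit the request to one bucket
        resp = requests.post(
            api_url + "/b2api/v2/b2_list_buckets",
            headers={"Authorization": auth_token},
            json=body,
        )
        if resp.status_code == 401:
            raise PermissionError("Restricted key: request the bucket it is scoped to")
        resp.raise_for_status()
        return resp.json()["buckets"]

    # With a key restricted to "Backblaze_456" (hypothetical values):
    #   list_buckets_v2(api_url, token, account_id)                   -> 401 error
    #   list_buckets_v2(api_url, token, account_id, "Backblaze_456")  -> that one bucket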

Cleaning up the API

There are a handful of API calls in V2 where we dropped response fields that were deprecated in V1 of the B2 API but were still being returned (a small compatibility sketch follows the list below). So in V2:

  • b2_authorize_account: The response no longer contains minimumPartSize. Use partSize and absoluteMinimumPartSize instead.
  • b2_list_file_names: The response no longer contains size. Use contentLength instead.
  • b2_list_file_versions: The response no longer contains size. Use contentLength instead.
  • b2_hide_file: The response no longer contains size. Use contentLength instead.
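
If you are migrating code incrementally, a small helper can smooth over the dropped fields. This is a minimal sketch that assumes you are working with the raw JSON responses; it simply prefers the V2 field name and falls back to the deprecated V1 name.

    def file_size(file_info):
        """Return a file's size from a B2 API file response.
        V2 calls return contentLength; older V1 responses carried
        the deprecated size field."""
        if "contentLength" in file_info:
            return file_info["contentLength"]
        return file_info.get("size")  # V1 fallback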

Support for Version 1 of the B2 API

As noted previously, V1 of the B2 API continues to function. There are no plans to stop supporting V1. If at some point in the future we do deprecate the V1 API, we will provide advance notice of at least one year before doing so.

The B2 Java SDK and the B2 Command Line Tool

Both the B2 Java SDK and the B2 Command Line Tool do not currently support Version 2 of the B2 API. They are being updated and will support the V2 API at the time the V2 API exits beta and goes GA. Both of these tools, and more, can be found in the Backblaze GitHub repository.

More About the Version 2 Beta Program

We introduced Version 2 of the B2 API as beta so that developers can provide us feedback before V2 goes into production. With every B2 integration being coded differently, we want to hear from as many developers as possible. Give the V2 API a try and if you have any comments you can email our B2 beta team at b2beta@backblaze.com or contact Backblaze B2 support. Thanks.

The post Backblaze B2 API Version 2 Beta is Now Open appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

LTO versus Cloud Storage: Choosing the Model That Fits Your Business

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/lto-vs-cloud-storage-vs-hybrid/

Choose Your Solution: Cloud Storage, LTO, Hybrid Cloud Storage/LTO

Years ago, when I did systems administration for a small company, we used RAID 1 for in-house data redundancy and an LTO tape setup for offsite data backup. Yes, the LTO cataloging and versioning were a pain, as was managing the tapes, and sometimes a tape would be unreadable, but the setup worked. And given there were few affordable alternatives out there at the time, you lived and died with your tapes.

Over the last few years, cloud storage has emerged as a viable alternative to using LTO for offsite backups. Improvements in network speed coupled with lower costs are a couple of the factors that have changed the calculus of cloud storage. To see if enough has changed to make cloud storage a viable competitor to LTO, we’ll start by comparing the current and ongoing cost of LTO versus cloud storage and then dig into assumptions underlying the cost model. We’ll finish up by reviewing the pluses and minuses of three potential outcomes: switching to cloud storage, staying with LTO, or using a hybrid LTO/cloud storage solution.

Comparing the Cost of LTO Versus Cloud Storage

Cost calculators for comparing LTO to Cloud Storage have a tendency to be very simple or very complex. The simple ones generally compare hardware and tape costs to cloud storage costs and neglect things like personnel costs, maintenance costs, and so on. In the complex models you might see references to the cost of capital, interest on leasing equipment, depreciation, and the tax implications of buying equipment versus paying for a monthly subscription service.

The Backblaze LTO vs Cloud calculator is somewhere in between. The underlying model takes into account many factors, which we’ll get into in a moment, but if you are a Fortune 500 company with a warehouse full of tape robots, this model is not for you.

Calculator: LTO vs B2

To use the Backblaze calculator you enter:

  1. the amount of Existing Data you have on LTO tape
  2. the amount of data you expect to add in a given year
  3. the amount of incremental data you backup each day

Then you can use the slider to compare your total cost from 1 to 10 years. You can run the model as many times as you like under different scenarios.
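
For readers who prefer to see the shape of such a cost model in code, here is a deliberately simplified Python sketch. Every rate and constant below (tape unit price, cost per TB, pickup fees, labor hours, cloud storage rate) is a placeholder assumption for illustration only; substitute your own figures, or use the Backblaze calculator for the fuller model described in the next section.

    def lto_cost(existing_tb, yearly_new_tb, years,
                 tape_unit_cost=20000.0, tape_cost_per_tb=8.0,
                 pickup_per_year=3120.0, admin_hours_per_week=4,
                 hourly_rate=40.0):
        """Very rough total LTO cost over a number of years (assumed inputs)."""
        total_tb = existing_tb + yearly_new_tb * years
        hardware = tape_unit_cost                    # one tape unit over the period
        tapes = total_tb * tape_cost_per_tb          # media cost
        offsite = pickup_per_year * years            # tape pickup service
        labor = admin_hours_per_week * 52 * years * hourly_rate
        return hardware + tapes + offsite + labor

    def cloud_cost(existing_tb, yearly_new_tb, years,
                   storage_per_tb_month=5.0, network_per_month=100.0,
                   admin_hours_per_week=1, hourly_rate=40.0):
        """Very rough total cloud storage cost over the same period (assumed inputs)."""
        total, stored_tb = 0.0, existing_tb
        for _ in range(years * 12):
            stored_tb += yearly_new_tb / 12.0
            total += stored_tb * storage_per_tb_month + network_per_month
        total += admin_hours_per_week * 52 * years * hourly_rate
        return total

    # Example: 100 TB existing, 25 TB added per year, 5 year horizon
    # print(lto_cost(100, 25, 5), cloud_cost(100, 25, 5))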

Assumptions Behind the Model

To see the assumptions that were made in creating the model, start on the LTO Replacement page and scroll down past the LTO vs. B2 calculator. Click on the following text which will display the “Cost and Operational Assumptions” page.

+ See details on Cost and Operational Assumptions

Let’s take a few minutes to review some of the most relevant points and how they affect the cost numbers reported:

  • LTO Backup Model: We used the Grandfather-Father-Son (GFS) model. There are several others, but this was the most prevalent. If you use the “Tower of Hanoi” model, for example, it uses fewer tapes and would lower the total LTO cost by some amount.
  • Data Compression: We assumed a 2-1 compression ratio for the data stored on the LTO tapes. If your data is principally video or photos, you will most likely not use compression. As such, film studios and post-production houses will need to double the cost of the total LTO solution to compensate for the increased number of tapes, the increased number of LTO tape units, and increased personnel costs.
  • Data Retention: We used a 30 day retention period as this is common in the GFS model. If you keep your incremental tapes/data for 2 weeks, then you would lower the number of tapes needed for incremental backups, but you would also lower the amount of incremental data you keep in the cloud storage system.
  • Tape Units: There are a wide variety of LTO tape systems. You can increase or decrease the total LTO cost based on the systems you are using. For example, suppose you are considering the purchase of an LTO tape system that reads/writes up to 5 tapes simultaneously. That system is more expensive and has higher maintenance costs, but it also means you would have to purchase fewer tape units.
  • LTO-8 Tape Units: We used LTO-8 tape units as they are the currently available LTO system most likely to be around in 10 years.
  • Tape Migration: We made no provision for migration from an unsupported LTO version to a supported LTO version. During the next 10 years, many users with older LTO systems will likely have to migrate to newer systems, as LTO only supports 2 generations back and is currently offering a new generation every 2 years.
  • Pickup Cost: The cost of having your tapes picked up so they are offsite. This cost can vary widely based on geography and service level. Our assumption of the cost is $60 per week or $3,120/year. You can adjust the LTO total cost according to your particular circumstances.
  • Network Cost: Using cloud storage requires that you have a reasonable amount of network bandwidth available. The number we used is incremental to your existing monthly cost for bandwidth. Network costs vary widely, so depending on your circumstances you can increase or decrease the total cost of the cloud storage solution.
  • Personnel Cost: This is the total cost of what you are paying someone to manage and operate your LTO system. This raises or lowers the cost of both the LTO and cloud storage solutions at the same rate, so adjusting this number doesn’t affect the comparison, just the total values for each.
  • Time Savings Versus LTO: With a cloud storage solution, there are no tapes or tape machines to deal with. This saves a significant amount of time for the person managing the backup process. Increasing this value will increase the cost of the cloud storage solution relative to the LTO solution.

As hinted at earlier, we don’t consider the cost of capital, depreciation, etc. in our calculations. The general model is that a company purchases a number of LTO systems and the cost is spread over a 10 year period. After 10 years a replacement unit is purchased. Other items such as tapes and equipment maintenance are purchased and expensed as needed.

Choosing a Data Backup Model

We noted earlier the three potential outcomes when evaluating LTO versus cloud storage for data backup: switching to cloud storage, staying with LTO, or using a hybrid LTO/cloud storage solution. Here’s a look at each.

Switching to Cloud Storage

After using the calculator you find cloud storage is less expensive for your business or organization versus LTO. You don’t have a large amount of existing data, 100 terabytes for example, and you’d rather get out of the tape business entirely.

Your first challenge is to move your existing data to the cloud — quickly. One solution is the Backblaze B2 Fireball data transfer service. You can move up to 70 TB of data each trip from your location to Backblaze in days. This saves your bandwidth and saves time as well.

As the existing data is being transferred to Backblaze, you’ll want to select a product or service to move your daily generated information to the cloud on a regular basis. Backblaze has a number of integration partners that perform data backup services to Backblaze B2.

Staying with LTO

After using the calculator you find cloud storage is less expensive, but you are one of those unlucky companies that can’t get reasonably priced bandwidth in their area. Or perhaps, the new LTO-8 equipment you ordered arrived minutes before you read this blog post. Regardless, you are destined to use LTO for at least a while longer. Tried and true, LTO does work and has the added benefit of making the person who manages the LTO setup nearly indispensable. Still, when you are ready, you can look at moving to the hybrid model described next.

Hybrid LTO/Cloud Storage model

In practice, many organizations that use LTO for backup and archive often store some data in the cloud as well, even if haphazardly. For our purposes, Hybrid LTO/Cloud Storage is defined as one of the following:

  1. Date Hybrid: All backups and archives from before the cutover date remain stored on LTO; everything from the cutover date forward is stored in cloud storage.
  2. Classic Hybrid: All of the incremental backups are stored in cloud storage and all full backups and archives are stored on LTO.
  3. Type Hybrid: All data of a given type, say employee data, is stored on LTO, while all customer data is stored in cloud storage. We see this hybrid use case occur as a function of convenience and occasionally compliance, although some regulatory requirements such as GDPR may not be accommodated by LTO solutions.

You can imagine there being other splits, but in essence, there may be situations where keeping the legacy system going in some capacity for some period of time is the prudent business option.

If you have a large tape library, it can be almost paralyzing to think about moving to the cloud, even if it is less expensive. Being open to the hybrid LTO/cloud model is a way to break the task down into manageable steps. For example, solutions like Starwind VTL and Archiware P5 allow you to start backing up to the cloud with minimal changes to your existing tape-based backup schemes.

Many companies that start down the hybrid road typically begin with moving their daily incremental files to the cloud. This immediately reduces the amount of “tape work” you have to do each day and it has the added benefit of making the files readily available should they need to be restored. Once a company is satisfied that their cloud based backups for their daily incremental files are under control, they can consider whether or not they need to move the rest of their data to the cloud.

Will Cloud Storage Replace LTO?

At some point, the LTO tapes you have will need to be migrated to something else as the equipment to read your old tapes will become outdated, then unsupported, and finally unavailable. Users with LTO 4 and, to some degree, LTO 5 are already feeling this pain. To migrate all of that data from your existing LTO system to LTO version “X,” cloud storage, or something else, will be a monumental task. It is probably a good idea to start planning for that now.

In summary, many people will find that they can now choose cloud storage over LTO as an affordable way to store their data going forward. But, having a hybrid environment of both LTO and cloud storage is not only possible, it is a practical way to reduce your overall backup cost while maximizing your existing LTO investment. The hybrid model creates an improved operational environment and provides a pathway forward should you decide to move exclusively to storing your data in the cloud at some point in the future.

The post LTO versus Cloud Storage: Choosing the Model That Fits Your Business appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

How to Leverage Your Amazon S3 Experience to Code the Backblaze B2 API

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/how-to-code-backblaze-b2-api-interface/

Going from S3 to learning Backblaze B2

We wrote recently about how the Backblaze B2 and Amazon S3 APIs are different. What we neglected to mention was how to bridge those differences so a developer can create a B2 interface if they’ve already coded one for S3. John Matze, Founder of BridgeSTOR, put together his list of things to consider when leveraging your S3 API experience to create a B2 interface. Thanks John.   — Andy
BackBlaze B2 to Amazon S3 Conversion
by John Matze, Founder of BridgeSTOR

BackBlaze B2 Cloud Storage Platform has developed into a real alternative to the Amazon S3 online storage platform with the same redundancy capabilities but at a fraction of the cost.

Sounds great — sign up today!

Wait. If you’re an application developer, it doesn’t come free. The Backblaze REST API is not compatible with Amazon S3 REST API. That is the bad news. The good news — it includes almost the entire set of functionality so converting from S3 to B2 can be done with minimal work once you understand the differences between the two platforms.

This article will help you shortcut the process by describing the differences between B2 and S3.

  1. Endpoints: AWS has a standard endpoint of s3.amazonaws.com which redirects to the region where the bucket is located, or you may send requests directly to the bucket by a region endpoint. B2 does not have regions, but does have an initial endpoint called api.backblazeb2.com. Every application must start by talking to this endpoint. B2 also requires two other endpoints: one for uploading an object and another for downloading an object. The upload endpoint is generated on demand when uploading an object, while the download endpoint is returned during the authentication process and may be saved for download requests.
  2. Host: Unlike Amazon S3, B2 requires the Host token in the HTTP header. If it is not present, the request will not succeed.
  3. JSON: Unlike S3, which uses XML, all B2 calls use JSON. Some API calls require data to be sent with the request. This data must be in JSON, and all APIs return JSON as a result. Fortunately, the amount of JSON required is minimal or none at all. We just built a JSON request when required and made a simple JSON parser for returned data.
  4. Authentication: Amazon currently has two major authentication mechanisms with complicated hashing formulas. B2 simply uses the industry standard “HTTP basic auth” algorithm. It takes only a few minutes to get up to speed on this algorithm.
  5. Keys: Amazon has the concept of an access key and a secret key. B2 has the equivalent, with the access key being your key id (your account id) and the secret key being the application id (returned from the website) that maps to the secret key.
  6. Bucket ID: Unlike S3, almost every B2 API requires a bucket ID. There is a special list bucket call that will display bucket IDs by bucket name. Once you find your bucket name, capture the bucket ID and save it for future API calls.
  7. Head Call: The bottom line — there is none. There is, however, a list_file_names call that can be used to build your own HEAD call. Parse the JSON returned values and create your own HEAD call.
  8. Directory Listings: B2 directories again have the same functionality as S3, but with a different API format. Again the mapping is easy: marker is startFileName, prefix is prefix, max-keys is maxFileCount, and delimiter is delimiter. The big difference is how B2 handles markers. The Amazon S3 nextmarker is literally the next marker to be searched; the B2 nextmarker is the last file name that was searched. This means the next listing will also include the last marker name again, so your routines must parse out that name or your listing will show the last marker twice. That’s a difference, but not a difficult one.
  9. Uploading an object: Uploading an object in B2 is quite different than S3. S3 just requires you to send the object to an endpoint and they will automatically place the object somewhere in their environment. In the B2 world, you must request a location for the object with an API call and then send the object to the returned location (a rough Python sketch of this flow appears after this list). The first API will send you a temporary key and you can continue to use this key for one hour without generating another, with the caveat that you have to monitor for failures from B2. The B2 environment may become full or some other issue may require you to request another key.
  10. Downloading an Object: Downloading an object in B2 is really easy. There is a download endpoint that is returned during the authentication process and you pass your request to that endpoint. The object is downloaded just like Amazon S3.
  11. Multipart Upload: Finally, multipart upload. The beast in S3 is just as much of a beast in B2. Again the good news is there is a one-to-one mapping.
    a. Multipart Init: The equivalent initialization returns a fileid. This ID will be used for future calls.
    b. Multipart Upload: Similar to uploading an object, you will need to get the API location to place the part. So use the fileid from “a” above and call B2 for the endpoint to place the part. Another difference is the upload also requires the payload to be hashed with a SHA1 algorithm. Once done, simply pass the SHA and the part number to the URL and the part is uploaded. This SHA1 component is equivalent to an etag in the S3 world, so save it for later.
    c. Multipart Complete: Like S3, you will have to build a return structure for each part. B2 of course requires this structure to be in JSON, but like S3, B2 requires the part number and the SHA1 (etag) for each part.
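
To tie several of the points above together (HTTP basic auth authorization, the on-demand upload endpoint, and the SHA1 requirement), here is a minimal Python sketch of uploading a single object to B2 using the requests library. The call names and headers follow the B2 documentation (shown here with the V2 endpoints); the key ID, application key, and bucket ID are placeholders, and real code would URL-encode file names and retry with a fresh upload URL on failure.

    import hashlib
    import requests
    from requests.auth import HTTPBasicAuth

    def b2_upload(key_id, application_key, bucket_id, file_name, data):
        # 1. Authorize: HTTP basic auth returns an auth token plus the API and download URLs
        auth = requests.get(
            "https://api.backblazeb2.com/b2api/v2/b2_authorize_account",
            auth=HTTPBasicAuth(key_id, application_key),
        ).json()

        # 2. Ask B2 where to upload (unlike S3, the upload endpoint is requested on demand)
        upload = requests.post(
            auth["apiUrl"] + "/b2api/v2/b2_get_upload_url",
            headers={"Authorization": auth["authorizationToken"]},
            json={"bucketId": bucket_id},
        ).json()

        # 3. Upload the object, including the required SHA1 of the payload (data is bytes)
        resp = requests.post(
            upload["uploadUrl"],
            headers={
                "Authorization": upload["authorizationToken"],
                "X-Bz-File-Name": file_name,  # should be URL-encoded for special characters
                "Content-Type": "b2/x-auto",
                "X-Bz-Content-Sha1": hashlib.sha1(data).hexdigest(),
            },
            data=data,
        )
        resp.raise_for_status()  # on failure, request a new upload URL and retry
        return resp.json()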

What Doesn’t Port

We found almost everything we required easily mapped from S3 to B2 except for a few issues. To be fair, BackBlaze is working on the following in future versions.

  1. Copy Object doesn’t exist: This could cause some issues with applications for copying or renaming objects. BridgeSTOR has a workaround for this situation so it wasn’t a big deal for our application.
  2. Directory Objects don’t exist: Unlike Amazon, where an object whose name ends with a “/” is considered a directory, this does not port to B2. There is an undocumented object name that B2 applications use called .bzEmpty. Numerous 3rd party applications, including BridgeSTOR, treat an object ending with .bzEmpty as a directory name. This is also important for the directory listings described above. If you choose to use this method, you will be required to replace the “.bzEmpty” with a “/” (a small hypothetical helper for this mapping is sketched below).
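
As an illustration of that second point, a helper for translating between S3-style directory keys and the .bzEmpty convention might look like the sketch below. The convention itself is undocumented and tool-dependent, so treat this strictly as an assumption-laden example.

    def s3_dir_key_to_b2(key):
        """Map an S3-style directory object name ('photos/2018/') to the
        .bzEmpty convention many B2 tools use ('photos/2018/.bzEmpty')."""
        return key + ".bzEmpty" if key.endswith("/") else key

    def b2_key_to_s3_dir(key):
        """Reverse mapping: a '.bzEmpty' object becomes a 'name/' directory key."""
        suffix = ".bzEmpty"
        return key[:-len(suffix)] if key.endswith("/" + suffix) else key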

In conclusion, you can see the B2 API is different than the Amazon S3 API, but as far as functionality goes they are basically the same. At first it looked like it was going to be a large task for us, but once we took the time to understand the differences, porting to B2 was not a major job for our application. We created an S3 to B2 shim in a week, followed by a few extra weeks of testing and bug fixes. I hope this document helps in your S3 to B2 conversion.

— John Matze, BridgeSTOR

The post How to Leverage Your Amazon S3 Experience to Code the Backblaze B2 API appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Five Tips For Creating a Predictable Cloud Storage Budget

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/calculate-cost-cloud-storage/

Cloud Storage $$$, Transfer Rates $, Download Fees $$, Cute Piggy Bank $$$

Predicting your cloud storage cost should be easy. After all, there are only three cost dimensions: 1) storage (the rental for your slice of the cloud), 2) download (the fee to bring your data out of the cloud), and 3) transactions (charges for “stuff” you might do to your data inside the cloud). Yet, you probably know someone (you?) that was more than surprised when their cloud storage bill arrived. They have good company, as according to ZDNet, 37% of IT executives found their cloud storage costs to be unpredictable.

Here are five tips you can use when doing your due diligence on the cloud storage vendors you are considering. The goal is to create a cloud storage forecast that you can rely on each and every month.

Tip # 1 — Don’t Miscalculate Progressive (or is it Regressive?) Pricing Tiers

The words “Next” or “Over” on a pricing table are never a good thing.

Standard Storage Pricing Example

  • First 50 TB / Month $0.026 per GB
  • Next 450 TB / Month $0.025 per GB
  • Over 500 TB / Month $0.024 per GB

Those words mean there are tiers in the pricing table which, in this case, means you have to reach a specific level to get better pricing. You don’t get a retroactive discount — only the data above the minimum threshold enjoys the lower price.

The mistake sometimes made is calculating your entire storage cost based on the level for that amount of storage. For example, if you had 600 TB of storage, you could wrongly multiply as follows:

(600,000 x 0.024) = $14,400/month

When, in fact, you should do the following:

(50,000 x 0.026) + (450,000 x 0.025) + (100,000 x 0.024) = $14,950/month

That was just for storage. Make sure you consider the tiered pricing tables for data retrieval as well.
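
Here is a small Python sketch of how to compute a tiered cost correctly. The tier boundaries and prices are the illustrative ones from the example table above, not any particular vendor's current rates.

    # (tier size in GB, price per GB); None means "everything above this point"
    TIERS = [(50_000, 0.026), (450_000, 0.025), (None, 0.024)]

    def tiered_storage_cost(total_gb, tiers=TIERS):
        """Charge each slice of data at its own tier's rate."""
        cost, remaining = 0.0, total_gb
        for size, price in tiers:
            chunk = remaining if size is None else min(remaining, size)
            cost += chunk * price
            remaining -= chunk
            if remaining <= 0:
                break
        return cost

    print(tiered_storage_cost(600_000))  # 14950.0, not 600,000 x 0.024 = 14,400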

Tip # 2 — Don’t Choose the Wrong Service Level

Many cloud storage providers offer multiple levels of service. The idea is that you can trade service capabilities for cost. If you don’t need immediate access to your files or don’t want data replication or eleven 9s of durability, there is a choice for you. Besides giving away functionality, there’s a bigger problem. You have to know what you are going to do with your data to pick the right service because mistakes can get very expensive. For example:

  • You choose a low cost service tier that normally takes hours or days to restore your data. What can go wrong? You need some files back immediately and you end up paying 10-20 times the cost to expedite your restore.
  • You choose one level of service and decide you want to upload some data to a compute-based application or to another region — features not part of your current service. The good news? You can usually move the data. The bad news? You are charged a transfer fee to move the data within the same vendor’s infrastructure because you didn’t choose the right service tier when you started. These fees often eradicate any “savings” you had gotten from the lower priced tier.

Basically, if your needs change as they pertain to the data you have stored, you will pay more than you expect to get all that straightened out.

Tip # 3 — Don’t Pay for Deleted Files

Some cloud storage companies have a minimum amount of time you are charged for storage for each file uploaded. Typically this minimum period is between 30 and 90 days. You are charged even if you delete the file before the minimum period. For example (assuming a 90 day minimum period), if you upload a file today and delete the file tomorrow, you still have to pay for storing that deleted file for the next 88 days.

This “feature” often extends to files deleted due to versioning. Let’s say you want to keep three versions of each file, with older versions automatically deleted. If the now deleted versions were originally uploaded fewer than 90 days ago, you are charged for storing them for 90 days.

Using a typical backup scenario, let’s say you are using a cloud storage service to store your files and your backup program is set to a 30 day retention period. That means you will be perpetually paying for an additional 60 days’ worth of storage (for files that were pruned at 30 days). In other words, you would be paying for a 90 day retention period even though you only keep 30 days’ worth of files.
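
A quick way to sanity-check this effect is to model it. The sketch below is a simplification that assumes a steady daily churn of pruned backup data and a 90 day minimum charge; the specific numbers in the example are hypothetical.

    def effective_stored_tb(active_tb, daily_pruned_tb, minimum_days=90, retention_days=30):
        """Storage you are billed for when deleted files accrue charges up to a
        minimum period: files pruned at retention_days keep billing for
        (minimum_days - retention_days) more days."""
        phantom_days = max(minimum_days - retention_days, 0)
        return active_tb + daily_pruned_tb * phantom_days

    # e.g. 50 TB of active backups with 0.5 TB pruned per day:
    print(effective_stored_tb(50, 0.5))  # 80.0 TB billed, not 50 TB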

Tip # 4 — Don’t Pay For Nothing

Some cloud storage vendors charge a minimum amount each month regardless of how little you have stored. For example, even if you only have 100 GB stored you get to pay like you have 1 TB (the minimum). This is the moral equivalent of a bank charging you a monthly fee if you don’t meet the minimum deposit amount.

Continuing on the theme of paying for nothing, be on the lookout for services that charge a minimum amount per each file stored regardless of how small the file is, including zero bytes. For example, some storage services have a minimum file size of 128K. Any files smaller than that are counted as being 128K for storage purposes. While the additional cost for even a couple of million zero-length files is trivial, you’re still being charged something for nothing.

Tip # 5 — Be Suspicious of the Fine Print

Misdirection is the art of getting you to focus on one thing so you don’t focus on other things going on. Practiced by magicians and some cloud storage companies, the idea is to get you to focus on certain features and capabilities without delving below the surface into the fine print.

Read the fine print. As you stroll through the multi-page pricing tables and the linked pages of rules that shape how you can use a given cloud storage service, stop and ask, “what are they trying to hide?” If you find phrases like: “we reserve the right to limit your egress traffic,” or “new users get a free usage tier for 12 months,” or “provisioned requests should be used when you need a guarantee that your retrieval capacity will be available when you need it,” take heed.

How to Build a Predictable Cloud Storage Budget

As we noted previously, cloud storage costs are composed of three dimensions: storage, download and transactions. These are the cost drivers for cloud storage providers, and as such are the most straightforward way for service providers to pass on the cost of the service to its customers.

Let’s start with data storage as it is the easiest for a company to calculate. For a given month data storage cost is equal to:

Current data + new data – deleted data

Take that total and multiply it by the monthly storage rate and you’ll get your monthly storage cost.

Computing download and transaction costs can be harder as these are variables you may have never calculated before, especially if you previously were using in-house or LTO-based storage. To help you out, below is a chart showing the breakdown of the revenue from Backblaze B2 Cloud Storage over the past 6 months.

% of Spend w/ B2

As you can see, download (2%) and transaction (3%) costs are, on average, minimal compared to storage costs. Unless you have reason to believe you are different, using these figures is a good proxy for your costs.

Let’s Give it a Try

Let’s start with 100 TB of original storage then add 10 TB each month and delete 5 TB each month. That’s 105 TB of storage for the first month. Backblaze has built a cloud storage calculator that computes costs for all of the major cloud storage providers. Using this calculator, we find that Amazon S3 would cost $2,205.50 to store this data for a month, while Backblaze B2 would charge just $525.10.

Using those numbers for storage and assuming that storage will be 95% of your total bill (as noted in the chart above), you get a total monthly cost of $2,321.05 for Amazon S3 and Backblaze B2 will be $552.74 a month.

The chart below provides the breakdown of the expected cost.

             | Backblaze B2 | Amazon S3
Storage      | $525.10      | $2,205.50
Download     | $11.06       | $42.22
Transactions | $16.58       | $69.33
Totals       | $552.74      | $2,321.05

Of course each month you will add and delete storage, so you’ll have to account for that in your forecast. Using the cloud storage calculator noted above, you can get a good sense of your total cost over the budget forecasting period.
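
A simple forecast loop makes this concrete. The per-GB rate below and the assumption that storage makes up roughly 95% of the bill are illustrative (the latter taken from the breakdown above); substitute your own vendor's rates, tiers, and usage pattern.

    def forecast(months, start_tb, add_tb_per_month, delete_tb_per_month,
                 rate_per_gb_month=0.005, storage_share=0.95):
        """Rough monthly cloud bill: storage cost grossed up so that storage is
        storage_share of the total (download and transactions make up the rest)."""
        stored_tb, bills = start_tb, []
        for _ in range(months):
            stored_tb += add_tb_per_month - delete_tb_per_month
            storage_cost = stored_tb * 1000 * rate_per_gb_month
            bills.append(round(storage_cost / storage_share, 2))
        return bills

    # 100 TB to start, +10 TB and -5 TB each month, 12 month budget:
    print(forecast(12, 100, 10, 5))  # first month is about 552.63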

Finally, you can use the Backblaze B2 storage calculator to address potential use cases that are outside of your normal operation. For example, you delete a large project from your storage or you need to download a large amount of data. Running the calculator for these types of actions lets you obtain a solid estimate for their effect on your budget before they happen and lets you plan accordingly.

Creating a predictable cloud storage forecast is key to taking full advantage of all of the value in cloud storage. Organizations like Austin City Limits, Fellowship Church, and Panna Cooking were able to move to the cloud because they could reliably predict their cloud storage cost with Backblaze B2. You don’t have to let pricing tiers, hidden costs and fine print stop you. Backblaze makes predicting your cloud storage costs easy.

The post Five Tips For Creating a Predictable Cloud Storage Budget appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Hard Drive Stats for Q2 2018

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hard-drive-stats-for-q2-2018/

Backblaze Drive Stats Q2 2018

As of June 30, 2018 we had 100,254 spinning hard drives in Backblaze’s data centers. Of that number, there were 1,989 boot drives and 98,265 data drives. This review looks at the quarterly and lifetime statistics for the data drive models in operation in our data centers. We’ll also take another look at comparing enterprise and consumer drives, get a first look at our 14 TB Toshiba drives, and introduce you to two new SMART stats. Along the way, we’ll share observations and insights on the data presented and we look forward to you doing the same in the comments.

Hard Drive Reliability Statistics for Q2 2018

Of the 98,265 hard drives we were monitoring at the end of Q2 2018, we removed from consideration those drives used for testing purposes and those drive models for which we did not have at least 45 drives. This leaves us with 98,184 hard drives. The table below covers just Q2 2018.

Backblaze Q2 2018 Hard Drive Failure Rates

Notes and Observations

If a drive model has a failure rate of 0%, it just means that there were no drive failures of that model during Q2 2018.

The Annualized Failure Rate (AFR) for Q2 is just 1.08%, well below the Q1 2018 AFR and is our lowest quarterly AFR yet. That said, quarterly failure rates can be volatile, especially for models that have a small number of drives and/or a small number of Drive Days.

There were 81 drives (98,265 minus 98,184) that were not included in the list above because we did not have at least 45 of a given drive model. We use 45 drives of the same model as the minimum number when we report quarterly, yearly, and lifetime drive statistics. The use of 45 drives is historical in nature as that was the number of drives in our original Storage Pods.

Hard Drive Migrations Continue

The Q2 2018 Quarterly chart above was based on 98,184 hard drives. That was only 138 more hard drives than Q1 2018, which was based on 98,046 drives. Yet, we added nearly 40 PB of cloud storage during Q2. If we tried to store 40 PB on the 138 additional drives we added in Q2, then each new hard drive would have to store nearly 300 TB of data. While 300 TB hard drives would be awesome, the less awesome reality is that we replaced over 4,600 4 TB drives with nearly 4,800 12 TB drives.

The age of the 4 TB drives being replaced was between 3.5 and 4 years. In all cases their failure rates were 3% AFR (Annualized Failure Rate) or less, so why remove them? Simple: drive density — in this case three times the storage in the same cabinet space. Today, four years of service is about the time when it makes financial sense to replace existing drives versus building out a new facility with new racks, etc. While there are several factors that go into the decision to migrate to higher density drives, keeping hard drives beyond that tipping point means we would be underutilizing valuable data center real estate.

Toshiba 14 TB drives and SMART Stats 23 and 24

In Q2 we added twenty 14 TB Toshiba hard drives (model: MG07ACA14TA) to our mix (not enough to be listed on our charts), but that will change as we have ordered an additional 1,200 drives to be deployed in Q3. These are 9-platter, helium-filled drives which use CMR/PMR (not SMR) recording technology.

In addition to being new drives for us, the Toshiba 14 TB drives also add two new SMART stat pairs: SMART 23 (Helium condition lower) and SMART 24 (Helium condition upper). Both attributes report normalized and raw values, with the raw values currently being 0 and the normalized values being 100. As we learn more about these values, we’ll let you know. In the meantime, those of you who utilize our hard drive test data will need to update your data schema and upload scripts to read in the new attributes.
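
If you load the drive stats files with something like pandas, the main thing is to avoid hard-coding the column list. The sketch below assumes the published files follow the smart_<n>_raw / smart_<n>_normalized naming pattern; check the schema of the files you download.

    import pandas as pd

    def load_drive_stats(csv_path):
        """Load a daily drive stats file and report which SMART columns it contains,
        so newly added attributes (e.g. smart_23_* and smart_24_*) do not break the load."""
        df = pd.read_csv(csv_path)
        smart_cols = sorted(c for c in df.columns if c.startswith("smart_"))
        return df, smart_cols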

By the way, none of the 20 Toshiba 14 TB drives have failed after 3 weeks in service, but it is way too early to draw any conclusions.

Lifetime Hard Drive Reliability Statistics

While the quarterly chart presented earlier gets a lot of interest, the real test of any drive model is over time. Below is the lifetime failure rate chart for all the hard drive models in operation as of June 30th, 2018. For each model, we compute its reliability starting from when it was first installed.

Backblaze Lifetime Hard Drive Failure Rates

Notes and Observations

The combined AFR for all of the larger drives (8-, 10- and 12 TB) is only 1.02%. Many of these drives were deployed in the last year, so there is some volatility in the data, but we would expect this overall rate to decrease slightly over the next couple of years.

The overall failure rate for all hard drives in service is 1.80%. This is the lowest we have ever achieved, besting the previous low of 1.84% from Q1 2018.

Enterprise versus Consumer Hard Drives

In our Q3 2017 hard drive stats review, we compared two Seagate 8 TB hard drive models: one a consumer class drive (model: ST8000DM002) and the other an enterprise class drive (model: ST8000NM0055). Let’s compare the lifetime annualized failure rates from Q3 2017 and Q2 2018:

Lifetime AFR as of Q3 2017

    – 8 TB consumer drives: 1.1% annualized failure rate
    – 8 TB enterprise drives: 1.2% annualized failure rate

Lifetime AFR as of Q2 2018

    – 8 TB consumer drives: 1.03% annualized failure rate
    – 8 TB enterprise drives: 0.97% annualized failure rate

Hmmm, it looks like the enterprise drives are “winning.” But before we declare victory, let’s dig into a few details.

  1. Let’s start with drive days, the total number of days all the hard drives of a given model have been operational.
    – 8 TB consumer (model: ST8000DM002): 6,395,117 drive days
    – 8 TB enterprise (model: ST8000NM0055): 5,279,564 drive days
    Both models have a sufficient number of drive days and are reasonably close in their total number. No change to our conclusion so far.
  2. Next we’ll look at the confidence intervals for each model to see the range of possibilities within two deviations (a rough way to compute such a range is sketched after this list):
    – 8 TB consumer (model: ST8000DM002): Range 0.9% to 1.2%
    – 8 TB enterprise (model: ST8000NM0055): Range 0.8% to 1.1%
    The ranges are close, but multiple outcomes are possible. For example, the consumer drive could be as low as 0.9% and the enterprise drive could be as high as 1.1%. This doesn’t help or hurt our conclusion.
  3. Finally we’ll look at drive age — actually average drive age to be precise. This is the average time in operational service, in months, of all the drives of a given model. We will start with the point in time when each drive model reached approximately the current number of drives. That way the addition of new drives (not replacements) will have a minimal effect.
    Annualized Hard Drive Failure Rates by Time (chart)
    When you constrain for drive count and average age, the AFR (annualized failure rate) of the enterprise drive is consistently below that of the consumer drive for these two drive models — albeit not by much.
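
For readers who want to reproduce these numbers from the raw data, here is a minimal Python sketch of the annualized failure rate and a rough two-deviation range. The interval uses a simple normal approximation to a Poisson failure count, which is close to, but not necessarily identical to, the method behind the ranges quoted above; the failure count in the example is hypothetical.

    from math import sqrt

    def afr(drive_days, failures):
        """Annualized failure rate in percent: failures per drive-year of service."""
        drive_years = drive_days / 365.0
        return 100.0 * failures / drive_years

    def afr_range(drive_days, failures, z=2.0):
        """Approximate AFR range within z standard deviations, treating the
        failure count as Poisson distributed (normal approximation)."""
        drive_years = drive_days / 365.0
        low = max(failures - z * sqrt(failures), 0.0)
        high = failures + z * sqrt(failures)
        return 100.0 * low / drive_years, 100.0 * high / drive_years

    # Hypothetical example: 6,395,117 drive days and 180 failures
    # print(afr(6_395_117, 180), afr_range(6_395_117, 180))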

Whether every enterprise model is better than every corresponding consumer model is unknown, but below are a few reasons you might choose one class of drive over another:

Enterprise                                 | Consumer
Longer Warranty: 5 vs. 2 years             | Lower price: up to 50% less
More features, i.e. PowerChoice technology | Similar annualized failure rate as enterprise drives
Faster reads and writes                    | Uses less power

Backblaze is known to be “thrifty” when purchasing drives. When you purchase 100 drives at a time or are faced with a drive crisis, it makes sense to purchase consumer drives. When you start purchasing 100 petabytes’ worth of hard drives at a time, the price gap between enterprise and consumer drives shrinks to the point where the other factors come into play.

Hard Drives By the Numbers

Since April 2013, Backblaze has recorded and saved daily hard drive statistics from the drives in our data centers. Each entry consists of the date, manufacturer, model, serial number, status (operational or failed), and all of the SMART attributes reported by that drive. Currently there are over 100 million entries. The complete data set used to create the information presented in this review is available on our Hard Drive Test Data page. You can download and use this data for free for your own purpose. All we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone. It is free.

If you just want the summarized data used to create the tables and charts in this blog post, you can download the ZIP file containing the MS Excel spreadsheet.

Good luck and let us know if you find anything interesting in the comments below or by contacting us directly.

The post Hard Drive Stats for Q2 2018 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Panna Cooking Creates the Perfect Storage Recipe with Backblaze and 45 Drives

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/panna-cooking-creates-perfect-storage-recipe/

Panna Cooking custard dessert with strawberries

Panna Cooking is the smart home cook’s go-to resource for learning to cook, video recipes, and shopping lists, all delivered by 40+ of the world’s best chefs. Video is the primary method Panna uses to communicate with their customers. Joshua Stenseth is a full-time editor and part-time IT consultant in charge of wrangling the video content Panna creates.

Like many organizations, digital media archive storage wasn’t top of mind over the years at Panna Cooking. Over time, more and more media projects were archived to various external hard drives and dutifully stored in the archive closet. The external drive archive solution was inexpensive and was easy to administer for Joshua, until…

Joshua stared at the request from the chef. She wanted to update a recipe video from a year ago to include in her next weekly video. The edits being requested were the easy part as the new footage was ready to go. The trouble was locating the old digital video files. Over time, Panna had built up a digital video archive that resided on over 100 external hard drives scattered around the office. The digital files that Joshua needed were on one of those drives.

Panna Cooking, like many growing organizations, learned that the easy-to-do tasks, like using external hard drives for data archiving, don’t scale very well. This is especially true for media content such as video and photographs. It is easy to get overwhelmed.

Panna Cooking dessert

Joshua was given the task of ensuring all the content created by Panna was economically stored, readily available, and secured off site. The Panna Cooking case study details how Joshua was able to consolidate their existing scattered archive by employing the Hybrid Cloud Storage package from 45 Drives, a flexible, highly affordable, media archiving solution consisting of a 45 Drives Storinator storage server, Rclone, Duplicity, and B2 Cloud Storage.

The 45 Drives’ innovative Hybrid Cloud Storage package and their partnership with Backblaze B2 Cloud Storage was the perfect solution. The Hybrid Cloud Storage package installs on the Storinator system and utilizes Rclone or Duplicity to back up or sync files to the B2 cloud. This gave Panna a fully operational local storage system that sends changes automatically to the B2 cloud. For Joshua and his fellow editors, the Storinator/Backblaze B2 solution gave them the best of both worlds: high performance local storage and readily accessible, affordable off-site cloud storage, all while eliminating their old archive, the closet full of external hard drives.

The post Panna Cooking Creates the Perfect Storage Recipe with Backblaze and 45 Drives appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Computer Backup Awareness in 2018: Getting Better and Getting Worse

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/computer-backup-awareness-in-2018/

Backup Frequency - 10 Years of History

Back in June 2008, Backblaze launched our first Backup Awareness Survey. Beginning with that survey and each year since, we’ve asked the folks at The Harris Poll to conduct our annual survey. For the last 11 years now, they’ve asked the simple question, “How often do you backup all the data on your computer?” Let’s see what they’ve found.

First, a Little History

While we did the first survey in 2008, it wasn’t until 2009, after the second survey was conducted, that we declared June as Backup Awareness Month, making June 2018 the 10th anniversary of Backup Awareness Month. But, why June? You’re probably thinking that June is a good time to remind people about backing up their computers. It’s before summer vacations in the northern hemisphere and the onset of winter down under. In truth, back in 2008 Backblaze was barely a year old and the survey, while interesting, got pushed aside as we launched the first beta of our cloud backup product on June 4, 2008. When June 2009 rolled around, we had a little more time and two years worth of data. Thus, Backup Awareness Month was born (PS — the contest is over).

More People Are Backing Up, But…

Fast forward to June 2018, and the folks at The Harris Poll have diligently delivered another survey. You can see the details about the survey methodology at the end of this post. Here’s a high level look at the results over the last 11 years.
Computer Backup Frequency

The percentage of people backing up all the data on their computer has steadily increased over the years, from 62% in 2008 to 76% in 2018. That’s awesome, but at the other end of the time spectrum it’s not so pretty. The percentage of people backing up once a day or more is 5.5% in 2018. That’s the lowest percentage ever reported for daily backup. Wouldn’t it be nice if there were a program you could install on your computer that would back up all the data automatically?

Here’s how 2018 compares to 2008 for how often people back up all the data on their computers.

Computer Data Backup Frequency in 2008
Computer Data Backup Frequency in 2018

A lot has happened over the last 11 years in the world of computing, but at least people are taking backing up their computers a little more seriously. And that’s a good thing.

A Few Data Backup Facts

Each survey provides interesting insights into the attributes of backup fiends and backup slackers. Here are a few facts from the 2018 survey.

Men

  • 21% of American males have never backed up all the data on their computers.
  • 11% of American males, 18-34 years old, have never backed up all the data on their computers.
  • 33% of American males, 65 years and older, have never backed up all the data on their computers.

Women

  • 26% of American females have never backed up all the data on their computers.
  • 22% of American females, 18-34 years old, have never backed up all the data on their computers.
  • 36% of American females, 65 years and older, have never backed up all the data on their computers.

When we look at the four regions in the United States, we see that in 2018 the percentage of people who have backed up all the data on their computer at least once was about the same across regions. This was not the case back in 2012 as seen below:

Year | Northeast | South | Midwest | West
2012 | 67%       | 73%   | 65%     | 77%
2018 | 75%       | 78%   | 75%     | 76%


Looking Back

Here are links to our previous blog posts on our annual Backup Awareness Survey:

Survey Method:

The surveys cited in this post were conducted online within the United States by The Harris Poll on behalf of Backblaze as follows: June 5-7, 2018 among 2,035 U.S. adults, among whom 1,871 own a computer. May 19-23, 2017 among 2048 U.S. adults, May 13-17, 2016 among 2,012 U.S. adults, May 15-19, 2015 among 2,090 U.S. adults, June 2-4, 2014 among 2,037 U.S. adults, June 13–17, 2013 among 2,021 U.S. adults, May 31–June 4, 2012 among 2,209 U.S. adults, June 28–30, 2011 among 2,257 U.S. adults, June 3–7, 2010 among 2,071 U.S. adults, May 13–14, 2009 among 2,185 U.S. adults, and May 27–29, 2008 among 2,761 U.S. adults. In all surveys, respondents consisted of U.S. adult computer users (aged 18+). These online surveys were not based on a probability sample and therefore no estimate of theoretical sampling error can be calculated. For complete survey methodology, including weighting variables and subgroup sample sizes, please contact Backblaze.

The 2018 Survey: Please note sample composition changed in the 2018 wave as new sample sources were introduced to ensure representativeness among all facets of the general population.

The post Computer Backup Awareness in 2018: Getting Better and Getting Worse appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.