All posts by Andy Klein

Backblaze Hard Drive Stats Q1 2019

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/backblaze-hard-drive-stats-q1-2019/

Backblaze Drive Stats Q1 2019

As of March 31, 2019, Backblaze had 106,238 spinning hard drives in our cloud storage ecosystem spread across three data centers. Of that number, there were 1,913 boot drives and 104,325 data drives. This review looks at the Q1 2019 and lifetime hard drive failure rates of the data drive models currently in operation in our data centers and provides a handful of insights and observations along the way. In addition, we have a few questions for you to ponder near the end of the post. As always, we look forward to your comments.

Hard Drive Failure Stats for Q1 2019

At the end of Q1 2019, Backblaze was using 104,325 hard drives to store data. For our evaluation we remove from consideration those drives that were used for testing purposes and those drive models for which we did not have at least 45 drives (see why below). This leaves us with 104,130 hard drives. The table below covers what happened in Q1 2019.

Q1 2019 Hard Drive Failure Rates table

Notes and Observations

If a drive model has a failure rate of 0%, it means there were no drive failures of that model during Q1 2019. The two drive models listed with zero failures in Q1 were the 4 TB and 5 TB Toshiba models. Neither has a large enough number of drive days to be statistically significant, but in the case of the 5 TB model, you have to go back to Q2 2016 to find the last drive failure we had of that model.

There were 195 drives (104,325 minus 104,130) that were not included in the list above because they were used as testing drives or we did not have at least 45 of a given drive model. We use 45 drives of the same model as the minimum number when we report quarterly, yearly, and lifetime drive statistics. The use of 45 drives is historical in nature as that was the number of drives in our original Storage Pods. Beginning next quarter that threshold will change; we’ll get to that shortly.

The Annualized Failure Rate (AFR) for Q1 is 1.56%. That's as high as the quarterly rate has been since Q4 2017 and it's part of an overall upward trend we've seen in the quarterly failure rates over the last few quarters. Let's take a closer look.

Quarterly Trends

We noted in previous reports that the quarterly results are useful for spotting trends about a particular drive model or even a manufacturer. Still, you need enough data (drive count and drive days) in each observed period (quarter) to make any analysis valid. To that end, the chart below uses quarterly data from Seagate and HGST drives while leaving out Toshiba and WDC drives, as we don't have enough drives from those manufacturers over the course of the last three years.

Trends of the Quarterly Hard Drive Annualized Failure Rates by Manufacturer

Over the last three years, the annualized failure rates for both Seagate and HGST have trended down. While Seagate has reduced its failure rate by over 50% during that time, the upward trend over the last three quarters warrants some attention. We'll take a look at this and let you know if we find anything interesting in a future post.

Changing the Qualification Threshold

As reported over the last several quarters, we've been migrating from lower density drives (2, 3, and 4 TB) to larger 10, 12, and 14 TB hard drives. At the same time, we have been replacing our stand-alone 45-drive Storage Pods with 60-drive Storage Pods arranged into the Backblaze Vault configuration of 20 Storage Pods per vault. In Q1, the last stand-alone 45-drive Storage Pod was retired. Therefore, using 45 drives as the threshold for qualification for our quarterly report seems antiquated. This is a good time to switch to using Drive Days as the qualification criterion. In reviewing our data, we have decided to use 5,000 Drive Days as the threshold going forward. The exception: any drive model we currently report on, such as the Toshiba 5 TB model with about 4,000 drive days each quarter, will continue to be included in our Hard Drive Stats reports.
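
To see how the old 45-drive minimum lines up with the new 5,000 drive day threshold (and why the 45-drive Toshiba 5 TB model lands near 4,000 drive days each quarter), here is a quick back-of-the-envelope sketch; the quarter length is an approximation:

# Rough drive day math behind the new qualification threshold.
DAYS_IN_QUARTER = 91  # approximate length of a quarter

def drive_days(drive_count, days=DAYS_IN_QUARTER):
    """Total drive days accumulated by a fleet of identical drives in one quarter."""
    return drive_count * days

print(drive_days(45))                  # 4095 -- a 45-drive model falls just short of 5,000
print(round(5000 / DAYS_IN_QUARTER))   # ~55 drives needed to clear the new threshold in a quarter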

Fewer Drives = More Data

Those of you who follow our quarterly reports might have observed that the total number of hard drives in service decreased in Q1 by 648 drives compared to Q4 2018, yet we added nearly 60 petabytes of storage. You can see what changed in the chart below.

Backblaze Cloud Storage: Drive Counts vs. Disk Space in Q1 2019 table

Lifetime Hard Drive Stats

The table below shows the lifetime failure rates for the hard drive models we had in service as of March 31, 2019. This is over the period beginning in April 2013 and ending March 31, 2019.

Backblaze Lifetime Hard Drive Failure Rates table

Predictions for the Rest of 2019

As 2019 unfolds, here are a few guesses as to what might happen over the course of the year. Let’s see what you think.

By the end of 2019, which, if any, of the following things will happen? Let us know in the comments.

  • Backblaze will continue to migrate out 4 TB drives and will have fewer than 15,000 by the end of 2019: we currently have about 35,000.
  • We will have installed at least twenty 20 TB drives for testing purposes.
  • Backblaze will go over 1 exabyte (1,000 petabytes) of available cloud storage. We are currently at about 850 petabytes of available storage.
  • We will have installed, for testing purposes, at least 1 HAMR based drive from Seagate and/or 1 MAMR drive from Western Digital.

The Hard Drive Stats Data

The complete data set used to create the information used in this review is available on our Hard Drive Test Data web page. You can download and use this data for free for your own purpose. All we ask are three things: 1) You cite Backblaze as the source if you use the data, 2) You accept that you are solely responsible for how you use the data, and, 3) You do not sell this data to anyone — it is free.

If you just want the summarized data used to create the tables and charts in this blog post you can download the ZIP file containing the MS Excel spreadsheet.

Good luck and let us know if you find anything interesting.

The post Backblaze Hard Drive Stats Q1 2019 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

An Inside Look at the Backblaze Storage Pod Museum

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/backblaze-storage-pod-museum/

image of the back of a Backblaze Storage Pod

Merriam-Webster defines a museum as “an institution devoted to the procurement, care, study, and display of objects of lasting interest or value.” With that definition in mind, we’d like to introduce the Backblaze Storage Pod Museum. While some folks think of a museum as a place of static, outdated artifacts, others realize that those artifacts can tell a story over time of experimentation, evolution, and innovation. That is certainly the case with our Storage Pods. Modesty prevents us from saying that we changed the storage industry with our Storage Pod design, so let's just say we added a lot of red to the picture.

Over the years, Larry, our data center manager, has stashed away the various versions of our Storage Pods as they were removed from service. He also kept drives, SATA cards, power supplies, cables, and more. Thank goodness. With the equipment that Larry’s pack-rat tendencies saved, and a couple of current Storage Pods we borrowed (shhhh, don’t tell Larry), we were able to start the Backblaze Storage Pod Museum. Let’s take a quick photo trip through the years.

Before Storage Pod 1.0

Before we announced Storage Pod 1.0 to the world nearly 10 years ago, we had already built twenty or so Storage Pods. These early pods used Western Digital 1.0 TB Green drives. There were multiple prototypes, but once we went into production, we had settled on the 45-drive design with 3 rows of 15 vertically mounted drives. We ordered the first batch of ten chassis to be built and then discovered we did not spec a hole for the on/off switch. We improvised.

Storage Pod 1.0 — Petabytes on a Budget

We introduced the storage world to inexpensive cloud storage with Storage Pod 1.0. Funny thing, we didn’t refer to this innovation as version 1.0 — just a Backblaze Storage Pod. We not only introduced the Storage Pod, we also open-sourced the design, publishing the design specs, parts list, and more. People took notice. We introduced the design with Seagate 1.5 TB drives for a total of 67 TB of storage. This version also had an Intel Desktop motherboard (DG43NB) and 4 GB of memory.

Storage Pod 2.0 — More Petabytes on a Budget

Storage Pod 2.0 was basically twice the system that 1.0 was. It had twice the memory, twice the speed, and twice the storage, but it was in the same chassis with the same number of drives. All of this combined to reduce the cost per GB of the Storage Pod system by over 50%: from $0.117/GB in version 1 to $0.055/GB in version 2.
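
A quick check on the "over 50%" figure using the numbers above:

# Cost per GB drop from Storage Pod 1.0 to 2.0.
v1_cost, v2_cost = 0.117, 0.055              # $/GB, from the figures above
reduction = (v1_cost - v2_cost) / v1_cost    # fraction of the original cost eliminated
print(f"{reduction:.0%}")                    # ~53%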

Among the changes: the desktop motherboard in V1 was upgraded to a server-class motherboard, we simplified things by using three four-port SATA cards, and we reduced the cost of the chassis itself. In addition, we used Hitachi (HGST) 3 TB drives in Storage Pod 2.0 to double the total amount of storage to 135 TB. Over their lifetime, these HGST drives had an annualized failure rate of 0.82%, with the last of them being replaced in Q2 2017.

Storage Pod 3.0 — Good Vibrations

Storage Pod 3.0 brought the first significant chassis redesign in our efforts to make the design easier to service and provide the opportunity to use a wider variety of components. The most noticeable change was the introduction of drive lids — one for each row of 15 drives. Each lid was held in place by a pair of steel rods. The drive lids held the drives below in place and replaced the drive bands used previously. The motherboard and CPU were upgraded and we went with memory that was Supermicro certified. In addition, we added standoffs to the chassis to allow for Micro ATX motherboards to be used if desired, and we added holes where needed to allow for someone to use one or two 2.5” drives as boot drives — we use one 3.5” drive.

Storage Pod 4.0 — Direct Wire

Up through Storage Pod 3.0, Protocase helped design and then build our Storage Pods. During that time, they also designed and produced a direct wire version, which replaced the nine backplanes with direct wiring to the SATA cards. Storage Pod 4.0 was based on the direct wire technology. We deployed a small number of these systems but we fought driver problems between our software and the new SATA cards. In the end, we went back to our backplanes and Protocase continued forward with direct wire systems that they continued to deploy successfully. Conclusion: there are multiple ways you can be successful with the Storage Pod design.

Storage Pod 4.5 — Backplanes are Back

This version started with the Storage Pod 3.0 design and introduced new 5-port backplanes and upgraded to SATA III cards. Both of these parts were built on Marvell chipsets. The backplanes we previously used were being phased out, which prompted us to examine other alternatives like the direct wire pods. Now we had a ready supply of 5-port backplanes and Storage Pod 4.5 was ready to go.

We also began using Evolve Manufacturing to build these systems. They were located near Backblaze and were able to scale to meet our ever increasing production needs. In addition, they were full of great ideas on how to improve the Storage Pod design.

Storage Pod 5.0 — Evolution from the Chassis on Up

While Storage Pod 3.0 was the first chassis redesign, Storage Pod 5.0 was, to date, the most substantial. Working with Evolve Manufacturing, we examined everything down to the rivets and stand-offs, looking for a better, more cost-efficient design. Driving many of the design decisions was the introduction of Backblaze B2 Cloud Storage, which was designed to run on our Backblaze Vault architecture. From a performance point of view, we upgraded the motherboard and CPU, increased memory fourfold, upgraded the networking to 10 Gb on the motherboard, and moved from SATA II to SATA III. We also completely redid the drive enclosures, replacing the 15-drive clampdown lids with nine five-drive compartments with drive guides.

Storage Pod 6.0 — 60 Drives

Storage Pod 6.0 increased the amount of storage from 45 to 60 drives. We had a lot of questions when this idea was first proposed, like would we need: bigger power supplies (answer: no), more memory (no), a bigger CPU (no), or more fans (no). We did need to redesign our SATA cable routes from the SATA cards to the backplanes as we needed to stay under the one meter spec length for the SATA cables. We also needed to update our power cable harness, and, of course, add length to the chassis to accommodate the 15 additional drives, but nothing unexpected cropped up — it just worked.

What’s Next?

We’ll continue to increase the density of our storage systems. For example, we unveiled a Backblaze Vault full of 14 TB drives in our 2018 Drive Stats report. Each Storage Pod in that vault contains 840 terabytes worth of hard drives, meaning the 20 Storage Pods that make up the Backblaze Vault bring 16.8 petabytes of storage online when the vault is activated. As higher density drives and new technologies like HAMR and MAMR are brought to market, you can be sure we’ll be testing them for inclusion in our environment.
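
Here's the arithmetic behind those capacity figures, assuming 60 drives per Storage Pod:

# Capacity of a Backblaze Vault built from 60-drive pods of 14 TB drives.
drives_per_pod = 60
drive_size_tb = 14
pods_per_vault = 20

pod_capacity_tb = drives_per_pod * drive_size_tb             # 840 TB per Storage Pod
vault_capacity_pb = pod_capacity_tb * pods_per_vault / 1000  # 16.8 PB per vault
print(pod_capacity_tb, vault_capacity_pb)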

Nearly 10 years after the first Storage Pod altered the storage landscape, the innovation continues to deliver great returns to the market. Many other companies, from 45Drives to Dell and HP, have leveraged the Storage Pod’s concepts to make affordable, high-density storage systems. We think that’s awesome.

The post An Inside Look at the Backblaze Storage Pod Museum appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Backblaze Hard Drive Stats for 2018

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hard-drive-stats-for-2018/

Backblaze Hard Drive Stats for 2018

We published our first “Hard Drive Stats” report just over 5 years ago on January 21, 2014. We titled that report “What Hard Drive Should I Buy.” In hindsight, that might have been a bit of an overreach, but we were publishing data that was basically non-existent otherwise.

Many people like our reports, some don’t, and some really don’t — and that’s fine. From the beginning, the idea was to share our experience and use our data to shine a light on the otherwise opaque world of hard disk drives. We hope you have enjoyed reading our reports and we look forward to publishing them for as long as people find them useful.

Thank you.

As of December 31, 2018, we had 106,919 spinning hard drives. Of that number, there were 1,965 boot drives and 104,954 data drives. This review looks at the hard drive failure rates for the data drive models in operation in our data centers. In addition, we’ll take a look at the new hard drive models we’ve added in 2018 including our 12 TB HGST and 14 TB Toshiba drives. Along the way we’ll share observations and insights on the data presented and we look forward to you doing the same in the comments.

2018 Hard Drive Failure Rates: What 100,000+ Hard Drives Tell Us

At the end of 2018 Backblaze was monitoring 104,954 hard drives used to store data. For our evaluation we remove from consideration those drives that were used for testing purposes and those drive models for which we did not have at least 45 drives (see why below). This leaves us with 104,778 hard drives. The table below covers what happened just in 2018.

2018 annualized hard drive failure rates

Notes and Observations

If a drive model has a failure rate of 0%, it means there were no drive failures of that model during 2018.

For 2018, the Annualized Failure Rate (AFR) stated is usually pretty solid. The exception is when a given drive model has a small number of drives (fewer than 500) and/or a small number of drive days (fewer than 50,000). In these cases, the AFR can be too wobbly to be used reliably for buying or retirement decisions.
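
To see how wobbly the AFR can get with too few drive days, here is a quick worked example using the AFR formula (failures divided by drive years, times 100); the fleet sizes are illustrative:

# AFR = drive failures / (drive days / 365) * 100
quarter_days = 90
small_fleet_days = 45 * quarter_days       # ~4,050 drive days
large_fleet_days = 5000 * quarter_days     # 450,000 drive days

print(1 / (small_fleet_days / 365) * 100)  # ~9.0% -- one failure in a small fleet looks alarming
print(1 / (large_fleet_days / 365) * 100)  # ~0.08% -- the same single failure barely registers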

There were 176 drives (104,954 minus 104,778) that were not included in the list above. These drives were either used for testing or we did not have at least 45 drives of a given model. We use 45 drives of the same model as the minimum number when we report quarterly, yearly, and lifetime drive statistics. This is a historical number based on the number of drives needed to fill one Backblaze Storage Pod (version 5 or earlier).

The Annualized Failure Rate (AFR) for 2018 for all drive models was just 1.25%, well below the rates from previous years as we’ll discuss later on in this review.

What’s New in 2018

In 2018 the big trend was hard drive migration: replacing lower density 2, 3, and 4 TB drives, with 8, 10, 12, and in Q4, 14 TB drives. In 2018 we migrated 13,720 hard drives and we added another 13,389 hard drives as we increased our total storage from about 500 petabytes to over 750 petabytes. So in 2018, our data center techs migrated or added 75 drives a day on average, every day of the year.
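
The daily average works out like this:

# Average daily drive handling in 2018: migrations plus additions.
migrated, added = 13_720, 13_389
print((migrated + added) / 365)   # ~74.3, i.e. roughly 75 drives per day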

Here’s a quick review of what’s new in 2018.

  • There are no more 4 TB Western Digital drives; the last of them was replaced in Q4. This leaves us with only 383 Western Digital drives remaining — all 6 TB drives. That’s 0.37% of our drive farm. We do have plenty of drives from HGST (owned by WDC), but over the years we’ve never been able to get the quantity of Western Digital drives we need at a reasonable price.
  • Speaking of HGST drives, in Q4 we added 1,200 HGST 12 TB drives (model: HUH721212ALN604). We had previously tested these drives in Q3 with no failures, so we have filled a Backblaze Vault with 1,200 drives. After about one month we’ve only had one failure, so they are off to a good start.
  • The HGST drives have a ways to go as in Q4 we also added 6,045 Seagate 12 TB drives (model: ST12000NM0007) to bring us to 31,146 of this drive model. That’s 29.7% of our drive farm.
  • Finally in Q4, we added 1,200 Toshiba 14 TB drives (model: MG07ACA14TA). These are helium-filled PMR (perpendicular magnetic recording) drives. The initial annualized failure rate (AFR) is just over 3%, which is similar to the other new models and we would expect the AFR to drop over time as the drives settle in.

Comparing Hard Drive Failure Rates Over Time

When we compare Hard Drive stats for 2018 to previous years two things jump out. First, the migration to larger drives, and second, the improvement in the overall annual failure rate each year. The chart below compares each of the last three years. The data for each year is inclusive of that year only.

Annualized Hard Drive Failure Rates by Year

Notes and Observations

  • In 2016 the average size of hard drives in use was 4.5 TB. By 2018 the average size had grown to 7.7 TB.
  • The 2018 annualized failure rate of 1.25% was the lowest by far of any year we’ve recorded.
  • None of the 45 Toshiba 5 TB drives (model: MD04ABA500V) has failed since Q2 2016. While the drive count is small, that’s still a pretty good run.
  • The Seagate 10 TB drives (model: ST10000NM0086) continue to impress as their AFR for 2018 was just 0.33%. That’s based on 1,220 drives and nearly 500,000 drive days, making the AFR pretty solid.

Lifetime Hard Drive Stats

While comparing the annual failure rates of hard drives over multiple years is a great way to spot trends, we also look at the lifetime annualized failure rates of our hard drives. The chart below is the annualized failure rates of all of the drives currently in production.

Annualized Hard Drive Failure Rates for Active Drives

Hard Drive Stats Webinar

We’ll be presenting the webinar “Backblaze Hard Drive Stats for 2018” on Thursday, January 24, 2019 at 10:00 Pacific time. The webinar will dig deeper into the quarterly, yearly, and lifetime hard drive stats and include the annual and lifetime stats by drive size and manufacturer. You will need to subscribe to the Backblaze BrightTALK channel to view the webinar. Sign up today.

The Hard Drive Stats Data

The complete data set used to create the information used in this review is available on our Hard Drive Test Data page. You can download and use this data for free for your own purpose. All we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone; it is free.

If you just want the summarized data used to create the tables and charts in this blog post you can download the ZIP file containing the CSV file.

Good luck and let us know if you find anything interesting.

The post Backblaze Hard Drive Stats for 2018 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

How We Optimized Storage and Performance of Apache Cassandra at Backblaze

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/wide-partitions-in-apache-cassandra-3-11/

Guest post by Mick Semb Wever

Backblaze uses Apache Cassandra, a high-performance, scalable distributed database, to help manage hundreds of petabytes of data. We engaged the folks at The Last Pickle to use their extensive experience to optimize the capabilities and performance of our Cassandra 3.11 cluster, and now they want to share their experience with a wider audience to explain what they found. We agree; enjoy!

— Andy

Wide Partitions in Apache Cassandra 3.11

by Mick Semb Wever, Consultant, The Last Pickle

Wide partitions in Cassandra can put tremendous pressure on the Java heap and garbage collector, impact read latencies, and can cause issues ranging from load shedding and dropped messages to crashed and downed nodes.

While the theoretical limit on the number of cells per partition has always been two billion cells, the reality has been quite different, as the impacts of heap pressure show. To mitigate these problems, the community has offered a standard recommendation for Cassandra users to keep partitions under 400MB, and preferably under 100MB.

However, in version 3 many improvements were made that affected how Cassandra handles wide partitions. Memtables, caches, and SSTable components were moved off-heap, the storage engine was rewritten in CASSANDRA-8099, and Robert Stupp made a number of other improvements listed under CASSANDRA-11206.

While working with Backblaze and operating a Cassandra version 3.11 cluster, we had the opportunity to test and validate how Cassandra actually handles partitions with this latest version. We will demonstrate that well designed data models can go beyond the existing 400MB recommendation without nodes crashing through heap pressure.

Below, we walk through how Cassandra writes partitions to disk in 3.11, look at how wide partitions impact read latencies, and then present our testing and verification of wide partition impacts on the cluster using the work we did with Backblaze.

The Art and Science of Writing Wide Partitions to Disk

First we need to understand what a partition is and how Cassandra writes partitions to disk in version 3.11.

Each SSTable contains a set of files, and the (-Data.db) file contains numerous partitions.

The layout of a partition in the -Data.db file has three components: a header, followed by zero or one static rows, followed by zero or more ordered Clusterable objects. A Clusterable object in this file may be either a row or a RangeTombstone that deletes data, with each wide partition containing many Clusterable objects. For an excellent in-depth examination of this, see Aaron’s blog post Cassandra 3.x Storage Engine.

The -Index.db file stores offsets for the partitions, as well as the serialized IndexInfo objects for each partition. These indices facilitate locating the data on disk within the -Data.db file. Stored partition offsets are represented by a subclass of RowIndexEntry. This subclass is chosen by the ColumnIndex and depends on the size of the partition:

  • RowIndexEntry is used when there are no Clusterable objects in the partition, such as when there is only a static row. In this case there are no IndexInfo objects to store and so the parent RowIndexEntry class is used rather than a subclass.
  • The IndexedEntry subclass holds the IndexInfo objects in memory until the partition has finished writing to disk. It is used in partitions where the total serialized size of the IndexInfo objects is less than the column_index_cache_size_in_kb configuration setting (which defaults to 2KB).
  • The ShallowIndexedEntry subclass serializes IndexInfo objects to disk as they are created and references these objects using only their position in the file. It is used in partitions where the total serialized size of the IndexInfo objects is more than the column_index_cache_size_in_kb configuration setting.

These IndexInfo objects provide a sampling of positional offsets for rows within a partition, creating an index. Each object specifies the offset the page starts at, the first row and the last row.

So, in general, the bigger the partition, the more IndexInfo objects need to be created when writing to disk — and if they are held in memory until the partition is fully written to disk they can cause memory pressure. This is why the column_index_cache_size_in_kb setting was added in Cassandra 3.6 and the objects are now serialized as they are created.
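
To pull the selection rules above together, here is a minimal Python sketch that mirrors the description (Cassandra's actual implementation is Java inside RowIndexEntry); the counts and sizes in the example calls are illustrative:

# Simplified sketch of how the RowIndexEntry subclass is chosen, per the description above.
COLUMN_INDEX_CACHE_SIZE_KB = 2   # cassandra.yaml: column_index_cache_size_in_kb (default 2KB)

def choose_index_entry(num_clusterables, index_info_serialized_bytes):
    if num_clusterables == 0:
        # Only a static row: no IndexInfo objects to store.
        return "RowIndexEntry"
    if index_info_serialized_bytes < COLUMN_INDEX_CACHE_SIZE_KB * 1024:
        # Small index: IndexInfo objects stay in memory until the partition is written.
        return "IndexedEntry"
    # Large index: IndexInfo objects are serialized to disk as they are created,
    # and only their file positions are kept.
    return "ShallowIndexedEntry"

print(choose_index_entry(0, 0))                 # RowIndexEntry
print(choose_index_entry(1_000, 1_500))         # IndexedEntry
print(choose_index_entry(100_000, 1_400_000))   # ShallowIndexedEntry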

The relationship between partition size and the number of objects was quantified by Robert Stupp in his presentation, Myths of Big Partitions.

IndexInfo numbers from Robert Stupp

How Wide Partitions Impact Read Latencies

Cassandra’s key cache is an optimization that is enabled by default and helps to improve the speed and efficiency of the read path by reducing the amount of disk activity per read.

Each key cache entry is identified by a combination of the keyspace, table name, SSTable, and the partition key. The value of the key cache is a RowIndexEntry or one of its subclasses — either IndexedEntry or the new ShallowIndexedEntry. The size of the key cache is limited by the key_cache_size_in_mb configuration setting.

When a read operation in the storage engine gets a cache hit it avoids having to access the -Summary.db and -Index.db SSTable components, which reduces that read request’s latency. Wide partitions, however, can decrease the efficiency of this key cache optimization because fewer hot partitions will fit into the allocated cache size.

Indeed, before the ShallowIndexedEntry was added in Cassandra version 3.6, a single wide row could fill the key cache, reducing the hit rate efficiency. When applied to multiple rows, this will cause greater churn of additions and evictions of cache entries.

For example, if the IndexedEntry for a 512MB partition contains 100K+ IndexInfo objects and if these IndexInfo objects total 1.4MB, then the key cache would only be able to hold 140 entries.
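
As a back-of-the-envelope version of that example, the entry count is just the cache size divided by the per-partition index size; the cache size below is an assumption chosen to roughly match the 140-entry figure above (in practice it is governed by the key_cache_size_in_mb setting):

# Roughly how many such entries fit in the key cache, given the sizes above.
key_cache_mb = 200   # assumed cache size for illustration (see key_cache_size_in_mb)
entry_mb = 1.4       # serialized IndexInfo objects for one 512MB partition
print(int(key_cache_mb // entry_mb))   # ~142 entries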

The introduction of ShallowIndexedEntry objects changed how the key cache can hold data. The ShallowIndexedEntry contains a list of file pointers referencing the serialized IndexInfo objects and can binary search through this list, rather than having to deserialize the entire IndexInfo objects list. Thus, when the ShallowIndexedEntry is used, no IndexInfo objects exist within the key cache. This increases the storage efficiency of the key cache, allowing it to store more entries, but a cache hit still requires that the IndexInfo objects be binary searched and deserialized from the -Index.db file.

In short, on wide partitions a key cache miss still results in two additional disk reads, as it did before Cassandra 3.6, but now a key cache hit incurs a disk read to the -Index.db file where it did not before Cassandra 3.6.

Object Creation and Heap Behavior with Wide Partitions in 2.2.13 vs 3.11.3

Introducing the ShallowIndexedEntry into Cassandra version 3.6 creates a measurable improvement in the performance of wide partitions. To test the effects of this and the other performance enhancement features introduced in version 3 we compared how Cassandra 2.2.13 and 3.11.3 performed when one hundred thousand, one million, or ten million rows were each written to a single partition.

The results and accompanying screenshots help illustrate the impact of object creation and heap behavior when inserting rows into wide partitions. While version 2.2.13 crashed repeatedly during this test, 3.11.3 was able to write over 30 million rows to a single partition before Cassandra Out-of-Memory crashed. The test and results are reproduced below.

Both Cassandra versions were started as single-node clusters with default configurations, excepting heap customization in the cassandra-env.sh:

MAX_HEAP_SIZE="1G"
HEAP_NEWSIZE="600M"

In Cassandra, only the configured concurrency of memtable flushes and compactors determines how many partitions are processed by a node, and thus how much pressure is put on its heap, at any one time. Based on this known concurrency limitation, profiling can be done by inserting data into one partition against one Cassandra node with a small heap. These results extrapolate to production environments.

The tlp-stress tool inserted data in three separate profiling passes against both versions of Cassandra, creating wide partitions of one hundred thousand (100K), one million (1M), or ten million (10M) rows.

A tlp-stress profile for wide partitions was written, as no suitable profile existed. The read to write ratio used the default setting of 1:100.

The following command lines then implemented the tlp-stress tool:

# To write 100000 rows into one partition
tlp-stress run Wide --replication "{'class':'SimpleStrategy','replication_factor': 1}" -n 100K

# To write 1M rows into one partition
tlp-stress run Wide --replication "{'class':'SimpleStrategy','replication_factor': 1}" -n 1M

# To write 10M rows into one partition
tlp-stress run Wide --replication "{'class':'SimpleStrategy','replication_factor': 1}" -n 10M

Each time tlp-stress executed it was immediately followed by a command to ensure the full count of specified rows passed through the memtable flush and were written to disk:

nodetool flush

The graphs in the sections below, taken from the Apache NetBeans Profiler, illustrate how the ShallowIndexedEntry in Cassandra version 3.11 avoids keeping IndexInfo objects in memory.

Notably, the IndexInfo objects are instantiated far more often, but are referenced for much shorter periods of time. The Garbage Collector is more effective at removing short-lived objects, as illustrated by the GC pause times being barely present in the Cassandra 3.11 graphs compared to Cassandra 2.2 where GC pause times overwhelm the JVM.

Wide Partitions in Cassandra 2.2

Benchmarks were against Cassandra 2.2.13

One Partition with 100K Rows (2.2.13)

The following three screenshots show the number of IndexInfo objects instantiated during the write benchmark, the number instantiated during compaction, and a heap profile.

The partition grew to be ~40MB.

Objects created during tlp-stress

screenshot of Cassandra 2.2 objects created during tlp-stress

Objects created during subsequent major compaction

screenshot of Cassandra 2.2 objects created during subsequent major compaction

Heap profiled during tlp-stress and major compaction

screenshot of Cassandra 2.2 Heap profiled during tlp-stress and major compaction

The above diagrams do not have their x-axis expanded to the full width, but still encompass the startup, stress test, flush, and compaction periods of the benchmark.

When stress testing starts with tlp-stress, the CPU Time and Surviving Generations start to climb. During this time the heap also starts to increase and decrease more frequently as it fills up and then the Garbage Collector cleans it out. In these diagrams the garbage collection intervals are easy to identify and isolate from one another.

One Partition with 1M Rows (2.2.13)

Here, the first two screenshots show the number of IndexInfo objects instantiated during the write benchmark and during the subsequent compaction process. The third screenshot shows the CPU & GC Pause Times and the heap profile from the time writes started through when the compaction was completed.

The partition grew to be ~400MB.

Already at this size the Cassandra JVM is GC thrashing and has occasionally Out-of-Memory crashed.

Objects created during tlp-stress

screenshot of Cassandra 2.2.13 Objects created during tlp-stress

Objects created during subsequent major compaction

screenshot of Cassandra 2.2.13 Objects created during subsequent major compaction

Heap profiled during tlp-stress and major compaction

screenshot of Cassandra 2.2.13 Heap profiled during tlp-stress and major compaction

The above diagrams display a longer running benchmark, with the quiet period during the startup barely noticeable on the very left-hand side of each diagram. The number of garbage collection intervals and the oscillations in heap size are far more frequent. The GC Pause Time during the stress testing period is now consistently higher and comparable to the CPU Time. It only dissipates when the benchmark performs the flush and compaction.

One Partition with 10M Rows (2.2.13)

In this final test of Cassandra version 2.2.13, the results were difficult to reproduce reliably, as more often than not this test Out-of-Memory crashed from GC heap pressure.

The first two screenshots show the number of IndexInfo objects instantiated during the write benchmark and during the subsequent compaction process. The third screenshot shows the GC Pause Time and the heap profile from the time writes started until compaction was completed.

The partition grew to be ~4GB.

Objects created during tlp-stress

Objects created during subsequent major compaction

Heap profiled during tlp-stress and major compaction

screenshot of Cassandra Heap profiled during tlp-stress and major compaction

The above diagrams display consistently very high GC Pause Time compared to CPU Time. Any Cassandra node under this much duress from garbage collection is not healthy. It is suffering from high read latencies, could become blacklisted by other nodes due to its lack of responsiveness, and even crash altogether from Out-of-Memory errors (as it did often during this benchmark).

Wide Partitions in Cassandra 3.11.3

Benchmarks were against Cassandra 3.11.3

In this series, the graphs demonstrate how IndexInfo objects are created either from memtable flushes or from deserialization off disk. The ShallowIndexedEntry is used in Cassandra 3.11.3 when deserializing the IndexInfo objects from the -Index.db file.

Neither form of IndexInfo object resides long in the heap, and thus the GC Pause Time is barely visible in comparison to Cassandra 2.2.13, despite the additional number of IndexInfo objects created via deserialization.

One Partition with 100K Rows (3.11.3)

As with the earlier test of this size, the first two screenshots show the number of IndexInfo objects instantiated during the write benchmark and during the subsequent compaction process. The third screenshot shows the CPU & GC Pause Time and the heap profile from the time writes started through when the compaction was completed.

The partition grew to be ~40MB, the same as with Cassandra 2.2.13.

Objects created during tlp-stress

screenshot of Cassandra 3.11.3 objects created during tlp-stress

Objects created during subsequent major compaction

screenshot of Cassandra 3.11.3 objects created during subsequent major compaction

Heap profiled during tlp-stress and major compaction

screenshot of Cassandra 3.11.3 Heap profiled during tlp-stress and major compaction

The diagrams above are roughly comparable to the first diagrams presented under Cassandra 2.2.13, except here the x-axis is expanded to full width. Note there are significantly more instantiated IndexInfo objects, but barely any noticeable GC Pause Time.

One Partition with 1M Rows (3.11.3)

Again, the first two screenshots show the number of IndexInfo objects instantiated during the write benchmark and during the subsequent compaction process. The third screenshot shows the CPU & GC Pause Time and the heap profile over the time writes started until the compaction was completed.

The partition grew to be ~400MB, the same as with Cassandra 2.2.13.

Objects created during tlp-stress

Objects created during subsequent major compaction

Heap profiled during tlp-stress and major compaction

The above diagrams show a wildly oscillating heap as many IndexInfo objects are created, and many garbage collection intervals, yet the GC Pause Time remains low, if noticeable at all.

One Partition with 10M Rows (3.11.3)

Here again, the first two screenshots show the number of IndexInfo objects instantiated during the write benchmark and during the subsequent compaction process. The third screenshot shows the CPU & GC Pause Time and the heap profile over the time writes started until the compaction was completed.

The partition grew to be ~4GB, the same as with Cassandra 2.2.13.

Objects created during tlp-stress

Objects created during subsequent major compaction

Heap profiled during tlp-stress and major compaction

Unlike the corresponding 2.2.13 profile, the cluster remains stable, as it was when running 1M rows per partition. The above diagrams display an oscillating heap as IndexInfo objects are created, and many garbage collection intervals, yet GC Pause Time remains low, if noticeable at all.

Maximum Rows in 1GB Heap (3.11.3)

In an attempt to push Cassandra 3.11.3 to the limit, we ran a test to see how much data could be written to a single partition before Cassandra Out-of-Memory crashed.

The result was 30M+ rows, which is ~12GB of data on disk.

This is similar to the limit of 17GB of data written to a single partition as Robert Stupp found in CASSANDRA-9754 when using a 5GB Java heap.

screenshot of Cassandra 3.11.3 memory usage

What about Reads

The following graph reruns the benchmark on Cassandra version 3.11.3 over a longer period of time with a read to write ratio of 10:1. It illustrates that reads of wide partitions do not create the heap pressure that writes do.

screenshot of Cassandra 3.11.3 read functions

Conclusion

While the 400MB community recommendation for partition size is clearly appropriate for version 2.2.13, version 3.11.3 shows that performance improvements have made it far better at handling wide partitions: they can easily be an order of magnitude larger than in earlier versions of Cassandra without nodes crashing through heap pressure.

The trade-off for better supporting wide partitions in Cassandra 3.11.3 is increased read latency, as row offsets now need to be read off disk. However, modern SSDs and kernel page caches take advantage of larger physical memory configurations, providing enough I/O improvement to compensate for the read latency trade-off.

The improved stability, combined with relying on better hardware to absorb the read latency, lets Cassandra operators worry less about how to store massive amounts of data in different schemas and about unexpected data growth patterns on those schemas.

Looking ahead, the custom B+ tree structures from CASSANDRA-9754 will be used to more effectively look up row offsets and to further avoid the deserialization and instantiation of short-lived, unused IndexInfo objects.


Mick Semb Wever designs, builds, and is an evangelist for distributed systems, from data-driven backends using Cassandra, Hadoop, and Spark, to enterprise microservices platforms.

The post How We Optimized Storage and Performance of Apache Cassandra at Backblaze appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

2018 in the Rear View Mirror

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/2018-in-the-rear-view-mirror/

2018 Year in Review

Thank you to all of our customers and friends. You’ve made 2018 a great year for Backblaze. Here’s a quick look at what we’ve been up to in the past year.

Behind the Scenes

Backblaze likes to be transparent in how we do business. Here are just a few areas where we shined a light on ourselves:

Storage: We started the year with 500 petabytes of data storage. We’ll finish with over 750 petabytes of storage under management — next up, our first exabyte.

Durability: The durability of Backblaze B2 is eleven 9’s. Here’s how we calculated that number and what 99.999999999 really means for you.

Hard Drive Stats: We continue to publish our Hard Drive Stats reports each quarter, detailing the failure rates of the hard drives in the Backblaze data centers. Here’s the most recent report for Q3 2018 and here’s everything since we started.

$30 Million ARR: Backblaze got to $30 million annualized recurring revenue with only $3 million in funding. For some reason, some companies insist on doing the opposite.

Making Lemonade: Peer into Yev’s mind and see how he tackles the wild ride that is social media management.

Data Center PDUs: Read how one of our data center technicians solved the problem of too many cables in the wrong place in the data center.

The Startup CEO: Our CEO continued to publish a series of blog posts on the lessons learned from starting and operating Backblaze for the past 11 years.

Helium and Hard Drives: Take a look at how helium affects the hard drives we use.

People

Over this last year Backblaze hired 34 new people. Let's welcome: Janet, Morgan, John, Cheryl, Ebony, Athrea, Cameron, Skip, John, Vanna, Tina, Daniel, Jyotsna, Jack, Tim, Elliott, Josh, Steven, Victoria, Daren, Billy, Jacob, Nathan, Michele, Matt, Lin, Alex, and a few others who chose to remain unnamed. And let's not forget our 2018 interns: Nelly, Kelly, Angie, and Colin.

Come join us; we have some great jobs open for the right people.

Backblaze people 2011
Backblaze people 2018

One thing that happens when you add people is you need a place for them to work. Sometimes that means people have to move around. Back in 2012, Yev and I staked out the marketing corner in the office, complete with a view of the back alley. It was our work home for over six years, but we recently had to say goodbye. The marketing group had grown too big for our little corner and we all had to move to a new location in the office. Sigh.

Marketing corner

Fun

MegaBot versus Backblaze: We gave a few busted Backblaze Storage Pods to a 30-foot-tall robot that crushes other robots for a living. What could possibly go wrong?

Blog Noir: Reggie, the MacBook, had croaked. Could our hero use Backblaze to recover Reggie's data from beyond the grave?

Backblaze Bling: We made a shiny version of Backblaze that has no additional features and is ridiculously more expensive, but it's only for sale on April 1st. Be the first on your block to overpay.

Holiday Wishes: There's a gift on the list for everyone on your shopping list.

Numbers

Behind every business there are a lot of numbers; here are a few we thought might be interesting.

11 years — Over the last 11 years, Backblaze has conducted our annual backup awareness survey looking at how often people back up their computers.

6% — In 2018, a mere 6% of the respondents backed up all the data on their computers at least once a day.

35 billion — The number of files Backblaze has restored for our Consumer Backup and Business Backup customers since we started keeping track in 2011.

876,388,286 — The number of Consumer and Business Backup files restored by our customers just in November 2018. That’s 29.2 million files per day or 1.2 million files per hour or 20,868 files per minute. That’s a lot of memories and musings returned to their rightful owners.

104,527 — The number of spinning hard drives in our data center, including data drives and boot drives.

1,920 — The number of Backblaze Storage Pods in use today. Nearly all are deployed in Backblaze Vaults.

Onward

2018 was a good year for Backblaze. Growth was not too fast and not too slow, but just right. We are all looking forward to 2019 and continuing to keep people’s data backed up, safe, and ready for when it’s needed.

The post 2018 in the Rear View Mirror appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

LTO Versus Cloud Storage Costs — the Math Revealed

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/lto-versus-cloud-storage/

B2 Cloud Storage $68,813 vs. LTO 8 Tape $119,873

A few months back we did a blog post titled, LTO versus Cloud Storage: Choosing the Model That Fits Your Business. In that post we presented our version of an LTO vs. B2 Cloud Storage calculator, a useful tool to determine whether or not it makes economic sense to consider using cloud storage over your LTO storage.

Rather than just saying, “trust us, it’s cheaper,” we thought it would be a good idea to show you what’s inside the model: the assumptions we used, the variables we defined, and the actual math we used to compute our answers. In fact, we’re making the underlying model available for download.

Our Model: LTO vs Cloud Storage

The LTO vs. B2 calculator that is on our website was based on a Microsoft Excel spreadsheet we built. The Excel file we’ve provided for download below is completely self-contained; there are no macros and no external data sources.

Download Excel file: Backblaze-LTO-Calculator-Public-Nov2018.xlsx

The spreadsheet is divided into multiple sections. In the first section, you enter the four values the model needs to calculate the LTO and B2 cloud storage costs. The website implementation is obviously much prettier, but the variables and math are the same as the spreadsheet. Let’s look at the remaining sections.

Entered Values Section

The second section is for organization and documentation of the data that is entered. You also can see the limits we imposed on the data elements.

One question you may have is why we limited the Daily Incremental Backup value to 10 TB. As the comment notes, that's about as much traffic as you can cram through a 1Gbps upload connection in a 24-hour period. If you have bigger (or smaller) pipes, adjust accordingly.
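
That 10 TB ceiling falls out of the line rate directly; here is a quick sanity check that ignores protocol overhead:

# Maximum data movable over a 1 Gbps link in 24 hours, ignoring protocol overhead.
seconds_per_day = 24 * 60 * 60
bits_per_day = 1e9 * seconds_per_day
print(bits_per_day / 8 / 1e12)   # ~10.8 TB per day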

Don’t use the model for one-time archives. You may be tempted to enter zeros in both the Yearly Added Data and Daily Incremental Backup fields to compare the cost of a one-time archive. The model is not designed to compare the cost of a one-time archive. It will give you an answer, but the LTO costs will be overstated by anywhere from 10%-50%. The model was designed for the typical LTO use case where data is written to tape, typically daily, based on the data backup plan.

Variables Section

The third section stores all the variable values you can play with in the model. There is a short description for each variable, but let’s review some general concepts:

Tapes — We use LTO-8 tapes that will decrease in cost by about 20% per year, down to $60. Non-compressed, these tapes store 12 TB each and take about 9.5 hours to fully load. We use 24 TB for each tape, assuming 2:1 compression. If some or all of your data consists of video or photos, then compression cannot be used, which makes the actual tape capacity much lower and increases the cost of the LTO solution.

Tapes Used — Based on the grandfather-father-son (GFS) model and assumes you replace tapes once a year.

Maintenance — Assumes you have no spare units, so you cannot miss more than one business day for backups. You could add a spare unit and remove the maintenance or just decide it is OK to miss a day or two while the unit is being repaired.

Off-site Storage — The cost of getting your tapes off-site (and back) assuming a once a week pick-up/drop-off.

Personnel — The cost of the person doing the LTO work, and how much time per week they spend doing the LTO related work, including data restoration. The cost of a person doing the cloud storage work is calculated from this value as described in the Time Savings paragraph below.

Data Restoration — How much of your data on average you will restore each month. The model is a bit limited here in that we use an average for all time periods when downloads are typically uneven across time. You are, of course, welcome to adjust the model. One thing to remember is that you’ll want to test your restore process from time to time, so make sure you allocate resources for that task.

Time Savings — We make the assumption that you will only spend 25% of the time working with cloud storage versus managing and maintaining an LTO system, i.e. no more buying, mounting, unmounting, labeling, cataloging, packaging, reading, or writing tapes.

Model Section

The last section is where the math gets done. Don’t change specific values in this section as they all originate in previous sections. If you decide to change a formula, remember to do so across all 10 years. It is quite possible that many of these steps can be combined into more complex formulas. We break them out to try to make an already complicated calculation somewhat easier to follow. Let’s look at the major subsections.

Data Storage — This section is principally used to organize the different data types and amounts. The model does not apply any corporate data retention policies, such as deleting financial records after seven years. Data is deleted solely based on the GFS backup model, for example, deleting incremental data sets after 30 days.

LTO Costs — This starts with defining the amount of data to store, then calculates the quantity of tapes needed and their costs, along with the number of drive units and their annual unit cost and annual maintenance cost. The purchase price of a tape drive unit is divided evenly over a 10-year period.

Why 10 years? The LTO Consortium states it will support reading LTO tapes two versions back and expects to release a new version every two years. If you buy an LTO-8 system in 2018, then in 2024 LTO-11 drives will not be able to read your LTO-8 tapes and you will be using obsolete hardware. We assume your LTO-8 hardware will continue to be supported through third party vendors for at least four years (to 2028) after it goes obsolete.

We finish up with calculating the cost of the off-site storage service and finally the personnel cost of managing the system and maintaining the tape library. Other models seem to forget this cost or just assume it is the same as your cloud storage personnel costs.

Cloud Storage Costs — We start with calculating the cost to store the data. This uses the amount of data at the end of the year, versus trying to compute monthly numbers throughout the year. This overstates the total amount a bit, but simplifies the math without materially changing the results. We then calculate the cost to download the data, again using the number at the end of the period. We calculate the incremental cost of enhancing the network to send and restore cloud data. This is an incremental cost, not the total cost. Finally, we add in the personnel cost to access and check on the cloud storage system as needed.
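
As a rough illustration of the cloud-side math, here is a minimal sketch; the unit prices are placeholder assumptions rather than values from the spreadsheet, and the real model also adds the incremental network and personnel costs described above:

# Simplified sketch of the cloud storage cost calculation described above.
STORAGE_PER_GB_MONTH = 0.005   # assumed $/GB/month
DOWNLOAD_PER_GB = 0.01         # assumed $/GB downloaded

def yearly_cloud_cost(data_stored_tb, restore_pct_per_month):
    """Storage billed on the end-of-year data amount; downloads on a monthly restore percentage."""
    gb = data_stored_tb * 1000
    storage = gb * STORAGE_PER_GB_MONTH * 12
    downloads = gb * (restore_pct_per_month / 100) * DOWNLOAD_PER_GB * 12
    return storage + downloads

print(yearly_cloud_cost(100, 5))   # e.g. 100 TB stored, 5% restored monthly -> 6600.0 dollars/year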

Result Tables — These are the totals from the LTO and cloud storage section in one place.

B2 Fireball Section

There is a small section and some variables associated with the B2 Fireball data transfer service. This service is useful to transfer large amounts of data from your organization to Backblaze. There is a cost for this service of $550 per month to rent the Fireball, plus $75 for shipping. Organizations with existing LTO libraries often don’t want to use their network bandwidth to transfer their entire library, so they end up keeping some LTO systems just to read their archived tapes. The B2 Fireball can move the data in the library quickly and let you move completely away from LTO if desired.

Summary

While we think the model is pretty good there is always room for improvement. If you have any thoughts you’d like to share, let us know in the comments. One more thing: the model is free to update and use within your organization, but if you publicize it anywhere please cite Backblaze as the original source.

The post LTO Versus Cloud Storage Costs — the Math Revealed appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Buying a Hard Drive this Holiday Season? These Tips Will Help

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hard-drive-buying-guide/

Hard drives with bows
Over the last few years we’ve shared many observations in our quarterly Hard Drive Stats reports that go beyond the hard drive failure rates. We decided to consolidate some of these additional observations into one post just in time for the holiday buying season. If you have “buy a hard drive” on your shopping list this holiday season, here is just about everything we know about hard disk drives.

First, let’s establish that we are talking about hard disk drives (HDDs) here and not solid state drives (SSDs). Here’s a Backblaze “What’s the Diff” blog post where we discuss the differences between HDD and SSD drives.

How Will You Use Your HDD?

Hard drive manufacturers build drive models for different use cases; that is, a given drive model is optimized for a given purpose. For example, a consumer drive may spin slower to save energy and provide little, if any, access to tools that can adjust the firmware settings on the drive. An enterprise class drive, on the other hand, is typically much faster and provides the user with access to features they can tweak to adjust performance and/or power usage.

Each drive manufacturer has their own criteria for their use cases, but in general there are five categories: consumer, NAS (network attached storage), archiving/video recording, enterprise, and more recently, data center. The different drive manufacturers have different variations on these categories, so the first thing you should do is to know what you are going to do with the drive before you start looking.

Hard Drive Recording Technologies

For a long time, the recording technology a drive manufacturer used was not important. Then SMR (shingled magnetic recording) drives appeared a couple of years ago.

Let’s explain:

PMR: Perpendicular Magnetic Recording
This is the technology inside of most hard drives. With PMR data is written to and read from circular tracks on a spinning platter.
SMR: Shingled Magnetic Recording
This type of drive overlaps recording tracks to store data at a lower cost than PMR technology. The downside occurs when data is deleted and that space is reused. If existing data overlaps the space you want to reuse, this can mean delays in writing the new data. These drives are great for archive storage (write once, read many) use cases, but if your files turn over with some regularity, stick with PMR drives.

That sounds simple, but here are two things you should know:

  1. SMR drives are often the least expensive drives available when you consider the cost per gigabyte. If you are price sensitive, you may believe you are getting a great deal, but you may be buying the wrong drive for your use case. For example, buying SMR drives for your NAS device running RAID 6 would be ugly because of all the rewrites that may be involved.
  2. It is sometimes really hard to figure out if the drive you want to buy is an SMR or PMR drive. For example, based on the cost per gigabyte, the 8TB Seagate external drive (model: STEB8000100) is one of the least expensive external drives out there right now. But, the 8TB drive inside is an SMR drive, and that fact is not obvious to the buyer. To be fair, the manufacturers try to guide buyers to the right drive for their use case, but a lot of that guiding information is lost on reseller sites such as Amazon and Newegg, where the buyer is often blinded by price.

Over the next couple of years, HAMR (heat-assisted magnetic recording) by Seagate and MAMR (microwave-assisted magnetic recording) by Western Digital will be introduced, making the drive selection process even more complicated.

What About Refurbished Drives?

Refurbished drives are hard drives that have been returned to the manufacturer and repaired in some way to make them operational. Given the cost, repairs are often limited to what can be done in the software or firmware of the failed drive. For example, the repair may consist of identifying a section of bad media on a drive platter and telling the drive to read and write around it.

Once repaired, refurbished drives are tested and often marked certified by the manufacturer, e.g. “Certified Refurbished.” Refurbished drives are typically less expensive and come with a limited warranty, often one year or less. You can decide if you want to use these types of drives in your environment.

Helium-Filled versus Air-Filled Drives

Helium-filled drives are finally taking center stage after spending years as an experimental technology. Backblaze has in part used helium-filled drives since 2015, and over the years we’ve compared helium-filled drives to air-filled drives. Here’s what we know so far.

The first commercial helium-filled drives were 6TB; the transition to helium took hold at 8TB as we started seeing helium-filled 8TB drives from every manufacturer. Today, helium-filled 12 and 14TB drives are available at a reasonable price per terabyte.

Helium drives have two advantages over their air-filled counterparts: they create less heat and they use less power. Both of these are important in data centers, but may be less important to you, especially when you consider the two primary disadvantages: a higher cost and a lack of field experience. The street-price premium for a helium-filled drive is roughly 20% right now versus an air-filled drive of the same size. That premium is expected to decrease as time goes on.

While price is important, the lack of field experience with helium-filled drives may be more interesting, as these drives have only been deployed in quantity for a little over four years. That said, we have had helium-filled drives in service for 3.5 years. They are solid performers with a 1.2% annualized failure rate and show no signs of hitting the wall.

Enterprise versus Consumer Drives

In our Q2 2018 Hard Drive Stats report we delved into this topic, so let’s just summarize some of the findings below.

We have both 8TB consumer and enterprise models to compare. Both models are from Seagate. The consumer drive is model ST8000DM002 and the enterprise drive is model ST8000NM0055. The chart below, from the Q2 2018 report, shows the failure rates for each of these drive models at the same average age of all of the drives of the specified model.

Annualized Hard Drive Failure Rates by Time table

When you constrain for the average age of each of the drive models, the AFR (annualized failure rate) of the enterprise drive is consistently below that of the consumer drive for these two drive models — albeit not by much. By the way, conducting the same analysis at an average age of 15 months showed little change, with the consumer drive recording a 1.10% AFR and the enterprise drive holding at 0.97% AFR.

Whether every enterprise model is better than every corresponding consumer model is unknown, but below are a few reasons you might choose one class of drive over another:

Enterprise Class Drives

  • Longer Warranty: 5 years vs. 2 years
  • More Accessible Features, i.e. Seagate PowerChoice technology
  • Faster reads and writes

Consumer Class Drives

  • Lower Price: Up to 50% less
  • Similar annualized failure rates to enterprise drives
  • Lower power use and heat output

Hard Drive Failure Rates

As many of you know, each quarter Backblaze publishes our Hard Drive Stats report for the hard drives in our data centers. Here’s the lifetime chart from our most recent Q3 2018 report.

Backblaze Lifetime Hard Drive Failure Rates table

Along with the report, we also publish the data we used to create the reports. We are not alone. Let’s look at the various ways you can find hard drive failure rates for the drive you wish to purchase.

Backblaze AFR (annualized failure rate)
The failure rate of a given hard drive model based on the number of days a drive model is in use and the number of failures of that drive model. Here’s the formula:

( ( Drive Failures / ( Drive Days / 365 ) ) * 100 )
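
Expressed as a small Python function (the sample numbers below are invented purely for illustration), the calculation looks like this:

def annualized_failure_rate(drive_failures, drive_days):
    # Drive days divided by 365 gives drive years; failures per drive year,
    # times 100, gives the AFR percentage.
    return drive_failures / (drive_days / 365) * 100

# Hypothetical example: 60 failures over 1,500,000 drive days is about 1.46%.
print(round(annualized_failure_rate(60, 1_500_000), 2))
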
MTBF (mean time between failures)
MTBF is the term some disk drive manufacturers use to quantify disk drive average failure rates. It is the average number of service hours between failures. This is similar to MTTF (mean time to failure), which is the average time to the first failure. MTBF has been superseded by AFR for some drive vendors as described below.
AFR (Seagate and Western Digital)
These manufacturers have decided to replace MTBF with AFR. Their definition of AFR is the probable percent of failures per year, based on the manufacturer's total number of installed units of similar type. While Seagate and WD don't give the specific formula for calculating AFR, Seagate notes that AFR is similar to MTBF and differs only in units. One way of converting MTBF to AFR can be found here.
Comparing Backblaze AFR to the Seagate/WD AFR
The Backblaze environment is a closed system, meaning we know with a high degree of certainty the variables we need to compute the Backblaze AFR percentage. We also know most, if not all, of the mitigating factors. The Seagate/WD AFR environment is made up of potentially millions of drives in the field (home, office, mobile, etc.) where the environmental variables can be quite varied and in some cases unknown. Either of the AFR calculations can be considered as part of your evaluation if you are comfortable with how they are calculated.
CDL (component design life)
This term is used by Western Digital in their support knowledge base, although we don't see it in their technical specifications yet. The example provided in the knowledge base article is, "The Component Design Life of the drive is 5 years and the Annualized Failure Rate is less than 0.8%." With those two numbers you can calculate that no more than four out of 100 drives will die in a five-year period. This is really good information, but it is not readily available yet.
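
Returning to the MTBF-to-AFR conversion mentioned above: the linked method isn't reproduced here, but one commonly used approximation (our assumption, not a formula published by Seagate or Western Digital) treats failures as exponentially distributed over the hours in a year:

import math

def mtbf_to_afr_percent(mtbf_hours, hours_per_year=8766):
    # Probability of at least one failure in a year, assuming a constant
    # failure rate of 1/MTBF.
    return (1 - math.exp(-hours_per_year / mtbf_hours)) * 100

# A drive rated at 1,000,000 hours MTBF works out to roughly 0.87% AFR.
print(round(mtbf_to_afr_percent(1_000_000), 2))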

Which Hard Drive Do I Need?

While hard drive failure rates are interesting, we believe that our Hard Drive Stats reports are just one of the factors to consider in your hard drive buying decision. Here are some things you should think about, in no particular order:

  • Your use case
    • What you will do with the drive.
  • What size drive do you need?
    • Using it as a Time Machine backup? It should be 3-4 times the size of your internal hard drive. Using it as an archive for your photo collection? — bigger is better.
  • How long do you want the drive to last?
    • Forever is not a valid answer. We suggest starting with the warranty period and subtracting a year if you move the drive around a lot or if you fill it up and stuff it in the closet.
  • The failure rate of the drive
    • We talked about that above.
  • What your friends think
    • You might get some good advice.
  • What the community thinks
    • reddit, Hacker News, Spiceworks, etc.
  • Product reviews
    • I read them, but only to see if there is anything else worth investigating via other sources.
  • Product review sites
  • These days, many review sites on the internet are pay-to-play, although not all. Pay-to-play means the vendor pays the site either for their review or if the review leads to a sale. Sometimes, whoever pays the most gets to the top of the list. This isn't true for all sites, but it is often really hard to tell the good guys from the rest. One of our favorite sites, Tom's Hardware, has stopped doing HDD reviews, so if you have a site you trust for such reviews, share it in the comments; we'd all like to know.
  • The drive manufacturer
    • Most drive manufacturer websites provide information that can help you determine the right drive for your use case. Of course, they are also trying to sell you a drive, but the information, especially the technical specs, can be useful.

What about price? We left that out of our list as many people start and end their evaluation with just price and we wanted to mention a few other things we thought could be important. Speaking of price…

What’s a Good Price for a Hard Drive?

Below is our best guess as to what you could pay over the next couple of months for different sized internal drives. Of course, there are bound to be some great discounts on Black Friday, Cyber Monday, Hanukkah, Christmas, Kwanzaa, Boxing Day, Winter Solstice, and Festivus — to name a few holiday season reasons for a sale on hard disk drives.

Drive Size | Price | Cost per GB
1TB        | $35   | $0.035
2TB        | $50   | $0.025
3TB        | $75   | $0.025
4TB        | $100  | $0.025
6TB        | $170  | $0.028
8TB        | $250  | $0.031
10TB       | $300  | $0.030
12TB       | $380  | $0.032
14TB       | $540  | $0.039

How Much Do External Hard Drives Cost?

We wanted to include the same information about external hard drives, but there is just too much unclear information to feel good about doing it. While researching this topic, we came across multiple complaints about a wide variety of external drive systems containing refurbished or used drives. In reviewing the advertisements and technical specs, we found that the fact that the HDD inside an external drive is sometimes not new often gets left off the specifications. In addition, on Amazon and similar sites, many of the complaints were from purchases made via third party sellers and not the original external drive manufacturers, so check the "by" tag before buying.

Let's make it easy: an external hard drive should have at least a two-year warranty and be available from a trusted source. The list price for the external drive should be about 10-15% higher than the same sized internal drive. What you will actually pay, the street price, is based on supply and demand and a host of other factors. Don't be surprised if the cost of an external drive is sometimes less than a corresponding internal drive — that's just supply and demand at work. Following this guidance doesn't mean the drive won't fail; it just means you'll have better odds at getting a good external drive for your money.

One More Thing Before You Buy

The most important thing to consider when buying a hard drive is the value of the data on the drive and what it would cost to replace that data. If you have a good backup plan and practice the 3-2-1 backup strategy, then the value of a given drive is low and limited to the time and cost it takes to replace the drive that goes bad. That’s annoying, yes, but you still have your data. In other words, if you want to get the most for your money when buying a hard drive, have a good backup plan.

The post Buying a Hard Drive this Holiday Season? These Tips Will Help appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Migrating from CrashPlan: Arq and B2

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/migrating-crashplan-arq-backup-b2/

Arq and Backblaze B2 logos on a computer screen

Many ex-CrashPlan for Home users have moved to Backblaze over the last year. We gave them a reliable, set-and-forget backup experience for the amazing price of $5/month per computer. Yet some people wanted features such as network share backup and CrashPlan’s rollback policy, and Arq Backup can provide those capabilities. So we asked Stefan Reitshamer of Arq to tell us about his solution.

— Andy

Migrating from CrashPlan
by Stefan Reitshamer, Founder, Arq Backup

CrashPlan for Home is gone — no more backups to CrashPlan and no more ability to restore from your old backups. Time to find an alternative!

Arq + Backblaze B2 = CrashPlan Home

If you’re looking for many of the same features as CrashPlan plus affordable storage, Arq + B2 cloud storage is a great option. MacWorld’s review of Arq called it “more reliable and easier to use than CrashPlan.”

Just like CrashPlan for Home, Arq lets you choose your own encryption password. Everything is encrypted before it leaves your computer, with a password that only you know.

Also just like CrashPlan for Home, Arq keeps all backups forever by default. Optionally you can tell it to “thin” your backup records from hourly to daily to weekly as they age, similar to the way Time Machine does it. And/or you can set a budget and Arq will periodically delete the oldest backup records to keep your costs under control.

With Arq you can back up whatever you want — no limits. Back up your external hard drives, network shares, etc. Arq won’t delete backups of an external drive no matter how long it’s been since you’ve connected it to your computer.

The license for Arq is a one-time cost and, if you use multiple Macs and/or PCs, one license covers all of them. The pricing for B2 storage is a fraction of the cost of the other large-scale cloud storage providers — just $0.005/GB per month and the first 10GB is free. To put that in context, that's 1/4th the price of Amazon S3. The savings become more pronounced if/when you need to restore your files. B2 only charges a flat rate of $0.01/GB for data download, and you get 1 GB of downloads free every day. By contrast, Amazon S3 has tiered pricing that starts at 9 times that of B2.

Arq’s Advanced Features

Arq is a mature product with plenty of advanced features:

  • You can tell Arq to pause backups whenever you’re on battery.
  • You can tell Arq to pause backups during a certain time window every day.
  • You can tell Arq to keep your computer awake until it finishes the backup.
  • You can restrict which Wi-Fi networks and which network interfaces Arq uses for backup.
  • You can restrict how much bandwidth Arq uses when backing up.
  • You can configure Arq to send you email every time it finishes backing up, or only if there were errors during backup.
  • You can configure Arq to run a script before and/or after backup.
  • You can configure Arq to back up to multiple B2 accounts if you wish. Back up different folders to different B2 accounts, configure different schedules for each B2 account, etc.

Arq is fully compatible with B2. You can configure it with your B2 account ID and master application key, or you can use B2’s new application keys feature to restrict which bucket(s) Arq can write to.

Privacy and Control

With Arq and B2 storage, you keep control of your data because it’s your B2 account and your encryption password — even if an attacker got access to the B2 data they wouldn’t be able to read your encrypted files. Your backups are stored in an open, documented format. There’s even an open-source restore tool.

The post Migrating from CrashPlan: Arq and B2 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Hard Drive Stats for Q3 2018: Less is More

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/2018-hard-drive-failure-rates/

Backblaze Drive Stats Q3 2018

As of September 30, 2018 Backblaze had 99,636 spinning hard drives. Of that number, there were 1,866 boot drives and 97,770 data drives. This review looks at the quarterly and lifetime statistics for the data drive models in operation in our data centers. In addition, we’ll say goodbye to the last of our 3TB drives, hello to our new 12TB HGST drives, and we’ll explain how we have 584 fewer drives than last quarter, but have added over 40 petabytes of storage. Along the way, we’ll share observations and insights on the data presented and we look forward to you doing the same in the comments.

Hard Drive Reliability Statistics for Q3 2018

At the end of Q3 2018, Backblaze was monitoring 97,770 hard drives used to store data. For our evaluation, we remove from consideration those drives that were used for testing purposes and those drive models for which we did not have at least 45 drives (see why below). This leaves us with 97,600 hard drives. The table below covers what happened in Q3 2018.

Backblaze Q3 2018 Hard Drive Failure Rates chart

Notes and Observations

  • If a drive model has a failure rate of 0%, it only means there were no drive failures of that model during Q3 2018.
  • Quarterly failure rates can be volatile, especially for models that have a small number of drives and/or a small number of Drive Days.
  • There were 170 drives (97,770 minus 97,600) that were not included in the list above because we did not have at least 45 of a given drive model. We use 45 drives of the same model as the minimum number when we report quarterly, yearly, and lifetime drive statistics.

When to Replace a Hard Drive

As noted, at the end of Q3 we had 584 fewer drives, but over 40 petabytes more storage space. We replaced 3TB, 4TB, and even a handful of 6TB drives with 3,600 new 12TB drives using the very same data center infrastructure, i.e. racks of Storage Pods. The drives we are replacing are about 4 years old. That's plus or minus a few months depending on how much we paid for the drive and a number of other factors. Keeping lower density drives in service when higher density drives are both available and efficiently priced does not make economic sense.

Why Drive Migration Will Continue

Over the next several years, data growth is expected to explode. Hard drives are still expected to store the bulk of that data, meaning cloud storage companies like Backblaze will have to increase capacity, either by increasing the storage density of existing data centers or by building, or building out, more data centers. Drive manufacturers, like Seagate and Western Digital, are looking at HDD storage densities of 40TB as early as 2023, just 5 years away. It is significantly less expensive to replace lower density operational drives in a data center versus building a new facility or even building out an existing facility to house the higher density drives.

Goodbye 3TB WD Drives

For the last couple of quarters, we had 180 Western Digital 3TB drives (model: WD30EFRX) remaining — the last of our 3TB drives. In early Q3, they were removed and replaced with 12TB drives. These 3TB drives were purchased in the aftermath of the Thailand drive crisis, installed in mid-2014, and were still hard at work when we replaced them. Sometime over the next couple of years we expect to say goodbye to all of our 4TB drives and upgrade them to 14, 16, or even 20TB drives. After that it will be time to "up-density" our 6TB systems, then our 8TB systems, and so on.

Hello 12TB HGST Drives

In Q3 we added 79 HGST 12TB drives (model: HUH721212ALN604) to the farm. While 79 may seem like an unusual number of drives to add, it represents "stage 2" of our drive testing process. Stage 1 uses 20 drives, the number of hard drives in one Backblaze Vault tome. That is, there are 20 Storage Pods in a Backblaze Vault, and there is one "test" drive in each Storage Pod. This allows us to compare the performance, etc., of the test tome to the remaining 59 production tomes (which are running already-qualified drives). There are 60 tomes in each Backblaze Vault. In stage 2, we fill an entire Storage Pod with the test drives, adding 59 test drives to the one currently being tested in one of the 20 Storage Pods in a Backblaze Vault.

To date, none of the 79 HGST drives have failed, but as of September 30th, they had been installed for only 9 days. Let's see how they perform over the next few months.

A New Drive Count Leader

For the last 4 years, the drive model we’ve deployed the most has been the 4TB Seagate drive, model ST4000DM000. In Q3 we had 24,208 of this drive model, which is now only good enough for second place. The 12TB Seagate drive, model ST12000NM0007, became our new drive count leader with 25,101 drives in Q3.

Lifetime Hard Drive Reliability Statistics

While the quarterly chart presented earlier gets a lot of interest, the real test of any drive model is over time. Below is the lifetime failure rate chart for all the hard drive models in operation as of September 30th, 2018. For each model, we compute its reliability starting from when it was first installed.

Backblaze Lifetime Hard Drive Failure Rates Chart

Notes and Observations

  • The failure rates of all of the larger drives (8, 10, and 12 TB) are very good: 1.21% AFR (Annualized Failure Rate) or less. In particular, the Seagate 10TB drives, which have been in operation for over 1 year now, are performing very nicely with a failure rate of 0.48%.
  • The overall failure rate of 1.71% is the lowest we have ever achieved, besting the previous low of 1.82% from Q2 of 2018.

The Hard Drive Stats Data

The complete data set used to create the information used in this review is available on our Hard Drive Test Data page. You can download and use this data for free for your own purpose. All we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone. It is free.

If you just want the summarized data used to create the tables and charts in this blog post you can download the ZIP file containing the MS Excel spreadsheet.
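
If you want to compute quarterly AFRs yourself from the raw daily files, here is a minimal Python sketch. It assumes the dataset's usual layout of one row per drive per day with model and failure columns, and the directory name is a placeholder for wherever you unzipped the data:

import glob
import pandas as pd

# Each daily CSV has one row per operational drive; failure is 1 on the
# day a drive fails, 0 otherwise.
frames = [pd.read_csv(path, usecols=["date", "model", "failure"])
          for path in glob.glob("drive_stats_q3_2018/*.csv")]
days = pd.concat(frames, ignore_index=True)

# Drive days per model is simply the row count; failures are the column sum.
stats = days.groupby("model").agg(drive_days=("failure", "size"),
                                  failures=("failure", "sum"))
stats["afr_percent"] = stats["failures"] / (stats["drive_days"] / 365) * 100
print(stats.sort_values("afr_percent"))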

Good luck and let us know if you find anything interesting.

The post Hard Drive Stats for Q3 2018: Less is More appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Backblaze B2 API Version 2 Beta is Now Open

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/backblaze-b2-api-version-2-beta-is-now-open/

cloud storage workflow image

Since B2 cloud storage was introduced nearly 3 years ago, we’ve been adding enhancements and new functionality to the B2 API, including capabilities like CORS support and lifecycle rules. Today, we’d like to introduce the beta of version 2 of the B2 API, which formalizes rules on application keys, provides a consistent structure for all API calls returning information about files, and cleans up outdated request parameters and returned data. All version 1 B2 API calls will continue to work as is, so no changes are required to existing integrations and applications.

The API Versions section of the B2 documentation on the Backblaze website provides the details on how the V1 and V2 APIs differ, but in the meantime here’s an overview into the what, why, and how of the V2 API.

What Has Changed Between the B2 Cloud Storage Version 1 and Version 2 APIs?

The most obvious difference between a V1 and V2 API call is the version number in the URL. For example:

https://apiNNN.backblazeb2.com/b2api/v1/b2_create_bucket

https://apiNNN.backblazeb2.com/b2api/v2/b2_create_bucket

In addition, the V2 API call may have different required request parameters and/or required response data. For example, the V2 version of b2_hide_file always returns accountId and bucketId, while V1 returns accountId.

The documentation for each API call will show whether there are any differences between API versions for a given API call.

No Change is Required For V1 Applications

With the introduction of V2 of the B2 API there will be V1 and V2 versions for every B2 API call. All applications using V1 API calls will continue to work with no change in behavior. In some cases, a given V2 API call will be different from its companion V1 API call as noted in the B2 API documentation. For the remaining API calls a given V1 API call and its companion V2 call will be the same, have identical parameters, return the same data, and have the same errors. This provides a B2 developer the flexibility to choose how to upgrade to the V2 API.

Obviously, if you want to use the functionality associated with a V2 API version, then you must use the V2 API call and update your code accordingly.

One last thing: beginning today, if we create a new B2 API call it will be created in the current API version (V2) and most likely will not be created in V1.

Standardizing B2 File Related API Calls

As requested by many B2 developers, the V2 API now uses a consistent structure for all API calls returning information about files. To support this, some V2 API calls return additional fields; the documentation for each affected call lists the fields that were added.

Restricted Application Keys

In August we introduced the ability to create restricted application keys using the B2 API. This capability gives an account owner the ability to restrict who, how, and when the data in a given bucket can be accessed. This changed the functionality of multiple B2 API calls such that a user could create a restricted application key that could break a 3rd party integration to Backblaze B2. We subsequently updated the affected V1 API calls so they could continue to work with the existing 3rd party integrations.

The V2 API fully implements the expected behavior when it comes to working with restricted application keys. The V1 API calls continue to operate as before.

Here is an example of how the V1 API and the V2 API will act differently as it relates to restricted application keys.

Set-up

  • The B2 account owner has created 2 public buckets, “Backblaze_123” and “Backblaze_456”
  • The account owner creates a restricted application key that allows the user to read the files in the bucket named “Backblaze_456”
  • The account owner uses the restricted application key in an application that uses the b2_list_buckets API call

In Version 1 of the B2 API

  • Action: The account owner uses the restricted application key (for bucket Backblaze_456) to access/list all the buckets they own (2 public buckets).
  • Result: The results returned are just for Backblaze_456 as the restricted application key is just for that bucket. Data about other buckets is not returned.

While this result may seem appropriate, the data returned did not match the question asked, i.e. list all buckets. V2 of the API ensures the data returned is responsive to the question asked.

In Version 2 of the B2 API

  • Action: The account owner uses the restricted application key (for bucket Backblaze_456) to access/list all the buckets they own (2 public buckets).
  • Result: A “401 unauthorized” error is returned as the request for access to “all” buckets does not match the restricted application key, e.g. bucket Backblaze_456. To achieve the desired result, the account owner can specify the name of the bucket being requested in the API call that matches the restricted application key.
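
Here is a minimal Python sketch of the V2 behavior using the requests library. The key ID, application key, and bucket name are placeholders, and it relies on the optional bucketName filter on b2_list_buckets described in the B2 documentation:

import requests

KEY_ID = "YOUR_RESTRICTED_KEY_ID"    # placeholder
APP_KEY = "YOUR_RESTRICTED_APP_KEY"  # placeholder

# b2_authorize_account uses HTTP basic auth and returns the API URL,
# an authorization token, and the account ID.
auth = requests.get("https://api.backblazeb2.com/b2api/v2/b2_authorize_account",
                    auth=(KEY_ID, APP_KEY)).json()

# With a restricted key, asking for all buckets returns a 401 in V2,
# so name the bucket the key is scoped to.
resp = requests.post(auth["apiUrl"] + "/b2api/v2/b2_list_buckets",
                     headers={"Authorization": auth["authorizationToken"]},
                     json={"accountId": auth["accountId"],
                           "bucketName": "Backblaze_456"})
print(resp.status_code, resp.json())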

Cleaning up the API

There are a handful of API calls in V2 where we dropped fields that were deprecated in V1 of the B2 API, but were still required. So in V2:

  • b2_authorize_account: The response no longer contains minimumPartSize. Use partSize and absoluteMinimumPartSize instead.
  • b2_list_file_names: The response no longer contains size. Use contentLength instead.
  • b2_list_file_versions: The response no longer contains size. Use contentLength instead.
  • b2_hide_file: The response no longer contains size. Use contentLength instead.
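
If you maintain code that must read responses from both API versions during a migration, a small shim like the following can smooth over the renamed fields (the helper names are ours, not part of the API):

def file_size(file_info):
    # V2 responses use contentLength; older V1 responses used size.
    if "contentLength" in file_info:
        return file_info["contentLength"]
    return file_info["size"]

def min_part_size(auth_response):
    # V2 drops minimumPartSize in favor of absoluteMinimumPartSize
    # and the recommended partSize.
    return auth_response.get("absoluteMinimumPartSize",
                             auth_response.get("minimumPartSize"))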

Support for Version 1 of the B2 API

As noted previously, V1 of the B2 API continues to function. There are no plans to stop supporting V1. If at some point in the future we do deprecate the V1 API, we will provide advance notice of at least one year before doing so.

The B2 Java SDK and the B2 Command Line Tool

Both the B2 Java SDK and the B2 Command Line Tool do not currently support Version 2 of the B2 API. They are being updated and will support the V2 API at the time the V2 API exits beta and goes GA. Both of these tools, and more, can be found in the Backblaze GitHub repository.

More About the Version 2 Beta Program

We introduced Version 2 of the B2 API as beta so that developers can provide us feedback before V2 goes into production. With every B2 integration being coded differently, we want to hear from as many developers as possible. Give the V2 API a try and if you have any comments you can email our B2 beta team at b2beta@backblaze.com or contact Backblaze B2 support. Thanks.

The post Backblaze B2 API Version 2 Beta is Now Open appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

LTO versus Cloud Storage: Choosing the Model That Fits Your Business

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/lto-vs-cloud-storage-vs-hybrid/

Choose Your Solution: Cloud Storage, LTO, Hybrid Cloud Storage/LTO

Years ago, when I did systems administration for a small company, we used RAID 1 for in-house data redundancy and an LTO tape setup for offsite data backup. Yes, the LTO cataloging and versioning were a pain, so was managing the tapes, and sometimes a tape would be unreadable, but the setup worked. And given there were few affordable alternatives out there at the time, you lived and died with your tapes.

Over the last few years, cloud storage has emerged as a viable alternative to using LTO for offsite backups. Improvements in network speed coupled with lower costs are a couple of the factors that have changed the calculus of cloud storage. To see if enough has changed to make cloud storage a viable competitor to LTO, we’ll start by comparing the current and ongoing cost of LTO versus cloud storage and then dig into assumptions underlying the cost model. We’ll finish up by reviewing the pluses and minuses of three potential outcomes: switching to cloud storage, staying with LTO, or using a hybrid LTO/cloud storage solution.

Comparing the Cost of LTO Versus Cloud Storage

Cost calculators for comparing LTO to Cloud Storage have a tendency to be very simple or very complex. The simple ones generally compare hardware and tape costs to cloud storage costs and neglect things like personnel costs, maintenance costs, and so on. In the complex models you might see references to the cost of capital, interest on leasing equipment, depreciation, and the tax implications of buying equipment versus paying for a monthly subscription service.

The Backblaze LTO vs Cloud calculator is somewhere in between. The underlying model takes into account many factors, which we’ll get into in a moment, but if you are a Fortune 500 company with a warehouse full of tape robots, this model is not for you.

Calculator: LTO vs B2

To use the Backblaze calculator you enter:

  1. the amount of Existing Data you have on LTO tape
  2. the amount of data you expect to add in a given year
  3. the amount of incremental data you backup each day

Then you can use the slider to compare your total cost from 1 to 10 years. You can run the model as many times as you like under different scenarios.
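
For a rough sense of how the cloud side of such a model works, here is a deliberately simplified Python sketch. It only covers storage cost, assumes B2-style pricing of $0.005/GB/month, ignores download, LTO hardware, pickup, and personnel costs, and is not the formula behind the Backblaze calculator:

def cloud_storage_cost(existing_tb, added_tb_per_year, years,
                       rate_per_gb_month=0.005):
    # Data grows linearly month over month; each month's bill is the
    # amount stored that month times the per-GB rate.
    stored_gb = existing_tb * 1000
    monthly_growth_gb = added_tb_per_year * 1000 / 12
    total = 0.0
    for _ in range(years * 12):
        stored_gb += monthly_growth_gb
        total += stored_gb * rate_per_gb_month
    return total

# Example: 100 TB existing, 24 TB added per year, over 5 years.
print(f"${cloud_storage_cost(100, 24, 5):,.0f}")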

Assumptions Behind the Model

To see the assumptions that were made in creating the model, start on the LTO Replacement page and scroll down past the LTO vs. B2 calculator. Click on the following text which will display the “Cost and Operational Assumptions” page.

+ See details on Cost and Operational Assumptions

Let’s take a few minutes to review some of the most relevant points and how they affect the cost numbers reported:

  • LTO Backup Model: We used the Grandfather-Father-Son (GFS) model. There are several others, but this was the most prevalent. If you use the “Tower of Hanoi” model for example, it uses fewer tapes and would lower the cost of the total LTO cost by some amount.
  • Data Compression: We assumed a 2-1 compression ratio for the data stored on the LTO tapes. If your data is principally video or photos, you will most likely not use compression. As such, film studios and post-production houses will need to double the cost of the total LTO solution to compensate for the increased number of tapes, the increased number of LTO tape units, and increased personnel costs.
  • Data Retention: We used a 30 day retention period as this is common in the GFS model. If you keep your incremental tapes/data for 2 weeks, then you would lower the number of tapes needed for incremental backups, but you would also lower the amount of incremental data you keep in the cloud storage system.
  • Tape Units: There are a wide variety of LTO tape systems. You can increase or decrease the total LTO cost based on the systems you are using. For example, suppose you are considering the purchase of an LTO tape system that reads/writes up to 5 tapes simultaneously. That system is more expensive and has higher maintenance costs, but it also means you would have to purchase fewer tape units.
  • LTO-8 Tape Units: We used LTO-8 tape units as they are the currently available LTO system most likely to be around in 10 years.
  • Tape Migration: We made no provision for migration from an unsupported LTO version to a supported LTO version. During the next 10 years, many users with older LTO systems will find it likely they will have to migrate to newer systems as LTO only supports 2 generations back and is currently offering a new generation every 2 years.
  • Pickup Cost: The cost of having your tapes picked up so they are offsite. This cost can vary widely based on geography and service level. Our assumption of the cost is $60 per week or $3,120/year. You can adjust the LTO total cost according to your particular circumstances.
  • Network Cost: Using cloud storage requires that you have a reasonable amount of network bandwidth available. The number we used is incremental to your existing monthly cost for bandwidth. Network costs vary widely, so depending on your circumstances you can increase or decrease the total cost of the cloud storage solution.
  • Personnel Cost: This is the total cost of what you are paying someone to manage and operate your LTO system. This raises or lowers the cost of both the LTO and cloud storage solutions at the same rate, so adjusting this number doesn’t affect the comparison, just the total values for each.
  • Time Savings Versus LTO: With a cloud storage solution, there are no tapes or tape machines to deal with. This saves a significant amount of time for the person managing the backup process. Increasing this value will increase the cost of the cloud storage solution relative to the LTO solution.

As hinted at earlier, we don’t consider the cost of capital, depreciation, etc. in our calculations. The general model is that a company purchases a number of LTO systems and the cost is spread over a 10 year period. After 10 years a replacement unit is purchased. Other items such as tapes and equipment maintenance are purchased and expensed as needed.

Choosing a Data Backup Model

We noted earlier the three potential outcomes when evaluating LTO versus cloud storage for data backup: switching to cloud storage, staying with LTO, or using a hybrid LTO/cloud storage solution. Here’s a look at each.

Switching to Cloud Storage

After using the calculator you find cloud storage is less expensive for your business or organization versus LTO. You don’t have a large amount of existing data, 100 terabytes for example, and you’d rather get out of the tape business entirely.

Your first challenge is to move your existing data to the cloud — quickly. One solution is the Backblaze B2 Fireball data transfer service. You can move up to 70 TB of data each trip from your location to Backblaze in days. This saves your bandwidth and saves time as well.

As the existing data is being transferred to Backblaze, you'll want to select a product or service to move your daily generated information to the cloud on a regular basis. Backblaze has a number of integration partners that perform data backup services to Backblaze B2.

Staying with LTO

After using the calculator you find cloud storage is less expensive, but you are one of those unlucky companies that can’t get reasonably priced bandwidth in their area. Or perhaps, the new LTO-8 equipment you ordered arrived minutes before you read this blog post. Regardless, you are destined to use LTO for at least a while longer. Tried and true, LTO does work and has the added benefit of making the person who manages the LTO setup nearly indispensable. Still, when you are ready, you can look at moving to the hybrid model described next.

Hybrid LTO/Cloud Storage model

In practice, many organizations that use LTO for backup and archive often store some data in the cloud as well, even if haphazardly. For our purposes, Hybrid LTO/Cloud Storage is defined as one of the following:

  1. Date Hybrid: All backups and archives from before the cutover date remain stored on LTO; everything from the cutover date forward is stored in cloud storage.
  2. Classic Hybrid: All of the incremental backups are stored in cloud storage and all full backups and archives are stored on LTO.
  3. Type Hybrid: All data of a given type, say employee data, is stored on LTO, while all customer data is stored in cloud storage. We see this hybrid use case occur as a function of convenience and occasionally compliance, although some regulatory requirements such as GDPR may not be accommodated by LTO solutions.

You can imagine there being other splits, but in essence, there may be situations where keeping the legacy system going in some capacity for some period of time is the prudent business option.

If you have a large tape library, it can be almost paralyzing to think about moving to the cloud, even if it is less expensive. Being open to the hybrid LTO/cloud model is a way to break the task down into manageable steps. For example, solutions like Starwind VTL and Archiware P5 allow you to start backing up to the cloud with minimal changes to your existing tape-based backup schemes.

Many companies that start down the hybrid road typically begin with moving their daily incremental files to the cloud. This immediately reduces the amount of “tape work” you have to do each day and it has the added benefit of making the files readily available should they need to be restored. Once a company is satisfied that their cloud based backups for their daily incremental files are under control, they can consider whether or not they need to move the rest of their data to the cloud.

Will Cloud Storage Replace LTO?

At some point, the LTO tapes you have will need to be migrated to something else, as the equipment to read your old tapes will become outdated, then unsupported, and finally unavailable. Users with LTO 4 and, to some degree, LTO 5 are already feeling this pain. Migrating all of that data from your existing LTO system to LTO version "X," cloud storage, or something else will be a monumental task. It is probably a good idea to start planning for that now.

In summary, many people will find that they can now choose cloud storage over LTO as an affordable way to store their data going forward. But, having a hybrid environment of both LTO and cloud storage is not only possible, it is a practical way to reduce your overall backup cost while maximizing your existing LTO investment. The hybrid model creates an improved operational environment and provides a pathway forward should you decide to move exclusively to storing your data in the cloud at some point in the future.

The post LTO versus Cloud Storage: Choosing the Model That Fits Your Business appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

How to Leverage Your Amazon S3 Experience to Code the Backblaze B2 API

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/how-to-code-backblaze-b2-api-interface/

Going from S3 to learning Backblaze B2

We wrote recently about how the Backblaze B2 and Amazon S3 APIs are different. What we neglected to mention was how to bridge those differences so a developer can create a B2 interface if they've already coded one for S3. John Matze, Founder of BridgeSTOR, put together his list of things to consider when leveraging your S3 API experience to create a B2 interface. Thanks John.   — Andy
Backblaze B2 to Amazon S3 Conversion
by John Matze, Founder of BridgeSTOR

Backblaze B2 Cloud Storage Platform has developed into a real alternative to the Amazon S3 online storage platform, with the same redundancy capabilities but at a fraction of the cost.

Sounds great — sign up today!

Wait. If you’re an application developer, it doesn’t come free. The Backblaze REST API is not compatible with Amazon S3 REST API. That is the bad news. The good news — it includes almost the entire set of functionality so converting from S3 to B2 can be done with minimal work once you understand the differences between the two platforms.

This article will help you shortcut the process by describing the differences between B2 and S3.

  1. Endpoints: AWS has a standard endpoint of s3.amazonaws.com which redirects to the region where the bucket is located, or you may send requests directly to the bucket by a region endpoint. B2 does not have regions, but does have an initial endpoint called api.backblazeb2.com. Every application must start by talking to this endpoint. B2 also requires two other endpoints: one for uploading an object and another for downloading an object. The upload endpoint is generated on demand when uploading an object, while the download endpoint is returned during the authentication process and may be saved for download requests.
  2. Host: Unlike Amazon S3, the HTTP header requires the host token. If it is not present, B2 will not respond with an error.
  3. JSON: Unlike S3, which uses XML, all B2 calls use JSON. Some API calls require data to be sent on the request. This data must be in JSON and all APIs return JSON as a result. Fortunately, the amount of JSON required is minimal or none at all. We just built a JSON request when required and made a simple JSON parser for returned data.
  4. Authentication: Amazon currently has two major authentication mechanisms with complicated hashing formulas. B2 simply uses the industry standard "HTTP basic auth" algorithm. It takes only a few minutes to get up to speed on this algorithm.
  5. Keys: Amazon has the concept of an access key and a secret key. B2 has the equivalent: the access key is your key ID (your account ID) and the secret key is the application key (returned from the website) that maps to it.
  6. Bucket ID: Unlike S3, almost every B2 API requires a bucket ID. There is a special list bucket call that will display bucket IDs by bucket name. Once you find your bucket name, capture the bucket ID and save it for future API calls.
  7. Head Call: The bottom line — there is none. There is, however, a list_file_names call that can be used to build your own HEAD call. Parse the JSON returned values and create your own HEAD call.
  8. Directory Listings: B2 directories again have the same functionality as S3, but with a different API format. Again the mapping is easy: marker is startFileName, prefix is prefix, max-keys is maxFileCount and delimiter is delimiter. The big difference is how B2 handles markers. The Amazon S3 nextmarker is literally the next marker to be searched; the B2 nextmarker is the last file name that was searched. This means the next listing will also include that last marker name again, so your routines must parse out the name or your listing will show it twice. That's a difference, but not a difficult one.
  9. Uploading an object: Uploading an object in B2 is quite different from S3. S3 just requires you to send the object to an endpoint and they will automatically place the object somewhere in their environment. In the B2 world, you must request a location for the object with an API call and then send the object to the returned location. The first API will send you a temporary key and you can continue to use this key for one hour without generating another, with the caveat that you have to monitor for failures from B2. The B2 environment may become full or some other issue may require you to request another key.
  10. Downloading an Object: Downloading an object in B2 is really easy. There is a download endpoint that is returned during the authentication process and you pass your request to that endpoint. The object is downloaded just like Amazon S3.
  11. Multipart Upload: Finally, multipart upload. The beast in S3 is just as much of a beast in B2. Again the good news is there is a one-to-one mapping.
    a. Multipart Init: The equivalent initialization returns a fileId. This ID will be used for future calls.
    b. Multipart Upload: Similar to uploading an object, you will need to get the API location to place the part. So use the fileId from "a" above and call B2 for the endpoint to place the part. Another difference is the upload also requires the payload to be hashed with a SHA1 algorithm. Once done, simply pass the SHA and the part number to the URL and the part is uploaded. This SHA1 component is equivalent to an etag in the S3 world, so save it for later.
    c. Multipart Complete: Like S3, you will have to build a return structure for each part. B2 of course requires this structure to be in JSON, but like S3, B2 requires the part number and the SHA1 (etag) for each part.
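
To make the authentication, keys, and upload differences concrete, here is a minimal Python sketch of a B2 upload using the requests library. The credentials and bucket ID are placeholders and it uses the v2 endpoints, so treat it as an illustration rather than BridgeSTOR's actual code:

import hashlib
import requests

KEY_ID = "YOUR_KEY_ID"            # placeholder (the account ID / key ID)
APP_KEY = "YOUR_APPLICATION_KEY"  # placeholder
BUCKET_ID = "YOUR_BUCKET_ID"      # placeholder

# Authentication: plain HTTP basic auth instead of S3-style request signing.
auth = requests.get("https://api.backblazeb2.com/b2api/v2/b2_authorize_account",
                    auth=(KEY_ID, APP_KEY)).json()

# Uploading: first ask B2 where to put the object...
upload = requests.post(auth["apiUrl"] + "/b2api/v2/b2_get_upload_url",
                       headers={"Authorization": auth["authorizationToken"]},
                       json={"bucketId": BUCKET_ID}).json()

# ...then send the object, with its SHA1, to the returned upload URL.
data = b"hello from the S3 port"
requests.post(upload["uploadUrl"], data=data, headers={
    "Authorization": upload["authorizationToken"],
    "X-Bz-File-Name": "test/hello.txt",
    "Content-Type": "text/plain",
    "X-Bz-Content-Sha1": hashlib.sha1(data).hexdigest(),
})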

What Doesn’t Port

We found almost everything we required easily mapped from S3 to B2 except for a few issues. To be fair, Backblaze is working on the following in future versions.

  1. Copy Object doesn’t exist: This could cause some issues with applications for copying or renaming objects. BridgeSTOR has a workaround for this situation so it wasn’t a big deal for our application.
  2. Directory Objects don't exist: Unlike Amazon, where an object name that ends with a "/" is considered a directory, this does not port to B2. There is an undocumented object name that B2 applications use called .bzEmpty. Numerous 3rd party applications, including BridgeSTOR, treat an object ending with .bzEmpty as a directory name. This is also important for the directory listings described above. If you choose to use this method, you will be required to replace the ".bzEmpty" with a "/."

In conclusion, you can see the B2 API is different from the Amazon S3 API, but as far as functionality goes they are basically the same. For us, at first it looked like it was going to be a large task, but once we took the time to understand the differences, porting to B2 was not a major job for our application. We created an S3 to B2 shim in a week, followed by a few extra weeks of testing and bug fixes. I hope this document helps in your S3 to B2 conversion.

— John Matze, BridgeSTOR

The post How to Leverage Your Amazon S3 Experience to Code the Backblaze B2 API appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Five Tips For Creating a Predictable Cloud Storage Budget

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/calculate-cost-cloud-storage/

Cloud Storage $$$, Transfer Rates $, Download Fees $$, Cute Piggy Bank $$$

Predicting your cloud storage cost should be easy. After all, there are only three cost dimensions: 1) storage (the rental for your slice of the cloud), 2) download (the fee to bring your data out of the cloud), and 3) transactions (charges for "stuff" you might do to your data inside the cloud). Yet, you probably know someone (you?) who was more than surprised when their cloud storage bill arrived. They are in good company: according to ZDNet, 37% of IT executives found their cloud storage costs to be unpredictable.

Here are five tips you can use when doing your due diligence on the cloud storage vendors you are considering. The goal is to create a cloud storage forecast that you can rely on each and every month.

Tip # 1 — Don’t Miscalculate Progressive (or is it Regressive?) Pricing Tiers

The words “Next” or “Over” on a pricing table are never a good thing.

Standard Storage Pricing Example

  • First 50 TB / Month $0.026 per GB
  • Next 450 TB / Month $0.025 per GB
  • Over 500 TB / Month $0.024 per GB

Those words mean there are tiers in the pricing table which, in this case, means you have to reach a specific level to get better pricing. You don’t get a retroactive discount — only the data above the minimum threshold enjoys the lower price.

The mistake sometimes made is calculating your entire storage cost based on the level for that amount of storage. For example, if you had 600 TB of storage, you could wrongly multiply as follows:

(600,000 x 0.024) = $14,400/month

When, in fact, you should do the following:

(50,000 x 0.026) + (450,000 x 0.025) + (100,000 x 0.024) = $14,950/month

That was just for storage. Make sure you consider the tiered pricing tables for data retrieval as well.
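
A short Python sketch of the progressive calculation (using the example tiers above) makes the difference clear:

def tiered_storage_cost(gb_stored):
    # Only the gigabytes that fall inside each tier get that tier's price.
    tiers = [(50_000, 0.026),        # first 50 TB
             (450_000, 0.025),       # next 450 TB
             (float("inf"), 0.024)]  # everything over 500 TB
    remaining, cost = gb_stored, 0.0
    for tier_size, price in tiers:
        in_tier = min(remaining, tier_size)
        cost += in_tier * price
        remaining -= in_tier
        if remaining <= 0:
            break
    return cost

print(tiered_storage_cost(600_000))  # 14950.0, not 600,000 x 0.024 = 14400.0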

Tip # 2 — Don’t Choose the Wrong Service Level

Many cloud storage providers offer multiple levels of service. The idea is that you can trade service capabilities for cost. If you don’t need immediate access to your files or don’t want data replication or eleven 9s of durability, there is a choice for you. Besides giving away functionality, there’s a bigger problem. You have to know what you are going to do with your data to pick the right service because mistakes can get very expensive. For example:

  • You choose a low cost service tier that normally takes hours or days to restore your data. What can go wrong? You need some files back immediately and you end up paying 10-20 times the cost to expedite your restore.
  • You choose one level of service and decide you want to upload some data to a compute-based application or to another region — features not part of your current service. The good news? You can usually move the data. The bad news? You are charged a transfer fee to move the data within the same vendor’s infrastructure because you didn’t choose the right service tier when you started. These fees often eradicate any “savings” you had gotten from the lower priced tier.

Basically, if your needs change as they pertain to the data you have stored, you will pay more than you expect to get all that straightened out.

Tip # 3 — Don’t Pay for Deleted Files

Some cloud storage companies have a minimum amount of time you are charged for storage for each file uploaded. Typically this minimum period is between 30 and 90 days. You are charged even if you delete the file before the minimum period. For example (assuming a 90 day minimum period), if you upload a file today and delete the file tomorrow, you still have to pay for storing that deleted file for the next 88 days.

This “feature” often extends to files deleted due to versioning. Let’s say you want to keep three versions of each file, with older versions automatically deleted. If the now deleted versions were originally uploaded fewer than 90 days ago, you are charged for storing them for 90 days.

Using a typical backup scenario let’s say you are using a cloud storage service to store your files and your backup program is set to a 30 day retention. That means you will be perpetually paying for an additional 60 days worth of storage (for files that were pruned at 30 days). In other words, you would be paying for a 90 day retention period even though you only have 30 days worth of files.

Tip # 4 — Don’t Pay For Nothing

Some cloud storage vendors charge a minimum amount each month regardless of how little you have stored. For example, even if you only have 100 GB stored you get to pay like you have 1 TB (the minimum). This is the moral equivalent of a bank charging you a monthly fee if you don’t meet the minimum deposit amount.

Continuing on the theme of paying for nothing, be on the lookout for services that charge a minimum amount per each file stored regardless of how small the file is, including zero bytes. For example, some storage services have a minimum file size of 128K. Any files smaller than that are counted as being 128K for storage purposes. While the additional cost for even a couple of million zero-length files is trivial, you’re still being charged something for nothing.

Tip # 5 — Be Suspicious of the Fine Print

Misdirection is the art of getting you to focus on one thing so you don’t focus on other things going on. Practiced by magicians and some cloud storage companies, the idea is to get you to focus on certain features and capabilities without delving below the surface into the fine print.

Read the fine print. As you stroll through the multi-page pricing tables and the linked pages of rules that shape how you can use a given cloud storage service, stop and ask, "what are they trying to hide?" If you find phrases like: "we reserve the right to limit your egress traffic," or "new users gets free usage tier for 12 months," or "provisioned requests should be used when you need a guarantee that your retrieval capacity will be available when you need it," take heed.

How to Build a Predictable Cloud Storage Budget

As we noted previously, cloud storage costs are composed of three dimensions: storage, download and transactions. These are the cost drivers for cloud storage providers, and as such are the most straightforward way for service providers to pass on the cost of the service to its customers.

Let’s start with data storage as it is the easiest for a company to calculate. For a given month data storage cost is equal to:

Current data + new data – deleted data

Take that total and multiply it by the monthly storage rate and you'll get your monthly storage costs.
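
As a quick sketch, here is that formula applied month over month in Python; the storage rate is an assumption based on B2's published $0.005/GB/month and the volumes are invented:

RATE_PER_GB = 0.005
stored_gb = 50_000                    # current data: 50 TB
added_gb, deleted_gb = 4_000, 1_000   # per month

for month in range(1, 4):
    stored_gb = stored_gb + added_gb - deleted_gb
    print(f"Month {month}: {stored_gb:,} GB -> ${stored_gb * RATE_PER_GB:,.2f}")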

Computing download and transaction costs can be harder as these are variables you may have never calculated before, especially if you previously were using in-house or LTO-based storage. To help you out, below is a chart showing the breakdown of the revenue from Backblaze B2 Cloud Storage over the past 6 months.

% of Spend w/ B2

As you can see, download (2%) and transaction (3%) costs are, on average, minimal compared to storage costs. Unless you have reason to believe you are different, using these figures is a good proxy for your costs.

Let’s Give it a Try

Let’s start with 100 TB of original storage then add 10 TB each month and delete 5 TB each month. That’s 105 TB of storage for the first month. Backblaze has built a cloud storage calculator that computes costs for all of the major cloud storage providers. Using this calculator, we find that Amazon S3 would cost $2,205.50 to store this data for a month, while Backblaze B2 would charge just $525.10.

Using those numbers for storage and assuming that storage will be 95% of your total bill (as noted in the chart above), you get a total monthly cost of $2,321.05 for Amazon S3 and Backblaze B2 will be $552.74 a month.

The chart below provides the breakdown of the expected cost.

             | Backblaze B2 | Amazon S3
Storage      | $525.10      | $2,205.50
Download     | $11.06       | $46.22
Transactions | $16.58       | $69.33
Totals:      | $552.74      | $2,321.05

Of course each month you will add and delete storage, so you’ll have to account for that in your forecast. Using the cloud storage calculator noted above, you can get a good sense of your total cost over the budget forecasting period.

Finally, you can use the Backblaze B2 storage calculator to address potential use cases that are outside of your normal operation. For example, you delete a large project from your storage or you need to download a large amount of data. Running the calculator for these types of actions lets you obtain a solid estimate for their effect on your budget before they happen and lets you plan accordingly.

Creating a predictable cloud storage forecast is key to taking full advantage of all of the value in cloud storage. Organizations like Austin City Limits, Fellowship Church, and Panna Cooking were able to move to the cloud because they could reliably predict their cloud storage cost with Backblaze B2. You don’t have to let pricing tiers, hidden costs and fine print stop you. Backblaze makes predicting your cloud storage costs easy.

The post Five Tips For Creating a Predictable Cloud Storage Budget appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Hard Drive Stats for Q2 2018

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hard-drive-stats-for-q2-2018/

Backblaze Drive Stats Q2 2018

As of June 30, 2018 we had 100,254 spinning hard drives in Backblaze’s data centers. Of that number, there were 1,989 boot drives and 98,265 data drives. This review looks at the quarterly and lifetime statistics for the data drive models in operation in our data centers. We’ll also take another look at comparing enterprise and consumer drives, get a first look at our 14 TB Toshiba drives, and introduce you to two new SMART stats. Along the way, we’ll share observations and insights on the data presented and we look forward to you doing the same in the comments.

Hard Drive Reliability Statistics for Q2 2018

Of the 98,265 hard drives we were monitoring at the end of Q2 2018, we removed from consideration those drives used for testing purposes and those drive models for which we did not have at least 45 drives. This leaves us with 98,184 hard drives. The table below covers just Q2 2018.

Backblaze Q2 2018 Hard Drive Failure Rates

Notes and Observations

If a drive model has a failure rate of 0%, it just means that there were no drive failures of that model during Q2 2018.

The Annualized Failure Rate (AFR) for Q2 is just 1.08%, well below the Q1 2018 AFR and is our lowest quarterly AFR yet. That said, quarterly failure rates can be volatile, especially for models that have a small number of drives and/or a small number of Drive Days.

There were 81 drives (98,265 minus 98,184) that were not included in the list above because we did not have at least 45 of a given drive model. We use 45 drives of the same model as the minimum number when we report quarterly, yearly, and lifetime drive statistics. The use of 45 drives is historical in nature as that was the number of drives in our original Storage Pods.

Hard Drive Migrations Continue

The Q2 2018 Quarterly chart above was based on 98,184 hard drives. That was only 138 more hard drives than Q1 2018, which was based on 98,046 drives. Yet, we added nearly 40 PB of cloud storage during Q2. If we tried to store 40 PB on the 138 additional drives we added in Q2 then each new hard drive would have to store nearly 300 TB of data. While 300 TB hard drives would be awesome, the less awesome reality is that we replaced over 4,600 4 TB drives with nearly 4,800 12 TB drives.

The age of the 4 TB drives being replaced was between 3.5 and 4 years. In all cases their failure rates were 3% AFR (Annualized Failure Rate) or less, so why remove them? Simple: drive density — in this case three times the storage in the same cabinet space. Today, four years of service is about the time when it makes financial sense to replace existing drives versus building out a new facility with new racks, etc. While there are several factors that go into the decision to migrate to higher density drives, keeping hard drives beyond that tipping point means we would be underutilizing valuable data center real estate.

Toshiba 14 TB drives and SMART Stats 23 and 24

In Q2 we added twenty 14 TB Toshiba hard drives (model: MG07ACA14TA) to our mix (not enough to be listed on our charts), but that will change as we have ordered an additional 1,200 drives to be deployed in Q3. These are nine-platter, helium-filled drives that use conventional magnetic recording (CMR/PMR), not SMR.

In addition to being new drives for us, the Toshiba 14 TB drives also add two new SMART stat pairs: SMART 23 (Helium condition lower) and SMART 24 (Helium condition upper). Both attributes report normalized and raw values, with the raw values currently being 0 and the normalized values being 100. As we learn more about these values, we’ll let you know. In the meantime, those of you who utilize our hard drive test data will need to update your data schema and upload scripts to read in the new attributes.
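For those updating their scripts, here is a minimal pandas sketch of one way to handle the new columns. It assumes the new attributes follow the same smart_N_normalized / smart_N_raw column-naming convention used for the other attributes in the public data set; the file name is only an example.

```python
import pandas as pd

# Assumes the new columns follow the existing smart_N_normalized / smart_N_raw
# naming convention used for other attributes in the public data set.
new_columns = [
    "smart_23_normalized", "smart_23_raw",   # Helium condition lower
    "smart_24_normalized", "smart_24_raw",   # Helium condition upper
]

df = pd.read_csv("2018-06-30.csv")  # one day's drive-stats file (file name is illustrative)

# Older files won't have these columns; add them as empty so schemas line up.
for col in new_columns:
    if col not in df.columns:
        df[col] = pd.NA

print(df[["model"] + new_columns].head())
```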

By the way, none of the 20 Toshiba 14 TB drives have failed after 3 weeks in service, but it is way too early to draw any conclusions.

Lifetime Hard Drive Reliability Statistics

While the quarterly chart presented earlier gets a lot of interest, the real test of any drive model is over time. Below is the lifetime failure rate chart for all the hard drive models in operation as of June 30th, 2018. For each model, we compute its reliability starting from when it was first installed.

Backblaze Lifetime Hard Drive Failure Rates

Notes and Observations

The combined AFR for all of the larger drives (8-, 10- and 12 TB) is only 1.02%. Many of these drives were deployed in the last year, so there is some volatility in the data, but we would expect this overall rate to decrease slightly over the next couple of years.

The overall failure rate for all hard drives in service is 1.80%. This is the lowest we have ever achieved, besting the previous low of 1.84% from Q1 2018.

Enterprise versus Consumer Hard Drives

In our Q3 2017 hard drive stats review, we compared two Seagate 8 TB hard drive models: one a consumer class drive (model: ST8000DM002) and the other an enterprise class drive (model: ST8000NM0055). Let’s compare the lifetime annualized failure rates from Q3 2017 and Q2 2018:

Lifetime AFR as of Q3 2017

    – 8 TB consumer drives: 1.1% annualized failure rate
    – 8 TB enterprise drives: 1.2% annualized failure rate

Lifetime AFR as of Q2 2018

    – 8 TB consumer drives: 1.03% annualized failure rate
    – 8 TB enterprise drives: 0.97% annualized failure rate

Hmmm, it looks like the enterprise drives are “winning.” But before we declare victory, let’s dig into a few details.

  1. Let’s start with drive days, the total number of days all the hard drives of a given model have been operational.
     – 8 TB consumer (model: ST8000DM002): 6,395,117 drive days
     – 8 TB enterprise (model: ST8000NM0055): 5,279,564 drive days
     Both models have a sufficient number of drive days and are reasonably close in their total number. No change to our conclusion so far.
  2. Next we’ll look at the confidence intervals for each model to see the range of possibilities within two deviations (see the sketch after this list for one way to approximate such an interval).
     – 8 TB consumer (model: ST8000DM002): range 0.9% to 1.2%
     – 8 TB enterprise (model: ST8000NM0055): range 0.8% to 1.1%
     The ranges are close, but multiple outcomes are possible. For example, the consumer drive could be as low as 0.9% and the enterprise drive could be as high as 1.1%. This doesn’t help or hurt our conclusion.
  3. Finally, we’ll look at drive age, or more precisely, average drive age. This is the average time in operational service, in months, of all the drives of a given model. We’ll start from the point in time when each drive model reached approximately its current drive count, so that the addition of new drives (not replacements) has a minimal effect.

     Annualized Hard Drive Failure Rates by Time

     When you constrain for drive count and average age, the AFR (annualized failure rate) of the enterprise drive is consistently below that of the consumer drive for these two drive models, albeit not by much.
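This post doesn’t spell out the exact interval calculation, so the sketch below uses a simple Poisson approximation, roughly two standard deviations around the point estimate, which produces ranges of the same general shape as those above. The failure count used here is an illustrative value chosen to be consistent with the published drive days and AFR, not a figure from the report.

```python
from math import sqrt

def afr_with_interval(failures: int, drive_days: int) -> tuple[float, float, float]:
    """Point estimate and an approximate two-standard-deviation interval for AFR (%).

    Treats the failure count as Poisson-distributed, so its standard deviation
    is sqrt(failures). This is an approximation, not necessarily the exact
    method used for the published tables.
    """
    drive_years = drive_days / 365.0
    afr = failures / drive_years * 100.0
    half_width = 2.0 * sqrt(failures) / drive_years * 100.0
    return afr - half_width, afr, afr + half_width

# 180 failures is illustrative, chosen to match the ~1.03% consumer drive AFR.
low, mid, high = afr_with_interval(failures=180, drive_days=6_395_117)
print(f"{low:.2f}% .. {mid:.2f}% .. {high:.2f}%")  # roughly 0.9% .. 1.0% .. 1.2%
```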

Whether every enterprise model is better than every corresponding consumer model is unknown, but below are a few reasons you might choose one class of drive over another:

Enterprise
  • Longer warranty: 5 years vs. 2 years
  • More features, e.g., PowerChoice technology
  • Faster reads and writes

Consumer
  • Lower price: up to 50% less
  • Similar annualized failure rate as enterprise drives
  • Uses less power

Backblaze is known to be “thrifty” when purchasing drives. When you purchase 100 drives at a time, or are faced with a drive crisis, it makes sense to buy consumer drives. When you start purchasing 100 petabytes’ worth of hard drives at a time, the price gap between enterprise and consumer drives shrinks to the point where the other factors come into play.

Hard Drives By the Numbers

Since April 2013, Backblaze has recorded and saved daily hard drive statistics from the drives in our data centers. Each entry consists of the date, manufacturer, model, serial number, status (operational or failed), and all of the SMART attributes reported by that drive. Currently there are over 100 million entries. The complete data set used to create the information presented in this review is available on our Hard Drive Test Data page. You can download and use this data for free for your own purpose. All we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone. It is free.
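As a starting point for working with the raw files, here is a minimal sketch that tallies drive days and failures per model from a directory of daily CSVs. The directory layout and file-name pattern are assumptions, and it assumes the drive status appears as a 0/1 `failure` column as in the published files; adjust to however you unpack the downloads.

```python
import glob
import pandas as pd

# Each daily file has one row per operational drive, with a `failure` column
# that is 1 on the day a drive fails. Path pattern is an assumption; point it
# at wherever you unpacked the quarterly ZIP files.
files = glob.glob("drive_stats/2018-*.csv")

frames = [pd.read_csv(f, usecols=["date", "model", "failure"]) for f in files]
daily = pd.concat(frames, ignore_index=True)

summary = daily.groupby("model").agg(
    drive_days=("failure", "size"),   # one row per drive per day
    failures=("failure", "sum"),
)
summary["afr_pct"] = summary["failures"] / (summary["drive_days"] / 365) * 100
print(summary.sort_values("drive_days", ascending=False).head(10))
```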

If you just want the summarized data used to create the tables and charts in this blog post, you can download the ZIP file containing the MS Excel spreadsheet.

Good luck and let us know if you find anything interesting in the comments below or by contacting us directly.

The post Hard Drive Stats for Q2 2018 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Panna Cooking Creates the Perfect Storage Recipe with Backblaze and 45 Drives

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/panna-cooking-creates-perfect-storage-recipe/

Panna Cooking custard dessert with strawberries

Panna Cooking is the smart home cook’s go-to resource for learning to cook, video recipes, and shopping lists, all delivered by 40+ of the world’s best chefs. Video is the primary method Panna uses to communicate with their customers. Joshua Stenseth is a full-time editor and part-time IT consultant in charge of wrangling the video content Panna creates.

Like many organizations, Panna Cooking didn’t give digital media archive storage much thought over the years. Over time, more and more media projects were archived to various external hard drives and dutifully stored in the archive closet. The external drive archive was inexpensive and easy for Joshua to administer, until…

Joshua stared at the request from the chef. She wanted to update a recipe video from a year ago to include in her next weekly video. The edits being requested were the easy part as the new footage was ready to go. The trouble was locating the old digital video files. Over time, Panna had built up a digital video archive that resided on over 100 external hard drives scattered around the office. The digital files that Joshua needed were on one of those drives.

Panna Cooking, like many growing organizations, learned that the easy-to-do tasks, like using external hard drives for data archiving, don’t scale very well. This is especially true for media content such as video and photographs. It is easy to get overwhelmed.

Panna Cooking dessert

Joshua was given the task of ensuring all the content created by Panna was economically stored, readily available, and secured off site. The Panna Cooking case study details how Joshua was able to consolidate their existing scattered archive by employing the Hybrid Cloud Storage package from 45 Drives, a flexible, highly affordable, media archiving solution consisting of a 45 Drives Storinator storage server, Rclone, Duplicity, and B2 Cloud Storage.

The 45 Drives’ innovative Hybrid Cloud Storage package and their partnership with Backblaze B2 Cloud Storage was the perfect solution. The Hybrid Cloud Storage package installs on the Storinator system and utilizes Rclone or Duplicity to back up or sync files to the B2 cloud. This gave Panna a fully operational local storage system that sends changes automatically to the B2 cloud. For Joshua and his fellow editors, the Storinator/Backblaze B2 solution gave them the best of both worlds, high performance local storage and readily accessible, affordable off-site cloud storage all while eliminating their old archive: the closet full of external hard drives.

The post Panna Cooking Creates the Perfect Storage Recipe with Backblaze and 45 Drives appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Computer Backup Awareness in 2018: Getting Better and Getting Worse

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/computer-backup-awareness-in-2018/

Backup Frequency - 10 Years of History

Back in June 2008, Backblaze launched our first Backup Awareness Survey. Beginning with that survey and each year since, we’ve asked the folks at The Harris Poll to conduct our annual survey. For the last 11 years now, they’ve asked the simple question, “How often do you backup all the data on your computer?” Let’s see what they’ve found.

First, a Little History

While we did the first survey in 2008, it wasn’t until 2009, after the second survey was conducted, that we declared June as Backup Awareness Month, making June 2018 the 10th anniversary of Backup Awareness Month. But, why June? You’re probably thinking that June is a good time to remind people about backing up their computers. It’s before summer vacations in the northern hemisphere and the onset of winter down under. In truth, back in 2008 Backblaze was barely a year old and the survey, while interesting, got pushed aside as we launched the first beta of our cloud backup product on June 4, 2008. When June 2009 rolled around, we had a little more time and two years’ worth of data. Thus, Backup Awareness Month was born (P.S.: the contest is over).

More People Are Backing Up, But…

Fast forward to June 2018, and the folks at The Harris Poll have diligently delivered another survey. You can see the details about the survey methodology at the end of this post. Here’s a high level look at the results over the last 11 years.
Computer Backup Frequency

The percentage of people backing up all the data on their computer has steadily increased over the years, from 62% in 2008 to 76% in 2018. That’s awesome, but at the other end of the time spectrum it’s not so pretty. The percentage of people backing up once a day or more is 5.5% in 2018. That’s the lowest percentage ever reported for daily backup. Wouldn’t it be nice if there were a program you could install on your computer that would back up all the data automatically?

Here’s how 2018 compares to 2008 for how often people back up all the data on their computers.

Computer Data Backup Frequency in 2008
Computer Data Backup Frequency in 2018

A lot has happened over the last 11 years in the world of computing, but at least people are taking backing up their computers a little more seriously. And that’s a good thing.

A Few Data Backup Facts

Each survey provides interesting insights into the attributes of backup fiends and backup slackers. Here are a few facts from the 2018 survey.

Men

  • 21% of American males have never backed up all the data on their computers.
  • 11% of American males, 18-34 years old, have never backed up all the data on their computers.
  • 33% of American males, 65 years and older, have never backed up all the data on their computers.

Women

  • 26% of American females have never backed up all the data on their computers.
  • 22% of American females, 18-34 years old, have never backed up all the data on their computers.
  • 36% of American females, 65 years and older, have never backed up all the data on their computers.

When we look at the four regions in the United States, we see that in 2018 the percentage of people who have backed up all the data on their computer at least once was about the same across regions. This was not the case back in 2012 as seen below:

Year      Northeast   South   Midwest   West
2012      67%         73%     65%       77%
2018      75%         78%     75%       76%


Looking Back

Here are links to our previous blog posts on our annual Backup Awareness Survey:

Survey Method:

The surveys cited in this post were conducted online within the United States by The Harris Poll on behalf of Backblaze as follows: June 5-7, 2018 among 2,035 U.S. adults, among whom 1,871 own a computer. May 19-23, 2017 among 2048 U.S. adults, May 13-17, 2016 among 2,012 U.S. adults, May 15-19, 2015 among 2,090 U.S. adults, June 2-4, 2014 among 2,037 U.S. adults, June 13–17, 2013 among 2,021 U.S. adults, May 31–June 4, 2012 among 2,209 U.S. adults, June 28–30, 2011 among 2,257 U.S. adults, June 3–7, 2010 among 2,071 U.S. adults, May 13–14, 2009 among 2,185 U.S. adults, and May 27–29, 2008 among 2,761 U.S. adults. In all surveys, respondents consisted of U.S. adult computer users (aged 18+). These online surveys were not based on a probability sample and therefore no estimate of theoretical sampling error can be calculated. For complete survey methodology, including weighting variables and subgroup sample sizes, please contact Backblaze.

The 2018 Survey: Please note sample composition changed in the 2018 wave as new sample sources were introduced to ensure representativeness among all facets of the general population.

The post Computer Backup Awareness in 2018: Getting Better and Getting Worse appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

An Inside Look at Data Center Storage Integration: A Complex, Iterative, and Sustained Process

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/firmware-for-data-center-optimization/

Data Center with Backblaze Pod

How and Why Advanced Devices Go Through Evolution in the Field

By Jason Feist, Seagate Senior Director for Technology Strategy and Product Planning

One of the most powerful features in today’s hard drives is the ability to update the firmware of deployed hard drives. Firmware changes can be straightforward, such as changing a power setting, or as delicate as adjusting the height a read/write head flies above a spinning platter. By combining customer inputs, drive statistics, and a myriad of engineering talents, Seagate can use firmware updates to optimize the customer experience for the workload at hand.

In today’s guest post we are pleased to have Jason Feist, Senior Director for Technology Strategy and Product Planning at Seagate, describe how the Seagate ecosystem works.

 —Andy Klein

Storage Devices for the Data Center: Both Design-to-Application and In-Field Design Updates Are Important

As data center managers bring new IT architectures online, and as various installed components mature, technology device makers release firmware updates to enhance device operation, add features, and improve interoperability. The same is true for hard drives.

Hardware design takes years; firmware design can unlock the ability for that same hardware platform to persist in the field at the best cost structure if updates are deployed for continuous improvement over the product life cycle. In close and constant consultation with data center customers, hard drive engineers release firmware updates to ensure products provide the best experience in the field. Having the latest firmware is critical to ensure optimal drive operation and data center reliability. Likewise, as applications evolve, performance and features can mature over time to more effectively solve customer needs.

Data Center Managers Must Understand the Evolution of Data Center Needs, Architectures, and Solutions

Scientists and engineers at advanced technology companies like Seagate develop solutions based on understanding customers’ applications up front. But the job doesn’t end there; we also continue to assess and tweak devices in the field to fit very specific and evolving customer needs.

Likewise, the data center manager or IT architect must understand many technical considerations when installing new hardware. Integrating storage devices into a data center is never a matter of choosing any random hard drive or SSD that features a certain capacity or a certain IOPS specification. The data center manager must know the ins and outs of each storage device, and how it impacts myriad factors like performance, power, heat, and device interoperability.

But after rolling out new hardware, the job is not done. In fact, the job’s never done. Data center devices continue to evolve, even after integration. The hardware built for data centers is designed to be updated on a regular basis, based on a continuous cycle of feedback from ever-evolving applications and implementations.

As continued in-field quality assurance activities and updates in the field maintain the device’s appropriate interaction with the data center’s evolving needs, a device will continue to improve in terms of interoperability and performance until the architecture and the device together reach maturity. Managing these evolving needs and technology updates is a critical factor in achieving the expected best TCO (total cost of ownership) for the data center.

It’s important for data center managers to work closely with device makers to ensure integration is planned and executed correctly, monitoring and feedback is continuous, and updates are developed and deployed. In recent years as cloud and hyperscale data centers have evolved, Seagate has worked hard to develop a powerful support ecosystem for these partners.

The Team of Engineers Behind Storage Integration

The key to creating a successful program is to establish an application engineering and technical customer management team that’s engaged with the customer. Our engineering team meets with large data center customers on an ongoing basis. We work together from the pre-development phase to the time we qualify a new storage device. We collaborate to support in-field system monitoring, and sustaining activities like analyzing the logs on the hard drives, consulting about solutions within the data center, and ensuring the correct firmware updates are in place on the storage devices.

The science and engineering specialties on the team are extensive and varied. Depending on the topics at each meeting, analysis and discussion requires a breadth of engineering expertise. Dozens of engineering degrees and years of experience are on hand, including experts in firmware, servo control systems, mechanical engineering, tribology, electrical engineering, reliability, and manufacturing. The titles of contributing experts include computer engineers, aerospace engineers, test engineers, statisticians, data analysts, and material scientists. Within each discipline are unique specializations, such as ASIC engineers, channel technology engineers, and mechanical resonance engineers who understand shock and vibration factors.

The skills each engineer brings are necessary to understand the data customers are collecting and analyzing, how to deploy new products and technologies, and when to develop changes that’ll improve the data center’s architecture. It takes this team of engineering talent to comprehend the intricate interplay of devices, code, and processes needed to keep the architecture humming in harmony from the customer’s point of view.

How a Device Maker Works With a Data Center to Integrate and Sustain Performance and Reliability

After we establish our working team with a customer and when we’re introducing a new product for integration into a customer data center, we meet weekly to go over qualification status. We do a full design review on new features, consider the differences from the previous to the new design and how to address particular asks they may have for our next product design.

Traditionally, storage component designers would simply comply with whatever the T10 or T13 interface specification says. These days, many of the cloud data centers are asking for their own special sauce in some form, whether they’re trying to get a certain number of IOPS per terabyte, or trying to match their latency down to a certain number — for example, “I want to achieve four or five 9’s at this latency number; I want to be able to stream data at this rate; I want to have this power consumption.”

Recently, working with a customer to solve a specific need they had, we deployed Flex dynamic recording technology, which enables a single hard drive to use both SMR (Shingled Magnetic Recording) and CMR (Conventional Magnetic Recording, for example Perpendicular Recording) methods on the same drive media. This required a very high-level team integration with the customer. We spent great effort going back and forth on what an interface design should be, what a command protocol should be, and what the behavior should be in certain conditions.

Sometimes a drive design is unique to one customer, and sometimes it’s good for all our customers. There’s always a tradeoff; if you want really high performance, you’re probably going to pay for it with power. But when a customer asks for a certain special sauce, that drives us to figure out how to achieve that in balance with other needs. Then — similarly to when an automaker like Chevy or Honda builds race car engines and learns how to achieve new efficiency and performance levels — we can apply those new features to a broader customer set, and ultimately other customers will benefit too.

What Happens When Adjustments Are Needed in the Field?

Once a new product is integrated, we then continue to work closely from a sustaining standpoint. Our engineers interface directly with the customer’s team in the field, often in weekly meetings and sometimes even more frequently. We provide a full rundown on the device’s overall operation, dealing with maintenance and sustaining issues. For any error that comes up in the logs, we bring in an expert specific to that error to pore over the details.

In any given week we’ll have a couple of engineers in the customer’s data center monitoring new features and as needed debugging drive issues or issues with the customer’s system. Any time something seems amiss, we’ve got plans in place that let us do log analysis remotely and in the field.

Let’s take the example of a drive not performing as the customer intended. There are a number of reliability features in our drives that may interact with drive response — perhaps adding latency on the order of tens of milliseconds. We work with the customer on how we can manage those features more effectively. We help analyze the drive’s logs to tell them what’s going on and weigh the options. Is the latency a result of an important operation they can’t do without, and the drive won’t survive if we don’t allow that operation? Or is it something that we can defer or remove, prioritizing the workload goal?

How Storage Architecture and Design Has Changed for Cloud and Hyperscale

The way we work with cloud and data center partners has evolved over the years. Back when IT managers would outfit business data centers with turn-key systems, we were very familiar with the design requirements for traditional OEM systems with transaction-based workloads, RAID rebuild, and things of that nature. Generally, we were simply testing workloads that our customers ran against our drives.

As IT architects in the cloud space moved toward designing their data centers made-to-order, on open standards, they had a different notion of reliability and doing replication or erasure coding to create a more reliable environment. Understanding these workloads, gathering these traces and getting this information back from these customers was important so we could optimize drive performance under new and different design strategies: not just for performance, but for power consumption also. The number of drives populating large data centers is mind boggling, and when you realize what the power consumption is, you realize how important it is to optimize the drive for that particular variable.

Turning Information Into Improvements

We have always executed a highly standardized set of protocols on drives in our lab qualification environment, using racks that are well understood. In these scenarios, the behavior of the drive is well understood. By working directly with our cloud and data center partners we’re constantly learning from their unique environments.

For example, the customer’s architecture may have big fans in the back to help control temperature, and the fans operate with variable levels of cooling: as things warm up, the fans spin faster. At one point we may discover these fan operations are affecting the performance of the hard drive in the servo subsystem. Some of the drive logging our engineers do has been brilliant at solving issues like that. For example, we’d look at our position error signal, and we could actually tell how fast the fan was spinning based on the adjustments the drive was making to compensate for the acoustic noise generated by the fans.

Information like this is provided to our servo engineering team when they’re developing new products or firmware so they can make loop adjustments in our servo controllers to accommodate the range of frequencies we’re seeing from fans in the field. Rather than having the environment throw the drive’s heads off track, our team can provide compensation to keep the heads on track and let the drives perform reliably in environments like that. We can recreate the environmental conditions and measurements in our shop to validate we can control it as expected, and our future products inherit these benefits as we go forward.

In another example, we can monitor and work to improve data throughput while also maintaining reliability by understanding how the data center environment is affecting the read/write head’s ability to fly with stability at a certain height above the disk platter while reading bits. Understanding the ambient humidity and the temperature is essential to controlling the head’s fly height. We now have an active fly-height control system with the controller-firmware system and servo systems operating based on inputs from sensors within the drive. Traditionally a hard drive’s fly-height control was calibrated in the factory — a set-and-forget kind of thing. But with this field adjustable fly-height capability, the drive is continually monitoring environmental data. When the environment exceeds certain thresholds, the drive will recalculate what that fly height should be, so it’s optimally flying and getting the best air rates, ideally providing the best reliability in the field.

The Benefits of in-Field Analysis

These days a lot of information can be captured in logs and gathered from a drive to be brought back to our lab to inform design changes. You’re probably familiar with SMART logs that have been traditional in drives; this data provides a static snapshot in time of the status of a drive. In addition, field analysis reliability logs measure environmental factors the drive is experiencing like vibration, shock, and temperature. We can use this information to consider how the drive is responding and how firmware updates might deal with these factors more efficiently. For example, we might use that data to understand how a customer’s data center architecture might need to change a little bit to enable better performance, or reduce heat or power consumption, or lower vibrations.

What Does This Mean for the Data Center Manager?

There’s a wealth of information we can derive from the field, including field log data, customers’ direct feedback, and what our failure analysis teams have learned from returned drives. By actively participating in the process, our data center partners maximize the benefit of everything we’ve jointly learned about their environment so they can apply the latest firmware updates with confidence.

Updating firmware is an important part of fleet management that many data center operators struggle with. Some data centers may continue to run firmware even when an update is available, because they don’t have clear policies for managing firmware. Or they may avoid updates because they’re unsure if an update is right for their drive or their situation.

Would You Upgrade a Live Data Center?

Nobody wants their team to be responsible for allowing a server to go down due to a firmware issue. How will the team know when new firmware is available, and whether it applies to specific components in the installed configuration? One method is for IT architects to set up a regular quarterly schedule to review possible firmware updates for all data center components. At a minimum, devising a review and upgrade schedule requires maintaining a regular inventory of all critical equipment and setting up alerts or pull-push communications with each device maker so the team can review the latest release notes and schedule time to install updates as appropriate.

Firmware sent to the field for the purpose of updating in-service drives undergoes the same rigorous testing that the initial code goes through. In addition, the payload is verified to be compatible with the code and drive model that’s being updated. That means you can’t accidentally download firmware that isn’t for the intended drive. There are internal consistency checks to reject invalid code. Also, to help minimize performance impacts, firmware downloads support segmented download; the firmware can be downloaded in small pieces (the user can choose the size) so they can be interleaved with normal system work and have a minimal impact on performance. The host can decide when to activate the new code once the download is completed.
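As a purely conceptual illustration of the segmented approach described above (every function below is a hypothetical stand-in, not a real drive or vendor API):

```python
# Conceptual illustration only: the three helper functions are stubs standing in
# for whatever vendor tooling actually transfers and activates the firmware.
def send_firmware_segment(offset: int, segment: bytes) -> None:
    print(f"sent {len(segment)} bytes at offset {offset}")

def do_normal_io_work() -> None:
    pass  # placeholder: regular system work interleaved between segments

def activate_firmware() -> None:
    print("activation requested by host")

def download_firmware(image: bytes, segment_size: int = 64 * 1024) -> None:
    """Push a firmware image in user-chosen segments, then let the host activate it."""
    for offset in range(0, len(image), segment_size):
        send_firmware_segment(offset, image[offset:offset + segment_size])
        do_normal_io_work()
    activate_firmware()

download_firmware(b"\x00" * 200_000, segment_size=64 * 1024)
```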

In closing, working closely with data center managers and architects to glean information from the field is important because it helps bring Seagate’s engineering team closer to our customers. This is the most powerful piece of the equation. Seagate needs to know what our customers are experiencing because it may be new and different for us, too. We intend these tools and processes to help both data center architecture and hard drive science continue to evolve.

The post An Inside Look at Data Center Storage Integration: A Complex, Iterative, and Sustained Process appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

The Practical Effects of GDPR at Backblaze

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/the-practical-effects-of-gdpr-at-backblaze/


GDPR day, May 25, 2018, is nearly here. On that day, will your inbox explode with update notices, opt-in agreements, and offers from lawyers searching for GDPR violators? Perhaps all the companies on earth that are not GDPR ready will just dissolve into dust. More likely, there will be some changes, but business as usual will continue and we’ll all be more aware of data privacy. Let’s go with the last one.

What’s Different With GDPR at Backblaze

The biggest difference you’ll notice is a completely updated Privacy Policy. Last week we sent out a service email announcing the new Privacy Policy. Some people asked what was different. Basically everything. About 95% of the agreement was rewritten. In the agreement, we added in the appropriate provisions required by GDPR, and hopefully did a better job specifying the data we collect from you, why we collect it, and what we are going to do with it.

As a reminder, at Backblaze your data falls into two categories. The first type of data is the data you store with us — stored data. These are the files and objects you upload and store, and as needed, restore. We do not share this data. We do not process this data, except as requested by you to store and restore the data. We do not analyze this data looking for keywords, tags, images, etc. No one outside of Backblaze has access to this data unless you have explicitly shared the data by providing that person access to one or more files.

The second type of data is your account data. Some of your account data is considered personal data. This is the information we collect from you to provide our Personal Backup, Business Backup and B2 Cloud Storage services. Examples include your email address to provide access to your account, or the name of your computer so we can organize your files like they are arranged on your computer to make restoration easier. We have written a number of Help Articles covering the different ways this information is collected and processed. In addition, these help articles outline the various “rights” granted via GDPR. We will continue to add help articles over the coming weeks to assist in making it easy to work with us to understand and exercise your rights.

What’s New With GDPR at Backblaze

The most obvious addition is the Data Processing Addendum (DPA). This covers how we protect the data you store with us, i.e. stored data. As noted above, we don’t do anything with your data, except store it and keep it safe until you need it. Now we have a separate document saying that.

It is important to note the new Data Processing Addendum is now incorporated by reference into our Terms of Service, which everyone agrees to when they sign up for any of our services. Now all of our customers have a shiny new Data Processing Agreement to go along with the updated Privacy Policy. We promise they are not long or complicated, and we encourage you to read them. If you have any questions, stop by our GDPR help section on our website.

Patience, Please

Every company we have dealt with over the last few months is working hard to comply with GDPR. It has been a tough road whether you tried to do it yourself or like Backblaze, hired an EU-based law firm for advice. Over the coming weeks and months as you reach out to discover and assert your rights, please have a little patience. We are all going through a steep learning curve as GDPR gets put into practice. Along the way there are certain to be some growing pains — give us a chance, we all want to get it right.

Regardless, at Backblaze we’ve been diligently protecting our customers’ data for over 11 years and nothing that will happen on May 25th will change that.

The post The Practical Effects of GDPR at Backblaze appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

The Helium Factor and Hard Drive Failure Rates

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/helium-filled-hard-drive-failure-rates/

Seagate Enterprise Capacity 3.5 Helium HDD

In November 2013, the first commercially available helium-filled hard drive was introduced by HGST, a Western Digital subsidiary. The 6 TB drive was not only unique in being helium-filled, it was also, for the moment, the highest-capacity hard drive available. Fast forward a little over four years and 12 TB helium-filled drives are readily available, 14 TB drives can be found, and 16 TB helium-filled drives are arriving soon.

Backblaze has been purchasing and deploying helium-filled hard drives over the past year and we thought it was time to start looking at their failure rates compared to traditional air-filled drives. This post will provide an overview, then we’ll continue the comparison on a regular basis over the coming months.

The Promise and Challenge of Helium Filled Drives

We all know that helium is lighter than air — that’s why helium-filled balloons float. Inside of an air-filled hard drive there are rapidly spinning disk platters that rotate at a given speed, 7200 rpm for example. The air inside adds an appreciable amount of drag on the platters that in turn requires an appreciable amount of additional energy to spin the platters. Replacing the air inside of a hard drive with helium reduces the amount of drag, thereby reducing the amount of energy needed to spin the platters, typically by 20%.

We also know that after a few days, a helium-filled balloon sinks to the ground. This was one of the key challenges in using helium inside of a hard drive: helium escapes from most containers, even if they are well sealed. It took years for hard drive manufacturers to create containers that could contain helium while still functioning as a hard drive. This container innovation allows helium-filled drives to function at spec over the course of their lifetime.

Checking for Leaks

Three years ago, we identified SMART 22 as the attribute assigned to recording the status of helium inside of a hard drive. We have both HGST and Seagate helium-filled hard drives, but only the HGST drives currently report the SMART 22 attribute. It appears the normalized and raw values for SMART 22 currently report the same value, which starts at 100 and goes down.

To date only one HGST drive has reported a value of less than 100, with multiple readings between 94 and 99. That drive continues to perform fine, with no other errors or any correlating changes in temperature, so we are not sure whether the change in value is trying to tell us something or if it is just a wonky sensor.
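If you want to run the same check against the public drive-stats data, here is a minimal sketch. The column names assume the data set’s smart_N_raw / smart_N_normalized convention, and the file name is only an example.

```python
import pandas as pd

df = pd.read_csv("2018-03-31.csv")  # one day's drive-stats file; name is illustrative

# Only the HGST helium models report SMART 22, so drop rows where it is missing.
helium = df.dropna(subset=["smart_22_raw"])

# Flag any drive whose helium attribute has dropped below the nominal 100.
suspect = helium[helium["smart_22_raw"] < 100]
print(suspect[["serial_number", "model", "smart_22_raw", "smart_22_normalized"]])
```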

Helium versus Air-Filled Hard Drives

There are several different ways to compare these two types of drives. Below we decided to use just our 8, 10, and 12 TB drives in the comparison. We did this since we have helium-filled drives in those sizes. We left out of the comparison all of the drives that are 6 TB and smaller as none of the drive models we use are helium-filled. We are open to trying different comparisons. This just seemed to be the best place to start.

Lifetime Hard Drive Failure Rates: Helium vs. Air-Filled Hard Drives table

The most obvious observation is that there seems to be little difference in the Annualized Failure Rate (AFR) based on whether they contain helium or air. One conclusion, given this evidence, is that helium doesn’t affect the AFR of hard drives versus air-filled drives. My prediction is that the helium drives will eventually prove to have a lower AFR. Why? Drive Days.

Let’s go back in time to Q1 2017 when the air-filled drives listed in the table above had a similar number of Drive Days to the current number of Drive Days for the helium drives. We find that the failure rate for the air-filled drives at the time (Q1 2017) was 1.61%. In other words, when the drives were in use a similar number of hours, the helium drives had a failure rate of 1.06% while the failure rate of the air-filled drives was 1.61%.
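One way to make that kind of comparison reproducible is to accumulate drive days and failures by quarter for each group and read off the AFR at a matched level of exposure. A rough sketch follows; the quarterly totals below are made-up placeholders, not our data.

```python
import pandas as pd

# Made-up quarterly summaries for one drive group: replace with totals
# computed from the raw daily data for helium and air-filled drives.
quarters = pd.DataFrame({
    "quarter":    ["2016-Q3", "2016-Q4", "2017-Q1", "2017-Q2"],
    "drive_days": [1_200_000, 1_600_000, 2_100_000, 2_700_000],
    "failures":   [        60,        75,        93,       110],
})

quarters["cum_days"] = quarters["drive_days"].cumsum()
quarters["cum_failures"] = quarters["failures"].cumsum()
quarters["cum_afr_pct"] = quarters["cum_failures"] / (quarters["cum_days"] / 365) * 100

# Compare groups at the quarter where their cumulative drive days are closest.
print(quarters[["quarter", "cum_days", "cum_afr_pct"]])
```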

Helium or Air?

My hypothesis is that after normalizing the data so that the helium and air-filled drives have the same (or similar) usage (Drive Days), the helium-filled drives we use will continue to have a lower Annualized Failure Rate than the air-filled drives we use. I expect this trend to continue for at least the next year. What side do you come down on? Will the Annualized Failure Rate for helium-filled drives be better than that of air-filled drives, or vice versa? Or do you think the two technologies will eventually produce the same AFR over time? Pick a side and we’ll document the results over the next year and see where the data takes us.

The post The Helium Factor and Hard Drive Failure Rates appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Hard Drive Stats for Q1 2018

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hard-drive-stats-for-q1-2018/

Backblaze Drive Stats Q1 2018

As of March 31, 2018 we had 100,110 spinning hard drives. Of that number, there were 1,922 boot drives and 98,188 data drives. This review looks at the quarterly and lifetime statistics for the data drive models in operation in our data centers. We’ll also take a look at why we are collecting and reporting 10 new SMART attributes and take a sneak peek at some 8 TB Toshiba drives. Along the way, we’ll share observations and insights on the data presented and we look forward to you doing the same in the comments.

Background

Since April 2013, Backblaze has recorded and saved daily hard drive statistics from the drives in our data centers. Each entry consists of the date, manufacturer, model, serial number, status (operational or failed), and all of the SMART attributes reported by that drive. Currently there are about 97 million entries totaling 26 GB of data. You can download this data from our website if you want to do your own research, but for starters here’s what we found.

Hard Drive Reliability Statistics for Q1 2018

At the end of Q1 2018 Backblaze was monitoring 98,188 hard drives used to store data. For our evaluation below we remove from consideration those drives which were used for testing purposes and those drive models for which we did not have at least 45 drives. This leaves us with 98,046 hard drives. The table below covers just Q1 2018.

Q1 2018 Hard Drive Failure Rates

Notes and Observations

If a drive model has a failure rate of 0%, it only means there were no drive failures of that model during Q1 2018.

The overall Annualized Failure Rate (AFR) for Q1 is just 1.2%, well below the Q4 2017 AFR of 1.65%. Remember that quarterly failure rates can be volatile, especially for models that have a small number of drives and/or a small number of Drive Days.

There were 142 drives (98,188 minus 98,046) that were not included in the list above because we did not have at least 45 of a given drive model. We use 45 drives of the same model as the minimum number when we report quarterly, yearly, and lifetime drive statistics.

Welcome Toshiba 8TB drives, almost…

We mentioned Toshiba 8 TB drives in the first paragraph, but they don’t show up in the Q1 Stats chart. What gives? We only had 20 of the Toshiba 8 TB drives in operation in Q1, so they were excluded from the chart. Why do we have only 20 drives? When we test out a new drive model we start with the “tome test” and it takes 20 drives to fill one tome. A tome is the same drive model in the same logical position in each of the 20 Storage Pods that make up a Backblaze Vault. There are 60 tomes in each vault.

In this test, we created a Backblaze Vault of 8 TB drives, with 59 of the tomes being Seagate 8 TB drives and 1 tome being the Toshiba drives. Then we monitored the performance of the vault and its member tomes to see if, in this case, the Toshiba drives performed as expected.
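To put the tome and vault numbers above in perspective, a quick sketch of the arithmetic:

```python
# Numbers from the tome description above.
pods_per_vault = 20                 # Storage Pods per Backblaze Vault
tomes_per_vault = 60                # one tome per drive slot in each pod
drives_per_tome = pods_per_vault    # same logical slot across all 20 pods

drives_per_vault = tomes_per_vault * drives_per_tome
raw_capacity_pb = drives_per_vault * 8 / 1_000    # 8 TB drives in this test vault

print(f"{drives_per_vault} drives per vault, ~{raw_capacity_pb:.1f} PB raw")  # 1200 drives, ~9.6 PB
```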

Q1 2018 Hard Drive Failure Rate — Toshiba 8TB

So far the Toshiba drive is performing fine, but they have been in place for only 20 days. Next up is the “pod test” where we fill a Storage Pod with Toshiba drives and integrate it into a Backblaze Vault comprised of like-sized drives. We hope to have a better look at the Toshiba 8 TB drives in our Q2 report — stay tuned.

Lifetime Hard Drive Reliability Statistics

While the quarterly chart presented earlier gets a lot of interest, the real test of any drive model is over time. Below is the lifetime failure rate chart for all the hard drive models that have 45 or more drives in operation as of March 31st, 2018. For each model, we compute its reliability starting from when it was first installed.

Lifetime Hard Drive Failure Rates

Notes and Observations

The failure rates of all of the larger drives (8-, 10- and 12 TB) are very good, 1.2% AFR (Annualized Failure Rate) or less. Many of these drives were deployed in the last year, so there is some volatility in the data, but you can use the Confidence Interval to get a sense of the failure percentage range.

The overall failure rate of 1.84% is the lowest we have ever achieved, besting the previous low of 2.00% from the end of 2017.

Our regular readers and drive stats wonks may have noticed a sizable jump in the number of HGST 8 TB drives (model: HUH728080ALE600), from 45 last quarter to 1,045 this quarter. As the 10 TB and 12 TB drives become more available, the price per terabyte of the 8 TB drives has gone down. This presented an opportunity to purchase the HGST drives at a price in line with our budget.

We purchased and placed into service the 45 original HGST 8 TB drives in Q2 of 2015. They were our first Helium-filled drives and our only ones until the 10 TB and 12 TB Seagate drives arrived in Q3 2017. We’ll take a first look into whether or not Helium makes a difference in drive failure rates in an upcoming blog post.

New SMART Attributes

If you have previously worked with the hard drive stats data or plan to, you’ll notice that we added 10 more columns of data starting in 2018. There are 5 new SMART attributes we are tracking each with a raw and normalized value:

  • 177 – Wear Range Delta
  • 179 – Used Reserved Block Count Total
  • 181 – Program Fail Count Total or Non-4K Aligned Access Count
  • 182 – Erase Fail Count
  • 235 – Good Block Count AND System (Free) Block Count

The 5 values are all related to SSD drives.

Yes, SSD drives, but before you jump to any conclusions, we used 10 Samsung 850 EVO SSDs as boot drives for a period of time in Q1. This was an experiment to see if we could reduce boot up time for the Storage Pods. In our case, the improved boot up speed wasn’t worth the SSD cost, but it did add 10 new columns to the hard drive stats data.
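If your loading scripts were built around the 2017 schema, one quick way to see exactly what changed is to diff the headers of an old and a new daily file. A minimal sketch; the file names are illustrative.

```python
import csv

def header(path: str) -> set[str]:
    """Return the column names from the first row of a daily drive-stats CSV."""
    with open(path, newline="") as f:
        return set(next(csv.reader(f)))

# File names are illustrative -- use any daily file from 2017 and one from 2018.
old_cols = header("2017-12-31.csv")
new_cols = header("2018-03-31.csv")

added = sorted(new_cols - old_cols)
print("New columns in 2018:", added)
# Expect the raw/normalized pairs for SMART 177, 179, 181, 182, and 235.
```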

Speaking of hard drive stats data, the complete data set used to create the information used in this review is available on our Hard Drive Test Data page. You can download and use this data for free for your own purpose, all we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone. It is free.

If you just want the summarized data used to create the tables and charts in this blog post, you can download the ZIP file containing the MS Excel spreadsheet.

Good luck and let us know if you find anything interesting.

[Ed: 5/1/2018 – Updated Lifetime chart to fix error in confidence interval for HGST 4TB drive, model: HDS5C4040ALE630]

The post Hard Drive Stats for Q1 2018 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.