An Inside Look at Data Center Storage Integration: A Complex, Iterative, and Sustained Process

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/firmware-for-data-center-optimization/

Data Center with Backblaze Pod

How and Why Advanced Devices Go Through Evolution in the Field

By Jason Feist, Seagate Senior Director for Technology Strategy and Product Planning

One of the most powerful features in today’s hard drives is the ability to update the firmware of deployed hard drives. Firmware changes can be straightforward, such as changing a power setting, or as delicate as adjusting the height a read/write head flies above a spinning platter. By combining customer inputs, drive statistics, and a myriad of engineering talents, Seagate can use firmware updates to optimize the customer experience for the workload at hand.

In today’s guest post we are pleased to have Jason Feist, Senior Director for Technology Strategy and Product Planning at Seagate, describe how the Seagate ecosystem works.

 —Andy Klein

Storage Devices for the Data Center: Both Design-to-Application and In-Field Design Updates Are Important

As data center managers bring new IT architectures online, and as various installed components mature, technology device makers release firmware updates to enhance device operation, add features, and improve interoperability. The same is true for hard drives.

Hardware design takes years; firmware, deployed for continuous improvement over the product life cycle, lets that same hardware platform persist in the field at the best possible cost structure. In close and constant consultation with data center customers, hard drive engineers release firmware updates to ensure products provide the best experience in the field. Having the latest firmware is critical to optimal drive operation and data center reliability. Likewise, as applications evolve, performance and features can mature over time to more effectively meet customer needs.

Data Center Managers Must Understand the Evolution of Data Center Needs, Architectures, and Solutions

Scientists and engineers at advanced technology companies like Seagate develop solutions based on understanding customers’ applications up front. But the job doesn’t end there; we also continue to assess and tweak devices in the field to fit very specific and evolving customer needs.

Likewise, the data center manager or IT architect must understand many technical considerations when installing new hardware. Integrating storage devices into a data center is never a matter of choosing any random hard drive or SSD that features a certain capacity or a certain IOPS specification. The data center manager must know the ins and outs of each storage device, and how it impacts myriad factors like performance, power, heat, and device interoperability.

But after rolling out new hardware, the job is not done. In fact, the job’s never done. Data center devices continue to evolve, even after integration. The hardware built for data centers is designed to be updated on a regular basis, based on a continuous cycle of feedback from ever-evolving applications and implementations.

As continued quality assurance activities and in-field updates keep the device aligned with the data center’s evolving needs, the device will continue to improve in interoperability and performance until the architecture and the device together reach maturity. Managing these evolving needs and technology updates is a critical factor in achieving the best possible TCO (total cost of ownership) for the data center.

It’s important for data center managers to work closely with device makers to ensure integration is planned and executed correctly, monitoring and feedback is continuous, and updates are developed and deployed. In recent years as cloud and hyperscale data centers have evolved, Seagate has worked hard to develop a powerful support ecosystem for these partners.

The Team of Engineers Behind Storage Integration

The key to creating a successful program is to establish an application engineering and technical customer management team that’s engaged with the customer. Our engineering team meets with large data center customers on an ongoing basis. We work together from the pre-development phase to the time we qualify a new storage device. We collaborate to support in-field system monitoring and sustaining activities such as analyzing the logs on the hard drives, consulting about solutions within the data center, and ensuring the correct firmware updates are in place on the storage devices.

The science and engineering specialties on the team are extensive and varied. Depending on the topics at each meeting, analysis and discussion requires a breadth of engineering expertise. Dozens of engineering degrees and years of experience are on hand, including experts in firmware, servo control systems, mechanical engineering, tribology, electrical engineering, reliability, and manufacturing. The contributors’ titles include computer engineers, aerospace engineers, test engineers, statisticians, data analysts, and materials scientists. Within each discipline are unique specializations such as ASIC engineers, channel technology engineers, and mechanical resonance engineers who understand shock and vibration factors.

The skills each engineer brings are necessary to understand the data customers are collecting and analyzing, how to deploy new products and technologies, and when to develop changes that’ll improve the data center’s architecture. It takes this team of engineering talent to comprehend the intricate interplay of devices, code, and processes needed to keep the architecture humming in harmony from the customer’s point of view.

How a Device Maker Works With a Data Center to Integrate and Sustain Performance and Reliability

Once we’ve established our working team with a customer and we’re introducing a new product for integration into their data center, we meet weekly to go over qualification status. We do a full design review of new features, compare the previous design with the new one, and discuss how to address particular requests they may have for our next product design.

Traditionally, storage component designers would simply comply with whatever the T10 or T13 interface specification says. These days, many of the cloud data centers are asking for their own special sauce in some form, whether they’re trying to get a certain number of IOPS per terabyte, or trying to match their latency down to a certain number — for example, “I want to achieve four or five 9’s at this latency number; I want to be able to stream data at this rate; I want to have this power consumption.”
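
To make the “four or five 9’s” idea concrete, here is a minimal sketch (ours, not Seagate’s tooling) of how an operator might check a tail-latency target against measured I/O completion times. The target value and the synthetic measurements are made up for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    latencies_ms = rng.lognormal(mean=1.0, sigma=0.6, size=1_000_000)  # stand-in measurements
    target_ms = 50.0  # hypothetical requirement: 99.99% of I/Os finish within 50 ms
    p9999 = np.percentile(latencies_ms, 99.99)
    verdict = "meets" if p9999 <= target_ms else "misses"
    print(f"99.99th percentile latency: {p9999:.1f} ms ({verdict} the {target_ms} ms target)")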

Recently, working with a customer to solve a specific need they had, we deployed Flex dynamic recording technology, which enables a single hard drive to use both SMR (Shingled Magnetic Recording) and CMR (Conventional Magnetic Recording, for example Perpendicular Recording) methods on the same drive media. This required a very high level of integration between our team and the customer’s. We spent a great deal of effort going back and forth on what the interface design should be, what the command protocol should be, and what the behavior should be in certain conditions.

Sometimes a drive design is unique to one customer, and sometimes it’s good for all our customers. There’s always a tradeoff; if you want really high performance, you’re probably going to pay for it with power. But when a customer asks for a certain special sauce, that drives us to figure out how to achieve that in balance with other needs. Then — similarly to when an automaker like Chevy or Honda builds race car engines and learns how to achieve new efficiency and performance levels — we can apply those new features to a broader customer set, and ultimately other customers will benefit too.

What Happens When Adjustments Are Needed in the Field?

Once a new product is integrated, we then continue to work closely from a sustaining standpoint. Our engineers interface directly with the customer’s team in the field, often in weekly meetings and sometimes even more frequently. We provide a full rundown on the device’s overall operation, dealing with maintenance and sustaining issues. For any error that comes up in the logs, we bring in an expert specific to that error to pore over the details.

In any given week we’ll have a couple of engineers in the customer’s data center monitoring new features and as needed debugging drive issues or issues with the customer’s system. Any time something seems amiss, we’ve got plans in place that let us do log analysis remotely and in the field.

Let’s take the example of a drive not performing as the customer intended. There are a number of reliability features in our drives that may interact with drive response — perhaps adding latency on the order of tens of milliseconds. We work with the customer on how we can manage those features more effectively. We help analyze the drive’s logs to tell them what’s going on and weigh the options. Is the latency a result of an important operation they can’t do without, and the drive won’t survive if we don’t allow that operation? Or is it something that we can defer or remove, prioritizing the workload goal?

How Storage Architecture and Design Has Changed for Cloud and Hyperscale

The way we work with cloud and data center partners has evolved over the years. Back when IT managers would outfit business data centers with turn-key systems, we were very familiar with the design requirements for traditional OEM systems with transaction-based workloads, RAID rebuild, and things of that nature. Generally, we were simply testing workloads that our customers ran against our drives.

As IT architects in the cloud space moved toward designing their data centers made-to-order, on open standards, they had a different notion of reliability, using replication or erasure coding to create a more reliable environment. Understanding these workloads, gathering traces, and getting this information back from these customers was important so we could optimize the drives for new and different design strategies: not just for performance, but for power consumption as well. The number of drives populating large data centers is mind-boggling, and when you realize what the power consumption is, you realize how important it is to optimize the drive for that particular variable.

Turning Information Into Improvements

We have always executed a highly standardized set of protocols on drives in our lab qualification environment, using racks that are well understood; in that setting, the behavior of the drive is well characterized. By working directly with our cloud and data center partners we’re constantly learning from their unique environments.

For example, the customer’s architecture may have big fans in the back to help control temperature, and the fans operate with variable levels of cooling: as things warm up, the fans spin faster. At one point we may discover these fan operations are affecting the performance of the hard drive in the servo subsystem. Some of the drive logging our engineers do has been brilliant at solving issues like that. For example, we’d look at our position error signal, and we could actually tell how fast the fan was spinning based on the adjustments the drive was making to compensate for the acoustic noise generated by the fans.
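
As an illustration of the kind of analysis described above (a toy example, not Seagate’s internal tooling), a periodic disturbance such as fan noise shows up as a spectral peak in the drive’s position error signal, and its frequency can be read straight off an FFT. All of the numbers below are invented for the example.

    import numpy as np

    fs = 10_000                    # assumed PES sample rate, in Hz
    t = np.arange(0, 1.0, 1 / fs)
    fan_hz = 120.0                 # hypothetical fan disturbance frequency
    pes = 0.02 * np.sin(2 * np.pi * fan_hz * t) + 0.005 * np.random.randn(t.size)

    spectrum = np.abs(np.fft.rfft(pes))
    freqs = np.fft.rfftfreq(t.size, 1 / fs)
    peak_hz = freqs[np.argmax(spectrum[1:]) + 1]          # skip the DC bin
    print(f"Dominant disturbance near {peak_hz:.0f} Hz")  # ~120 Hz points back at the fan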

Information like this is provided to our servo engineering team when they’re developing new products or firmware so they can make loop adjustments in our servo controllers to accommodate the range of frequencies we’re seeing from fans in the field. Rather than having the environment throw the drive’s heads off track, our team can provide compensation to keep the heads on track and let the drives perform reliably in environments like that. We can recreate the environmental conditions and measurements in our shop to validate we can control it as expected, and our future products inherit these benefits as we go forward.

In another example, we can monitor and work to improve data throughput while also maintaining reliability by understanding how the data center environment is affecting the read/write head’s ability to fly with stability at a certain height above the disk platter while reading bits. Understanding the ambient humidity and the temperature is essential to controlling the head’s fly height. We now have an active fly-height control system with the controller-firmware system and servo systems operating based on inputs from sensors within the drive. Traditionally a hard drive’s fly-height control was calibrated in the factory — a set-and-forget kind of thing. But with this field adjustable fly-height capability, the drive is continually monitoring environmental data. When the environment exceeds certain thresholds, the drive will recalculate what that fly height should be, so it’s optimally flying and getting the best error rates, ideally providing the best reliability in the field.
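
Conceptually, the threshold-triggered recalibration works along these lines. This sketch is only an illustration: the sensor names, thresholds, and fly-height model are hypothetical stand-ins for logic that actually lives inside the drive’s firmware.

    def estimated_fly_height_nm(temp_c, humidity_pct):
        """Toy stand-in for the drive's internal fly-height model (coefficients invented)."""
        return 10.0 - 0.02 * (temp_c - 25.0) - 0.01 * (humidity_pct - 50.0)

    def maybe_recalibrate(temp_c, humidity_pct, last_cal, temp_band=5.0, humidity_band=10.0):
        """Recompute the fly-height setting only when the environment drifts past a threshold."""
        if (abs(temp_c - last_cal["temp_c"]) > temp_band
                or abs(humidity_pct - last_cal["humidity_pct"]) > humidity_band):
            return {"temp_c": temp_c, "humidity_pct": humidity_pct,
                    "height_nm": estimated_fly_height_nm(temp_c, humidity_pct)}
        return last_cal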

The Benefits of In-Field Analysis

These days a lot of information can be captured in logs and gathered from a drive to be brought back to our lab to inform design changes. You’re probably familiar with SMART logs that have been traditional in drives; this data provides a static snapshot in time of the status of a drive. In addition, field analysis reliability logs measure environmental factors the drive is experiencing like vibration, shock, and temperature. We can use this information to consider how the drive is responding and how firmware updates might deal with these factors more efficiently. For example, we might use that data to understand how a customer’s data center architecture might need to change a little bit to enable better performance, or reduce heat or power consumption, or lower vibrations.

What Does This Mean for the Data Center Manager?

There’s a wealth of information we can derive from the field, including field log data, customers’ direct feedback, and what our failure analysis teams have learned from returned drives. By actively participating in the process, our data center partners maximize the benefit of everything we’ve jointly learned about their environment so they can apply the latest firmware updates with confidence.

Updating firmware is an important part of fleet management that many data center operators struggle with. Some data centers may continue to run older firmware even when an update is available, because they don’t have clear policies for managing firmware. Or they may avoid updates because they’re unsure whether an update is right for their drive or their situation.

Would You Upgrade a Live Data Center?

Nobody wants their team to be responsible for allowing a server to go down due to a firmware issue. How will the team know when new firmware is available, and whether it applies to specific components in the installed configuration? One method is for IT architects to set up a regular quarterly schedule to review possible firmware updates of all data center components. At the least, devising a review and upgrade schedule requires maintaining a regular inventory of all critical equipment, and setting up alerts or pull-push communications with each device maker so the team can review the latest release notes and schedule time to install updates as appropriate.

Firmware sent to the field for the purpose of updating in-service drives undergoes the same rigorous testing that the initial code goes through. In addition, the payload is verified to be compatible with the code and drive model that’s being updated. That means you can’t accidentally download firmware that isn’t for the intended drive. There are internal consistency checks to reject invalid code. Also, firmware downloads support segmented download: the firmware can be downloaded in small pieces (the user can choose the size) so the pieces can be interleaved with normal system work, minimizing the impact on performance. The host can decide when to activate the new code once the download is completed.
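
As a concrete illustration of that segmented approach, here is a minimal host-side sketch. The send_segment function is a hypothetical stand-in for the drive’s actual download-microcode interface, which in practice you would reach through vendor or operating system tooling rather than code like this.

    CHUNK_SIZE = 64 * 1024  # user-selectable segment size

    def send_segment(offset, data):
        """Hypothetical stand-in for issuing one download-with-offset command to the drive."""
        print(f"sent {len(data)} bytes at offset {offset:#x}")

    def staged_firmware_download(image_path):
        """Stream a firmware image to the drive in pieces so normal I/O can run in between."""
        offset = 0
        with open(image_path, "rb") as image:
            while True:
                chunk = image.read(CHUNK_SIZE)
                if not chunk:
                    break
                send_segment(offset, chunk)  # the drive buffers and validates each piece
                offset += len(chunk)
        # Activating the new code is a separate, host-initiated step after the full
        # image has been staged and verified.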

In closing, working closely with data center managers and architects to glean information from the field is important because it helps bring Seagate’s engineering team closer to our customers. This is the most powerful piece of the equation. Seagate needs to know what our customers are experiencing because it may be new and different for us, too. We intend these tools and processes to help both data center architecture and hard drive science continue to evolve.

The Helium Factor and Hard Drive Failure Rates

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/helium-filled-hard-drive-failure-rates/

Seagate Enterprise Capacity 3.5 Helium HDD

In November 2013, the first commercially available helium-filled hard drive was introduced by HGST, a Western Digital subsidiary. The 6 TB drive was not only unique in being helium-filled, it was, for the moment, the highest capacity hard drive available. Fast forward a little over four years and 12 TB helium-filled drives are readily available, 14 TB drives can be found, and 16 TB helium-filled drives are arriving soon.

Backblaze has been purchasing and deploying helium-filled hard drives over the past year and we thought it was time to start looking at their failure rates compared to traditional air-filled drives. This post will provide an overview, then we’ll continue the comparison on a regular basis over the coming months.

The Promise and Challenge of Helium Filled Drives

We all know that helium is lighter than air — that’s why helium-filled balloons float. Inside of an air-filled hard drive there are rapidly spinning disk platters that rotate at a given speed, 7200 rpm for example. The air inside creates an appreciable amount of drag on the platters, which in turn requires additional energy to spin them. Replacing the air inside of a hard drive with helium reduces that drag, thereby reducing the amount of energy needed to spin the platters, typically by 20%.

We also know that after a few days, a helium-filled balloon sinks to the ground. This was one of the key challenges in using helium inside of a hard drive: helium escapes from most containers, even if they are well sealed. It took years for hard drive manufacturers to create containers that could contain helium while still functioning as a hard drive. This container innovation allows helium-filled drives to function at spec over the course of their lifetime.

Checking for Leaks

Three years ago, we identified SMART 22 as the attribute assigned to recording the status of helium inside of a hard drive. We have both HGST and Seagate helium-filled hard drives, but only the HGST drives currently report the SMART 22 attribute. It appears the normalized and raw values for SMART 22 currently report the same value, which starts at 100 and goes down.

To date only one HGST drive has reported a value of less than 100, with multiple readings between 94 and 99. That drive continues to perform fine, with no other errors or any correlating changes in temperature, so we are not sure whether the change in value is trying to tell us something or if it is just a wonky sensor.
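
If you want to look for the same thing yourself, a few lines of Python against the downloadable drive stats files will do it. The example below assumes one of the daily CSV files from our Hard Drive Test Data page and the smart_22_raw column naming used in that data set; the specific file name is just a placeholder.

    import pandas as pd

    day = pd.read_csv("2018-03-31.csv")              # one daily file from the public data set
    helium = day.dropna(subset=["smart_22_raw"])     # keep only drives that report SMART 22
    suspect = helium[helium["smart_22_raw"] < 100]   # values below 100 may hint at a helium issue
    print(suspect[["serial_number", "model", "smart_22_raw"]])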

Helium versus Air-Filled Hard Drives

There are several different ways to compare these two types of drives. Below, we decided to use just our 8, 10, and 12 TB drives in the comparison, since those are the capacities in which we have helium-filled drives. We left out all of the drives that are 6 TB and smaller, as none of the drive models we use at those capacities are helium-filled. We are open to trying different comparisons; this just seemed to be the best place to start.

Lifetime Hard Drive Failure Rates: Helium vs. Air-Filled Hard Drives table

The most obvious observation is that there seems to be little difference in the Annualized Failure Rate (AFR) based on whether they contain helium or air. One conclusion, given this evidence, is that helium doesn’t affect the AFR of hard drives versus air-filled drives. My prediction is that the helium drives will eventually prove to have a lower AFR. Why? Drive Days.

Let’s go back in time to Q1 2017 when the air-filled drives listed in the table above had a similar number of Drive Days to the current number of Drive Days for the helium drives. We find that the failure rate for the air-filled drives at the time (Q1 2017) was 1.61%. In other words, when the drives were in use a similar number of hours, the helium drives had a failure rate of 1.06% while the failure rate of the air-filled drives was 1.61%.
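
For readers who want to reproduce these comparisons from the raw data, the annualized failure rate we quote is, in essence, failures per drive-year expressed as a percentage. A one-line sketch of the calculation, consistent with the figures in this post, looks like this:

    def annualized_failure_rate(failures, drive_days):
        """Failures per drive-year, expressed as a percentage."""
        return failures / (drive_days / 365.0) * 100.0

Comparing the two groups at a similar number of Drive Days, as above, keeps the denominators comparable.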

Helium or Air?

My hypothesis is that after normalizing the data so that the helium and air-filled drives have the same (or similar) usage (Drive Days), the helium-filled drives we use will continue to have a lower Annualized Failure Rate versus the air-filled drives we use. I expect this trend to continue for the next year at least. What side do you come down on? Will the Annualized Failure Rate for helium-filled drives be better than air-filled drives or vice-versa? Or do you think the two technologies will eventually produce the same AFR over time? Pick a side and we’ll document the results over the next year and see where the data takes us.

Hard Drive Stats for Q1 2018

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hard-drive-stats-for-q1-2018/

Backblaze Drive Stats Q1 2018

As of March 31, 2018 we had 100,110 spinning hard drives. Of that number, there were 1,922 boot drives and 98,188 data drives. This review looks at the quarterly and lifetime statistics for the data drive models in operation in our data centers. We’ll also take a look at why we are collecting and reporting five new SMART attributes (10 new columns of data) and take a sneak peek at some 8 TB Toshiba drives. Along the way, we’ll share observations and insights on the data presented and we look forward to you doing the same in the comments.

Background

Since April 2013, Backblaze has recorded and saved daily hard drive statistics from the drives in our data centers. Each entry consists of the date, manufacturer, model, serial number, status (operational or failed), and all of the SMART attributes reported by that drive. Currently there are about 97 million entries totaling 26 GB of data. You can download this data from our website if you want to do your own research, but for starters here’s what we found.
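
If you do download the data, a few lines of Python are enough to get from the daily files to per-model drive days, failures, and failure rates. This is only a starting sketch, not our production pipeline; the folder name is a placeholder, and the 45-drive cutoff mirrors the reporting rule described below.

    import glob
    import pandas as pd

    # One row per drive per day; point the glob at wherever you unpacked the quarter's files.
    days = pd.concat(pd.read_csv(path) for path in glob.glob("drive_stats_2018_Q1/*.csv"))

    per_model = days.groupby("model").agg(
        drive_days=("serial_number", "size"),
        failures=("failure", "sum"),
        drive_count=("serial_number", "nunique"),
    )
    per_model = per_model[per_model["drive_count"] >= 45]  # mirror the 45-drive reporting cutoff
    per_model["afr_pct"] = per_model["failures"] / (per_model["drive_days"] / 365) * 100
    print(per_model.sort_values("afr_pct"))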

Hard Drive Reliability Statistics for Q1 2018

At the end of Q1 2018 Backblaze was monitoring 98,188 hard drives used to store data. For our evaluation below we remove from consideration those drives which were used for testing purposes and those drive models for which we did not have at least 45 drives. This leaves us with 98,046 hard drives. The table below covers just Q1 2018.

Q1 2018 Hard Drive Failure Rates

Notes and Observations

If a drive model has a failure rate of 0%, it only means there were no drive failures of that model during Q1 2018.

The overall Annualized Failure Rate (AFR) for Q1 is just 1.2%, well below the Q4 2017 AFR of 1.65%. Remember that quarterly failure rates can be volatile, especially for models that have a small number of drives and/or a small number of Drive Days.

There were 142 drives (98,188 minus 98,046) that were not included in the list above because we did not have at least 45 of a given drive model. We use 45 drives of the same model as the minimum number when we report quarterly, yearly, and lifetime drive statistics.

Welcome Toshiba 8TB drives, almost…

We mentioned Toshiba 8 TB drives in the first paragraph, but they don’t show up in the Q1 Stats chart. What gives? We only had 20 of the Toshiba 8 TB drives in operation in Q1, so they were excluded from the chart. Why do we have only 20 drives? When we test out a new drive model we start with the “tome test” and it takes 20 drives to fill one tome. A tome is the same drive model in the same logical position in each of the 20 Storage Pods that make up a Backblaze Vault. There are 60 tomes in each vault.

In this test, we created a Backblaze Vault of 8 TB drives, with 59 of the tomes being Seagate 8 TB drives and 1 tome being the Toshiba drives. Then we monitored the performance of the vault and its member tomes to see if, in this case, the Toshiba drives performed as expected.

Q1 2018 Hard Drive Failure Rate — Toshiba 8TB

So far the Toshiba drive is performing fine, but they have been in place for only 20 days. Next up is the “pod test” where we fill a Storage Pod with Toshiba drives and integrate it into a Backblaze Vault comprised of like-sized drives. We hope to have a better look at the Toshiba 8 TB drives in our Q2 report — stay tuned.

Lifetime Hard Drive Reliability Statistics

While the quarterly chart presented earlier gets a lot of interest, the real test of any drive model is over time. Below is the lifetime failure rate chart for all the hard drive models which have 45 or more drives in operation as of March 31st, 2018. For each model, we compute their reliability starting from when they were first installed.

Lifetime Hard Drive Failure Rates

Notes and Observations

The failure rates of all of the larger drives (8-, 10- and 12 TB) are very good, 1.2% AFR (Annualized Failure Rate) or less. Many of these drives were deployed in the last year, so there is some volatility in the data, but you can use the Confidence Interval to get a sense of the failure percentage range.

The overall failure rate of 1.84% is the lowest we have ever achieved, besting the previous low of 2.00% from the end of 2017.

Our regular readers and drive stats wonks may have noticed a sizable jump in the number of HGST 8 TB drives (model: HUH728080ALE600), from 45 last quarter to 1,045 this quarter. As the 10 TB and 12 TB drives become more available, the price per terabyte of the 8 TB drives has gone down. This presented an opportunity to purchase the HGST drives at a price in line with our budget.

We purchased and placed into service the 45 original HGST 8 TB drives in Q2 of 2015. They were our first helium-filled drives and our only ones until the 10 TB and 12 TB Seagate drives arrived in Q3 2017. We’ll take a first look into whether or not helium makes a difference in drive failure rates in an upcoming blog post.

New SMART Attributes

If you have previously worked with the hard drive stats data or plan to, you’ll notice that we added 10 more columns of data starting in 2018. There are five new SMART attributes we are tracking, each with a raw and a normalized value:

  • 177 – Wear Range Delta
  • 179 – Used Reserved Block Count Total
  • 181 – Program Fail Count Total or Non-4K Aligned Access Count
  • 182 – Erase Fail Count
  • 235 – Good Block Count AND System(Free) Block Count

The 5 values are all related to SSD drives.
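
If you script against the data files, the five attributes show up as ten columns, one raw and one normalized value each. Assuming they follow the same smart_<id>_raw / smart_<id>_normalized naming pattern as the existing SMART columns, the full list is easy to generate:

    new_columns = [f"smart_{attr}_{kind}"
                   for attr in (177, 179, 181, 182, 235)
                   for kind in ("raw", "normalized")]
    # ['smart_177_raw', 'smart_177_normalized', ..., 'smart_235_raw', 'smart_235_normalized']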

Yes, SSD drives, but before you jump to any conclusions, we used 10 Samsung 850 EVO SSDs as boot drives for a period of time in Q1. This was an experiment to see if we could reduce boot up time for the Storage Pods. In our case, the improved boot up speed wasn’t worth the SSD cost, but it did add 10 new columns to the hard drive stats data.

Speaking of hard drive stats data, the complete data set used to create the information used in this review is available on our Hard Drive Test Data page. You can download and use this data for free for your own purpose, all we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone. It is free.

If you just want the summarized data used to create the tables and charts in this blog post, you can download the ZIP file containing the MS Excel spreadsheet.

Good luck and let us know if you find anything interesting.

[Ed: 5/1/2018 – Updated Lifetime chart to fix error in confidence interval for HGST 4TB drive, model: HDS5C4040ALE630]

Backblaze Hard Drive Stats for 2017

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hard-drive-stats-for-2017/

Backblaze Drive Stats 2017 Review

Beginning in April 2013, Backblaze has recorded and saved daily hard drive statistics from the drives in our data centers. Each entry consists of the date, manufacturer, model, serial number, status (operational or failed), and all of the SMART attributes reported by that drive. As of the end of 2017, there are about 88 million entries totaling 23 GB of data. You can download this data from our website if you want to do your own research, but for starters here’s what we found.

Overview

At the end of 2017 we had 93,240 spinning hard drives. Of that number, there were 1,935 boot drives and 91,305 data drives. This post looks at the hard drive statistics of the data drives we monitor. We’ll review the stats for Q4 2017, all of 2017, and the lifetime statistics for all of the drives Backblaze has used in our cloud storage data centers since we started keeping track. Along the way we’ll share observations and insights on the data presented and we look forward to you doing the same in the comments.

Hard Drive Reliability Statistics for Q4 2017

At the end of Q4 2017 Backblaze was monitoring 91,305 hard drives used to store data. For our evaluation we remove from consideration those drives which were used for testing purposes and those drive models for which we did not have at least 45 drives (read why after the chart). This leaves us with 91,243 hard drives. The table below is for the period of Q4 2017.

Hard Drive Annualized Failure Rates for Q4 2017

A few things to remember when viewing this chart:

  • The failure rate listed is for just Q4 2017. If a drive model has a failure rate of 0%, it means there were no drive failures of that model during Q4 2017.
  • There were 62 drives (91,305 minus 91,243) that were not included in the list above because we did not have at least 45 of a given drive model. The most common reason we would have fewer than 45 drives of one model is that we needed to replace a failed drive and we had to purchase a different model as a replacement because the original model was no longer available. We use 45 drives of the same model as the minimum number to qualify for reporting quarterly, yearly, and lifetime drive statistics.
  • Quarterly failure rates can be volatile, especially for models that have a small number of drives and/or a small number of drive days. For example, the Seagate 4 TB drive, model ST4000DM005, has an annualized failure rate of 29.08%, but that is based on only 1,255 drive days and 1 (one) drive failure.
  • AFR stands for Annualized Failure Rate, which is the projected failure rate for a year based on the data from this quarter only (a short worked example follows this list).
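
Here is that worked example: the ST4000DM005 figure above falls straight out of the arithmetic, one failure over 1,255 drive days, annualized to a full year.

    afr = 1 / (1255 / 365) * 100  # failures divided by drive-years, as a percentage
    print(f"{afr:.2f}%")          # 29.08%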

Bulking Up and Adding On Storage

Looking back over 2017, we not only added new drives, we “bulked up” by swapping out functional but smaller 2, 3, and 4 TB drives for larger 8, 10, and 12 TB drives. The changes in drive quantity by quarter are shown in the chart below:

Backblaze Drive Population by Drive Size

For 2017 we added 25,746 new drives, and lost 6,442 drives to retirement for a net of 19,304 drives. When you look at storage space, we added 230 petabytes and retired 19 petabytes, netting us an additional 211 petabytes of storage in our data center in 2017.

2017 Hard Drive Failure Stats

Below are the lifetime hard drive failure statistics for the hard drive models that were operational at the end of Q4 2017. As with the quarterly results above, we have removed any non-production drives and any models that had fewer than 45 drives.

Hard Drive Annualized Failure Rates

The chart above gives us the lifetime view of the various drive models in our data center. The Q4 2017 chart at the beginning of the post gives us a snapshot of the most recent quarter of the same models.

Let’s take a look at the same models over time, in our case over the past 3 years (2015 through 2017), by looking at the annual failure rates for each of those years.

Annual Hard Drive Failure Rates by Year

The failure rate for each year is calculated for just that year. In looking at the results the following observations can be made:

  • The failure rates for both of the 6 TB models, Seagate and WDC, have decreased over the years while the number of drives has stayed fairly consistent from year to year.
  • While it looks like the failure rates for the 3 TB WDC drives have also decreased, you’ll notice that we migrated out nearly 1,000 of these WDC drives in 2017. While the remaining 180 WDC 3 TB drives are performing very well, decreasing the data set that dramatically makes trend analysis suspect.
  • The Toshiba 5 TB model and the HGST 8 TB model had zero failures over the last year. That’s impressive, but with only 45 drives in use for each model, not statistically useful.
  • The HGST/Hitachi 4 TB models delivered sub 1.0% failure rates for each of the three years. Amazing.

A Few More Numbers

To save you countless hours of looking, we’ve culled through the data to uncover the following tidbits regarding our ever changing hard drive farm.

  • 116,833 — The number of hard drives for which we have data from April 2013 through the end of December 2017. Currently there are 91,305 drives (data drives) in operation. This means 25,528 drives have either failed or been removed from service for some other reason — typically migration.
  • 29,844 — The number of hard drives that were installed in 2017. This includes new drives, migrations, and failure replacements.
  • 81.76 — The average number of hard drives installed each day in 2017. This includes new drives, migrations, and failure replacements.
  • 95,638 — The number of drives installed since we started keeping records in April 2013 through the end of December 2017.
  • 55.41 — The average number of hard drives installed per day from April 2013 to the end of December 2017. The installations can be new drives, migration replacements, or failure replacements.
  • 1,508 — The number of hard drives that were replaced as failed in 2017.
  • 4.13 — The average number of hard drives that have failed each day in 2017.
  • 6,795 — The number of hard drives that have failed from April 2013 until the end of December 2017.
  • 3.94 — The average number of hard drives that have failed each day from April 2013 until the end of December 2017.

Can’t Get Enough Hard Drive Stats?

We’ll be presenting the webinar “Backblaze Hard Drive Stats for 2017” on Thursday February 9, 2017 at 10:00 Pacific time. The webinar will dig deeper into the quarterly, yearly, and lifetime hard drive stats and include the annual and lifetime stats by drive size and manufacturer. You will need to subscribe to the Backblaze BrightTALK channel to view the webinar. Sign up today.

As a reminder, the complete data set used to create the information used in this review is available on our Hard Drive Test Data page. You can download and use this data for free for your own purpose. All we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone — it is free.

Good luck and let us know if you find anything interesting.

What is HAMR and How Does It Enable the High-Capacity Needs of the Future?

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hamr-hard-drives/

HAMR drive illustration

During Q4, Backblaze deployed 100 petabytes worth of Seagate hard drives to our data centers. The newly deployed Seagate 10 and 12 TB drives are doing well and will help us meet our near term storage needs, but we know we’re going to need more drives — with higher capacities. That’s why the success of new hard drive technologies like Heat-Assisted Magnetic Recording (HAMR) from Seagate is very relevant to us here at Backblaze and to the storage industry in general. In today’s guest post we are pleased to have Mark Re, CTO at Seagate, give us an insider’s look behind the hard drive curtain to tell us how Seagate engineers are developing the HAMR technology and making it market ready starting in late 2018.

What is HAMR and How Does It Enable the High-Capacity Needs of the Future?

Guest Blog Post by Mark Re, Seagate Senior Vice President and Chief Technology Officer

Earlier this year Seagate announced plans to make the first hard drives using Heat-Assisted Magnetic Recording, or HAMR, available by the end of 2018 in pilot volumes. Even as today’s market has embraced 10TB+ drives, the need for 20TB+ drives remains imperative in the relatively near term. HAMR is the Seagate research team’s next major advance in hard drive technology.

HAMR is a technology that over time will enable a big increase in the amount of data that can be stored on a disk. A small laser is attached to a recording head, designed to heat a tiny spot on the disk where the data will be written. This allows a smaller bit cell to be written as either a 0 or a 1. The smaller bit cell size enables more bits to be crammed into a given surface area — increasing the areal density of data, and increasing drive capacity.

It sounds almost simple, but the science and engineering expertise required to perfect this technology, spanning research, experimentation, lab development, and product development, has been enormous. Below is an overview of the HAMR technology, and you can dig into the details in our technical brief, which provides a point-by-point rundown of several key advances enabling the HAMR design.

As much time and resources as have been committed to developing HAMR, the need for its increased data density is indisputable. Demand for data storage keeps increasing. Businesses’ ability to manage and leverage more capacity is a competitive necessity, and IT spending on capacity continues to increase.

History of Increasing Storage Capacity

For the last 50 years areal density in the hard disk drive has been growing faster than Moore’s law, which is a very good thing. After all, customers from data centers and cloud service providers to creative professionals and game enthusiasts rarely go shopping looking for a hard drive just like the one they bought two years ago. The demands of increasing data on storage capacities inevitably increase, thus the technology constantly evolves.

According to the Advanced Storage Technology Consortium, HAMR will be the next significant storage technology innovation to increase the amount of storage in the area available to store data, also called the disk’s “areal density.” We believe this boost in areal density will help fuel hard drive product development and growth through the next decade.

Why Do We Need to Develop Higher-Capacity Hard Drives? Can’t Current Technologies Do the Job?

Why is HAMR’s increased data density so important?

Data has become critical to all aspects of human life, changing how we’re educated and entertained. It affects and informs the ways we experience each other and interact with businesses and the wider world. IDC research shows the datasphere — all the data generated by the world’s businesses and billions of consumer endpoints — will continue to double in size every two years. IDC forecasts that by 2025 the global datasphere will grow to 163 zettabytes (a zettabyte is a trillion gigabytes). That’s ten times the 16.1 ZB of data generated in 2016. IDC cites five key trends intensifying the role of data in changing our world: embedded systems and the Internet of Things (IoT), instantly available mobile and real-time data, cognitive artificial intelligence (AI) systems, increased security data requirements, and, critically, the evolution of data from playing a background role in business to playing a life-critical role.

Consumers use the cloud to manage everything from family photos and videos to data about their health and exercise routines. Real-time data created by connected devices — everything from Fitbit, Alexa, and smartphones to home security systems, solar systems, and autonomous cars — is fueling the emerging Data Age. On top of the obvious business and consumer data growth, our critical infrastructure like power grids, water systems, hospitals, road infrastructure, and public transportation all demand and add to the growth of real-time data. Data is now a vital element in the smooth operation of all aspects of daily life.

All of this, together with the insatiable global appetite for data storage, entails a significant infrastructure cost behind the scenes. While a variety of storage technologies will continue to advance in data density (Seagate announced the first 60TB 3.5-inch SSD unit, for example), high-capacity hard drives serve as the primary foundational core of our interconnected, cloud and IoT-based dependence on data.

HAMR Hard Drive Technology

Seagate has been working on heat-assisted magnetic recording (HAMR) in one form or another since the late 1990s. During this time we’ve made many breakthroughs, including making reliable near-field transducers, developing special high-capacity HAMR media, and figuring out a way to put a laser no larger than a grain of salt on each and every head.

The development of HAMR has required Seagate to consider and overcome a myriad of scientific and technical challenges including new kinds of magnetic media, nano-plasmonic device design and fabrication, laser integration, high-temperature head-disk interactions, and thermal regulation.

A typical hard drive inside any computer or server contains one or more rigid disks coated with a magnetically sensitive film consisting of tiny magnetic grains. Data is recorded when a magnetic write-head flies just above the spinning disk; the write head rapidly flips the magnetization of one magnetic region of grains so that its magnetic pole points up or down, to encode a 1 or a 0 in binary code.

Increasing the amount of data you can store on a disk requires cramming magnetic regions closer together, which means the grains need to be smaller so they won’t interfere with each other.

Heat Assisted Magnetic Recording (HAMR) is the next step to enable us to increase the density of grains — or bit density. Current projections are that HAMR can achieve 5 Tbpsi (Terabits per square inch) on conventional HAMR media, and in the future will be able to achieve 10 Tbpsi or higher with bit patterned media (in which discrete dots are predefined on the media in regular, efficient, very dense patterns). These technologies will enable hard drives with capacities higher than 100 TB before 2030.
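
A back-of-the-envelope calculation shows how areal density translates into drive capacity. The usable recording area per platter surface and the platter count below are our own rough assumptions for a 3.5-inch drive, not Seagate figures, so treat the results as order-of-magnitude estimates only.

    TBITS_PER_TB = 8  # terabits per (decimal) terabyte

    def capacity_tb(areal_density_tbpsi, platters, usable_sq_in_per_surface=8.5):
        surfaces = platters * 2  # data is recorded on both sides of each platter
        return areal_density_tbpsi * surfaces * usable_sq_in_per_surface / TBITS_PER_TB

    print(f"{capacity_tb(5, 8):.0f} TB")   # ~85 TB at 5 Tbpsi on 8 platters
    print(f"{capacity_tb(10, 8):.0f} TB")  # ~170 TB at 10 Tbpsi, comfortably past 100 TB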

The major problem with packing bits so closely together is that if you do that on conventional magnetic media, the bits (and the data they represent) become thermally unstable, and may flip. So, to make the grains maintain their stability — their ability to store bits over a long period of time — we need to develop a recording media that has higher coercivity. That means it’s magnetically more stable during storage, but it is more difficult to change the magnetic characteristics of the media when writing (harder to flip a grain from a 0 to a 1 or vice versa).

That’s why HAMR’s first key hardware advance required developing a new recording media that keeps bits stable — using high anisotropy (or “hard”) magnetic materials such as iron-platinum alloy (FePt), which resist magnetic change at normal temperatures. Over years of HAMR development, Seagate researchers have tested and proven out a variety of FePt granular media films, with varying alloy composition and chemical ordering.

In fact the new media is so “hard” that conventional recording heads won’t be able to flip the bits, or write new data, under normal temperatures. If you add heat to the tiny spot on which you want to write data, you can make the media’s coercive field lower than the magnetic field provided by the recording head — in other words, enable the write head to flip that bit.

So, a challenge with HAMR has been to replace conventional perpendicular magnetic recording (PMR), in which the write head operates at room temperature, with a write technology that heats the thin film recording medium on the disk platter to temperatures above 400 °C. The basic principle is to heat a tiny region of several magnetic grains for a very short time (~1 nanosecond) to a temperature high enough to make the media’s coercive field lower than the write head’s magnetic field. Immediately after the heat pulse, the region quickly cools down and the bit’s magnetic orientation is frozen in place.

Applying this dynamic nano-heating is where HAMR’s famous “laser” comes in. A plasmonic near-field transducer (NFT) has been integrated into the recording head, to heat the media and enable magnetic change at a specific point. Plasmonic NFTs are used to focus and confine light energy to regions smaller than the wavelength of light. This enables us to heat an extremely small region, measured in nanometers, on the disk media to reduce its magnetic coercivity.

Moving HAMR Forward

HAMR write head

As always in advanced engineering, the devil — or many devils — is in the details. As noted earlier, our technical brief provides a point-by-point short illustrated summary of HAMR’s key changes.

Although hard work remains, we believe this technology is nearly ready for commercialization. Seagate has the best engineers in the world working towards a goal of a 20 Terabyte drive by 2019. We hope we’ve given you a glimpse into the amount of engineering that goes into a hard drive. Keeping up with the world’s insatiable appetite to create, capture, store, secure, manage, analyze, rapidly access and share data is a challenge we work on every day.

With thousands of HAMR drives already being made in our manufacturing facilities, our internal and external supply chain is solidly in place, and volume manufacturing tools are online. This year we began shipping initial units for customer tests, and production units will ship to key customers by the end of 2018. Prepare for breakthrough capacities.

Hard Drive Stats for Q3 2017

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hard-drive-failure-rates-q3-2017/

Q3 2017 Hard Drive Stats

In Q3 2017, Backblaze introduced both 10 TB and 12 TB hard drives into our data centers, we continued to retire 3 TB and 4 TB hard drives to increase storage density, and we added over 59 petabytes of data storage to bring our total storage capacity to 400 petabytes.

In this update, we’ll review the Q3 2017 and lifetime hard drive failure rates for all our drive models in use at the end of Q3. We’ll also check in on our 8 TB enterprise versus consumer hard drive comparison, and look at the storage density changes in our data centers over the past couple of years. Along the way, we’ll share our observations and insights, and as always, you can download the hard drive statistics data we use to create these reports.

Q3 2017 Hard Drive Failure Rates

Since our Q2 2017 report, we added 9,599 new hard drives and retired 6,221 hard drives, for a net add of 3,378 drives and a total of 86,529. These numbers cover the hard drive models of which we have 45 or more drives — with one exception that we’ll get to in a minute.

Let’s look at the Q3 statistics that include our first look at the 10 TB and 12 TB hard drives we added in Q3. The chart below is for activity that occurred just in Q3 2017.

Hard Drive Failure Rates for Q3 2017

Observations

  1. The hard drive failure rate for the quarter was 1.84%, our lowest quarterly rate ever. There are several factors that contribute to this, but one that stands out is the average age of the hard drives in use. Only the 4 TB HGST drives (model: HDS5C4040ALE630) have an average age over 4 years — 51.3 months to be precise. The average age of all the other drive models is less than 4 years, with nearly 80% of all of the drives being less than 3 years old.
  2. The 10- and 12 TB drive models are new. With a combined 13,000 drive days in operation, they’ve had zero failures. While all of these drives passed through formatting and load testing without incident, it is a little too early to reach any conclusions.

Testing Drives

Normally, we list only those drive models where we have 45 drives or more, as it formerly took 45 drives (currently 60) to fill a Storage Pod. We consider a Storage Pod as a base unit for drive testing. Yet, we listed the 12 TB drives even though we only have 20 of them in operation. What gives? It’s the first step in testing drives.

A Backblaze Vault consists of 20 Storage Pods logically grouped together. Twenty 12 TB drives are deployed in the same drive position in each of the 20 Storage Pods and grouped together into a storage unit we call a “tome.” An incoming file is stored in one tome, and is spread out across the 20 storage pods in the tome for reliability and availability. The remaining 59 tomes, in this case, use 8 TB drives. This allows us to see the performance and reliability of a 12 TB hard drive model in an operational environment without having to buy 1,200 of them to start.
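
Putting that layout into numbers (raw capacity only, before any redundancy or formatting overhead):

    pods_per_vault = 20
    tomes_per_vault = 60

    drives_per_vault = pods_per_vault * tomes_per_vault          # 1,200 drives in a full vault
    test_tome_tb = pods_per_vault * 12                           # the one tome of 12 TB drives
    other_tomes_tb = (tomes_per_vault - 1) * pods_per_vault * 8  # the 59 tomes of 8 TB drives
    print(drives_per_vault, "drives;", test_tome_tb + other_tomes_tb, "TB raw in this mixed vault")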

Breaking news: Our first Backblaze Vault filled with 1,200 Seagate 12 TB hard drives (model: ST12000NM0007) went into production on October 20th.

Storage Density Continues to Increase

As noted earlier, we retired 6,221 hard drives in Q3, all of them 3 TB or 4 TB hard drives. The retired drives have been replaced by 8, 10, and 12 TB drive models. This dramatic increase in storage density added 59 petabytes of storage in Q3. The following chart shows that change since the beginning of 2016.

Hard Drive Count by Drive Size

You clearly can see the retirement of the 2 TB and 3 TB drives, each being replaced predominantly by 8 TB drives. You also can see the beginning of the retirement curve for the 4 TB drives, which will most likely be replaced by 12 TB drives over the coming months. A subset of the 4 TB drives, about 10,000 that were installed in the past year, will most likely stay in service for at least the next couple of years.

Lifetime Hard Drive Stats

The table below shows the failure rates for the hard drive models we had in service as of September 30, 2017. This is over the period beginning in April 2013 and ending September 30, 2017. If you are interested in the hard drive failure rates for all the hard drives we’ve used over the years, please refer to our 2016 hard drive review.

Cumulative Hard Drive Failure Rates

Note 1: The “+ / – Change” column reflects the change in the annualized failure rate from the previous quarter. Down is good, up is bad.
Note 2: You can download the data on this chart and the data from the “Hard Drive Failure Rates for Q3 2017” chart shown earlier in this review. The downloaded ZIP file contains one MS Excel spreadsheet.

The annualized failure rate for all of the drive models listed above is 2.07%; this is higher than the 1.97% for the previous quarter. The primary driver behind this was the retirement of all of the HGST 3 TB drives (model: HDS5C3030ALA630) in Q3. Those drives had over 6 million drive days and an annualized failure rate of 0.82% — well below the average for the entire set of drives. Those drives are now gone and no longer part of the results.
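
To see why retiring a low-failure-rate cohort pushes the pooled number up, here is a toy weighted-average calculation. The 6 million drive days and 0.82% come from the paragraph above; the drive-day count for the rest of the fleet is a rough guess of ours, used only to show the direction and approximate size of the effect.

    retired_days, retired_afr = 6_000_000, 0.82       # HGST 3 TB cohort, figures from the text
    remaining_days, remaining_afr = 74_000_000, 2.07  # rest of the fleet (drive days are a guess)

    pooled = ((retired_days * retired_afr + remaining_days * remaining_afr)
              / (retired_days + remaining_days))
    print(f"with the cohort included: {pooled:.2f}%")  # ~1.98%, close to last quarter's 1.97%
    print(f"once it is retired: {remaining_afr:.2f}%")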

Consumer Versus Enterprise Drives

The comparison of the consumer and enterprise Seagate 8 TB drives continues. Both of the drive models, Enterprise: ST8000NM0055 and Consumer: ST8000DM002, saw their annualized failure rates decrease from the previous quarter. In the case of the enterprise drives, this occurred even though we added 8,350 new drives in Q3. This brings the total number of Seagate 8 TB enterprise drives to 14,404, which have accumulated nearly 1.4 million drive days.

A comparison of the two drive models shows the annualized failure rates being very similar:

  • 8 TB Consumer Drives: 1.1% Annualized Failure Rate
  • 8 TB Enterprise Drives: 1.2% Annualized Failure Rate

Given that the failure rates for the two drive models appear to be similar, are the Seagate 8 TB enterprise drives worth any premium you might have to pay for them? As we have previously documented, the Seagate enterprise drives load data faster and have a number of features, such as PowerChoice™ technology, that can be very useful. In addition, enterprise drives typically have a 5-year warranty versus a 2-year warranty for the consumer drives. While drive price and availability are our primary considerations, you may decide other factors are more important.

We will continue to follow these drives, especially as they age past two years, the warranty point for the consumer drives.

Join the Drive Stats Webinar on Friday, November 3

We will be doing a deeper dive on this review in a webinar: “Q3 2017 Hard Drive Failure Stats” being held on Friday, November 3rd at 10:00 am Pacific Time. We’ll dig into what’s behind the numbers, including the enterprise vs consumer drive comparison. To sign up for the webinar, you will need to subscribe to the Backblaze BrightTALK channel if you haven’t already done so.

Wrapping Up

Our next drive stats post will be in January, when we’ll review the data for Q4 and all of 2017, and we’ll update our lifetime stats for all of the drives we have ever used. In addition, we’ll get our first real look at the 12 TB drives.

As a reminder, the hard drive data we use is available on our Hard Drive Test Data page. You can download and use this data for free for your own purpose. All we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone: it is free.

Good luck and let us know if you find anything interesting.

Improved Search for Backblaze’s Blog

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/using-relevannssi-wordpress-search/

Improved Search for Backblaze's Blog
Search has become the most powerful method to find content on the Web, both for finding websites themselves and for discovering information within websites. Our blog readers find content in both ways — using Google, Bing, Yahoo, Ask, DuckDuckGo, and other search engines to follow search results directly to our blog, and using the site search function once on our blog to find content in the blog posts themselves.

There’s a Lot of Great Content on the Backblaze Blog

Backblaze’s CEO Gleb Budman wrote the first post for this blog in March of 2008. Since that post there have been 612 more. There’s a lot of great content on this blog, as evidenced by the more than two million page views we’ve had since the beginning of this year. We typically publish two blog posts per week on a variety of topics, but we focus primarily on cloud storage technology and data backup, company news, and how-to articles on how to use cloud storage and various hardware and software solutions.

Earlier this year we initiated a series of posts on entrepreneurship by our CEO and co-founder, Gleb Budman, which has proven tremendously popular. We also occasionally publish something a little lighter, such as our current Halloween video contest — there’s still time to enter!

Blog search box

The Site Search Box — Your gateway to Backblaze blog content

We Could Do a Better Job of Helping You Find It

I joined Backblaze as Content Director in July of this year. During the application process, I spent quite a bit of time reading through the blog to understand the company, the market, and its customers. That’s a lot of reading. I used the site search many times to uncover topics and posts, and discovered that site search had a number of weaknesses that made it less-than-easy to find what I was looking for.

These site search weaknesses included:

  • Searches were case sensitive, so a visitor could easily miss content capitalized differently than the search terms.
  • Results showed no date or author information, so a visitor couldn’t tell how recent the post was or who wrote it.
  • Search terms were not highlighted in context, so a visitor had to scrutinize the results to find the terms in the post.
  • There was no indication of the number of results or number of pages of results, so a visitor didn’t know how fruitful the search was.
  • There was no record of the search terms used by visitors, so we couldn’t tell what our visitors were searching for!

I wanted to make it easier for blog visitors to find all the great content on the Backblaze blog and help me understand what our visitors are searching for. To do that, we needed to upgrade our site search.

I started with a list of goals I wanted for site search.

  1. Make it easier to find content on the blog
  2. Provide a summary of what was found
  3. Search the comments as well as the posts
  4. Highlight the search terms in the results to help find them in context
  5. Provide a record of searches to help me understand what interests our readers

I had the goals, now how could I find a solution to achieve them?

Our blog is built on WordPress, which has a built-in site search function that could be described as simply adequate. The most obvious of its limitations is that search results are listed chronologically, not based on “most popular,” “most occurring,” or any other metric that might make the results more relevant to your interests.

The Search for Improved (Site) Search

An obvious choice to improve site search would be to adopt Google Site Search, as many websites and blogs have done. Unfortunately, I quickly discovered that Google is sunsetting Site Search by April of 2018. That left the choice among a number of third-party services or WordPress-specific solutions. My immediate inclination was to see what is available specifically for WordPress.

There are a handful of search plugins for WordPress. One stood out to me for the number of installations (100,000+) and overwhelmingly high reviews: Relevanssi. Still, I had a number of questions. The first question was whether the plugin retained any search data from our site — I wanted to make sure that the privacy of our visitors is maintained, and even harvesting anonymous search data would not be acceptable to Backblaze. I wrote to the developer and was pleased by the responsiveness from Relevanssi’s creator, Mikko Saari. He explained to me that Relevanssi doesn’t have access to any of the search data from the sites using his plugin. Receiving a quick response from a developer is always a good sign. Other signs of a good WordPress plugin are recent updates and an active support forum.

Our solution: Relevanssi for Site Search

The WordPress plugin Relevanssi met all of our criteria, so we installed the plugin and switched to using it for site search in September.

In addition to solving the problems listed above, our search results are now displayed based on relevance instead of date, which is the default behavior of WordPress search. That capability is very useful on our blog where a lot of the content from years ago is still valuable — often called evergreen content. The new site search also enables visitors to search using the boolean expressions AND and OR. For example, a visitor can search for “seagate AND drive,” and see results that only include both words. Alternatively, a visitor can search for “seagate OR drive” and see results that include either word.

screenshot of relevannssi wordpress search results

Search results showing total number of results, hits and their location, and highlighted search terms in context

Visitors can put search terms in quotation marks to search for an entire phrase. For example, a visitor can search for “2016 drive stats” and see results that include only that exact phrase. In addition, the site search results come with a summary, showing where the results were found (title, post, or comments). Search terms are highlighted in yellow in the content, showing exactly where the search result was found.
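
If the AND/OR and quoted-phrase behavior sounds abstract, the little sketch below shows what each style of query is expected to match. It is purely an illustration of the query semantics, not how Relevanssi works internally, and the posts and bodies in it are made up.

```python
# A toy illustration of the query styles described above (AND, OR, and quoted
# phrases). This is NOT how Relevanssi works internally; it only shows which
# results each style of query is expected to return. Posts are fictional.

posts = {
    "Hard Drive Stats for Q1 2017": "seagate and hgst drive failure rates",
    "Backblaze Vault Primer": "how a vault groups twenty storage pods",
    "2016 drive stats review": "our 2016 drive stats cover every seagate model",
}

def matches(text: str, query: str) -> bool:
    """Return True if the post body matches the query."""
    text = text.lower()
    if query.startswith('"') and query.endswith('"'):
        return query.strip('"').lower() in text                 # exact phrase
    if " AND " in query:
        return all(t.lower() in text for t in query.split(" AND "))
    if " OR " in query:
        return any(t.lower() in text for t in query.split(" OR "))
    return query.lower() in text

for q in ['seagate AND drive', 'seagate OR drive', '"2016 drive stats"']:
    print(q, "->", [title for title, body in posts.items() if matches(body, q)])
```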

Here’s an example of a popular post that shows up in searches. Hard Drive Stats for Q1 2017 was published on May 9, 2017. Since September 4, it has shown up over 150 times in site searches, and in the last 90 days it has been viewed over 53,000 times on our blog.

Hard Drive Stats for Q1 2017

The Results Tell the Story

Since initiating the new search on our blog on September 4, there have been almost 23,000 site searches conducted, so we know you are using it. We’ve implemented pagination for the blog feed and search results so you know how many pages of results there are and made it easier to navigate to them.

Now that we have this site search data, you likely are wondering which are the most popular search terms on our blog. Here are some of the top searches:

What Do You Search For?

Please tell us how you use site search and whether there are any other capabilities you’d like to see that would make it easier to find content on our blog.

The post Improved Search for Backblaze’s Blog appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Yes, Backblaze Just Ordered 100 Petabytes of Hard Drives

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/400-petabytes-cloud-storage/

10 Petabyte vault, 100 Petabytes ordered, 400 Petabytes stored

Backblaze just ordered 100 petabytes’ worth of hard drives, and yes, we’ll use nearly all of them in Q4. In fact, we’ll begin the process of sourcing the Q1 hard drive order in the next few weeks.

What are we doing with all those hard drives? Let’s take a look.

Our First 10 Petabyte Backblaze Vault

Ken clicked the submit button and 10 Petabytes of Backblaze Cloud Storage came online, ready to accept customer data. Ken (aka the Pod Whisperer) is one of our Datacenter Operations Managers at Backblaze, and with that one click, he activated Backblaze Vault 1093, which was built with 1,200 Seagate 10 TB drives (model: ST10000NM0086). After formatting and configuration of the disks, there are 10.12 Petabytes of free space remaining for customer data. Back in 2011, when Ken started at Backblaze, he was amazed that we had amassed as much as 10 Petabytes of data storage.

The Seagate 10 TB drives we deployed in vault 1093 are helium-filled drives. We had previously deployed 45 HGST 8 TB helium-filled drives, where we learned one of the benefits of using helium drives: they consume less power than traditional air-filled drives. Here’s a quick comparison of the power consumption of several high-density drive models we deploy:

MFR Model Fill Size Idle (1) Operating (2)
Seagate ST8000DM002 Air 8 TB 7.2 watts 9.0 watts
Seagate ST8000NM0055 Air 8 TB 7.6 watts 8.6 watts
HGST HUH728080ALE600 Helium 8 TB 5.1 watts 7.4 watts
Seagate ST10000NM0086 Helium 10 TB 4.8 watts 8.6 watts
(1) Idle: Average Idle in watts as reported by the manufacturer.
(2) Operating: The maximum operational consumption in watts as reported by the manufacturer — typically for read operations.
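
To put those wattage figures in perspective, here is a quick back-of-the-envelope sketch that converts them into watts per terabyte and a rough yearly electricity cost for a 1,200-drive vault. The drive count and the $0.10/kWh rate are assumptions for illustration only, not our actual utility numbers.

```python
# Rough per-terabyte power comparison for the drive models in the table above.
# The wattages are the manufacturer-reported operating figures quoted in the
# post; the 1,200-drive vault size and the $0.10/kWh electricity rate are
# assumptions used only to show the scale of the difference.

drives = {
    "Seagate ST8000DM002 (air, 8 TB)":   {"tb": 8,  "operating_w": 9.0},
    "HGST HUH728080ALE600 (He, 8 TB)":   {"tb": 8,  "operating_w": 7.4},
    "Seagate ST10000NM0086 (He, 10 TB)": {"tb": 10, "operating_w": 8.6},
}

DRIVES_PER_VAULT = 1200      # assumption: one Backblaze Vault
KWH_PRICE = 0.10             # assumption: dollars per kWh
HOURS_PER_YEAR = 24 * 365

for name, d in drives.items():
    watts_per_tb = d["operating_w"] / d["tb"]
    vault_kwh = d["operating_w"] * DRIVES_PER_VAULT * HOURS_PER_YEAR / 1000
    print(f"{name}: {watts_per_tb:.2f} W/TB, "
          f"~${vault_kwh * KWH_PRICE:,.0f}/year for a {DRIVES_PER_VAULT}-drive vault")
```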

I’d like 100 Petabytes of Hard Drives To Go, Please

“100 Petabytes should get us through Q4.” — Tim Nufire, Chief Cloud Officer, Backblaze

The 1,200 Seagate 10 TB drives are just the beginning. The next Backblaze Vault will be configured with 12 TB drives which will give us 12.2 petabytes of storage in one vault. We are currently building and adding two to three Backblaze Vaults a month to our cloud storage system, so we are going to need more drives. When we did all of our “drive math,” we decided to place an order for 100 petabytes of hard drives comprised of 10 and 12 TB models. Gleb, our CEO and occasional blogger, exhaled mightily as he signed the biggest purchase order in company history. Wait until he sees the one for Q1.

Enough drives for a 10 petabyte vault

400 Petabytes of Cloud Storage

When we added Backblaze Vault 1093, we crossed over 400 Petabytes of total available storage. For those of you keeping score at home, we reached 350 Petabytes about 3 months ago as you can see in the chart below.

Petabytes of data stored by Backblaze

Backblaze Vault Primer

All of the storage capacity we’ve added in the last two years has been on our Backblaze Vault architecture, with vault 1093 being the 60th one we have placed into service. Each Backblaze Vault is comprised of 20 Backblaze Storage Pods logically grouped together into one storage system. Today, each Storage Pod contains sixty 3 ½” hard drives, giving each vault 1,200 drives. Early vaults were built on Storage Pods with 45 hard drives, for a total of 900 drives in a vault.

A Backblaze Vault accepts data directly from an authenticated user. Each data blob (object, file, group of files) is divided into 20 shards (17 data shards and 3 parity shards) using our erasure coding library. Each of the 20 shards is stored on a different Storage Pod in the vault. At any given time, several vaults stand ready to receive data storage requests.
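
Here is a minimal sketch of the bookkeeping behind that 17 + 3 layout. It is not our Reed-Solomon implementation, just the arithmetic: how big each shard is, how much overhead the parity shards add, and how many Storage Pods a vault can lose per blob and still recover the data.

```python
# Minimal sketch of the 17+3 shard arithmetic described above. This is only
# the bookkeeping, not the erasure coding itself.

DATA_SHARDS = 17
PARITY_SHARDS = 3
TOTAL_SHARDS = DATA_SHARDS + PARITY_SHARDS   # one shard per Storage Pod

def shard_layout(blob_bytes: int) -> dict:
    """Return per-shard size, total bytes stored, and fault tolerance for a blob."""
    shard_bytes = -(-blob_bytes // DATA_SHARDS)        # ceiling division
    stored_bytes = shard_bytes * TOTAL_SHARDS
    return {
        "shard_bytes": shard_bytes,
        "stored_bytes": stored_bytes,
        "overhead": stored_bytes / blob_bytes - 1,     # ~17.6% for 17+3
        "tolerated_shard_losses": PARITY_SHARDS,       # any 17 of 20 shards suffice
    }

print(shard_layout(100 * 1024 * 1024))   # a 100 MB blob
```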

Drive Stats for the New Drives

In our Q3 2017 Drive Stats report, due out in late October, we’ll start reporting on the 10 TB drives we are adding. It looks like the 12 TB drives will come online in Q4. We’ll also get a better look at the 8 TB consumer and enterprise drives we’ve been following. Stay tuned.

Other Big Data Clouds

We have always been transparent here at Backblaze, including about how much data we store, how we store it, even how much it costs to do so. Very few others do the same. But, if you have information on how much data a company or organization stores in the cloud, let us know in the comments. Please include the source and make sure the data is not considered proprietary. If we get enough tidbits we’ll publish a “big cloud” list.

The post Yes, Backblaze Just Ordered 100 Petabytes of Hard Drives appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Hard Drive Cost Per Gigabyte

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hard-drive-cost-per-gigabyte/

Hard Drive Cost

For hard drive prices, the race to zero is over: nobody won. For the past 35+ years, hard drive prices have dropped, from around $500,000 per gigabyte in 1981 to less than $0.03 per gigabyte today. This includes the period of the Thailand drive crisis in 2012 that spiked hard drive prices. Matthew Komorowski has done an admirable job of documenting the hard drive price curve through March 2014, and we’d like to fill in the blanks with our own drive purchase data to complete the picture. As you’ll see, the hard drive pricing curve has flattened out.

75,000 New Hard Drives

We first looked at the cost per gigabyte of a hard drive in 2013 when we examined the effects of the Thailand Drive crisis on our business. When we wrote that post, the cost per gigabyte for a 4 TB hard drive was about $0.04 per gigabyte. Since then 5-, 6-, 8- and recently 10 TB hard drives have been introduced and during that period we have purchased nearly 75,000 drives. Below is a chart by drive size of the drives we purchased since that last report in 2013.

Hard Drive Cost Per GB by drive size

Observations

  1. We purchase drives in bulk, thousands at a time. The price you might get at Costco, Best Buy, or Amazon will most likely be higher.
  2. The effect of the Thailand Drive crisis is clearly seen from October 2011 through mid-2013.

The 4 TB Drive Enigma

Up through the 4 TB drive models, the cost per gigabyte of a larger drive was always less than that of the smaller drives. In other words, the cost per gigabyte of a 2 TB drive was less than that of a 1 TB drive, resulting in higher density at a lower cost per gigabyte. This changed with the introduction of the 6 TB and 8 TB drives, especially as it relates to the 4 TB drives. As you can see in the chart above, the cost per gigabyte of the 6 TB drives did not fall below that of the 4 TB drives. You can also observe that the 8 TB drives are just approaching the cost per gigabyte of the 4 TB drives. The 4 TB drives are the price king, as seen in the chart below of the current cost of Seagate consumer drives by size.

Seagate Hard Drive Prices By Size

Drive Size Model Price Cost/GB
1 TB ST1000DM010 $49.99 $0.050
2 TB ST2000DM006 $66.99 $0.033
3 TB ST3000DM008 $83.72 $0.028
4 TB ST4000DM005 $99.99 $0.025
6 TB ST6000DM004 $240.00 $0.040
8 TB ST8000DM005 $307.34 $0.038

The data on this chart was sourced from the current price of these drives on Amazon. The drive models selected were “consumer” drives, like those we typically use in our data centers.
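
As a sanity check on the Cost/GB column, here is the same arithmetic in a few lines of Python. It treats a terabyte as 1,000 gigabytes, which is how the table’s numbers work out, and uses the Amazon list prices quoted above.

```python
# Recomputing the Cost/GB column from the list prices above.
# Capacity is treated as decimal terabytes (1 TB = 1,000 GB).

prices = {           # model: (capacity in TB, price in USD)
    "ST1000DM010": (1, 49.99),
    "ST2000DM006": (2, 66.99),
    "ST3000DM008": (3, 83.72),
    "ST4000DM005": (4, 99.99),
    "ST6000DM004": (6, 240.00),
    "ST8000DM005": (8, 307.34),
}

for model, (tb, price) in prices.items():
    print(f"{model}: ${price / (tb * 1000):.3f}/GB")
```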

The manufacturing and marketing efficiencies that drive the pricing of hard drives seem to have changed over time. For example, the 6 TB drives have been in the market at least 3 years, but are not even close to the cost per gigabyte of the 4 TB drives. Meanwhile, back in 2011, the 3 TB drive models fell below the cost per gigabyte of the 2 TB drives they “replaced” within a few months. Have we as consumers decided that 4 TB drives are “big enough” for our needs and we are not demanding (by purchasing) larger sized drives in the quantities needed to push down the unit cost?

Approaching Zero: There’s a Limit

The important aspect is the trend of the cost over time. While it has continued to move downward, the rate of change has slowed dramatically, as observed in the chart below, which represents our average quarterly cost per gigabyte over time.

Hard Drive Cost per GB over time

The rate at which the cost per gigabyte of a hard drive declines is itself slowing. For example, from January 2009 to January 2011, our average cost for a hard drive decreased 45%, from $0.11 to $0.06 per gigabyte, a drop of $0.05 per gigabyte. From January 2015 to January 2017, the average cost decreased 26%, from $0.038 to $0.028 per gigabyte, a drop of just $0.01 per gigabyte. This means that the declining price of storage will become less relevant in driving the cost of providing storage.
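
The slowdown is easier to see when the two periods sit side by side. The sketch below uses the dollar figures quoted above; the annualized rate is just the standard compound-rate formula, not a number from our reports.

```python
# The two price declines cited above, expressed as a total percentage drop and
# as an equivalent annual rate. Dollar figures come straight from the post;
# the annualized rate is simply the compound-rate formula.

periods = {
    "Jan 2009 - Jan 2011": (0.11,  0.06),
    "Jan 2015 - Jan 2017": (0.038, 0.028),
}

for label, (start, end) in periods.items():
    years = 2
    total_drop = (start - end) / start
    annual_rate = 1 - (end / start) ** (1 / years)
    print(f"{label}: {total_drop:.0%} total decline "
          f"(~{annual_rate:.0%} per year), ${start - end:.3f}/GB saved")
```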

Back in 2011, IDC predicted that overall data will grow by 50 times by 2020, and in 2014, EMC estimated that by 2020 we will be creating 44 trillion gigabytes of data annually. That’s quite a challenge for the storage industry, especially as the cost per gigabyte curve for hard drives is flattening out. Improvements in existing storage technologies (Helium, HAMR), along with future technologies (Quantum Storage, DNA), are on the way – we can’t wait. Of course, we’d like these new storage devices to be 50% less expensive per gigabyte than today’s hard drives. That would be a good start.

The post Hard Drive Cost Per Gigabyte appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Hard Drive Stats for Q1 2017

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hard-drive-failure-rates-q1-2017/

2017 hard drive stats

In this update, we’ll review the Q1 2017 and lifetime hard drive failure rates for all our current drive models, and we’ll look at a relatively new class of drives for us – “enterprise”. We’ll share our observations and insights, and as always, you can download the hard drive statistics data we use to create these reports.

Our Hard Drive Data Set

Backblaze has now recorded and saved daily hard drive statistics from the drives in our data centers for over 4 years. This data includes the SMART attributes reported by each drive, along with related information such as the drive serial number and failure status. As of March 31, 2017 we had 84,469 operational hard drives. Of that total, there were 1,800 boot drives and 82,669 data drives. For our review, we remove drive models of which we have fewer than 45 drives, leaving us with 82,516 hard drives to analyze for this report. There are currently 17 different hard drive models, ranging in size from 3 to 8 TB. All of these models are 3½” drives.

Hard Drive Reliability Statistics for Q1 2017

Since our last report in Q4 2016, we have added 10,577 additional hard drives to bring us to the 82,516 drives we’ll focus on. We’ll start by looking at the statistics for the period of January 1, 2017 through March 31, 2017 – Q1 2017. This is for the drives that were operational during that period, ranging in size from 3 to 8 TB as listed below.

hard drive failure rates by model

Observations and Notes on the Q1 Review

You’ll notice that some of the drive models have a failure rate of “0” (zero). Here a failure rate of zero means there were no drive failures for that model during Q1 2017. Later, we will cover how these same drive models fared over their lifetime. Why is the quarterly data important? We use it to look for anything unusual. For example, in Q1 the 4 TB Seagate drive model ST4000DX000 had a high failure rate of 35.88%, while the lifetime annualized failure rate for this model is much lower, 7.50%. In this case, we only have 170 drives of this particular model, so the failure rate is not statistically significant, but such information could be useful if we were using several thousand drives of this particular model.

There were a total of 375 drive failures in Q1. A drive is considered failed if one or more of the following conditions are met:

  • The drive will not spin up or connect to the OS.
  • The drive will not sync, or stay synced, in a RAID Array (see note below).
  • The SMART stats we use show values above our thresholds.
  • Note: Our stand-alone Storage Pods use RAID-6, our Backblaze Vaults use our own open-sourced implementation of Reed-Solomon erasure coding instead. Both techniques have a concept of a drive not syncing or staying synced with the other member drives in its group.

The annualized hard drive failure rate for Q1 in our current population of drives is 2.11%. That’s a bit higher than previous quarters, but might be a function of us adding 10,577 new drives to our count in Q1. We’ve found that there is a slightly higher rate of drive failures early on, before the drives “get comfortable” in their new surroundings. This is seen in the drive failure rate “bathtub curve” we covered in a previous post.

10,577 More Drives

The additional 10,577 drives are really a combination of 11,002 added drives, less 425 drives that were removed. The removed drives were in addition to the 375 drives marked as failed, as those were replaced 1 for 1. The 425 drives were primarily removed from service due to migrations to higher density drives.

The table below shows the breakdown of the drives added in Q1 2017 by drive size.

drive counts by size

Lifetime Hard Drive Failure Rates for Current Drives

The table below shows the failure rates for the hard drive models we had in service as of March 31, 2017. This is over the period beginning in April 2013 and ending March 31, 2017. If you are interested in the hard drive failure rates for all the hard drives we’ve used over the years, please refer to our 2016 hard drive review.

lifetime hard drive reliability rates

The annualized failure rate for the drive models listed above is 2.07%. This compares to 2.05% for the same collection of drive models as of the end of Q4 2016. The increase makes sense given the increase in Q1 2017 failure rate over previous quarters noted earlier. No new models were added during the current quarter and no old models exited the collection.

Backblaze is Using Enterprise Drives – Oh My!

Some of you may have noticed we now have a significant number of enterprise drives in our data center, namely 2,459 Seagate 8 TB drives, model ST8000NM0055. The HGST 8 TB drives were the first true enterprise drives we used as data drives in our data centers, but we only have 45 of them. So, why did we suddenly decide to purchase 2,400+ of the Seagate 8 TB enterprise drives? There was a very short period of time, as Seagate was introducing new drive models and phasing out old ones, when the cost per terabyte of the 8 TB enterprise drives fell within our budget. Previously we had purchased 60 of these drives to test in one Storage Pod and were satisfied they could work in our environment. When the opportunity arose to acquire the enterprise drives at a price we liked, we couldn’t resist.

Here’s a comparison of the 8 TB consumer drives versus the 8 TB enterprise drives to date:

enterprise vs. consumer hard drives

What have we learned so far…

  1. It is too early to compare failure rates – The oldest enterprise drives have only been in service for about 2 months, with most being placed into service just prior to the end of Q1. The Backblaze Vaults the enterprise drives reside in have yet to fill up with data. We’ll need at least 6 months before we could start comparing failure rates as the data is still too volatile. For example, if the current enterprise drives were to experience just 2 failures in Q2, their annualized failure rate would be about 0.57% lifetime.
  2. The enterprise drives load data faster – The Backblaze Vaults containing the enterprise drives loaded data faster than the Backblaze Vaults containing consumer drives. The vaults with the enterprise drives loaded on average 140 TB per day, while the vaults with the consumer drives loaded on average 100 TB per day.
  3. The enterprise drives use more power – No surprise here, as according to the Seagate specifications the enterprise drives use 9W average at idle and 10W average in operation, while the consumer drives use 7.2W average at idle and 9W average in operation. For a single drive this may seem insignificant, but when you put 60 drives in a 4U Storage Pod chassis and then 10 chassis in a rack, the difference adds up quickly.
  4. Enterprise drives have some nice features – The Seagate enterprise 8TB drives we used have PowerChoice™ technology that gives us the option to use less power. The data loading times noted above were recorded after we changed to a lower power mode. In short, the enterprise drive in a low power mode still stored 40% more data per day on average than the consumer drives.
  5. While it is great that the enterprise drives can load data faster, drive speed has never been a bottleneck in our system. A system that can load data faster will just “get in line” more often and fill up faster. There is always extra capacity when it comes to accepting data from customers.

Wrapping Up

We’ll continue to monitor the 8 TB enterprise drives and keep reporting our findings.

If you’d like to hear more about our Hard Drive Stats, Backblaze will be presenting at the 33rd International Conference on Massive Storage Systems and Technology (MSST 2017) being held at Santa Clara University in Santa Clara, California from May 15th – 19th. The conference will dedicate five days to computer-storage technology, including a day of tutorials, two days of invited papers, two days of peer-reviewed research papers, and a vendor exposition. Come join us.

As a reminder, the hard drive data we use is available on our Hard Drive Test Data page. You can download and use this data for free for your own purposes. All we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone; it is free.

Good luck and let us know if you find anything interesting.

The post Hard Drive Stats for Q1 2017 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Backblaze Hard Drive Stats for 2016

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hard-drive-benchmark-stats-2016/

Backblaze drive stats
Backblaze has recorded and saved daily hard drive statistics from the drives in our data centers since April 2013. At the end of 2016 we had 73,653 spinning hard drives. Of that number, there were 1,553 boot drives and 72,100 data drives. This post looks at the hard drive statistics of the data drives we monitor. We’ll first look at the stats for Q4 2016, then present the data for all of 2016, and finish with the lifetime statistics for all of the drives Backblaze has used in our cloud storage data centers since we started keeping track. Along the way we’ll share observations and insights on the data presented. As always you can download our Hard Drive Test Data to examine and use.

Hard Drive Reliability Statistics for Q4 2016

At the end of Q4 2016 Backblaze was monitoring 72,100 data drives. For our evaluation we remove from consideration those drives which were used for testing purposes and those drive models for which we did not have at least 45 drives. This leaves us with 71,939 production hard drives. The table below is for the period of Q4 2016.

Hard Drive Annualized Failure Rates for Q4 2016
Notes:

  1. The failure rate listed is for just Q4 2016. If a drive model has a failure rate of 0%, it means there were no drive failures of that model during that quarter.
  2. 90 drives (2 storage pods) were used for testing purposes during the period. They contained Seagate 1.5TB and 1.0 TB WDC drives. These are not included in the results above.
  3. The most common reason we have fewer than 45 drives of one model is that we needed to replace a failed drive, but that drive model is no longer available. We use 45 drives as the minimum number to report quarterly and yearly statistics.

8 TB Hard Drive Performance

In Q4 2016 we introduced a third 8 TB drive model, the Seagate ST8000NM0055. This is an enterprise class drive. One 60-drive Storage Pod was deployed mid-Q4 and the initial results look promising as there have been no failures to date. Given our past disdain for overpaying for enterprise drives, it will be interesting to see how these drives perform.

We added 3,540 Seagate 8 TB drives, model ST8000DM002, giving us 8,660 of these drives. That’s 69 petabytes of raw storage, before formatting and encoding, or about 22% of our current data storage capacity. The failure rate for the quarter of these 8 TB drives was a very respectable 1.65%. That’s lower than the Q4 failure rate of 1.94% for all of the hard drives in the table above.

During the next couple of calendar quarters we’ll monitor how the new enterprise 8 TB drives compare to the consumer 8 TB drives. We’re interested to know which models deliver the best value and we bet you are too. We’ll let you know what we find.

2016 Hard Drive Performance Statistics

Looking back over 2016, we added 15,646 hard drives, and migrated 110 Storage Pods (4,950 drives) from 1-, 1.5-, and 2 TB drives to 4-, 6- and 8 TB drives. Below are the hard drive failure stats for 2016. As with the quarterly results, we have removed any non-production drives and any models that had less than 45 drives.

2016 Hard Drive Annualized Failure Rates
No Time For Failure

In 2016, three drive models ended the year with zero failures, albeit with a small number of drives. Both the 4 TB Toshiba and the 8 TB HGST models went the entire year without a drive failure. The 8 TB Seagate (ST8000NM0055) drives, which were deployed in November 2016, also recorded no failures.

The total number of failed drives was 1,225 for the year. That’s 3.36 drive failures per day or about 5 drives per workday, a very manageable workload. Of course, that’s easy for me to say, since I am not the one swapping out drives.

The overall hard drive failure rate for 2016 was 1.95%. That’s down from 2.47% in 2015 and well below the 6.39% failure rate for 2014.

Big Drives Rule

We increased storage density by moving to higher-capacity drives. That helped us end 2016 with 3 TB drives being the smallest density drives in our data centers. During 2017, we will begin migrating from the 3.0 TB drives to larger-sized drives. Here’s the distribution of our hard drives in our data centers by size for 2016.
2016 Distribution of Hard Drives by Size
Digging in a little further, below are the failure rates by drive size and vendor for 2016.

Hard Drive Failure Rates by Drive Size
Hard Drive Failure Rates by Manufacturer

Computing the Failure Rate

Failure Rate, in the context we use it, is more accurately described as the Annualized Failure Rate. It is computed based on Drive Days and Drive Failures, not on the Drive Count. This may seem odd given we are looking at a one year period, 2016 in this case, so let’s take a look.

We start by dividing the Drive Failures by the Drive Count. For example, if we use the statistics for the 4 TB drives, we get a “failure rate” of 1.92%, but the annualized failure rate shown on the chart for 4 TB drives is 2.06%. The trouble with just dividing Drive Failures by Drive Count is that the Drive Count constantly changes over the course of the year. By using the Drive Count from a given day, you assume that each drive contributed the same amount of time over the year, but that’s not the case. Drives enter and leave the system all the time. By counting the number of days each drive is active as Drive Days, we can account for all the ins and outs over a given period of time.
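
Here is the same idea in code. The fleet below is entirely made up (it is not Backblaze data); it only exists to show how counting Drive Days gives a different answer than dividing failures by a single-day drive count.

```python
# Annualized failure rate computed the way described above: from drive days,
# not from a point-in-time drive count. The fleet is hypothetical.

def annualized_failure_rate(drive_days: int, failures: int) -> float:
    """AFR % = failures / (drive days / 365) * 100."""
    return failures / (drive_days / 365) * 100

# Hypothetical fleet: 100 drives ran all year, 50 more were added on July 1.
drive_days = 100 * 365 + 50 * 184
failures = 3

print(f"AFR by drive days:    {annualized_failure_rate(drive_days, failures):.2f}%")
print(f"Naive failures/count: {3 / 150 * 100:.2f}%")  # assumes all 150 ran the full year
```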

Hard Drive Benchmark Statistics

As we noted earlier, we’ve been collecting and storing drive stats data since April 2013. In that time we have used 55 different hard drive models in our data center for data storage. We’ve omitted models from the table below that we didn’t have enough of to populate an entire storage pod (45 or fewer). That excludes 25 of those 55 models.

Annualized Hard Drive Failure Rates

Fun with Numbers

Since April 2013, there have been 5,380 hard drives failures. That works out to about 5 per day or about 7 per workday (200 workdays per year). As a point of reference, Backblaze only had 4,500 total hard drives in June 2010 when we racked our 100th Storage Pod to support our cloud backup service.

The 58,375,646 Drive Days translates to a little over 1.4 Billion Drive Hours. Going the other way we are measuring a mere 159,933 years of spinning hard drives.

You’ll also notice that we have used a total of 85,467 hard drives. But at the end of 2016 we had 71,939 hard drives. Are we missing 13,528 hard drives? Not really. While some drives failed, the remaining drives were removed from service primarily due to migrations from smaller to larger drives. The stats from the “migrated” drives, like Drive Hours, still count in establishing a failure rate, but they did not fail; they just stopped reporting data.

Failure Rates Over Time

The chart below shows the annualized failure rates of hard drives by drive size over time. The data points are the rates as of the end of each year shown. The “stars” mark the average annualized failure rate for all of the hard drives for each year.

Annualized Hard Drive Failures by Drive Size

Notes:

  1. The “8.0TB” failure rate of 4.9% for 2015 is based on 45 drives, of which there were 2 failures during that year. In 2016 the number of 8 TB drives rose to 8,765 with 48 failures and an annualized failure rate of 1.6%.
  2. The “1.0TB” drives were 5+ years old on average when they were retired.
  3. There are only 45 of the “5.0TB” drives in operation.

Can’t Get Enough Hard Drive Stats?

We’ll be presenting the webinar “Backblaze Hard Drive Stats for 2016” on Thursday February 2, 2017 at 10:00 Pacific time. The webinar will be recorded so you can watch it over and over again. The webinar will dig deeper into the quarterly, yearly, and lifetime hard drive stats and include the annual and lifetime stats by drive size and manufacturer. You will need to subscribe to the Backblaze BrightTALK channel to view the webinar. Sign up for the webinar today.

As a reminder, the complete data set used to create the information in this review is available on our Hard Drive Test Data page. You can download and use this data for free for your own purposes. All we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone; it is free. If you just want the summarized data used to create the tables and charts in this blog post, you can download the ZIP file containing the MS Excel spreadsheet.

Good luck and let us know if you find anything interesting.

The post Backblaze Hard Drive Stats for 2016 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

A History of Hard Drives

Post Syndicated from Peter Cohen original https://www.backblaze.com/blog/history-hard-drives/

history-of-hard-drive
2016 marks the 60th anniversary of the venerable Hard Disk Drive (HDD). While new computers increasingly turn to Solid State Disks (SSDs) for main storage, HDDs remain the champions of low-cost, high-capacity data storage. That’s a big reason why we still use them in our Storage Pods. Let’s take a spin in the Wayback Machine and look at the history of hard drives. Let’s also think about what the future might hold.

It Started With RAMAC

brl61-ibm_305_ramac

IBM made the first commercial hard disk drive-based computer and called it RAMAC – short for “Random Access Method of Accounting And Control.” Its storage system was called the IBM 350. RAMAC was big – it required an entire room to operate. The hard disk drive storage system alone was about the size of two refrigerators. Inside were stacked 50 24-inch platters.

For that, RAMAC customers ended up with less than 5 MB – that’s right, megabytes of storage. IBM’s marketing people didn’t want to make RAMAC store any more data than that. They had no idea how to convince customers they would ever need more.

IBM customers forked over $3,200 for the privilege of accessing and storing that information. A MONTH. (IBM leased its systems.) That’s equivalent to almost $28,000 per month in 2016.

Sixty years ago, data storage cost $640 per megabyte, per month. At IBM’s 1956 rates for storage, a new iPhone 7 would cost you about $20.5 million a month. RAMAC was a lot harder to stick in your pocket, too.

Plug and Play

These days you can fit 2 TB onto an SD card the size of a postage stamp, but half a century ago, it was a very different story. IBM continued to refine early hard disk drive storage, but systems were still big and bulky.

By the early 1960s, IBM’s mainframe customers were hungry for more storage capacity, but they simply didn’t have the room to keep installing refrigerator-sized storage devices. So the smart folks at IBM came up with a solution: Removable storage.

The IBM 1311 Disk Storage Drive, introduced in 1962, gave rise to the use of IBM 1316 “Disk Packs” that let IBM’s mainframe customers expand their storage capacity as much as they needed (or could afford). IBM shrank the size of the disks dramatically, from 24 inches in diameter down to 14 inches. The 9-pound disk packs fit into a device about the size of a modern washing machine. Each pack could hold about 2 MB.

ibm_1311-1

For my part, I remember touring a data center as a kid in the mid-1970s and seeing removable IBM disk packs up close. They looked about the same size and dimensions that you’d use to carry a birthday cake: Large, sealed plastic containers with handles on the top.

Computers had pivoted from expensive curiosities in the business world to increasingly essential devices needed to get work done. IBM’s System/360 proved to be an enormously popular and influential mainframe computer. IBM created different models but needed flexible storage across the 360 product line. So IBM created a standard hard disk device interconnect. Other manufacturers adopted the technology, and a cottage industry was born: Third-party hard disk drive storage.

The PC Revolution

Up until the 1970s, computers were huge, expensive, very specialized devices only the biggest businesses, universities and government institutions could afford. The dropping price of electronic components, the increasing density of memory chips and other factors gave rise to a brand new industry: The personal computer.

Initially, personal computers had very limited, almost negligible storage capabilities. Some used perforated paper tape for storage. Others used audio cassettes. Eventually, personal computers would write data to floppy disk drives. And over time, the cost of hard disk drives fell enough that PC users could have one, too.

winchester-festplatte

In 1980, a young upstart company named Shugart Technology introduced a 5 MB hard disk drive designed to fit into personal computers of the day. It was a scant 5.25 inches in diameter. The drive cost $1,500. It would prove popular enough to become a de facto standard for PCs throughout the 1980s. Shugart changed its name to Seagate Technology. Yep. That Seagate.

In the space of 25 years, hard drive technology had shrunk from a device the size of a refrigerator to something less than 6 inches in diameter. And that would be nothing compared to what was to come in the next 25 years.

The Advent of RAID

An important chapter in Backblaze’s backstory appears in the late 1980s when three computer scientists from U.C. Berkeley coined the term “RAID” in a research paper presented at the SIGMOD conference, an annual event which still happens today.

RAID is an acronym that stands for “Redundant Array of Inexpensive Disks.” The idea is that you can take several discrete storage devices – hard disk drives, in this case – and combine them into a single logical unit. Dividing the work of writing and reading data between multiple devices can make data move faster. It can also reduce the likelihood that you’ll lose data.

The Berkeley researchers weren’t the first to come up with the idea, which had bounced around since the 1970s. They did coin the acronym that we still use today.

blog_60_drives

RAID is vitally important for Backblaze. RAID is how we build our Storage Pods. Our latest Storage Pod design incorporates 60 individual hard drives assembled in 4 RAID arrays. Backblaze then took the concept a step further by implementing our own Reed-Solomon erasure coding mechanism to work across our Backblaze Vaults.

With our latest Storage Pod design we’ve been able to squeeze 480 TB into a single chassis that occupies 4U of rack space, or about 7 inches of vertical height in an equipment rack. That’s a far cry from RAMAC’s 5 MB of refrigerator-sized storage. 96 million times more storage, in fact.

Bigger, Better, Faster, More

Throughout the 1980s and 1990s, hard drive and PC makers innovated and changed the market irrevocably. 5.25-inch drives soon gave way to 3.5-inch drives (we at Backblaze still use 3.5-inch drives designed for modern desktop computers in our Storage Pods). When laptops gained in popularity, drives shrunk again to 2.5 inches. If you’re using a laptop that has a hard drive today, chances are it’s a 2.5-inch model.

The need for better, faster, more reliable and flexible storage also gave rise to different interfaces: IDE, SCSI, ATA, SATA, PCIe. Drive makers improved performance by increasing the spindle speed, the speed of the motor that spins the hard drive’s platters. 5,400 revolutions per minute (RPM) was standard, but 7,200 RPM yielded better performance. Seagate, Western Digital, and others upped the ante by introducing 10,000-RPM and eventually 15,000-RPM drives.

IBM pioneered the commercial hard drive and brought countless hard disk drive innovations to market over the decades. In 2003, IBM sold its storage division to Hitachi. The many Hitachi drives we use here at Backblaze can trace their lineage back to IBM.

Solid State Drives

Even as hard drives found a place in early computer systems, RAM-based storage systems were also being created. The prohibitively high cost of computer memory, its complexity, size, and requirement to stay powered to work prevented memory-based storage from catching on in any meaningful way. Still, some very specialized, expensive systems found use in the supercomputing and mainframe computer markets.

Eventually non-volatile RAM became fast, reliable, and inexpensive enough that SSDs could be mass-produced, but progress came by degrees, and the drives remained incredibly expensive. By the early 1990s, you could buy a 20 MB SSD for a PC for $1,000, or about $50 per megabyte. By comparison, the cost of a spinning hard drive had dropped below $1 per megabyte, and would plummet even further.

blog-ssd-closeup

The real breakthrough happened with the introduction of flash-based SSDs. By the mid-2000s, Samsung, SanDisk and others brought to market flash SSDs that acted as drop-in replacements for hard disk drives. SSDs have gotten faster, smaller and more plentiful. Now PCs and Macs and smartphones all include flash storage of all shapes and sizes and will continue to move in that direction. SSDs provide better performance, better power efficiency, and enable thinner, lighter computer designs, so it’s little wonder.

The venerable spinning hard drive, now 60 years old, still rules the roost when it comes to cost per gigabyte. SSD makers are getting closer to parity with hard drives, but they’re still years away from hitting that point. An old fashioned spinning hard drive still gives you the best bang for your buck.

We can dream, though. Over the summer our Andy Klein got to wondering what Seagate’s new 60 TB SSD might look like in one of our Storage Pods. He had to guess at the price but based on current market estimates, an SSD-based 60-drive Storage Pod would cost Backblaze about $1.2 million.

Andy didn’t make any friends in Backblaze’s Accounting department with that news, so it’s probably not going to happen any time soon.

The Future

As computers and mobile devices have pivoted from hard drives to SSDs, it’s easy to discount the hard drive as a legacy technology that will soon fall by the wayside. I’d encourage some circumspection, though. It seems every few years someone declares the hard drive dead. Meanwhile, hard drive makers keep finding ways to stay relevant.

There’s no question that the hard drive market is in a period of decline and transition. Hard disk drive sales are down year-over-year as consumers switch to SSDs or move away from Macs and PCs altogether and do more of their work on mobile devices.

Regardless, innovation and development of hard drives continue apace. We’re populating our own Storage Pods with 8 TB hard drives. 10 TB hard drives are already shipping, and even higher-capacity 3.5-inch drives are on the horizon.

Hard drive makers constantly improve areal density – the amount of information you can physically cram onto a disk. They’ve also found ways to get more platters into a single drive mechanism by filling it with helium. This sadly does not make the drive float, dashing my fantasies of creating a Backblaze data center blimp.

So is SSD the only future for data storage? Not for a while. Seagate still firmly believes in the future of hard drives. Its CFO estimates that hard drives will be around for another 15-20 years. Researchers predict that hard drives coming to market over the next decade will store an order of magnitude more data than they do now – 100 TB or more.

Think it’s out of the question? Imagine handing a 10 TB hard drive to a RAMAC operator in 1956 and telling them that the 3.5-inch device in their hands holds two million times more data than that big box in front of them. They’d think you were nuts.

The post A History of Hard Drives appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Hard Drive Stats for Q3 2016: Less is More

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hard-drive-failure-rates-q3-2016/

Hard Drive Stats - 8TB Drives

In our last report for Q2 2016, we counted 68,813 spinning hard drives in operation. For Q3 2016 we have 67,642 drives, or 1,171 fewer hard drives. Stop, put down that Twitter account, Backblaze is not shrinking. In fact, we’re growing very nicely and are approaching 300 petabytes of data under our management. We have fewer drives because over the last quarter we swapped out more than 3,500 2 terabyte (TB) HGST and WDC hard drives for 2,400 8 TB Seagate drives. So we have fewer drives, but more data. Lots more data! We’ll get into the specifics a little later on, but first, let’s take a look at our Q3 2016 drive stats.

Backblaze hard drive reliability stats for Q3 2016

Below is the hard drive failure data for Q3 2016. This chart is just for the period of Q3 2016. The hard drive models listed below are data drives, not boot drives. We only list drive models that have 45 or more of that model deployed.

Q3 2016 hard drive failure rate chart

A couple of comments on the chart:

  • The models that have an annualized failure rate of 0.00% had zero hard drive failures in Q3 2016.
  • The “annualized failure rate” is computed as follows: ((Failures)/(Drive Days/365)) * 100. Therefore, consider the number of “Failures” and “Drive Days” before reaching any conclusions about the failure rate.

Less is more: The move to 8 TB drives

In our Q2 2016 drive stats post we covered the beginning of our process to migrate the data on our aging 2 TB hard drives to new 8 TB hard drives. At the end of Q2, the migration was still in process. All of the 2 TB drives were still in operation, along with 2,720 of the new 8 TB drives – the migration target. In early Q3, that stage of the migration project was completed and the “empty” 2 TB hard drives were removed from service.

We then kicked off a second wave of migrations. This wave was smaller but continued the process of moving data from the remaining 2 TB hard drives to the 8 TB based systems. As each migration finished we decommissioned the 2 TB drives and they stopped reporting daily drive stats. By the end of Q3, we had only 180 of the 2 TB drives left – four Storage Pods with 45 drives each.

The following table summarizes the shift over the 2nd and 3rd quarters.

Migration from 2TB hard drives to 8TB hard drives

As you can see, during Q3 we “lost” over 1,100 hard drives from Q2, but we gained about 12 petabytes of storage. Over the entire migration project (Q2 and Q3) we added about 900 total drives while gaining 32 petabytes of storage.
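
The arithmetic behind “fewer drives, more data” fits in a few lines. The counts below are the approximate, rounded figures from this post, so the output lines up with the roughly 1,100-drive drop and 12 petabyte gain described above.

```python
# Reconciling "fewer drives, more data" with the swap described above.
# The counts are the approximate, rounded figures from this post.

removed_2tb = 3500          # 2 TB HGST and WDC drives retired in Q3
added_8tb = 2400            # 8 TB Seagate drives deployed in Q3

drives_net = added_8tb - removed_2tb
capacity_net_tb = added_8tb * 8 - removed_2tb * 2

print(f"Net drives: {drives_net:+,}")                          # about -1,100
print(f"Net raw capacity: {capacity_net_tb / 1000:+.1f} PB")   # about +12 PB
```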

Drive migration and hard drive failure rates

A four-fold storage density increase takes care of much of the math in justifying the migration project. Even after factoring in drive cost, migration costs, drive recycling, electricity, and all the other incidentals, the migration still made economic sense. The only wildcard was the failure rate of the hard drives in question. Why? The 2 TB HGST drives had performed very well. Drive failure is to be expected, but our costs go up if the new drives fail at two or three times the rate of the 2 TB drives. With that in mind, let’s take a look at the failure rates of the drives involved in the migration project.

Comparing Drive Failure Rates

The Seagate 8 TB drives are doing very well. Their annualized failure rate compares favorably to the HGST 2 TB hard drives. With the average age of the HGST drives being 66 months, their failure rate was likely to rise, simply because of normal wear and tear. The average age of the Seagate 8 TB hard drives is just 3 months, but their 1.6% failure rate during the first few months bodes well for a continued low failure rate going forward.

What about the 60 drive Storage Pods?

In Q3 we deployed 2,400 8 TB drives into two Backblaze Vaults. We used 60-drive Storage Pods in each vault. In other words, each Backblaze Vault had 1,200 hard drives and each hard drive was 8 TB. That’s 9.6 petabytes of storage in one Backblaze Vault.

Each Backblaze Vault has 9.6 petabytes of storage

As a reminder, each Backblaze Vault consists of 20 Storage Pods logically grouped together to act as one storage system. Storage Pods are spread out across a data center in different cabinets, on different circuits and on different network switches to maximize data durability. Backblaze Vaults are the backbone that powers both our cloud backup and B2 cloud storage services.

60 drive storage pod

Our Q3 switch to 60-drive Storage Pods signals the end of the line for our 45-drive systems. They’ve had a good long run. We put together the history of our Storage Pods for anyone who is interested. Over the next couple of years, all of our 45 drive Storage Pods will be replaced by 60 drive systems. Most likely this will be done as we migrate from 3 TB and 4 TB drives to larger hard drives. I hear 60 TB HAMR drives are just around the corner, although we might have to wait for the price to drop a bit.

Cumulative hard drive failure rates by model

Regardless of the drive size or the Storage Pod used, we’ll continue to track and publish our data on our hard drive test data web page. If you’re not into wading through several million rows of hard drive data, the table below shows the annualized drive failure rate over the lifetime of each of the data drive models we currently have under management. This is based on data from April 2013 through September 2016 for all data drive models with active drives as of September 30, 2016.

Drive Failure Rates as of Q3 2016

Hard drive stats webinar: Join Us!

Want more details on our Q3 drive stats? Join us for the webinar: “Hard Drive Reliability Stats: Q3 2016” on the Backblaze BrightTALK channel on Friday November 18th at 9:00am Pacific. You’ll need to subscribe to the Backblaze channel to view the webinar, but you’ll only have to do that once. From then on you’ll get invited to all future Backblaze webinars. We’ll keep the webinar to 45 minutes or less, including a few questions – sign up today.

Recap

Less is more! The migrations are finished for the moment, although we are evaluating the migration from 3 TB drives to 10 TB drives. First though, we’d like to give our data center team a chance to catch their breath. The early returns on the Seagate 8 TB drives look good. The 1.6% failure rate at the 3-month point is the best we’ve seen from any Seagate drive we’ve used at the same average age. We’ll continue to track this going forward.

Next time we’ll cover our Q4 drive stats, along with a recap of the lifetime performance of every data drive we’ve used past and present. That should be fun.

Looking for the tables, charts, and images from this post? You can download them from Backblaze B2 as a ZIP file (2.3 MB).

The post Hard Drive Stats for Q3 2016: Less is More appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

How Heavy is the Backblaze Cloud?

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/how-heavy-is-backblaze-cloud/

Backblaze Cloud
The question “How heavy is the Backblaze Cloud?” came up in a marketing meeting one Monday afternoon. “One million pounds,” someone guessed. “A hundred thousand pounds,” someone else said, trying to sound sure. The truth is we had no idea. “Well, how much does a real cloud weigh?” someone else asked, and at that point we knew we had to find out more…

The Backblaze Cloud

Over the last few years, the folks at our Sacramento data center have gotten used to me asking “odd” questions. For example: how loud is the data center, or what are those sticky mats on the floor? See our “A Day in the Life of a Datacenter” post if you’re curious.

With that in mind, I sent over a list of questions to our data center manager, including the following:

  1. What does a 45-drive Backblaze Storage Pod weigh?
  2. What does a 60-drive Backblaze Storage Pod weigh?
  3. How much does an empty rack weigh?

I also asked for the weight of our networking equipment, non-storage servers, spare drives, and wiring. I did not ask about Guido, his friend Luigi, or other maintenance equipment. I just needed enough information to calculate the weight of the hardware we use to acquire and store data.

Normally the data center team in Sacramento is very responsive to my oddball requests. This time they were silent. I assumed they were busy. I tried again about a month later, and this time I told them that I wanted to calculate the weight of the Backblaze Cloud. More silence. I decided to visit one of the data center team members in person with my request. After he stopped laughing, he “slacked” the data center manager, who at that moment realized I was actually serious and not playing a stupid marketing trick. I had my answers an hour later.

A 45-drive Backblaze Storage Pod weighs 140 pounds, the 60-drive Storage Pod weighs 158 pounds, and so on. Based on the information provided, I calculated that the hardware we use to power the Backblaze Cloud weighs a little over 250,000 pounds, or about 113,400 kilograms.

The Backblaze Cloud weighs over 250,000 pounds

One Terabyte Per Pound

That 250,000 pounds of Backblaze Cloud hardware is used to store a little over 250 Petabytes of data. That means it takes one pound of hardware to store one terabyte of data.

Let’s take a minute and give this weight-to-data relationship the “sniff-test” by looking at the weight of a 1TB hard drive. Since we use 3 ½” drives to store data, here are a few 3 ½” 1TB drives we found.

Drive Type Model Weight
Seagate 1TB Desktop Internal ST1000DM003 14.1 ounces
WD Red 1TB NAS Internal WD10EFRX 15.8 ounces
Seagate 1TB FreeAgent GoFlex External STAC1000103 2.4 pounds
WD 1TB My Book External WDBACW0010HBK-NESN 2.6 pounds

Our 1 TB per pound weight calculation fits nicely between the weight of the internal drive models and the weight of the external drive models. Also, given that 45 (or 60) internal drives share the weight of one enclosure (Storage Pod) it makes sense our weight calculation is closer to that of the internal drives. Sniff-test passed!

Comparing Clouds

Perhaps you’ve been asked by an inquisitive nine-year-old, “How much does a cloud weigh?” If you answered, “it’s heavy” and then changed the subject, perhaps we can help, although there seem to be very different answers to this supposedly simple question.

How much does a cloud weigh?
a. 216,000 pounds (NPR)
b. 1.1 million pounds (mental_floss)
c. 8.8 million pounds (Zidbits)
d. All of the above.

Technically a cloud has little or no weight; otherwise you’d be crushed to death the next time the fog rolled in. The question should be, “How much is the mass of a cloud?”, but most nine-year-old children haven’t learned about mass yet.

Back to the question at hand, the correct answer is “d.” How can all three be correct? Simple, they used different types of clouds, different sized clouds, or both. For example, a white fluffy cumulus cloud holds about 1/2 gram of water per cubic meter, while a cumulonimbus cloud can hold up to 5 grams of water per cubic meter – 10 times as much in the same volume.

Based on what we’ve learned so far, we can actually determine how big a fluffy cumulus cloud would have to be to have the same weight (250,000 pounds/ 113,400 kg) as the Backblaze Cloud:

  • Cumulus cloud water content = 0.5 grams per cubic meter
  • Backblaze Cloud weight = 113,400 kilograms, or 113,400,000 grams
  • Cumulus cloud volume = 113,400,000 grams ÷ 0.5 grams per cubic meter
  • Cumulus cloud volume = 226,800,000 cubic meters

Based on this, if we had a cumulus cloud that was 16.5 feet (5 meters) high, about the height of a data center, it would cover an area of 45,360,000 square meters. That’s 45.36 square kilometers or 17.51 square miles.
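
For anyone who wants to check the math, here is the same calculation as a short script. The 0.5 grams per cubic meter water content and the 5-meter “data center height” are the assumptions already stated above; the rest is unit conversion.

```python
# The cumulus-cloud comparison worked out in code. The 0.5 g/m^3 water content
# and the 5-meter "data center height" are the assumptions stated above.

CLOUD_WATER_G_PER_M3 = 0.5          # fluffy cumulus cloud
BACKBLAZE_CLOUD_KG = 113_400        # ~250,000 pounds of hardware
CLOUD_HEIGHT_M = 5                  # about 16.5 feet

volume_m3 = BACKBLAZE_CLOUD_KG * 1000 / CLOUD_WATER_G_PER_M3
area_m2 = volume_m3 / CLOUD_HEIGHT_M

print(f"Volume: {volume_m3:,.0f} cubic meters")          # 226,800,000
print(f"Area:   {area_m2 / 1e6:.2f} square kilometers")  # 45.36
print(f"Area:   {area_m2 / 2_589_988:.2f} square miles") # ~17.5
```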

Comparing clouds, the Backblaze Cloud is equal to a fluffy white cumulus cloud that is around 16.5 feet high and covers 17.5 square miles. Thank goodness our data center team uses their space a little more efficiently.

Afterthoughts

If nothing else, the next time an inquisitive nine-year old asks, “How much does a cloud weigh?” you’ll have an answer, or two, or three. Of course the next question could be “Why don’t clouds fall from the sky?” and you’ll have to talk about weight and mass and volume and thermal updrafts and all that, or you can just change the subject…

The one thing I do know is that no two clouds are alike. That’s true for real clouds or data storage clouds. That means our cloudy math could be perfect for sunny days in Hawaii and completely wrong for rainy days in Brazil, or vice-versa. If you can shed some light on the weight of a cloud – the real kind or the data storage kind – let us know in the comments.

The post How Heavy is the Backblaze Cloud? appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Malware Infects Network Hard Drives

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2016/09/malware_infects.html

The malware “Mal/Miner-C” infects Internet-exposed Seagate Central Network Attached Storage (NAS) devices, and from there takes over connected computers to mine for cryptocurrency. About 77% of all drives have been infected.

Slashdot thread.

EDITED TO ADD (9/13): More news.

Seagate Introduces a 60TB SSD – Is a 3.6PB Storage Pod Next?

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/seagate-60tb-ssd-36pb-storage-pod-next/

60TB SSD

Seagate just introduced a 60TB SSD. Wow. As Backblaze scurries about upgrading from 2TB to 8TB hard drives in our Storage Pods, we just have to stop for a moment and consider what a 3.6PB Storage Pod would look like and how much it would cost. Let’s dive into the details…

What we know about the Seagate 60TB SSD

A number of sources (engadget, Computerworld, Mashable, and Tom’s Hardware, to name a few) covered the news. From the Backblaze Storage Pod point of view, here are some important things to know. The Seagate 60TB SSD comes in a 3.5-inch form factor. It uses a 12 Gbps SAS interface and consumes 15 watts of power (average while active). There are a few other fun facts in the articles above, but these will work for now as we design our hypothetical 3.6PB Storage Pod.

What we don’t know today

We don’t know the price. Seagate is calling this enterprise storage, which in Backblaze vernacular translates to spending more money for the same thing. Let’s see if we can at least estimate a list price so we can do our math later on. We’ll start with the Samsung 16TB SSD drive recently introduced. As they make their way to the market, their price is roughly $7,000 each. Using that number, simple math would get us a price for the Seagate 60TB drives to be $26,250. That’s seems high, even for enterprise storage, so let’s give a discount of 25% for scalability bringing us to $19,687.50 each. Applying marketing math, I’ll round that up to $19,995.00 each Seagate 60TB SSD. That’s as good a WAG as any, let’s use that for the price.

Our 3.6PB Storage Pod design

The most economical way for us to proceed would be to use our current 60-drive chassis (Storage Pod 6.0) with as few modifications as possible. On the plus side, the Seagate 60TB drive has a 3.5” form factor, meaning it will fit very nicely into our 60-drive Storage Pod chassis. On the minus side, we currently use SATA backplanes, cables, and boards throughout, so there’s a bit of work switching over to SAS, with the hard part being 5-port SAS backplanes. In a very quick search, we could only locate one 5-port SAS backplane, and we weren’t sure it was still being made. Also, we’d need to update the motherboard, CPU, and memory, and probably convert to 100Gb network cards, but since these are all readily available parts, that’s fairly straightforward (says the guy who is not the designer).

We do have the time to redesign the entire Storage Pod to work with SAS drives given that Seagate doesn’t expect to deliver the 60TB drives until 2017, but before we do anything radical, let’s figure out if it’s worth it.

Drive math

The Seagate 8TB drives we use (model: ST8000DM002) currently list for $295.95 on Amazon. That’s about $0.037/GB or $37/TB. Using our $19,995 price for the Seagate 60TB SSD, we get about $0.333/GB or $333/TB. That’s 9 times the cost, meaning a 60-drive Storage Pod filled with the Seagate 60TB drives would cost about $1.2M each.

Let’s look at it another way. The Seagate 60TB drives give us 3.6PB in one Storage Pod. What would it cost to get 3.6PB of storage using just 8TB hard drives in Storage Pods? To start, it would take roughly 7.5 Storage Pods full of 8TB drives to give us 3.6PB. Each 8TB Storage Pod costs us about $20,000, or $150,000 for the 7.5 Storage Pods needed to get us to 3.6PB. Since we can’t have half of a Storage Pod, let’s go with 8 Storage Pods at a total cost of $160,000, which is still a bit lower than the $1.2 million price tag for the 60TB units.
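Here’s the same comparison in Python, using the prices quoted above; the pod costs are rough estimates, not quotes:

```python
# Comparing the cost of 3.6PB built two different ways, using the prices quoted above.
ssd_price_each, ssd_tb = 19_995.00, 60     # our WAG for the 60TB SSD
hdd_price_each, hdd_tb = 295.95, 8         # Seagate ST8000DM002 street price

ssd_per_tb = ssd_price_each / ssd_tb       # ~$333/TB
hdd_per_tb = hdd_price_each / hdd_tb       # ~$37/TB
print(f"${ssd_per_tb:.0f}/TB vs ${hdd_per_tb:.0f}/TB -> {ssd_per_tb / hdd_per_tb:.1f}x the cost")

ssd_pod_cost = 60 * ssd_price_each         # one 3.6PB SSD pod, ~$1.2M
hdd_pods_needed = 3_600 / (60 * hdd_tb)    # 7.5 pods of 8TB drives, round up to 8
hdd_pod_cost = 8 * 20_000                  # ~$160,000 total
print(f"{hdd_pods_needed} pods of 8TB drives: ${hdd_pod_cost:,}  vs  one SSD pod: ${ssd_pod_cost:,.0f}")
```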

What about the increase in storage density? In simple terms, 1 rack of Storage Pods filled with 60TB SSDs would replace 8 racks of storage using the 8TB Storage Pods. I won’t go into the details of the math here, but getting 8 times the storage for 9 times the cost doesn’t work out well. Other factors, including the 70% increase in electrical draw per rack, make the increase in storage density currently moot.

If you build it…

There is one more thing to consider in building a Storage Pod full of Seagate 60TB drives: Backblaze Vaults. As a reminder, a Backblaze Vault consists of 20 Storage Pods that act as a single storage unit. Data is spread across the 20 Storage Pods to improve durability and overall performance. To populate a Backblaze Vault requires 1,200 drives. Using our $19,995 price per drive, that’s roughly $24M to populate one Backblaze Vault. Of course, that would give us 72PB of storage in one Vault. Given we’re adding about 25PB of storage a quarter right now, it would give us 3 quarters of storage runway. Then we’d get to do it again. On the bright side, our Ops folks would have to deploy only one Backblaze Vault every 8 or 9 months.

Breaking News

Toshiba just announced a 100TB SSD, due out in 2017. No pricing is available yet, and I’m tired of doing math for the moment, but a 6PB Backblaze Storage Pod sounds amazing.

In the meantime, if either Seagate or Toshiba needs someone to test, say, 1,200 of their new SSDs for free, we might be interested. Just saying.

The post Seagate Introduces a 60TB SSD – Is a 3.6PB Storage Pod Next? appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Hard Drive Stats for Q2 2016

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hard-drive-failure-rates-q2-2016/

Hard Drive Reliability

Q2 2016 saw Backblaze: introduce 8TB drives into our drive mix, kick off a pod-to-vault migration of over 6.5 Petabytes of data, cross over 250 Petabytes of data stored, and deploy another 7,290 drives into the data center for a total of 68,813 spinning hard drives under management. With all the ins and outs, let’s take a look at how our hard drives fared in Q2 2016.

Backblaze hard drive reliability for Q2 2016

Below is the hard drive failure data for Q2 2016. This chart is just for the period of Q2 2016. The hard drive models listed below are data drives (not boot drives), and we only list models which have 45 or more drives of that model deployed.

Q2 2016 Hard Drive Failure Rates

A couple of observations on the chart:

  1. The models that have an annualized failure rate of 0.00% had zero hard drive failures in Q2 2016.
  2. The annualized failure rate is computed as follows: ((Failures)/(Drive Days/365)) * 100. Therefore, consider the number of “Failures” and “Drive Days” before reaching any conclusions about the failure rate. (The same formula is shown as code just below.)
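Here’s that formula as a minimal Python function; the example inputs are made up:

```python
def annualized_failure_rate(failures: int, drive_days: float) -> float:
    """Annualized failure rate as a percentage: (failures / (drive_days / 365)) * 100."""
    return failures / (drive_days / 365) * 100

# Made-up example: 5 failures over 100,000 drive days works out to about 1.83%.
print(f"{annualized_failure_rate(5, 100_000):.2f}%")
```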

Later in this post we’ll review the cumulative statistics for all of our drives over time, but first let’s take a look at the new drives on the block.

The 8TB hard drives have arrived

For the last year or so we kept saying we were going to deploy 8TB drives in quantity. We did deploy 45 8TB HGST drives, but deploying these drives en masse did not make economic sense for us. Over the past quarter, 8TB drives from Seagate became available at a reasonable price, so we purchased and deployed over 2,700 in Q2, with more to come in Q3. All of these drives were deployed in Backblaze Vaults, with each vault using 900 drives: 45 drives in each of the 20 Storage Pods that form a Backblaze Vault.

Yes, we said 45 drives in each Storage Pod, so what happened to our 60-drive Storage Pods? In short, we wanted to use the remaining stock of 45-drive Storage Pods before we started using the 60-drive pods. We have built two Backblaze Vaults using the 60-drive pods, but we filled them with 4TB and 6TB drives. The first 60-drive Storage Pod filled with 8TB drives (480TB total) will be deployed shortly.

Hard Drive Migration – 85 Pods to 1 Vault

One of the reasons that we made the move to 8TB drives was to optimize storage density. We’ve done data migrations before, for example, from 1TB pods to 3TB and 4TB pods. These migrations were done one or two Storage Pods at a time. It was time to “up our game.” We decided to migrate from individual Storage Pods filled with HGST 2TB drives, average age 64 months, to a Backblaze Vault filled with 900 8TB drives.

Backblaze Data Migration

We identified and tagged 85 individual Storage Pods to migrate from. Yes, 85. The total amount of data to be migrated was about 6.5PB. It was a bit sad to see the 2TB HGST drives go, as they have been really good over the years, but getting 4 times as much data into the same space was just too hard to resist.

The first step was to stop all data writes on the donor HGST 2TB Storage Pods. We then kicked off the migration by starting with 10 Storage Pods, adding 10 to 20 donor pods to the migration every few hours until we got to 85 pods. The migration process is purposely slow, as we want to ensure that we can still quickly read files from the 85 donor pods so that data restores are not impacted. The process is to copy a given RAID array from a Storage Pod to a specific “Tome” in a Backblaze Vault. Once all the data in a given RAID array has been copied to a Tome, we move on to the next RAID array awaiting migration and continue the process. This happens in parallel across the 45 Tomes in a Backblaze Vault.
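To make the flow a little more concrete, below is a toy Python sketch of how such a ramp-up could be sequenced. To be clear, this is not our migration code; the pod and RAID-array structures and the round-robin Tome placement are invented purely for illustration:

```python
# Toy sketch of the ramp-up described above: donor pods join the migration in batches,
# and each of their RAID arrays is assigned to a Tome. The real system copies arrays in
# parallel and throttles the work so restores from the donor pods stay fast.
def plan_migration(donor_pods, tomes_per_vault=45, batch_size=10):
    """Yield (batch_number, pod_id, raid_array, tome_index) copy tasks in ramp-up order."""
    tome = 0
    for batch_start in range(0, len(donor_pods), batch_size):
        for pod in donor_pods[batch_start:batch_start + batch_size]:
            for raid_array in pod["raid_arrays"]:
                yield (batch_start // batch_size, pod["id"], raid_array, tome)
                tome = (tome + 1) % tomes_per_vault   # round-robin placement, purely illustrative

# Fake donor pods: 85 pods, each with 3 RAID arrays (as in our 45-drive pods).
pods = [{"id": f"pod-{n}", "raid_arrays": [f"raid-{n}-{r}" for r in range(3)]} for n in range(85)]
for task in list(plan_migration(pods))[:3]:
    print(task)
```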

We’re about 50% of the way through the migration with little trouble. We did have a Storage Pod in the Backblaze Vault go down. That didn’t stop the migration, as vaults are designed to continue to operate under such conditions, but more on that in another post.

250 Petabytes of data stored

Recently we took a look at the growth of data and the future of cloud storage. Given the explosive growth in data as a whole, it’s not surprising that Backblaze added another 50PB of customer data over the last two quarters and that by mid-June we had passed the 250 Petabyte mark in total data stored. You can see our data storage growth below:

Backblaze Data Managed

Back in December 2015, we crossed the 200 Petabyte mark and at that time predicted we would cross 250PB in early Q3 2016. So we’re a few weeks early. We also predicted we would cross 300PB in late 2016. Given how much data we are adding with B2, it will probably be sooner; we’ll see.

Cumulative hard drive failure rates by model

In the table below we’ve computed the annualized drive failure rate for each drive model. This is based on data from April 2013 through June 2016.

Q2 2016 Cumulative Hard Drive Failure Rates

Some people question the usefulness of the cumulative Annualized Failure Rate. This is usually based on the idea that drives entering or leaving during the cumulative period skew the results because they are not there for the entire period. This is one of the reasons we compute the Annualized Failure Rate using “Drive Days”. A Drive Day is only recorded if the drive is present in the system. For example, if a drive is installed on July 1st and fails on August 31st, it adds 62 drive days and 1 drive failure to the overall results. A drive can be removed from the system because it fails or perhaps it is removed from service after a migration like the 2TB HGST drives we’ve covered earlier. In either case, the drive stops adding Drive Days to the total, allowing us to compute an Annualized Failure Rate over the cumulative period based on what each of the drives contributed during that period.
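Here’s a minimal sketch of that drive-day bookkeeping in Python, using the dates from the example above:

```python
from datetime import date
from typing import Optional

def drive_days(installed: date, removed: Optional[date], period_end: date) -> int:
    """Days the drive was present in the system through period_end (inclusive)."""
    end = min(removed, period_end) if removed else period_end
    return (end - installed).days + 1

# The example from the text: installed July 1st, failed August 31st -> 62 drive days.
print(drive_days(date(2016, 7, 1), date(2016, 8, 31), date(2016, 9, 30)))  # 62

# The cumulative Annualized Failure Rate then uses the same formula as before:
# (failures / (total_drive_days / 365)) * 100
```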

As always, we’ve published the Q2 2016 data we used to compute these drive stats. You can find the data files along with the associated documentation on our hard drive test data page.

Which hard drives do we use?

We’ve written previously about our difficulties in getting drives from Toshiba and Western Digital. Whether it’s poor availability or an unexplained desire not to sell us drives, we don’t have many drives from either manufacturer. So we use a lot of Seagate drives and they are doing the job very nicely. The table below shows the distribution of the hard drives we are currently using in our data center.

Q2 2016 Hard Drive Distribution

Recap

The Seagate 8TB drives are here and are looking good. Sadly, we’ll be saying goodbye to the HGST 2TB drives, but we need the space. We’ll miss those drives; they were rock stars for us. The 4TB Seagate drives are our workhorse drives today, and their 2.8% annualized failure rate is more than acceptable for us. Their low failure rate roughly translates to an average of one drive failure per Storage Pod per year. Over the next few months, expect more on our migrations, a look at a day in the life of a data center tech, and an update of the “bathtub” curve, i.e. hard drive failure over time.

The post Hard Drive Stats for Q2 2016 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

One Billion Drive Hours and Counting: Q1 2016 Hard Drive Stats

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hard-drive-reliability-stats-q1-2016/

Q1 2016 hard Drive Stats

For Q1 2016 we are reporting on 61,590 operational hard drives used to store encrypted customer data in our data center. That’s 9.5% more hard drives than in our last review, when we evaluated 56,224 drives. In Q1 2016, the hard drives in our data center, past and present, totaled over one billion hours in operation to date. That’s nearly 42 million days, or 114,155 years’ worth, of spinning hard drives. Let’s take a look at what these hard drives have been up to.

Backblaze hard drive reliability for Q1 2016

Below are the hard drive failure rates for Q1 2016. These are just for Q1 and are not cumulative; the cumulative chart appears later in this post.

Q1 2016 Hard Drive Stats

Some observations on the chart:

  1. The list totals 61,523 hard drives, not the 61,590 noted above. We don’t list drive models in this chart for which we have fewer than 45 drives.
  2. Several models have an annual failure rate of 0.00%. They had zero hard drive failures in Q1 2016.
  3. Failure rates with a small number of failures can be misleading. For example, the 8.65% failure rate of the Toshiba 3TB drives is based on one failure. That’s not enough data to make a decision.
  4. The overall Annual Failure Rate of 1.84% is the lowest quarterly number we’ve ever seen.

Cumulative hard drive reliability rates

We started collecting the data used in these hard drive reports on April 10, 2013, just about three years ago. The table below is cumulative as of 3/31 for each year since 4/10/2013.

Cumulative Q1 2016 Hard Drive Failure Rates

One billion hours of spinning hard drives

Let’s take a look at what the hard drives we own have been doing for one billion hours. The one billion hours is a sum of all the data drives, past and present, in our data center. For example, it includes the WDC 1.0TB drives that were recently retired from service after an average of 6 years in operation. Below is a chart of hours in service to date ordered by drive hours:

Q1 2016 Hard Drive Service Hours

The “Others” line accounts for the drives that are not listed because there are or were fewer than 45 drives in service.

In the table above, the Seagate 4TB drive leads in “hours in service” but which manufacturer has the most hours in service? The chart below sheds some light on this topic:
Hard Drive Service Hours by Manufacturer

The early HGST drives, especially the 2TB and 3TB drives, have lasted a long time and have provided excellent service over the past several years. This “time in service” currently outweighs the sheer quantity of Seagate 4TB drives we have purchased and placed into service over the last year or so.

Another way to look at drive hours is to see which drives, by size, have the most hours. You can see that in the chart below.
Hard Drive Service Hours by Drive Size

The 4TB drives have been spinning for over 580 million hours. There are 48,041 4TB drives, which means each drive on average has had 503 drive days of service, or 1.38 years. The annualized failure rate for all 4TB drives over their lifetime is 2.12%.
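Those averages fall out of a couple of lines of Python:

```python
# Reproducing the 4TB fleet averages quoted above.
total_hours = 580_000_000   # "over 580 million hours"
drive_count = 48_041

avg_drive_days = total_hours / drive_count / 24
print(f"{avg_drive_days:,.0f} drive days per drive, or {avg_drive_days / 365:.2f} years")
# -> roughly 503 drive days, or about 1.38 years, per drive
```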

Hard Drive Reliability by Manufacturer

The drives in our data center come from four manufacturers. As noted above, most of them are from HGST and Seagate. With that in mind, here are the hard drive failure rates by manufacturer; we’ve combined all of the drives, regardless of size, for each manufacturer. The results are divided into one-year periods ending on 3/31 of 2014, 2015, and 2016.
Hard Drive Failure Rates by Manufacturer

Why are there fewer than 45 drives?

A couple of times we’ve noted that we don’t display drive models with fewer than 45 drives. Why would we have fewer than 45 drives given we need 45 drives to fill a Storage Pod? Here are a few of the reasons:

  1. We once had 45 or more drives, but some failed, we couldn’t get replacements of that model, and now we have fewer than 45.
  2. They were sent to us as part of our Drive Farming efforts a few years back, and we only got a few of a given model. We needed drives, and while we liked using the same model, we utilized what we had.
  3. We built a few Frankenpods that contained drives that were the same size in terabytes but had different models and manufacturers. We kept all the drives in a RAID array the same model, but there could be different models in each of the 3 RAID arrays in a given Frankenpod.

Regardless of the reason, if we have fewer than 45 drives of the same model, we don’t display them in the drive stats. We do, however, include their information in any “grand total” calculations such as drive space available, hours in service, failures, etc.

Buying drives from Toshiba and Western Digital

We often get asked why we don’t buy more WDC and Toshiba drives. The short answer is that we’ve tried. These days we need to purchase drives in reasonably large quantities, 5,000 to 10,000 at a time. We do this to keep the unit cost down and so we can reliably forecast our drive cost into the future. For Toshiba, we have not been able to find their drives in sufficient quantities at a reasonable price. For WDC, we sometimes get offered a good price for the quantities we need, but before the deal gets done something goes sideways and the deal doesn’t happen. This has happened to us multiple times, as recently as last month. We would be happy to buy more drives from Toshiba and WDC if we could; until then, we’ll continue to buy our drives from Seagate and HGST.

What about using 6-, 8- and 10TB drives?

Another question that comes up is why the bulk of the drives we buy are 4TB versus the 5-, 6-, 8-, and 10TB drives now on the market. The primary reason is that the price/TB for the larger drives is still too high, even when considering storage density. Another reason is the availability of larger quantities of drives. To fill a Backblaze Vault built from 20 Storage Pod 6.0 servers, we need 1,200 hard drives. We are filling 3+ Backblaze Vaults a month, but the larger drives are hard to find in quantity. In short, 4TB drives are readily available at the right price, with 6TB and 8TB drives getting close on price but still limited in the quantities we need.

What is a failed hard drive?

For Backblaze there are three reasons a drive is considered to have “failed”:

  1. The drive will not spin up or connect to the OS.
  2. The drive will not sync, or stay synced, in a RAID Array (see note below).
  3. The SMART stats we use show values above our thresholds (a toy version of this check appears after the note below).

Note: Our stand-alone Storage Pods use RAID-6, while our Backblaze Vaults use our own open-sourced implementation of Reed-Solomon erasure coding instead. Both techniques have a concept of a drive not syncing, or staying synced, with the other member drives in its group.
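As promised, here’s a rough illustration of reason 3. The attribute names follow the columns in our published drive stats data, but the zero thresholds are placeholders for the example, not the actual cut-offs we use:

```python
# Illustrative only: flag drives whose watched SMART attributes exceed a threshold.
# The zero thresholds below are placeholders, not Backblaze's real values.
WATCHED_SMART_THRESHOLDS = {
    "smart_5_raw": 0,     # reallocated sector count
    "smart_187_raw": 0,   # reported uncorrectable errors
    "smart_188_raw": 0,   # command timeouts
    "smart_197_raw": 0,   # current pending sector count
    "smart_198_raw": 0,   # offline uncorrectable sector count
}

def smart_flags(drive_stats: dict) -> list:
    """Return the watched attributes whose raw values exceed their thresholds."""
    return [attr for attr, limit in WATCHED_SMART_THRESHOLDS.items()
            if drive_stats.get(attr, 0) > limit]

print(smart_flags({"smart_5_raw": 12, "smart_197_raw": 0}))  # ['smart_5_raw']
```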

A different look at Hard Drive Stats

We publish the hard drive stats data on our website with the Q1 2016 results there as well. Over the years thousands of people have downloaded the files. One of the folks who downloaded the data was Ross Lazarus, a self-described grumpy computational biologist. He analyzed the data using Kaplan-Meier statistics and plots, a technique typically used for survivability analysis. His charts and analysis present a different way to look at the data and we appreciate Mr. Lazarus taking the time to put this together. If you’ve done similar analysis of our data, please let us know in the comments section below – thanks.

The post One Billion Drive Hours and Counting: Q1 2016 Hard Drive Stats appeared first on Backblaze Blog | The Life of a Cloud Backup Company.

Storage Pod 6.0: Building a 60 Drive 480TB Storage Server

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/open-source-data-storage-server/

Storage Pod 6.0
Storage Pod 6.0 deploys 60 off-the-shelf hard drives in a 4U chassis to lower the cost of our latest data storage server to just $0.036/GB. That’s 22 percent less than our Storage Pod 5.0 storage server that used 45 drives to store data for $0.044/GB. The Storage Pod 6.0 hardware design is, as always, open source so we’ve included the blueprints, STEP files, wiring diagrams, build instructions and a parts list so you can build your very own Storage Pod. Your cost may be a bit more, but it is possible for you to build a 4U server with 480TB of data storage for less than a nickel ($0.05) a gigabyte – read on.

A little Storage Pod history

In 2009, Storage Pod 1.0 changed the landscape in data storage servers by delivering 67.5TB of storage in a 4U box for just $0.11/GB – that was up to 10 times lower than comparable systems on the market at the time. We also open-sourced the hardware design of Storage Pod 1.0, and companies, universities, and even weekend hobbyists started building their own Storage Pods.

Over the years we introduced updates to the Storage Pod design, driving down the cost while improving the reliability and durability with each iteration. Storage Pod 5.0 marked our initial use of the Agile manufacturing and design methodology, which helped identify and squeeze out more costs, driving our cost per GB of storage below $0.05. Agile also enabled us to manage a rapid design prototyping process that allowed us to stretch the Storage Pod chassis to include 60 drives, then produce 2-D and 3-D specifications, a build book, and a bill of materials, and update our manufacturing and assembly processes for the new design – Storage Pod 6.0. All of this in about 6 months.

What’s new in Storage Pod 6.0

60 drive storage server

What’s new is 60 drives in a 4U chassis. That’s a 33 percent increase in storage density in the same rack space. Using 4TB drives in a 60-drive Storage Pod increases the amount of storage in a standard 40U rack from 1.8 to 2.4 Petabytes. Of course, by using 8TB drives you’d get a 480TB data storage server in a 4U chassis and 4.8 Petabytes in a standard rack.

When looking at what’s new in Storage Pod 6.0, it would be easy to say it has 60 drives and stop there. After all, the motherboard, CPU, memory, SATA cards, and backplanes we use didn’t change from 5.0. But expanding to 60 drives created all kinds of things to consider, for example:

  • How long do you make the chassis before it is too long for the rack?
  • Will we need more cooling?
  • Will the power supplies need to be upgraded?
  • Will the SATA cables be too long? The maximum spec’d length is 1 meter.
  • Can the SATA cards keep up with the 15 more drives? Or will we need to upgrade them?
  • Will the CPU and the motherboard be able to handle the additional data load of 15 more drives?
  • Will more or faster memory be required?
  • Will the overall Storage Pod be correctly balanced between CPU, memory, storage and other components so that nothing is over/under-spec’ed?
  • What hard drives will work with this configuration? Would we have to use enterprise drives? Just kidding!

Rapidly iterating to the right design

As part of the prototyping effort we built multiple configurations, and Backblaze Labs put each configuration through its paces. To do this we assembled a Backblaze Vault with 20 prototype Storage Pods in three different configurations. Since each Storage Pod in a Backblaze Vault is expected to perform similarly, we monitored and detected those Storage Pods that were lagging as well as those that were “bored.” By doing this, we were able to determine that most of the components in Storage Pod 6.0 did not need to be upgraded to achieve optimal performance in Backblaze Vaults utilizing 60-drive Storage Pods.

We did make some changes to Storage Pod 6.0 however:

  • Increased the chassis by 5 ½” in length, from 28 1/16” to 33 9/16”. Server racks are typically 29” in depth; more on that later.
  • Increased the length of the backplane tray to support 12 backplanes.
  • Added 1 additional drive bracket to handle another row of 15 drives.
  • Added 3 more backplanes and 1 more SATA card.
  • Added 3 more SATA cables.
  • Changed the routing of the SATA-3 cables to stay within the 1-meter length spec.
  • Updated the pigtail cable design so we could power the three additional backplanes.
  • Changed the routing of the power cables on the backplane tray.
  • Changed the on/off switch, retiring the ele-302 and replacing it with the Chill-22.
  • Increased the length of the lid over the drive bay to 22 7/8”.

That last item, increasing the length of the drive bay lid, led to a redesign of both lids. Why?

Lids and Tabs

The lid from Storage Pod 5.0 (on the left above) proved to be difficult to remove when it was stretched another 4+ inches. The tabs didn’t provide enough leverage to easily open the longer drive lid. As a consequence, Storage Pod 6.0 has a new design (shown on the right above) that provides much better leverage. The design in the middle was one of the prototype designs we tried, but in the end the “flame” kept catching the fingers of the ops folks when they opened or closed the lid.

Too long for the server rack?

The 6.0 chassis is 33 9/16” in length, or 35 1/16” with the lids on. A rack is typically 29” in depth, leaving 4+ inches of Storage Pod chassis “hanging out.” We decided to keep the front (Backblaze logo side) aligned to the front of the rack and let the excess hang off the back in the warm aisle of the data center. A majority of a pod’s weight is in the front (60 drives!), so the rails support this weight. The overhang is on the back side of the rack, but there’s plenty of room between the rows of racks, so there’s no issue with space. We’re pointing out the overhang so that if you end up building your own Storage Pod 6.0 server, you’ll leave enough space behind, or in front of, your rack for the overhang.

The cost in dollars

There are actually three different prices for a Storage Pod. Below are the costs of each of these scenarios to build a 240TB Storage Pod 6.0 storage server with 4TB hard drives:

  • Backblaze: $8,733.73 – The cost for Backblaze given that we purchase 500+ Storage Pods and 20,000+ hard drives per year. This includes materials, assembly, and testing.
  • You Build It: $10,398.57 – The cost for you to build one Storage Pod 6.0 server by buying the parts and assembling it yourself.
  • You Buy It: $12,849.40 – The cost for you to purchase one already assembled Storage Pod 6.0 server from a third-party supplier and then purchase and install 4TB hard drives yourself.

These prices do not include packaging, shipping, taxes, VAT, etc.

Since we increased the number of drives from 45 to 60, comparing the total cost of Storage Pod 6.0 to the previous 45-drive versions isn’t appropriate. Instead, we can compare them using the Cost per GB of storage.

The Cost per GB of storage

Using the Backblaze cost for comparison, below is the Cost per GB of building the different Storage Pod versions.

Storage Pod version

As you can see in the table, the cost in actual dollars increased by $760 with Storage Pod 6.0, but the Cost per GB decreased nearly a penny ($0.008) given the increased number of drives and some chassis design optimizations.

Saving $0.008 per GB may not seem very innovative, but think about what happens when that trivial amount is multiplied across the hundreds of Petabytes of data our B2 Cloud Storage service will store over the coming months and years. A little innovation goes a long way.

Building your own Storage Pod 6.0 server

You can build your own Storage Pod. Here’s what you need to get started:

Chassis – We’ve provided all the drawings you should need to build (or to have built) your own chassis. We’ve had multiple metal bending shops use these files to make a Storage Pod chassis. You get to pick the color.

Parts – In Appendix A we’ve listed all the parts you’ll need for a Storage Pod. Most of the parts can be purchased online via Amazon, Newegg, etc. As noted on the parts list, some parts are purchased either through a distributor or from the contract assemblers.

Wiring – You can purchase the power wiring harness and pigtails as noted on the parts list, but you can also build your own. Whether you build or buy, you’ll want to download the instructions on how to route the cables in the backplane tray.

Build Book – Once you’ve gathered all the parts, you’ll need the Build Book for step-by-step assembly instructions.

As a reminder, Backblaze does not sell Storage Pods, and the design is open source, so we don’t provide support or warranty for people who choose to build their own Storage Pod. That said, if you do build your own, we’d like to hear from you.

Building a 480TB Storage Pod for less than a $0.05 per GB

We’ve used 4TB drives in this post for consistency, but we have in fact built Storage Pods with 5-, 6- and even 8-TB drives. If you are building a Storage Pod 6.0 storage server, you can certainly use higher capacity drives. To make it easy, the chart below is your estimated cost if you were to build your own Storage Pod using the drives noted. We used the lowest “Street Price” from Amazon or Newegg for the price of the 60 hard drives. The list is sorted by the Cost per GB (lowest to highest). The (*) indicates we use this drive model in our datacenter.
Storage Pod Cost per GB
As you can see, there are multiple drive models and capacities you can use to achieve a Cost per GB of $0.05 or less. Of course, we aren’t counting your sweat equity in building a Storage Pod, nor do we include the software you are planning to run. If you are looking for capacity, think about using the Seagate 8TB drives to get nearly half a petabyte of storage in a 4U footprint (albeit with a 4” overhang) for just $0.047 a GB. Total cost: $22,600.

What about SMR drives?

Depending on your particular needs, you might consider using SMR hard drives. An SMR drive stores data more densely on each disk platter surface by “overlapping” tracks of data. This lowers the cost to store data. The downside is that when data is deleted, the newly freed space can be extremely slow to reuse. As such, SMR drives are generally used for archiving duties where data is written sequentially to a drive with few, and preferably no, deletions. If this type of capability fits your application, you will find SMR hard drives to be very inexpensive. For example, a Seagate 8TB Archive drive (model: ST8000AS0002) is $214.99, making the total cost for a 480TB Storage Pod 6.0 storage server only $16,364.07, or a very impressive $0.034 per GB. By the way, if you’re looking for off-site data archive storage, Backblaze B2 will store your data for just $0.005/GB/month.
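If you want to run the numbers for a different drive, here’s a tiny estimator. It assumes the Appendix A parts total below and treats drive prices as street prices that will drift over time, so consider the output an estimate rather than a quote:

```python
# Rough cost-per-GB estimator for a self-built Storage Pod 6.0.
PARTS_TOTAL = 3_494.67   # Appendix A total: chassis, motherboard, CPU, backplanes, cables, etc.
DRIVES_PER_POD = 60

def pod_cost_per_gb(drive_price: float, drive_tb: float) -> float:
    total_cost = PARTS_TOTAL + DRIVES_PER_POD * drive_price
    return total_cost / (DRIVES_PER_POD * drive_tb * 1_000)

# Example: the Seagate 8TB Archive SMR drive mentioned above at $214.99.
print(f"${pod_cost_per_gb(214.99, 8):.3f}/GB")   # ~$0.034/GB for a 480TB pod
```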

Buying a Storage Pod

Backblaze does not sell Storage Pods or parts. If you are interested in buying a Storage Pod 6.0 storage server (without drives), you can check out the folks at Backuppods. They have partnered with Evolve Manufacturing to deliver Backblaze-inspired Storage Pods. Evolve Manufacturing is the contract manufacturer used by Backblaze to manufacture and assemble Storage Pod versions 4.5, 5.0 and now 6.0. Backuppods.com offers a fully assembled and tested Storage Pod 6.0 server (less drives) for $5,950.00 plus shipping, handling and tax. They also sell older Storage Pod versions. Please check out their website for the models and configurations they are currently offering.

Appendix A: Storage Pod 6.0 Parts List

Below is the list of parts you’ll need to build your own Storage Pod 6.0. The prices listed are “street” prices. You should be able to find these items online or from the manufacturer in quantities sufficient to build one Storage Pod. Good luck and happy building.

Each line shows the item, quantity × unit price = extended price, and any applicable notes (see the numbered notes below).

  • 4U Custom Chassis (includes case, supports, trays, etc.) – 1 × $995.00 = $995.00 (note 1)
  • Power Supply (EVGA Supernova NEX750G) – 2 × $119.90 = $239.98
  • On/Off Switch & Cable (Primochill 120-G1-0750-XR, Chill-22) – 1 × $14.95 = $14.95
  • Case Fan (FAN AXIAL 120X25MM VAPO 12VDC) – 3 × $10.60 = $31.80
  • Dampener Kits (Power Supply Vibration Dampener) – 2 × $4.45 = $8.90
  • Soft Fan Mount (AFM03B, 2 tab ends) – 12 × $0.42 = $4.99
  • Motherboard (Supermicro MBD-X9SRH-7TF-O, MicroATX) – 1 × $539.50 = $539.50
  • CPU Fan (DYNATRON R13 1U Server CPU FAN) – 1 × $45.71 = $45.71
  • CPU (Intel XEON E5-1620 V2, Quad Core) – 1 × $343.94 = $343.94
  • 8GB RAM (PC3-12800 DDR3-1600MHz 240-Pin) – 4 × $89.49 = $357.96
  • Port Multiplier Backplanes (5-Port Backplane, Marvell 9715 chipset) – 12 × $45.68 = $548.10 (notes 1, 2)
  • SATA III Card (4-port PCIe Express, Marvell 9235 chipset) – 3 × $57.10 = $171.30 (notes 1, 2)
  • SATA III Cable (SATA cables RA-to-STR 1M locking) – 12 × $3.33 = $39.90 (notes 1, 3)
  • Cable Harness – PSU1 (24-pin, Backblaze to Pigtail) – 1 × $33.00 = $33.00 (note 1)
  • Cable Harness – PSU2 (20-pin, Backblaze to Pigtail) – 1 × $31.84 = $31.84 (note 1)
  • Cable Pigtail (24-pin, EVGA NEX750G Connector) – 2 × $16.43 = $16.43 (note 1)
  • Screw: 6-32 X 1/4 Phillips PAN ZPS – 12 × $0.015 = $1.83 (note 4)
  • Screw: 4-40 X 5/16 Phillips PAN ZPS ROHS – 60 × $0.015 = $1.20 (note 4)
  • Screw: 6-32 X 1/4 Phillips 100D Flat ZPS – 39 × $0.20 = $7.76 (note 4)
  • Screw: M3 X 5MM Long Phillips, HD – 4 × $0.95 = $3.81
  • Standoff: M3 X 5MM Long Hex, SS – 4 × $0.69 = $2.74
  • Foam strip for fan plate, 1/2″ x 17″ x 3/4″ – 1 × $0.55 = $0.55
  • Cable Tie, 8.3″ x 0.225″ – 4 × $0.25 = $1.00
  • Cable Tie, 4″ length – 2 × $0.03 = $0.06
  • Plastic Drive Guides – 120 × $0.25 = $30.00 (note 1)
  • Label, Serial-Model, Transducer, Blnk – 30 × $0.20 = $6.00

Total: $3,494.67

NOTES:

  1. May be able to be purchased from backuppods.com, price may vary.
  2. Sunrich and CFI make the recommended backplanes and Sunrich and Syba make the recommended SATA Cards.
  3. Nippon Labs makes the recommended SATA cables, but others may work.
  4. Sold in packages of 100, used 100 package price for Extended Cost.


The post Storage Pod 6.0: Building a 60 Drive 480TB Storage Server appeared first on Backblaze Blog | The Life of a Cloud Backup Company.