In November 2013, the first commercially available helium-filled hard drive was introduced by HGST, a Western Digital subsidiary. The 6 TB drive was not only unique in being helium-filled; it was, for the moment, the highest-capacity hard drive available. Fast forward a little over four years and 12 TB helium-filled drives are readily available, 14 TB drives can be found, and 16 TB helium-filled drives are arriving soon.
Backblaze has been purchasing and deploying helium-filled hard drives over the past year and we thought it was time to start looking at their failure rates compared to traditional air-filled drives. This post will provide an overview; we'll then continue the comparison on a regular basis over the coming months.
The Promise and Challenge of Helium Filled Drives
We all know that helium is lighter than air — that’s why helium-filled balloons float. Inside an air-filled hard drive, disk platters spin rapidly at a given speed, 7200 RPM for example. The air inside adds an appreciable amount of drag on the platters, which in turn requires additional energy to spin them. Replacing the air inside a hard drive with helium reduces that drag, thereby reducing the energy needed to spin the platters, typically by 20%.
We also know that after a few days, a helium-filled balloon sinks to the ground. This was one of the key challenges in using helium inside of a hard drive: helium escapes from most containers, even if they are well sealed. It took years for hard drive manufacturers to create containers that could contain helium while still functioning as a hard drive. This container innovation allows helium-filled drives to function at spec over the course of their lifetime.
Checking for Leaks
Three years ago, we identified SMART 22 as the attribute assigned to recording the status of helium inside of a hard drive. We have both HGST and Seagate helium-filled hard drives, but only the HGST drives currently report the SMART 22 attribute. It appears the normalized and raw values for SMART 22 currently report the same value, which starts at 100 and goes down.
To date only one HGST drive has reported a value of less than 100, with multiple readings between 94 and 99. That drive continues to perform fine, with no other errors or any correlating changes in temperature, so we are not sure whether the change in value is trying to tell us something or if it is just a wonky sensor.
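As a sketch, here’s how one might pull the normalized SMART 22 value out of `smartctl -A` style output to watch for exactly this kind of drop. The parsing and sample attribute line below are illustrative and not tied to a particular smartctl version’s exact format:

```python
# Sketch: flag helium drives whose SMART 22 value has dropped below 100.
# The sample line mimics smartctl attribute output; real output may differ.

def helium_level(smart_output: str):
    """Return the normalized SMART 22 value, or None if the drive doesn't report it."""
    for line in smart_output.splitlines():
        fields = line.split()
        # Attribute rows start with the attribute ID; 22 is Helium_Level on HGST drives.
        if fields and fields[0] == "22":
            return int(fields[3])  # 4th column is the normalized value
    return None

sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
 22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       100
"""

level = helium_level(sample)
if level is not None and level < 100:
    print(f"possible helium loss: SMART 22 = {level}")
```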
Helium versus Air-Filled Hard Drives
There are several different ways to compare these two types of drives. For the comparison below we used only our 8, 10, and 12 TB drives, since those are the sizes in which we have helium-filled models. We left out all drives 6 TB and smaller, as none of those drive models are helium-filled. We are open to trying different comparisons; this just seemed the best place to start.
The most obvious observation is that there seems to be little difference in the Annualized Failure Rate (AFR) based on whether the drives contain helium or air. One conclusion, given this evidence, is that helium doesn’t affect the AFR of hard drives versus air-filled drives. My prediction is that the helium drives will eventually prove to have a lower AFR. Why? Drive Days.
Let’s go back in time to Q1 2017 when the air-filled drives listed in the table above had a similar number of Drive Days to the current number of Drive Days for the helium drives. We find that the failure rate for the air-filled drives at the time (Q1 2017) was 1.61%. In other words, when the drives were in use a similar number of hours, the helium drives had a failure rate of 1.06% while the failure rate of the air-filled drives was 1.61%.
Helium or Air?
My hypothesis is that after normalizing the data so that the helium and air-filled drives have the same (or similar) usage (Drive Days), the helium-filled drives we use will continue to have a lower Annualized Failure Rate versus the air-filled drives we use. I expect this trend to continue for the next year at least. What side do you come down on? Will the Annualized Failure Rate for helium-filled drives be better than air-filled drives or vice-versa? Or do you think the two technologies will eventually produce the same AFR over time? Pick a side and we’ll document the results over the next year and see where the data takes us.
In Part 2, we take a deeper look at the differences between HDDs and SSDs, how both HDD and SSD technologies are evolving, and how Backblaze takes advantage of SSDs in our operations and data centers.
The first time you booted a computer or opened an app on a computer with a solid-state drive (SSD), you likely were delighted. I know I was. I loved the speed, silence, and just the wow factor of this new technology that seemed better in just about every way compared to hard drives.
I was ready to fully embrace the promise of SSDs. And I have. My desktop uses an SSD for booting, applications, and for working files. My laptop has a single 512GB SSD. I still use hard drives, however. The second, third, and fourth drives in my desktop computer are HDDs. The external USB RAID I use for local backup uses HDDs in four drive bays. When my laptop is at my desk it is attached to a 1.5TB USB backup hard drive. HDDs still have a place in my personal computing environment, as they likely do in yours.
Nothing stays the same for long, however, especially in the fast-changing world of computing, so we are certain to see new storage technologies coming to the fore, perhaps with even more wow factor.
Before we get to what’s coming, let’s review the primary differences between HDDs and SSDs in a little more detail in the following table.
A Comparison of HDDs to SSDs
| | HDD | SSD |
|---|---|---|
| Power Draw / Battery Life | More power draw, averages 6–7 watts, and therefore uses more battery | Less power draw, averages 2–3 watts, resulting in a 30+ minute battery boost |
| Cost | Cheap, only around $0.03 per gigabyte (buying a 4TB model) | Expensive, roughly $0.20–$0.30 per gigabyte (buying a 1TB drive) |
| Capacity | Typically 500GB to 2TB for notebook-size drives; 10TB max for desktops | Typically not larger than 1TB for notebook-size drives; 4TB for desktops |
| Operating System Boot Time | Around 30–40 seconds average bootup time | Around 8–13 seconds average bootup time |
| Noise | Audible clicks and spinning platters can be heard | No moving parts, hence no sound |
| Vibration | The spinning of the platters can sometimes result in vibration | No vibration, as there are no moving parts |
| Heat Produced | Doesn’t produce much heat, but measurably more than an SSD due to moving parts and higher power draw | Lower power draw and no moving parts, so little heat is produced |
| Failure Rate | Mean time between failures of 1.5 million hours | Mean time between failures of 2.0 million hours |
| File Copy / Write Speed | Can range anywhere from 50–120 MB/s | Generally above 200 MB/s, up to 550 MB/s for cutting-edge drives |
| Encryption | Full Disk Encryption (FDE) supported on some models | Full Disk Encryption (FDE) supported on some models |
The HDD has an amazing history of improvement and innovation. From its inception in 1956 the hard drive has decreased in size 57,000 times, increased storage 1 million times, and decreased cost 2,000 times. In other words, the cost per gigabyte has decreased by 2 billion times in about 60 years.
Hard drive manufacturers made these dramatic advances by reducing the size, and consequently the seek times, of platters while increasing their density, improving disk reading technologies, adding multiple arms and read/write heads, developing better bus interfaces, and increasing spin speed and reducing friction with techniques such as filling drives with helium.
In 2005, the drive industry introduced perpendicular recording technology to replace the older longitudinal recording technology, which enabled areal density to reach more than 100 gigabits per square inch. Longitudinal recording aligns data bits horizontally in relation to the drive’s spinning platter, parallel to the surface of the disk, while perpendicular recording aligns bits vertically, perpendicular to the disk surface.
Other technologies such as bit patterned media recording (BPMR) are contributing to increased densities, as well. Introduced by Toshiba in 2010, BPMR is a proposed hard disk drive technology that could succeed perpendicular recording. It records data using nanolithography in magnetic islands, with one bit per island. This contrasts with current disk drive technology where each bit is stored in 20 to 30 magnetic grains within a continuous magnetic film.
Shingled magnetic recording (SMR) is a magnetic storage data recording technology used in HDDs to increase storage density and overall per-drive storage capacity. Shingled recording writes new tracks that overlap part of the previously written magnetic track, leaving the previous track narrower and allowing for higher track density. Thus, the tracks partially overlap similar to roof shingles. This approach was selected because physical limitations prevent recording magnetic heads from having the same width as reading heads, leaving recording heads wider.
Track Spacing Enabled by SMR Technology (Seagate)
To increase the amount of data stored on a drive’s platter requires cramming the magnetic regions closer together, which means the grains need to be smaller so they won’t interfere with each other. In 2002, Seagate successfully demoed heat-assisted magnetic recording (HAMR). HAMR records magnetically using laser-thermal assistance that ultimately could lead to a 20 terabyte drive by 2019. (See our post on HAMR by Seagate’s CTO Mark Re, What is HAMR and How Does It Enable the High-Capacity Needs of the Future?)
Western Digital claims that its competing microwave-assisted magnetic recording (MAMR) could enable drive capacity to increase up to 40TB by the year 2025. Some industry watchers and drive manufacturers predict increases in areal density from today’s 0.86 terabits per square inch (Tbpsi) to 10 Tbpsi by 2025, resulting in as much as 100TB drive capacity in the next decade.
The future certainly does look bright for HDDs continuing to be with us for a while.
The Outlook for SSDs
SSDs are also in for some amazing advances.
SATA (Serial Advanced Technology Attachment) is the common hardware interface that allows the transfer of data to and from HDDs and SSDs. SATA SSDs are fine for the majority of home users: they are generally cheaper, though they operate at lower speeds and have a shorter write life.
While fine for everyday computing, in a RAID (Redundant Array of Independent Disks), server array, or data center environment, often a better alternative has been to use ‘SAS’ drives, which stands for Serial Attached SCSI. This is another type of interface that, again, is usable with either HDDs or SSDs. ‘SCSI’ stands for Small Computer System Interface (which is why SAS drives are sometimes referred to as ‘scuzzy’ drives). SAS has increased IOPS (Input/Output Operations Per Second) over SATA, meaning it has the ability to read and write data faster. This has made SAS an optimal choice for systems that require high performance and availability.
On an enterprise level, SAS prevails over SATA, as SAS supports over-provisioning to prolong write life and has been specifically designed to run in environments that require constant drive usage.
PCIe (Peripheral Component Interconnect Express) is a high speed serial computer expansion bus standard that supports drastically higher data transfer rates over SATA or SAS interfaces due to the fact that there are more channels available for the flow of data.
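To put those interface differences in rough numbers, here is a back-of-the-envelope comparison. The throughput figures are approximate usable rates after encoding overhead, not vendor specs, and exact numbers vary by generation and implementation:

```python
# Rough interface throughput ceilings in MB/s (approximate, after encoding overhead).
interfaces = {
    "SATA III (6 Gb/s)": 600,
    "SAS-3 (12 Gb/s)": 1200,
    "PCIe 3.0 x4 (NVMe)": 3940,  # ~985 MB/s per lane * 4 lanes
}

for name, mbps in interfaces.items():
    print(f"{name:20s} ~{mbps} MB/s ({mbps / 600:.1f}x SATA III)")
```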
Many leading drive manufacturers have been adopting PCIe as the standard for new home and enterprise storage and some peripherals. For example, you’ll see that the latest Apple Macbooks ship with PCIe-based flash storage, something that Apple has been adopting over the years with their consumer devices.
PCIe can also be used within data centers for RAID systems and to create high-speed networking capabilities, increasing overall performance and supporting the newer and higher capacity HDDs.
As we covered in Part 1, SSDs are based on a type of non-volatile flash memory called NAND. NAND is subdivided into types based on how many bits of data are stored in each physical memory cell: SLC (single-level cell) stores one bit, MLC (multi-level cell) stores two, TLC (triple-level cell) stores three, and QLC (quad-level cell) stores four. The latest trend in NAND flash is QLC NAND.
Storing more data per cell makes NAND more dense, but it also makes the memory slower — it takes more time to read and write data when so much additional information (and so many more charge states) are stored within the same cell of memory.
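A quick way to see why: each additional bit per cell doubles the number of charge states the controller must distinguish when reading and writing.

```python
# Bits stored per cell for each NAND type, and the resulting number of
# distinguishable charge states (2^bits) the controller must resolve.
cell_types = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}

for name, bits in cell_types.items():
    states = 2 ** bits  # voltage levels per cell
    print(f"{name}: {bits} bit(s)/cell -> {states} charge states, "
          f"{bits}x the density of SLC")
```

So QLC packs four times the data of SLC into the same cell, at the cost of resolving 16 voltage levels instead of 2.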
QLC NAND memory is built on older process nodes with larger cells that can more easily store multiple bits of data. The larger cells give the new NAND tech higher overall reliability and a higher total number of program/erase (P/E) cycles.
QLC NAND wafer from which individual microcircuits are made
QLC NAND promises to produce faster and denser SSDs. The effect on price also could be dramatic. Tom’s Hardware is predicting that the advent of QLC could push 512GB SSDs down to $100.
Beyond HDDs and SSDs
There is significant work being done that is pushing the bounds of data storage beyond what is possible with spinning platters and microcircuits. A team at Harvard University has used genome-editing to encode video into live bacteria.
We’ve already discussed the benefits of SSDs. The benefits of SSDs that apply particularly to the data center are:
Low power consumption — When you are running lots of drives, power usage adds up. Anywhere you can conserve power is a win.
Speed — Data can be accessed faster, which is especially beneficial for caching databases and other data affecting overall application or system performance.
Lack of vibration — Reducing vibration improves reliability, thereby reducing problems and maintenance. Racks housing SSDs also don’t need the size and structural rigidity required for racks housing HDDs.
Low noise — Data centers will become quieter as more SSDs are deployed.
Low heat production — The less heat generated the less cooling and power required in the data center.
Faster booting — The faster a storage chassis can get online or a critical server can be rebooted after maintenance or a problem, the better.
Greater areal density — Data centers will be able to store more data in less space, which increases efficiency in all areas (power, cooling, etc.)
The top drive manufacturers say that they expect HDDs and SSDs to coexist for the foreseeable future in all areas — home, business, and data center, with customers choosing which technology and product will best fit their application.
How Backblaze Uses SSDs
In just about all respects, SSDs are superior to HDDs. So why don’t we replace the 100,000+ hard drives we have spinning in our data centers with SSDs?
Our operations team takes advantage of the benefits and savings of SSDs wherever they can, using them in every place that’s appropriate other than primary data storage. They’re particularly useful in our caching and restore layers, where we use them strategically to speed up data transfers. SSDs also speed up access to B2 Cloud Storage metadata. Our operations team is considering moving to SSDs to boot our Storage Pods, where the cost of a small SSD is competitive with hard drives, and their other attributes (small size, lack of vibration, speed, low power consumption, reliability) are all pluses.
A Future with Both HDDs and SSDs
IDC predicts that total data created will grow from approximately 33 zettabytes in 2018 to about 160 zettabytes in 2025. (See What’s a Byte? if you’d like help understanding the size of a zettabyte.)
Annual Size of the Global Datasphere
Over 90% of enterprise drive shipments today are HDD, according to IDC. By 2025, SSDs will comprise almost 20% of drive shipments. SSDs will gain share, but total growth in data created will result in massive sales of both HDDs and SSDs.
Enterprise Byte Shipments: HDD and SSD
As both HDD and SSD sales grow, so does the capacity of both technologies. Given the benefits of SSDs in many applications, we’re likely going to see SSDs replacing HDDs in all but the highest capacity uses.
It’s clear that there are merits to both HDDs and SSDs. If you’re not running a data center, and don’t have more than one or two terabytes of data to store on your home or business computer, your first choice likely should be an SSD. They provide a noticeable improvement in performance during boot-up and data transfer, and are smaller, quieter, and more reliable as well. Save the HDDs for secondary drives, NAS, RAID, and local backup devices in your system.
Perhaps some day we’ll look back at the days of spinning platters with the same nostalgia with which we look back at stereo LPs, and some of us will have an HDD paperweight on our floating anti-gravity desk as a conversation piece. Until the day that SSDs’ performance, capacity, and, finally, price expel the last HDD from the home and data center, we can expect to live in a world that contains both SSDs and magnetic-platter HDDs, and as users we will reap the benefits of both technologies.
Don’t miss future posts on HDDs, SSDs, and other topics, including hard drive stats, cloud storage, and tips and tricks for backing up to the cloud. Use the Join button above to receive notification of future posts on our blog.
2016 marks the 60th anniversary of the venerable Hard Disk Drive (HDD). While new computers increasingly turn to Solid State Drives (SSDs) for main storage, HDDs remain the champions of low-cost, high-capacity data storage. That’s a big reason why we still use them in our Storage Pods. Let’s take a spin in the Wayback Machine and look at the history of hard drives. Let’s also think about what the future might hold.
It Started With RAMAC
IBM made the first commercial hard disk drive-based computer and called it RAMAC – short for “Random Access Method of Accounting And Control.” Its storage system was called the IBM 350. RAMAC was big – it required an entire room to operate. The hard disk drive storage system alone was about the size of two refrigerators. Inside were stacked 50 24-inch platters.
For that, RAMAC customers ended up with less than 5 MB – that’s right, megabytes – of storage. IBM’s marketing people didn’t want RAMAC to store any more data than that; they had no idea how to convince customers they’d need more.
IBM customers forked over $3,200 for the privilege of accessing and storing that information. A MONTH. (IBM leased its systems.) That’s equivalent to almost $28,000 per month in 2016.
Sixty years ago, data storage cost $640 per megabyte, per month. At IBM’s 1956 rates for storage, a new iPhone 7 would cost you about $20.5 million a month. RAMAC was a lot harder to stick in your pocket, too.
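The arithmetic behind those figures, assuming the 32 GB base-model iPhone 7 and decimal gigabytes (both assumptions of this sketch, not stated in the text):

```python
# Reproducing the RAMAC cost arithmetic from the text.
lease_per_month = 3200            # dollars per month, 1956
capacity_mb = 5                   # RAMAC stored under 5 MB
cost_per_mb = lease_per_month / capacity_mb
print(f"${cost_per_mb:.0f} per MB per month")   # $640 per MB per month

# A 32 GB iPhone 7 at 1956 storage rates (decimal units assumed):
iphone_mb = 32 * 1000
monthly = cost_per_mb * iphone_mb
print(f"${monthly / 1e6:.1f} million per month")  # $20.5 million per month
```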
Plug and Play
These days you can fit 2 TB onto an SD card the size of a postage stamp, but half a century ago, it was a very different story. IBM continued to refine early hard disk drive storage, but systems were still big and bulky.
By the early 1960s, IBM’s mainframe customers were hungry for more storage capacity, but they simply didn’t have the room to keep installing refrigerator-sized storage devices. So the smart folks at IBM came up with a solution: Removable storage.
The IBM 1311 Disk Storage Drive, introduced in 1962, gave rise to the use of IBM 1316 “Disk Packs” that let IBM’s mainframe customers expand their storage capacity as much as they needed (or could afford). IBM shrank the size of the disks dramatically, from 24 inches in diameter down to 14 inches. The 9-pound disk packs fit into a device about the size of a modern washing machine. Each pack could hold about 2 MB.
For my part, I remember touring a data center as a kid in the mid-1970s and seeing removable IBM disk packs up close. They looked about the same size and dimensions that you’d use to carry a birthday cake: Large, sealed plastic containers with handles on the top.
Computers had pivoted from expensive curiosities in the business world to increasingly essential devices needed to get work done. IBM’s System/360 proved to be an enormously popular and influential mainframe computer. IBM created different models but needed flexible storage across the 360 product line. So IBM created a standard hard disk device interconnect. Other manufacturers adopted the technology, and a cottage industry was born: Third-party hard disk drive storage.
The PC Revolution
Up until the 1970s, computers were huge, expensive, very specialized devices only the biggest businesses, universities and government institutions could afford. The dropping price of electronic components, the increasing density of memory chips and other factors gave rise to a brand new industry: The personal computer.
Initially, personal computers had very limited, almost negligible storage capabilities. Some used perforated paper tape for storage. Others used audio cassettes. Eventually, personal computers would write data to floppy disk drives. And over time, the cost of hard disk drives fell enough that PC users could have one, too.
In 1980, a young upstart company named Shugart Technology introduced a 5 MB hard disk drive designed to fit into personal computers of the day. It was a scant 5.25 inches in diameter. The drive cost $1,500. It would prove popular enough to become a de facto standard for PCs throughout the 1980s. Shugart changed its name to Seagate Technology. Yep. That Seagate.
In the space of 25 years, hard drive technology had shrunk from a device the size of a refrigerator to something less than 6 inches in diameter. And that would be nothing compared to what was to come in the next 25 years.
The Advent of RAID
An important chapter in Backblaze’s backstory appears in the late 1980s when three computer scientists from U.C. Berkeley coined the term “RAID” in a research paper presented at the SIGMOD conference, an annual event which still happens today.
RAID is an acronym that stands for “Redundant Array of Inexpensive Disks.” The idea is that you can take several discrete storage devices – hard disk drives, in this case – and combine them into a single logical unit. Dividing the work of writing and reading data between multiple devices can make data move faster. It can also reduce the likelihood that you’ll lose data.
The Berkeley researchers weren’t the first to come up with the idea, which had bounced around since the 1970s. They did coin the acronym that we still use today.
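The redundancy idea is easy to illustrate with XOR parity, the simplest RAID scheme (used by RAID 4 and 5). This sketch is for illustration only and is not how any particular RAID implementation, or Backblaze’s storage, actually works:

```python
# XOR parity: the parity block is the XOR of the data blocks, so any single
# lost block can be rebuilt from the surviving blocks plus the parity.
from functools import reduce

def xor_blocks(blocks):
    """XOR same-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, byte_tuple) for byte_tuple in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]   # stripes on three data drives
parity = xor_blocks(data)            # stored on a parity drive

# Simulate losing drive 1 and rebuilding it from the survivors plus parity:
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
print("rebuilt:", rebuilt)
```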
RAID is vitally important for Backblaze; it’s how we build our Storage Pods. Our latest Storage Pod design incorporates 60 individual hard drives assembled in 4 RAID arrays. Backblaze then took the concept a step further by implementing our own Reed-Solomon erasure coding mechanism to work across our Backblaze Vaults.
With our latest Storage Pod design we’ve been able to squeeze 480 TB into a single chassis that occupies 4U of rack space, or about 7 inches of vertical height in an equipment rack. That’s a far cry from RAMAC’s 5 MB of refrigerator-sized storage. 96 million times more storage, in fact.
Bigger, Better, Faster, More
Throughout the 1980s and 1990s, hard drive and PC makers innovated and changed the market irrevocably. 5.25-inch drives soon gave way to 3.5-inch drives (we at Backblaze still use 3.5-inch drives designed for modern desktop computers in our Storage Pods). When laptops gained in popularity, drives shrank again to 2.5 inches. If you’re using a laptop that has a hard drive today, chances are it’s a 2.5-inch model.
The need for better, faster, more reliable and flexible storage also gave rise to different interfaces: IDE, SCSI, ATA, SATA, PCIe. Drive makers improved performance by increasing the spindle speed, the speed of the motor that turns the hard drive. 5,400 revolutions per minute (RPM) was standard, but 7,200 RPM yielded better performance. Seagate, Western Digital, and others upped the ante by introducing 10,000-RPM and eventually 15,000-RPM drives.
IBM pioneered the commercial hard drive and brought countless hard disk drive innovations to market over the decades. In 2003, IBM sold its storage division to Hitachi. The many Hitachi drives we use here at Backblaze can trace their lineage back to IBM.
Solid State Drives
Even as hard drives found a place in early computer systems, RAM-based storage systems were also being created. The prohibitively high cost of computer memory, its complexity, its size, and its requirement to stay powered prevented memory-based storage from catching on in any meaningful way, though very specialized, expensive systems found use in the supercomputing and mainframe computer markets.
Eventually non-volatile RAM became fast, reliable, and inexpensive enough that SSDs could be mass-produced, but the change came by degrees. Early SSDs were incredibly expensive. By the early 1990s, you could buy a 20 MB SSD for a PC for $1,000, or about $50 per megabyte. By comparison, the cost of a spinning hard drive had dropped below $1 per megabyte, and would plummet even further.
The real breakthrough happened with the introduction of flash-based SSDs. By the mid-2000s, Samsung, SanDisk and others brought to market flash SSDs that acted as drop-in replacements for hard disk drives. SSDs have gotten faster, smaller and more plentiful. Now PCs and Macs and smartphones all include flash storage of all shapes and sizes and will continue to move in that direction. SSDs provide better performance, better power efficiency, and enable thinner, lighter computer designs, so it’s little wonder.
The venerable spinning hard drive, now 60 years old, still rules the roost when it comes to cost per gigabyte. SSD makers are getting closer to parity with hard drives, but they’re still years away from hitting that point. An old fashioned spinning hard drive still gives you the best bang for your buck.
We can dream, though. Over the summer our Andy Klein got to wondering what Seagate’s new 60 TB SSD might look like in one of our Storage Pods. He had to guess at the price but based on current market estimates, an SSD-based 60-drive Storage Pod would cost Backblaze about $1.2 million.
Andy didn’t make any friends in Backblaze’s Accounting department with that news, so it’s probably not going to happen any time soon.
As computers and mobile devices have pivoted from hard drives to SSDs, it’s easy to discount the hard drive as a legacy technology that will soon fall by the wayside. I’d encourage some circumspection, though. It seems every few years someone declares the hard drive dead, and meanwhile hard drive makers keep finding ways to stay relevant.
There’s no question that the hard drive market is in a period of decline and transition. Hard disk drive sales are down year-over-year as consumers switch to SSDs or move away from Macs and PCs altogether, doing more of their work on mobile devices.
Regardless, innovation and development of hard drives continue apace. We’re populating our own Storage Pods with 8 TB hard drives. 10 TB hard drives are already shipping, and even higher-capacity 3.5-inch drives are on the horizon.
Hard drive makers constantly improve areal density – the amount of information you can physically cram onto a disk. They’ve also found ways to get more platters into a single drive mechanism by filling it with helium. This sadly does not make the drive float, dashing my fantasies of creating a Backblaze data center blimp.
So is SSD the only future for data storage? Not for a while. Seagate still firmly believes in the future of hard drives. Its CFO estimates that hard drives will be around for another 15-20 years. Researchers predict that hard drives coming to market over the next decade will store an order of magnitude more data than they do now – 100 TB or more.
Think it’s out of the question? Imagine handing a 10 TB hard drive to a RAMAC operator in 1956 and telling them that the 3.5-inch device in their hands holds two million times more data than that big box in front of them. They’d think you were nuts.
Q2 2016 saw Backblaze: introduce 8TB drives into our drive mix, kick off a pod-to-vault migration of over 6.5 Petabytes of data, cross over 250 Petabytes of data stored, and deploy another 7,290 drives into the data center for a total of 68,813 spinning hard drives under management. With all the ins and outs, let’s take a look at how our hard drives fared in Q2 2016.
Backblaze hard drive reliability for Q2 2016
Below is the hard drive failure data for Q2 2016. This chart is just for the period of Q2 2016. The hard drive models listed below are data drives (not boot drives), and we only list models which have 45 or more drives of that model deployed.
A couple of observations on the chart:
The models that have an annualized failure rate of 0.00% had zero hard drive failures in Q2 2016.
The annualized failure rate is computed as follows: ((Failures)/(Drive Days/365)) * 100. Therefore consider the number of “Failures” and “Drive Days” before reaching any conclusions about the failure rate.
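The formula above can be expressed directly. The failure and drive-day counts below are made up for illustration, not taken from the chart:

```python
def annualized_failure_rate(failures: int, drive_days: int) -> float:
    """Backblaze's AFR formula: ((Failures) / (Drive Days / 365)) * 100."""
    return failures / (drive_days / 365) * 100

# Example: 4 failures over 73,000 drive days (200 drive-years) -> 2.0% AFR
print(annualized_failure_rate(4, 73_000))
```

This also shows why a small drive-day count makes the rate noisy: with only a few hundred drive days, a single failure swings the AFR by many percentage points.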
Later in this post we’ll review the cumulative statistics for all of our drives over time, but first let’s take a look at the new drives on the block.
The 8TB hard drives have arrived
For the last year or so we kept saying we were going to deploy 8TB drives in quantity. We did deploy 45 8TB HGST drives, but deploying these drives en masse did not make economic sense for us. Over the past quarter, 8TB drives from Seagate became available at a reasonable price, so we purchased and deployed over 2,700 in Q2, with more to come in Q3. All of these drives were deployed in Backblaze Vaults, with each vault using 900 drives: 45 drives in each of the 20 Storage Pods that form a Backblaze Vault.
Yes, we said 45 drives in each storage pod, so what happened to our 60 drive Storage Pods? In short, we wanted to use the remaining stock of 45 drive Storage Pods before we started using the 60 drive pods. We have built two Backblaze Vaults using the 60 drive pods, but we filled them with 4- and 6TB drives. The first 60 drive Storage Pod filled with 8TB drives (total 480TB) will be deployed shortly.
Hard Drive Migration – 85 Pods to 1 Vault
One of the reasons that we made the move to 8TB drives was to optimize storage density. We’ve done data migrations before, for example, from 1TB pods to 3TB and 4TB pods. These migrations were done one or two Storage Pods at a time. It was time to “up our game.” We decided to migrate from individual Storage Pods filled with HGST 2TB drives, average age 64 months, to a Backblaze Vault filled with 900 8TB drives.
We identified and tagged 85 individual Storage Pods to migrate from. Yes, 85. The total amount of data to be migrated was about 6.5PB. It was a bit sad to see the 2TB HGST drives go as they have been really good over the years, but getting 4 times as much data into the same space was just too hard to resist.
The first step was to stop all data writes on the donor HGST 2TB Storage Pods. We then kicked off the migration by starting with 10 Storage Pods, adding 10 to 20 donor pods every few hours until we reached all 85. The migration process is purposely slow, as we want to ensure that we can still quickly read files from the 85 donor pods so that data restores are not impacted. The process is to copy a given RAID array from a Storage Pod to a specific “Tome” in a Backblaze Vault. Once all the data in a given RAID array has been copied to a Tome, we move on to the next RAID array awaiting migration and continue the process. This happens in parallel across the 45 Tomes in a Backblaze Vault.
We’re about 50% of the way through the migration with little trouble. We did have a Storage Pod in the Backblaze Vault go down. That didn’t stop the migration, as vaults are designed to continue to operate under such conditions, but more on that in another post.
250 Petabytes of data stored
Recently we took a look at the growth of data and the future of cloud storage. Given the explosive growth in data as a whole, it’s not surprising that Backblaze added another 50PB of customer data over the last 2 quarters and that by mid-June we had passed the 250 Petabyte mark in total data stored. You can see our data storage growth below:
Back in December 2015, we crossed the 200 Petabyte mark and at that time predicted we would cross 250PB in early Q3 2016. So we’re a few weeks early. We also predicted we would cross 300PB in late 2016. Given how much data we are adding with B2, it will probably be sooner; we’ll see.
Cumulative hard drive failure rates by model
In the table below we’ve computed the annualized drive failure rate for each drive model. This is based on data from April 2013 through June 2016.
Some people question the usefulness of the cumulative Annualized Failure Rate. This is usually based on the idea that drives entering or leaving during the cumulative period skew the results because they are not there for the entire period. This is one of the reasons we compute the Annualized Failure Rate using “Drive Days”. A Drive Day is only recorded if the drive is present in the system. For example, if a drive is installed on July 1st and fails on August 31st, it adds 62 drive days and 1 drive failure to the overall results. A drive can be removed from the system because it fails or perhaps it is removed from service after a migration like the 2TB HGST drives we’ve covered earlier. In either case, the drive stops adding Drive Days to the total, allowing us to compute an Annualized Failure Rate over the cumulative period based on what each of the drives contributed during that period.
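The computation described above reduces to a simple formula. Here is a short sketch of it; the 1,000-drive example totals are hypothetical, chosen to land on the 2.8% figure discussed later in this post.

```python
def annualized_failure_rate(drive_days, failures):
    """Annualized failure rate in percent, from cumulative totals.

    Each drive contributes one drive day for every day it is present in
    the system, so drive_days / 365 is the total drive-years of service.
    """
    drive_years = drive_days / 365.0
    return 100.0 * failures / drive_years

# Hypothetical totals: 1,000 drives present for a full year, 28 failures.
print(annualized_failure_rate(365 * 1000, 28))  # 2.8
```

Note that the drive installed on July 1st and failed on August 31st in the example above contributes only its 62 drive days to the denominator, which is exactly how partial-period drives are handled.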
As always, we’ve published the Q2 2016 data we used to compute these drive stats. You can find the data files along with the associated documentation on our hard drive test data page.
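For anyone working with those data files, the aggregation looks roughly like this. The rows below are fabricated to show the shape of the daily snapshots (one row per drive per day, with `failure` set to 1 only on the day a drive fails); consult the documentation on the test data page for the authoritative schema.

```python
from collections import defaultdict

# Hypothetical records in the shape of the published daily snapshots.
rows = [
    {"date": "2016-06-29", "serial_number": "A1", "model": "ST4000DM000", "failure": 0},
    {"date": "2016-06-29", "serial_number": "B2", "model": "ST4000DM000", "failure": 0},
    {"date": "2016-06-30", "serial_number": "A1", "model": "ST4000DM000", "failure": 0},
    {"date": "2016-06-30", "serial_number": "B2", "model": "ST4000DM000", "failure": 1},
]

drive_days = defaultdict(int)
failures = defaultdict(int)
for row in rows:
    drive_days[row["model"]] += 1          # each row is one drive day
    failures[row["model"]] += row["failure"]

for model in sorted(drive_days):
    afr = 100.0 * failures[model] / (drive_days[model] / 365.0)
    print(model, drive_days[model], failures[model])
```

Summing rows this way automatically handles drives that enter or leave partway through the period.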
Which hard drives do we use?
We’ve written previously about our difficulties in getting drives from Toshiba and Western Digital. Whether it’s poor availability or an unexplained desire not to sell us drives, we don’t have many drives from either manufacturer. So we use a lot of Seagate drives and they are doing the job very nicely. The table below shows the distribution of the hard drives we are currently using in our data center.
The Seagate 8TB drives are here and are looking good. Sadly, we’ll be saying goodbye to the HGST 2TB drives, but we need the space. We’ll miss those drives; they were rock stars for us. The 4TB Seagate drives are our workhorse drives today, and their 2.8% annualized failure rate is more than acceptable for us. That low failure rate roughly translates to an average of one drive failure per Storage Pod per year. Over the next few months, expect more on our migrations, a look at a day in the life of a data center tech, and an update on the “bathtub” curve, i.e., hard drive failure over time.
Bees are important. I find myself saying this a lot and, slowly but surely, the media seems to be coming to this realisation too. The plight of the bee is finally being brought to our attention with increasing urgency.
In the UK, bee colonies are suffering mass losses. Due to the use of bee-killing fertilisers and pesticides within the farming industry, the decline of pollen-rich plants, the destruction of hives by mites, and Colony Collapse Disorder (CCD), bees are in decline at a worrying pace.
One hint of a silver lining is that increasing awareness of the crisis has led to a rise in the number of beekeeping hobbyists. As getting your hands on some bees is now as simple as ordering a box from the internet, keeping bees in your garden is a much less daunting venture than it once was.
Taking this one step further, beekeepers are now using tech to monitor the conditions of their bees, improving conditions for their buzzy workforce while also recording data which can then feed into studies attempting to lessen the decline of the bee.
WDLabs recently donated a PiDrive to the Honey Bee Gardens Project in order to help beekeeper David Ammons and computer programmer Graham Total create The Hive Project, an electric beehive colony that monitors real-time bee data.
The setup records colony size, honey production, and bee health to help combat CCD.
Colony Collapse Disorder (CCD) is decidedly mysterious. Colonies hit by the disease seem to simply disappear. The hive itself often remains completely intact, full of honey at the perfect temperature, but… no bees. Dead or alive, the bees are nowhere to be found.
To try to combat this phenomenon, the electric hive offers 24/7 video coverage of the inner hive, while tracking the conditions of the hive population.
This is from the first live day of our instrumented beehive. This was the only bee we spotted all day that brought any pollen into the hive.
Ultimately, the team aim for the data to be crowdsourced, enabling researchers and keepers to gain the valuable information needed to fight CCD via a network of electric hives. While many people blame the aforementioned pollen decline and chemical influence for the rise of CCD, without the empirical information gathered from builds such as The Hive Project, the source of the problem, and therefore the solution, can’t be found.
Ammons and Total researched existing projects around the use of digital tech within beekeeping, and they soon understood that a broad analysis of bee conditions didn’t exist. While many were tracking hive weight, temperature, or honey production, there was no system in place for integrating such data collection in one place. This realisation spurred them on further.
“We couldn’t find any one project that took a broad overview of the whole area. Even if we don’t end up being the people who implement it, we intend to create a plan for a networked system of low-cost monitors that will assist both research and commercial beekeeping.”
With their mission statement firmly in place, the duo looked toward the Raspberry Pi as the brain of their colony. The device was small enough to fit within the hive without disruption, yet powerful enough to let them monitor multiple factors while also using the Pi Camera Module to record all video to the 314GB storage of the Western Digital PiDrive.
Data recorded by The Hive Project is vital to the survival of the bee, the growth of colony population, and an understanding of the conditions of the hive in changing climates. These are issues which affect us all. The honey bee is responsible for approximately 80% of pollination in the UK, and is essential to biodiversity. Here, I should hand over to a ‘real’ bee to explain more about the importance of bee-ing…