
The Life and Times of a Backblaze Hard Drive

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/life-and-times-of-a-backblaze-hard-drive/

Seagate 12 TB hard drive

Backblaze likes to talk about hard drive failures — a lot. What we haven’t talked much about is how we deal with those failures: the daily dance of temp drives, replacement drives, and all the clones that it takes to keep over 100,000 drives healthy. Let’s go behind the scenes and take a look at that dance from the eyes of one Backblaze hard drive.

After sitting still for what seemed like forever, ZCH007BZ was on the move. ZCH007BZ, let’s call him Zach, is a Seagate 12 TB hard drive. For the last few weeks, Zach and over 6,000 friends were securely sealed inside their protective cases in the ready storage area of a Backblaze data center. Being a hard disk drive, Zach’s modest dream was to be installed in a system, spin merrily, and store data for many years to come. And now the wait was nearly over, or was it?

Hard drives in wrappers

The Life of Zach

Zach was born in a factory in Singapore and shipped to the US, eventually finding his way to Backblaze, but he didn’t know that. He had sat sealed in the dark for weeks. Now Zach and boxes of other drives were removed from their protective cases and gently stacked on a cart. Zach was near the bottom of the pile, but even he could see endless columns of beautiful red boxes stacked seemingly to the sky. “Backblaze!” one of the drives on the cart whispered. All the other drives gasped with recognition. Thank goodness the noise-cancelling headphones worn by all Backblaze Data Center Techs covered the drives’ collective excitement.

While sitting in the dark, the drives had gossiped about where they were: a data center, a distribution warehouse, a Costco, or Best Buy. Backblaze came up a few times, but that was squashed — they couldn’t be that lucky. After all, Backblaze was the only place where a drive could be famous. Before Backblaze, hard drives labored in anonymity. Occasionally, one or two would be seen in a hard drive teardown article, but even that sort of exposure had died out a couple of years ago. But Backblaze publishes everything about their drives: model numbers, serial numbers, heck, even their S.M.A.R.T. statistics. There was a rumor that hard drives worked extra hard at Backblaze because they knew they would be in the public eye. With red Backblaze Storage Pods as far as the eye could see, Zach and friends were about to find out.

Drive with guide

The cart Zach and his friends were on glided to a stop at the production build facility. This is where Storage Pods are filled with drives and tested before being deployed. The cart stopped by the first of twenty V6.0 Backblaze Storage Pods that together would form a Backblaze Vault. At each Storage Pod station 60 drives were unloaded from the cart. The serial number of each drive was recorded along with the Storage Pod ID and drive location in the pod. Finally, each drive was fitted with a pair of drive guides and slid into its new home as a production drive in a Backblaze Storage Pod. “Spin long and prosper,” Zach said quietly each time the lid of a Storage Pod snapped in place covering the 60 giddy hard drives inside. The process was repeated for the remaining 19 Storage Pods, and when it was done Zach remained on the cart. He would not be installed in a production system today.

The Clone Room

Zach and the remaining drives on the cart were slowly wheeled down the hall and, bewildered, rolled into the clone room. “What’s a clone room?” Zach asked himself. The drives on the cart were divided into two groups: one group was placed on the clone table, and the other on the test table. Zach was on the test table.

Almost as soon as Zach was placed on the test table, the DC Tech picked him up again and placed him and several other drives into a machine. He was about to get formatted. The entire formatting process only took a few minutes for Zach, as it did for all of the other drives on the test table. Zach counted 25 drives, including himself.

Still confused and a little sore from the formatting, Zach and two other drives were picked up from the bench by a different DC Tech. She recorded their vitals — serial number, manufacturer, and model — and left the clone room with all three drives on a different cart.

Dreams of a Test Drive

Luigi, Storage Pod lift

The three drives were back on the data center floor with red Storage Pods all around. The DC Tech had maneuvered Luigi, the local Storage Pod lift unit, to hold a Storage Pod she was sliding from a data center rack. The lid was opened, the tech attached a grounding clip, and then removed one of the drives in the Storage Pod. She recorded the vitals of the removed drive. While she was doing so, Zach could hear the removed drive breathlessly mumble something about media errors, but before Zach could respond, the tech picked him up, attached drive guides to his frame, and gently slid him into the Storage Pod. The tech updated her records, closed the lid, and slid the pod back into place. A few seconds later, Zach felt a jolt of electricity pass through his circuits and he and 59 other drives spun to life. Zach was now part of a production Backblaze Storage Pod.

First, Zach was introduced to the other 19 members of his tome. There are 20 drives in a tome, with each living in a separate Storage Pod. Files are divided (sharded) across these 20 drives using Backblaze’s open-sourced erasure code algorithm.

Zach’s first task was to rebuild all of the files that were stored on the drive he replaced. He’d do this by asking for pieces (shards) of all the files from the 19 other drives in his tome. He only needed 17 of the pieces to rebuild a file, but he asked everyone in case there was a problem. Rebuilding was hard work, and the other drives were often busy with reading files, performing shard integrity checks, and so on. Depending on how busy the system was, and how full the drives were, it might take Zach a couple of weeks to rebuild the files and get him up to speed with his contemporaries.
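What does that ask-everyone, need-17 logic look like? Here is a minimal sketch, not Backblaze’s production code: `peer.fetch_shard` and `reconstruct` are hypothetical stand-ins, and the real decode step is Reed-Solomon, covered in the Vaults post further down this page.

```python
# A simplified sketch of a temp drive rebuilding its contents -- not
# Backblaze's production code. `peer.fetch_shard` and `reconstruct` are
# hypothetical stand-ins; the real decode is Reed-Solomon.
DATA_SHARDS = 17   # shards needed to reconstruct any file
TOME_SIZE = 20     # drives in a tome, one per Storage Pod

def rebuild_file(file_id, peers, reconstruct):
    """Ask the 19 surviving drives in the tome for their shards of
    `file_id`; any 17 answers are enough to rebuild the missing one."""
    shards = []
    for peer in peers:
        shard = peer.fetch_shard(file_id)   # may be slow, busy, or None
        if shard is not None:
            shards.append(shard)
        if len(shards) == DATA_SHARDS:      # 17 is all we need
            return reconstruct(shards)
    raise IOError(f"only {len(shards)} of {DATA_SHARDS} shards found")
```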

Nightmares of a Test Drive

Little did he know, but at this point, Zach was still considered a temp replacement drive. The dysfunctional drive that he replaced was making its way back to the clone room where a pair of cloning units, named Harold and Maude in this case, waited. The tech would attempt to clone the contents of the failed drive to a new drive assigned to the clone table. The primary reason for trying to clone a failed drive was recovery speed. A drive can be cloned in a couple of days, but as noted above, it can take up to a couple of weeks to rebuild a drive, especially large drives on busy systems. In short, a successful clone would speed up the recovery process.

For nearly two days straight, Zach was rebuilding. He barely had time to meet his pod neighbors, Cheryl and Carlos. Since they were not rebuilding, they had plenty of time to marvel at how hard Zach was working. He was 25% done and going strong when the Storage Pod powered down. Moments later, the pod was slid out of the rack and the lid popped open. Zach assumed that another drive in the pod had failed, but then he felt the spindly, cold fingers of the tech grab him and yank firmly. He was being replaced.

Storage Pod in Backblaze data center

Zach had done nothing wrong. It was just that the clone was successful, with nearly all the files being copied from the previous drive to the smiling clone drive that was putting on Zach’s drive guides and gently being inserted into Zach’s old slot. “Goodbye,” he managed to eke out as he was placed on the cart and watched the tech bring the Storage Pod back to life. Confused, angry, and mostly exhausted, Zach quickly fell asleep.

Zach woke up just in time to see he was in the formatting machine again. The data he had worked so hard to rebuild was being ripped from his platters and replaced randomly with ones and zeroes. This happened multiple times and just as Zach was ready to scream, it stopped, and he was removed from his torture and stacked neatly with a few other drives.

After a while he looked around, and once the lights went out the stories started. Zach wasn’t alone. Several of the other temp drives had pretty much the same story; they thought they had found a home, only to be replaced by some uppity clone drive. One of the temp drives, Lin, said she had been in three different systems only to be replaced each time by a clone drive. No one wanted to believe her, but no one knew what was next either.

The Day the Clone Died

Zach found out the truth a few days later when he was selected, inspected, and injected as a temp drive into another Storage Pod. Then three days later he was removed, wiped, reformatted, and placed back in the temp pool. He began to resign himself to life as a temp drive. Not exactly glamorous, but he did get his serial number in the Backblaze Drive Stats data tables while he was a temp. That was more than the millions of other drives in the world that would forever be unknown.

On his third temp drive stint, he was barely in the pod a day when the lid opened and he was unceremoniously removed. This was the life of a temp drive, and when the lid opened on the fourth day of his fourth temp drive shift, he just closed his eyes and waited for his dream to end again. Except, this time, the tech’s hand reached past him and grabbed a drive a few slots away. That unfortunate drive had passed away the night before, a full-fledged crash. Zach, like all the other drives nearby, had heard the screams.

Another temp drive Zach knew from the temp table replaced the dead drive, then the lid was closed, the pod slid back into place, and power was restored. With that, Zach doubled down on getting rebuilt — maybe if he could finish rebuilding before the clone was done, he could stay. What Zach didn’t know was that the clone process for the drive he had replaced had failed. This happens about half the time. Zach was home free; he just didn’t know it.

In a couple of days, Zach finished rebuilding and became a real member of a production Backblaze Storage Pod. He now spends his days storing and retrieving data, getting his bits tested by shard integrity checks, and having his S.M.A.R.T. stats logged for the Backblaze Drive Stats. His hard drive life is better than he ever dreamed.


Petabytes on a Budget: 10 Years and Counting

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/petabytes-on-a-budget-10-years-and-counting/

A Decade of the Pod

This post is for all of the storage geeks out there who have followed the adventures of Backblaze and our Storage Pods over the years. The rest of you are welcome to come along for the ride.

It has been 10 years since Backblaze introduced our Storage Pod to the world. In September 2009, we announced our hulking, eye-catching, red 4U storage server equipped with 45 hard drives delivering 67 terabytes of storage for just $7,867 — that was about $0.11 a gigabyte. As part of that announcement, we open-sourced the design for what we dubbed Storage Pods, telling you and everyone like you how to build one, and many of you did.

Backblaze Storage Pod version 1 was announced on our blog with little fanfare. We thought it would be interesting to a handful of folks — readers like you. In fact, it wasn’t even called version 1, as no one had ever considered there would be a version 2, much less a version 3, 4, 4.5, 5, or 6. We were wrong. The Backblaze Storage Pod struck a chord with many IT and storage folks who were offended by having to pay a king’s ransom for a high density storage system. “I can build that for a tenth of the price,” you could almost hear them muttering to themselves. Mutter or not, we thought the same thing, and version 1 was born.

The Podfather

Tim, the “Podfather” as we know him, was the Backblaze lead in creating the first Storage Pod. He had design help from our friends at Protocase, who built the first three generations of Storage Pods for Backblaze and also spun out a company named 45 Drives to sell their own versions of the Storage Pod — that’s open source at its best. Before we decided on the version 1 design, there were a few experiments along the way:

Wooden pod
Octopod

The original Storage Pod was prototyped by building a wooden pod or two. We needed to test the software while the first metal pods were being constructed.

The Octopod was a quick and dirty response to receiving the wrong SATA cables — ones that were too long and glowed. Yes, there are holes drilled in the bottom of the pod.

Pre-1 Storage Pod
Early not-red Storage Pod

The original faceplate shown above was used on about 10 pre-1.0 Storage Pods. It was updated to the three circle design just prior to Storage Pod 1.0.

Why are Storage Pods red? When we had the first ones built, the manufacturer had a batch of red paint left over that could be used on our pods, and it was free.

Back in 2007, when we started Backblaze, there weren’t a whole lot of affordable choices for storing large quantities of data. Our goal was to charge $5/month for unlimited data storage for one computer. We decided to build our own storage servers when it became apparent that, if we were to use the other solutions available, we’d have to charge a whole lot more money. Storage Pod 1.0 allowed us to store one petabyte of data for about $81,000. Today we’ve lowered that to about $35,000 with Storage Pod 6.0. When you take into account that the average amount of data per user has nearly tripled in that same time period and our price is now $6/month for unlimited storage, the math works out about the same today as it did in 2009.
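A quick sanity check of that claim, using only the round numbers quoted above:

```python
# Rough check that "the math works out about the same" -- all numbers
# are the approximate ones quoted in this post.
cost_per_pb_2009 = 81_000       # $ per petabyte with Storage Pod 1.0
cost_per_pb_2019 = 35_000       # $ per petabyte with Storage Pod 6.0
price_2009, price_2019 = 5, 6   # $/month for unlimited backup
data_growth = 3                 # average data per user roughly tripled

storage_cost_ratio = data_growth * cost_per_pb_2019 / cost_per_pb_2009
revenue_ratio = price_2019 / price_2009
print(f"storage cost per user vs 2009: {storage_cost_ratio:.2f}x")  # ~1.30x
print(f"revenue per user vs 2009:      {revenue_ratio:.2f}x")       # 1.20x
```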

We Must Have Done Something Right

The Backblaze Storage Pod was more than just affordable data storage. Version 1.0 introduced or popularized three fundamental changes to storage design: 1) You could build a system out of commodity parts and it would work, 2) You could mount hard drives vertically and they would still spin, and 3) You could use consumer hard drives in the system. It’s hard to determine which of these three features offended and/or excited more people. It is fair to say that ten years out, things worked out in our favor, as we currently have about 900 petabytes of storage in production on the platform.

Over the last 10 years, people have warmed up to our design, or at least elements of the design. Starting with 45 Drives, multitudes of companies have worked on and introduced various designs for high density storage systems ranging from 45 to 102 drives in a 4U chassis, so today the list of high-density storage systems that use vertically mounted drives is pretty impressive:

Company                       Server                       Drive Count
45 Drives                     Storinator S45               45
45 Drives                     Storinator XL60              60
Chenbro                       RM43160                      60
Chenbro                       RM43699                      100
Dell                          DSS 7000                     90
HPE                           Cloudline CL5200             80
HPE                           Cloudline CL5800             100
NetGear                       ReadyNAS 4360X               60
Newisys                       NDS 4450                     60
Quanta                        QuantaGrid D51PL-4U          102
Quanta                        QuantaPlex T21P-4U           70
Seagate                       Exos AP 4U100                96
Supermicro                    SuperStorage 6049P-E1CR60L   60
Supermicro                    SuperStorage 6049P-E1CR45L   45
Tyan                          Thunder SX FA100-B7118       100
Viking Enterprise Solutions   NSS-4602                     60
Viking Enterprise Solutions   NDS-4900                     90
Viking Enterprise Solutions   NSS-41000                    100
Western Digital               Ultrastar Serv60+8           60
Wiwynn                        SV7000G2                     72

Another driver in the development of some of these systems is the Open Compute Project (OCP). Formed in 2011, the OCP gathers and shares ideas and designs for data storage, rack designs, and related technologies. The group is managed by The Open Compute Project Foundation as a 501(c)(6) and counts many industry luminaries in the storage business as members.

What Have We Done Lately?

In technology land, 10 years of anything is a long time. What was exciting then is expected now. And the same thing has happened to our beloved Storage Pod. We have introduced updates and upgrades over the years twisting the usual dials: cost down, speed up, capacity up, vibration down, and so on. All good things. But, we can’t fool you, especially if you’ve read this far. You know that Storage Pod 6.0 was introduced in April 2016 and quite frankly it’s been crickets ever since as it relates to Storage Pods. Three plus years of non-innovation. Why?

  1. If it ain’t broke, don’t fix it. Storage Pod 6.0 is built in the US by Equus Compute Solutions, our contract manufacturer, and it works great. Production costs are well understood, performance is fine, and the new higher density drives perform quite well in the 6.0 chassis.
  2. Disk migrations kept us busy. From Q2 2016 through Q2 2019 we migrated over 53,000 drives. We replaced 2, 3, and 4 terabyte drives with 8, 10, and 12 terabyte drives, doubling, tripling and sometimes quadrupling the storage density of a storage pod.
  3. Pod upgrades kept us busy. From Q2 2016 through Q1 2019, we upgraded our older V2, V3, and V4.5 storage pods to V6.0. Then we crushed a few of the older ones with a MegaBot and gave a bunch more away. Today there are no longer any stand-alone storage pods; they are all members of a Backblaze Vault.
  4. Lots of data kept us busy. In Q2 2016, we had 250 petabytes of data storage in production. Today, we have 900 petabytes. That’s a lot of data you folks gave us (thank you by the way) and a lot of new systems to deploy. The chart below shows the challenge our data center techs faced.

Petabytes Stored vs Headcount vs Millions Raised

In other words, our data center folks were really, really busy, and not interested in shiny new things. Now that we’ve hired a bunch more DC techs, let’s talk about what’s next.

Storage Pod Version 7.0 — Almost

Yes, there is a Backblaze Storage Pod 7.0 on the drawing board. Here is a short list of some of the features we are looking at:

  • Updating the motherboard
  • Upgrading the CPU, and considering a move to an AMD CPU
  • Updating the power supply units, perhaps moving to one unit
  • Upgrading from 10Gbase-T to 10GbE SFP+ optical networking
  • Upgrading the SATA cards
  • Modifying the tool-less lid design

The timeframe is still being decided, but early 2020 is a good time to ask us about it.

“That’s nice,” you say out loud, but what you are really thinking is, “Is that it? Where’s the Backblaze in all this?” And that’s where you come in.

The Next Generation Backblaze Storage Pod

We are not out of ideas, but one of the things that we realized over the years is that many of you are really clever. From the moment we open sourced the Storage Pod design back in 2009, we’ve received countless interesting, well thought out, and occasionally odd ideas to improve the design. As we look to the future, we’d be stupid not to ask for your thoughts. Besides, you’ll tell us anyway on Reddit or HackerNews or wherever you’re reading this post, so let’s just cut to the chase.

Build or Buy

The two basic choices are: We design and build our own storage servers or we buy them from someone else. Here are some of the criteria as we think about this:

  1. Cost: We’d like the cost of a storage server to be about $0.030-$0.035 per gigabyte of storage (or less, of course). That includes the server and the drives inside. For example, using off-the-shelf Seagate 12 TB drives (model: ST12000NM0007) in a 6.0 Storage Pod costs about $0.032-$0.034/gigabyte depending on the price of the drives on a given day (see the quick check after this list).
  2. International: Now that we have a data center in Amsterdam, we need to be able to ship these servers anywhere.
  3. Maintenance: Things should be easy to fix or replace — especially the drives.
  4. Commodity Parts: Wherever possible, the parts should be easy to purchase, ideally from multiple vendors.
  5. Racks: We’d prefer to keep using 42” deep cabinets, but make a good case for something deeper and we’ll consider it.
  6. Possible Today: No DNA drives or other wistful technologies. We need to store data today, not in the year 2061.
  7. Scale: Nothing in the solution should limit the ability to scale the systems. For example, we should be able to upgrade drives to higher densities over the next 5-7 years.
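Here is that quick check against criterion 1, with assumed street prices; the drive and chassis numbers below are illustrative, not our actual build sheet:

```python
# Back-of-the-envelope check of criterion 1 with assumed prices; the
# drive and chassis costs here are illustrative, not a real build sheet.
drives_per_pod = 60
drive_tb = 12
drive_price = 330        # assumed $ per 12 TB drive, bought in volume
chassis_price = 3_500    # assumed $ for chassis, boards, CPU, and PSUs

total_gb = drives_per_pod * drive_tb * 1_000   # 720,000 GB per pod
cost_per_gb = (drives_per_pod * drive_price + chassis_price) / total_gb
print(f"${cost_per_gb:.4f}/GB")                # ~$0.0324/GB -- in range
```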

Other than that, there are no limitations. Any of the following acronyms, words, and phrases could be part of your proposed solution and we won’t be offended: SAS, JBOD, IOPS, SSD, redundancy, compute node, 2U chassis, 3U chassis, horizontal mounted drives, direct wire, caching layers, appliance, edge storage units, PCIe, fibre channel, SDS, etc.

The solution does not have to be a Backblaze one. As the list from earlier in this post shows, Dell, HP, and many others make high density storage platforms we could leverage. Make a good case for any of those units, or any others you like, and we’ll take a look.

What Will We Do With All Your Input?

We’ve already started by cranking up Backblaze Labs again and have tried a few experiments. Over the coming months we’ll share with you what’s happening as we move this project forward. Maybe we’ll introduce Storage Pod X or perhaps take some of those Storage Pod knockoffs for a spin. Regardless, we’ll keep you posted. Thanks in advance for your ideas and thanks for all your support over the past ten years.


A Toast to Our Partners in Europe at IBC

Post Syndicated from Janet Lafleur original https://www.backblaze.com/blog/a-toast-to-our-partners-in-europe-at-ibc/

Join us at IBC

Prost! Skål! Cheers! Celebrate with us as we travel to Amsterdam for IBC, the premier conference and expo for media and entertainment technology in Europe. The show gives us a chance to raise a glass with our partners, customers, and future customers across the pond. And we’re especially pleased that IBC coincides with the opening of our new European data center.

How will we celebrate? With the Backblaze Partner Crawl, a rolling series of parties on the show floor from 13-16 September. Four of our Europe-based integration partners have graciously invited us to co-host drinks and bites in their stands throughout the show.

If you can make the trip to IBC, you’re invited to share a skål! with our Swedish friends at Cantemo on Friday, a prost! with our German friends at Archiware on Saturday, and a cheers! with our UK-based friends at Ortana and GB Labs on Sunday and Monday, respectively. Or drop in every day and keep the Backblaze Partner Crawl rolling. And if you can’t make it to IBC this time, we encourage you to raise a glass and toast anyway.

Skål! on Friday With Cantemo

Cantemo’s iconik media management makes sharing and collaborating on media effortless, wherever you want to do business. Cantemo announced the integration of iconik with Backblaze’s B2 Cloud Storage last fall, and since then we’ve been amazed by customers like Everwell, who replaced all their on-premises storage with a fully cloud-based production workflow. For existing Backblaze customers, iconik can speed up your deployment by ingesting content already uploaded to B2 without having to download files and upload them again.
You can also stop by the Cantemo booth anytime during IBC to see a live demo of iconik and Backblaze in action. Or schedule an appointment and we’ll have a special gift waiting for you.

Join us at Cantemo on Friday 13 September from 16:30-18:00 at Hall 7 — 7.D67

Prost! on Saturday With Archiware

With the latest release of their P5 Archive featuring B2 support, Archiware makes archiving to the cloud even easier. Archiware customers with large existing archives can use the Backblaze Fireball to rapidly import archived content directly to their B2 account. At IBC, we’re also unveiling our latest joint customer, Baron & Baron, a creative agency that turned to P5 and B2 to back up and archive their dazzling array of fashion and luxury brand content.

Join us at Archiware on Saturday 14 September from 16:30-18:00 at Hall 7 — 7.D35

Cheers! on Sunday With Ortana

Ortana integrated their Cubix media asset management and orchestration platform with B2 way back in 2016 during B2’s beta period, making them among our first media workflow partners. More recently, Ortana joined our Migrate or Die webinar and blog series, detailing strategies for how you can migrate archived content from legacy platforms before they go extinct.

Join us at Ortana on Sunday 15 September from 16:30-18:00 at Hall 7 — 7.C63

Cheers! on Monday With GB Labs

If you were at the NAB Show last April, you may have heard GB Labs was integrating their automation tools with B2. It’s official now, as detailed in their announcement in June. GB Labs’ automation allows you to streamline tasks that would otherwise require tedious and repetitive manual processes, and now supports moving files to and from your B2 account.

Join us at GB Labs Monday 16 September from 17:00-18:00 at Hall 7 — 7.B26

Say Hello Anytime to Our Friends at CatDV

CatDV media asset management helps teams organize, communicate, and collaborate effectively, including archiving content to B2. CatDV has been integrated with B2 for over two years, allowing us to serve customers like UC Silicon Valley, who built an end-to-end collaborative workflow for a 22-member team creating online learning videos.

Stop by CatDV anytime at Hall 7 — 7.A51

But we’re not the only ones making a long trek to Amsterdam for IBC. While you’re roaming around Hall 7, be sure to stop by our other partners traveling from near and far to learn what our joint solutions can do for you:

  • EditShare (shared storage with MAM) Hall 7 — 7.A35
  • ProMax (shared storage with MAM) Hall 7 — 7.D55
  • StorageDNA (smart migration and storage) Hall 7 — 7.A32
  • FileCatalyst (large file transfer) Hall 7 — 7.D18
  • eMAM (web-based DAM) Hall 7 — 7.D27
  • Facilis Technology (shared storage) Hall 7 — 7.B48
  • GrayMeta (metadata extraction and insight) Hall 7 — 7.D25
  • Hedge (backup software) Hall 7 — 7.A56
  • axle ai (asset management) Hall 7 — 7.D33
  • Tiger Technology (tiered data management) Hall 7 — 7.B58

We’re hoping you’ll join us for one or more of our Partner Crawl parties. If you want a quieter place and time to discuss how B2 can streamline your workflow, please schedule an appointment with us so we can give you the attention you need.

Finally, if you can’t join us in Amsterdam, open a beer, pour a glass of wine or other drink, and toast to our new European data center, wherever you are, in whatever language you speak. As we say here in the States, Bottoms up!


Announcing Our First European Data Center

Post Syndicated from Ahin Thomas original https://www.backblaze.com/blog/announcing-our-first-european-data-center/

city view of Amsterdam, Netherlands

Big news: Our first European data center, in Amsterdam, is open and accepting customer data!

This is our fourth data center (DC) location and the first outside of the western United States. As longtime readers know, we have two DCs in the Sacramento, California area and one in the Phoenix, Arizona area. As part of this launch, we are also introducing the concept of regions.

When creating a Backblaze account, customers can choose whether that account’s data will be stored in the EU Central or US West region. The choice made at account creation time will dictate where all of that account’s data is stored, regardless of product choice (Computer Backup or B2 Cloud Storage). For customers wanting to store data in multiple regions, please read this knowledge base article on how to control multiple Backblaze accounts using our (free) Groups feature.

Whether you choose EU Central or US West, your pricing for our products will be unchanged:

  • For B2 Cloud Storage — it’s $0.005/GB/Month. For comparison, storing your data in Amazon S3’s Ireland region will cost ~4.5x more (see the quick comparison after this list)
  • For Computer Backup — $60/Year/Computer is the cost of our industry leading, unlimited data backup for desktops/laptops
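For those who like to see the arithmetic behind that ~4.5x, here is a quick comparison; the S3 Ireland figure is the published standard storage rate of roughly $0.023/GB/month at the time of writing:

```python
# Quick check of the ~4.5x claim; the S3 Ireland rate is the published
# standard-storage price (~$0.023/GB/month) at the time of writing.
b2_rate, s3_rate = 0.005, 0.023           # $/GB/month
print(f"S3 Ireland multiple: {s3_rate / b2_rate:.1f}x")    # 4.6x

tb = 100                                  # example: 100 TB stored
print(f"B2: ${b2_rate * tb * 1000:,.0f}/month "
      f"vs S3: ${s3_rate * tb * 1000:,.0f}/month")         # $500 vs $2,300
```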

Later this week we will be publishing more details on the process we undertook to get to this launch. Here’s a sneak preview:

  • Wednesday, August 28: Getting Ready to Go (to Europe). How do you even begin to think about opening a DC that isn’t within any definition of driving distance? For the vast majority of companies on the planet, simply figuring out how to get started is a massive undertaking. We’ll be sharing a little more on how we thought about our requirements, gathered information, and the importance of NATO in the whole equation.
  • Thursday, August 29: The Great European (Non) Vacation. With all the requirements done, research gathered, and preliminary negotiations held, there comes a time when you need to jump on a plane and go meet your potential partners. For John & Chris, that meant 10 data center tours in 72 hours across three countries — not exactly a relaxing summer holiday, but vitally important!
  • Friday, August 30: Making a Decision. After an extensive search, we are very pleased to have found our partner in Interxion! We’ll share a little more about the process of narrowing down the final group of candidates and selecting our newest partner.
If you’re interested in learning more about the physical process of opening up a data center, check out our post on the seven days prior to opening our Phoenix DC.

New Data Center FAQs:

Q: Does the new DC mean Backblaze has multi-region storage?
A: Yes, by leveraging our Groups functionality. When creating an account, users choose where their data will be stored. The default option will store data in US West, but to choose EU Central, simply select that option in the pull-down menu.

Region selector
Choose EU Central for data storage

If you create a new account with EU Central selected and have an existing account that’s in US West, you can put both of them in a Group, and manage them from there! Learn more about that in our Knowledge Base article.

Q: I’m an existing customer and want to move my data to Europe. How do I do that?
A: At this time, we do not support moving existing data between Backblaze regions. While it is something on our roadmap to support, we do not have an estimated release date for that functionality. However, existing customers can create a new account in the EU Central region, upload data to it, and then either keep or delete their previous account in US West. Customers with multiple accounts can administer them via our Groups feature. For more details on how to do that, please see this Knowledge Base article.

Q: Finally! I’ve been waiting for this and am ready to get started. Can I use your rapid ingest device, the B2 Fireball?
A: Yes! However, as of the publication of this post, all Fireballs will ship back to one of our U.S. facilities for secure upload (regardless of account location). By the end of the year, we hope to offer Fireball support natively in Europe (so a Fireball with a European customer’s data will never leave the EU).

Q: Does this mean that my data will never leave the EU?
A: Any data uploaded by the customer does not leave the region it was uploaded to except at the explicit direction of the customer. For example, restores and snapshots of data stored in Europe can be downloaded directly from Europe. However, customers requesting an encrypted hard drive with their data on it will have that drive prepared from a secure U.S. location. In addition, certain metadata about customer accounts (e.g. the email address for your account) resides in the U.S. For more information on our privacy practices, please read our Privacy Policy.

Q: What are my payment options?
A: All payments to Backblaze are made in U.S. dollars. To get started, you can enter your credit card within your account.

Q: What’s next?
A: We’re actively working on region selection for individual B2 Buckets (instead of Backblaze region selection on an account basis), which should open up a lot more interesting workflows! For example, customers who want to can create geographic redundancy for data within one B2 account (and those who don’t can sleep well knowing they already have 11 nines of durability).

We like to develop the features and functionality that our customers want. The decision to open up a data center in Europe is directly related to customer interest. If you have requests or questions, please feel free to put them in the comment section below.


Backblaze Hard Drive Stats Q2 2019

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hard-drive-stats-q2-2019/

Backblaze Drive Stats Q2 2019
In this report, we review drive models that have been around for several years, take a look at how our 14 TB Toshiba drives are doing (spoiler alert: great), and along the way we’ll provide a handful of insights and observations from inside our storage cloud. As always, we’ll publish the data we use in these reports on our Hard Drive Test Data web page and we look forward to your comments.

Hard Drive Failure Stats for Q2 2019

At the end of Q2 2019, Backblaze was using 108,660 hard drives to store data. For our evaluation we remove from consideration those drives that were used for testing purposes and those drive models for which we did not have at least 60 drives (see why below). This leaves us with 108,461 hard drives. The table below covers what happened in Q2 2019.

Backblaze Q2 2019 Hard Drive Failure Rates

Notes and Observations

If a drive model has a failure rate of 0 percent, it means there were no drive failures of that model during Q2 2019 — lifetime failure rates are later in this report. The two drives listed with zero failures in Q2 were the 4 TB and 14 TB Toshiba models. The Toshiba 4 TB drive doesn’t have a large enough number of drives or drive days to be statistically reliable, but only one drive of that model has failed in the last three years. We’ll dig into the 14 TB Toshiba drive stats a little later in the report.
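A note on how to read the failure rates in these tables: they are annualized, computed from drive days rather than a simple drive count, as we’ve described in past Drive Stats posts. In code:

```python
# Annualized failure rate as described in past Drive Stats posts:
# failures divided by drive-years of service, expressed as a percent.
def annualized_failure_rate(drive_days: float, failures: int) -> float:
    """AFR (%) = failures / (drive_days / 365) * 100"""
    return failures / (drive_days / 365) * 100

# Example with made-up numbers: 250,000 drive days and 9 failures.
print(f"{annualized_failure_rate(250_000, 9):.2f}%")   # 1.31%
```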

There were 199 drives (108,660 minus 108,461) that were not included in the list above because they were used as testing drives or we did not have at least 60 of a given drive model. We now use 60 drives of the same model as the minimum number when we report quarterly, yearly, and lifetime drive statistics as there are 60 drives in all newly deployed Storage Pods — older Storage Pod models had a minimum of 45.

2,000 Backblaze Storage Pods? Almost…

We currently have 1,980 Storage Pods in operation. All are version 5 or version 6, as we recently gave away nearly all of the older Storage Pods to folks who stopped by our Sacramento storage facility. Nearly all, as we have a couple in our Storage Pod museum. There are currently 544 version 5 pods, each containing 45 data drives, and 1,436 version 6 pods, each containing 60 data drives. The next time we add a Backblaze Vault, which consists of 20 Storage Pods, we will have 2,000 Backblaze Storage Pods in operation.

Goodbye Western Digital

In Q2 2019, the last of the Western Digital 6 TB drives were retired from service. The average age of the drives was 50 months. These were the last of our Western Digital branded data drives. When Backblaze was first starting out, the first data drives we deployed en masse were Western Digital Green 1 TB drives. So, it was with a bit of sadness that we watched our Western Digital data drive count go to zero. We hope to see them again in the future.

WD Ultrastar 14 TB DC HC530

Hello “Western Digital”

While the Western Digital brand is gone, the HGST brand (owned by Western Digital) is going strong as we still have plenty of the HGST branded drives, about 20 percent of our farm, ranging in size from 4 to 12 TB. In fact, we added over 4,700 HGST 12 TB drives in this quarter.

This just in: rumor has it there are twenty 14 TB Western Digital Ultrastar drives getting readied for deployment and testing in one of our data centers. It appears Western Digital has returned. Stay tuned.

Goodbye 5 TB Drives

Back in Q1 2015, we deployed 45 Toshiba 5 TB drives. They were the only 5 TB drives we deployed as the manufacturers quickly moved on to larger capacity drives, and so did we. Yet, during their four plus years of deployment only two failed, with no failures since Q2 of 2016 — three years ago. This made it hard to say goodbye, but buying, stocking, and keeping track of a couple of 5 TB spare drives was not optimal, especially since these spares could not be used anywhere else. So yes, the Toshiba 5 TB drives were the odd ducks on our farm, but they were so good they got to stay for over four years.

Hello Again, Toshiba 14 TB Drives

We’ve mentioned the Toshiba 14 TB drives in previous reports, now we can dig in a little deeper given that they have been deployed almost nine months and we have some experience working with them. These drives got off to a bit of a rocky start, with six failures in the first three months of being deployed. Since then, there has been only one additional failure, with no failures reported in Q2 2019. The result is that the lifetime annualized failure rate for the Toshiba 14 TB drives has decreased to a very respectable 0.78% as shown in the lifetime table in the following section.

Lifetime Hard Drive Stats

The table below shows the lifetime failure rates for the hard drive models we had in service as of June 30, 2019. This is over the period beginning in April 2013 and ending June 30, 2019.

Backblaze Lifetime Hard Drive Annualized Failure Rates

The Hard Drive Stats Data

The complete data set used to create the information used in this review is available on our Hard Drive Test Data web page. You can download and use this data for free for your own purposes. All we ask are three things: 1) You cite Backblaze as the source if you use the data, 2) You accept that you are solely responsible for how you use the data, and 3) You do not sell this data to anyone; it is free. Good luck and let us know if you find anything interesting.

If you just want the tables we used to create the charts in this blog post you can download the ZIP file containing the MS Excel spreadsheet.


The Shocking Truth — Managing for Hard Drive Failure and Data Corruption

Post Syndicated from Skip Levens original https://www.backblaze.com/blog/managing-for-hard-drive-failures-data-corruption/

hard disk drive covered in 0s, 1s, ?s

Ah, the iconic 3.5″ hard drive, now approaching a massive 16TB of storage capacity. Backblaze Storage Pods fit 60 of these drives in a single pod, and with well over 750 petabytes of customer data in our data centers, we have a lot of hard drives to look after.

Yet most of us have just one, or only a few of these massive drives at a time storing our most valuable data. Just how safe are those hard drives in your office or studio? Have you ever thought about all the awful, terrible things that can happen to a hard drive? And what are they, exactly?

It turns out there are a host of obvious physical dangers, but also other, less obvious, errors that can affect the data stored on your hard drives, as well.

Dividing by One

It’s tempting to store all of your content on a single hard drive. After all, the capacity of these drives gets larger and larger, and they offer great performance of up to 150 MB/s. It’s true that flash-based drives are far faster, but their dollars-per-gigabyte price is also higher, so for now, the traditional 3.5″ hard drive holds most data.

However, having all of your precious content on a single, spinning hard drive is a true tightrope-without-a-net experience. Here’s why.

Drivesaver Failure Analysis by the Numbers

Drive failures by possible external force

I asked our friends at Drivesavers, specialists in recovering data from drives and other storage devices, for some analysis of the hard drives brought into their labs for recovery. What were the primary causes of failure?

Reason One: Media Damage

The number one reason, accounting for 70 percent of failures, is media damage, including full head crashes.

Modern hard drives stuff multiple ultra-thin platters inside that 3.5 inch metal package. These platters spin furiously at 5400 or 7200 revolutions per minute — that’s 90 or 120 revolutions per second! The heads that read and write magnetic data on them sweep back and forth only 6.3 micrometers above the surface of those platters. That gap is about 1/12th the width of a human hair and a miracle of modern technology to be sure. As you can imagine, a system with such close tolerances is vulnerable to sudden shock, as evidenced by Drivesavers’ results.

This damage occurs when the platters receive shock, i.e. physical damage from impact to the drive itself. Platters have been known to shatter, or have damage to their surfaces, including a phenomenon called head crash, where the flying heads slam into the surface of the platters. Whatever the cause, the thin platters holding 1s and 0s can’t be read.

It takes a surprisingly small amount of force to generate a lot of shock energy to a hard drive. I’ve seen drives fail after simply tipping over when stood on end. More typically, drives are accidentally pushed off of a desktop, or dropped while being carried around.

A drive might look fine after a drop, but the damage may have been done. Due to their rigid construction, heavy weight, and how often they’re dropped on hard, unforgiving surfaces, these drops can easily generate the equivalent of hundreds of g-forces to the delicate internals of a hard drive.

To paraphrase an old (and morbid) parachutist joke, it’s not the fall that gets you, it’s the sudden stop!

Reason Two: PCB Failure

The next largest cause is circuit board failure, accounting for 18 percent of failed drives. Printed circuit boards (PCBs), those tiny green boards seen on the underside of hard drives, can fail in the presence of moisture or static electric discharge like any other circuit board.

Reason Three: Stiction

Next up is stiction (a portmanteau of friction and sticking), which occurs when the armatures that drive those flying heads actually get stuck in place and refuse to operate, usually after a long period of disuse. Drivesavers found that stuck armatures accounted for 11 percent of hard drive failures.

It seems counterintuitive that hard drives sitting quietly in a dark drawer might actually be courting failure, but I’ve seen many older hard drives pulled from a drawer and popped into a drive carrier or connected to power just go thunk. It does appear that hard drives like to be connected to power and constantly spinning, and the numbers seem to bear this out.

Reason Four: Motor Failure

The last and least common cause of hard drive failure is motor failure, accounting for only 1 percent of failures, a testament to modern manufacturing precision and reliability.

Mitigating Hard Drive Failure Risk

So now that you’ve seen the gory numbers, here are a few recommendations to guard against the physical causes of hard drive failure.

1. Have a physical drive handling plan and follow it rigorously

If you must keep content on single hard drives in your location, make sure your team follows a few guidelines to protect against moisture, static electricity, and drops during drive handling. Keeping the drives in a dry location, storing the drives in static bags, using static discharge mats and wristbands, and putting rubber mats under areas where you’re likely to accidentally drop drives can all help.

It’s worth reviewing how you physically store drives, as well. Drivesavers tells us that the sudden impact of a heavy drawer of hard drives being slammed shut or yanked open quickly can be enough to damage the drives inside!

2. Spread failure risk across more drives and systems

Improving physical hard drive handling procedures is only a small part of a good risk-reducing strategy. You can immediately reduce the exposure of a single hard drive failure by simply keeping a copy of that valuable content on another drive. This is a common approach for videographers moving content from cameras shooting in the field back to their editing environment. By simply copying content over from one fast drive to another, you make it far less likely that both copies fail at once. This is certainly better than keeping content on only a single drive, but definitely not a great long-term solution.

Multiple drive NAS and RAID systems reduce the impact of failing drives even further. A RAID 6 system composed of eight drives not only has much faster read and write performance than a single drive, but two of its drives can fail and still serve your files, giving you time to replace those failed drives.
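The arithmetic behind that example, in a short sketch (the drive sizes are assumed for illustration):

```python
# The arithmetic behind the eight-drive RAID 6 example (sizes assumed).
drives, drive_tb, parity = 8, 12, 2   # RAID 6 carries two parity drives

usable_tb = (drives - parity) * drive_tb
print(f"{usable_tb} TB usable of {drives * drive_tb} TB raw")  # 72 of 96 TB
print(f"survives any {parity} simultaneous drive failures")
```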

Mitigating Data Corruption Risk

The Risk of Bit Flips

Beyond physical damage, there’s another threat to the files stored on hard disks: small, silent bit flip errors often called data corruption or bit rot.

Bit rot errors occur when individual bits in a stream of data in files change from one state to another (positive or negative, 0 to 1, and vice versa). These errors can happen to hard drive and flash storage systems at rest, or be introduced as a file is copied from one hard drive to another.

While hard drives automatically correct single-bit flips on the fly, larger bit flips can introduce a number of errors. This can either cause the program accessing them to halt or throw an error, or perhaps worse, lead you to think that the file with the errors is fine!

Bit Flip Errors by the Book

In a landmark study of data failures in large systems, Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you?, Bianca Schroeder and Garth A. Gibson reported that “a large number of the problems attributed to CPU and memory failures were triggered by parity errors, i.e. the number of errors is too large for the embedded error correcting code to correct them.”

Flash drives are not immune either. Bianca Schroeder recently published a similar study of flash drives, Flash Reliability in Production: The Expected and the Unexpected, and found that “…between 20-63% of drives experienced at least one of the (unrecoverable read errors) during the time it was in production. In addition, between 2-6 out of 1,000 drive days were affected.”

“These UREs are almost exclusively due to bit corruptions that ECC cannot correct. If a drive encounters a URE, the stored data cannot be read. This either results in a failed read in the user’s code, or if the drives are in a RAID group that has replication, then the data is read from a different drive.”

Exactly how prevalent bit flips are is a controversial subject, but if you’ve ever retrieved a file from an old hard drive or RAID system and seen sparkles in video, corrupt document files, or lines or distortions in pictures, you’ve seen the results of these errors.

Protecting Against Bit Flip Errors

There are many approaches to catching and correcting bit flip errors. From a system designer standpoint they usually involve some combination of multiple disk storage systems, multiple copies of content, data integrity checks and corrections, including error-correcting code memory, physical component redundancy, and a file system that can tie it all together.

Backblaze has built such a system, and uses a number of techniques to detect and correct file degradation due to bit flips and deliver extremely high data durability and integrity, often in conjunction with Reed-Solomon erasure codes.

Thanks to the way object storage and Backblaze B2 work, files written to B2 are always retrieved exactly as you originally wrote them. If a file ever changes from the time you’ve written it, say, due to bit flip errors, it will either be reproduced from a redundant copy of your file, or even mathematically reconstructed with erasure codes.

So the simplest, and certainly least expensive way to get bit flip protection for the content sitting on your hard drives is to simply have another copy on cloud storage.
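If you want similar detect-and-repair behavior for files on your own drives, the pattern is small enough to sketch. The `download_from_backup` helper below is a hypothetical stand-in for whatever restore mechanism you use (B2, for instance, stores a checksum with every uploaded file that can serve as the known-good value):

```python
# Minimal detect-and-repair sketch for bit rot: hash the local file,
# compare to a known-good checksum, restore from a second copy on
# mismatch. `download_from_backup` is a hypothetical stand-in.
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):   # 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()

def verify_or_repair(path: Path, known_good: str, download_from_backup) -> str:
    if sha256_of(path) == known_good:
        return "ok"
    path.write_bytes(download_from_backup(path.name))      # pull clean copy
    assert sha256_of(path) == known_good, "backup copy is bad too"
    return "repaired"
```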


The Ideal Solution — Performance and Protection

With some thought, you can apply these protection steps to your environment and get the best of both worlds: the performance of your content on fast, local hard drives, and the protection of having a copy on object storage offsite with the ultimate data integrity.


Backblaze Vaults: Zettabyte-Scale Cloud Storage Architecture

Post Syndicated from Brian Beach original https://www.backblaze.com/blog/vault-cloud-storage-architecture/

A lot has changed in the four years since Brian Beach wrote a post announcing Backblaze Vaults, our software architecture for cloud data storage. Just looking at how the major statistics have changed, we now have over 100,000 hard drives in our data centers instead of the 41,000 mentioned in the original post’s video. We have three data centers (soon four) instead of one data center. We’re approaching one exabyte of data stored for our customers (almost seven times the 150 petabytes back then), and we’ve recovered over 41 billion files for our customers, up from the 10 billion in the 2015 post.

In the original post, we discussed having durability of seven nines. Shortly thereafter, it was upped to eight nines. In July of 2018, we took a deep dive into the calculation and found our durability closer to eleven nines (and went into detail on the calculations used to arrive at that number). And, as followers of our Hard Drive Stats reports will be interested in knowing, we’ve just started using our first 16 TB drives, which are twice the size of the biggest drives we used back at the time of this post — then a whopping eight TB.

We’ve updated the details here and there in the text from the original post that was published on our blog on March 11, 2015. We’ve left the original 135 comments intact, although some of them might be non sequiturs after the changes to the post. We trust that you will be able to sort out the old from the new and make sense of what’s changed. If not, please add a comment and we’ll be happy to address your questions.

— Editor

Storage Vaults form the core of Backblaze’s cloud services. Backblaze Vaults are not only incredibly durable, scalable, and performant, but they dramatically improve availability and operability, while still being incredibly cost-efficient at storing data. Back in 2009, we shared the design of the original Storage Pod hardware we developed; here we’ll share the architecture and approach of the cloud storage software that makes up a Backblaze Vault.

Backblaze Vault Architecture for Cloud Storage

The Vault design follows the overriding design principle that Backblaze has always followed: keep it simple. As with the Storage Pods themselves, the new Vault storage software relies on tried and true technologies used in a straightforward way to build a simple, reliable, and inexpensive system.

A Backblaze Vault is the combination of the Backblaze Vault cloud storage software and the Backblaze Storage Pod hardware.

Putting The Intelligence in the Software

Another design principle for Backblaze is to anticipate that all hardware will fail and build intelligence into our cloud storage management software so that customer data is protected from hardware failure. The original Storage Pod systems provided good protection for data and Vaults continue that tradition while adding another layer of protection. In addition to leveraging our low-cost Storage Pods, Vaults take advantage of the cost advantage of consumer-grade hard drives and cleanly handle their common failure modes.

Distributing Data Across 20 Storage Pods

A Backblaze Vault is comprised of 20 Storage Pods, with the data evenly spread across all 20 pods. Each Storage Pod in a given vault has the same number of drives, and the drives are all the same size.

Drives in the same drive position in each of the 20 Storage Pods are grouped together into a storage unit we call a tome. Each file is stored in one tome and is spread out across the tome for reliability and availability.

20 hard drives create 1 tome that shares parts of a file.

Every file uploaded to a Vault is divided into pieces before being stored. Each of those pieces is called a shard. Parity shards are computed to add redundancy, so that a file can be fetched from a vault even if some of the pieces are not available.

Each file is stored as 20 shards: 17 data shards and three parity shards. Because those shards are distributed across 20 Storage Pods, the Vault is resilient to the failure of a Storage Pod.

Files can be written to the Vault when one pod is down and still have two parity shards to protect the data. Even in the extreme and unlikely case where three Storage Pods in a Vault lose power, the files in the vault are still available because they can be reconstructed from the 17 pods that remain available.

Storing Shards

Each of the drives in a Vault has a standard Linux file system, ext4, on it. This is where the shards are stored. There are fancier file systems out there, but we don’t need them for Vaults. All that is needed is a way to write files to disk and read them back. Ext4 is good at handling power failure on a single drive cleanly without losing any files. It’s also good at storing lots of files on a single drive and providing efficient access to them.

Compared to a conventional RAID, we have swapped the layers here by putting the file systems under the replication. Usually, RAID puts the file system on top of the replication, which means that a file system corruption can lose data. With the file system below the replication, a Vault can recover from a file system corruption because a single corrupt file system can lose at most one shard of each file.

Creating Flexible and Optimized Reed-Solomon Erasure Coding

Just like RAID implementations, the Vault software uses Reed-Solomon erasure coding to create the parity shards. But, unlike Linux software RAID, which offers just one or two parity blocks, our Vault software allows for an arbitrary mix of data and parity. We are currently using 17 data shards plus three parity shards, but this could be changed on new vaults in the future with a simple configuration update.

Vault Row of Storage Pods

For Backblaze Vaults, we threw out the Linux RAID software we had been using and wrote a Reed-Solomon implementation from scratch, which we wrote about in Backblaze Open Sources Reed-Solomon Erasure Coding Source Code. It was exciting to be able to use our group theory and matrix algebra from college.

The beauty of Reed-Solomon is that we can then re-create the original file from any 17 of the shards. If one of the original data shards is unavailable, it can be re-computed from the other 16 original shards, plus one of the parity shards. Even if three of the original data shards are not available, they can be re-created from the other 17 data and parity shards. Matrix algebra is awesome!
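For readers who want to play with the idea, here is a toy 17+3 encoder using the polynomial view of Reed-Solomon over a prime field. It is a sketch for intuition only; the implementation we open-sourced works with matrix arithmetic over GF(2^8) and is far faster:

```python
# Toy 17+3 Reed-Solomon in the polynomial view: data shards are the values
# of a degree-16 polynomial at x = 1..17, parity shards its values at
# x = 18..20, and any 17 points determine the polynomial. For intuition
# only -- production code works in GF(2^8).
P = 2**31 - 1                      # prime modulus; all arithmetic is mod P
DATA, TOTAL = 17, 20

def interpolate(points, x):
    """Lagrange-evaluate the unique polynomial through `points` at x."""
    result = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        result = (result + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse
    return result

def encode(data):                  # data: 17 integers in [0, P)
    points = list(zip(range(1, DATA + 1), data))
    parity = [(x, interpolate(points, x)) for x in range(DATA + 1, TOTAL + 1)]
    return points + parity

def recover(surviving, lost_x):    # any 17 surviving shards recover a lost one
    return interpolate(surviving[:DATA], lost_x)

shards = encode(list(range(100, 117)))         # 17 data + 3 parity shards
assert recover(shards[3:], 1) == shards[0][1]  # lose 3, restore shard at x=1
```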

Handling Drive Failures

The reason for distributing the data across multiple Storage Pods and using erasure coding to compute parity is to keep the data safe and available. How are different failures handled?

If a disk drive just up and dies, refusing to read or write any data, the Vault will continue to work. Data can be written to the other 19 drives in the tome, because the policy setting allows files to be written as long as there are two parity shards. All of the files that were on the dead drive are still available and can be read from the other 19 drives in the tome.

Building a Backblaze Vault Storage Pod

When a dead drive is replaced, the Vault software will automatically populate the new drive with the shards that should be there; they can be recomputed from the contents of the other 19 drives.

A Vault can lose up to three drives in the same tome at the same moment without losing any data, and the contents of the drives will be re-created when the drives are replaced.

Handling Data Corruption

Disk drives try hard to correctly return the data stored on them, but once in a while they return the wrong data, or are just unable to read a given sector.

Every shard stored in a Vault has a checksum, so that the software can tell if it has been corrupted. When that happens, the bad shard is recomputed from the other shards and then re-written to disk. Similarly, if a shard just can’t be read from a drive, it is recomputed and re-written.

Conventional RAID can reconstruct a drive that dies, but does not deal well with corrupted data because it doesn’t checksum the data.

Scaling Horizontally

Each Vault is assigned a number. We designed the numbering scheme to allow thousands of Vaults to be deployed per location, and designed the management software to handle scaling to that level across the Backblaze data centers.

The overall design scales very well because file uploads (and downloads) go straight to a vault, without having to go through a central point that could become a bottleneck.

There is an authority server that assigns incoming files to specific Vaults. Once that assignment has been made, the client then uploads data directly to the Vault. As the data center scales out and adds more Vaults, the capacity to handle incoming traffic keeps going up. This is horizontal scaling at its best.
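Sketched as code, the flow looks something like the following; every type and method here is hypothetical, since the authority server's interface is internal:

```java
/** Hypothetical sketch of the two-step upload flow described above;
 *  none of these types are Backblaze's actual API. */
interface AuthorityServer {
    /** Small metadata call: which Vault should store this file? */
    int assignVault(String fileName);
}

interface VaultClient {
    /** The bulk data goes straight to the assigned Vault. */
    void upload(int vaultId, String fileName, byte[] contents);
}

class UploadFlow {
    static void store(AuthorityServer authority, VaultClient vaults,
                      String fileName, byte[] contents) {
        int vaultId = authority.assignVault(fileName); // cheap, tiny request
        vaults.upload(vaultId, fileName, contents);    // no central data path
    }
}
```

Because the authority server only hands out assignments and never carries file contents, it stays inexpensive to run even as the number of Vaults grows.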

We could deploy a new data center with 10,000 Vaults holding 16TB drives and it could accept uploads fast enough to reach its full capacity of 160 exabytes in about two months!
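A quick back-of-the-envelope check of that figure, assuming each Vault sustains the 20 Gbps described in the performance section below:

$$
10{,}000 \text{ Vaults} \times 20\ \text{Gbps} = 2 \times 10^{14}\ \text{bits/s},
\qquad
\frac{160 \times 10^{18}\ \text{bytes} \times 8\ \text{bits/byte}}{2 \times 10^{14}\ \text{bits/s}} \approx 6.4 \times 10^{6}\ \text{s} \approx 74\ \text{days}.
$$

That works out to roughly two and a half months of sustained ingest, in the same ballpark as the estimate above.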

Backblaze Vault Benefits

The Backblaze Vault architecture has six benefits:

1. Extremely Durable

The Vault architecture is designed for 99.999999% (eight nines) annual durability (now 11 nines — Editor). At cloud-scale, you have to assume hard drives die on a regular basis, and we replace about 10 drives every day. We have published a variety of articles sharing our hard drive failure rates.

The beauty of Vaults is that the software protects not only against hard drive failures, but also against the loss of entire Storage Pods or even entire racks. A single Vault can have three Storage Pods (a full 180 hard drives) die at the exact same moment without a single byte of data being lost or even becoming unavailable.

2. Infinitely Scalable

A Backblaze Vault comprises 20 Storage Pods, each with 60 disk drives, for a total of 1,200 drives. Depending on the size of the hard drives, each Vault will hold:

12TB hard drives => 12.1 petabytes/vault (Deploying today.)
14TB hard drives => 14.2 petabytes/vault (Deploying today.)
16TB hard drives => 16.2 petabytes/vault (Small-scale testing.)
18TB hard drives => 18.2 petabytes/vault (Announced by WD & Toshiba.)
20TB hard drives => 20.2 petabytes/vault (Announced by Seagate.)

Backblaze Data Center

At our current growth rate, Backblaze deploys one to three Vaults each month. As the growth rate increases, the deployment rate will also increase. We can incrementally add more storage by adding more and more Vaults. Without changing a line of code, the current implementation supports deploying 10,000 Vaults per location. That's 160 exabytes of data in each location. The implementation also supports up to 1,000 locations, which enables storing a total of 160 zettabytes! (Also known as 160,000,000,000,000 GB.)

3. Always Available

Data backups have always been highly available: if a Storage Pod was in maintenance, the Backblaze online backup application would contact another Storage Pod to store data. Previously, however, if a Storage Pod was unavailable, some restores would pause. For large restores this was not an issue since the software would simply skip the Storage Pod that was unavailable, prepare the rest of the restore, and come back later. However, for individual file restores and remote access via the Backblaze iPhone and Android apps, it became increasingly important to have all data be highly available at all times.

The Backblaze Vault architecture enables both data backups and restores to be highly available.

With the Vault arrangement of 17 data shards plus three parity shards for each file, all of the data is available as long as 17 of the 20 Storage Pods in the Vault are available. This keeps the data available while allowing for normal maintenance and rare expected failures.

4. Highly Performant

The original Backblaze Storage Pods could individually accept 950 Mbps (megabits per second) of data for storage.

The new Vault pods have more overhead, because they must break each file into pieces, distribute the pieces across the local network to the other Storage Pods in the vault, and then write them to disk. In spite of this extra overhead, the Vault is able to achieve 1,000 Mbps of data arriving at each of the 20 pods.

Backblaze Vault Networking

Handling this volume required a new type of Storage Pod. The net of it all: a single Vault can accept a whopping 20 Gbps of data (20 pods × 1,000 Mbps).

Because there is no central bottleneck, adding more Vaults linearly adds more bandwidth.

5. Operationally Easier

When Backblaze launched in 2008 with a single Storage Pod, many of the operational analyses (e.g. how to balance load) could be done on a simple spreadsheet and manual tasks (e.g. swapping a hard drive) could be done by a single person. As Backblaze grew to nearly 1,000 Storage Pods and over 40,000 hard drives, the systems we developed to streamline and operationalize the cloud storage became more and more advanced. However, because our system relied on Linux RAID, there were certain things we simply could not control.

With the new Vault software, we have direct access to all of the drives and can monitor their individual performance and any indications of upcoming failure. And, when those indications say that maintenance is needed, we can shut down one of the pods in the Vault without interrupting any service.

6. Astoundingly Cost Efficient

Even with all of these wonderful benefits that Backblaze Vaults provide, if they raised costs significantly, it would be nearly impossible for us to deploy them since we are committed to keeping our online backup service affordable for completely unlimited data. However, the Vault architecture is nearly cost neutral while providing all these benefits.

Backblaze Vault Cloud Storage

When we were running on Linux RAID, we used RAID6 over 15 drives: 13 data drives plus two parity. That’s 15.4% storage overhead for parity.

With Backblaze Vaults, we wanted to be able to do maintenance on one pod in a vault and still have it be fully available, both for reading and writing. And, for safety, we weren’t willing to have fewer than two parity shards for every file uploaded. Using 17 data plus three parity drives raises the storage overhead just a little bit, to 17.6%, but still gives us two parity drives even in the infrequent times when one of the pods is in maintenance. In the normal case when all 20 pods in the Vault are running, we have three parity drives, which adds even more reliability.
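For reference, the overhead figures come from measuring parity relative to data:

$$
\text{RAID6 (13 data + 2 parity)}: \quad \frac{2}{13} \approx 15.4\%,
\qquad
\text{Vault (17 data + 3 parity)}: \quad \frac{3}{17} \approx 17.6\%.
$$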

Summary

Backblaze’s cloud storage Vaults deliver 99.999999% (eight nines) annual durability (now 11 nines — Editor), horizontal scalability, and 20 Gbps of per-Vault performance, while being operationally efficient and extremely cost effective. Driven from the same mindset that we brought to the storage market with Backblaze Storage Pods, Backblaze Vaults continue our singular focus of building the most cost-efficient cloud storage available anywhere.

•  •  •

Note: This post was updated from the original version posted on March 11, 2015.

The post Backblaze Vaults: Zettabyte-Scale Cloud Storage Architecture appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Connect Veeam to the B2 Cloud: Episode 4 — Using Morro Data CloudNAS

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/connect-veeam-to-the-b2-cloud-episode-4-using-morro-data-cloudnas/

Veeam backup to Backblaze B2 Episode 4 of Series

In the fourth post in our series on connecting Veeam with B2, we provide a guide on how to back up your VMs to Backblaze B2 using Veeam and Morro Data’s CloudNAS. In our previous posts, we covered how to connect Veeam to the B2 cloud using OpenDedupe, connect Veeam to the B2 cloud using Synology, and connect Veeam with B2 using StarWind VTL.

VM Backup to B2 Using Veeam Backup & Replication and Morro Data CloudNAS

We are glad to show how Veeam Backup & Replication can work with Morro Data CloudNAS to keep the more recent backups on premises for fast recovery while archiving all backups in B2 Cloud Storage. CloudNAS not only caches the more recent backup files, but also simplifies the management of B2 Cloud Storage with a network share or drive letter interface.

–Paul Tien, Founder & CEO, Morro Data

VM backup and recovery is a critical part of IT operations that supports business continuity. Traditionally, IT has deployed an array of purpose-built backup appliances and applications to protect against server, infrastructure, and security failures. As VMs continue to spread in production, development, and verification environments, the expanding VM backup repository has become a major challenge for system administrators.

Because the VM backup footprint is usually quite large, cloud storage is increasingly being deployed for VM backup. However, cloud storage does not achieve the same performance level as on-premises storage for recovery operations. For this reason, cloud storage has been used as a tiered repository behind on-premises storage.

diagram of Veeam backing up to B2 using Cloudflare and Morro Data CloudNAS

In this best practice guide, VM Backup to B2 Using Veeam Backup & Replication and Morro Data CloudNAS, we will show how Veeam Backup & Replication can work with Morro Data CloudNAS to keep the most recent backups on premises for fast recovery while archiving all backups in the retention window in Backblaze B2 cloud storage. CloudNAS caching not only provides a buffer for the most recent backup files, but also simplifies managing on-premises and cloud storage as a single, integrated backup repository.

Tell Us How You’re Backing Up Your VMs

If you’re backing up VMs to B2 using one of the solutions we’ve written about in this series, we’d like to hear from you in the comments about how it’s going.

View all posts in the Veeam series.

The post Connect Veeam to the B2 Cloud: Episode 4 — Using Morro Data CloudNAS appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Survey Says: Cloud Storage Makes Strong Gains for Media & Entertainment

Post Syndicated from Janet Lafleur original https://www.backblaze.com/blog/cloud-storage-makes-strong-gains-in-media-entertainment/

Survey Reveals Growing Adoption of Cloud Storage by Media & Entertainment

Where Does the Media Industry Really Use Cloud Storage?

Our new cloud survey results might surprise you.

Predicting which promising new technologies will be adopted quickly, which ones will take longer, and which ones will fade away is not always easy. When the iPhone was introduced in 2007, only 6% of the US population had smartphones. In less than 10 years, over 80% of Americans owned smartphones. In contrast, video telephone calls demonstrated at the 1964 New York World’s Fair only became commonplace 45 years later with the advent of FaceTime. And those flying cars people have dreamed of since the 1950s? Don’t hold your breath.

What about cloud storage? Who is adopting it today and for what purposes?

“While M&E professionals are not abandoning existing storage alternatives, they increasingly see the public cloud in storage applications as simply another professional tool to achieve their production, distribution, and archiving goals. For the future, that trend looks to continue as the public cloud takes on an even greater share of their overall storage requirements.”

— Phil Kurz, contributing editor, TV Technology

At Backblaze, we have a front-line view of how customers use cloud for storage. And based on the media-oriented customers we’ve directly worked with to integrate cloud storage, we know they’re using cloud storage throughout the workflow: backing up files during content creation (UCSC Silicon Valley), managing production storage more efficiently (WunderVu), archiving of historical content libraries (Austin City Limits), hosting media files for download (American Public Television), and even editing cloud-based video (Everwell).

We wanted to understand more about how the broader industry uses cloud storage and their beliefs and concerns about it, so we could better serve the needs of our current customers and anticipate what their needs will be in the future.

We decided to sponsor an in-depth survey with TV Technology, a media company that has been an authority on news, analysis, and trend reports for the media and entertainment industries for over 30 years. TV Technology had conducted a similar survey in 2015, so we thought it would be interesting to see how the industry outlook has evolved since then. Based on our 2019 results, it certainly has. As a quick example, security was a concern for 71% of respondents in 2015. This year, only 38% selected security as an issue at all.

Survey Methodology — 246 Respondents and 15 Detailed Questions

For the survey, TV Technology queried 246 respondents, primarily from production and post-production studios and broadcasters, but also other market segments including corporate video, government, and education. See chart below for the breakdown. Respondents were asked 15 questions about their cloud storage usage today and in the future, and for what purpose. The survey queried what motivated their move to the cloud, their expectations for access times and cost, and any obstacles that are preventing further cloud adoption.

Types of businesses responding to survey

Survey Insights — Half Use Public Cloud Today — Cloud the Top Choice for Archive

Overall, the survey reveals growing cloud adoption for media organizations who want to improve production efficiency and to reduce costs. Key findings from the report include:

  • On the whole, about half of the respondents from all organization types are using public cloud services. Sixty-four percent of production/post studio respondents say they currently use the cloud. Broadcasters report lower adoption, with only 26 percent using the public cloud.
  • Achieving greater efficiency in production was cited by all respondents as the top reason for adopting the cloud. However, while this is also important to broadcasters, their top motivator for cloud use is cost containment or internal savings programs.
  • Cloud storage is clearly the top choice for archiving media assets, with 70 percent choosing the public cloud for active, deep, or very deep archive needs.
  • Concerns over the security of assets stored in a public cloud remain; however, they have eased considerably since the 2015 report, so much so that security is no longer the top obstacle to cloud adoption. For 40% of respondents, pricing has replaced security as the top concern.

These insights only scratch the surface of the survey's findings, so we're making the full 12-page report available to everyone. To get a deeper look and compare your experiences with those of your peers as a content creator or content owner, download and read Cloud Storage Technologies Establish Their Place Among Alternatives for Media today.

How are you using cloud storage today? How do you think that will change three years from now? Please tell us in the comments.

The post Survey Says: Cloud Storage Makes Strong Gains for Media & Entertainment appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.