Everything that makes working at a creative agency exciting also makes it challenging. With each new client, creative teams are working on something different. One day they’re on site, shooting a video for a local business, the next they’re sifting through last year’s concert footage for highlights to promote this year’s event. When their juices are flowing, it’s as easy for them to lose track of the files they need as it is for them to lose track of time.
If you’re tasked with making sure a team’s content is protected every day, as well as ensuring that it’s organized and saved for the future, we have some tips to make your job easier. Because we know you’d rather be working on your own projects, not babysitting backups or fetching years-old content from a dusty archive closet.
Since we’re sure you’re not making obvious mistakes—like expecting creatives to manually archive their own content, or not having a 3-2-1 backup strategy—we’ll focus on the not-so-obvious tips. Many of these come straight from our own creative agency customers who learned the hard way, before they rolled out a cloud-based backup and archive solution.
Tip #1—Save everything when a client’s project is completed
For successful creative agencies, there’s no such thing as “former” clients, only clients that you haven’t worked with lately. That means your job managing client data isn’t over when the project is delivered. You need to properly archive everything: not just the finished videos, images, or layouts, but all the individual assets created for the project and all the raw footage.
It’s not unusual for clients to request raw footage, even years after the project is complete. If you only saved master copies and can’t send them all of their source footage, your client may question how you manage their content, which could impact their trust in you for future projects.
The good news is that if you have an organized, accessible content archive, it’s easy to send a drive or even a download link to a client. It may even be possible for you to charge clients to retrieve and deliver their content to them.
Tip #2—Stop using external drives for backup or archive
If your agency uses external disk drives to back up or archive your projects, you’re not alone. Creative teams do it because it’s dead simple: you plug the drive in, copy project files to it, unplug the drive, and put it on a shelf or in a drawer. But there are some big problems with this.
First, since external drives are removable, they’re easily misplaced. It’s not unusual for someone to take a drive offsite to work on a project and forget to return it. Second, removable drives can fail over time after being damaged by physical impacts, water, magnetic fields, or even “bit rot” from just sitting on a shelf. Finally, locating client files in a stack of drives can be like finding a needle in a haystack, especially if the editor who worked on the project has left the agency.
Tip #3—Organize your archive for self-service access
Oh, the frustration of knowing you already have a clip that would be perfect for a new project, but… who knows where it is? With the right tools in place, a producer’s frustration doesn’t mean you’ll have to drop everything and join their search party. Even if you’re not sure you need a full-featured media asset management (MAM) system, your time would be well spent finding a solution that allows creatives to search and retrieve files from the archive on their own.
Look for software that lets them browse through thumbnails and proxies instead of file names, and allows them to search based on metadata. Your archive storage shouldn’t force you to be on site and instantly available to load LTO tapes and retrieve those clips the editor absolutely and positively has to have today.
Tip #4—Schedule regular tests for backup restores and archive retrievals
When you first set up your backup system, we’re sure you checked that the backups were firing off on schedule and tested restoring files and folders. But have you done it lately? Since the last time you checked, any number of things could have changed that would break your backups.
Maybe you added another file share that wasn’t included in the initial setup. Perhaps your backup storage has reached capacity. Maybe an operating system upgrade on a workstation is incompatible with your backup software. Perhaps the automated bill payment for a backup vendor failed. Bad things can happen when you’re not looking, so it’s smart to schedule time at least once a month to test your backups and restores. Ditto for testing your archives.
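A low-effort way to make that monthly check concrete is to script it: restore a sample file and confirm it’s byte-identical to the original. The sketch below is our own illustration, not any particular backup tool’s API; the file paths are placeholders, and in practice the “restored” file would come from your backup software’s restore operation rather than a plain copy.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_restore(original: Path, restored: Path) -> bool:
    """A restore test passes only if the restored file matches exactly."""
    return sha256_of(original) == sha256_of(restored)

# Simulated monthly test using a scratch directory. In real use,
# `restored` comes out of your backup tool, not shutil.copy.
tmp = Path(tempfile.mkdtemp())
original = tmp / "project.mov"
original.write_bytes(b"raw footage bytes")
restored = tmp / "project_restored.mov"
shutil.copy(original, restored)
print(verify_restore(original, restored))  # True when the restore is intact
```

If the check ever prints False, that’s your early warning, well before a client asks for their footage.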
Tip #5—Plan for long-term archive media refresh
If your agency has been in business more than a handful of years, you probably have content stored on media that’s past its expiration date. (Raise your hand if you still have client content stored on Betacam.) Drive failures increase significantly after 4 years (see our data center’s latest hard drive stats), and tape starts to degrade around 15 years. Even if the media is intact, file formats and other technologies can become obsolete quicker than you can say LTO-8. The only way to ensure access to archived content is to migrate it to newer media and/or technologies. This unglamorous task sounds simple—reading the data off the old media and copying it to new media—but the devil is in the details.
Of course, if you back up or archive to Backblaze B2 cloud storage, we’ll migrate your data to newer disk drives for you as needed over time. It all happens behind the scenes, so you don’t ever need to think about it. And it’s included free with our service.
Want to see how all these tips work together? Join our live webinar co-hosted with Archiware on Tuesday, December 10, and we’ll show you how Baron & Baron, the agency behind the world’s top luxury brands from Armani to Zara, solved their backup and archive challenges.
As of September 30, 2019, Backblaze had 115,151 spinning hard drives spread across four data centers on two continents. Of that number, there were 2,098 boot drives and 113,053 data drives. We’ll look at the lifetime hard drive failure rates of the data drive models currently in operation in our data centers, but first we’ll cover the events that occurred in Q3 that potentially affected the drive stats for that period. As always, we’ll publish the data we use in these reports on our Hard Drive Test Data web page and we look forward to your comments.
Hard Drive Stats for Q3 2019
At this point in prior hard drive stats reports, we would reveal the quarterly hard drive stats table. This time we are only going to present the Lifetime Hard Drive Failure table, which you can see if you jump to the end of this report. The data we typically use to create the Q3 table may have been indirectly affected by one of our utility programs that performs data integrity checks. While we don’t believe the long-term data is impacted, we felt you should know. Below, we dig into the particulars to explain what happened in Q3 and what we think it all means.
What is a Drive Failure?
Over the years we have stated that a drive failure occurs when a drive stops spinning, won’t stay as a member of a RAID array, or demonstrates continuous degradation over time as informed by SMART stats and other system checks. For example, a drive that reports a rapidly increasing or egregious number of media read errors is a candidate for being replaced as a failed drive. These types of errors are usually seen in the SMART stats we record as non-zero values for SMART 197 and 198 which log the discovery and correctability of bad disk sectors, typically due to media errors. We monitor other SMART stats as well, but these two are the most relevant to this discussion.
What might not be obvious is that some SMART attributes change only when specific actions occur. Using SMART 197 and 198 as examples again, these values are affected only when a read or write operation hits a disk sector whose media is damaged or otherwise won’t allow the operation. In short, SMART stats 197 and 198 that have a value of zero today will not change unless a bad sector is encountered during normal disk operations. These two SMART stats don’t cause reads and writes to occur; they only log aberrant behavior from those operations.
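To make the 197/198 discussion concrete, here’s a small sketch of pulling those two raw values out of a `smartctl -A`-style attribute table. The sample output and the “any non-zero value is a candidate for investigation” logic are our own illustration, not Backblaze’s internal monitoring code; the numbers are made up.

```python
# Sample text mimicking the attribute table printed by `smartctl -A`.
# The values here are fabricated for illustration.
SAMPLE_SMARTCTL_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail 0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age  8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age  8
"""

def pending_sector_stats(smart_table: str) -> dict:
    """Return the raw values for SMART attributes 197 and 198."""
    stats = {}
    for line in smart_table.splitlines():
        fields = line.split()
        if fields and fields[0] in ("197", "198"):
            stats[int(fields[0])] = int(fields[-1])
    return stats

stats = pending_sector_stats(SAMPLE_SMARTCTL_OUTPUT)
# A non-zero 197/198 raw value means bad sectors were hit during normal
# reads or writes -- one signal that a drive is a replacement candidate.
print(stats)                # {197: 8, 198: 8}
print(any(stats.values()))  # True -> investigate this drive
```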
Protecting Stored Data
When a file, or group of files, arrives at a Backblaze data center, the file is divided into pieces we call shards. For more information on how shards are created and used in the Backblaze architecture, please refer to the Backblaze Vault and Backblaze Erasure Coding blog posts. For simplicity’s sake, let’s say a shard is a blob of data that resides on a disk in our system.
As each shard is stored on a hard drive, we create and store a one-way hash of the contents. For reasons ranging from media damage to bit rot to gamma rays, we check the integrity of these shards regularly by recomputing the hash and comparing it to the stored value. To recompute the shard hash value, a utility known as a shard integrity check reads the data in the shard. If there is an inconsistency between the newly computed and the stored hash values, we rebuild the shard using the other shards as described in the Backblaze Vault blog post.
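The store-hash, recompute, compare loop can be sketched in a few lines. This is a simplified illustration: we’re using SHA-256 as a stand-in, since the specific hash function isn’t named here, and a real shard lives on disk rather than in a variable.

```python
import hashlib

def shard_hash(shard: bytes) -> str:
    """One-way hash of a shard's contents (SHA-256 as a stand-in)."""
    return hashlib.sha256(shard).hexdigest()

# Store the hash alongside the shard when it's written...
shard = b"a blob of data that resides on a disk"
stored_hash = shard_hash(shard)

# ...then the shard integrity check re-reads the shard and recomputes it.
def integrity_check(shard_on_disk: bytes, stored: str) -> bool:
    """True if the shard still matches; False triggers a rebuild
    from the other shards in the vault."""
    return shard_hash(shard_on_disk) == stored

print(integrity_check(shard, stored_hash))                  # intact shard
print(integrity_check(b"bit-rotted " + shard, stored_hash)) # flags corruption
```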
Shard Integrity Checks
The shard integrity check utility runs as a utility task on each Storage Pod. In late June, we decided to increase the rate of the shard integrity checks across the data farm to cause the checks to run as often as possible on a given drive while still maintaining the drive’s performance. We increased the frequency of the shard integrity checks to account for the growing number of larger-capacity drives that had been deployed recently.
The Consequences for Drive Stats
Once we write data to a disk, that section of disk remains untouched until the data is read by the user, the data is read by the shard integrity check process to recompute the hash, or the data is deleted and written over. As a consequence, there are no updates regarding that section of disk sent to SMART stats until one of those three actions occurs. By speeding up the frequency of the shard integrity checks on a disk, the disk is read more often. Errors discovered during the read operation of the shard integrity check utility are captured by the appropriate SMART attributes. Putting together the pieces, a problem that would have been discovered in the future—under our previous shard integrity check cadence—would now be captured by the SMART stats when the process reads that section of disk today.
By increasing the shard integrity check rate, we potentially moved failures that were going to be found in the future into Q3. While discovering potential problems earlier is a good thing, it is possible that the hard drive failures recorded in Q3 could then be artificially high as future failures were dragged forward into the quarter. Given that our Annualized Failure Rate calculation is based on Drive Days and Drive Failures, potentially moving up some number of failures into Q3 could cause an artificial spike in the Q3 Annualized Failure Rates. This is what we will be monitoring over the coming quarters.
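The arithmetic behind that concern is straightforward. In the commonly cited form, AFR is failures per drive-year of operation, expressed as a percentage; the exact numbers below are made up to show the effect, not taken from our data.

```python
def annualized_failure_rate(drive_failures: int, drive_days: int) -> float:
    """AFR (%) = failures per drive-year of operation, times 100."""
    drive_years = drive_days / 365.0
    return 100.0 * drive_failures / drive_years

# 182,500 drive days is exactly 500 drive-years. Pulling four "future"
# failures forward into the quarter raises the numerator while drive
# days stay roughly fixed, inflating the quarterly AFR.
print(annualized_failure_rate(10, 182_500))  # 2.0
print(annualized_failure_rate(14, 182_500))  # 2.8
```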
There are a couple of things to note as we consider the effect of the accelerated shard integrity checks on the Q3 data for Drive Stats:
The number of drive failures over the lifetime of a given drive model should not increase. At best we just moved the failures around a bit.
It is possible that the shard integrity checks did nothing to increase the number of drive failures that occurred in Q3. The quarterly failure rates didn’t vary wildly from previous quarters, but we didn’t feel comfortable publishing them at this time given the discussion above.
Lifetime Hard Drive Stats through Q3 2019
Below are the lifetime failure rates for all of our drive models in service as of September 30, 2019.
The lifetime failure rate for the drive models in production rose slightly, from 1.70% at the end of Q2 to 1.73% at the end of Q3. This trivial increase would seem to indicate that the effect of the potential Q3 data issue noted above is minimal and well within normal variation. However, we’re not yet satisfied that’s true, and as we’ll see in the next section, we have a plan for making sure.
What’s Next for Drive Stats?
We will continue to publish our Hard Drive Stats each quarter, and next quarter we expect to include the quarterly (Q4) chart as well. For the foreseeable future, we will have a little extra work to do internally as we will be tracking two different groups of drives. One group will be the drives that “went through the wormhole,” so to speak, as they were present during the accelerated shard integrity checks. The other group will be those drives that were placed into production after the shard integrity check setting was reduced. We’ll compare these two datasets to see if there was indeed any effect of the increased shard integrity checks on the Q3 hard drive failure rates. We’ll let you know what we find in subsequent drive stats reports.
The Hard Drive Stats Data
The complete data set used to create this review is available on our Hard Drive Test Data web page. You can download and use this data for free for your own purposes. All we ask are three things: 1) You cite Backblaze as the source if you use the data, 2) You accept that you are solely responsible for how you use the data, and 3) You do not sell this data to anyone; it is free. Good luck and let us know what you find.
As always, we look forward to your thoughts and questions in the comments.
November 8th marks the celebration of National STEM Day, calling attention to the importance of Science, Technology, Engineering, and Math (STEM) in young people’s education. Without these programs encouraging young minds to enter STEM fields, we would be hard-pressed to find the Backblaze staffers of the future. As such, a day like this is something we have to celebrate here in our office.
Many of our teammates missed out on the educational initiatives around tech that exist today, but they did not let age, gender, or socioeconomic status stop them from reaching the top of their respective fields. So in honor of the day, we decided to share some of their STEM-related stories with an eye toward inspiring you. Whether they rouse you to dig into your own mid-career shift, or to encourage your kids to consider STEM, or to send an application our way, we hope these stories add to your understanding of how helpful STEM can be in any life.
The STEMs of Backblaze
From Crash Bandicoot to Front End Developer
Steven, a front end developer at Backblaze, started out as a bellman at the Greenwich Hotel in New York City, but his love for digital media—beginning with games like Naughty Dog’s work of art, “Crash Bandicoot”—encouraged him to sign up for a web development boot camp called Bloc. After six months, Steven launched into the workforce with one mission: to make the world a better place, one product at a time.
“That love for video games became a love for technology and software,” said Steven, reflecting on his early years in tech. “As I grew older, played more games, discovered the joys of dial-up internet and witnessed cable TV move into the high definition era, my interest in what powered all of these sources of entertainment was piqued.”
Taking the Leap to Director of Engineering
Not everyone is as confident as Steven, though. Would you just walk into a coding boot camp with no experience? Intimidation can be a big reason why someone shies away from working in a STEM-related field. Our Vice President of Engineering, Tina, believes that you should not shy away from intimidation. She believes that embracing the jump into the unknown will help you find the fun in your field of work.
“Never think that you are not good enough or smart enough to learn anything,” said Tina, when asked what advice she would give to young engineers. “I feel like there is a misconception that engineering is boring or for geeks or whatever and that’s not true at all. And also, just because it is a male dominated industry doesn’t mean other women shouldn’t just go for it. It’s actually really fun once you start doing it. Essentially, just don’t let the intimidation get to you. Anyone can learn it.”
And she speaks from experience: Tina was the only woman in her Electronic and Computer Engineering classes at California State Polytechnic University-Pomona, but now she’s running Backblaze’s engineering efforts. Without her, our engineering team would not run as smoothly and efficiently, or deliver projects like extended version history on time!
From Microbiology to Backblaze Sales Operations and Enablement Manager
Tina makes a good point that most people tend to miss: the difference between the impossible and something that just makes you uncomfortable. Sona, our Sales Operations and Enablement Manager, spent her undergrad years preparing to attend med school. But when it came time to apply for post-grad programs or jobs, she realized that her biological sciences degree qualified her for a range of positions beyond the medical field.
“Do something that makes you uncomfortable,” said Sona, reflecting on what she would tell her younger self when starting out. “Because if you fail you fail, but if you don’t fail you might be opening up a door to something that you didn’t even know you wanted to do.” Embracing her discomfort allowed Sona to explore other careers, like working as a microbiologist at Shasta Beverages, Inc. and, now, being our go-to person for implementing software to supercharge our sales efforts here at Backblaze.
Experimenting with Technical Operations
This is true for our Director of Technical Operations, Chris, who started as a system administrator after stepping away from the college path because he felt more comfortable with a hands-on learning experience. Chris had always tinkered with computers and software during his summer breaks from school so he felt confident that this field was the right place for him.
“Try things. Experiment. The sooner you get plugged into a social network of some kind in your field, like a group or a club or an open-source project, the sooner the world will open up for you,” said Chris. “Because the people you meet will come from all different walks of life, you will see all the different things they do, and some of them might be interesting and different from what you initially thought of doing.”
Whether it’s toying with computer motherboards like Chris did, stargazing, app building, or even setting up your shot in basketball, chances are that your hobbies have roots in STEM. The average person indulges in these related activities without even knowing it.
Art, Dance, and Programming Sequences
Amanda is our Senior Accountant and her father, Brian, is our Distinguished Engineer. When she was younger her hobbies were art, dance, and… learning about programming sequences.
“I feel like [doing math] came very naturally,” said Amanda. “I grew up in a very math-friendly household. My dad started teaching me algebra really early. [My parents] were excited about it—they both did software engineering and programming. They would, ‘just for fun,’ teach me programming problems and I would be asking them about matrices and things like that.”
Of course, we don’t all have brilliant engineers for parents, but that doesn’t mean that Amanda’s experience isn’t useful. STEM programming is all about giving young people exposure to science, technology, engineering, and math in ways they can relate to. With Amanda, that was programming problems. But school systems all over have adapted STEM programs to their curriculum to build enthusiasm for kids who like to tinker with robots or treat math problems like a competition. You can even bring it home for your own family. The trick is to find fun, science-related activities that help kids continue to expand their excitement for these fields.
It might seem hokey, but it can be fun. For instance, at Backblaze, we still do STEM style experiments, like the day we used bubbles and a fog machine to test the air flow in the office (as seen in the video below).
Do you have an interesting story of how you came to be in the STEM field of work? Or resources for how you passed your enthusiasm on? Share them in the comments below!
Editor’s Note: At Backblaze, the entrepreneurial spirit is in our DNA: Our founders started out in a cramped Palo Alto apartment, worked two years without pay, and bootstrapped their way to the stable, thriving startup we are today. We’ve even written a series about some of the lessons we learned along the way.
But Backblaze is a sample size of one, so we periodically reach out to other experts to expand our startup test cases and offer more insights for you. In that regard, we’re happy to invite Lars Lofgren to the blog.
As the CEO of Quick Sprout—a business focused on helping founders launch and optimize their businesses—Lars is a wellspring of case studies on how businesses both do and do not succeed. We asked Lars for advice on one subject near and dear to our hearts: Business Backup. He’s boiled down his learnings into a quick guide for backing up your business today. We hope you find Lars’ guidance and stories useful—if you have any tips or experience with business backup please share them in the comments below.
How to Make Your Business Unbreakable
by Lars Lofgren, CEO, Quick Sprout
Launching a new business is thrilling. As someone who has been in on the ground floor of several startups, there’s nothing else like it: You’re eager to get started and growth seemingly can’t come soon enough.
But in the midst of all of this excitement, there’s a to-do list that’s one-thousand tasks deep. You’ve already gone through the tedious process of registering your business, applying for an EIN, opening new bank accounts, and launching your website. So many entrepreneurs want to dive right into generating revenue, but there’s still a ton to do.
Backing anything up is usually the last thing anyone wants to think about. But backups need to be a priority for every new business owner, because losing precious data or records could be beyond detrimental to your success. A computer accident, office fire, flood, ransomware attack, or some other unforeseen calamity could set you back months, years, or in many cases, end your business entirely. Earlier this week, I watched a fire in my neighborhood completely engulf a building. Three businesses went up in smoke and the fire department declared a total loss for all three. I hope they had backups.
Spending a little time on backups in the early stages of your launch won’t just save your company from disaster, it will make your business unbreakable.
Even if your company has been in business for a while, there’s still time for you to implement a data backup plan before it’s too late. And knowing what to back up will save you time, money, and countless headaches. Here are six simple steps to guide you:
Backing Up Hard Drives
Hard drives are like lightbulbs. It’s not a matter of if they will go out, it’s a matter of when.
As more time passes, there becomes a greater chance that your hard drives will fail. For those of you who are interested in learning more about hard drive failures, Andy Klein, Backblaze’s Director of Compliance, recently published the latest hard drive statistics here.
Take a moment to think about all of the crucial information that’s been compiled on your hard drive over the last few months. Now, imagine that information getting wiped clean. One morning you wake up and it’s just gone without a trace. In the blink of an eye, you’re starting from nothing. It’s a scary thought and I’ve seen it happen to too many people. Losing files at the wrong moment could cause you to miss out on a critical deal or delay major projects indefinitely. Timing is everything.
So when it comes to your hard drives, you need to set up some type of daily backups as soon as possible. Whatever backup tool you decide to go with, just make sure you’re fully covered and prepared for the worst. The goal is to be able to fully recover a hard drive at a moment’s notice.
Once you’ve covered that first step, consider adding a cloud backup solution. Cloud storage is much more reliable than a series of physical backup drives.
Backing Up Email
I would be lost without email.
For me, this might actually be the most important part of my business to back up. My email includes all of my contacts, my entire work history, and the logins for all of my accounts. Everything I do on a day-to-day basis stems from my email. You might not rely on it as heavily as I do, but I’m sure that email still plays a crucial role in your business.
Today, most of us are already using cloud services, like G Suite, so we rarely think about backing up our email. And it’s true that your email won’t be lost if your computer gets damaged or your hard drive fails. But if you lost access to your login or your email account was corrupted, it would be devastating.
And it does happen. I’ve come across a few folks who were locked out of their email accounts by Google with no explanation. I’m sure there are bad actors out there abusing Google’s tools, but it’s also very possible for accounts to be accidentally shut down, too.
Even normal business operations result in lost email and documents. If your business has employees, put this at the top of your priority list. Any turnover usually results in losing that employee’s email history. For the most part, their emails will be deleted when the user is removed from your system, but there’s a good chance that you’re going to need access to those emails. Just because that employee is gone, it doesn’t mean that their responsibilities disappear.
While it’s possible to export your G Suite data, you’d then be on the hook for doing this regularly and storing your exports securely. In my opinion, this requires too much manual work and leaves room for error.
I’d recommend going through the G Suite Marketplace to find an app that can handle all of your backups automatically in the cloud. (Editor’s note: For the easiest, most reliable solution, we recommend Google Vault.) Once you set this up, you’ll never have to worry about your G Suite data again. Even if it somehow gets corrupted, you’ll always be able to restore it quickly.
What about Office 365 and Outlook? It’s easy to back up Outlook manually by exporting your entire inbox. There are also ways to back up your company’s email with Exchange Online. The best method will depend on your exact implementation of Outlook at your company.
For those of you managing email on your own network who don’t plan to move to a cloud-based email service, just ensure your existing backups cover your email or find a way to ensure they do as soon as possible.
Backing Up Your Website
If your website goes down, or, even worse, you become a victim of malware, you’ll lose the lifeblood of your business: new customers.
People hack websites all the time in order to spread viruses and malware. Small businesses and startups are an easy target for cybercriminals because their sites often aren’t protected as well as those of larger companies. If something horrible like this happens, you’ll need to reset your entire site to protect your business, customers, and website visitors.
This process is a whole lot easier when you have website backups. So, when you create your website, make daily backups a priority from the outset. Start with your web host. Contact them to see what kind of backups they offer. Their answer could ultimately sway you to use one web host over another. For those of you who are using WordPress, there are lots of different plugins that offer regular backups. I’ve reviewed the best options and covered this topic more extensively here.
Generally speaking, website backups will not be free. But paying for a high-quality backup solution is well worth the cost, and far less expensive than the price of recovering from a total loss without backups.
This will also protect you and your employees from the fallout of launching a bug that accidentally brings the whole website down. Unfortunately, this happens more often than any of us would like to admit. Backups make this an embarrassing error, rather than a fatal one.
Backing Up Paperwork
Being 100% paper-free isn’t always an option. Even though the vast majority of documentation has transitioned to digital, there are still some forms that stubbornly remain in paper. No matter how hard I try, I still get stuck with paper documents for physical receipts, some tax filings, and some government paperwork.
When you launch your business, you will generate a batch of paper records that will slowly grow over time. Keeping these papers neatly organized in a filing cabinet is important, but this only helps with storage. Paper documents are still vulnerable to theft, flooding, fire, and other physical damage. So why not just digitize them quickly and be done with it? Not only will this free up extra space around the office, but it will also give you peace of mind about losing your files in a catastrophe.
The easiest way to back up your paperwork is to get a scanner, scan your documents, and then upload them to the cloud with the rest of your files. You can forget about them until they’re necessary.
It’s in your best interest to do this with your existing paper files immediately. Then make it part of your process whenever you get physical paperwork. If you wait too long, not only are you susceptible to losing important files, but the task will only grow more tedious and time-consuming.
Backing Up Processes
Not many companies think about it, but failing to back up processes has caused me more grief than any other item in this post. In a perfect world, all of your staff will give you plenty of notice before they leave. This will give you time to fill the position and have that employee train the next person in their remaining weeks. But you and I both know that the world isn’t perfect.
Things happen. Employees leave on the spot or do something egregious that results in an immediate firing. Not everyone leaving your business will end on good terms, so you can’t bank on them being helpful during a transitional period. And when people leave your company, their knowledge is lost forever.
If those processes aren’t written down, training someone else can be extremely difficult, and nearly impossible if a top-tier employee leaves. The only way to prevent this is by turning all processes into standard operating procedures, better known as SOPs. Then you just need to store these SOPs somewhere that is also backed up, whether that is your hard drive (as mentioned above) or in a project management tool like Confluence, Notion, or even a folder in your Google Drive. As long as you have your SOPs saved on some sort of cloud backup solution, they’ll always be there when you need to access them.
Backing Up Software Databases
If you run a software business or use software for any internal tools, you need to set up backups for all of your databases. As with your hard drives, sooner or later one of them will go down.
When I was at KISSmetrics, we had an engineer shut down our core database for the entire product by accident. When someone makes a mistake like that, they don’t always act rationally. Instead of notifying management immediately, this engineer walked away and went to bed. The database was down overnight until the following morning. While we had some backups, we still lost about twelve hours’ worth of customer data. Without those backups, it would have been even worse.
The more critical the database, the more robust the backup solution needs to be. As I said before, you need to plan for the worst. Sometimes a daily backup might not be good enough if the database is super critical. If you can’t afford to lose 24 hours’ worth of information, then you’ll need a solution that backs up at the frequency your business requires.
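Whatever frequency you land on, pair each dump with a verification step. The toy sketch below is our own illustration, not a recommendation of any particular database tool: it serializes a stand-in “database” to a timestamped backup file, then confirms a restore matches before the backup is trusted.

```python
import json
import tempfile
import time
from pathlib import Path

def dump_database(records: dict, backup_dir: Path) -> Path:
    """Serialize a (toy) database to a timestamped JSON backup file."""
    backup = backup_dir / f"backup-{int(time.time())}.json"
    backup.write_text(json.dumps(records, sort_keys=True))
    return backup

def verify_backup(records: dict, backup: Path) -> bool:
    """A backup you haven't restored and compared is one you merely hope works."""
    return json.loads(backup.read_text()) == records

database = {"customer_1": {"plan": "pro"}, "customer_2": {"plan": "free"}}
backup_dir = Path(tempfile.mkdtemp())
backup_file = dump_database(database, backup_dir)
print(verify_backup(database, backup_file))  # True

# If your tolerance for data loss is 24 hours, run this at least daily
# from cron or a job scheduler -- and alert on any failed verification.
```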
Work with your engineering team to make sure all core functionality is completely redundant. Customers can tolerate their login page being down for a short period, but they won’t tolerate permanent data loss.
Final Thoughts on Business Backup
I know, your list of things to do when you start a new business just got longer! But backing up your data, files, and other important information is crucial for every business across all industries. You can’t operate under the assumption that you’re immune from these pitfalls. Sooner or later, they happen to all of us. Whether it’s physical damage to a hard drive, theft of a computer, human error, or a malicious attack against your website, you must limit your exposure.
But good news: Once your backups are in place, your business will be unbreakable.
Editor’s Note: If you’ve read this far, you’re likely very serious about backing up your business—or maybe you’re just passionate about the process? Either way, Lars has outlined a lot of the “whys” and plenty of good “hows” for you here, but we’d love to help you tick a few things off of your list. Here are a few notes for how you can implement Lars’ advice using Backblaze:
Backing up your…
This is an easy one: backup is the core of what we do, and backing up your computers and hard drives is the easiest first step you’ll take. And now, if you opt for Forever Version History, you only need to hook up your older drives once.
…Email… and Paperwork, Process, and Database:
If your email is already with a cloud service, you’ve got one backup, but if you are using Outlook, Apple Mail, or other applications storing email locally on your computer, Backblaze will automatically back those up.
As Lars mentioned, a lot of hosting services offer backup options. But especially if you’re looking for WordPress backups, we have you covered.
Another option to consider is using Cloudflare or other caching services to prevent “soft downtime.” If you’ve engaged with Backblaze, we have a partnership with Cloudflare to make this solution easier.
Now that you’re all backed up and have some extra time and peace of mind, we’d love to hear more about your business: How does your infrastructure help you succeed?
Editor’s Note: Since 2013, Backblaze has published statistics and insights based on the hard drives in our data centers. Why? Well, we like to be helpful, and we thought sharing would help others who rely on hard drives but don’t have reliable data on performance to make informed purchasing decisions. We also hoped the data might aid manufacturers in improving their products. Given the millions of people who’ve read our Hard Drive Stats posts and the increasingly collaborative relationships we have with manufacturers, it seems we might have been right.
But we don’t only share our take on the numbers, we also provide the raw data underlying our reports so that anyone who wants to can reproduce them or draw their own conclusions, and many have. We love it when people reframe our reports, question our logic (maybe even our sanity?), and provide their own take on what we should do next. That’s why we’re featuring Ryan Smith today.
Ryan has held a lot of different roles in tech, but lately he’s been dwelling in the world of storage as a product strategist for Hitachi. On a personal level, he explains that he has a “passion for data, finding insights from data, and helping others see how easy and rewarding it can be to look under the covers.” It shows.
A few months ago we happened on a post by Ryan with an appealing header featuring our logo with an EXPOSED stamp superimposed in red over our humble name. It looked like we had been caught in a sting operation. As a company that loves transparency, we were delighted. Reading on we found a lot to love and plenty to argue over, but more than anything, we appreciated how Ryan took data we use to analyze hard drive failure rates and extrapolated out all sorts of other gleanings about our business. As he puts it, “it’s not the value at the surface but the story that can be told by tying data together.” So, we thought we’d share his original post with you to (hopefully) incite some more arguments and some more tying together of data.
While we think his conclusions are reasonable based on the data available to him, the views and analysis below are entirely Ryan’s. We appreciate how he flagged some areas of uncertainty, but thought it most interesting to share his thoughts without rebuttal. If you’re curious about how he reached them, you can find his notes on process here. He doesn’t have the full story, but we think he did amazing work with the public data.
Our 2019 Q3 Hard Drive Stats post will be out in a few weeks, and we hope some of you will take Ryan’s lead and do your own deep dive into the reporting when it’s public. For those of you who can’t wait, we’re hoping this will tide you over for a little while.
If you’re interested in taking a look at the data yourselves, here’s our Hard Drive Data and Stats webpage that has links to all our past Hard Drive Stats posts and zip files of the raw data.
Ryan Smith Uses Backblaze’s SMART Stats to Illustrate the Power of Data
It is now common practice for end-customers to share telemetry (call home) data with their vendors. My analysis below shares some insights about your business that vendors might gain from seemingly innocent data that you are sending them every day.
On a daily basis, Backblaze (a cloud backup and storage provider) logs all its drive health data (aka SMART data) for over 100,000 of its hard drives. With 100K+ records a day, each year can produce over 30 million records. They share this raw data on their website, but most people probably don’t really dig into it much. I decided to see what this data could tell me and what I found was fascinating.
Rather than looking at nearly 100 million records, I decided to look at just over one million, consisting of the last day of every quarter from Q1’16 to Q1’19. This gives me enough granularity to see what is happening inside Backblaze’s cloud backup storage business. For those interested, I used MySQL to import and transform the data into something easy to work with (see more details on my SQL query); I then imported the data into Excel, where I could easily pivot it and look for insights. Below are the results of this effort.
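For readers who prefer Python to MySQL and Excel, the quarter-end filtering step can be sketched like this; the rows below are a tiny made-up stand-in for the real drive-stats CSVs, which have one record per drive per day:

```python
from datetime import date

# Hypothetical stand-in for rows from the daily drive-stats CSVs:
# one (date, serial, model, capacity_bytes) record per drive per day.
rows = [
    (date(2016, 3, 31), "S1", "ST4000DM000", 4_000_000_000_000),
    (date(2016, 3, 31), "S2", "ST8000DM002", 8_000_000_000_000),
    (date(2016, 4, 1),  "S1", "ST4000DM000", 4_000_000_000_000),
]

def is_quarter_end(d: date) -> bool:
    """Last day of March, June, September, or December."""
    return (d.month, d.day) in {(3, 31), (6, 30), (9, 30), (12, 31)}

# Keep only quarter-end snapshots: this cuts ~100M daily records down to
# roughly one million while preserving the quarterly trends analyzed below.
snapshots = [r for r in rows if is_quarter_end(r[0])]

# Aggregate physical capacity (in petabytes) per quarter, Excel-pivot style.
pb_by_quarter = {}
for d, serial, model, cap in snapshots:
    pb_by_quarter[d] = pb_by_quarter.get(d, 0) + cap / 1e15
```

The same grouping by model, vendor, or power-on date drives the rest of the charts in this post.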
User Data vs Physical Capacity
I grabbed the publicly posted “Petabytes stored” figure that Backblaze claims on their website (“User Petabytes”) and compared it to the total capacity from the SMART data they log (“Physical Petabytes”) to see how much overhead or unused capacity they have. The Theoretical Max (green line) is based on the erasure coding scheme (13+2 and/or 17+3) that they use to protect user data. If the “% User Petabytes” is below that max, then Backblaze either has unused capacity or didn’t update their website with the actual data stored.
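That Theoretical Max is just shard arithmetic: data shards divided by total shards. A quick sketch of my math (not an official Backblaze figure):

```python
def max_user_fraction(data_shards: int, parity_shards: int) -> float:
    """Fraction of raw capacity available for user data under an
    erasure-coding scheme with the given shard counts."""
    return data_shards / (data_shards + parity_shards)

# The two schemes mentioned above:
print(round(max_user_fraction(13, 2), 3))  # 13/15, roughly 0.867
print(round(max_user_fraction(17, 3), 3))  # 17/20, exactly 0.85
```

So in the best case, somewhere between 85% and 87% of physical petabytes can hold user data; anything below that gap is spare capacity or stale website numbers.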
Data Read/Written vs Capacity Growth
Looking at the last two years, by quarter, you can see a healthy amount of year-over-year growth in their write workload; roughly 80% over the last four quarters! This is good since writes likely correlate with new user data, which means broader adoption of their offering. For some reason their read workloads spiked in Q2’17 and have maintained a higher read workload since then (as indicated by the YoY spikes from Q2’17 to Q1’18, and then settling back to less than 50% YoY since); my guess is this was likely driven by a change to their internal workload rather than a migration because I didn’t see subsequent negative YoY reads.
Now let’s look at some performance insights. A quick note: Only Seagate hard drives track the needed information in their SMART data in order to get insights about performance. Fortunately, roughly 80% of Backblaze’s drive population (both capacity and units) are Seagate so it’s a large enough population to represent the overall drive population. Going forward, it does look like the new 12 TB WD HGST drive is starting to track bytes read/written.
Pod (Storage Enclosure) Performance
Looking at the power-on-hours of each drive, I was able to calculate the vintage of each drive and the number of drives in each “pod” (the name Backblaze gives its storage enclosures). This lets me calculate the number of pods that Backblaze has in its data centers. Their original pods held 45 drives, which improved to 60 drives in ~Q2’16 (according to past blog posts by Backblaze). The power-on date allowed me to place each drive into the appropriate enclosure type and provide you with pod statistics like Mbps per pod. This is definitely an educated guess, as some newer vintage drives are replacements in older enclosures, but the overall percentage of drives that fail is low enough that these figures should be pretty accurate.
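In code, that inference might look roughly like this; the snapshot date, cutover date, and the mini drive population are my assumptions for illustration:

```python
from datetime import date, timedelta

SNAPSHOT = date(2019, 3, 31)  # last day of the quarter being analyzed
CUTOVER = date(2016, 4, 1)    # ~Q2'16: pods went from 45 to 60 drives

def deploy_date(power_on_hours: int) -> date:
    """Estimate when a drive entered the population by subtracting its
    SMART power-on-hours counter from the snapshot date."""
    return SNAPSHOT - timedelta(hours=power_on_hours)

def estimate_pods(power_on_hours_list):
    """Rough pod count: bucket drives by enclosure generation and divide
    by drives-per-pod. Ignores replacement drives landing in older
    enclosures, which the low failure rate makes a small error."""
    old = sum(1 for h in power_on_hours_list if deploy_date(h) < CUTOVER)
    new = len(power_on_hours_list) - old
    return old / 45 + new / 60

# Hypothetical mini-population: 90 drives from ~2014, 120 from ~2018.
hours = [40_000] * 90 + [10_000] * 120
print(estimate_pods(hours))  # 90/45 + 120/60 = 4.0 pods
```

The boot-drive cross-check later in this post is what gives me confidence that this bucketing lands close to reality.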
Overall, Backblaze’s data centers are handling over 100 GB/s of throughput across all their pods which is quite an impressive figure. This number keeps climbing and is a result of new pods as well as overall higher performance per pod. From quick research, this is across three different data centers (Sacramento x 2, Phoenix x 1) and maybe a fourth on its way in Europe.
Hard Drive Performance
Since each pod holds between 45 and 60 drives, with an overall max pod performance of 1 Gbps, I wasn’t surprised to see such low average drive performance. You can see that Backblaze’s workload is read heavy, with reads at less than 1 MB/s per drive and writes at only a third of that. Just to put that in perspective, these drives can deliver over 100 MB/s, so Backblaze is not pushing the limits of these hard drives.
As discussed earlier, you can also see how the read workload changed significantly in Q2’17 and has not reverted back since.
As I expected, the read and write performance is highly correlated to the drive capacity point. So, it appears that most of the growth in read/write performance per drive is really driven by the adoption of higher density drives. This is very typical of public storage-as-a-service (STaaS) offerings where it’s really about $/GB, IOPS/GB, MBs/GB, etc.
As a side note, the black dashed lines (average between all densities) should correlate with the previous chart showing overall read/write performance per drive.
Switching gears, let’s look at Backblaze’s purchasing history. This will help suppliers look at trends within Backblaze to predict future purchasing activities. I used power-on-hours to calculate when a drive entered the drive population.
Hard Drives Purchased by Density, by Year
This chart helps you see how Backblaze normalized on 4 TB, 8 TB, and now 12 TB densities. The number of drives that Backblaze purchases every year had been climbing until 2018, when it saw its first decline in units. However, this is mainly due to the efficiency of higher capacity per drive.
A question to ponder: Did 2018 reach a point where capacity growth per HDD surpassed the actual demand required to maintain unit growth of HDDs? Or is this trend limited to Backblaze?
Petabytes Purchased by Quarter
This looks at the number of drives purchased over the last five years, along with the amount of capacity added. It’s not quite regular enough to spot a trend, but you can quickly spot that the amount of capacity purchased over the last two years has grown dramatically compared to previous years.
HDD Vendor Market Share
Western Digital/WDC, Toshiba/TOSYY, Seagate/STX
Seagate is definitely the preferred vendor, capturing almost 100% of the market share save for a few quarters where WD HGST wins 50% of the business. This information could be used by Seagate or its competitors to understand where each stands within the account for future bids. And since the industry is an oligopoly, it’s not hard to guess who won the business if a given HDD vendor didn’t.
Drive Population by Quarter
This shows the total drive population over the past three years. Even though the number of drives being purchased has been falling lately, the overall drive population is still growing.
You can quickly see that 4 TB drives saw their peak population in Q1’17 and have rapidly declined since. In fact, let’s look at the same data with a different type of chart.
That’s better. We can see that 12 TB drives really had a dramatic effect on both 4 TB and 8 TB adoption. In fact, Backblaze has been proactively retiring 4 TB drives. This is likely due to the desire to slow the growth of their data center footprint, which comes with costs (more on this later).
As a drive vendor, I could use the 4 TB retirement trend to calculate how much drive replacement will occur next quarter, along with natural PB growth. I will look more into Backblaze’s drive/pod retirement later.
Current Drive Population, by Deployed Date
Be careful when interpreting this graph. What we are looking at here is the Q1’19 drive population, where the date on the x-axis is the date each drive entered the population. This helps you see that, of all the drives in Backblaze’s population today, the oldest are from 2015 (with the exception of a few stragglers).
This indicates that the useful life of drives within Backblaze’s data centers is ~4 years. In fact, a later chart will look at how drives/pods are phased out, by year.
Along the top of the chart, I noted when the 60-drive pods started entering the mix. The rack density is much more efficient with this design than with the 45-drive pod. Combine this with the 4 TB to 12 TB efficiency, and it’s clear why Backblaze has been aggressively retiring its 4 TB/45-drive enclosures. There is still a large population of these remaining, so expect some further migration to occur.
Boot Drive Population
This is the overall boot drive population over time. You can see that it is currently dominated by 500 GB drives, with only a few smaller densities remaining in the population today. For some reason, Toshiba has been the preferred vendor, with Seagate only recently gaining some new business.
The boot drive population is also an interesting data point to use for verifying the number of pods in the population. For example, there were 1,909 boot drives in Q1’19 and my calculation of pods based on the 45/60-drive pod mix was 1,905. I was able to use the total boot drives each quarter to double check my mix of pods.
Pods (Drive Enclosures)
As discussed earlier, pods are the drive enclosures that house all of Backblaze’s hard drives. Let’s take a look at a few more trends that show what’s going on within the walls of their data center.
Pods Population by Deployment Date
This one is interesting. Each line in the graph represents a particular snapshot in time of the total population, and the x-axis represents the vintage of the pods in that snapshot. Comparing snapshots lets you see changes to the population over time: namely, new pods being deployed and old pods being retired. To capture this, I looked at the last day of Q1 data for each of the last four years and calculated the date the drives entered the population. Using the “Power On Date,” I was able to deduce the type of pod (45- or 60-drive) each drive was deployed in.
Some insights from this chart:
From Q2’16 to Q1’17, they retired some pods from 2010-11
From Q2’17 to Q1’18, they retired a significant number of pods from 2011-14
From Q2’18 to Q1’19, they retired pods from 2013-2015
Pods that were deployed since late 2015 have been untouched (you can tell this by seeing the lines overlap with each other)
The most pods deployed in a quarter was 185 in Q2’16
Since Q2’16, the number of pods deployed has been declining, on average; this is due to the increase in # of drives per pod and density of each drive
There are still a significant number of 45-drive pods to retire
Totaling up all the new pods being deployed and retired, it is easier to see the yearly changes happening within Backblaze’s operation. Keep in mind that these are all calculations and may erroneously include drive replacements as new pods; but I don’t expect it to vary significantly from what is shown here.
The data shows that any new pods that have been deployed in the past few years have mainly been driven by replacing older, less dense pods. In fact, the pod population has plateaued at around 1,900 pods.
Based on blog posts, Backblaze’s pods are all designed at 4U (4 rack units), and pictures on their site indicate 10 pods fit in a rack; this equates to 40U racks. Using this information, along with the drive population and the power-on date, I was able to calculate the number of pods on any given date as well as the total number of racks. I did not include their networking racks, of which I believe they have two per row in their data center.
You can quickly see that Backblaze has done a great job at slowing the growth of the racks in their data center. This all results in lower costs for their customers.
What interested me when looking at Backblaze’s SMART data was the fact that drives were being retired more often than they were failing. This means the cost of failures is fairly insignificant in the scheme of things; most retirement is actually driven by efficiency gains from technology improvements such as drive and enclosure densities. However, the benefits must outweigh the costs. Given that Backblaze uses Sungard AS for its data centers, let’s try to visualize the benefit of retiring drives/pods.
Colocation Costs, Assuming a Given Density
This shows the total capacity over time in Backblaze’s data centers, along with the colocation costs assuming all the drives were a given density. As you can see, in Q1’19 it would take $7.7M a year to pay for the colocation costs of 861 PB if all the drives were 4 TB in size. By moving the entire population to 12 TB, this can be reduced to $2.6M. So, just changing the drive density can have significant impacts on Backblaze’s operational costs. I assumed $45/RU costs in the analysis, though their actual costs may be as low as $15/RU given the scale of their operation.
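Here is a sketch of the arithmetic behind those two figures, using my assumed $45/RU/month and 60-drive 4U pods throughout (actual costs and the real pod mix will differ):

```python
def yearly_colo_cost(petabytes: float, tb_per_drive: float,
                     drives_per_pod: int = 60, ru_per_pod: int = 4,
                     cost_per_ru_month: float = 45.0) -> float:
    """Yearly colocation cost if the whole fleet used one drive density.
    Assumes 60-drive 4U pods and $45/RU/month, per the analysis above."""
    drives = petabytes * 1000 / tb_per_drive
    rack_units = drives / drives_per_pod * ru_per_pod
    return rack_units * cost_per_ru_month * 12

# 861 PB, the Q1'19 physical capacity from the chart:
print(round(yearly_colo_cost(861, 4) / 1e6, 1))   # 7.7 ($M/yr if all 4 TB)
print(round(yearly_colo_cost(861, 12) / 1e6, 1))  # 2.6 ($M/yr if all 12 TB)
```

Halving the per-RU rate halves both numbers, but the three-to-one ratio between the densities holds regardless.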
I threw in 32 TB densities to illustrate a hypothetical SSD-type density so you can see the colocation cost savings by moving to SSDs. Although lower, the acquisition costs are far too high at the moment to justify a move to SSDs.
Break-Even Analysis of Retiring Pods
This chart helps illustrate the math behind deciding to retire older drives/pods based on the break-even point.
Let’s break down how to read this chart:
This chart is looking at whether Backblaze should replace older drives with the newer 12 TB drives
Assuming a cost of $0.02/GB for a 12 TB drive, that is a $20/TB acquisition cost you see on the far left
Each line represents the cumulative cost over time (acquisition + operational costs)
The grey lines (4 TB and 8 TB) all assume they were already acquired so they only represent operational costs ($0 acquisition cost) since we are deciding on replacement costs
The operational costs (shown as the incremental yearly increase) are calculated from the $45 per RU colocation cost and how many drives of a given drive/enclosure density fit per rack unit. The more TBs you can cram into a rack unit, the lower your colocation costs are
Assuming you are still with me, this shows that the break-even point for retiring 4 TB 4U45 pods is just over two years! And 4 TB 4U60 pods at three years! It’s a no-brainer to kill the 4 TB enclosures and replace them with 12 TB drives. Remember that this assumes a $45/RU colocation cost, so the break-even point will shift to the right if the colocation costs are lower (which they surely are). You can see that the math to replace 8 TB drives with 12 TB doesn’t make as much sense, so we may see Backblaze’s retirement strategy slow down dramatically after it retires the 4 TB capacity points.
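The break-even math above can be sketched directly. This assumes, as the chart does, $45/RU/month colocation, $0.02/GB ($20/TB) acquisition for new 12 TB drives in 4U60 pods, and that already-owned drives carry no acquisition cost:

```python
RU_COST_YEAR = 45 * 12  # $45/RU/month colocation, as assumed in the chart

def opex_per_tb_year(drives_per_pod: int, tb_per_drive: float,
                     ru_per_pod: int = 4) -> float:
    """Yearly colocation cost per TB for a given pod/drive configuration."""
    tb_per_ru = drives_per_pod * tb_per_drive / ru_per_pod
    return RU_COST_YEAR / tb_per_ru

def break_even_years(old_drives_per_pod: int, old_tb: float,
                     new_acquisition_per_tb: float = 20.0) -> float:
    """Years until buying new 12 TB drives (in 4U60 pods) beats keeping
    already-paid-for old drives running: acquisition cost divided by the
    yearly opex savings per TB."""
    old_opex = opex_per_tb_year(old_drives_per_pod, old_tb)
    new_opex = opex_per_tb_year(60, 12)  # $3/TB/year for 12 TB 4U60
    return new_acquisition_per_tb / (old_opex - new_opex)

print(round(break_even_years(45, 4), 1))  # 4 TB 4U45: ~2.2 years
print(round(break_even_years(60, 4), 1))  # 4 TB 4U60: ~3.3 years
print(round(break_even_years(60, 8), 1))  # 8 TB 4U60: ~13.3 years
```

The ~13-year figure for 8 TB drives is why the retirement wave should stall once the 4 TB pods are gone.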
As hard drive densities get larger and $/GB decreases, I expect the cumulative costs to start lower (less acquisition cost) and rise slower (less RU operational costs) making future drive retirements more attractive. Eyeballing it, it would be once $/GB approaches $0.01/GB to $0.015/GB.
Things Backblaze Should Look Into
Top of mind, Backblaze should look into these areas:
The architecture around performance is not balanced; investigate having a caching tier to handle bursts and put more drives behind each storage node to reduce “enclosure/slot tax” costs.
Look into designs like 5U84 from Seagate/Xyratex providing 16.8 drives per RU versus the 15 being achieved on Backblaze’s own 4U60 design; Another 12% efficiency!
5U allows for 8 pods to fit per rack versus the 10.
Look at when SSDs will be attractive to replace HDDs at a given $/GB, density, idle costs, # of drives that fit per RU (using 2.5” drives instead of 3.5”) so that they can stay on top of this trend [there is no rush on this one].
The performance and endurance of SSDs are irrelevant since the performance requirements are so low and the WPD (writes per day) is almost nonexistent, making QLC and beyond great candidates.
Look at allowing pods to be more flexible in handling different capacity drives to handle drive failures more cost efficiently without having to retire pods. Having concepts of “virtual pods” that don’t have physical limits will better accommodate the future that Backblaze has where it won’t be retiring pods as aggressively, yet still let them grow their pod densities seamlessly.
It is kind of ironic that the reason Backblaze posted all their SMART data is to share insights around failures when I didn’t even analyze failures once! There is much more analysis that could be done around this data set which I may revisit as time permits.
As you can see, even simple health data from drives, along with a little help from other data sources, can help expose a lot more than you would initially think. I have long felt that people have yet to understand the full power of giving data freely to businesses (e.g. Facebook, Google Maps, LinkedIn, Mint, Personal Capital, News Feeds, Amazon). I often hear things like, “I have nothing to hide,” which indicates the lack of value they assign to their data. It’s not the value at its surface but the story that can be told by tying data together.
Until next time, Ryan Smith.
• • •
Ryan Smith is currently a product strategist at Hitachi Vantara. Previously, he served as the director of NAND product marketing at Samsung Semiconductor, Inc. He is extremely passionate about uncovering insights from just about any data set. He just likes to have fun by making a notable difference, influencing others, and working with smart people.
Announcing Backblaze Cloud Backup 7.0: The Version History and Beyond Release!
This release for consumers and businesses adds one of our most requested enhancements for our Backblaze Cloud Backup service: the ability to keep updated, changed, and even deleted files in your backups forever by extending version history. In addition, we’ve made our Windows and Mac apps even better, updated our Single Sign-on (SSO) support, added more account security options, became Catalina-ready, and increased the functionality of our iOS and Android mobile apps. These changes are awesome and we’re sure you’ll love them!
Extended Version History
Have you ever deleted a file by mistake or accidentally saved over an important bit of work? Backblaze has always kept a 30-day version history of your backed up files to help in situations like these, but today we’re giving you the option to extend your version history to one year or forever. This new functionality is available on the Overview page for Computer Backup, and the Groups Management page if you are using Backblaze Groups! Backblaze v7.0 is required to use Version History. Learn more about versions and extending Version History.
30-Day Version History
All Backblaze computer backup accounts have 30-Day Version History included with their backup license. That means you can go back in time for 30 days and retrieve old versions of your files or even files that you’ve deleted.
1-Year Version History
Extending your Version History from 30 days to one year means that all versions of your files that are backed up — whether you’ve updated, changed, or fully deleted them from your computer — will remain in your Backblaze backup for one year after being modified or deleted from your device. Extending your Version History to one year is an additional $2 per month and is charged based on your license type (monthly, yearly, or 2-year). As always, any charges will be prorated to match up with your license renewal date.
Forever Version History
Extending your Version History from 30 days or one year to forever means that Backblaze will never remove files from your Backblaze backup, whether or not you’ve updated, changed, or fully deleted them from your computer. Extending Version History to forever is priced similarly to one year: an additional $2 per month (prorated to your license plan type), plus $0.005/GB/month for versions modified on your computer more than one year ago.
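To make that pricing concrete, here’s a small sketch of the monthly add-on; the 200 GB figure is purely hypothetical:

```python
def forever_history_monthly_cost(gb_of_old_versions: float) -> float:
    """Monthly add-on for Forever Version History: a flat $2, plus
    $0.005/GB for versions modified more than one year ago.
    Illustrative arithmetic only, per the pricing described above."""
    return 2.00 + 0.005 * gb_of_old_versions

print(forever_history_monthly_cost(0))    # $2.00: no year-old versions yet
print(forever_history_monthly_cost(200))  # $3.00: 200 GB of old versions
```

In other words, the extra storage-based charge only kicks in as versions age past the one-year mark.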
This is a great new feature for people who want increased peace of mind. To learn more about Version History, pricing, and examples of how to restore, please visit the Version History FAQ.
MacOS and Windows Application Updates
More Efficient Performance For Uploads
We’ve changed the way that Backblaze transmits large files from your machine by reworking how we group and break apart files for upload. The maximum packet size has increased from 30 MB to 100 MB. This allows the app to transmit data more efficiently by better leveraging threading, which also smooths out upload performance, reduces sensitivity to latency, and leads to smaller data structures.
Single Sign-On Updates for Backblaze Groups
We added support for Microsoft’s Office 365 in Backblaze Groups, and have made SSO updates to the Inherit Backup State feature so that it supports SSO-enabled accounts. This means that you can now sign into Backblaze using your Office 365 credentials, similar to using Google’s SSO.
Higher Resolution For Easier Viewing of Information
We updated the way our installers and applications look on higher-resolution displays, making for a more delightful viewing experience!
An OpenSSL issue was causing problems on Intel’s Apollo Lake chipset, but we’ve developed a workaround. Apollo Lake is a lower-end chipset, so not many customers were seeing issues, but now computers using Apollo Lake will work as intended.
We’ve added support for MacOS Catalina and improved some MacOS system messages. MacOS provides some great new features for the Mac and we’ve changed some of our apps’ behavior to better fit Catalina. In Catalina, Apple is now requiring apps to ask for permission more frequently, and since Backblaze is a backup application, we require a lot of permissions. Thus you may notice more system messages when installing Backblaze on the new OS.
Of Note: Backblaze Restores
In order to implement the Version History features, we had to change the way our restore page handled dates. This may not seem like a big deal, but we had a date drop-down menu where you could select the time frames you wanted to restore from. Well, if you have 1-Year or Forever Version History, you can’t have an infinitely scrolling drop-down menu, so we implemented a datepicker to help with selection. You can now more easily choose the dates and times that you’d like to restore your files from.
Backblaze 7.0 Available: October 8th, 2019
We will be slowly auto updating all users in the coming weeks. To update now:
Perform a Check for Updates (right-click on the Backblaze icon)
Want to Learn More? Join Us on October 15th, 2019 at 11 a.m. PT
Want to learn more? Join Yev on a webinar where he’ll go over version 7.0 features and answer viewer questions. The webinar will be available on BrightTalk (registration is required) and you can sign up by visiting the Backblaze BrightTALK channel.
The only problem: both hosted storage (through existing cloud services) and purchased hardware (buying servers from the likes of Dell or HP) were too expensive to hit this price point. Enter Tim Nufire, aka: The Podfather.
Tim led the effort to build what we at Backblaze call the Storage Pod: the physical hardware our company has relied on for data storage for more than a decade. On the occasion of the 10th anniversary of the open sourcing of our Storage Pod 1.0 design, we sat down with Tim to relive the twists and turns that led from a crew of backup enthusiasts in an apartment in Palo Alto to a company with four data centers spread across the world, holding 2,100 Storage Pods and closing in on an exabyte of storage.
✣ ✣ ✣
Editors: So Tim, it all started with the $5 price point. I know we did market research and that was the price at which most people shrugged and said they’d pay for backup. But it was so audacious! The tech didn’t exist to offer that price. Why did you start there?
Tim Nufire: It was the pricing given to us by the competitors; they didn’t give us a lot of choice. But it was never a question of if we should do it, only how we would do it. I had been managing my own backups for my entire career; I cared about backups. So it’s not like backup was new, or particularly hard. I mean, I firmly believe Brian Wilson’s (Backblaze’s Chief Technical Officer) top line: You read a byte, you write a byte. You can read the byte more gently than other services so as to not impact the system someone is working on. You might be able to read a byte a little faster. But at the end of the day, it’s an execution game, not a technology game. We simply had to out-execute the competition.
E: Easy to say now, with a company of 113 employees and more than a decade of success behind us. But at that time, you were five guys crammed into a Palo Alto apartment with no funding and barely any budget and the competition — Dell, HP, Amazon, Google, and Microsoft — they were huge! How do you approach that?
TN: We always knew we could do it for less. We knew that the math worked. We knew what the cost of a 1 TB hard drive was, so we knew how much it should cost to store data. We knew what those markups were. We knew, looking at a Dell 2900, how much the margin was in that box. We knew they were overcharging. At that time, I could not build a desktop computer for less than Dell could build it. But I could build a server at half their cost.
I don’t think Dell or anyone else was being irrational. As long as they have customers willing to pay their hard margins, they can’t adjust for the potential market. They have to get to the point where they have no choice. We didn’t have that luxury.
So, at the beginning, we were reluctant hardware manufacturers. We were manufacturing because we couldn’t afford to pay what people were charging, not because we had any passion for hardware design.
E: Okay, so you came on at that point to build a cloud. Is that where your title comes from? Chief Cloud Officer? The pods were a little ways down the road, so Podfather couldn’t have been your name yet. …
TN: This was something like December, 2007. Gleb (Budman, the Chief Executive Officer of Backblaze) and I went snowboarding up in Tahoe, and he talked me into joining the team. … My title at first was all wrong, I never became the VP of Engineering, in any sense of the word. That was never who I was. I held the title for maybe five years, six years before we finally changed it. Chief Cloud Officer means nothing, but it fits better than anything else.
E: It does! You built the cloud for Backblaze with the Storage Pod as your water molecule (if we’re going to beat the cloud metaphor to death). But how does it all begin? Take us back to that moment: the podception.
TN: Well, the first pod, per se, was just a bunch of USB drives strapped to a shelf in the data center attached to two Dell 2900 towers. It didn’t last more than an hour in production. As soon as it got hit with load, it just collapsed. Seriously! We went live on this and it lasted an hour. It was a complete meltdown.
Two things happened: The bus was completely unstable, so the USB drives were unstable. Second, the DRBD (Distributed Replicated Block Device) — which is designed to protect your data by live mirroring it between the two towers — immediately fell apart. You implement a DRBD not because it works in a well-running situation, but because it covers you in the failure mode. And in failure mode it just unraveled — in an hour. It went into split-brain mode under the hardware failures that the USB drives were causing. A well-running DRBD is fully mirrored; split-brain mode is when the two sides simply give up and start acting autonomously because they don’t know what the other side is doing and they’re not sure who is boss. The data is essentially inconsistent at that point because you can choose A or B but the two sides are not in agreement.
While the USB specs say you can connect something like 256 or 128 drives to a hub, we were never able to do more than like, five. After something like five or six, the drives just start dropping out. We never really figured it out because we abandoned the approach. I just took the drives out and shoved them inside of the Dells, and those two became pods number 0 and 1. The Dells had room for 10 or 8 drives apiece, and so we brought that system live.
That was what the first six years of this company was like, just a never-ending stream of those kinds of moments — mostly not panic-inducing, mostly just: you put your head down and you start working through the problems. There’s a little bit of adrenaline, that feeling before a big race of an impending moment. But you have to just keep going.
E: Wait, so this wasn’t in testing? You were running this live?
TN: Totally! We were in friends-and-family beta at the time. But the software was all written. We didn’t have a lot of customers, but we had launched, and we managed to recover the files: whatever was backed up. The system has always had self-healing built into the client.
E: So where do you go from there? What’s the next step?
TN: These were the early days. We were terrified of any commitments. So I think we had leased a half cabinet at the 365 Main facility in San Francisco, because that was the most we could imagine committing to in a contract: We committed to a year’s worth of this tiny little space.
We had those first two pods — the two Dell Towers (0 and 1) — which we eventually built out using external enclosures. So those guys had 40 or 45 drives by the end, with these little black boxes attached to them.
Pod number 2 was the plywood pod, which was another moment of sitting in the data center with a piece of hardware that just didn’t work out of the gate. This was Chris Robertson’s prototype. I credit him with the shape of the basic pod design, because he’s the one that came up with the top-loaded, 45-drive design. He mocked it up in his home woodshop (also known as a garage).
E: Wood in a data center? Come on, that’s crazy, right?
TN: It was what we had! We didn’t have a metal shop in our garage, we had a woodshop in our garage, so we built a prototype out of plywood, painted it white, and brought it to the data center. But when I went to deploy the system, I ended up having to recable and rewire and reconfigure it on the fly, sitting there on the floor of the data center, kinda similar to the first day.
The plywood pod was originally designed to be 45 drives, top loaded with port multipliers — we didn’t have backplanes. The port multipliers were these little cards that took one set of cables in and five cables out. They were cabled from the top. That design never worked. So what actually got launched was a 15-drive system that had these little five-drive enclosures that we shoved into the face of the plywood pod. It came up as a 15-drive, traditionally front-mounted design with no port multipliers. Nothing fancy there. Those boxes literally have five SATA connections on the back, just one-to-one cabling.
E: What happened to the plywood pod? Clearly it’s cast in bronze somewhere, right?
TN: That got thrown out in the trash in Palo Alto. I still defend the decision. We were in a small one-bedroom apartment in Palo Alto and all this was cruft.
E: Brutal! But I feel like this is indicative of how you were working. There was no looking back.
TN: We didn’t have time to ask the question of whether this was going to work. We just stayed ahead of the problems: Pods 0 and 1 continued to run, and pod 2 came up as a 15-drive chassis and kept running.
The next three pods are the first where we worked with Protocase. These are the first run of metal — the ones where we forgot a hole for the power button, so you’ll see the pried open spots where we forced the button in. These are also the first three with the port-multiplier backplane. So we built a chassis around that, and we had horrible drive instability.
We were using the Western Digital Green, 1 TB drives. But we couldn’t keep them in the RAID. We wrote these little scripts so that in the middle of the night, every time a drive dropped out of the array, the script would put it back in. It was this constant motion and churn creating a very unstable system.
We suspected the problem was with power. So we made the octopus pod. We drilled holes in the bottom, and ran it off of three PSUs beneath it. We thought: “If we don’t have enough power, we’ll just hit it with a hammer.” Same thing on cooling: “What if it’s getting too hot?” So we put a box fan on top and blew a lot of air into it. We were just trying to figure out what it was that was causing trouble and grief. Interestingly, the array in the plywood pod was stable, but when we replaced the enclosure with steel, it became unstable as well!
We slowly circled in on vibration as the problem. That plywood pod had actual disk enclosures with caddies and good locking mechanisms, so we thought the lack of caddies and locking mechanisms could be the issue. I was working with Western Digital at the time, too, and they were telling me that they also suspected vibration as the culprit. And I kept telling them, “They are hard drives! They should work!”
At the time, Western Digital was pushing me to buy enterprise drives, and they finally just gave me a round of enterprise drives. They were worse than the consumer drives! So they came over to the office to pick up the drives because they had accelerometers and a lot of other equipment to gather data on what was wrong, and we never heard from them again.
We learned later that, when they showed up at an office in a one-bedroom apartment in Palo Alto with five guys and a dog, they decided that we weren’t serious. It was hard to get a call back from them after that … I’ll admit, I was probably very hard to deal with at the time. I was this ignorant wannabe hardware engineer on the phone yelling at them about their hard drives. In hindsight, they were right; the chassis needed work.
But I just didn’t believe that vibration was the problem. It’s just 45 drives in a chassis. I mean, I have a vibration app on my phone, and I stuck the phone on the chassis and there’s vibration, but it’s not like we’re trying to run this inside a race car doing multiple Gs around corners; it was a metal box on a desk with hard drives spinning at 5400 or 7200 RPM. This was not a seismic shake table!
The early hard drives were secured with rubber bands. It turns out that real rubber (latex) turns into powder in about two months in a chassis, probably from the heat — we discovered this very quickly after buying rubber bands at Staples that just completely disintegrated. We eventually got better EPDM bands, but they never really worked. The hope was that they would secure a hard drive so it couldn’t vibrate its neighbors, and yet we were still seeing drives dropping out.
At some point we started using clamp-down lids. We came to understand that we weren’t trying to isolate vibration between the drives; we were actually trying to mechanically hold the drives in place. It was less about vibration isolation, which is what I thought the rubber was going to do, and more about stabilizing the SATA connector on the back end: You don’t want the drive moving around in the SATA connector. We were also getting early reports from Seagate at the time. They took our chassis and did vibration analysis and, over time, we got better and better at stabilizing the drives.
We started to notice something else at this time: The Western Digital drives had these model numbers followed by extension numbers. We realized that drives that stayed in the array tended to have the same set of extensions. We began to suspect that those extensions were manufacturing codes, something to do with which backend factory they were built in. So there were subtle differences in manufacturing processes that dictated whether the drives were tolerant of vibration or not. Central Computer was our dominant source of hard drives at the time, and so we were very aggressively trying to get specific runs of hard drives. We only wanted drives with a certain extension. This was before the Thailand drive crisis, before we had a real sense of what the supply chain looked like. At that point we just knew some drives were better than others.
E: So you were iterating with inconsistent drives? Wasn’t that insanely frustrating?
TN: No, just gave me a few more gray hairs. I didn’t really have time to dwell on it. We didn’t have a choice of whether or not to grow the storage pod. The only path was forward. There was no plan B. Our data was growing and we needed the pods to hold it. There was never a moment where everything was solved, it was a constant stream of working on whatever the problem was. It was just a string of problems to be solved, just “wheels on the bus.” If the wheels fall off, put them back on and keep driving.
E: So what did the next set of wheels look like then?
TN: We went ahead with a second small run of steel pods. These had a single Zippy power supply, with the boot drive hanging over the motherboard. This design worked until we went to 1.5 TB drives and the chassis would not boot. Clearly a power issue, so Brian Wilson and I sat there and stared at the non-functioning chassis trying to figure out how to get more power in.
The issue with power was not that we were running out of power on the 12V rail; the 5V rail was the issue. All the high-end, high-power PSUs give you more and more power on 12V because that’s what the gamers need — it’s what their CPUs and graphics cards need, so you can get a 1000W or a 1500W power supply and it gives you a ton of power on 12V, but still only 25 amps on 5V. As a result, it’s really hard to get more power on the 5V rail, and a hard drive takes 12V and 5V: 12V to spin the motor and 5V to power the circuit board. We were running out of the 5V.
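The arithmetic is easy to sketch. The per-drive draw below is an assumption based on typical 3.5″ consumer-drive figures of the era, not a Backblaze number:

```python
# 5V rail budget for a 45-drive chassis. A 3.5" drive draws roughly
# 0.7 A at 5 V for its circuit board (assumed figure; varies by model),
# while even a big "gamer" PSU offers only about 25 A on the 5 V rail.
drives = 45
amps_per_drive_5v = 0.7    # assumption
available_5v_amps = 25.0   # typical single-PSU 5 V rating

demand_5v = drives * amps_per_drive_5v
print(f"{demand_5v:.1f} A needed vs {available_5v_amps:.0f} A available")
# About 31.5 A needed against 25 A available: the 5 V rail, not the
# 12 V rail, runs out first, hence the second power supply.
```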
So our solution was two power supplies, and Brian and I were sitting there trying to visually imagine where you could put another power supply. Where are you gonna put it? We could put it where the boot drive is, move the boot drive to the side, and just kind of hang the PSU up and over the motherboard. But the biggest consequence of this was, again, vibration. Mounting the boot drive to the side of a vibrating chassis isn’t the best place for a boot drive. So we had higher than normal boot drive failures in those nine.
So the next generation, after pod number 8, was the beginning of Storage Pod 1.0. We were still using rubber bands, but it had two power supplies, 45 drives, and we built 20 of them, total. Casey Jones, as our designer, also weighed in at this point to establish how they would look. He developed the faceplate design and doubled down on the deeper shade of red. But all of this was expensive and scary for us: We’re gonna spend $10 grand!? We don’t have much money. We had been two years without salary at this point.
We talked to Ken Raab from Sonic Manufacturing, and he convinced us that he could build our chassis, all in, for less than we were paying. He would take the task off my plate, I wouldn’t have to build the chassis, and he would build the whole thing for less than I would spend on parts … and it worked. He had better backend supplier connections, so he could shave a little expense off of everything and was able to mark up 20%.
We fixed the technology and the human processes. On the technology side, we were figuring out the hardware and hard drives, and we were getting more and more stable, which was required. We couldn’t have the same failure rates we were having on the first three pods. In order to reduce (or at least maintain) the total number of problems per day, you have to reduce the number of problems per chassis, because there were 32 of them now.
We were also learning how to adapt our procedures so that the humans could live. By “the humans,” I mean me and Sean Harris, who joined me in 2010. There are physiological and psychological limits to what is sustainable, and we were nearing our wits’ end. … So, in addition to stabilizing the chassis design, we got better at limiting the type of issues that would wake us up in the middle of the night.
E: So you reached some semblance of stability in your prototype and in your business. You’d been sprinting with no pay for a few years to get to this point and then … you decide to give away all your work for free? You open sourced Storage Pod 1.0 on September 9th, 2009. Were you a nervous wreck that someone was going to run away with all your good work?
TN: Not at all. We were dying for press. We were ready to tell the world anything they would listen to. We had no shame. My only regret is that we didn’t do more. We open sourced our design before anyone was doing that, but we didn’t build a community around it or anything.
Remember, we didn’t want to be a manufacturer. We would have killed for someone to build our pods better and cheaper than we could. Our hope from the beginning was always that we would build our own platform until the major vendors did for the server market what they did in the personal computing market. Until Dell would sell me the box that I wanted at the price I could afford, I was going to continue to build my chassis. But I always assumed they would do it faster than a decade.
Supermicro tried to give us a complete chassis at one point, but their problem wasn’t high margin; they were targeting too high of performance. I needed two things: Someone to sell me a box and not make too much profit off of me, and I needed someone who would wrap hard drives in a minimum performance enclosure and not try to make it too redundant or high performance. Put in one RAID controller, not two; daisy chain all the drives; let us suffer a little! I don’t need any of the hardware that can support SSDs. But no matter how much we ask for barebones servers, no one’s been able to build them for us yet.
So we’ve continued to build our own. And the design has iterated and scaled with our business. So we’ll just keep iterating and scaling until someone can make something better than we can.
E: Which is exactly what we’ve done, leading from Storage Pod 1.0 to 2.0, 3.0, 4.0, 4.5, 5.0, to 6.0 (if you want to learn more about these generations, check out our Pod Museum), preparing the way for more than 800 petabytes of data under management.
✣ ✣ ✣
But while Tim is still waiting to pass along the official Podfather baton, he’s not alone. There was the early help from Brian Wilson, Casey Jones, Sean Harris, and a host of others, and then in 2014, Ariel Ellis came aboard to wrangle our supply chain. He grew in that role over time until he took over responsibility for charting the future of the Pod via Backblaze Labs, becoming the Podson, so to speak. Today, he’s sketching the future of Storage Pod 7.0, and — provided no one builds anything better in the meantime — he’ll tell you all about it on our blog.
This post is for all of the storage geeks out there who have followed the adventures of Backblaze and our Storage Pods over the years. The rest of you are welcome to come along for the ride.
It has been 10 years since Backblaze introduced our Storage Pod to the world. In September 2009, we announced our hulking, eye-catching, red 4U storage server equipped with 45 hard drives delivering 67 terabytes of storage for just $7,867 — that was about $0.11 a gigabyte. As part of that announcement, we open-sourced the design for what we dubbed Storage Pods, telling you and everyone like you how to build one, and many of you did.
Backblaze Storage Pod version 1 was announced on our blog with little fanfare. We thought it would be interesting to a handful of folks — readers like you. In fact, it wasn’t even called version 1, as no one had ever considered there would be a version 2, much less a version 3, 4, 4.5, 5, or 6. We were wrong. The Backblaze Storage Pod struck a chord with many IT and storage folks who were offended by having to pay a king’s ransom for a high density storage system. “I can build that for a tenth of the price,” you could almost hear them muttering to themselves. Mutter or not, we thought the same thing, and version 1 was born.
Tim, the “Podfather” as we know him, was the Backblaze lead in creating the first Storage Pod. He had design help from our friends at Protocase, who built the first three generations of Storage Pods for Backblaze and also spun out a company named 45 Drives to sell their own versions of the Storage Pod — that’s open source at its best. Before we decided on the version 1 design, there were a few experiments along the way:
The original Storage Pod was prototyped by building a wooden pod or two. We needed to test the software while the first metal pods were being constructed.
The Octopod was a quick and dirty response to receiving the wrong SATA cables — ones that were too long and glowed. Yes, there are holes drilled in the bottom of the pod.
The original faceplate was used on about 10 pre-1.0 Storage Pods. It was updated to the three-circle design just prior to Storage Pod 1.0.
Why are Storage Pods red? When we had the first ones built, the manufacturer had a batch of red paint left over that could be used on our pods, and it was free.
Back in 2007, when we started Backblaze, there weren’t a whole lot of affordable choices for storing large quantities of data. Our goal was to charge $5/month for unlimited data storage for one computer. We decided to build our own storage servers when it became apparent that, if we were to use the other solutions available, we’d have to charge a whole lot more money. Storage Pod 1.0 allowed us to store one petabyte of data for about $81,000. Today we’ve lowered that to about $35,000 with Storage Pod 6.0. When you take into account that the average amount of data per user has nearly tripled in that same time period and our price is now $6/month for unlimited storage, the math works out about the same today as it did in 2009.
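That "the math works out about the same" claim can be sanity-checked with the numbers above, assuming storage cost per user scales with data per user times hardware cost per gigabyte:

```python
# Hardware cost per petabyte fell from ~$81,000 (Pod 1.0) to ~$35,000
# (Pod 6.0), while average data per user roughly tripled and the price
# went from $5 to $6 per month.
cost_per_pb_2009, cost_per_pb_2019 = 81_000, 35_000
data_growth = 3.0          # data per user roughly tripled
price_ratio = 6.0 / 5.0    # $5 -> $6

cost_ratio = data_growth * (cost_per_pb_2019 / cost_per_pb_2009)
print(f"cost per user: ~{cost_ratio:.2f}x, price: {price_ratio:.2f}x")
# Roughly 1.30x cost per user against a 1.20x price: about the same economics.
```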
We Must Have Done Something Right
The Backblaze Storage Pod was more than just affordable data storage. Version 1.0 introduced or popularized three fundamental changes to storage design: 1) You could build a system out of commodity parts and it would work, 2) You could mount hard drives vertically and they would still spin, and 3) You could use consumer hard drives in the system. It’s hard to determine which of these three features offended and/or excited more people. It is fair to say that ten years out, things worked out in our favor, as we currently have about 900 petabytes of storage in production on the platform.
Over the last 10 years, people have warmed up to our design, or at least elements of the design. Starting with 45 Drives, multitudes of companies have worked on and introduced various designs for high density storage systems ranging from 45 to 102 drives in a 4U chassis, so today the list of high-density storage systems that use vertically mounted drives is pretty impressive:
Exos AP 4U100
Thunder SX FA100-B7118
Viking Enterprise Solutions
Another driver in the development of some of these systems is the Open Compute Project (OCP). Formed in 2011, the OCP gathers and shares ideas and designs for data storage, rack designs, and related technologies. The group is managed by the Open Compute Project Foundation as a 501(c)(6) and counts many industry luminaries in the storage business as members.
What Have We Done Lately?
In technology land, 10 years of anything is a long time. What was exciting then is expected now. And the same thing has happened to our beloved Storage Pod. We have introduced updates and upgrades over the years twisting the usual dials: cost down, speed up, capacity up, vibration down, and so on. All good things. But, we can’t fool you, especially if you’ve read this far. You know that Storage Pod 6.0 was introduced in April 2016 and quite frankly it’s been crickets ever since as it relates to Storage Pods. Three plus years of non-innovation. Why?
If it ain’t broke, don’t fix it. Storage Pod 6.0 is built in the US by Equus Compute Solutions, our contract manufacturer, and it works great. Production costs are well understood, performance is fine, and the new higher density drives perform quite well in the 6.0 chassis.
Disk migrations kept us busy. From Q2 2016 through Q2 2019 we migrated over 53,000 drives. We replaced 2, 3, and 4 terabyte drives with 8, 10, and 12 terabyte drives, doubling, tripling, and sometimes quadrupling the storage density of a Storage Pod.
Lots of data kept us busy. In Q2 2016, we had 250 petabytes of data storage in production. Today, we have 900 petabytes. That’s a lot of data you folks gave us (thank you by the way) and a lot of new systems to deploy. The chart below shows the challenge our data center techs faced.
In other words, our data center folks were really, really busy, and not interested in shiny new things. Now that we’ve hired a bunch more DC techs, let’s talk about what’s next.
Storage Pod Version 7.0 — Almost
Yes, there is a Backblaze Storage Pod 7.0 on the drawing board. Here is a short list of some of the features we are looking at:
Updating the motherboard
Upgrading the CPU, and considering an AMD CPU
Updating the power supply units, perhaps moving to one unit
Upgrading from 10Gbase-T to 10GbE SFP+ optical networking
Upgrading the SATA cards
Modifying the tool-less lid design
The timeframe is still being decided, but early 2020 is a good time to ask us about it.
“That’s nice,” you say out loud, but what you are really thinking is, “Is that it? Where’s the Backblaze in all this?” And that’s where you come in.
The Next Generation Backblaze Storage Pod
We are not out of ideas, but one of the things that we realized over the years is that many of you are really clever. From the moment we open sourced the Storage Pod design back in 2009, we’ve received countless interesting, well thought out, and occasionally odd ideas to improve the design. As we look to the future, we’d be stupid not to ask for your thoughts. Besides, you’ll tell us anyway on Reddit or HackerNews or wherever you’re reading this post, so let’s just cut to the chase.
Build or Buy
The two basic choices are: We design and build our own storage servers or we buy them from someone else. Here are some of the criteria as we think about this:
Cost: We’d like the cost of a storage server to be about $0.030–$0.035 per gigabyte of storage (or less, of course). That includes the server and the drives inside. For example, using off-the-shelf Seagate 12 TB drives (model: ST12000NM0007) in a 6.0 Storage Pod costs about $0.032–$0.034/gigabyte, depending on the price of the drives on a given day.
Maintenance: Things should be easy to fix or replace — especially the drives.
Commodity Parts: Wherever possible, the parts should be easy to purchase, ideally from multiple vendors.
Racks: We’d prefer to keep using 42” deep cabinets, but make a good case for something deeper and we’ll consider it.
Possible Today: No DNA drives or other wistful technologies. We need to store data today, not in the year 2061.
Scale: Nothing in the solution should limit the ability to scale the systems. For example, we should be able to upgrade drives to higher densities over the next 5-7 years.
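As a worked example of the cost criterion above (the drive and chassis prices here are hypothetical placeholders for illustration, not quoted Backblaze prices; Storage Pod 6.0 holds 60 drives):

```python
# Cost-per-gigabyte check for a 60-drive build with 12 TB drives.
# The $350/drive and $3,500/chassis figures are assumptions.
drives, tb_per_drive = 60, 12
drive_price, chassis_price = 350.0, 3500.0   # hypothetical prices

total_cost = drives * drive_price + chassis_price   # $24,500
total_gb = drives * tb_per_drive * 1000             # 720,000 GB (decimal)
cost_per_gb = total_cost / total_gb
print(f"${cost_per_gb:.4f}/GB")  # $0.0340/GB, inside the target range
```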
Other than that there are no limitations. Any of the following acronyms, words, and phrases could be part of your proposed solution and we won’t be offended: SAS, JBOD, IOPS, SSD, redundancy, compute node, 2U chassis, 3U chassis, horizontally mounted drives, direct wire, caching layers, appliance, edge storage units, PCIe, fibre channel, SDS, etc.
The solution does not have to be a Backblaze one. As the list from earlier in this post shows, Dell, HP, and many others make high density storage platforms we could leverage. Make a good case for any of those units, or any others you like, and we’ll take a look.
What Will We Do With All Your Input?
We’ve already started by cranking up Backblaze Labs again and have tried a few experiments. Over the coming months we’ll share with you what’s happening as we move this project forward. Maybe we’ll introduce Storage Pod X or perhaps take some of those Storage Pod knockoffs for a spin. Regardless, we’ll keep you posted. Thanks in advance for your ideas and thanks for all your support over the past ten years.
Everyone is faced with the decision to raise prices at some point. It sucks, but in some cases you have to do it. Most companies, especially SaaS businesses, will look at their revenue forecasts, see a dip, run a calculation predicting the difference between the revenue increase and how many customers might leave, and then raise prices if the math looks favorable. Backblaze is not most companies — here’s how we did it.
In February of 2019, we made the announcement that one month later, our prices for our Personal Backup and Business Backup services would be going up by $1: our first price increase for our Computer Backup service since launching the service over a decade ago. What was announced in February 2019 actually started in December 2016, more than two years before the actual price increase would take effect. Why the long wait? We wanted to make sure that we did it right, not just mechanically (there’s a lot of billing code that has to change), but also in how we communicated with our customers and took them through the process. Oh, and a big reason for the delay was our main competitor leaving the consumer space, but more on that later.
In this post I’ll dive into our process for how we wanted the price increase to go, why we decided to build the extension program for existing customers, what went into our communication strategy, and what the reactions were to the price increase, including a look at churn numbers.
Is Raising Prices a Smart Move?
Raising prices, especially on a SaaS product where you’ve built a following, is never an easy decision. There are a ton of factors that come into play when considering what, if any, is the best course of action. Each factor needs to be considered individually and then as a whole to determine whether the price increase will actually benefit the business long term.
Why Raise Prices?
There are many reasons why companies raise prices. Typically it’s to either increase revenue or adjust to the market costs (the total cost associated with providing goods or services) in their sector. In our case it was the latter. In the price increase announcement, we discussed our reasoning in-depth, but it boiled down to two things: 1) adjusting to the market cost of storage (it was no longer decreasing at the rate it was when we first launched the product), and 2) we had spent years enhancing the service and making it easier for people to store more and more data with us, thereby increasing our costs.
One of the core values of Backblaze is to make backup astonishingly easy and affordable. Maintaining a service that is easy to use, has predictable pricing, and takes care of the heavy lifting for our customers was and is very important to us. When we started considering increasing prices we knew that we were going to be messing with the affordable part of that equation, but it was time for us to adjust to the market.
How to Raise Prices?
Most companies say that they love their customers, and many actually do. When we first started discussing exactly how we were going to raise prices we rejected the easiest path, which was to create a pricing table, update the website, and flip a switch. That was the easy way, but it was important for us to do something for the customers who have trusted us with their important files and memories throughout the years. We would still need to build out the pricing table (fun fact: from 2008 to 2017 our prices were hard-coded) but we started thinking about creating an extension program for our existing customers and fans.
The Extension Program
The extension program was a way for existing Backblaze users to prepay for one year of service, essentially delaying their price increase. They would buy 12 months of backup credits for $50 for each computer on their account, and after those credits were used up, the new prices would go into effect on their next renewal. It was a way to say thank you to our existing customers, but there was just one problem — it didn’t exist.
Building the extension program became a six-month project in and of itself. First we needed to build a crediting system. Then we needed to build the mechanism for our customers to actually buy that block of credits and have them applied to their account. Afterwards, we’d need FAQs, confirmation emails, and website changes to help explain the program to our customers. All of this became a full-time job for a handful of our most senior engineers before we were ready to put it through our QA testing. The long development time was a large point of consideration, but there were also financial implications that we had to weigh.
The extension program was great for customers, but good/bad for Backblaze. Why? By allowing folks to sign up for an extension we were essentially delaying their price increase, therefore delaying our ability to collect the additional revenue. While that was not ideal, the extension program brought in additional revenue from people purchasing those extensions, which was good. However, since those purchases were for credits, that additional revenue was deferred, and we still had to provide the service. So, while good from a cash flow perspective (we moved up about $2M in cash), we had to be very careful about how we accounted for and earmarked that money.
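The accounting wrinkle is easy to illustrate. Straight-line monthly recognition is an assumption here; the actual revenue-recognition policy may differ:

```python
# A $50 extension is cash up front but deferred revenue: it is earned
# one month of service at a time, over 12 months.
extension_price = 50.0
months = 12
recognized_per_month = extension_price / months   # about $4.17 per month

for m in (1, 6, 12):
    # Revenue still deferred after m months of delivered service
    # (clamped at zero to avoid float round-off going negative):
    deferred = max(extension_price - recognized_per_month * m, 0.0)
    print(f"deferred after month {m:2d}: ${deferred:.2f}")
```

Until the final month is delivered, part of that $2M in cash is still a liability on the books, which is why the money had to be carefully earmarked.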
Continuing to Provide Value
Extensions were only part of the puzzle. We didn’t want customers to feel like we were simply raising prices to line our pockets. Our goal is to continue making backup easy and affordable, and we wanted to show our fans that we were still actively developing the service. The simplest way to show forward progress is to make…forward progress. We decided that before the announcement date we needed to have a product release that substantially improved the backup service, and that’s when we started to plan Backblaze Version 5.0, what we dubbed the Rapid Access Release.
Adding to the development time of creating extensions were the projects to speed up both the backup and restore functions of the Backblaze app (those changes were good for customers, but actually increased our cost of providing the service). In addition, customers could now preview, access, and share backed up files by leveraging our B2 Cloud Storage product. To top it off, we strengthened our security by adding TOTP as a two-factor verification method. All those features were rolled up into the 5.0 release, which went out a few weeks before our price increase announcement, scheduled for August 22nd, 2017.
Another of our core values is open communication, which we equate to being as open as possible. If you have followed Backblaze over the years, you know that we’ve open sourced our storage pod design, shared our hard drive failure statistics, and have told entertaining stories about how we survived the Thailand drive crisis, and the time we were almost acquired. Most companies would never talk about topics like these, but we don’t shy away from hard conversations. In keeping with that tradition, we made the decision to be honest in our announcement about why we were raising prices (market costs and our own enhancements). We also made the decision to not mention one valid reason: inflation.
Our price back in 2008 was $5/month. Adjusted for inflation, that would be around $5.96 in 2019, so our price increase to $6 was right in line with the inflation rate. So why not talk about it? We wanted the conversation to be about our business and the benefits that we provide for our customers in building a service that they feel is a good value. Bringing up global economics seemed like an odd tactic, considering that we were barely keeping up with inflation, and ultimately customers got there on their own.
Lol, you guys kill me. A $1 price increase over 10 years? You aren’t even keeping up with inflation
When reading @backblaze blog post justifying their price increase you would think they were changing dramatically, an increase of $1 / month since 2008 barely exceeds inflation for a service that keeps getting better!
We started down the increase path in 2016. In 2017, we designed and released version 5.0, we built and tested our extension program, we lined up our blog post, we wrote up FAQs, and we created customer service emails to let people know what was happening. After all that, we were ready to announce the following month’s price increase at 10am Pacific Time on August 22nd, 2017.
On August 22nd, at 8am, we pulled the plug and cancelled the announcement.
Early that morning, news broke that our main competitor, Crashplan, was leaving the consumer backup space. You may be saying: wait a minute, a main competitor is leaving the market and you have a mechanism to increase your prices already in place; that sounds like the perfect day to raise prices! Nope. Another one of our values is to be fair and good. Raising prices on a day when consumers found out that there were fewer choices in the market felt predatory and ultimately gross. Once we saw the news, we got in a room, quickly decided that we couldn’t raise prices for at least six months, and instead wrote a quick blog post inviting orphaned customers to give us a try.
The year following Crashplan’s announcement we saw a huge increase in customers, which is simultaneously good and bad. It was good because of the increased revenue from our newfound customers, but less ideal from an operations perspective, as we were not anticipating an influx of customers. In fact, we were anticipating an increase in churn coinciding with our cancelled price increase announcement. That meant we had to scramble to deploy enough storage to house all of the new incoming data.
We wouldn’t revisit the price increase until a year after the Crashplan announcement.
That decision was not without financial repercussions. Put simply, we gave up $10 per customer per year. And, the decision affected not only our existing customers on August 22nd, but also all of those we would gain over the coming months and years. While this doesn’t factor in potential churn and other variables, when the size of our customer base is fully accounted for, the revenue left on the table was significant. In purely financial terms, raising prices on the day when the industry started having fewer options would have been the right financial decision, but not the right Backblaze decision.
Hindsight Is 20/20
Looking back, releasing version 5.0 earlier that month was a happy accident. What originally was intended to show forward progress to our existing customers was now being looked at by a lot of new customers and prospects as well. The speed increase that we built into the app as part of the release made it possible for people exiting Crashplan’s service to transition to us and get fully backed up more quickly. Because these were people who understood the importance of keeping a backup, having no downtime in their coverage was a huge benefit.
Picking Up Where We Left Off — The Price Increase
Around August of 2018, we decided that enough time had passed and we were comfortable dusting off our price increase playbook. The process proved harder than we thought, as we uncovered edge cases that we had missed the first time around (another happy accident).
The Problem With Long Development Gaps
The new plan was to announce the price increase in December and raise prices in January 2019. When we started unpacking our playbook and going over the plan, we realized that the simple decisions we had made over a year ago were either flawed or outdated. A good example was how we would treat two-year licenses. At one point in the original project spec, we decided that we would simply slide the renewal date by one year for anyone with a two-year license who purchased an extension, pushing their actual renewal date out a year. Upon thinking about it again, we realized this would cause a lot of customer issues, and we had to redo the entire plan for two-year customers, a large part of our install base.
While we did have project sheets and spec documents, we also realized that we had lost a lot of the in-the-moment knowledge that comes with project development. We found ourselves constantly saying things like, “Why did we make this choice?” and “Are we sure that’s what we meant here?” The long gap between the original project start date and the day we picked it back up meant that the ramp-up time for the extension program was a lot longer than we expected. We realized that we wouldn’t be able to announce the price increase in December with prices going up at the start of the year: we needed more time, both to QA the extension program and to create version 6.0.
Part of the original playbook was to provide value for customers by releasing version 5.0, and we wanted to stick to that plan. We started thinking about what it would take to have another meaningful release, and version 6.0, the Larger, Longer, Faster, Better release, was born.
First, we doubled the size of physical media restores, allowing people to get back more of their data more quickly and affordably (an oft-requested change, and another example of a good-for-the-customer feature that incurs extra costs for Backblaze). We leveraged B2 Cloud Storage again and built in functionality that allows people to save their backed up data to B2, building off of the previous year’s preview and share capabilities. We made the service more efficient, increased backup speeds again, and added network management tools. Looking past the Mac and PC apps, we also revamped our mobile offerings by refreshing our iOS and Android apps. All of that added development time again, and our new timetable for the price increase was a February 2019 announcement, with the increase going into effect in March.
Wait a Minute…
You might be saying, you released version 5.0 in a run-up to a price increase, then scrapped it, and then released version 6.0 in a run-up to a price increase. Does that mean that every new version number increase will be followed by a price increase? Absolutely not. The first five versions of Backblaze didn’t precipitate a price increase, and we’re already hard at work on version 7.0 with no planned price increases on the horizon.
Price Increase Announcement
We’ve all been subjected to price increases that were clandestine, then abruptly announced and put into effect the same day, or were not well explained. That never feels great and we really wanted to give customers one month of warning before the prices actually increased. That would give people time to buy the extensions that we worked so hard to build. Conversely, if people were on monthly licenses, or had a renewal date coming up after the price increase went into effect, it would give them an opportunity to cancel their service ahead of the increase. Of course we didn’t want anyone to leave, but realized that any change in our subscription plans would cause a stir and people who were more price-sensitive would likely have second thoughts about renewing.
Another goal was to be as communicative as possible. We wanted our customers to know exactly what we were doing, why we were doing it, and we didn’t want anyone to fall through the cracks of not knowing that this was happening. That meant writing a blog post, creating emails for all Personal Backup customers and Group administrators, and even briefing some members of the press and reviews sites who’d need to update their pricing tables. It might seem silly to pitch the press on a price increase (something that is usually a negative event), but we’ve had some wonderful relationships develop with journalists over the years and it felt like the right thing to do to let them know ahead of time.
Once all of those things were back in place, it was time to press go, this time for real. The price increase was announced on February 12th and went into effect March 12th.
The Reaction & Churn Analysis
Customer Reaction — Plan for the Worst, Hope for the Best
We didn’t expect the response to be positive. Planning is great, but you never know exactly what’s going to happen until it’s actually happening. We were ready with support responses, FAQs, and a communications plan in case the response was overwhelmingly negative, but we were lucky: that didn’t turn out to be the case.
Customers wrote to us and said, “Finally.” Some people went out of their way to express how relieved they were that we were finally going to raise prices, concerned that we had been burning cash over the years. Other responses made it clear that we had communicated the necessity for the increase and priced it correctly, saying that a $1 increase after 12 years is more than fair.
@GlebBudman Thanks for explaining so clearly and respectfully to your customers online. I am OK with the price increase. A healthy @backblaze means a healthy location for my backups. Well worth the price.
@GlebBudman As a @backblaze customer since May, 2015, I received the notice today of the product price increase by $1 per month to $6. A very fair price adjustment for the service your company provides. No worries. Thank you for such an excellent product. – David | Boston, MA
When the press picked up the story, they had similar sentiments. Yes, it was news that Backblaze was increasing prices, but the reports were positive and very fair. One of the press members that we sent the news to early responded with: “Seems reasonable…”
There were of course some people who were angry and annoyed, and while some of our customers did come to our defense, we did see an increase in churn.
Churn Rate Analysis
Over the next few months we monitored churn carefully to see the true impact on our existing customers from the price increase.
Every time a person leaves Backblaze we send one final email thanking them for their time with us, wishing them well, and asking if they have any feedback. Those emails go directly into our ticketing system where I read all of them every month to get a picture of why people are leaving Backblaze. Sometimes they are reasons we cannot address, but if we can, they go on our roadmap. After the price increase we’ve seen about a 30 percent increase in people saying that they are leaving for billing reasons. It makes sense that more people are citing the price increase as they leave Backblaze, but we’ve had a lot of positive feedback as well from the issues we addressed in versions 5.0 and 6.0.
What about the people who didn’t necessarily write back to our email? We dove deep into the analytics and found that our typical consumer backup service churn rate six months before announcing the price increase was about 5.38 percent. The six months after announcement saw a churn rate of 5.75 percent, which indicates an increase in churn of about 7 percent. In our estimates we anticipated that number being a bit higher for the first year and then coming back down to historical averages after the bulk of our customers had their first renewal at the new price.
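The arithmetic behind that figure is a one-liner (a quick sketch using only the two churn rates quoted above):

```python
# Churn rates from the paragraph above, in percent per month.
before = 5.38  # six months before the price increase announcement
after = 5.75   # six months after the announcement

# Relative increase in churn, not the absolute difference in points.
relative_increase = (after - before) / before * 100
print(f"Relative increase in churn: {relative_increase:.1f}%")  # ~6.9%, i.e. "about 7 percent"
```

Note the distinction: churn rose by 0.37 percentage points, which is a roughly 7 percent relative increase.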
New Customer Acquisition
People leaving the service after you increase prices is only half of the equation. The other half lies in new customer acquisition. In a competitive market, raising prices can cause prospective customers to look elsewhere when comparing products. This number was a bit hard for us to calculate, since the year prior our biggest competitor for our consumer service had gone out of business. The best comparable we had was to look at 2017 versus 2019. We went back to the historical data and found that, even with the increase and six months afterwards, the two-year growth rate of our Personal Backup service was a healthy 42 percent.
Lessons Learned From Raising Prices
We learned a lot during this whole process. One of the most important lessons is to treat your customers well and not take them for granted. At the outset we’d sometimes say things like, “it’s only a dollar, who is going to care,” and we’d quickly nip those remarks in the bud and take the process seriously. A dollar may not seem like much, but to a lot of people, including our global customers, it was an increase that they felt, as evidenced by churn going up by 7 percent.
Some might think, well, a 7 percent increase in churn isn’t so bad; you could have raised prices even more. But that’s the wrong lesson to take away. Any changes to the plan we had in place could have yielded very different results.
The extension program was a hit with our existing customers and a welcome option for many. Taking the time to build it resulted in over 30,000 Backblaze Personal Backup accounts buying extensions, which generated about $1.8M in revenue. There is a flip side to this. If those 30,000 accounts had simply renewed at the increased price, we would have made $2.2M, meaning we gave up about $366,000 of revenue. But that’s only if you assume that all of those customers would have renewed. Some may have churned, and by buying an extension they signaled to us that they were willing to stay with us even after the price increase went into effect for them.
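As a sanity check on those figures (only the 30,000 accounts, $1.8M, and $366,000 numbers are stated; the per-account averages below are derived for illustration):

```python
# Figures quoted in the paragraph above.
accounts = 30_000
extension_revenue = 1_800_000  # what extensions actually brought in
lost_revenue = 366_000         # revenue left on the table
counterfactual = extension_revenue + lost_revenue  # ~$2.2M if all had renewed at new prices

# Derived per-account averages (illustrative, not stated in the post).
avg_old = extension_revenue / accounts  # ~$60 per account at old pricing
avg_new = counterfactual / accounts     # ~$72 per account at new pricing
uplift = (avg_new / avg_old - 1) * 100  # ~20%, consistent with the $5 -> $6 monthly increase
print(f"${avg_old:.0f} -> ${avg_new:.0f} per account, about a {uplift:.0f}% uplift")
```

The derived ~20% per-account uplift lines up with the $5-to-$6 price change, which suggests the quoted figures are internally consistent.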
As a customer, def happy with the way you handled this. Appreciated the opportunity to buy an extension at the old pricing, comms was clear & direct, and the increase is reasonable. Cheers!
Having a good foundation of community and an open dialog with your customers is helpful. When we made the announcement, we weren’t met with the anger that we were somewhat anticipating. In large part this was due to our customers trusting us, and knowing that this was not something we were doing because we simply wanted to make a few extra bucks.
When your community trusts you, they are willing to hear you out even when the news is not great. Build a good rapport with your customers and it will hopefully buy you the benefit of the doubt once or twice, but be careful not to abuse that privilege.
Similar to having a good community relationship, explaining the why of what is happening helps educate customers and continues to strengthen your connection with them. While discussing the price increase on reddit and in the blog post comments, the people who have grown accustomed to our answering questions were comfortable asking some pretty hard ones, and they appreciated when we responded with thoughtful, long-form answers. I cannot stress enough how much we enjoy the conversations we have on these platforms. We learn a lot about who is using Backblaze, what their pain points are, and whether there’s something we can do to help them. These conversations really do affect how we create and consider our product roadmap.
So many companies raise their prices chasing profits, keeping it on the low, so it was refreshing to get an email from @backblaze explaining why they have decided to increase theirs for the first time ever – transparent, decent, fair https://t.co/TDG5mknSLp #howbusinessshouldbe
Rarely does anyone want to increase their prices — especially when it affects customers who have been with them for a decade. Many companies don’t want to discuss their decision making process or playbooks, but there are a lot of organizations that face the need to raise prices. Unfortunately, there are few resources to help them thread the needle between something they have to do, and something that their current and future customers will understand and accept.
I wanted to share our journey through our price increase process in hopes that people find it both informative and interesting. Thinking about your customers first may sound like a trope, but if you spend the time to really sit back and consider their reactions and what you can do as a way to thank your existing customers or clients, you can be successful, or at the very least mitigate some of the downside.
If you’ve ever raised prices at your company, or have examples of companies that have done a great job (or a bad job), we’d love to hear those examples in the comments below!
I never considered myself to be extremely techy. My family and friends would occasionally come to me with computer problems that I could solve with the help of Google and FAQ pages, but I would not go much further than that.
When I came across Backblaze’s job posting for a marketing position, I applied mostly on a lark. My background looked similar to the job description, but I never expected to hear anything from them. When I received the email that I had gotten an interview with Backblaze, my initial thought was, how? Backblaze was the type of company I feared when first arriving in the Bay Area from Ohio. Worry began to bubble up in me about being in a room filled with people who were all smarter or more experienced than me. My family teased me that I would walk into my first day and it would be an episode of Punk’d.
Silicon Valley carries a stigma for almost everyone who isn’t from the Bay Area. We assume it will be filled with competitive geniuses and be too expensive to survive. “You may be Ohio smart, but that’s a different kind of smart,” is something I have heard in actual conversations. And I, too, had similar thoughts as I considered trying to fit in at a startup.
Having watched the HBO show, Silicon Valley, my perspective of how my future coworkers would act could not have been more different from reality. The show portrays Silicon Valley workers as smug, arrogant, anti-social coders who are ready to backstab their coworkers on their way to the top of the industry. At Backblaze, I have found the opposite to be true: everyone has been supportive, fun to be around, and team-oriented.
Now that I live in Silicon Valley, rather than watching it, I have to say I let the intimidation get to me. One of my favorite quotes that helps me during times of high stress is by the Co-Founder of Lumi Labs, Marissa Mayer, in reference to how she’s succeeded in her career, “I always did something I was not ready to do. I think that’s how you grow.” That’s an important thing to remember when you are starting a new job, adventure, or experience: On the other side of the challenge, no matter how it goes, you’ll have grown. Here are some of the things that I have learned during my first few weeks of growth at Backblaze and living in the Bay Area. Hopefully, they’ll help you to try something you’re not ready for, too.
Nine Lessons Learned
Don’t be Thrown by Big Words
Write them down. Google is your best friend. There may be words, companies, software, acronyms, and a bunch of other things that come up in meetings that you have never heard before. Take notes, look them up, and research how they apply to your company or your role. Most of the time it’s something you might have known about but didn’t know the correct word or phrase for.
No One Understands Your Thought Process
Show your work. Something that’s hard when it comes to talking to your boss or your team is that they cannot see inside your brain. Talk them through how you got to where you are with your thoughts and conclusions. There are plenty of times where I have had to remind myself to over-explain an idea or a thought so the people around me could understand and help.
You Don’t Have to Know Everything
Own up to your lack of knowledge. This one is tough because when you are new to a position you have the inclination to not lift the veil and reveal yourself as someone who does not know something. This could be something as big as not knowing how a core feature works or as small as not knowing how the coffee machine works. When you are new to a company you are never going to walk in and know exactly how everything works. At the moment you don’t understand something, admit it and most people you work with will help or at least point you in the direction of where and how to learn.
Living in Someone’s Backyard in an In-Law Suite is Normal
Look everywhere before choosing where to live. Moving to Silicon Valley while trying to establish a stable income sounds impossible, and indeed it is very hard. When I talked to people before my move, everyone would say, “ugh, the housing payments!” This was not encouraging to hear. But that doesn’t mean there aren’t creative ways to lower your housing costs. While living with roommates to drive housing costs down, I found a family that wanted to make a little extra money and had an unused in-law suite. While it’s not owning your own home or having a full-size apartment to yourself, it’s different and that can be fun! Plus, like with roommates, you never know what connections you will make.
Not Understanding the Software Doesn’t Mean You Don’t Get It
You have the experience, use it. I came to Backblaze with a very surface-level idea of coding, no idea about the different ways to back up my computer, and no knowledge of how the cloud actually works, but I did understand that it was important to have backups. Just because you don’t understand how something works initially doesn’t mean you don’t understand the value it has. You can use that understanding to pitch ideas and bring an outside perspective to the group.
Talk to People with Important Titles
They all have been in your shoes. The CEOs, presidents, directors, and managers of the world all have been in your position at one point. Now they hold those titles, so obviously they did something right. Get to know them and what they enjoy. They are human and they would love to share their wisdom with you, whether it’s about the company, their favorite food places nearby, or where they go to relax.
Don’t Let Things Slip
Follow up. If someone said they were going to show you something in a meeting or in the hallway, send them a note and see if you should schedule a chat. Have a question during an important meeting that you didn’t want to ask? Follow up! Someone mentioned they knew of a class that could teach something you wanted to learn? Make sure they send you a link! All work environments can feel busy but most people would rather you follow up with them rather than let them forget about something that might be important later on.
Soak In the Environment
Be a fly on the wall. Watch how the office operates and how people talk to each other. Get an idea of when people leave for lunch, when to put your headphones on, and what’s normal to wear around the office. Also, pay attention to who talks in meetings and what it is like to pitch an idea. Observing before fully immersing yourself helps you figure out where your experience fits in and how you can best contribute.
Know Yourself and Know Your Worth
You can figure it out. It may take time, patience, research, and understanding to stand confidently in a room full of experts in the field and pitch ideas. You’ve done it before. Maybe when you were little and asked your parents to take the training wheels off your bicycle? It took a few falls but you figured it out and you can do it again.
We hope that this was a little bit helpful or informative or at least entertaining to read! Have you ever joined a company in an industry you weren’t familiar with? What are some tips or hints that you wish you had known? Share them in the comments below!
Big news: Our first European data center, in Amsterdam, is open and accepting customer data!
This is our fourth data center (DC) location and the first outside of the western United States. As longtime readers know, we have two DCs in the Sacramento, California area and one in the Phoenix, Arizona area. As part of this launch, we are also introducing the concept of regions.
When creating a Backblaze account, customers can choose whether that account’s data will be stored in the EU Central or US West region. The choice made at account creation time will dictate where all of that account’s data is stored, regardless of product choice (Computer Backup or B2 Cloud Storage). For customers wanting to store data in multiple regions, please read this knowledge base article on how to control multiple Backblaze accounts using our (free) Groups feature.
Whether you choose EU Central or US West, your pricing for our products will be unchanged:
For B2 Cloud Storage — it’s $0.005/GB/Month. For comparison, storing your data in Amazon S3’s Ireland region will cost ~4.5x more
For Computer Backup — $60/Year/Computer is the yearly cost of our industry leading, unlimited data backup for desktops/laptops
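To put the B2 comparison in concrete terms, here is a rough monthly cost sketch for 1 TB of storage. The B2 rate comes from the list above; the S3 Ireland rate (about $0.023/GB/month, the first-tier S3 Standard price in eu-west-1 at the time) is an assumption for illustration:

```python
# Monthly storage cost for 1 TB (1,000 GB), B2 vs. S3 Ireland.
gb = 1_000
b2_rate = 0.005  # $/GB/month, B2 Cloud Storage (same in EU Central and US West)
s3_rate = 0.023  # $/GB/month, assumed S3 eu-west-1 Standard first-tier rate

print(f"B2: ${gb * b2_rate:.2f}/month, S3 Ireland: ${gb * s3_rate:.2f}/month")
print(f"S3 Ireland costs about {s3_rate / b2_rate:.1f}x more")  # ~4.6x, in line with ~4.5x above
```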
Later this week we will be publishing more details on the process we undertook to get to this launch. Here’s a sneak preview:
Wednesday, August 28: Getting Ready to Go (to Europe). How do you even begin to think about opening a DC that isn’t within any definition of driving distance? For the vast majority of companies on the planet, simply figuring out how to get started is a massive undertaking. We’ll be sharing a little more on how we thought about our requirements, gathered information, and the importance of NATO in the whole equation.
Thursday, August 29: The Great European (Non) Vacation. With all the requirements done, research gathered, and preliminary negotiations held, there comes a time when you need to jump on a plane and go meet your potential partners. For John & Chris, that meant 10 data center tours in 72 hours across three countries — not exactly a relaxing summer holiday, but vitally important!
Friday, August 30: Making a Decision. After an extensive search, we are very pleased to have found our partner in Interxion! We’ll share a little more about the process of narrowing down the final group of candidates and selecting our newest partner.
Q: Does the new DC mean Backblaze has multi-region storage? A: Yes, by leveraging our Groups functionality. When creating an account, users choose where their data will be stored. The default option will store data in US West, but to choose EU Central, simply select that option in the pull-down menu.
If you create a new account with EU Central selected and have an existing account that’s in US West, you can put both of them in a Group, and manage them from there! Learn more about that in our Knowledge Base article.
Q: I’m an existing customer and want to move my data to Europe. How do I do that? A: At this time, we do not support moving existing data between Backblaze regions. While that is on our roadmap, we do not have an estimated release date for the functionality. However, any existing customer can create a new account in the EU Central region and upload data to it, then either keep or delete the previous Backblaze account in US West. Customers with multiple accounts can administer them via our Groups feature; for more details, please see this Knowledge Base article.
Q: Finally! I’ve been waiting for this and am ready to get started. Can I use your rapid ingest device, the B2 Fireball? A: Yes! However, as of the publication of this post, all Fireballs will ship back to one of our U.S. facilities for secure upload (regardless of account location). By the end of the year, we hope to offer Fireball support natively in Europe (so a Fireball with a European customer’s data will never leave the EU).
Q: What are my payment options? A: All payments to Backblaze are made in U.S. dollars. To get started, you can enter your credit card within your account.
Q: What’s next? A: We’re actively working on region selection for individual B2 Buckets (instead of region selection on an account basis), which should open up a lot more interesting workflows! For example, customers who want to can create geographic redundancy for data within one B2 account (and those who don’t want to set that up can sleep well knowing they already have 11 nines of durability).
We like to develop the features and functionality that our customers want. The decision to open up a data center in Europe is directly related to customer interest. If you have requests or questions, please feel free to put them in the comment section below.
I first met Laura D’Antoni when we were shooting B2 Cloud Storage customer videos for Youngevity and Austin City Limits. I enjoyed talking about her filmmaking background and was fascinated by her journey as a director, editor, and all around filmmaker. When she came to the Backblaze office to shoot our Who We Are and What We Do video, I floated the idea of doing an interview with her to highlight her journey and educate our blog readers who may be starting out or are already established in the filmmaking world. We’ve finally gotten around to doing the interview, and I hope you enjoy the Q&A with Laura below!
Q: How did you get involved in visual storytelling? My interest in directing films began when I was 10 years old. Back then I used my father’s Hi8 camera to make short films in my backyard using my friends as actors. My passion for filmmaking continued through my teens and I ended up studying film and television at New York University.
Q: Do you have a specialty or favorite subject area for your films? I’ve always been drawn to dramatic films, especially those based on real life events. My latest short is a glimpse into a difficult time in my childhood, told in reverse Memento-style from a little girl’s perspective.
Most of my filmmaking career I actually spent in the documentary world. I’ve directed a few feature documentaries about social justice and many more short docs for non-profit organizations like the SPCA.
Q: Who are your visual storytelling inspirations? What motivates you to tell your stories? The film that inspired me the most when I was just starting out was The Godfather: Part II. The visuals and the performances are incredible, and probably my father being from Sicily really drew me in (the culture, not the Mafia, ha!). Lately I’ve been fascinated by the look of The Handmaid’s Tale, and tried to create a similar feel for my film on a much, much tinier budget. As far as what motivates me, it’s the love for directing. Collaborating with a team to make your vision on paper a reality is an incredible feeling. It’s a ton of work that involves a lot of blood, sweat, and tears, but in the end you’ve made a movie! And that’s pretty cool.
Q: What kind of equipment do you take on shoots? Favorite camera, favorite lens? For shoots I bring lights, cameras, tripods, a slider and my gimbal. I use my Panasonic EVA-1 as my main camera and also just purchased the Panasonic GH5 as B-cam to match. Most of my lenses are Canon photo lenses; the L-glass is fantastic quality and I like the look of them. My favorite lens is the Canon 70-200mm f2.8.
Q: How much data per day does a typical shoot create? If I’m shooting in 4K, around 150GB.
Q: How do you back up your daily shoots? Copy to a disk? Bunch of disks? I bring a portable hard drive and transfer all of the footage from the cards to that drive.
Q: Tell us a bit about your workflow from shooting to editing. Generally, if the whole project fits onto a drive, I’ll use that drive to transfer the footage and then edit from it as well. If I’ve shot in 4K then the first step before editing is creating proxies in Adobe Premiere Pro of all of the video files so it’s not so taxing on my computer. Once that’s done I can start the edit!
Q: How do you maintain your data? If it’s a personal project, I have two copies of everything on separate hard drives. For clients, they usually have a backup of the footage on a drive at their office. The data doesn’t really get maintained, it just stays on the drive and may or may not get used again.
Q: What are some best practices for keeping track of all your videos and assets? I think having a Google Docs spreadsheet and numbering your drives is helpful so you know what footage/project is where.
Q: How has having a good backup and archive strategy helped in your filmmaking? Well, I learned the hard way to always back up your footage. Years ago while editing a feature doc, I had an unfortunate incident with PluralEyes software and it ate the audio of one of my interview subjects. We ended up having to use the bad camera audio and nobody was happy. Now I know. I think the best possible strategy really is to have it backed up in the cloud. Hard drives fail, and if you didn’t back that drive up, you’re in trouble. I learned about a great cloud storage solution called Backblaze when I created a few videos for them. For the price it’s absolutely the best option and I plan on dusting off my ancient drives and getting them into the cloud, where they can rest safely until someday someone wants to watch a few of my very first black and white films!
Q: What advice do you have for filmmakers and videographers just starting out? Know what you want to specialize in early on so you can focus on just that instead of many different specialties, and then market yourself as just that.
It also seems that the easiest way into the film world (unless you’re related to Steven Spielberg or any other famous person in Hollywood) is to start from the bottom and work your way up.
Also, remember to always be nice to the people you work with, because in this industry that PA you worked with might be a big time producer before you know it.
Q: What might our readers find surprising about challenges you face in your work? In terms of my directing career, the most challenging thing is to simply be seen. There is so much competition, even among women directors, and getting your film in front of the right person that could bring your career to the next level is nearly impossible. Hollywood is all about who you know, not what you know, unfortunately. So I just keep on making my films and refuse to give up on my dream of winning an Academy Award for best director!
Q: How has your workflow changed since you started working with video? I only worked with film during my college years. It definitely teaches you to take your time and set up that shot perfectly before you hit record, or triple check where you’re going to cut your film before it ends up on the floor and you have to crawl around and find it to splice it back in. Nowadays that’s all gone. A simple Command-Z shortcut and you can go back several edits on your timeline, or you can record countless hours on your video camera because you don’t have to pay to have it developed. My workflow is much easier, but I definitely miss the look of film.
At the end of Q2 2019, Backblaze was using 108,660 hard drives to store data. For our evaluation we remove from consideration those drives that were used for testing purposes and those drive models for which we did not have at least 60 drives (see why below). This leaves us with 108,461 hard drives. The table below covers what happened in Q2 2019.
Notes and Observations
If a drive model has a failure rate of 0 percent, it means there were no drive failures of that model during Q2 2019 — lifetime failure rates are later in this report. The two drives listed with zero failures in Q2 were the 4 TB and 14 TB Toshiba models. The Toshiba 4 TB drive doesn’t have a large enough number of drives or drive days to be statistically reliable, but only one drive of that model has failed in the last three years. We’ll dig into the 14 TB Toshiba drive stats a little later in the report.
There were 199 drives (108,660 minus 108,461) that were not included in the list above because they were used as testing drives or we did not have at least 60 of a given drive model. We now use 60 drives of the same model as the minimum number when we report quarterly, yearly, and lifetime drive statistics as there are 60 drives in all newly deployed Storage Pods — older Storage Pod models had a minimum of 45.
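For readers who want to reproduce the math behind the table: the failure rates are annualized from drive days, where one drive running for one day contributes one drive day. A minimal sketch of the calculation (the inputs below are illustrative, not taken from the Q2 2019 table):

```python
def annualized_failure_rate(failures, drive_days):
    """Annualized failure rate in percent: failures per drive year."""
    drive_years = drive_days / 365.0
    return 100.0 * failures / drive_years

# 10 failures over 1,000 drive years of service:
print(f"{annualized_failure_rate(10, 365000):.2f}%")  # 1.00%
```

This is also why small populations are excluded: with only a handful of drives in service, a single failure swings the annualized rate wildly.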
2,000 Backblaze Storage Pods? Almost…
We currently have 1,980 Storage Pods in operation. All are version 5 or version 6, as we recently gave away nearly all of the older Storage Pods to folks who stopped by our Sacramento storage facility. Nearly all, as we have a couple in our Storage Pod museum. There are currently 544 version 5 pods, each containing 45 data drives, and 1,436 version 6 pods, each containing 60 data drives. The next time we add a Backblaze Vault, which consists of 20 Storage Pods, we will have 2,000 Backblaze Storage Pods in operation.
Goodbye Western Digital
In Q2 2019, the last of the Western Digital 6 TB drives were retired from service. The average age of the drives was 50 months. These were the last of our Western Digital branded data drives. When Backblaze was first starting out, the first data drives we deployed en masse were Western Digital Green 1 TB drives. So it is with a bit of sadness that we see our Western Digital data drive count go to zero. We hope to see them again in the future.
Hello “Western Digital”
While the Western Digital brand is gone, the HGST brand (owned by Western Digital) is going strong as we still have plenty of the HGST branded drives, about 20 percent of our farm, ranging in size from 4 to 12 TB. In fact, we added over 4,700 HGST 12 TB drives in this quarter.
This just in: rumor has it there are twenty 14 TB Western Digital Ultrastar drives being readied for deployment and testing in one of our data centers. It appears Western Digital has returned: stay tuned.
Goodbye 5 TB Drives
Back in Q1 2015, we deployed 45 Toshiba 5 TB drives. They were the only 5 TB drives we deployed as the manufacturers quickly moved on to larger capacity drives, and so did we. Yet, during their four plus years of deployment only two failed, with no failures since Q2 of 2016 — three years ago. This made it hard to say goodbye, but buying, stocking, and keeping track of a couple of 5 TB spare drives was not optimal, especially since these spares could not be used anywhere else. So yes, the Toshiba 5 TB drives were the odd ducks on our farm, but they were so good they got to stay for over four years.
Hello Again, Toshiba 14 TB Drives
We’ve mentioned the Toshiba 14 TB drives in previous reports; now we can dig in a little deeper, given that they have been deployed for almost nine months and we have some experience working with them. These drives got off to a bit of a rocky start, with six failures in the first three months of deployment. Since then, there has been only one additional failure, with no failures reported in Q2 2019. The result is that the lifetime annualized failure rate for the Toshiba 14 TB drives has decreased to a very respectable 0.78%, as shown in the lifetime table in the following section.
Lifetime Hard Drive Stats
The table below shows the lifetime failure rates for the hard drive models we had in service as of June 30, 2019. This is over the period beginning in April 2013 and ending June 30, 2019.
The Hard Drive Stats Data
The complete data set used to create the information in this review is available on our Hard Drive Test Data web page. You can download and use this data for free for your own purposes. All we ask are three things: 1) You cite Backblaze as the source if you use the data, 2) You accept that you are solely responsible for how you use the data, and 3) You do not sell this data to anyone; it is free. Good luck, and let us know if you find anything interesting.
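The dataset is published as daily CSV snapshots, one row per drive per day. The sketch below parses a tiny inline sample with Python's standard library and rolls it up into per-model drive days and failures; the column names match the published files as we understand them, but verify them against your download.

```python
import csv
import io
from collections import defaultdict

# Three rows in the dataset's shape: one row per drive per day, with
# `failure` set to 1 only on the day a drive fails. Values are made up.
sample = """date,serial_number,model,capacity_bytes,failure
2019-06-29,S1,TOSHIBA MD04ABA400V,4000787030016,0
2019-06-30,S1,TOSHIBA MD04ABA400V,4000787030016,0
2019-06-30,S2,TOSHIBA MD04ABA400V,4000787030016,1
"""

drive_days = defaultdict(int)
failures = defaultdict(int)
for row in csv.DictReader(io.StringIO(sample)):
    drive_days[row["model"]] += 1            # one row = one drive day
    failures[row["model"]] += int(row["failure"])

for model, days in drive_days.items():
    afr = 100.0 * failures[model] / (days / 365.0)
    print(model, days, failures[model], f"{afr:.1f}%")
```

On three drive days the annualized rate is meaningless, which is exactly the small-sample problem the 60-drive minimum guards against; the real files run to millions of rows per quarter.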
Once the hot new marketing strategy, content marketing has lost some of its luster. If you follow marketing newsletters and blogs, you’ve likely even seen the claim that content marketing is dead. Some say it’s no longer effective because consumers are oversaturated with content. Others feel that much of content marketing is too broad a strategy and it’s more effective to target those who can directly affect the behavior of others using influencer marketing. Still others think that the hoopla over content marketing is over and money is better spent on keyword purchases, social media, SEO, and other techniques to direct customers into the top of the marketing funnel.
Backblaze has had its own journey of discovery in figuring out which kind of marketing would help it grow from a small backup and cloud storage business to a serious competitor to Amazon, Google, Microsoft, and other storage and cloud companies. Backblaze’s story provides a useful example of how a company came to content marketing after rejecting, or not finding success with, a number of other marketing approaches. Content marketing worked for Backblaze in large part due to the culture of the company, which will reinforce our argument a little later that content marketing is a lot about your company culture. But first things first: what exactly is content marketing?
What is Content Marketing?
Content marketing is the practice of creating, publishing, and sharing content with the goal of building the reputation and visibility of your brand.
The goal of content marketing is to get customers to come to you by providing them with something they need or enjoy. Once you have their attention, you can promote (overtly or covertly) whatever it is you wish to sell to them.
Conceptually, content marketing is similar to running a movie theatre. The movie gets people into the theatre where they can be sold soft drinks, popcorn, Mike & Ikes and Raisinets, which is how theatre owners make most of their money, not from ticket sales. Now you know why movie theatre snacks and drinks are so expensive; they have to cover the cost of the loss leader, the movie itself, as well as give the owner some profit.
The Growth of Content Marketing
Marketing in recent years has increasingly become a game of metrics. Marketers today have access to a wealth of data about customer and marketing behavior and an ever growing number of apps and tools to quantify and interpret that data. We have all this data because marketing has become largely an online game and it’s fairly easy to collect behavioral data when users interact with websites, emails, webinars, videos, and podcasts. Metrics existed before for conventional mail campaigns and the like, and focus groups provided some confirmation of what marketers guessed was true, but it was generally a matter of manually counting heads, responses, and sales. Now that we’re online, just adding snippets of code to websites, apps, and emails can provide a wealth of information about consumers’ behavior. Conversion, funnel, nurturing, and keyword ranking are in the daily lexicon of marketers who look to numbers to demystify consumer behavior and justify the funding of their programs.
A trend contrary to marketing metrics grew in importance alongside the metrics binge and that trend is modern content marketing. While modern content marketing takes advantage of the immediacy and delivery vehicles of the internet, content marketing itself is as old as any marketing technique. It isn’t close to being the world’s oldest profession, but it does go back to the first attempts by humans to lure consumers to products and services with a better or more polished pitch than the next guy.
Benjamin Franklin used his annual Poor Richard’s Almanack as early as 1732 to promote his printing business and made sure readers knew where his printing shop was located. Farming equipment manufacturer John Deere put out the first issue of The Furrow in 1895. Today it has a circulation of 1.5 million in 40 countries and 12 different languages.
One might argue that long before these examples, stained glass windows in medieval cathedrals were another example of content marketing. They presented stories that entertained and educated and were an enticement to bring people to services.
Much later, the arrival of the internet and the web, with their fast content creation and easy consumer targeting, fueled the rapid growth of content marketing. We now have many more types of media beyond print suitable for content marketing, including social media, blogs, video, photos, and podcasts, which has enabled content marketing to gain even more power and importance.
What’s the Problem With So Much Content Marketing?
If content marketing is so great, why are we hearing so many statements about content marketing being dead? My view is that content marketing isn’t any more dead now than it was in Benjamin Franklin’s time, and people aren’t going to stop buying popcorn at movie theaters. The problem is that there is so much content marketing that doesn’t reach its potential because it is empty and meaningless.
Unfortunately, too many people who are running content marketing programs have the same mindset as the people running poor metrics marketing programs. They look at what’s worked in the past for themselves or others and assume that repeating an earlier campaign will be as successful as the original. The approach that’s deadly for content marketing is to think that since a little is good, more must be better, and more of the very same thing.
When content marketing isn’t working, it’s usually not the marketing vehicle that’s to blame, it’s the content itself. Hollywood produces some great and creative content that gets people into theaters, but it also produces a lot of formulaic, repetitive garbage that falls flat. If a content marketing campaign is just following a formula and counting on repeating a past success, no amount of obscure performance metric optimization is going to make the content itself any better. That applies just as much to marketing technology products as it does to marketing Hollywood movies.
When content marketing isn’t working, it’s usually not the marketing vehicle that’s to blame, it’s the content itself.
The screenwriter William Goldman (Butch Cassidy and the Sundance Kid, All the President’s Men, Marathon Man, The Princess Bride) once famously said, “In Hollywood, no one knows anything.” He meant that no matter how much experience a producer or studio might have, it’s hard to predict what’s going to resonate with an audience, because what always resonates is what is fresh and authentic, which are the hardest qualities to judge in any content and elude simple formulas. Movie remakes sometimes work, but more often they fail to capture something that audiences responded to in the original: a fresh concept, great performances by engaged actors, an inspired director, and a great script. Just reproducing the elements in a previous success doesn’t guarantee success. The experience in the new version has to capture the magic in the original that appealed to the audience.
The Dissatisfaction With So Much Content
A lot of content just dangles an attractive hook to entice content consumers to click, and that’s all it does. Anyone can post a cute animal video or a suggestive or revealing photo, but it doesn’t do anything to help your audience understand who you are or help solve their problems.
Unfortunately for media consumers, clickbait works in simply getting users to click, which is the reason it hasn’t disappeared. As long as people click on the enticing image, celebrity reference, or promised secret revelation, we’ll have to suffer with clickbait. Even worse, clickbait is often used to tip the scales of value from the reader, where it belongs, to the publisher. Many viral tests, quizzes and celebrity slideshows plant advertising cookies that benefit the publisher by increasing the cost and perceived value of advertising on their site, leaving the consumer feeling that they’ve been used, which of course is exactly what has happened.
Another, and I think more important, reason that content marketing isn’t succeeding for many is not that the content isn’t interesting or even useful, but that it isn’t connected in a meaningful way with the content publisher. Just posting memes, how-tos, thought pieces, and stories unrelated to who you are as a business, or not reflecting who your employees are and the values you hold as a company, doesn’t do anything to connect your visitors to you. Empty content is like empty calories in junk food; it doesn’t nourish and strengthen the relationship you should be building with your audience.
Is SEO the Enemy?
SEO is not the enemy, but focusing on only some superficial SEO tactics above other approaches is not going to create a long term bond with your visitors. Keyword stuffing and optimization can damage the user experience if the user feels manipulated. Google might still bring people to your content as a result of these techniques, but it’s a hollow relationship that has no staying power. When you create quality content that your audience will like and will recommend to others, you produce backlinks and social signals that will improve your search rankings, which is the way to win in SEO.
Despite all the supposed secret formulas and tricks to get high search engine ranking, the real secret is that Google loves quality content and will reward it, so that’s the smart SEO strategy to follow.
What is Good Content Marketing?
Similar to coming up with an idea for the next movie blockbuster to get people into theaters, content marketing is about creating good and useful content that entertains, educates, creates interest, or is useful in some way. It works best when it is the kind of content that people want to share with others. The viral effect will compound the audience you earn. That’s why content marketing has really taken off in the age of social media. Word-of-mouth and good write-ups have always propelled good content, but they are nothing compared to the effect viral online sharing can have on a good blog post, video, photograph, meme or other content.
How do you create this great content? We’re going to cover three steps that will take you from ho-hum content marketing to good and possibly even great content marketing. If you follow these three steps, you’ll be ahead of 90 percent of the businesses out there that are trying to crack the how-to of content marketing.
First — Start with Why You Do What You Do
Simon Sinek in his book, Start with Why, and in his presentations, especially his TED Talk, How Great Leaders Inspire Action, argues that people don’t base their purchasing decisions primarily on what a company does, but on why they do it. This might be hard to envision for some products, like toothpaste or laundry detergent, but I think it does apply to every purchase we make, even if in some cases it’s to a small degree. For some things it’s much more apparent. People identify with iOS or Android, Ford or Chevy, Ducati or Suzuki, based on much more than practical considerations of price, effectiveness, and other qualities. People want to use products and services that bolster their image of who they are, or who they want to be. Some companies are great at using this desire (Apple, BMW, Nike, Sephora, Ikea, Whole Foods, REI) and have a distinct identity that is the foundation for every message they put out.
To communicate the why of your products and services, you can’t just put out generic content that works for anyone. You have to produce content that shows specifically who you are. The best content marketing is cultural. The content you deliver tells your audience what kind of company you are, what your values are, who are the people in the company, and why they work there and do the things they do. That means you must be authentic and transparent. That takes courage, and isn’t easy, which is why so few companies are good at it. It takes vision, leadership, and a constant reminder from company leaders of what you’re doing and why it matters.
Unfortunately, this is hard to maintain as companies grow. The organizations that have grown dramatically and yet successfully maintained the core company values have either had a charismatic leader who represented and reiterated the company’s values at every opportunity (Apple), or have built them into every communication, event, and presentation by the company, no matter who is delivering them (Salesforce).
If your company isn’t good at this, don’t despair. These skills can be learned, so if your company needs to get better at understanding and communicating the why of who you are, with some effort it can happen.
Second — Put Yourself in Your Customers’ Shoes
You not only need to understand yourself and your company and present yourself authentically, you have to really understand your customer — really, really understand your customer. That takes time, research, and empathy to walk a mile in their shoes. You need to visit your customers, spend a day fielding support calls or working customer service, go places, do things, and ask questions that you’ve never asked. Are they well off with cash to burn, or do they count every penny? Do they live for themselves, their parents, their children, their community, their church, their livelihood? How could your company help them solve their problems or make their lives better?
The best marketers have imagination and empathy. They, like novelists, playwrights, and poets, are able to imagine what it would be like to live like someone else. Some marketing organizations formalize this function by having one person who is assigned to represent the customers and always advocate for their interests. This can help prevent falling into the mindset of thinking of the customer only as a source of revenue or problems that have to be solved.
One common marketing technique is to create a persona or personas that represent your ideal customer(s). What is their age, sex, occupation? What are their interests, fears, etc.? This can help make sure that the customer is never just an unknown face or potential revenue source, but instead is a real person whom you need to be close to and understand as deeply as possible.
Once you’ve made the commitment to understand your customers, you’re ready to help solve their problems.
Third — Focus on Solving Your Customers’ Problems
Once you have your authentic voice down and you really know who your customer is and how they think, the third thing you need to do is focus on providing useful content. Useful content for your customers is content that solves a real problem they have. What’s causing them pain or what’s impeding them doing what they need or want to do? The customer may or may not know they have this pain. You might be creating a new need or desire for them by telling a story about how their life will be if they only had this thing, service, or experience. Help them dream of being on a riverboat in Europe, enjoying the pool in their backyard on a summer’s day, or showing off a new mobile phone to their friends at work.
By speaking to the needs of your customers, you’re helping them solve problems, but also forging a bond of trust and usefulness that will go forward in your relationship with them.
Mastering Blogging for Content Marketing
There are many ways to create and deliver content that is authentic and serves a need. Podcasts, Vlogs, events, publications, words, pictures, music, and videos all can be effective delivery vehicles for quality content. Let’s focus on one vehicle that can return exceptional results when done right, and that is blogging, which has worked well for Backblaze.
Backblaze billboard on Highway 101 in Silicon Valley
Backblaze decided early on that it would be as transparent as possible in its business practices. That meant that if there were no good reason not to release information, the company should release it, and the blog became the place where the company made that information public. Backblaze’s CEO Gleb Budman wrote about this commitment to transparency, and the results from it, in a blog post in 2017, The Decision on Transparency. An early example of this transparency is a 2010 post in which Backblaze analyzed why a proposed acquisition of the company failed, Backblaze online backup almost acquired — Breaking down the breakup. Companies rarely write about acquisitions that fall through.
Backblaze’s blog really took off in 2015 when the company decided to publish the statistics it had collected on the failure rate of hard drives in its data centers, Reliability Data Set For 41,000 Hard Drives Now Open Source. While many cloud companies routinely collected this kind of data, including Amazon, Google, and Microsoft, none had ever made it public. It turned out that readers were tremendously hungry for data on how hard drives performed, and Backblaze’s blog readership subsequently increased by hundreds of thousands of readers. Readers analyzed the drive failure data and debated which drives were the best for their own purposes. This was despite Backblaze’s disclaimer that how Backblaze used hard drives in its data centers didn’t really reflect how the drives would perform in other applications, including homes and businesses. Customers didn’t care. They were starved for the information and waited anxiously for the release of each new Drive Stats post.
It Turns Out That Blogging with Authenticity and Transparency is Rewarded
As Gilmore and Pine wrote in their book, Authenticity, “People increasingly see the world in terms of real and fake, and want to buy something real from someone genuine, not a fake from some phony.” How do you convince your customers that you’re real and genuine? The simple answer is to communicate honestly about who you are, which means sometimes telling them about your failures and mistakes and owning up to less than stellar performances by yourself or your company. Consider lifting the veil occasionally to reveal who you are. If you put the customer first, that shouldn’t be too hard even when you fall short. If your intentions are good, being transparent will almost always be rewarded with forgiveness and greater trust and loyalty from your customers.
Many companies created blogs thinking they had to because everyone else was and they started posting articles by their executives and product marketers going on about how great their products were. Then they were surprised when they got little traffic. These people didn’t get the message about how content should be used to help customers with their problems and build a relationship with them through authenticity and transparency.
If you have a blog, you could use that as a place to write about how you do business, the lessons you’ve learned, and yes, even the mistakes you’ve made. Don’t assume that all your company information needs to be protected. If at all possible, write about the tough issues and why you made the decisions you did. Your customers will respond because they don’t see that kind of frankness elsewhere and because they appreciate understanding the kind of company they’re paying for the product or service.
Your Blog Isn’t One Audience of Thousands or Millions, But Many Audiences of One
Don’t be afraid to write to a specific audience or group on your blog. You might have multiple audiences, and some of them may be quite specialized. When you’re writing to an audience with specialized vocabulary or acronyms, don’t be afraid to use them. Other readers will recognize that the post is not for them and skip over it, or they’ll use it as an entry to a new area that interests them. If you try to make all your posts suitable for a homogeneous reader, you’ll end up with many readers leaving because you’re not speaking directly to them using their language.
If the piece is aimed at a novice or general audience, definitely take the time to explain unfamiliar concepts and spell out abbreviations and acronyms that might not be familiar to the reader. However, if the piece is aimed at a professional audience, avoid doing that: readers might conclude the post isn’t written for professionals and dismiss both it and the blog as not suitable for them.
Strive to match the content, vocabulary, graphics, technical argot, and level of reading to the intended market segment. The goal is to make each reader feel that the piece was written specifically for him or her.
Taking Just OK Content Marketing and Making It Great
Authenticity, honesty, frankness, and sincerity are all qualities that to some degree or other are present in the best content. Unfortunately, marketers have the reputation for producing content that’s at the opposite end of the spectrum. Comedian George Burns could have been parodying a modern marketing course when he wrote, “To be a fine actor, you’ve got to be honest. And if you can fake that, you’ve got it made.”
There’s a reason that the recommendation to be authentic sounds like advice you’d get from your mom or dad about how to behave on your first date. We all learn sooner or later that if someone doesn’t like you for who you are, there’s no amount of faking being someone else that will make them like you for more than just a little while.
Be yourself, say something that really means something to you, and tell a story that connects with your audience and gives them some value. Those intangibles are hard to measure in metrics, but, when done well, might earn you an honest response, some respect, and perhaps a repeat visit.
We’re very sorry to interrupt your time enjoying the beach, pool, and other fun outdoor and urban places.
We’ve got some important advice you need to hear so that you can be responsible students when you go back to school this fall.
Now that all the students have stopped listening and likely it’s just us now, I’d like to address the parents of students who are starting or about to return to school in the fall.
You’re likely spending a large amount of money on your children’s education. That money is well spent, as it will help your children succeed and become good adults and citizens. We’d like to help by highlighting something you can do to protect your investment, and that is to ensure the safety of your students’ data.
Our Lives Are Digital Now — Students’ Especially
We don’t have to tell you how everything in our lives has become digital. That’s true as well of schools and universities. Students now take notes, write papers, read, communicate, and record everything on digital devices.
You don’t want data damage or loss to happen to the important school or university files and records your child (and possible future U.S. president) has on his or her digital device.
Think about it.
Has your child ever forgotten a digital device in a vehicle, restaurant, or friend’s house?
We thought so.
How about water damage?
Yes, us too.
Did you ever figure out what that substance was clogging the laptop keyboard?
We’ve learned that parenting is full of unanswered questions, as well.
Maybe your student is ahead of the game and already has a plan for backing up their data while at school. That’s great, and a good sign that your student will succeed in life and maybe even solve some of the many challenges we’re leaving to their generation.
Parents Can Help
If not, you can be an exceptional parent by giving your student the gift of automatic and unlimited backup. Before they start school, you can install Backblaze Computer Backup on their Windows or Mac computer. It takes just a couple of minutes. Once that’s done, every time they’re connected to the internet all their important data will be automatically backed up to the cloud.
If anything happens to the computer, their files are safe and ready to be restored. It also could prevent that late-night frantic call asking you to somehow magically find their lost data. Who needs that?
Let’s Hear From the Students Themselves
You don’t have to take our word for it. We asked two bona fide high school students who interned at Backblaze this summer for the advice they’d give to their fellow students.
My friends do not normally back up their data other than a few of them putting their important school work on Microsoft’s OneDrive.
With college essays, applications, important school projects and documents, there is little I am willing to lose.
I will be backing up my data when I get home for sure. Next year I will ensure that all of my data is backed up in two places.
After spending a week at Backblaze, I realized how important it is to keep your data safe.
Always save multiple copies of your data. Accidents happen and data gets lost, but it is much easier to recover if there is another copy saved somewhere reliable. Backblaze helps with this by keeping a regularly updated copy of your files in one of their secure data centers.
When backing up data, use programs that make sense and are easy to follow. Stress runs high when files are lost. Having a program like Backblaze that is simple and has live support certainly makes the recovery process more enjoyable.
Relax! The pressures of performing well at school are high. Knowing your files are safe and secure can take a little bit of the weight off your shoulders during such a stressful time.
I definitely plan on using Backblaze in the future and I think all students should.
We couldn’t have said it better. Having a solid backup plan is a great idea for both parents and students. We suggest using Backblaze Personal Backup, but the important thing is to have a backup plan for your data and act on it no matter what solution you’re using.
Learning to Back Up is a Good Life Lesson
Students have a lot to think about these days, and with all the responsibilities and new challenges they’re going to face in school, it’s easy for them to forget some of the basics. We hope this light reminder will be just enough to set them on the right backup track.
Have a great school year everyone!
P.S. If you know a student or the parent of a student going to school in the fall, why not share this post with them? You can use the Email or other sharing buttons to the left or at the bottom of this post.
There are many uses for the cloud, and many services that provide storage drives, sync, backup, and sharing. It’s hard for computer users to know which service is best for which use.
Every spring for the past twelve years we’ve commissioned an online survey conducted by The Harris Poll to help us understand if and how computer users are backing up. We’ve asked the same question, “How often do you backup all the data on your computer?” every year. We just published the results of the latest poll, which showed that more surveyed computer owners are backing up in 2019 than when we conducted our first poll in 2008. We’re heartened that more people are protecting their valuable files, photos, financial records, and personal documents.
This year we decided to ask a second question that would help us understand how the cloud compares to other backup destinations, such as external drives and NAS, and which cloud services are being used for backing up.
This was the question we asked:
What is the primary method you use to backup all of the data on your computer?
1. Cloud backup (e.g., Backblaze, Carbonite, iDrive)
2. Cloud drive (e.g., Google Drive, Microsoft OneDrive)
3. Cloud sync (e.g., Dropbox, iCloud)
4. External hard drive (e.g., Time Machine, Windows Backup and Restore)
5. Network Attached Storage (NAS) (e.g., QNAP, Synology)
6. Other
7. Not sure
Where Computer Users are Backing Up
More than half of those who have ever backed up all the data on their computer (58 percent) indicated that they are using the cloud as one of the primary methods to back up all of the data on their computer. Nearly two in five (38 percent) use an external hard drive, and just 5 percent use network attached storage (NAS). (The total is greater than 100 percent because respondents were able to select multiple destinations.)
Backup Destinations (Among Those Who Have Ever Backed Up All Data on Their Computer)
What Type of Cloud is Being Used?
The survey results tell us that the cloud has become a popular destination for backing up data. Among those who have ever backed up all data on their computer, the following indicated what type of cloud service they used:
38 percent are using cloud drive (such as Google Drive or Microsoft OneDrive)
21 percent are using cloud sync (such as Dropbox or Apple iCloud)
11 percent are using cloud backup (such as Backblaze Computer Backup)
Cloud Destinations (Among Those Who Have Ever Backed Up All Data on Their Computer)
Choosing the Best Cloud for Backups
Backblaze customers or regular readers of this blog will immediately recognize the issue in these responses. The type of cloud service you select matters a great deal for backup. Both cloud drive and cloud sync services can store data in the cloud, but they’re not the same as having a real backup. We’ve written about these differences in our blog post, What’s the Diff: Sync vs Backup vs Storage, and in our guide, Online Storage vs. Online Backup.
Put simply, those who use cloud drive or cloud sync are missing the benefits of real cloud backup: automatic backup of all the data on your computer rather than just designated folders, the ability to go back to earlier versions of files, and protection against files lost through syncing, such as when someone else deletes a shared folder.
Cloud backup is specifically designed to protect your files, while the purpose of cloud drives and sync is to make it easy to access your files from different computers and share them when desired. While there is overlap in what these services offer and how they can be used, obtaining the best results requires selecting the right cloud service for your needs. If your goal is to back up your files, you want the service to seamlessly protect your files and make sure they’re available when and if you need to restore them due to data loss on your computer.
As users have more time and experience with their selected cloud service(s), it will be interesting in future polls to discover how happy they are with the various services and how well their needs are being met. We plan to cover this topic in our future polls.
• • •
Survey Method
This survey was conducted online within the United States by The Harris Poll on behalf of Backblaze from June 6-10, 2019 among 2,010 U.S. adults ages 18 and older, among whom 1,858 own a computer and 1,484 have ever backed up all data on their computer. This online survey is not based on a probability sample and therefore no estimate of theoretical sampling error can be calculated. For complete survey methodology, including weighting variables and subgroup sample sizes, please contact Backblaze.
Ah, the iconic 3.5″ hard drive, now approaching a massive 16TB of storage capacity. Backblaze Storage Pods fit 60 of these drives in a single pod, and with well over 750 petabytes of customer data stored in our data centers, we have a lot of hard drives under management.
Yet most of us have just one, or only a few of these massive drives at a time storing our most valuable data. Just how safe are those hard drives in your office or studio? Have you ever thought about all the awful, terrible things that can happen to a hard drive? And what are they, exactly?
It turns out there are a host of obvious physical dangers, but also other, less obvious, errors that can affect the data stored on your hard drives, as well.
Dividing by One
It’s tempting to store all of your content on a single hard drive. After all, the capacity of these drives keeps growing, and they offer good performance of up to 150 MB/s. It’s true that flash-based drives are far faster, but their price per gigabyte is also higher, so for now the traditional 3.5″ hard drive still holds most of the world’s data.
However, having all of your precious content on a single, spinning hard drive is a true tightrope without a net experience. Here’s why.
Drivesavers Failure Analysis by the Numbers
I asked our friends at Drivesavers, specialists in recovering data from drives and other storage devices, for some analysis of the hard drives brought into their labs for recovery. What were the primary causes of failure?
Reason One: Media Damage
The number one reason, accounting for 70 percent of failures, is media damage, including full head crashes.
Modern hard drives stuff multiple, ultra thin platters inside that 3.5 inch metal package. These platters spin furiously at 5400 or 7200 revolutions per minute — that’s 90 or 120 revolutions per second! The heads that read and write magnetic data on them sweep back and forth only 6.3 micrometers above the surface of those platters. That gap is about 1/12th the width of a human hair and a miracle of modern technology to be sure. As you can imagine, a system with such close tolerances is vulnerable to sudden shock, as evidenced by Drivesavers’ results.
This damage occurs when the platters receive a shock: physical damage from an impact to the drive itself. Platters have been known to shatter or suffer damage to their surfaces, including a phenomenon called head crash, where the flying heads slam into the surface of the platters. Whatever the cause, the thin platters holding 1s and 0s can’t be read.
It takes a surprisingly small amount of force to generate a lot of shock energy to a hard drive. I’ve seen drives fail after simply tipping over when stood on end. More typically, drives are accidentally pushed off of a desktop, or dropped while being carried around.
A drive might look fine after a drop, but the damage may already be done. Because drives are rigid, heavy, and usually dropped onto hard, unforgiving surfaces, even a short fall can subject the delicate internals of a hard drive to the equivalent of hundreds of g-forces.
To paraphrase an old (and morbid) parachutist joke, it’s not the fall that gets you, it’s the sudden stop!
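The physics behind those g-forces is simple enough to sketch. Assuming constant deceleration, the shock in g’s is just the drop height divided by the stopping distance. The numbers below (a 0.75 m desk, 2 mm of stopping distance on impact) are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope shock estimate for a dropped drive. From
# v^2 = 2gh (free fall) and v^2 = 2ad (deceleration over distance d),
# the average deceleration in multiples of g is simply h / d.

def shock_in_g(drop_height_m: float, stopping_distance_m: float) -> float:
    """Average deceleration during impact, expressed in g's."""
    return drop_height_m / stopping_distance_m

# A drive knocked off a 0.75 m desk onto a hard floor that stops it in ~2 mm:
print(f"{shock_in_g(0.75, 0.002):.0f} g")  # 375 g
```

Even with a generous 2 mm of cushioning, a desk-height fall lands in the hundreds of g’s, which is why a drop that leaves no visible mark can still be fatal.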
Reason Two: PCB Failure
The next largest cause is circuit board failure, accounting for 18 percent of failed drives. Printed circuit boards (PCBs), those tiny green boards seen on the underside of hard drives, can fail in the presence of moisture or static electric discharge like any other circuit board.
Reason Three: Stiction
Next up is stiction (a portmanteau of friction and sticking), which occurs when the armatures that drive those flying heads actually get stuck in place and refuse to operate, usually after a long period of disuse. Drivesavers found that stuck armatures accounted for 11 percent of hard drive failures.
It seems counterintuitive that a hard drive sitting quietly in a dark drawer might actually contribute to its failure, but I’ve seen many older hard drives pulled from a drawer and popped into a drive carrier or connected to power just go thunk. It does appear that hard drives like to be connected to power and constantly spinning, and the numbers seem to bear this out.
Reason Four: Motor Failure
The last and least common cause of hard drive failure is motor failure, accounting for only 1 percent of failures, a testament again to modern manufacturing precision and reliability.
Mitigating Hard Drive Failure Risk
So now that you’ve seen the gory numbers, here are a few recommendations to guard against the physical causes of hard drive failure.
1. Have a physical drive handling plan and follow it rigorously
If you must keep content on single hard drives in your location, make sure your team follows a few guidelines to protect against moisture, static electricity, and drops during drive handling. Keeping the drives in a dry location, storing the drives in static bags, using static discharge mats and wristbands, and putting rubber mats under areas where you’re likely to accidentally drop drives can all help.
It’s worth reviewing how you physically store drives, as well. Drivesavers tells us that the sudden impact of a heavy drawer of hard drives slamming home or yanked open quickly might possibly damage hard drives!
2. Spread failure risk across more drives and systems
Improving physical hard drive handling procedures is only a small part of a good risk-reducing strategy. You can immediately reduce the exposure of a single hard drive failure by simply keeping a copy of that valuable content on another drive. This is a common approach for videographers moving content from cameras in the field back to their editing environment. By simply copying content from one fast drive to another, you make it far less likely that both copies fail at once. This is certainly better than keeping content on only a single drive, but it’s not a great long-term solution.
Multiple drive NAS and RAID systems reduce the impact of failing drives even further. A RAID 6 system composed of eight drives not only has much faster read and write performance than a single drive, but two of its drives can fail and still serve your files, giving you time to replace those failed drives.
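The trade-off in a RAID 6 group is easy to quantify: two drives’ worth of capacity goes to parity, and in exchange any two drives may fail without data loss. A quick sketch, using example drive counts and sizes:

```python
# Capacity/protection trade-off of a RAID 6 group. The drive count and
# per-drive size below are example values, not a recommendation.

def raid6_summary(num_drives: int, drive_tb: int) -> dict:
    """Raw vs. usable capacity and fault tolerance for RAID 6."""
    assert num_drives >= 4, "RAID 6 needs at least 4 drives"
    return {
        "raw_tb": num_drives * drive_tb,
        "usable_tb": (num_drives - 2) * drive_tb,  # 2 drives hold parity
        "drives_that_may_fail": 2,
    }

print(raid6_summary(8, 12))
# {'raw_tb': 96, 'usable_tb': 72, 'drives_that_may_fail': 2}
```

With eight 12 TB drives you give up a quarter of the raw capacity, but the array keeps serving files through two simultaneous drive failures.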
Mitigating Data Corruption Risk
The Risk of Bit Flips
Beyond physical damage, there’s another threat to the files stored on hard disks: small, silent bit flip errors often called data corruption or bit rot.
Bit rot errors occur when individual bits in a stream of data in files change from one state to another (positive or negative, 0 to 1, and vice versa). These errors can happen to hard drive and flash storage systems at rest, or be introduced as a file is copied from one hard drive to another.
While hard drives automatically correct single-bit flips on the fly, larger multi-bit errors can slip through. These can cause the program accessing the file to halt or throw an error, or, perhaps worse, lead you to think that a file with errors is fine!
Flash drives are not immune either. Bianca Schroeder recently published a similar study of flash drives, Flash Reliability in Production: The Expected and the Unexpected, and found that “…between 20-63% of drives experienced at least one of the (unrecoverable read errors) during the time it was in production. In addition, between 2-6 out of 1,000 drive days were affected.”
“These UREs are almost exclusively due to bit corruptions that ECC cannot correct. If a drive encounters a URE, the stored data cannot be read. This either results in a failed read in the user’s code, or if the drives are in a RAID group that has replication, then the data is read from a different drive.”
Exactly how prevalent bit flips are is a controversial subject, but if you’ve ever retrieved a file from an old hard drive or RAID system and see sparkles in video, corrupt document files, or lines or distortions in pictures, you’ve seen the results of these errors.
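You can catch silent corruption on your own drives with nothing more exotic than checksums: record a digest for each file while you know it’s good, then compare later. A minimal sketch in Python, using the standard library only:

```python
import hashlib

def sha256_of(path: str, chunk: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large files never load fully into RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def build_manifest(paths):
    """Record a known-good checksum for each file."""
    return {p: sha256_of(p) for p in paths}

def find_corrupted(manifest):
    """Return the files whose current contents no longer match the manifest."""
    return [p for p, digest in manifest.items() if sha256_of(p) != digest]
```

Run `build_manifest` right after ingesting footage, store the manifest alongside the files, and a periodic `find_corrupted` pass will flag any file whose bits have drifted, long before you discover sparkles in the video.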
Protecting Against Bit Flip Errors
There are many approaches to catching and correcting bit flip errors. From a system designer standpoint they usually involve some combination of multiple disk storage systems, multiple copies of content, data integrity checks and corrections, including error-correcting code memory, physical component redundancy, and a file system that can tie it all together.
Backblaze has built such a system, and uses a number of techniques to detect and correct file degradation due to bit flips and deliver extremely high data durability and integrity, often in conjunction with Reed-Solomon erasure codes.
Thanks to the way object storage and Backblaze B2 works, files written to B2 are always retrieved exactly as you originally wrote them. If a file ever changes from the time you’ve written it, say, due to bit flip errors, it will either be reproduced from a redundant copy of your file, or even mathematically reconstructed with erasure codes.
So the simplest, and certainly least expensive way to get bit flip protection for the content sitting on your hard drives is to simply have another copy on cloud storage.
With some thought, you can apply these protection steps to your environment and get the best of both worlds: the performance of your content on fast, local hard drives, and the protection of having a copy on object storage offsite with the ultimate data integrity.
12 Power Tips for Business Users of Backblaze Business Backup and B2
1 Manage All Users of Backblaze Business Backup or B2
Backblaze Groups can be used for both Backblaze Business Backup and B2 to manage accounts and users. See the status of all accounts and produce reports using the admin console.
2 Restore For Free via Web or USB
Admins can restore data from endpoints using the web-based admin console. USB drives can be shipped worldwide to facilitate the management of a remote workforce.
3 Back Up Your VMs
Backblaze Business Backup can handle virtual machines, such as those created by Parallels, VMware Fusion, and VirtualBox; and B2 integrates with StarWind, OpenDedupe, and CloudBerry to back up enterprise-level VMs.
4 Mass Deploy Backblaze Remotely to Many Computers
Companies, organizations, schools, non-profits, and others can use the Backblaze Business Backup MSI installer, Jamf, Munki, and other tools to deploy Backblaze computer backup remotely across all their computers without any end-user interaction.
5 Save Money with Free Data Exchange with B2’s Compute Partners
Spin up compute applications with high speed and no egress charges using our partners Packet and Server Central.
6 Speed up Access to Your Content With Free Egress to Cloudflare
Through the Bandwidth Alliance, data moves from B2 to Cloudflare’s content delivery network with no egress charges, speeding delivery of your content to users around the world.
7 Load Large Amounts of Data Quickly With the Backblaze Fireball
You can use Backblaze’s Fireball hard disk array to load large volumes of data without saturating your network. We ship a Fireball to you and once you load your data onto it, you ship it back to us and we load it directly into your B2 account.
8 Use Single Sign-On (SSO) and Two Factor Verification for Enhanced Security
Single sign-on (Google and Microsoft) improves security and speeds signing into your Backblaze account for authorized users. With Backblaze Business Backup, all data is automatically encrypted client-side prior to upload, protected during transfer, and stored encrypted in our secure data centers. Adding Two Factor Verification augments account safety with another layer of security.
9 Get Quick Answers to Your Backing Up Questions
Refer to an extensive library of FAQs, how-tos, and help articles for Business Backup and B2 in our online help library.
10 Application Keys Enable Controlled Sharing of Data for Users and Apps
B2 application keys let you limit each user or application to just the buckets and operations they need, so you can share data without handing out full account access.
11 Manage Your Server Backups with CloudBerry MBS and B2
Automate and centrally manage server backups using CloudBerry Managed Backup Service (MBS) and B2. It’s easy to set up and once configured, you have a true set-it-and-forget-it backup solution in place.
12 Protect Your NAS Data Using Built-in Sync Applications and B2
Popular NAS systems from Synology and QNAP include built-in backup and sync applications that can send a copy of your NAS data directly to B2.
Does this sound familiar? An employee walks over with panic and confusion written all over their face. They approach holding their laptop and say that they’re not sure what happened. You open their computer to find that there is a single message displayed:
You want your files? Your computer has been infected with ransomware and you will need to pay us to get them back.
They may not know what just happened, but the sinking feeling in your stomach has a name you know well. Your company has been hit with ransomware, which is, unfortunately, a growing trend. The business of ransomware is a booming one, bringing productivity and growth to a dead stop.
As ransomware attacks increase on businesses of all sizes, ransomware may prove to be the single biggest destructive force for business data, surpassing even hard drive failures as the leader of data loss.
When Ransomware Strikes
It’s a situation that most IT managers will face at some point in their career. Per Security Magazine, “Eighty-six percent of Small to Medium Business (SMB) clients were recently victimized by ransomware.” In fact, it happened to us at Backblaze. Cybersecurity company Ice Cybersecurity published that ransomware attacks occur every 40 seconds (that’s over 2,000 times per day!). Coveware’s Ransomware Marketplace Report says that the average ransom cost has increased by 89% to $12,762, as compared to $6,733 in Q4 of 2018. The downtime resulting from ransomware is also on the rise. The average ransomware incident now lasts just over a week (7.3 days), which should be factored in when calculating the true cost of an attack. The estimated downtime cost per ransomware attack averages $65,645 per company. The increasing financial impact on businesses of all sizes shows that the business of ransomware is booming, with no signs of slowing down.
How Has Ransomware Grown So Quickly?
Ransomware has taken advantage of multiple developments in technology, similar to other high-growth industries. The first attacks occurred in 1989, with floppy disks distributed across organizations purporting to raise money to fund AIDS research. At the time, users were asked to pay $189 to get their files back.
Since then, ransomware has grown significantly due to the advent of multiple facilitators. Sophisticated RSA encryption with increasing key sizes makes encrypted files more difficult to decrypt. Per the Carbon Black report, ransomware kits are now relatively easy to access on the dark web and cost only $10, on average. With cryptocurrency in place, payment is both virtually untraceable and irreversible. As recovery becomes more difficult, the cost to business rises alongside it. Per The Atlantic, ransomware now costs businesses more than $75 billion per year.
If Your Job is Protecting Company Data, What Happens After Your Ransomware Attack?
Isolate, Assess, Restore
Your first thought will probably be that you need to isolate any infected computers and get them off the network. Next, you may begin to assess the damage by determining the origins of the infected file and locating others that were affected. You can check our guide for recovering from ransomware or call in a specialized team to assist you. Once you prevent the malware from spreading, your thoughts will surely turn to the backup strategy you have in place. If you have used either a backup or sync solution to get your data offsite, you are more prepared than most. Unfortunately, even for this Eagle Scout level of preparedness, too often the backup solution hasn’t been tested against the exact scenario it’s needed for.
Both backup and sync solutions help get your data offsite. However, sync solutions vary greatly in their process for backup. Some require saving data to a specific folder. Others provide versions of files. Most offer varying pricing tiers for storage space. Backup solutions also have a multitude of features, some of which prove vital at the time of restore.
If you are in IT, you are constantly looking for points of failure. When it comes time to restore your data after a ransomware attack, three weak points immediately come to mind:
1. Your Security Breach Has Affected Your Backups
Redundancy is key in workflows. However, if you are syncing your data and get hit with ransomware on your local machine, your newly infected files will automatically sync to the cloud and thereby infect your backup set.
This can be mitigated with backup software that offers multiple versions of your files. Backup software, such as Backblaze Business Backup, saves your original file as is and creates a new backup file with every change made. If you accidentally delete a file or if your files are encrypted by ransomware and you are backed up with Backblaze Business Backup, you can simply restore a prior version of a file — one that has not been encrypted by the ransomware. The capability of your backup software to restore a prior version is the difference between usable and unusable data.
2. Restoring Data will be Cumbersome and Time-Consuming
Depending on the size of your dataset, restoring from the cloud can be a drawn out process. Moreover, for those that need to restore gigabytes of data, the restore process may not only prove to be lengthy, but also tedious.
Snapshots allow you to restore all of your data from a specific point in time. When dealing with ransomware, this capability is crucial. Without this functionality, each file needs to be rolled back individually to a prior version and downloaded one at a time. At Backblaze, you can easily create a snapshot of your data and archive those snapshots into cloud storage to give you the appropriate amount of time to recover.
You can download the files that your employees need immediately and request the rest of their data to be shipped to you overnight on a USB drive. You can then either keep the drive or send it back for a full refund.
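The point-in-time idea behind snapshots can be sketched in a few lines: given each file’s version history, a snapshot at time t simply picks the newest version saved at or before t, so every file rolls back together to a moment before the infection. The data layout and names here are hypothetical, for illustration only:

```python
def snapshot_at(file_versions, t):
    """file_versions maps path -> list of (timestamp, version_id),
    sorted by ascending timestamp. Returns path -> version_id of the
    newest version saved at or before time t. Files with no version
    before t are omitted, since no clean copy exists for them."""
    result = {}
    for path, versions in file_versions.items():
        clean = [vid for ts, vid in versions if ts <= t]
        if clean:
            result[path] = clean[-1]
    return result

# Hypothetical version history; the attack encrypted files around t=400.
history = {
    "report.docx": [(100, "v1"), (250, "v2"), (400, "v3-encrypted")],
    "budget.xlsx": [(120, "v1"), (410, "v2-encrypted")],
}
# Restore to t=300, just before the attack:
print(snapshot_at(history, 300))
# {'report.docx': 'v2', 'budget.xlsx': 'v1'}
```

Compare this with per-file rollback: without the snapshot abstraction you would repeat that “latest clean version” search by hand for every single file, which is exactly the tedium the post describes.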
3. All Critical Data Didn’t Get Backed Up
Unfortunately, human error is the second leading cause of data loss. As humans, we all make mistakes, and some of them can have a large impact on company data. While there is no way to prevent employees from spilling drinks on computers or leaving laptops on planes, other causes are easier to avoid. Some solutions require users to save their data to a specific folder to enable backups. When you think about the files on your average employee’s desktop, are any of them critical to your business? If so, they need to be backed up. Relying on employees to change their work habits and save files only to specific, backed-up locations is neither the easiest nor the most reliable method of data protection.
In fact, it is the responsibility of the backup solution to protect business data, regardless of where the end user saves it. To that end, Backblaze backs up all user-generated data by default. The most effective backup solutions are ones that are easiest for the end users and require the least amount of user intervention.
Are you interested in assessing the risk to your business? Would you like to learn how to protect your business from ransomware? To better understand innovative ways that you can protect business data, we invite you to attend our Ransomware: Prevention and Survival webinar on July 17th. Join Steven Rahseparian, Chief Technical Officer at Ice CyberSecurity and industry expert on cybersecurity, to hear stories of ransomware and to learn how to take a proactive approach to protect your business data.
A lot has changed in the four years since Brian Beach wrote a post announcing Backblaze Vaults, our software architecture for cloud data storage. Just looking at how the major statistics have changed, we now have over 100,000 hard drives in our data centers instead of the 41,000 mentioned in the original post. We have three data centers (soon four) instead of one data center. We’re approaching one exabyte of data stored for our customers (almost seven times the 150 petabytes back then), and we’ve recovered over 41 billion files for our customers, up from the 10 billion in the 2015 post.
In the original post, we discussed having durability of seven nines. Shortly thereafter, it was upped to eight nines. In July of 2018, we took a deep dive into the calculation and found our durability closer to eleven nines (and went into detail on the calculations used to arrive at that number). And, as followers of our Hard Drive Stats reports will be interested in knowing, we’ve just started using our first 16 TB drives, which are twice the size of the biggest drives we used back at the time of this post — then a whopping eight TB.
We’ve updated the details here and there in the text from the original post that was published on our blog on March 11, 2015. We’ve left the original 135 comments intact, although some of them might be non sequiturs after the changes to the post. We trust that you will be able to sort out the old from the new and make sense of what’s changed. If not, please add a comment and we’ll be happy to address your questions.
Storage Vaults form the core of Backblaze’s cloud services. Backblaze Vaults are not only incredibly durable, scalable, and performant, but they dramatically improve availability and operability, while still being incredibly cost-efficient at storing data. Back in 2009, we shared the design of the original Storage Pod hardware we developed; here we’ll share the architecture and approach of the cloud storage software that makes up a Backblaze Vault.
Backblaze Vault Architecture for Cloud Storage
The Vault design follows the overriding design principle that Backblaze has always followed: keep it simple. As with the Storage Pods themselves, the new Vault storage software relies on tried and true technologies used in a straightforward way to build a simple, reliable, and inexpensive system.
A Backblaze Vault is the combination of the Backblaze Vault cloud storage software and the Backblaze Storage Pod hardware.
Putting The Intelligence in the Software
Another design principle for Backblaze is to anticipate that all hardware will fail and build intelligence into our cloud storage management software so that customer data is protected from hardware failure. The original Storage Pod systems provided good protection for data and Vaults continue that tradition while adding another layer of protection. In addition to leveraging our low-cost Storage Pods, Vaults take advantage of the cost advantage of consumer-grade hard drives and cleanly handle their common failure modes.
Distributing Data Across 20 Storage Pods
A Backblaze Vault comprises 20 Storage Pods, with the data evenly spread across all 20 pods. Each Storage Pod in a given vault has the same number of drives, and the drives are all the same size.
Drives in the same drive position in each of the 20 Storage Pods are grouped together into a storage unit we call a tome. Each file is stored in one tome and is spread out across the tome for reliability and availability.
Every file uploaded to a Vault is divided into pieces before being stored. Each of those pieces is called a shard. Parity shards are computed to add redundancy, so that a file can be fetched from a vault even if some of the pieces are not available.
Each file is stored as 20 shards: 17 data shards and three parity shards. Because those shards are distributed across 20 Storage Pods, the Vault is resilient to the failure of a Storage Pod.
Files can be written to the Vault when one pod is down and still have two parity shards to protect the data. Even in the extreme and unlikely case where three Storage Pods in a Vault lose power, the files in the vault are still available because they can be reconstructed from the 17 pods that remain available.
Each of the drives in a Vault has a standard Linux file system, ext4, on it. This is where the shards are stored. There are fancier file systems out there, but we don’t need them for Vaults. All that is needed is a way to write files to disk and read them back. Ext4 is good at handling power failure on a single drive cleanly without losing any files. It’s also good at storing lots of files on a single drive and providing efficient access to them.
Compared to a conventional RAID, we have swapped the layers here by putting the file systems under the replication. Usually, RAID puts the file system on top of the replication, which means that a file system corruption can lose data. With the file system below the replication, a Vault can recover from a file system corruption because a single corrupt file system can lose at most one shard of each file.
Creating Flexible and Optimized Reed-Solomon Erasure Coding
Just like RAID implementations, the Vault software uses Reed-Solomon erasure coding to create the parity shards. But, unlike Linux software RAID, which offers just one or two parity blocks, our Vault software allows for an arbitrary mix of data and parity. We are currently using 17 data shards plus three parity shards, but this could be changed on new vaults in the future with a simple configuration update.
The beauty of Reed-Solomon is that we can then re-create the original file from any 17 of the shards. If one of the original data shards is unavailable, it can be re-computed from the other 16 original shards, plus one of the parity shards. Even if three of the original data shards are not available, they can be re-created from the other 17 data and parity shards. Matrix algebra is awesome!
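Full Reed-Solomon coding takes matrix math over finite fields, but the core idea, parity that can stand in for any missing shard, shows up even in a toy single-parity scheme: one XOR parity shard can rebuild any one missing data shard. This is a simplified analogue for intuition, not Backblaze’s actual implementation (which uses three parity shards and can recover three losses):

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(shards):
    """XOR all data shards together to form a single parity shard."""
    parity = bytes(len(shards[0]))
    for s in shards:
        parity = xor_bytes(parity, s)
    return parity

def rebuild(surviving_shards, parity):
    """With exactly one shard lost, XOR of the parity and all survivors
    reproduces the lost shard, because each survivor cancels itself out."""
    out = parity
    for s in surviving_shards:
        out = xor_bytes(out, s)
    return out

data = [b"haystac", b"needle!", b"haystac"]   # three equal-length shards
parity = make_parity(data)
survivors = [data[0], data[2]]                # shard 1 is lost
print(rebuild(survivors, parity))             # b'needle!'
```

Reed-Solomon generalizes this by replacing XOR with arithmetic in a Galois field, which is what lets a Vault tolerate three missing shards instead of one.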
Handling Drive Failures
The reason for distributing the data across multiple Storage Pods and using erasure coding to compute parity is to keep the data safe and available. How are different failures handled?
If a disk drive just up and dies, refusing to read or write any data, the Vault will continue to work. Data can be written to the other 19 drives in the tome, because the policy setting allows files to be written as long as there are two parity shards. All of the files that were on the dead drive are still available and can be read from the other 19 drives in the tome.
When a dead drive is replaced, the Vault software will automatically populate the new drive with the shards that should be there; they can be recomputed from the contents of the other 19 drives.
A Vault can lose up to three drives in the same tome at the same moment without losing any data, and the contents of the drives will be re-created when the drives are replaced.
Handling Data Corruption
Disk drives try hard to correctly return the data stored on them, but once in a while they return the wrong data, or are just unable to read a given sector.
Every shard stored in a Vault has a checksum, so that the software can tell if it has been corrupted. When that happens, the bad shard is recomputed from the other shards and then re-written to disk. Similarly, if a shard just can’t be read from a drive, it is recomputed and re-written.
Conventional RAID can reconstruct a drive that dies, but does not deal well with corrupted data because it doesn’t checksum the data.
Each vault is assigned a number. We carefully designed the numbering scheme to allow for a lot of vaults to be deployed, and designed the management software to handle scaling up to that level in the Backblaze data centers.
The overall design scales very well because file uploads (and downloads) go straight to a vault, without having to go through a central point that could become a bottleneck.
There is an authority server that assigns incoming files to specific Vaults. Once that assignment has been made, the client then uploads data directly to the Vault. As the data center scales out and adds more Vaults, the capacity to handle incoming traffic keeps going up. This is horizontal scaling at its best.
We could deploy a new data center with 10,000 Vaults holding 16TB drives and it could accept uploads fast enough to reach its full capacity of 160 exabytes in about two months!
Backblaze Vault Benefits
The Backblaze Vault architecture has six benefits:
1. Extremely Durable
The Vault architecture is designed for 99.999999% (eight nines) annual durability (now 11 nines — Editor). At cloud-scale, you have to assume hard drives die on a regular basis, and we replace about 10 drives every day. We have published a variety of articles sharing our hard drive failure rates.
The beauty with Vaults is that not only does the software protect against hard drive failures, it also protects against the loss of entire Storage Pods or even entire racks. A single Vault can have three Storage Pods — a full 180 hard drives — die at the exact same moment without a single byte of data being lost or even becoming unavailable.
2. Infinitely Scalable
A Backblaze Vault consists of 20 Storage Pods, each with 60 disk drives, for a total of 1,200 drives. Depending on the size of the hard drive, each vault will hold:
12TB hard drives => 12.1 petabytes/vault (Deploying today.)
14TB hard drives => 14.2 petabytes/vault (Deploying today.)
16TB hard drives => 16.2 petabytes/vault (Small-scale testing.)
18TB hard drives => 18.2 petabytes/vault (Announced by WD & Toshiba)
20TB hard drives => 20.2 petabytes/vault (Announced by Seagate)
At our current growth rate, Backblaze deploys one to three Vaults each month. As the growth rate increases, the deployment rate will also increase. We can incrementally add more storage by adding more and more Vaults. Without changing a line of code, the current implementation supports deploying 10,000 Vaults per location. That’s 160 exabytes of data in each location. The implementation also supports up to 1,000 locations, which enables storing a total of 160 zettabytes! (Also known as 160,000,000,000,000 GB.)
3. Always Available
Data backups have always been highly available: if a Storage Pod was in maintenance, the Backblaze online backup application would contact another Storage Pod to store data. Previously, however, if a Storage Pod was unavailable, some restores would pause. For large restores this was not an issue since the software would simply skip the Storage Pod that was unavailable, prepare the rest of the restore, and come back later. However, for individual file restores and remote access via the Backblaze iPhone and Android apps, it became increasingly important to have all data be highly available at all times.
The Backblaze Vault architecture enables both data backups and restores to be highly available.
With the Vault arrangement of 17 data shards plus three parity shards for each file, all of the data is available as long as 17 of the 20 Storage Pods in the Vault are available. This keeps the data available while allowing for normal maintenance and rare expected failures.
4. Highly Performant
The original Backblaze Storage Pods could individually accept 950 Mbps (megabits per second) of data for storage.
The new Vault pods have more overhead, because they must break each file into pieces, distribute the pieces across the local network to the other Storage Pods in the vault, and then write them to disk. In spite of this extra overhead, the Vault is able to achieve 1,000 Mbps of data arriving at each of the 20 pods.
Handling this volume required a new type of Storage Pod. The net result: a single Vault can accept a whopping 20 Gbps of data.
Because there is no central bottleneck, adding more Vaults linearly adds more bandwidth.
5. Operationally Easier
When Backblaze launched in 2008 with a single Storage Pod, many of the operational analyses (e.g. how to balance load) could be done on a simple spreadsheet and manual tasks (e.g. swapping a hard drive) could be done by a single person. As Backblaze grew to nearly 1,000 Storage Pods and over 40,000 hard drives, the systems we developed to streamline and operationalize the cloud storage became more and more advanced. However, because our system relied on Linux RAID, there were certain things we simply could not control.
With the new Vault software, we have direct access to all of the drives and can monitor their individual performance and any indications of upcoming failure. And, when those indications say that maintenance is needed, we can shut down one of the pods in the Vault without interrupting any service.
6. Astoundingly Cost Efficient
Even with all of these wonderful benefits that Backblaze Vaults provide, if they raised costs significantly, it would be nearly impossible for us to deploy them since we are committed to keeping our online backup service affordable for completely unlimited data. However, the Vault architecture is nearly cost neutral while providing all these benefits.
When we were running on Linux RAID, we used RAID6 over 15 drives: 13 data drives plus two parity. That’s 15.4% storage overhead for parity.
With Backblaze Vaults, we wanted to be able to do maintenance on one pod in a vault and still have it be fully available, both for reading and writing. And, for safety, we weren’t willing to have fewer than two parity shards for every file uploaded. Using 17 data plus three parity drives raises the storage overhead just a little bit, to 17.6%, but still gives us two parity drives even in the infrequent times when one of the pods is in maintenance. In the normal case when all 20 pods in the Vault are running, we have three parity drives, which adds even more reliability.
Backblaze’s cloud storage Vaults deliver 99.999999% (eight nines) annual durability (now 11 nines — Editor), horizontal scalability, and 20 Gbps of per-Vault performance, while being operationally efficient and extremely cost effective. Driven from the same mindset that we brought to the storage market with Backblaze Storage Pods, Backblaze Vaults continue our singular focus of building the most cost-efficient cloud storage available anywhere.
• • •
Note: This post was updated from the original version posted on March 11, 2015.