With the impact of coronavirus on all of our lives, it’s been a struggle to find pieces of good news to share. But we wanted to take a break from the usual programming and share a milestone we’re excited about, one that’s more than 12 years in the making.
Since the beginning of Backblaze—back in 2007, when our five co-founders were working out of Brian Wilson’s apartment in Palo Alto—watching the business grow has always been profoundly exciting.
Our team has grown from five, way back in the Palo Alto days, to 145 today. Our customer base has grown. Today, we have customers in over 160 countries… it’s not so long ago that we were excited about having our 160th customer.
More than anything else, the data we manage for our customers has grown.
By 2014, we were storing 100 petabytes—the equivalent of 11,415 years of HD video.
Years passed, our team grew, the number of customers grew, and—especially after we launched B2 Cloud Storage in 2015—the data grew. At some scale it got harder to contextualize what hundreds and hundreds of petabytes really meant. We like to remember that each byte is part of some individual’s beloved family photos or some organization’s critical data that they’ve entrusted us to protect.
That belief is part of every single Backblaze job description. Here’s how we put it in that context:
“Our customers use our services so they can pursue dreams like curing cancer (genome mapping is data-intensive), archive the work of some of the greatest artists on the planet (learn more about how Austin City Limits uses B2), or simply sleep well at night (anyone who’s spilled a cup of coffee on a laptop knows the relief that comes with complete, secure backups).”
It’s critically important for us that we achieved this growth by staying the same in the most important ways: being open & transparent, building a sustainable business, and caring about being good to our customers, partners, community, and team. That’s why I’m excited to announce a huge milestone today—our biggest growth number yet.
We’ve reached 1.
Or, by another measurement, we’ve reached 1,000,000,000,000,000,000.
Yes, today, we’re announcing that we are storing 1 exabyte of customer data.
What does it all mean? Well. If you ask our engineers, not much. They’ve already rocketed past this number mentally and are considering how long it will take to get to a zettabyte (1,000,000,000,000,000,000,000 bytes).
But, while it’s great to keep our eyes on the future, it’s also important to celebrate what milestones mean. Yes, crossing an exabyte of data is another validation of our technology and our sustainably independent business model. But I think it really means that we’re providing value and earning the trust of our customers.
Thank you for putting your trust in us by keeping some of your bytes with us. Particularly in times like these, we know that being able to count on your infrastructure is essential. We’re proud to serve you.
As the world grapples with a pandemic, celebrations seem inappropriate. But we did want to take a moment and share this milestone with you, both for those of you who have been with us over the long haul and in the hopes that it provides a welcome distraction. To that end, we’ve been working on a few things that we’d planned to launch in the coming weeks. We’ve made the decision to push forward with those launches in hopes that the tools may be of some use for you (and, if nothing else, to try to do our part to provide a little entertainment). For today, here’s to our biggest 1 yet. And many more to come.
So now that you know what an exabyte looks like, let’s look at how Backblaze got there.
Way back in 2010, we had 10 petabytes of customer data under management. It was a big deal for us: it took us two years to accomplish, and, more importantly, it was a sign that thousands of customers trusted us with their data.
It meant a lot! But when we decided to tell the world about it, we had a hard time quantifying just how big 10 petabytes were, so naturally we made an infographic.
In what felt like the blink of an eye, it was two years later, and we had 75 petabytes of data. The Burj was out. And, because it was 2013, we quantified that amount of data like this…
Pop songs now average around 3:30 in length, which means if you tried to listen to this imaginary musical archive, it would take you 167,000 years. And sadly, the total number of recorded songs is only in the tens to hundreds of millions, so you’d have some repeats.
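The 167,000-year figure is easy to sanity-check. The sketch below assumes roughly 3 MB per song (an assumption; the post does not state a file size) and decimal units (1 PB = 10^15 bytes).

```python
# Sanity check for the "167,000 years of pop songs" figure.
# Assumptions (not stated in the post): ~3 MB per song, decimal units.
PETABYTE = 10**15
MEGABYTE = 10**6

data_bytes = 75 * PETABYTE          # data under management in 2013
song_bytes = 3 * MEGABYTE           # assumed average song file size
song_minutes = 3.5                  # 3:30 average length, per the post
minutes_per_year = 60 * 24 * 365

songs = data_bytes // song_bytes
years = songs * song_minutes / minutes_per_year
print(f"{songs:,} songs, about {years:,.0f} years of listening")
```

Under that assumption the result lands within about half a percent of the post's figure, which suggests a per-song size in that range.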
That’s a lot of songs! More importantly, our data under management had grown 7.5-fold! But we could barely take time to enjoy it, because five months later we hit 100 petabytes, and we had to call it out. Stacking up to the Burj Khalifa was in the past! Now, we rivaled Mt. Shasta…
But stacking drives was rapidly becoming less effective as a measurement. Simply put, the comparison was no longer apples to apples: each of the 3,000 drives we stacked up in 2010 held only one terabyte of data. If you took those same 3,000 drives and used the average drive size we had in 2013, about 4 terabytes per drive, the stack would stay the same height, because hard drives had not physically grown, but the density of the storage inside them had quadrupled.
The thought of migrating petabytes or even terabytes of data from your existing cloud provider to another may seem impossible. Naturally, as data sets grow, they become harder to move, and many times the inertia of that growing data means that it stays put, even in scenarios in which the storage solution is poorly suited to the data’s use case.
Even worse are scenarios in which major cloud storage providers lock customers into their ecosystems by design, making it difficult for them to diversify their data storage or to detach from a given service completely. These providers are like a digital Hotel California: you can check out any time you like, but you can never leave, because the egress fees, downtime, and operational overhead will kill you.
For Additional Information: This post is one of a series in lieu of this year’s NAB conference, which was recently cancelled. The content here accompanies a series of webinars outlining cloud-storage based solutions for data-heavy workflows. If you’d like to learn more about seamless data migration using Flexify.IO, please join us for our upcoming webinar about this solution on April 7.
Backblaze and Flexify.IO Offer a New Path for Cloud Migration
Fortunately, there is a way of avoiding these pitfalls and adopting a seamless data migration strategy, thanks to our integration partner, Flexify.IO. They help businesses migrate data between clouds in a reliable, secure, and predictable manner. Oh, and they are blazing fast—but more on that later.
Before we dive into how Flexify.IO works, it would be helpful for you to understand why and when a business may want to consider a cloud migration strategy. If you are like most businesses, you researched the cloud storage space and landed on a provider that served your needs at some point in time. But then, as your business grew and your needs evolved, your storage solution became difficult and expensive to scale with your growing storage needs, and you lost the control and flexibility you once had.
For Aiden Korotkin of AK Productions, it was exactly this operational overhead that prompted him to migrate his data from Google Cloud Platform into Backblaze B2 Cloud Storage. As Korotkin transitioned from freelancing to running a full-service video production company, the cost and complexity of data management began to weigh him down. Simple tasks took several hours, and he could no longer justify the cost of doing business on Google Cloud. He needed to migrate his data into an easy-to-use, cost-effective storage platform, without causing any disruption to business workflows.
Korotkin discovered Backblaze B2 while researching secure cloud storage platforms on a user privacy and security blog. After digging more deeply into Backblaze’s services, he felt that we might be the right solution to his challenge. But the hurdle of transferring all of his data was nearly paralyzing. Ordinarily, moving several terabytes of data into Backblaze B2 via standard business connections would take weeks, months, or longer. But when Korotkin described his misgivings to the Solutions Engineers at Backblaze, they realized AK Productions might benefit from one of our newest partnerships. And sure enough, Flexify.IO accomplished the transfer in just 12 hours, with no downtime.
How Does Flexify.IO Do It?
With Flexify.IO, there is no need for users to manually offload their static and production data from a cloud storage provider and re-upload it into Backblaze B2. Flexify.IO reads the data from the source storage (Amazon S3, for instance) and writes it to the destination storage (Backblaze B2) over inter-cloud bandwidth. Because the transfer happens within the cloud environment (see Figure 1), Flexify.IO can achieve fast, secure data migration at cloud-native speeds, considerably quicker than moving data through your local internet bandwidth.
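To see why local bandwidth becomes the bottleneck, it helps to estimate transfer time. This is back-of-the-envelope arithmetic only: the 10 TB data set and the link speeds are illustrative assumptions, and the function assumes the link stays fully saturated with no protocol overhead or retries.

```python
def transfer_days(data_terabytes: float, link_mbps: float) -> float:
    """Days needed to push data_terabytes over a link of link_mbps,
    assuming the link stays fully saturated (no overhead or retries)."""
    bits = data_terabytes * 10**12 * 8         # decimal TB -> bits
    seconds = bits / (link_mbps * 10**6)       # Mbps -> bits per second
    return seconds / 86_400                    # seconds per day


# Illustrative inputs (assumptions, not figures from the post).
for mbps in (100, 1_000, 10_000):
    print(f"10 TB at {mbps:>6,} Mbps: {transfer_days(10, mbps):6.2f} days")
```

On a 100 Mbps business line, 10 TB works out to more than nine days of continuous transfer, before counting any failed uploads that have to be restarted.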
Speed and Data Security
A do-it-yourself (DIY) migration approach may work for you if you are migrating less than one terabyte or one million objects, but it will become more challenging as your data set grows. When you need to migrate millions of objects, fine-tuning a DIY script is time-consuming and can be extremely costly if something goes wrong.
Flexify.IO is designed and tested for reliability, with proper error handling and automatic retries in place to ensure a successful migration. In practice, this means they periodically check that the data has actually arrived at its intended destination.
They also check to make sure your data is not corrupted during the process by comparing hashes and checksums at every stage. A checksum is a value used to verify the integrity of a file or a data transfer. It is a sum derived from a file or other data object that a system can “check” by comparing the sum against a record of what the sum should be. Checksums are typically used to compare two sets of data to make sure they are the same.
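The retry-and-verify idea can be sketched in a few lines of Python. This is not Flexify.IO's implementation, which the post does not describe. The in-memory dict "buckets", the hypothetical `flaky_write` helper, and the attempt limit are stand-ins. A copy counts as done only once the SHA-256 checksum of what landed at the destination matches the checksum of the source.

```python
import hashlib
import random


def sha256_hex(data: bytes) -> str:
    """Checksum used to compare source and destination copies."""
    return hashlib.sha256(data).hexdigest()


def flaky_write(dest: dict, key: str, data: bytes, rng: random.Random) -> None:
    """Hypothetical destination write that sometimes fails or truncates data."""
    roll = rng.random()
    if roll < 0.3:
        raise ConnectionError("transient network failure")
    if roll < 0.5:
        data = data[:-1]  # simulate a truncated upload
    dest[key] = data


def copy_verified(src: dict, dest: dict, key: str,
                  rng: random.Random, max_attempts: int = 10) -> int:
    """Copy one object, retrying until the checksums match.
    Returns the number of attempts used."""
    expected = sha256_hex(src[key])
    for attempt in range(1, max_attempts + 1):
        try:
            flaky_write(dest, key, src[key], rng)
        except ConnectionError:
            continue  # transient error: try again
        if sha256_hex(dest.get(key, b"")) == expected:
            return attempt  # destination copy verified
    raise RuntimeError(f"{key}: checksum mismatch after {max_attempts} attempts")


source = {"footage/a.mov": b"raw video bytes" * 1000}
destination: dict = {}
attempts = copy_verified(source, destination, "footage/a.mov", random.Random(42))
print(f"verified after {attempts} attempt(s)")
```

A production tool would also back off between retries and would usually compare checksums that the storage services already report instead of re-reading whole objects; those details are omitted here.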
And with an advanced migration algorithm (see Figure 3) that allows for high speed transfers, Flexify.IO migrates incredibly large amounts of data from Amazon S3, Google Cloud, Azure, or any cloud storage provider into Backblaze B2, remarkably fast.
One of the biggest challenges when migrating data to a new platform is downtime, a period during which you cannot access your data. With traditional migration methods that span days, weeks, or months, lengthy downtime can put your business at risk. And if data is changed or modified during the migration, the result can be data loss or gaps that further delay the migration.
Flexify.IO reduces downtime by implementing an “incremental migration process.” They do several passes to scan your source and destination data, looking for changes between the two. If there are any differences, only those get migrated over. No manual checks or scans are necessary, as they ensure a seamless migration.
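The incremental approach can be modeled as repeated diff-and-copy passes. The sketch below is a hypothetical illustration, not Flexify.IO's algorithm. Each store is a dict mapping object keys to bytes, a pass copies only keys that are missing or whose checksum differs, and migration is ready to cut over when a pass finds nothing left to copy. Deletions at the source are ignored for simplicity.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def changed_keys(source: dict, dest: dict) -> list:
    """Keys that are missing at the destination or whose contents differ."""
    return sorted(
        key for key, data in source.items()
        if key not in dest or digest(dest[key]) != digest(data)
    )


def incremental_pass(source: dict, dest: dict) -> list:
    """Copy only the differences; return the keys that were copied."""
    todo = changed_keys(source, dest)
    for key in todo:
        dest[key] = source[key]
    return todo


source = {"a.txt": b"v1", "b.txt": b"v1"}
dest: dict = {}
print(incremental_pass(source, dest))   # first pass copies everything
source["b.txt"] = b"v2"                 # a file changes while users keep working
source["c.txt"] = b"new"                # and a new one appears
print(incremental_pass(source, dest))   # second pass copies only the changes
print(incremental_pass(source, dest))   # nothing left: ready to cut over
```

Because each later pass touches only what changed, the final pass (the only point where writes to the source need to pause) is short, which is where the reduction in downtime comes from.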
Cost and Support
The number one reason businesses stay locked in to their cloud storage provider is the high egress fees required to free their data. Since partnering with Flexify.IO, we have been able to significantly reduce these egress costs for customers who wish to migrate off their existing cloud storage platforms. You pay a flat fee per GB, which includes egress fees, infrastructure costs, setup, and planning.
For an Amazon S3 to Backblaze B2 migration in the US and Canada regions, this reduced fee comes out to $0.04/GB, which is less than half the price you would usually pay to retrieve your data from S3. And at $5/TB per month for storage, Backblaze B2 costs a fraction of what major cloud storage providers like Amazon S3, Google Cloud, or Azure charge. Customers like AK Productions begin to realize significant cost savings within a few months of switching to a more affordable platform like Backblaze B2.
An Example of ROI: Moving Data from S3 to Backblaze B2 Using Flexify.IO
If you currently store 100 terabytes of data in Amazon S3, you are paying $2,300 per month. Migrating this data to Backblaze B2 will reduce your storage cost to $500 per month. If you attempt this move without Flexify.IO, your cost will be around $9,000, and the move will require significant personnel time. Migrating to Backblaze B2 with Flexify.IO costs only $4,000. Taken together, these moves save $1,800 per month, which covers the $4,000 migration cost in a little over two months. To learn more about what you could save, contact our sales team.
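The ROI example can be reproduced with simple arithmetic. The per-terabyte rates below are backed out of the example's own totals for 100 TB ($2,300 per month on S3, about $9,000 for a DIY move, $4,000 via Flexify.IO) plus the $5/TB B2 price. Actual list prices vary by region and tier and change over time, so treat them as assumptions.

```python
# Per-TB rates (USD) derived from the example's totals; assumptions, not quotes.
S3_STORAGE_PER_TB_MONTH = 23     # $2,300 / 100 TB
B2_STORAGE_PER_TB_MONTH = 5      # Backblaze B2 price cited in the post
DIY_MOVE_PER_TB = 90             # ~$9,000 / 100 TB without Flexify.IO
FLEXIFY_FEE_PER_TB = 40          # $0.04/GB x 1,000 GB/TB


def migration_roi(terabytes: float) -> dict:
    """Monthly storage costs, one-time move costs, and payback period."""
    s3_monthly = terabytes * S3_STORAGE_PER_TB_MONTH
    b2_monthly = terabytes * B2_STORAGE_PER_TB_MONTH
    monthly_savings = s3_monthly - b2_monthly
    migration_cost = terabytes * FLEXIFY_FEE_PER_TB
    return {
        "s3_monthly": s3_monthly,
        "b2_monthly": b2_monthly,
        "monthly_savings": monthly_savings,
        "diy_move_cost": terabytes * DIY_MOVE_PER_TB,
        "flexify_cost": migration_cost,
        "payback_months": migration_cost / monthly_savings,
    }


for name, value in migration_roi(100).items():
    print(f"{name:>16}: {value:,.2f}")
```

The payback period is the one-time fee divided by the monthly difference (4,000 / 1,800, about 2.2 months); after that, the $1,800 per month stays in your budget.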
Additionally, Flexify.IO offers a transparent and upfront pricing structure for several cloud-to-cloud as well as on-premises to cloud migration options, which you can find on the pricing table on their website. We recommend using their managed service plan, which gives you access to Flexify.IO’s stellar support team who ensure a seamless end-to-end migration.
“Flexify.IO cared that my data transferred correctly. They had no idea that some files were the very last copies I had, but they treated them that way regardless,” says Korotkin of AK Productions.
Get Started with Flexify.IO
The recent trend in the commoditization of object storage has unlocked better alternatives for businesses looking to find a new home for their data. You now have the option of choosing more affordable and easy-to-use platforms, like Backblaze B2, without paying unnecessarily high storage bills or egress fees. And with the help of Flexify.IO’s migration solution, cloud-to-cloud migrations have never been more straightforward.
It only takes a few minutes to set up a Flexify.IO account. Once you click that big “Start Migration” button inside the platform, Flexify.IO handles the rest. To get started today, check out our Quick Start Guide on how to transfer your data from your existing cloud storage provider into Backblaze B2.
For Additional Information: This post is one of a series focusing on solutions for professionals in media and entertainment. If you’d like to learn more, we’re hosting a series of webinars about these solutions over the coming months. Please join us for our first on March 26 with iconik!
Jeff Nicosia, owner and founder of Industrious Films, will be the first to tell you that his company of creatives is typical of the new, modern creative agency. They make their name and reputation every day by delivering big advertising agency creative services directly to clients without all the extra expense of big advertising agency overhead and excess.
Part of this lighter approach includes taking a more flexible attitude towards building their team. With decades of experience at some of the best-known agencies in LA and New York, Industrious Films knows that the best people for their projects are spread out all over the country, so they employ a distributed set of creatives on almost every job.
But with entire teams working remotely, they need tools that boost collaboration and reduce the inefficiency of sending huge media files back and forth as everyone pushes to meet client delivery deadlines.
Backblaze hired Industrious Films to produce videos for our NAB booth presence last year, and during our collaboration we introduced Backblaze B2 Cloud Storage to their team. For this group of road-tested veterans, the potential for our cloud storage product to help their process was eye-opening indeed.
How Cloud Storage Has Impacted Industrious Films’ Workflow
As we re-engaged with Industrious Films to work on new projects this year, we wanted to hear Jeff’s thoughts on what cloud storage has accomplished for his team, and what it was like before they started using B2.
Skip Levens: Jeff, can you tell me about the niche that Industrious Films has carved out, and what your team is like?
Jeff Nicosia: Industrious Films brings the best of advertising agency video production directly to companies by eliminating the middleman. We tell customer and company stories really well, with craft, and have found a way to do it at a price that lets companies do a lot more video in their marketing vs. a once-a-year luxury. We’ve really found our niche in telling company stories and customer videos, working for companies like Quantum, Backblaze (of course), DDN, Thanx, Unisan, ExtraHop and tons more.
We’re all creatives who worked at ad agencies, design studios, post houses, etc. We’re spread out, but we come together for projects all over the country, and actually the world. Right now I’m in Manhattan Beach (Los Angeles, CA), while our main editor is on the other side of LA (25 minutes or 2 hours away by car, depending on the time of day), and our main graphics editor is in Iowa. Oh, and our colorist is either in Los Angeles or Brazil, depending on the time of year.
As for shooting, we use sound guys, shooters, PAs, etc., either from LA, or we hire locally wherever we’re shooting the video. We have crews we have collaborated with on multiple occasions in LA, Seattle, New York, London, and San Francisco. I actually shot a timelapse of a fairly typical shoot day, “A 14-Hour Shoot in 60s,” to give you an idea of what it’s like.
SL: Jeff, before we talk about how you adopted Backblaze B2 and cloud storage in general, can you paint a picture of what it’s usually like to shoot and deliver a large video project like the one you created for us?
JN: It’s a never-ending exchange of hard drives and bouncing between Dropbox, Box, Google Drive, and what have you, as everyone swaps files and sends updates. We’re also chasing customers and asking, “Did you get the file?” Or, “Did you send the file?” All of this was hard enough when video was HD; now that everything’s 4K or higher, it just doesn’t work at all. A single 4K RAW file of 3-4GB might take up an entire Google Drive allowance, and it gets very expensive to store more than that on Google Drive. We’ve spent an entire day trying to upload a single critical file that looks like it’s uploading, then have it crap out hours later. At that point, we’ve just wasted a day and we’re back to slapping files on a hard drive and trying to make the FedEx cutoff.
“Any small business or creative endeavor has to be remote nowadays. You want to hire the best people, no matter where they are. When people live where they want and work out of a home office they charge less for their rates—that’s how we deliver a full-service ad agency and video production service at the prices we can.”
SL: I remember, from working together on other projects, that we were constantly swapping hard drives and saying, “Is this one yours?” Or finally seeing you again years later, and handing you back your hard drive.
JN: Right! It’s so common. And you can’t just put files on a hard drive and ship it. We’ve had overnight services lose drives on us enough times that we’ve learned to always make extra copies on entirely new hard drives before sending a drive out. It’s always a time crunch and you have to make sure you have a spare drive and that it’s big enough. And you just know that when you send it to a client you’re never going to see that drive again. It’s a cost of business, and hundreds of dollars a month just gone—or at least it used to be. I’ve spent way too much time stalking Best Buy buying extra hard drives when there’s a sale because we were constantly buying drives.
SL: So that was the mindset when we kicked off our NAB Video Project last year (for NAB 2019) and I said, instead of handing you a hard drive with all of our B-roll, logos, etc., let’s use Backblaze B2.
Technical Note: I helped Industrious Films set up three buckets: a private bucket that I issued write-only app keys for (basically a ‘drop bucket’ for any incoming content); a private bucket for everyone on the project to access; and a public bucket for sharing review files directly from Backblaze if needed.
Next, I cut app keys for Industrious Films that spanned all three buckets so that they could easily organize and rearrange content as needed. I entered the app key into a bookmark in Cyberduck, and gave Industrious Films the bookmark to drop into Cyberduck.
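The three-bucket layout in this technical note can be written down as a small permissions table. The sketch below is a hypothetical model, not a call to the B2 API: the bucket and key names are invented, while `allPrivate`/`allPublic` and capability names such as `writeFiles`, `readFiles`, and `listFiles` mirror the bucket types and application-key capabilities B2 uses. It simply checks which operations each key would allow on each bucket.

```python
from dataclasses import dataclass

# Hypothetical bucket names; bucket types mirror B2's allPrivate/allPublic.
BUCKETS = {
    "project-dropbox": "allPrivate",  # incoming content only
    "project-shared": "allPrivate",   # working files for the whole team
    "project-review": "allPublic",    # review files shared by URL
}


@dataclass(frozen=True)
class AppKey:
    name: str
    capabilities: frozenset
    buckets: frozenset  # buckets this key may touch


# Write-only "drop" key handed to contributors.
drop_key = AppKey("contributor-drop", frozenset({"writeFiles"}),
                  frozenset({"project-dropbox"}))

# Production key spanning all three buckets so content can be reorganized.
studio_key = AppKey(
    "studio-full",
    frozenset({"listBuckets", "listFiles", "readFiles", "writeFiles", "deleteFiles"}),
    frozenset(BUCKETS),
)


def allowed(key: AppKey, capability: str, bucket: str) -> bool:
    """True if the key grants this capability on this bucket."""
    return bucket in key.buckets and capability in key.capabilities


def publicly_readable(bucket: str) -> bool:
    """Anyone with a file URL can download from an allPublic bucket."""
    return BUCKETS[bucket] == "allPublic"


print(allowed(drop_key, "writeFiles", "project-dropbox"))   # contributor can upload
print(allowed(drop_key, "readFiles", "project-dropbox"))    # but cannot download
print(allowed(studio_key, "readFiles", "project-review"))   # studio key spans all buckets
```

In B2 itself, keys are created through the web UI, the CLI, or the `b2_create_key` API call, and the service enforces the restrictions; a table like this only documents the intended setup.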
JN: Well, we work for technical clients, but I’m not really a technical guy. And my team are all creatives, not techies, so anything we use has to be incredibly simple. I wasn’t sure how it was going to work. Most of us were familiar with FTP clients, and this interface looks like the files and folders we’d see on a typical shared storage server, so it was very easy to adapt.
“Even though I have a background in tech, I’ve worked in technology companies, and my customers are tech companies, I’m not a tech-savvy guy at all and I don’t want to be. So the tools I use have to be simple and let me get on with telling my customer’s story.”
Everyone on my team works out of their home offices, or shared workspaces. I’ve got a 100 Megabit connection, up and down, and our graphics guy has the same—and he’s in the middle of Iowa. We each started uploading files in Cyberduck, then we jumped on a Skype call together and watched 6GB files fly across and we were just blown away. We just couldn’t believe that this was cloud storage, and it seemed like the more we put in, the faster it got. Our graphics guy was just raving about it, trying out bigger and bigger file uploads. He was freaking out—he kept saying, “What kind of secret sauce do these guys have!?”
SL: Can you tell me how the team adjusted to using a shared bucket? What did collaboration look like?
JN: First of all, since we had a files and folders interface, I jumped right in and did the usual organization of assets. One folder for Backblaze customer video reels, one for Backblaze B-roll, one for logos, one for audio, one for storyboards, motion graphics templates, etc. Then everyone downloads what they need from the folder to work locally, and puts changed and finished files back up in the shared bucket for everyone to see. That way we can review on the fly.
I sync everything to a local RAID array, but most of the time my focus is only on the shared bucket with the team. I don’t use an asset manager or project manager solution—I can always drop in something like iconik later if we’re doing overlapping large projects simultaneously. This works for our team for now and is exactly what we need.
“My graphics lead moved from North Hollywood to Iowa. And whether he’s 25 miles away from me or 2000, if we’re not in the same room, we need a way to send files to each other quickly to work together. So if the tools are good enough, it doesn’t matter where the team is anymore.”
SL: I seem to remember we needed some of those files for tweaks and changes as we were deploying on the NAB show floor?
JN: Right. Since we had the entire project and all the source files online, when we played our video on the huge screens amid the chaos of building the NAB booth before the show opened, we realized we could still swap in a better graphic. So we just pulled it from the Backblaze web interface and dropped it in right there. Otherwise, we’d have had to track down the new file and have someone deliver it to us, or more likely not make the change at all.
“Speed is collaboration for us as a small team. When uploads and downloads are fast and we’re not fighting to move files around, then we can try new ideas quickly, zero in on the best approach, and build and turn projects faster. It’s how we punch way, way above our weight as a small shop and compete with the biggest agencies.”
SL: What advice would you give creatives who want to try to rely less on dragging hard drives around? Any final thoughts?
JN: Well, first of all, hard drives are never totally going away, at least not until something very simple and very cheap comes along. I might work for technical customers, but sometimes their marketing leads will hand me hard drives, or when I want to deliver a file or have them review one, they’ll ask me to put it on a private YouTube or Vimeo link. They want to review on their phone or at lunch, so it needs to be simple for them, too. But at least we can organize everything we do on Backblaze, and there are a lot fewer hard drives in my life.
One of the biggest revelations I’ve had is not just for editors and producers working on projects like we did, but for shooters too. On a shoot, everyone takes a copy of the raw files and no one leaves the shoot until there are two copies. If there’s a problem with the camera carts (storage cards) this whole process can be agonizingly slow. If only more people knew they could upload a copy to something like Backblaze that would not only function as a shared copy but also allow everyone to start reviewing and editing files right away instead of waiting until they got back to the shop.
And finally, everyone can do what we’ve done. The way we’ve thrived, and how creatives find their niche and thrive in a gig economy, is by using simple, easy-to-use tools that let you tell those stories, offer better service, and compete with bigger agencies with higher overhead. We did it, and anyone else can too.
SL: Absolutely! Thanks Jeff, for taking time out to talk to us. We really appreciate your team’s work and look forward to working together on our next project!
“What is Cloud Storage?” is a series of posts for business leaders and entrepreneurs interested in using the cloud to scale their business without wasting millions in capital on infrastructure. Although the cloud itself is relatively simple, information about “the Cloud” is overrun with frustratingly unclear jargon. These guides aim to cut through the hype and give you the information you need to convince stakeholders that scaling your business in the cloud is an essential next step. We hope you find them useful, and that you’ll let us know what additional insight you might need. –The Editors
“Big Data” is a phrase people love to throw around in advertising and planning documents, despite the fact that the term itself is rarely defined the same way by any two businesses, even among industry leaders. However, everyone can agree about its rapidly growing importance—understanding Big Data and how to leverage it for the greatest value will be of critical organizational concern for the foreseeable future.
So then what does Big Data really mean? Who is it for? Where does it come from? Where is it stored? What makes it so big, anyway? Let’s bring Big Data down to size.
What is Big Data?
First things first: for purposes of this discussion, “Big” means any amount of data that exceeds the storage capacity of a single organization. “Data” refers to information stored or processed on a computer. Collectively, then, “Big Data” is a massive volume of structured or unstructured data (or both) that is too large to process effectively using traditional relational database management systems or applications. In more general terms, when your infrastructure is too small to handle the data your business is generating (because the volume of data is too large, it moves too fast, or it simply exceeds the current processing capacity of your systems), you’ve entered the realm of Big Data.
Let’s take a look at the defining characteristics.
Characteristics of Big Data
Current definitions of Big Data often reference a “triple (or in some cases quadruple) V” construct for detailing its characteristics. The “V”s reference velocity, volume, variety, and variability. We’ll define them for you here:
Velocity refers to the speed at which data is generated: the pace at which data flows in from sources like business processes, application logs, networks, social media sites, sensors, and mobile devices. This speed determines how rapidly data must be processed to meet business demands, which in turn determines the real potential of the data.
The term Big Data itself obviously references significant volume. But beyond just being “big,” the relative size of a data set is a fundamental factor in determining its value. The volume of data stored by an organization is used to ascertain its scalability, accessibility, and ease or difficulty of management. A few examples of high volume data sets are all of the credit card transactions in the United States on a given day; the entire collection of medical records in Europe; and every video uploaded to YouTube in an hour. A small to moderate volume might be the total number of credit card transactions in your business.
Variety refers to how many disparate sources contribute to an organization’s Big Data, along with the intrinsic nature of the data coming from each source. This relates to both structured and unstructured data. Years ago, spreadsheets and databases were the primary sources of data handled by the majority of applications. Today, data is generated in a multitude of formats such as email, photos, videos, monitoring devices, PDFs, and audio, all of which demand different considerations in analysis applications. This variety of formats can create issues for storing, mining, and analyzing data.
Variability concerns any inconsistencies in the data formats coming from any one source. Where variety considers different inputs from different sources, variability considers different inputs from one data source. These differences can complicate the effective management of the data store. Variability may also refer to differences in the speed of the data flow into your storage systems. Where velocity refers to the speed of all of your data, variability refers to how different data sets might move at different speeds. Variability can be a concern when the data itself has inconsistencies despite the architecture remaining constant.
An example from the health sector would be the variances within influenza epidemics (when and where they happen, how they’re reported in different health systems) and vaccinations (where they are/aren’t available) from year to year.
Understanding the makeup of Big Data in terms of Velocity, Volume, Variety, and Variability is key when strategizing big data solutions. This fundamental terminology will help you to effectively communicate among all players involved in decision making when you bring Big Data solutions to your team or your wider business. Whether pitching solutions, engaging consultants or vendors, or hearing out the proposals of the IT group, a shared terminology is crucial.
What is Big Data Used For?
Businesses use Big Data to try to predict future customer behavior based on past patterns and trends. Effective predictive analytics are the metaphorical crystal ball that organizations seek about what their customers want and when they want it. Theoretically, the more data collected, the more patterns and trends the business can identify. This information can potentially make all the difference for a successful strategy in customer acquisition and retention, and create loyal advocates for a business.
In this case, bigger is definitely better! But, the method an organization chooses to address its Big Data needs will be a pivotal marker for success in the coming years. Choosing your approach begins with understanding the sources of your data.
Sources of Big Data
Today’s world is incontestably digital: an endless array of gadgets and devices function as our trusted allies on a daily basis. While helpful, these constant companions are also responsible for generating more and more data every day. Smartphones, GPS technology, social media, surveillance cameras, and machine sensors (and the growing number of users behind them) are all producing reams of data on a moment-to-moment basis, and the total has grown exponentially, from 1 zettabyte of data produced in 2009 to more than 35 zettabytes in 2020.
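Exponential growth is easier to picture as a yearly rate. Taking the two figures in this paragraph at face value, and assuming smooth compounding between the endpoints, the implied compound annual growth rate works out as follows:

```python
# Implied compound annual growth rate from the two data points in the text.
start_zb, end_zb = 1, 35      # zettabytes in 2009 and 2020
years = 2020 - 2009

cagr = (end_zb / start_zb) ** (1 / years) - 1
print(f"Implied growth: {cagr:.1%} per year")
```

A rate of roughly 38% a year means the volume doubles a little more often than every two and a quarter years.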
If your business uses an app to receive and process orders for customers, or if you log extensive point-of-sale retail data, or if you have massive email marketing campaigns, you could have sources for untapped insight into your customers.
Once you understand the sources of your data, the next step is understanding the methods for housing and managing it. Data Warehouses and Data Lakes are two of the primary types of storage and maintenance systems that you should be familiar with.
Where Is Big Data Stored? Data Warehouses & Data Lakes
Although both Data Lakes and Data Warehouses are widely used for Big Data storage, the terms are not interchangeable.
A Data Warehouse is an electronic system used to organize information. It goes beyond a traditional relational database, which houses and organizes data generated from a single source only.
How Do Data Warehouses Work?
A Data Warehouse is a repository for structured, filtered data that has already been processed for a specific purpose. A warehouse combines information from multiple sources into a single comprehensive database.
For example, in the retail world, a data warehouse may consolidate customer info from point-of-sale systems, the company website, consumer comment cards, and mailing lists. This information can then be used for distribution and marketing, to track inventory movements and customer buying habits, to manage promotions, and to determine pricing policies.
Additionally, the Data Warehouse may also incorporate information about company employees such as demographic data, salaries, schedules, and so on. This type of information can be used to inform hiring practices, set Human Resources policies and help guide other internal practices.
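To make the consolidation idea concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a warehouse. The three source extracts, their field names, and the two-table schema are all hypothetical. A production warehouse would run on dedicated infrastructure with scheduled extract-transform-load jobs. The sketch shows the pattern: data from separate systems is cleaned and structured on the way in, so a single query can answer a question that spans all of them.

```python
import sqlite3

# Hypothetical extracts from three separate source systems.
pos_sales = [("alice@example.com", "2020-01-04", 42.50), ("bob@example.com", "2020-01-05", 19.99)]
web_orders = [{"email": "ALICE@example.com ", "date": "2020-01-07", "total": "15.00"}]
mailing_list = ["bob@example.com", "carol@example.com"]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (email TEXT PRIMARY KEY, on_mailing_list INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE purchase (email TEXT, day TEXT, amount REAL, channel TEXT);
""")


def norm(email: str) -> str:
    """Transform step: make the same customer look the same in every source."""
    return email.strip().lower()


def upsert_customer(email: str) -> None:
    conn.execute("INSERT OR IGNORE INTO customer (email) VALUES (?)", (email,))


for email, day, amount in pos_sales:
    upsert_customer(norm(email))
    conn.execute("INSERT INTO purchase VALUES (?, ?, ?, 'pos')", (norm(email), day, amount))

for order in web_orders:
    upsert_customer(norm(order["email"]))
    conn.execute("INSERT INTO purchase VALUES (?, ?, ?, 'web')",
                 (norm(order["email"]), order["date"], float(order["total"])))

for email in mailing_list:
    upsert_customer(norm(email))
    conn.execute("UPDATE customer SET on_mailing_list = 1 WHERE email = ?", (norm(email),))

# One query now spans point-of-sale, web, and mailing-list data.
rows = conn.execute("""
    SELECT c.email, c.on_mailing_list, COUNT(p.email), COALESCE(SUM(p.amount), 0)
    FROM customer c LEFT JOIN purchase p ON p.email = c.email
    GROUP BY c.email ORDER BY c.email
""").fetchall()
for row in rows:
    print(row)
```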
Data Warehouses are fundamental to the efficiency of modern life. For instance:
Have a plane to catch?
Airline systems rely on Data Warehouses for many operational functions like route analysis, crew assignments, frequent flyer programs, and more.
Have a headache?
The healthcare sector uses Data Warehouses to aid organizational strategy, help predict patient outcomes, generate treatment reports, and cross-share information with insurance companies, medical aid services, and so forth.
Are you a solid citizen?
In the public sector, Data Warehouses are mainly used for gathering intelligence and assisting government agencies in maintaining and analyzing individual tax and health records.
Playing it safe?
In investment and insurance sectors, the warehouses are mainly used to detect and analyze data patterns reflecting customer trends, and to continuously track market fluctuations.
Have a call to make?
The telecommunications industry makes use of Data Warehouses for management of product promotions, to drive sales strategies, and to make distribution decisions.
Need a room for the night?
The hospitality industry uses Data Warehouses to design and cost-effectively run advertising and marketing programs tailored to client feedback and travel habits.
Data Warehouses are integral in many aspects of the business of everyday life. That said, they aren’t capable of handling the inflow of data in its raw format, like object files or blobs. A Data Lake is the type of repository needed to make use of this raw data. Let’s examine Data Lakes next.
What is a Data Lake?
A Data Lake is a vast pool of raw data whose purpose is not yet defined. This data can be both structured and unstructured. At its best, a Data Lake is a secure, adaptable storage and maintenance system distinguished by its flexibility, agility, and ease of use.
If you’re considering a business approach that involves Data Lakes, you’ll want to look for solutions that have the following characteristics: they should retain all data and support all data types; they should easily adapt to change; and they should provide quick insights to as wide a range of users as you require.
Use Cases for Data Lakes
Data Lakes are most helpful when working with streaming data, like the sorts of information gathered from machine sensors, live event-based data streams, clickstream tracking, or product/server logs.
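A warehouse cleans data before storing it. A Data Lake does the reverse: it keeps records exactly as they arrive and applies structure only when someone reads them, an approach often called schema-on-read. The minimal Python sketch below illustrates the idea, with an in-memory list standing in for the lake. The record formats and the page_views analysis are hypothetical examples and do not describe any particular product.

```python
import json
from collections import Counter

# The "lake": raw records kept exactly as they arrived, with no schema imposed
# on write. In practice these would be objects in a storage bucket.
lake: list[str] = []


def ingest(raw: str) -> None:
    """Store the record untouched; deciding what it means is deferred."""
    lake.append(raw)


# Mixed sources: clickstream JSON, a sensor reading, and a plain-text server log line.
ingest('{"type": "click", "page": "/pricing", "user": "u1"}')
ingest('{"type": "click", "page": "/signup", "user": "u2"}')
ingest('{"type": "sensor", "id": "pump-7", "temp_c": 81.4}')
ingest("2020-02-11T10:15:00Z GET /pricing 200")
ingest('{"type": "click", "page": "/pricing", "user": "u3"}')


def page_views(records: list[str]) -> Counter:
    """Schema-on-read: this analysis imposes its own interpretation at query
    time and skips records that don't fit it."""
    views: Counter = Counter()
    for raw in records:
        try:
            rec = json.loads(raw)
        except json.JSONDecodeError:
            continue  # not JSON (the log line); a different analysis could parse it
        if isinstance(rec, dict) and rec.get("type") == "click":
            views[rec["page"]] += 1
    return views


print(page_views(lake))
```

In this example the analysis counts two views of /pricing and one of /signup. The sensor reading and the log line stay in the lake untouched, available to any later analysis that needs them.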
Deployments of Data Lakes typically address one or more of the following business use cases:
Business intelligence and analytics – analyzing streams of data to determine high-level trends and granular, record-level insights. A good example is the oil and gas industry, which has used the nearly 1.5 Terabytes of data it generates daily to increase its efficiency.
Data science – unstructured data allows for more possibilities in analysis and exploration, enabling innovative applications of machine learning, advanced statistics and predictive algorithms. State, city, and federal governments around the world are using data science to dig more deeply into the massive amount of data they collect regarding traffic, utilities, and pedestrian behavior to design safer, smarter cities.
Data serving – Data Lakes are usually an integral part of high-performance architectures for applications that rely on fresh or real-time data, including recommender systems, predictive decision engines, and fraud detection tools. Customer Data Platforms are a good example of this use case: they pull information from many behavioral and transactional data sources to refine marketing and target it to individual customers.
When considered together, the potential applications for Data Lakes in your business seem to promise an endless source of revolutionary insights. But the ongoing maintenance and technical upgrades these data sources need to retain their relevance and value are massive. If neglected or mismanaged, Data Lakes quickly devolve. As such, one of the biggest questions to weigh when considering this approach is whether you have the financial and personnel capacity to manage a Data Lake over the long term.
What is a Data Swamp?
A Data Swamp, put simply, is a Data Lake that no one cared to manage appropriately. A swamp arises when a Data Lake is treated as storage only, without curation, management, retention and lifecycle policies, or metadata. If you decide to work Data Lake-derived insights into your business planning and end up with a swamp, you are going to be sorely disappointed. You’re paying the same amount to store all of your data, but returning zero effective intelligence to your bottom line.
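One way to keep a lake from turning into a swamp is to audit it routinely for the gaps listed above: missing metadata and missing or expired lifecycle rules. The Python sketch below is a hypothetical example. The LakeObject record, the required metadata keys, and the retention rule are illustrative assumptions, not a standard catalog format.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

REQUIRED_METADATA = ("owner", "source", "description")  # hypothetical policy


@dataclass
class LakeObject:
    key: str
    created: date
    metadata: dict = field(default_factory=dict)
    retain_days: Optional[int] = None  # None means no lifecycle policy assigned


def swamp_warnings(obj: LakeObject, today: date) -> list[str]:
    """List the curation gaps that, left unchecked, turn a lake into a swamp."""
    problems = [f"missing metadata: {k}" for k in REQUIRED_METADATA if k not in obj.metadata]
    if obj.retain_days is None:
        problems.append("no retention/lifecycle policy")
    elif (today - obj.created).days > obj.retain_days:
        problems.append("past retention period; review or delete")
    return problems


catalog = [
    LakeObject("clickstream/2019-06.jsonl", date(2019, 6, 30),
               {"owner": "web-team", "source": "site", "description": "raw clicks"},
               retain_days=365),
    LakeObject("dump/final_v2_REAL.csv", date(2018, 3, 1)),  # nobody knows what this is
]
today = date(2020, 3, 1)
for obj in catalog:
    print(obj.key, swamp_warnings(obj, today))
```

The well-described, policy-covered object passes cleanly. The orphaned dump is flagged for three missing metadata fields and for having no lifecycle policy.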
Final Thoughts on Big Data Maintenance
Any business or organization considering entry into Big Data country will want to be very careful and planful as they consider how they will store, maintain, and analyze their data. Making the right choices at the outset will ensure you’re able to traverse the developing digital landscape with strategic insights that enable informed decisions to keep you ahead of your competitors. We hope this primer on Big Data gives you the confidence to take the appropriate first steps.
Getting started with cloud storage is easy. You can sign up for an account in seconds, and in minutes you can have directories full of files from your latest project in your cloud account, accessible to you and anyone you want to share the files with. If you have dozens or even hundreds of terabytes of data already, however, uploading it all into your cloud storage bucket, or transferring it there from another cloud service will take a bit of careful planning.
Thankfully, whether you’re looking to transfer a significant amount of data from your on-premises solution to the cloud, or you’d like to escape the grips of a cloud storage provider like Amazon S3 without breaking your budget, Backblaze has worked hard to ensure you have any number of pathways to, from, and around the cloud. We’ve gathered six of our favorite services and partnerships for transferring or migrating your data.
For Additional Information: This post is one of a series leading up to the annual NAB conference in Las Vegas from April 18-22. If you’re attending, please join us at booth SL8716 to learn more about the solutions outlined below. For all of our readers, we’re hosting a series of webinars about these solutions over the coming months. Please join us for our first on March 26 with iconik!
Quick and Affordable Uploads to the Cloud with the Fireball
Many of our customers have used our Backblaze Fireball rapid ingest service to migrate large data sets from their on-premises environments into Backblaze cloud storage quickly, affordably, and securely.
How it works: we send you a Backblaze Fireball, a 70 TB hard drive array. You copy files to the Fireball directly or through a data transfer tool of your choice. (Backup, sync, and archive tools are great for that.) Once you’re done, return the Fireball to Backblaze and we’ll securely upload the files to your Backblaze B2 Cloud Storage account inside one of our data centers. Fireball service is priced at $550 per 30-day rental, which gives you a comfortable window of time to load your data.
This service has proved to be a customer favorite because even with high speed internet connections, it can take months to transfer data sets to the cloud. Fireball can get you up and running in weeks. For example, after KLRU—the PBS affiliate in Austin, Texas—completed their digital restoration project of more than four decades of Austin City Limits shows, they used the Fireball to efficiently load the entire 40 terabyte video library into B2 Cloud Storage.
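The arithmetic behind the "months versus weeks" comparison is easy to check. The sketch below computes how long a given volume takes over a given link. The 100 Mbps and 1 Gbps link speeds and the 80% utilization figure are assumptions for illustration, and decimal terabytes (10^12 bytes) are used throughout.

```python
def transfer_days(size_bytes: float, link_bps: float, utilization: float = 1.0) -> float:
    """Days needed to move size_bytes over a link of link_bps bits per second.

    utilization is the share of the link the transfer actually gets
    (an assumption; long uploads rarely sustain 100%)."""
    seconds = size_bytes * 8 / (link_bps * utilization)
    return seconds / 86_400


TB = 10**12  # decimal terabyte
for size_tb in (40, 70):  # KLRU's library; one Fireball's capacity
    for mbps in (100, 1000):
        days = transfer_days(size_tb * TB, mbps * 10**6, utilization=0.8)
        print(f"{size_tb} TB at {mbps} Mbps (80% utilized): {days:.1f} days")
```

At 100 Mbps and 80% utilization, the 40 TB KLRU library takes about 46 days, and a full 70 TB Fireball's worth takes about 81 days. Even a fully saturated 100 Mbps link needs roughly 65 days for 70 TB.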
Similarly, creative agency Baron & Baron jump-started their entrance into cloud backups with the Fireball. Using Archiware, they created full backups of all of their servers into Fireballs, which were then uploaded securely into their B2 account much quicker than if they’d backed up over their internet connection.
As popular as the Fireball is, not all of our customers use it to get started. Some don’t have dozens of terabytes to upload at once. Others have access to high-speed internet connections, or their content wasn’t organized enough to justify renting a Fireball for a single bulk upload to the cloud. These customers opted for other pathways to migrate their data into the cloud that we’ll outline below.
No Rush? Try Internet Transfers with Cyberduck and Transmit 5
If you don’t have a huge digital library to upload or don’t generate terabytes of data every day, your existing internet connection may be sufficient for your transfer. And with the right tools you won’t have to wait too long. B2 Cloud Storage is integrated with two tools—Cyberduck and Transmit 5—that use multi-threading to transfer large files much faster.
Cyberduck runs on both macOS and Windows. It’s open-source, free software, but their team of volunteers will gladly accept donations to help them develop, maintain, and distribute it.
Transmit 5 by Panic runs on macOS. You can try it out for free for seven days or keep it forever for $45 per license. Volume discounts are available.
If you want to learn more about how multi-threading makes this speed boost possible, you can read about the basics of threading here. The short version is that before transmission, large files are broken into chunks that are transferred simultaneously across multiple threads, then reassembled after transmission is complete. If you’re transferring video, high-resolution photos, or other large files, these tools are well worth the effort of installing, and more than worth their modest cost.
One Backblaze media customer took this approach for migrating their decades-old digital archive to the cloud. They started by backing up their active projects to the cloud to protect against all-too-frequent accidental deletions. Then, in the background, they gradually copied their archive of finished projects and raw footage from on-premises storage to the cloud. The complete archive migration took months to finish, but that was acceptable because they could keep their copies on existing on-premises storage during the process.
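The chunk-and-reassemble idea can be sketched in a few lines of Python with concurrent.futures.ThreadPoolExecutor. This is not how Cyberduck or Transmit 5 are implemented. The upload_part function here is a hypothetical stand-in that stores each part in a dictionary instead of sending it over the network, and the 5 MiB part size and eight threads are arbitrary choices. The sketch shows the pattern: split the file into numbered parts, send the parts concurrently, and rebuild the file in part order once all of them have arrived.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MiB parts; illustrative, not any tool's default


def split(data: bytes, size: int = CHUNK_SIZE) -> list[tuple[int, bytes]]:
    """Break the file into (part_number, chunk) pairs."""
    return [(i, data[off:off + size]) for i, off in enumerate(range(0, len(data), size))]


def upload_part(store: dict, part: tuple[int, bytes]) -> int:
    """Hypothetical stand-in for sending one part over the network.
    A real client would make a network request to the storage service here."""
    number, chunk = part
    store[number] = chunk
    return number


def parallel_transfer(data: bytes, threads: int = 8) -> bytes:
    store: dict[int, bytes] = {}
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Parts are sent concurrently and may complete in any order.
        list(pool.map(lambda p: upload_part(store, p), split(data)))
    # Reassemble by part number once every part has arrived.
    return b"".join(store[i] for i in sorted(store))


original = bytes(range(256)) * 100_000  # 25.6 MB of sample data
copy = parallel_transfer(original)
print(len(split(original)), "parts;", "intact" if copy == original else "corrupted")
```

In a real upload, each thread spends most of its time waiting on the network. That waiting is why the parallelism pays off even in Python.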
Boost Large File Transfers with FileCatalyst
If your existing internet bandwidth is sufficient for your day-to-day needs, but you occasionally need to transfer high volumes of data quickly—for example, after videotaping a weekend conference—look to fast file transport solutions like FileCatalyst.
FileCatalyst accelerates file transfers between remote locations even when there is high network latency or packet loss (in other words, when the connection is unreliable and weak), allowing you to send at speeds of up to 10Gbps. Optimized for large data sets, its proprietary technologies include a UDP-based protocol that’s much faster than standard TCP (the other main protocol computers use to communicate over the internet).
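Why do latency and packet loss hurt standard TCP so much? A well-known rule of thumb, the Mathis et al. approximation, estimates the steady-state throughput of a single TCP flow as roughly (MSS / RTT) × sqrt(3/2) / sqrt(p). Here MSS is the segment size, RTT is the round-trip time, and p is the packet loss rate. The Python sketch below evaluates the formula for a few illustrative network conditions; the MSS, RTT, and loss values are assumptions. The approximation ignores timeouts and receive-window limits, and it says nothing about how FileCatalyst's proprietary protocol works. It only shows why a single TCP stream slows sharply as distance and loss grow.

```python
import math


def tcp_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Mathis et al. approximation of steady-state throughput for one TCP flow:
    rate ~= (MSS / RTT) * sqrt(3/2) / sqrt(p)."""
    return (mss_bytes * 8 / rtt_s) * math.sqrt(1.5) / math.sqrt(loss_rate)


MSS = 1460  # typical segment size on Ethernet, in bytes
for rtt_ms, loss in [(10, 0.0001), (100, 0.0001), (100, 0.001)]:
    mbps = tcp_throughput_bps(MSS, rtt_ms / 1000, loss) / 1e6
    print(f"RTT {rtt_ms} ms, loss {loss:.2%}: ~{mbps:.1f} Mbps per flow")
```

With a 100 ms round trip, raising loss from 0.01% to 0.1% drops the estimate from about 14 Mbps to about 4.5 Mbps per flow. Both figures are far below the capacity of a fast link.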
As a software-only solution, FileCatalyst doesn’t require customers to add special hardware or make bandwidth upgrades. Service starts at $300/month, and is available as a month-to-month subscription with no additional data transfer charges or bandwidth caps. Consumption pricing and perpetual license pricing are also available.
FileCatalyst has been integrated with Backblaze cloud storage for two years and is a solution we recommend to customers who need to transfer data on the order of 10 terabytes or more per day. With this solution you can get all the footage from that weekend conference transferred and available to editors come Monday morning.
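The 10-terabytes-a-day guideline translates directly into a sustained bandwidth requirement. The sketch below does that conversion, using decimal terabytes. The 70% effective-utilization figure is an assumption, included to show how overhead raises the bar.

```python
def required_mbps(volume_bytes: float, window_hours: float, utilization: float = 1.0) -> float:
    """Sustained link speed (Mbps) needed to move volume_bytes within window_hours.

    utilization is the fraction of the link the transfer can actually use."""
    bits = volume_bytes * 8
    return bits / (window_hours * 3600) / utilization / 1e6


TB = 10**12  # decimal terabyte
print(f"{required_mbps(10 * TB, 24):.0f} Mbps")       # about 926 Mbps
print(f"{required_mbps(10 * TB, 24, 0.7):.0f} Mbps")  # about 1323 Mbps at 70% utilization
```

Moving 10 TB in 24 hours needs about 926 Mbps of continuous throughput, or about 1.3 Gbps if only 70% of the link is usable. That is more than many office connections can sustain.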
Pay-As-You-Go with MASV Fast File Transfer
MASV is a recent Backblaze integration that also offers fast file transfer services, but on a pay-as-you-go basis. MASV runs in your web browser, so you don’t need to download and install any software. This simplicity makes MASV perfect for delivering content to and from clients who would rather log in and drag and drop files than download software and train their team.
Have you ever struggled to keep content for collaborative teams organized? With MASV, you can provide a simple Portals page where contributors can upload huge media files and share them with team members without needing direct access to the project’s cloud storage buckets. MASV not only moves large files faster, but also makes it easier for contributors to get started and reduces the risk of disruption. And MASV’s pay-as-you-go pricing of 25 cents/GB means you only pay for what the team actually uses. A free 7-day trial lets you test it out.
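Pay-as-you-go pricing is simple to model. The sketch below applies the 25 cents/GB rate quoted above to a month of hypothetical deliveries. It assumes decimal gigabytes and no charges beyond the per-GB rate, so check MASV's current pricing terms before relying on it.

```python
PRICE_PER_GB = 0.25  # USD per GB, the pay-as-you-go rate quoted above


def pay_as_you_go_cost(transfers_gb: list[float]) -> float:
    """Hypothetical helper: bill only for the gigabytes actually transferred."""
    return round(sum(transfers_gb) * PRICE_PER_GB, 2)


# Hypothetical month: three deliveries of 120 GB, 80 GB, and 300 GB.
monthly = pay_as_you_go_cost([120, 80, 300])
print(f"${monthly:.2f}")  # $125.00
```

Under this model, a month with no transfers costs nothing.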
Smart Applications for a “Hybrid Cloud” Approach with iconik
Knowing that their customers increasingly are keeping their data on different types of storage simultaneously, some vendors are building their applications to access and manage data wherever it’s stored, whether on-premises or in the cloud.
B2 Cloud Storage is integrated with iconik, a media asset management (MAM) platform that was designed with this hybrid cloud approach in mind. With iconik, all your assets appear in the same visual interface, regardless of where they’re stored. The cloud-based MAM generates and stores proxies or thumbnails in the cloud, but keeps full-resolution files in their original location. iconik downloads the full-resolution files to the user only when needed.
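The proxy-in-the-cloud, original-in-place design can be sketched as a small catalog. Everything below is hypothetical: the Asset fields, the location strings, and the fetch function are illustrative and are not iconik's API. The sketch only models the behavior described above. Browsing touches lightweight proxies, and a full-resolution file is pulled from wherever it lives only when someone opens it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Asset:
    name: str
    proxy: str              # lightweight preview stored in the cloud
    original_location: str  # where the full-resolution file already lives


class HybridCatalog:
    """One listing of every asset; originals stay put until someone needs them."""

    def __init__(self, fetch_original: Callable[[str], bytes]):
        self._fetch = fetch_original
        self.assets: dict[str, Asset] = {}
        self.downloads = 0

    def add(self, asset: Asset) -> None:
        self.assets[asset.name] = asset

    def browse(self) -> list[tuple[str, str]]:
        # Browsing only touches proxies, wherever the originals are stored.
        return [(a.name, a.proxy) for a in self.assets.values()]

    def open_full_res(self, name: str) -> bytes:
        # Only now is the full-resolution file pulled from its original location.
        self.downloads += 1
        return self._fetch(self.assets[name].original_location)


# Usage: a dictionary stands in for on-premises and cloud storage.
storage = {"nas://archive/shoot1.mov": b"full-res 1", "b2://bucket/shoot2.mov": b"full-res 2"}
catalog = HybridCatalog(fetch_original=storage.__getitem__)
catalog.add(Asset("shoot1", "proxies/shoot1.jpg", "nas://archive/shoot1.mov"))
catalog.add(Asset("shoot2", "proxies/shoot2.jpg", "b2://bucket/shoot2.mov"))
listing = catalog.browse()               # no full-resolution downloads yet
clip = catalog.open_full_res("shoot2")   # one on-demand download
```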
This hybrid cloud is great when you want the flexibility of the cloud, but want to migrate there in stages. As we’ve noted above, it can take time to move everything to the cloud and it’s easier for people to do it on their own schedule. According to their recent Media Stats Report, iconik customers are making good use of this flexibility, with 53% of iconik-managed assets stored in the cloud and 47% stored on-premises.
Backblaze customer Fin Films took this approach when they moved to the cloud. Owner Chris Aguilar was painfully aware of how vulnerable content on aging hard drives can be; he had to pay a drive reconstruction team to salvage footage before. So, Fin Films uploaded their most irreplaceable content to B2 cloud storage first, then began gradually moving other content.
As they were migrating content to their new cloud archive, Fin Films rolled out iconik to manage their assets better. iconik’s “bring your own storage” approach meant that they didn’t have to pull down their content from Backblaze and upload it to iconik. And iconik’s pay-as-you-use-it pricing allowed Chris Aguilar to add collaborators on the fly, and share his content quickly by adding view-only licenses at no charge. Pricing for iconik starts at $250/month for a small team.
Move from Cloud to Cloud with Flexify.IO
Sometimes the difficulty isn’t getting data into the cloud; it’s moving data from one cloud service to another. Transferring your data between cloud storage services isn’t as easy or inexpensive as you might hope. Cloud providers offer different capabilities and, more significantly, dramatically different pricing. As with moving your household, hiring experts to do the job can save you a lot of time and stress. Enter Flexify.IO.
Flexify.IO offers cloud data migration services that simplify moving or copying your data between cloud storage providers. The service ensures maximum throughput by using cloud internet connections, as opposed to your local internet bandwidth, and eliminates downtime during migration. And it’s now fully integrated with Backblaze cloud storage, with special pricing for migrating data from AWS to Backblaze.
Video production company AK Productions recently used Flexify.IO to move their video content to B2 Cloud Storage from Google Cloud Storage. They had found their Google cloud service took hours to manage—time they couldn’t afford to spend while growing their business. They were also worried about the privacy of their clients’ important data on Google.
Partnering with Flexify.IO, Backblaze successfully migrated 12 terabytes of data stored on Google Cloud to B2 Cloud Storage in 12 hours. Transferring that amount of data via standard business connections can take weeks, months, or even longer. AK Productions was able to achieve all of this with no disruption to their project workflows. Even with the cost of egress fees from Google, the business will break even and start realizing significant cost savings in approximately six months.
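For a sense of scale, the sketch below computes the average throughput implied by moving 12 terabytes in 12 hours, assuming decimal terabytes. The 50 Mbps comparison line is an illustrative assumption, not a figure from the migration.

```python
def effective_gbps(size_bytes: float, hours: float) -> float:
    """Average throughput (Gbps) needed to move size_bytes in the given hours."""
    return size_bytes * 8 / (hours * 3600) / 1e9


TB = 10**12  # decimal terabyte
rate = effective_gbps(12 * TB, 12)
print(f"{rate:.2f} Gbps")                         # 2.22 Gbps
print(f"{rate * 1000 / 50:.0f}x a 50 Mbps line")  # 44x
```

That is roughly 2.2 Gbps sustained for half a day, about 44 times what a 50 Mbps business line can deliver. This is why the same job over a standard connection stretches into weeks.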
So Many Choices for So Many Pathways to the Cloud
We know choosing the right path to the cloud for your organization when there are so many choices can be difficult. For that reason, we’ll be hosting “how-to” webinars that go deeper on the process and visually show you the steps to take. Our first webinar is March 26 with iconik. To stay in the loop for future webinars after NAB 2020, follow our webinar channel on BrightTalk. And we’ll have a team of Solution Engineers at NAB to demonstrate the tools and offer expert guidance. Sign up now to meet with us at NAB.
As of December 31, 2019, Backblaze had 124,956 spinning hard drives. Of that number, there were 2,229 boot drives and 122,658 data drives. This review looks at the hard drive failure rates for the data drive models in operation in our data centers. In addition, we’ll take a look at how our 12 and 14 TB drives are doing and get a look at the new 16 TB drives we started using in Q4. Along the way we’ll share observations and insights on the data presented and we look forward to you doing the same in the comments.
2019 Hard Drive Failure Rates
At the end of 2019 Backblaze was monitoring 122,658 hard drives used to store data. For our evaluation we remove from consideration those drives that were used for testing purposes and those drive models for which we did not have at least 5,000 drive days during Q4 (see notes and observations for why). This leaves us with 122,507 hard drives. The table below covers what happened in 2019.
Notes and Observations
There were 151 drives (122,658 minus 122,507) that were not included in the list above. These drives were either used for testing or did not have at least 5,000 drive days during Q4 of 2019. The 5,000 drive-day limit removes those drive models where we only have a limited number of drives working a limited number of days during the period of observation. NOTE: The data for all drives, data drives, boot drives, etc., is available for download on the Hard Drive Test Data webpage.
The only drive model not to have a failure during 2019 was the 4 TB Toshiba, model: MD04ABA400V. That’s very good, but the data sample is still somewhat small. For example, if there had been just 1 (one) drive failure during the year, the Annualized Failure Rate (AFR) for that Toshiba model would be 0.92%—still excellent, not 0%.
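Backblaze's AFR figures are failures per drive-year of operation, where drive-years are drive days divided by 365. The Python sketch below implements that calculation. It reproduces numbers quoted in this review, such as the Toshiba 8 TB lifetime figure of 1 failure in 13,994 drive days.

```python
def afr_percent(failures: int, drive_days: float) -> float:
    """Annualized failure rate: failures per drive-year, expressed as a percent."""
    drive_years = drive_days / 365
    return failures / drive_years * 100


print(f"{afr_percent(1, 13_994):.2f}%")  # Toshiba 8 TB lifetime: 2.61%
print(f"{afr_percent(0, 1_440):.2f}%")   # Seagate 16 TB in Q4 2019: 0.00%

# Working backwards: drive days at which a single failure yields a 0.92% AFR.
implied_drive_days = 1 * 365 / (0.92 / 100)
print(round(implied_drive_days))         # about 39,674
```

The 0.92% hypothetical for the Toshiba MD04ABA400V corresponds to roughly 39,700 drive days. At that sample size, a single failure moves the rate visibly.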
The Toshiba 14 TB drive, model MG07ACA14TA, is performing very well at a 0.65% AFR, similar to the rates put up by the HGST drives. For their part, the Seagate 6 TB and 10 TB drives continue to be solid performers, with annualized failure rates of 0.96% and 1.00% respectively.
The AFR for 2019 across all drive models was 1.89%, which is much higher than in 2018. We’ll discuss that later in this review.
Beyond the 2019 Chart—“Hidden” Drive Models
There are a handful of drive models that didn’t make it to the 2019 chart because they hadn’t recorded enough drive-days in operation. We wanted to take a few minutes to shed some light on these drive models and where they are going in our environment.
Seagate 16 TB Drives
In Q4 2019 we started qualifying Seagate 16 TB drives, model: ST16000NM001G. As of the end of Q4 we had 40 (forty) drives in operation, with a total of 1,440 drive days—well below our 5,000 drive day threshold for Q4, so they didn’t make the 2019 chart. There have been 0 (zero) failures through Q4, making the AFR 0%, a good start for any drive. Assuming they continue to pass our drive qualification process, they will be used in the 12 TB migration project and to add capacity as needed in 2020.
Toshiba 8 TB Drives
In Q4 2019 there were 20 (twenty) Toshiba 8 TB drives, model: HDWF180. These drives have been installed for nearly two years. In Q4, they only had 1,840 drive days, below the reporting threshold, but lifetime they do have 13,994 drive days with only 1 drive failure, giving us an AFR of 2.6%. We like these drives, but by the time they were available to us in quantity, we could buy 12 TB drives at the same cost per TB. More density, same price. Given we are moving to 16 TB drives and beyond, we most likely will not be buying any of these drives in the future.
HGST 10 TB Drives
There are 20 (twenty) HGST 10 TB drives, model: HUH721010ALE600, in operation. These drives have been in service a little over one year. They reside in the same Backblaze Vault as the Seagate 10 TB drives. The HGST drives recorded only 1,840 drive days in Q4 and a total of 8,042 since being installed. There have been 0 (zero) failures. As with the Toshiba 8 TB, purchasing more of these 10 TB drives is unlikely.
Toshiba 16 TB Drives
You won’t find these in the Q4 stats, but in Q1 2020 we added 20 (twenty) Toshiba 16 TB drives, model: MG08ACA16TA. They have logged a total of 100 drive days, so it is way too early to say anything other than more to come in the Q1 2020 report.
Comparing Hard Drive Stats for 2017, 2018, and 2019
The chart below compares the Annualized Failure Rates (AFR) for each of the last three years. The data for each year is inclusive of that year only and for the drive models present at the end of each year.
The Rising AFR in 2019
The total AFR rose significantly in 2019. About 75% of the different drive models experienced a rise in AFR from 2018 to 2019. There are two primary drivers behind this rise. First, the 8 TB drives as a group seem to be having a mid-life crisis as they get older, with each model exhibiting its highest recorded failure rate. While none of these rates is cause for worry, the 8 TB drives contribute roughly one fourth (1/4) of the drive days to the total, so any rise in their failure rate will affect the total. The second factor is the Seagate 12 TB drives; this issue is being aggressively addressed by the 12 TB migration project reported on previously.
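The fleet-wide AFR is total failures divided by total drive-years, so each drive group counts in proportion to its share of drive days. The sketch below makes that concrete with hypothetical numbers. A group holding one quarter of the drive days (as the 8 TB drives do above) sees its AFR rise by one percentage point, and the fleet AFR rises by a quarter of a point.

```python
def fleet_afr(groups: list[tuple[int, float]]) -> float:
    """Fleet AFR (%) from (failures, drive_days) pairs: total failures over total
    drive-years, so each group is weighted by its share of drive days."""
    failures = sum(f for f, _ in groups)
    drive_days = sum(d for _, d in groups)
    return failures / (drive_days / 365) * 100


# Hypothetical fleet: group A holds 25% of drive days (10,000 of 40,000 drive-years).
A_DAYS, B_DAYS = 3_650_000, 10_950_000
before = fleet_afr([(100, A_DAYS), (450, B_DAYS)])  # A at 1.0%, B at 1.5%
after = fleet_afr([(200, A_DAYS), (450, B_DAYS)])   # A rises to 2.0%, B unchanged
print(f"{before:.3f}% -> {after:.3f}%")              # 1.375% -> 1.625%
```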
The Migration Slows, but Growth Doesn’t
In 2019 we added 17,729 net new drives. In 2018, a majority of the 14,255 drives added were due to migration; in 2019, less than half of the new drives were for migration, with the rest used for new systems. In 2019 we decommissioned 8,800 drives totaling 37 Petabytes of storage and replaced them with 8,800 drives, all 12 TB, totaling about 105 Petabytes of storage. We then added another 181 Petabytes of storage in 2019 using 12 TB and 14 TB drives.
Manufacturer diversity across drive brands increased slightly in 2019. In 2018, Seagate drives were 78.15% of the drives in operation; by the end of 2019 that percentage had decreased to 73.28%. HGST went from 20.77% in 2018 to 23.69% in 2019, and Toshiba increased from 1.34% in 2018 to 3.03% in 2019. There were no Western Digital branded drives in the data center in 2019, but as WDC rebrands the newer large-capacity HGST drives, we’ll adjust our numbers accordingly.
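The migration figures can be cross-checked with simple arithmetic, assuming decimal units (1 PB = 1,000 TB):

```python
DRIVES = 8_800
old_pb = 37                   # capacity decommissioned, from the text
new_pb = DRIVES * 12 / 1000   # 8,800 drives x 12 TB each, in PB

print(new_pb)                               # 105.6
print(round(new_pb - old_pb, 1))            # net gain from the swap: 68.6 PB
print(round(old_pb * 1000 / DRIVES, 1))     # average retired drive: about 4.2 TB
```

At 12 TB each, 8,800 drives come to 105.6 PB, consistent with the "about 105 Petabytes" above. On the same drive count, the swap netted roughly 68.6 PB, and the retired drives averaged about 4.2 TB each.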
Lifetime Hard Drive Stats
While comparing the annual failure rates of hard drives over multiple years is a great way to spot trends, we also look at the lifetime annualized failure rates of our hard drives. The chart below shows the annualized failure rates of all of the drive models in production as of 12/31/2019.
The Hard Drive Stats Data
The complete data set used to create the information used in this review is available on our Hard Drive Test Data page. You can download and use this data for free for your own purposes. All we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone; it is free.
If you just want the summarized data used to create the tables and charts in this blog post, you can download the ZIP file containing the CSV files for each chart.
Good luck and let us know if you find anything interesting.
Regular Hard Drive Stats readers will recall that our blog post about Q3 2019 explained that we planned to take a closer look at some drive failures we were seeing at the time and report back when we knew more. Well, we’ve been monitoring the situation since then and wanted to update you on where things stand. Despite the fact that Hard Drive Stats for 2019 are just around the corner, we decided to share this information with you as soon as we could, rather than waiting for the next post. In summary, this year (and going into the next year) we expect to see higher failure rates in some of our hard drives and we will be migrating some drives to newer models. Below, we’ll discuss what’s going on, what we’re doing about it, and why customers shouldn’t worry.
So What’s Up?
In a recent blog post, we interviewed our Director of Supply Chain, Ariel Ellis, about how we purchase and qualify hard drives for deployment in our data centers. The TL;DR is that our qualification process is robust. Nevertheless, for all providers of scale in the cloud storage industry, trends that are hard to project during testing can emerge over time, once drives are used in production batches of dozens of petabytes or more at a time.
What we’re seeing in our fleet right now is a higher-than-typical failure rate among some of our 12TB Seagate drives. It’s customary for hard drive manufacturers like Seagate, when working with data centers and cloud service providers, to ensure successful deployment of large-scale drive fleets, and as such we’re working closely with them to analyze the drives and their performance. This analysis usually includes things like testing new drive platforms in real workload environments, providing telemetry tools to predict failures, performing ongoing custom adjustments, and employing firmware development and replacement units (RMAs). Customer data durability is paramount for both Backblaze and Seagate, so as we analyze root causes and implications we’re also working together on a migration effort to replace these particular drives in our data centers. In the short term, failure rates for a subset of our drives may increase, but we have processes in place to adjust for that fluctuation.
Running a cloud business is complex, so it’s very helpful to have a partner like Seagate who can help us to react quickly and bring their expertise in drive deployment to bear in aiding our migration efforts. It’s worth noting that situations like this are not uncommon in our industry and often go unnoticed by the end-users of the services, as most cloud providers do not inform customers or the public when they experience issues like what we’re describing. Backblaze, on the other hand, is a bit more open than most companies in the industry.
We’re in a unique position because of the Hard Drive Stats that we publish, which is why we felt it was important to let folks know about the upcoming changes ahead of time. At the end of the day, we think this openness is helpful for everyone, especially our customers.
In the near term, we expect to see moderately increased failure rates for this specific subset of 12TB drives, but as we complete the drive migration, we project our fleet’s failure rates will restore to historical norms. Meanwhile, it will be business as usual. We’ll continue to provide the most reliable, affordable, and easy-to-use cloud storage and computer backup available, and we’ll continue to provide our Hard Drive Stats for you every quarter.
For those who follow Backblaze, you’ll know that QNAP was an early integrator for our B2 Cloud Storage service. The popular storage company sells solutions for almost any use case where local storage is needed, and with their Hybrid Backup Sync software, you can easily sync that data to the cloud. For years, we’ve helped QNAP users like Yoga International and SoCo Systems back up and archive their data to B2. But QNAP never stops innovating, so we wanted to share some recent updates that will have both current and potential users excited about the future of our integrations.
Hybrid Backup Sync 3.0
Current QNAP and B2 users are used to having Hybrid Backup Sync (HBS) quickly and reliably sync their data to the cloud. With the HBS 3.0 update, the feature has become far more powerful. The latest update adds true backup capability for B2 users with features like version control, client-side encryption, and block-level deduplication. QNAP’s operating system, QTS, continues to deliver innovation and add thrilling new features. In the QTS 4.4.1 update, you also have the ability to preview backed up files using the QuDedup Extract Tool, allowing QNAP users to save on bandwidth costs.
The QTS 4.4.1 update is now available (you can download it here) and the HBS 3.0 update is currently available in the App Center on your QNAP device.
Hybrid Mount and VJBOD Cloud
The new Hybrid Mount and VJBOD Cloud apps will allow QNAP users to designate a drive in their system to function as a cache while accessing their B2 Cloud Storage. This lets users interact with B2 just as they would with a folder on their QNAP device, while using B2 as an active storage location.
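The cache-in-front-of-the-cloud arrangement can be sketched as a read-through cache. QNAP's caching policy is not described here, so the least-recently-used (LRU) eviction, the byte-based capacity, and the CachedRemote class below are all assumptions. The fetch function stands in for a read from B2.

```python
from collections import OrderedDict
from typing import Callable


class CachedRemote:
    """Read-through LRU cache: recently used objects are served from a local
    drive, and misses are fetched from cloud storage and kept for next time."""

    def __init__(self, fetch: Callable[[str], bytes], capacity_bytes: int):
        self._fetch = fetch  # stand-in for a cloud read
        self._capacity = capacity_bytes
        self._cache: "OrderedDict[str, bytes]" = OrderedDict()
        self._used = 0
        self.hits = 0
        self.misses = 0

    def read(self, key: str) -> bytes:
        if key in self._cache:
            self.hits += 1
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        self.misses += 1
        data = self._fetch(key)  # only misses go to the cloud
        self._cache[key] = data
        self._used += len(data)
        # Evict least recently used entries until the cache fits again,
        # always keeping the object just read.
        while self._used > self._capacity and len(self._cache) > 1:
            _, evicted = self._cache.popitem(last=False)
            self._used -= len(evicted)
        return data


# Usage: a dictionary plays the role of the cloud bucket.
cloud = {"a": b"x" * 400, "b": b"y" * 400, "c": b"z" * 400}
fs = CachedRemote(cloud.__getitem__, capacity_bytes=1000)
fs.read("a")
fs.read("b")
fs.read("a")  # served locally
fs.read("c")  # cache would exceed 1,000 bytes, so "b" is evicted
```

After the four reads in the usage example, one read came from the local cache and three went to the cloud. Object "b" was evicted to make room for "c".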
Hybrid Mount and VJBOD Cloud are both included in the QTS 4.4.1 update and function as a storage gateway on a file-based or block-based level, respectively. Hybrid Mount enables B2 to be used as a file server and is ideal for online collaboration and file-level data analysis. VJBOD Cloud is ideal for a large number of small files or singular massively large files (think databases!) since it’s able to update and change files on a block-level basis. Both apps offer the ability to connect to B2 via popular protocols to fit any environment, including SMB, AFP, NFS, FTP and WebDAV.
QuDedup introduces client-side deduplication to the QNAP ecosystem. This helps users at all levels save space on their NAS by avoiding redundant copies in storage. B2 users have something to look forward to as well, since these savings carry over to cloud storage via the HBS 3.0 update.
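Block-level deduplication can be illustrated with content hashing. The Python sketch below is a simplified model, not QuDedup's actual algorithm. It assumes fixed 4 KiB blocks and SHA-256 fingerprints, stores each distinct block once, and records every file as a list of block hashes. Because the work happens on the client, only blocks the store has never seen would need to be uploaded.

```python
import hashlib

BLOCK_SIZE = 4096  # fixed 4 KiB blocks (assumption; real tools may differ)


def dedup(files: dict[str, bytes]) -> tuple[dict[str, bytes], dict[str, list[str]]]:
    """Store each distinct block once, keyed by its SHA-256 digest, and
    describe every file as an ordered list of block digests."""
    blocks: dict[str, bytes] = {}
    recipes: dict[str, list[str]] = {}
    for name, data in files.items():
        recipe = []
        for offset in range(0, len(data), BLOCK_SIZE):
            chunk = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            blocks.setdefault(digest, chunk)  # repeated blocks are stored only once
            recipe.append(digest)
        recipes[name] = recipe
    return blocks, recipes


def restore(name: str, blocks: dict[str, bytes], recipes: dict[str, list[str]]) -> bytes:
    """Rebuild a file by concatenating its blocks in order."""
    return b"".join(blocks[d] for d in recipes[name])


# Two versions of a 10-block file that differ only in the last block.
v1 = bytes(i % 251 for i in range(10 * BLOCK_SIZE))
v2 = v1[:9 * BLOCK_SIZE] + bytes(BLOCK_SIZE)
blocks, recipes = dedup({"v1": v1, "v2": v2})
print(len(blocks), "unique blocks stored for",
      len(recipes["v1"]) + len(recipes["v2"]), "blocks referenced")
```

The two versions reference 20 blocks between them, but only 11 distinct blocks are stored.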
QNAP continues to innovate and unlock the potential of B2 in the NAS ecosystem. We’re huge fans of these new updates and whatever else may come down the pipeline in the future. We’ll be sure to highlight any other exciting updates as they become available.
Backblaze’s data centers may not be the biggest in the world of data storage, but thanks to some chutzpah, transparency, and wily employees, we’re able to punch well above our weight when it comes to purchasing hard drives. No one knows this better than our Director of Supply Chain, Ariel Ellis.
As the person on staff ultimately responsible for sourcing the drives our data centers need to run—some 117,658 by his last count—Ariel knows a thing or two about purchasing petabytes-worth of storage. So we asked him to share his insights on the evaluation and purchasing process here at Backblaze. While we’re buying at a slightly larger volume than some of you might be, we hope you find Ariel’s approach useful and that you’ll share your own drive purchasing philosophies in the comments below.
An Interview with Ariel Ellis, Director of Supply Chain at Backblaze
Sourcing and Purchasing Drives
Backblaze: Thanks for making time, Ariel—we know staying ahead of the burn rate always keeps you busy. Let’s start with the basics: What kinds of hard drives do we use in our data centers, and where do we buy them?
Ariel: In the past, we purchased both consumer and enterprise hard drives. We bought the drives that gave us the best performance and longevity for the price, and we discovered that, in many cases, those were consumer drives.
Today, our purchasing volume is large enough that consumer drives are no longer an option. We simply can’t get enough of them. High-capacity drives in high volume are only available to us as enterprise models. But by sourcing large volumes and negotiating prices directly with each manufacturer, we are able to achieve lower costs and better performance than we could when we were only buying in the consumer channel. Additionally, buying directly gives us five-year warranties on the drives, which is essential for our use case.
We began to purchase direct around the launch of our Vault architecture in 2015. Each Vault contains 1,200 drives, and we have been deploying two to four, or more, Vaults each month. At four Vaults a month, that’s 4,800 drives, and that many drives are just not available through consumer distribution. So we now purchase drives from all three hard drive manufacturers: Western Digital, Toshiba, and Seagate.
Backblaze: Of the drives we’re purchasing, are they all 7200 RPM and 3.5” form factor? Is there any reason we’d consider slower drives or 2.5” drives?
Ariel: We use drives with varying speeds, though some power-conserving drives don’t disclose their rotational speed. Power draw is a very important metric for us, and high-speed enterprise drives are expensive in terms of power cost. We now total around 1.5 megawatts of power consumption across our data centers, and I can tell you that every watt matters for reducing costs.
As far as 2.5″ drives, I’ve run the math and they’re not more cost effective than 3.5″ drives, so there’s no incentive for us to use them.
Backblaze: What about other drive types and modifications, like SSD, or helium enclosures, or SMR drives? What are we using and what have we tried beyond the old standards?
Ariel: When I started at Backblaze, SSDs were more than ten times the cost of conventional hard drives. Now they’re about three times the cost. But for Backblaze’s business, three times the cost is not viable for the pricing targets we have to meet. We do use some SSDs as boot drives, as well as in our backend systems, where they speed up caching and boot times, but there are currently no flash drives in our Storage Pods, in either HDD or M.2 form factors. We’ve looked at flash as a way to manage higher drive densities in the future, and we’ll continue to evaluate its usefulness to us.
Helium has its benefits, primarily lower power draw, but it makes drives difficult to service when that’s necessary. That said, all the drives we have purchased that are larger than 8 TB have been helium; they’re just part of the picture for us. Higher-capacity drives, sealed helium drives, and other new technologies that increase drive density are essential to work with as we grow our data centers, but they also increase drive fragility, which is something we have to manage.
SMR would give us a 10-15% capacity-to-dollar boost, but it also requires host-level management of sequential data writing. Additionally, the new archive-type drives require a flash-based caching layer. Both of these requirements would mean a significant increase in the engineering resources needed to support them, and therefore even more investment. All in all, SMR isn’t cost-effective in our system.
Soon we’ll be dealing with MAMR and HAMR drives as well; we plan to test both technologies in 2020. We’re also testing interesting new tech like Seagate’s MACH.2 Multi Actuator, which allows the host to request and receive data simultaneously from two areas of the drive in parallel, potentially doubling the input/output operations per second (IOPS) performance of each individual hard drive. This offsets the reduced data availability (fewer IOPS per terabyte stored) that would otherwise come with higher drive capacities. The drive can also present itself as two independent drives: for example, a 16 TB drive can appear as two independent 8 TB drives, so a Vault using 60 drives per pod could present as 120 drives per pod. That offers some interesting possibilities.
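Why would doubling IOPS offset higher capacities? A hard drive’s random I/O rate stays roughly fixed per actuator while its capacity grows, so each stored terabyte gets a smaller share of it. The Python sketch below works through that arithmetic; the 80 IOPS-per-actuator figure is a hypothetical round number chosen for illustration, not a measured spec for any drive.

```python
def iops_per_tb(capacity_tb: float, iops_per_actuator: float, actuators: int = 1) -> float:
    """Random IOPS available per terabyte stored on one drive."""
    return iops_per_actuator * actuators / capacity_tb

# Hypothetical figure for illustration only: ~80 random IOPS per actuator.
IOPS_PER_ACTUATOR = 80.0

for capacity_tb in (4, 8, 16):
    print(f"{capacity_tb} TB, one actuator: {iops_per_tb(capacity_tb, IOPS_PER_ACTUATOR):.1f} IOPS/TB")

# A dual-actuator 16 TB drive matches a single-actuator 8 TB drive per terabyte.
print(f"16 TB, two actuators: {iops_per_tb(16, IOPS_PER_ACTUATOR, actuators=2):.1f} IOPS/TB")

# Presenting each actuator as its own logical drive doubles the count per pod.
drives_per_pod = 60
print("logical drives per pod:", drives_per_pod * 2)
```

Whatever the real per-actuator number is, the ratio holds: two actuators on a 16 TB drive give each terabyte the same share of IOPS as one actuator on an 8 TB drive.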
Backblaze: What does it take to deploy a full vault, financially speaking? Can you share the cost?
Ariel: The cost to deploy a single vault ranges from $350,000 to $500,000, depending on the drive capacities being used. That is just the purchase price, though. There is also the cost of data center space, the power to run the hardware, the staff time to install everything, and the bandwidth used to fill it. All of that should be included in the total cost of filling a vault.
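To put the vault price in per-terabyte terms, divide it by the raw capacity of the vault’s 1,200 drives. The Python sketch below does that; which drive capacity goes with which end of the $350,000 to $500,000 range is a hypothetical pairing for illustration, and the result covers only the purchase price, before erasure-coding overhead, space, power, staff time, and bandwidth.

```python
DRIVES_PER_VAULT = 1_200  # each Vault holds 1,200 drives, per the interview

def hardware_cost_per_raw_tb(vault_cost_usd: float, drive_capacity_tb: float) -> float:
    """Purchase price per raw terabyte, ignoring erasure-coding overhead and
    the operating costs (space, power, staff time, bandwidth) listed above."""
    return vault_cost_usd / (DRIVES_PER_VAULT * drive_capacity_tb)

# Hypothetical pairings: the interview gives the price range,
# not which capacity corresponds to which price.
print(round(hardware_cost_per_raw_tb(350_000, 8), 2))   # about 36.46 $/TB
print(round(hardware_cost_per_raw_tb(500_000, 16), 2))  # about 26.04 $/TB
```

The comparison shows why a more expensive vault can still be the cheaper one per terabyte once drive capacity rises.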
Evaluating and Testing New Drive Models
Backblaze: Okay, so when you get to the point where the tech seems like it will work in the data center, how do you evaluate new drive models to include in the Vaults?
Ariel: First, we select drives that fit our cost targets. These are usually high-capacity drives being produced in large volumes for the cloud market. We always start with test batches that are separate from our production data storage; we don’t put customers’ data on the test drives. We evaluate read/write performance and power draw, and generally try to understand how the drives will behave in our application. Once we are comfortable with a drive’s performance, we start adding small numbers of them to production vaults, spread across tomes in a way that does not sacrifice parity. As drive capacities increase, we are putting more and more effort into this qualification process.
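Spreading new drives “in a way that does not sacrifice parity” means no single tome should end up depending on several unproven drives at once. The Python sketch below illustrates one such rule, assuming a tome is a group of drives, one per Storage Pod, that together hold a file’s shards, and assuming a hypothetical limit of one test drive per tome. It is an illustration, not Backblaze’s actual placement logic.

```python
def place_test_drives(num_test_drives: int, num_tomes: int, max_per_tome: int = 1) -> dict:
    """Spread new-model drives across tomes round-robin so no tome holds more
    than max_per_tome of them. Returns {tome_index: test_drive_count}."""
    limit = num_tomes * max_per_tome
    if num_test_drives > limit:
        raise ValueError(f"at most {limit} test drives fit under this policy")
    placement = {}
    for i in range(num_test_drives):
        tome = i % num_tomes  # next tome in the rotation
        placement[tome] = placement.get(tome, 0) + 1
    return placement

# Hypothetical Vault: 1,200 drives in 60-drive pods gives 20 pods and,
# if each tome takes one drive from each pod, 60 tomes.
placement = place_test_drives(12, num_tomes=60)
print(len(placement), max(placement.values()))  # 12 tomes, 1 test drive each
```

Round-robin assignment keeps the per-tome count as even as possible, so the cap is only reached once every tome already has its share.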
We used to be able to qualify new drive models in thirty days. Now we typically take several months. On one hand, this is because we’ve added more steps to pre- and post-production testing. As we scale up, we need to scale up our care, because the effect of any issues with drives increases in line with bigger and bigger implementations. Additionally, from a simple physics perspective, a vault that uses high capacity drives takes longer to fill and we want to monitor the new drive’s performance throughout the entire fill period.
Backblaze: When it comes to the evaluation of the cost, is there a formula for $/terabyte that you follow?
Ariel: My goal is to reduce cost per terabyte on a quarterly basis—in fact, it’s a part of how my job performance is evaluated. Ideally, I can achieve a 5-10% cost reduction per terabyte per quarter, which is a number based on historical price trends and our performance for the past 10 years. That savings is achieved in three primary ways: 1) lowering the actual cost of drives by negotiating with vendors, 2) occasionally moving to higher drive densities, and 3) increasing the slot density of pod chassis. (We moved from 45 drives to 60 drives in 2016, and as we look toward our next Storage Pod version we’ll consider adding more slots per chassis).
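Quarterly targets compound. The Python sketch below shows what a steady 5% or 10% quarterly cut in cost per terabyte adds up to over a year; it is plain arithmetic on the range Ariel gives, not a forecast.

```python
def annual_reduction(quarterly_reduction: float) -> float:
    """Fraction by which $/TB falls after four compounding quarterly cuts."""
    return 1 - (1 - quarterly_reduction) ** 4

for q in (0.05, 0.10):
    print(f"{q:.0%} per quarter -> {annual_reduction(q):.1%} per year")
# 5% per quarter -> 18.5% per year
# 10% per quarter -> 34.4% per year
```

Slot density works the same way from the other side: moving from 45 to 60 drives per chassis spreads each chassis’s cost over a third more drives.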
Meeting Storage Demand
Backblaze: When it comes to how this actually works in our operating environment, how do you stay ahead of the demand for storage capacity?
Ariel: We maintain a buffer of several months’ worth of drive space, based on predicted demand from current customers as well as projected new customers. Those buffers are tied to what we expect the fill time of our Vaults to be. As conditions change, we could decide to extend those buffers. Demand could increase unexpectedly, of course, so our goal is to reduce the fill time for Vaults so we can bring more storage online as quickly as possible, if it’s needed.
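A buffer tied to Vault fill time can be expressed as a simple reorder rule. The Python sketch below is hypothetical throughout: the function names, the petabyte figures, and the idea of adding a fixed safety margin to the fill time are illustrative assumptions, not Backblaze’s actual planning model.

```python
def months_of_runway(free_capacity_pb: float, monthly_growth_pb: float) -> float:
    """Months until free capacity runs out at the forecast monthly growth."""
    if monthly_growth_pb <= 0:
        return float("inf")  # no growth forecast: the buffer never runs out
    return free_capacity_pb / monthly_growth_pb

def should_order_drives(free_capacity_pb: float, monthly_growth_pb: float,
                        vault_fill_months: float, safety_months: float) -> bool:
    """Hypothetical rule: order drives when the runway is shorter than the
    time to fill a Vault plus a safety margin for unexpected demand."""
    return months_of_runway(free_capacity_pb, monthly_growth_pb) < vault_fill_months + safety_months

# Hypothetical numbers: 30 PB free and 10 PB/month of growth is 3 months of runway.
print(months_of_runway(30, 10))
print(should_order_drives(30, 10, vault_fill_months=2, safety_months=2))  # True
```

Shortening the Vault fill time lowers the threshold in this rule, which is one way to read Ariel’s goal of bringing storage online faster.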
Backblaze: Obviously we don’t operate in a vacuum, so do you worry about how trade challenges, weather, and other factors might affect your ability to obtain drives?
Ariel: (Laughs) Sure, I’ve got plenty to worry about. But we’ve proved to be pretty resourceful in the past when we’re challenged. For example: During the worldwide drive shortage, due to flooding in Southeast Asia, we recruited an army of family and friends to buy drives all over and send them to us. That kept us going during the shortage.
We are vulnerable, of course, if there’s a drive production shortage. Some data center hardware is manufactured in China, and I know that some of those prices have gone up. That said, all of our drives are manufactured in Thailand or Taiwan. Our Storage Pod chassis are made in the U.S.A. Big picture, we try to anticipate any shortages and plan accordingly if we can.
Backblaze: Time for a personal question… What does data durability mean to you? What do you do to help boost data durability, and spread drive hardware risk and exposure?
Ariel: That is personal. (Laughs). But also a good question, and not really personal at all: Everyone at Backblaze contributes to our data durability in different ways.
My role in maintaining eleven nines of durability is, first and foremost: Never running out of space. I achieve this by maintaining close relationships with manufacturers to ensure production supply isn’t interrupted; by improving our testing and qualification processes to catch problems before drives ever enter production; and finally by monitoring performance and replacing drives before they fail. Otherwise it’s just monitoring the company’s burn rates and managing the buffer between our drive capacity and our data under management.
When we’re in good shape on space, I need to look to the future to ensure I’m providing for longer-term issues. This is where iterating on and improving our Storage Pod design comes in. I don’t think that gets factored into our durability calculus, but designing for the future is as important as anything else. We need to be prepared with hardware that can effectively support ever-increasing hard drive capacities, along with the longer fill and rebuild times that come with them.
Backblaze: That raises the next question: As drive sizes get larger, rebuild times get longer when it’s necessary to recover the data on a drive. Is that still a factor, given Backblaze’s durability architecture?
Ariel: We attempt to identify and replace problematic drives before they actually fail. When a drive starts failing, or is identified for replacement, the team always attempts to restore as much data as possible from it, because that gives us the most options for maintaining data durability. The rebuild times for larger drives are challenging, especially as we move to 16 TB and beyond. We are looking to improve the throughput of our Pods before making the move to 20 TB in order to keep rebuild times fast enough.
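Rebuild time scales with capacity divided by throughput, which is why Pod throughput matters before the move to 20 TB. The Python sketch below is a back-of-the-envelope model under a hypothetical 200 MB/s sustained rate; a real Vault rebuild reconstructs shards from other drives in the tome, so actual times depend on Pod and network throughput that this model lumps into one number.

```python
def rebuild_hours(capacity_tb: float, throughput_mb_s: float) -> float:
    """Best-case hours to rewrite a whole drive at a sustained rate."""
    capacity_mb = capacity_tb * 1_000_000  # decimal TB and MB, as drives are sold
    return capacity_mb / throughput_mb_s / 3600

# Hypothetical 200 MB/s effective rebuild rate.
for tb in (8, 16, 20):
    print(f"{tb} TB at 200 MB/s: {rebuild_hours(tb, 200):.1f} h")

# Doubling throughput halves the best-case time.
print(f"20 TB at 400 MB/s: {rebuild_hours(20, 400):.1f} h")
```

Under these assumptions, going from 16 TB to 20 TB adds about five and a half hours per rebuild, while doubling throughput more than pays that back.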
And then, supporting all of this is our Vault architecture, which ensures that data will be intact even if individual drives fail. That’s the value of the architecture.
Longer term, one thing we’re looking toward is phasing out the SATA controller/port multiplier combination. This might be more technical than some of our readers want to go, but: Serial Attached SCSI (SAS) controllers are more commonly used in dense storage servers. Using SATA drives with SAS controllers can provide as much as a 2x improvement in system throughput compared with SATA port multipliers, which is important to me, even though port multipliers are slightly less expensive. When we started building Storage Pods, the SATA controller/port multiplier combination was a great way to keep costs down. But since then, the cost of SAS controllers and backplanes has come down significantly.
But now we’re preparing for how we’ll handle 18 and 20 TB drives, and improving system throughput will be extremely important to manage that density. We may even consider using SAS drives even though they are slightly more expensive. We need to consider all options in order to meet our scaling, durability and cost targets.
Backblaze’s Relationship with Drive Manufacturers
Backblaze: So, there’s an elephant in the room when it comes to Backblaze and hard drives: Our quarterly Hard Drive Stats reports. We’re the only company sharing that kind of data openly. How have the Drive Stats blog posts affected your purchasing relationship with the drive manufacturers?
Ariel: Due to the quantities we need and the visibility of the posts, drive manufacturers are motivated to give us their best possible product. We have a great purchasing relationship with all three companies and they update us on their plans and new drive models coming down the road.
Backblaze: Do you have any sense for what the hard drive manufacturers think of our Drive Stats blog posts?
Ariel: I know that every drive manufacturer reads our Drive Stats reports, including very senior management. I’ve heard stories of company management learning of the release of a new Drive Stats post and gathering together in a conference room to read it. I think that’s great.
Ultimately, we believe that Drive Stats is good for consumers. We wish more companies with large data centers did this. We believe it helps keep everyone open and honest. The adage is that competition is ultimately good for everyone, right?
It’s true that Western Digital, at one time, was put off by the visibility Drive Stats gave into how their models performed in our data centers (which we’ve always said is a lot different from how drives are used in homes and most businesses). Then they realized the marketing value for them—they get a lot of exposure in the blog posts—and they came around.
Backblaze: So, do you believe that the Drive Stats posts give Backblaze more influence with drive manufacturers?
Ariel: The truth is that most hard drives go directly into tier-one and -two data centers, and not into smaller data centers, homes, or businesses. The manufacturers are stamping out drives in exabyte chunks. A single tier-one data center consumes maybe 500,000 times what Backblaze does in drives. We can’t compare in purchasing power to those guys, but Drive Stats does give us visibility and some influence with the manufacturers. We have close communications with the manufacturers and we get early versions of new drives to evaluate and test. We’re on their radar and I believe they value their relationship with us, as we do with them.
Backblaze: A final question. In your opinion, are hard drives getting better?
Ariel: Yes. Drives are amazingly durable for how hard they’re used. Just think of the forces inside a hard drive, how hard they spin, and how much engineering it takes to write and read the data on the platters. I came from a background in precision optics, which requires incredibly precise tolerances, and was shocked to learn that hard drives are designed in an equally precise tolerance range, yet are made in the millions and sold as a commodity. Despite all that, they have only about a 2% annual failure rate in our centers. That’s pretty good, I think.
Thanks, Ariel. We hope this look at how we source petabytes of storage is useful for your own terabyte, petabyte, or… exabyte storage needs. If you’re working on the latter, or anything in between, we’d love to hear about what you’re up to in the comments.
In this blog series, we explore how you can master the nomadic life—whether for a long weekend, an extended working vacation, or maybe even the rest of your career. We profile professionals we’ve met who are stretching the boundaries of what (and where) an office can be, and glean lessons along the way to help you to follow in their footsteps. In our first post in the series, we provided practical tips for working on the road. In this edition, we profile Chris Aguilar, Amphibious Filmmaker.
There are people who do remote filming assignments, and then there’s Chris, the Producer/Director of Fin Films. For him, a normal day might begin with gathering all the equipment he’ll need—camera, lenses, gear, cases, batteries, digital storage—and securing it in a waterproof Pelican case which he’ll then strap to a paddleboard for a long swim to a race boat far out on the open ocean.
This is because Chris, a one-man team, is the preeminent cinematographer of professional paddleboard racing. When your workday involves operating from a beachside hotel, and being on location means bouncing up and down in a dinghy some 16 miles from shore, how do you succeed? We interviewed Chris to find out.
Getting Ready for a Long Shoot
To save time in the field, Chris does as much prep work as he can. Knowing that he needs to be completely self-sufficient all day—he can’t connect to power or get additional equipment—he gathers and tests all of the cameras he’ll need for all the possible shots that might come up, packs enough SD camera cards, and grabs an SSD external drive large enough to store an entire day’s footage.
Chris edits in Adobe Premiere, so he preloads a template on his MacBook Pro to hold the day’s shots and orders everything by event so that he can drop his content in and start editing it down as quickly as possible. Typically, he chooses a compatible format that can hold all of the different content he’ll shoot. He builds a 4K timeline at 60 frames per second that can take clips from multiple cameras yet can export to other sizes and speeds as needed for delivery.
Days in the Life
Despite the exotic and glamorous location (Hawaii), covering a 32-mile open-ocean race is grueling. Chris’s days start as early as 5AM with him grabbing shots as contestants gather, and on race-day eve he films as many as 35 interviews. He does quick edits of these to push content out as fast as possible to avid fans all over the world.
The next morning, before race time, he double-checks all the equipment in his Pelican case, and, when there’s no dock, he swims out to the race- or camera boat. After that, Chris shoots as the race unfolds, constantly swapping out SD cards. When he’s back on dry land his first order of business is copying over all of the content to his external SSD drive.
Even after filming the race’s finish, awards ceremonies, and wrap-up interviews, he’s still not done: By 10PM he’s back at the hotel to cut a highlight reel of the day’s events and put together packages that sports press can use, including the Australian press that needs content for their morning sports shows.
For streaming content in the field, Chris relies on Google Fi through his phone because it can piggyback off of a diverse range of carriers. His backup network solution is a Verizon hotspot that usually covers him where Google Fi cannot. For editing and uploading, he’s found that he can usually rely on his hotel’s network. When that doesn’t work, he defaults to his hotspot, or a coffee shop. (His pro tip is that, for whatever reason, the Starbucks in Hawaii typically have great internet.)
Building a Case
After years of shooting open-ocean events, Chris has settled on a tried and true combination of gear—and it all fits in a single, waterproof Pelican 1510 case. His kit has evolved to be as simple and flexible as possible, allowing him to cover multiple shooting roles in a hostile environment including sand, extreme sun-glare on the water, haze, fog, and of course, the ever-present ocean water.
At the same time, his gear needs to accommodate widely varied shooting styles: Chris needs to be ready to capture up close and personal interviews; wide, dramatic shots of the pre-race ceremonies; as well as a combination of medium shots of several racers on the ocean and long, telephoto shots of individuals—all from a moving boat bobbing on the ocean. Here’s his “Waterproof Kit List”:
The Case
Pelican 1510
Chris likes compact, rugged camcorders from Panasonic. They have extremely long battery life, and the latest generation has large sensors, wide dynamic range, and even built-in ND filter wheels to compensate for the glare on the water. He’ll also bring other cameras for special shots, like an 8mm film camera for artistic shots, or a GoPro for the classic ‘from under the sea to the waterline’ shots.
Primary Interview Camera
Panasonic EVA1 5.7K Compact Cinema Camcorder 4K 10b 4:2:2 with EF lens-mount (with rotating lens kit depending on the event)
Action Camera and B-Roll
Panasonic AG-CX350 (or EVA1 kitted out similarly if the CX350 isn’t available)
Stills and Video
Panasonic GH5 20.3MP and 4K 60fps 4:2:2 10-b Mirrorless ILC camera
Special Purpose and B-Roll Shots
Eumig Nautica Super 8 film self-sealed waterproof camera
4K GoPro in a waterproof dome housing
As a one-person show, Chris invests in enough SD cards to cover an entire day’s shooting without having to reuse cards. He then copies the contents of all of those cards to a bus-powered SSD.
8-12 64GB or 128GB SD cards
1 TB Glyph or G-Tech SSD
Multiple Neutral Density (ND) filters. These filters reduce the intensity of all wavelengths equally, without affecting color. With ND filters, the operator can dial in combinations of aperture, exposure time, and sensor sensitivity without overexposing, which allows more ‘filmic’ looks: stopping down for sharper images, or shooting wide open for a shallow depth of field.
Extra batteries. Needless to say, having extra batteries for his cameras and his phone is critical when he may not be able to recharge for 12 hours or more.
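The ND filter trade-off is easy to put in numbers. The Python sketch below uses standard photographic relationships, not anything specific to Chris’s kit: an ND filter of optical density d passes 10^-d of the light, each stop halves the light, and light through the lens scales with 1 / (f-number)². The f-stops in the example are hypothetical.

```python
import math

def nd_stops_from_density(optical_density: float) -> float:
    """Stops blocked by an ND filter labeled by optical density (0.3, 0.6, 0.9...).
    The filter passes 10**-density of the light, and each stop halves it."""
    return optical_density * math.log2(10)

def nd_stops_from_factor(filter_factor: float) -> float:
    """Stops blocked by an ND filter labeled by its factor (ND2, ND4, ND8...)."""
    return math.log2(filter_factor)

def aperture_stops(f_from: float, f_to: float) -> float:
    """Extra light, in stops, gained by opening up from f_from to f_to.
    Light through the lens scales with 1 / f_number**2."""
    return 2 * math.log2(f_from / f_to)

# Opening up from f/8 to f/2.8 for a shallow depth of field adds about 3 stops,
# roughly what an ND 0.9 (ND8) filter takes back out in bright glare.
print(round(aperture_stops(8, 2.8), 2),
      round(nd_stops_from_density(0.9), 2),
      nd_stops_from_factor(8))
```

Without the filter, those extra three stops would have to come from a much faster shutter or lower sensitivity, which is exactly the constraint the built-in ND wheels on his camcorders relieve.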
Now, The Real Work Begins
When wrapping up an event’s coverage, all of the content captured needs to be stored and managed. Chris’s previous workflow required transferring the raw and finished files to external drives for storage. That added up to a lot of drives. Chris estimates that over the years he had stored about 20 terabytes of footage on paddleboarding alone.
Managing all those drives proved to be too big of a task for someone who is rarely in his production office. Chris needed access to his files from wherever he was, and a way to view, catalog, and share the content with collaborators.
Once he had dialed in an approach that accounted for remote broadband speeds, drive wrangling, inexpensive cloud storage, and cloud-based digital asset management systems, putting all of his content in the cloud became an option for Chris. Using Backblaze’s B2 Cloud Storage along with iconik content management software, finding specific footage to edit or share with a collaborator used to mean several days in the office searching through hard drives; now it takes just a few keyword searches, and sharing via iconik takes a matter of minutes.
For a digital media nomad like Chris, digitally native solutions based in the cloud make a lot of sense. Plus, Chris knows that the content is safely and securely stored, and not exposed to transport challenges, accidents (including those involving water), and other difficulties that could spoil both his day and that of his clients.
Learn More About How Chris Works Remotely
You can learn more about Chris, Fin Film Company, and how he works from the road in our case study on Fin Films. We’ve also linked to Chris’s Kit Page for those of you who just can’t get enough of this gear…
We’d Love to Hear Your Digital Nomad Stories
If you consider yourself a digital nomad and have an interesting story about using Backblaze Cloud Backup or B2 Cloud Storage from the road (or wherever), we’d love to hear about it, and perhaps feature your story on the blog. Tell us what you’ve been doing on the road at firstname.lastname@example.org.
You can view all the posts in this series on the Digital Nomads page in our Blog Archives.
Everything that makes working at a creative agency exciting also makes it challenging. With each new client, creative teams are working on something different. One day they’re on site, shooting a video for a local business; the next they’re sifting through last year’s concert footage for highlights to promote this year’s event. When their creative juices are flowing, it’s as easy for them to lose track of the files they need as it is to lose track of time.
If you’re tasked with making sure a team’s content is protected every day, as well as ensuring that it’s organized and saved for the future, we have some tips to make your job easier, because we know you’d rather be working on your own projects than babysitting backups or fetching years-old content from a dusty archive closet.
Since we’re sure you’re not making obvious mistakes—like expecting creatives to manually archive their own content, or not having a 3-2-1 backup strategy—we’ll focus on the not-so-obvious tips. Many of these come straight from our own creative agency customers who learned the hard way, before they rolled out a cloud-based backup and archive solution.
Tip #1—Save everything when a client’s project is completed
For successful creative agencies, there’s no such thing as “former” clients, only clients that you haven’t worked with lately. That means your job managing client data isn’t over when the project is delivered. You need to properly archive everything: not just the finished videos, images or layouts, but all the individual assets created for the project and all the raw footage.
It’s not unusual for clients to request raw footage, even years after the project is complete. If you only saved master copies and can’t send them all of their source footage, your client may question how you manage their content, which could impact their trust in you for future projects.
The good news is that if you have an organized, accessible content archive, it’s easy to send a drive or even a download link to a client. It may even be possible for you to charge clients to retrieve and deliver their content to them.
Tip #2—Stop using external drives for backup or archive
If your agency uses external disk drives to back up or archive your projects, you’re not alone. Creative teams do it because it’s dead simple: you plug the drive in, copy project files to it, unplug the drive, and put it on a shelf or in a drawer. But there are some big problems with this.
First, since external drives are removable, they’re easily misplaced. It’s not unusual for someone to take a drive offsite to work on a project and forget to return it. Second, removable drives can fail over time after being damaged by physical impacts, water, magnetic fields, or even “bit rot” from just sitting on a shelf. Finally, locating client files in a stack of drives can be like finding a needle in a haystack, especially if the editor who worked on the project has left the agency.
Tip #3—Organize your archive for self-service access
Oh, the frustration of knowing you already have a clip that would be perfect for a new project, but… who knows where it is? With the right tools in place, a producer’s frustration doesn’t mean you’ll have to drop everything and join their search party. Even if you’re not sure you need a full-featured media asset management (MAM) system, it’s worth your time to find a solution that lets creatives search and retrieve files from the archive on their own.
Look for software that lets them browse through thumbnails and proxies instead of file names, and allows them to search based on metadata. Your archive storage shouldn’t force you to be on site and instantly available to load LTO tapes and retrieve those clips the editor absolutely and positively has to have today.
Tip #4—Schedule regular tests for backup restores and archive retrievals
When you first set up your backup system, I’m sure you checked that the backups were firing off on schedule, and tested restoring files and folders. But have you done it lately? Since the last time you checked, any number of things could have changed that would break your backups.
Maybe you added another file share that wasn’t included in the initial set up. Perhaps your backup storage has reached capacity. Maybe an operating system upgrade on a workstation is incompatible with your backup software. Perhaps the automated bill payment for a backup vendor failed. Bad things can happen when you’re not looking, so it’s smart to schedule time at least once a month to test your backups and restores. Ditto for testing your archives.
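A restore test is only meaningful if you check what came back. The Python sketch below is one minimal way to do that, assuming you have already restored a sample folder to a scratch location with whatever backup tool you use; the paths in the usage comment are hypothetical. It compares SHA-256 digests of every file, so pick a folder that hasn’t changed since the backup ran, or expected edits will show up as mismatches.

```python
import hashlib
from pathlib import Path

def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so large video files don't fill RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def tree_digests(root: Path) -> dict:
    """Map each file's path, relative to root, to its digest."""
    return {p.relative_to(root).as_posix(): file_digest(p)
            for p in root.rglob("*") if p.is_file()}

def verify_restore(source: Path, restored: Path) -> list:
    """Compare a test restore against its source; return a list of problems."""
    src, dst = tree_digests(source), tree_digests(restored)
    problems = [f"missing: {p}" for p in sorted(src.keys() - dst.keys())]
    problems += [f"mismatch: {p}" for p in sorted(src.keys() & dst.keys())
                 if src[p] != dst[p]]
    return problems

# Hypothetical usage after restoring a project folder to a scratch directory:
# print(verify_restore(Path("/Volumes/Projects/ClientX"), Path("/tmp/restore-test")))
```

An empty list means every file came back byte-for-byte; anything else is worth investigating before you need that restore for real.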
Tip #5 – Plan for long-term archive media refresh
If your agency has been in business more than a handful of years, you probably have content stored on media that’s past its expiration date. (Raise your hand if you still have client content stored on Betacam.) Drive failures increase significantly after 4 years (see our data center’s latest hard drive stats), and tape starts to degrade around 15 years. Even if the media is intact, file formats and other technologies can become obsolete quicker than you can say LTO-8. The only way to ensure access to archived content is to migrate it to newer media and/or technologies. This unglamorous task sounds simple—reading the data off the old media and copying it to new media—but the devil is in the details.
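Planning a media refresh starts with knowing which media are due. The Python sketch below flags archive media by age, using the rule-of-thumb ages from the paragraph above (about 4 years for disk, 15 for tape); the inventory entries and media labels are hypothetical. Media types with no rule, such as Betacam, are always flagged for review, since format obsolescence is its own risk.

```python
from datetime import date

# Rule-of-thumb refresh ages, in years, from the paragraph above.
REFRESH_AFTER_YEARS = {"disk": 4, "tape": 15}

def due_for_refresh(inventory, today: date) -> list:
    """Return labels of archive media that have reached their refresh age.
    Media types with no rule (e.g. Betacam) are always flagged for review."""
    due = []
    for label, media_type, written_on in inventory:
        age_years = (today - written_on).days / 365.25
        if age_years >= REFRESH_AFTER_YEARS.get(media_type, 0):
            due.append(label)
    return due

# Hypothetical inventory: (label, media type, date written)
inventory = [
    ("Client A launch video", "disk", date(2014, 5, 1)),
    ("Concert footage 2010", "tape", date(2010, 1, 15)),
    ("Spring promo", "disk", date(2019, 3, 1)),
    ("Legacy interviews", "betacam", date(1998, 6, 1)),
]
print(due_for_refresh(inventory, today=date(2019, 12, 1)))
# ['Client A launch video', 'Legacy interviews']
```

The list tells you what to migrate; the hard part, reading old media reliably and verifying the copies, still has to be done by hand.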
Of course, if you backup or archive to Backblaze B2 cloud storage, we’ll migrate your data to newer disk drives for you as needed over time. It all happens behind the scenes so you don’t ever need to think about it. And it’s included free with our service.
Want to see how all these tips work together? Join our live webinar co-hosted with Archiware on Tuesday, December 10, and we’ll show you how Baron & Baron, the agency behind the world’s top luxury brands from Armani to Zara, solved their backup and archive challenges.
As of September 30, 2019, Backblaze had 115,151 spinning hard drives spread across four data centers on two continents. Of that number, there were 2,098 boot drives and 113,053 data drives. We’ll look at the lifetime hard drive failure rates of the data drive models currently in operation in our data centers, but first we’ll cover the events that occurred in Q3 that potentially affected the drive stats for that period. As always, we’ll publish the data we use in these reports on our Hard Drive Test Data web page and we look forward to your comments.
Hard Drive Stats for Q3 2019
At this point in prior hard drive stats reports, we would reveal the quarterly hard drive stats table. This time we are only going to present the Lifetime Hard Drive Failure table, which you can see if you jump to the end of this report. The data we typically use to create the Q3 table may have been indirectly affected by one of our utility programs that performs data integrity checks. While we don’t believe the long-term data is impacted, we felt you should know. Below, we dig into the particulars to explain what happened in Q3 and what we think it all means.
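Lifetime failure tables like the one at the end of this report express failures as an annualized failure rate (AFR) based on drive days, where one drive operating for one day counts as one drive day. The Python sketch below shows that standard drive-days calculation; the 60 failures and 1,000,000 drive days are hypothetical inputs, not figures from our data.

```python
def annualized_failure_rate(drive_failures: int, drive_days: int) -> float:
    """AFR in percent: failures per drive-year of operation, times 100."""
    drive_years = drive_days / 365
    return drive_failures / drive_years * 100

# Hypothetical model: 60 failures over 1,000,000 drive days.
print(round(annualized_failure_rate(60, 1_000_000), 2))  # 2.19
```

Normalizing by drive days lets models with very different fleet sizes and deployment dates be compared on the same scale.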
What is a Drive Failure?
Over the years we have stated that a drive failure occurs when a drive stops spinning, won’t stay as a member of a RAID array, or demonstrates continuous degradation over time as informed by SMART stats and other system checks. For example, a drive that reports a rapidly increasing or egregious number of media read errors is a candidate for being replaced as a failed drive. These types of errors are usually seen in the SMART stats we record as non-zero values for SMART 197 and 198 which log the discovery and correctability of bad disk sectors, typically due to media errors. We monitor other SMART stats as well, but these two are the most relevant to this discussion.
What might not be obvious is that some SMART attributes change only when specific actions occur. Using SMART 197 and 198 as examples again, these values are affected only when a read or write operation occurs on a disk sector whose media is damaged or otherwise won’t allow the operation. In short, SMART 197 and 198 values that are zero today will not change unless a bad sector is encountered during normal disk operations. These two SMART stats don’t cause reads and writes to occur; they only log aberrant behavior from those operations.
Protecting Stored Data
When a file, or group of files, arrives at a Backblaze data center, the file is divided into pieces we call shards. For more information on how shards are created and used in the Backblaze architecture, please refer to Backblaze Vault and Backblaze Erasure Coding blog posts. For simplicity’s sake, let’s say a shard is a blob of data that resides on a disk in our system.
As each shard is stored on a hard drive, we create and store a one-way hash of the contents. For reasons ranging from media damage to bit rot to gamma rays, we check the integrity of these shards regularly by recomputing the hash and comparing it to the stored value. To recompute the shard hash value, a utility known as a shard integrity check reads the data in the shard. If there is an inconsistency between the newly computed and the stored hash values, we rebuild the shard using the other shards as described in the Backblaze Vault blog post.
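A minimal sketch of the recompute-and-compare loop, assuming SHA-256 as the one-way hash (the post does not say which hash Backblaze uses). The rebuild callback is a hypothetical stand-in for reconstructing a shard from the other shards in its Vault.

```python
import hashlib


def shard_digest(data: bytes) -> str:
    """One-way hash of a shard's contents (SHA-256 is an assumption here)."""
    return hashlib.sha256(data).hexdigest()


def integrity_pass(shards, stored_digests, rebuild):
    """Re-read every shard, recompute its hash, and rebuild any mismatch.

    shards: dict shard_id -> bytes currently read back from disk
    stored_digests: dict shard_id -> digest recorded when the shard was written
    rebuild: hypothetical callback that reconstructs a shard from its peers
    Returns the ids of the shards that were rebuilt.
    """
    rebuilt = []
    for shard_id, data in shards.items():
        if shard_digest(data) != stored_digests[shard_id]:
            shards[shard_id] = rebuild(shard_id)  # replaces value; dict size unchanged
            rebuilt.append(shard_id)
    return rebuilt


# Usage: shard "b" suffered a corrupted byte after it was written.
original = {"a": b"alpha", "b": b"bravo"}
digests = {k: shard_digest(v) for k, v in original.items()}
on_disk = {"a": b"alpha", "b": b"brXvo"}
print(integrity_pass(on_disk, digests, rebuild=lambda sid: original[sid]))  # ['b']
```

Note that recomputing the hash requires reading every byte of the shard, which is exactly the kind of read that can surface a bad sector.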
Shard Integrity Checks
The shard integrity check utility runs as a utility task on each Storage Pod. In late June, we decided to increase the rate of the shard integrity checks across the data farm to cause the checks to run as often as possible on a given drive while still maintaining the drive’s performance. We increased the frequency of the shard integrity checks to account for the growing number of larger-capacity drives that had been deployed recently.
The Consequences for Drive Stats
Once we write data to a disk, that section of the disk remains untouched until the data is read by the user, the data is read by the shard integrity check process to recompute the hash, or the data is deleted and written over. As a consequence, SMART stats receive no updates about that section of the disk until one of those three actions occurs. Speeding up the shard integrity checks means each disk is read more often, and errors discovered during those reads are captured by the appropriate SMART attributes. Putting the pieces together, a problem that would have been discovered in the future (under our previous shard integrity check cadence) is now captured by the SMART stats when the process reads that section of the disk today.
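The timing argument can be illustrated with a toy model: assume a sector goes bad on some day, nothing but the integrity check ever reads it, and the check revisits every sector once per fixed interval starting on day 0. The interval values below are hypothetical, not Backblaze’s actual cadence.

```python
import math


def detection_day(defect_day: int, scan_interval_days: int) -> int:
    """First day a scan reads a sector that went bad on defect_day.

    Toy model: scans happen on days 0, interval, 2*interval, ...; SMART 197/198
    only change when one of those reads hits the bad sector.
    """
    return math.ceil(defect_day / scan_interval_days) * scan_interval_days


QUARTER_END = 91  # days 0..91 cover Q3 (July 1 through September 30)
defect = 70       # hypothetical day the sector went bad

for interval in (60, 7):  # slower vs faster check cadence
    day = detection_day(defect, interval)
    quarter = "Q3" if day <= QUARTER_END else "Q4"
    print(f"interval={interval:>2} days -> detected day {day} ({quarter})")
```

In this model the same defect is detected on day 120 (Q4) with a 60-day cadence but on day 70 (Q3) with a 7-day cadence: the failure is not new, it is simply recorded one quarter earlier.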
By increasing the shard integrity check rate, we potentially moved failures that were going to be found in the future into Q3. While discovering potential problems earlier is a good thing, it is possible that the hard drive failures recorded in Q3 could then be artificially high as future failures were dragged forward into the quarter. Given that our Annualized Failure Rate calculation is based on Drive Days and Drive Failures, potentially moving up some number of failures into Q3 could cause an artificial spike in the Q3 Annualized Failure Rates. This is what we will be monitoring over the coming quarters.
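Backblaze computes Annualized Failure Rate as drive failures divided by drive years (drive days / 365), expressed as a percentage. The sketch below applies that formula to a hypothetical fleet to show how pulling failures forward can raise one quarter’s AFR while leaving the combined rate unchanged; the fleet size and failure counts are invented for illustration.

```python
def annualized_failure_rate(drive_days: int, failures: int) -> float:
    """AFR in percent: failures per drive-year (drive days / 365) of operation."""
    return 100.0 * failures / (drive_days / 365)


# Hypothetical fleet: 10,000 drives observed for two 92-day quarters.
drive_days_per_quarter = 10_000 * 92
baseline = {"Q3": 40, "Q4": 40}     # failures found at the old check cadence
accelerated = {"Q3": 50, "Q4": 30}  # same total, but 10 Q4 failures surfaced in Q3

for label, counts in (("baseline", baseline), ("accelerated", accelerated)):
    quarterly = {q: round(annualized_failure_rate(drive_days_per_quarter, f), 2)
                 for q, f in counts.items()}
    combined = annualized_failure_rate(2 * drive_days_per_quarter, sum(counts.values()))
    print(label, quarterly, f"combined={combined:.2f}%")
```

With these numbers the baseline reports about 1.59% in both quarters, while the accelerated scenario reports about 1.98% for Q3 and 1.19% for Q4; both give the same combined AFR of about 1.59%.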
There are a couple of things to note as we consider the effect of the accelerated shard integrity checks on the Q3 data for Drive Stats:
The number of drive failures over the lifetime of a given drive model should not increase. At most, we just moved the failures around a bit.
It is possible that the shard integrity checks did nothing to increase the number of drive failures that occurred in Q3. The quarterly failure rates didn’t vary wildly from previous quarters, but we didn’t feel comfortable publishing them at this time given the discussion above.
Lifetime Hard Drive Stats through Q3 2019
Below are the lifetime failure rates for all of our drive models in service as of September 30, 2019.
The lifetime failure rate for the drive models in production rose slightly, from 1.70% at the end of Q2 to 1.73% at the end of Q3. This trivial increase would seem to indicate that the effect of the potential Q3 data issue noted above is minimal and well within normal variation. However, we’re not yet convinced that is true, and we have a plan to make sure, as we’ll see in the next section.
What’s Next for Drive Stats?
We will continue to publish our Hard Drive Stats each quarter, and next quarter we expect to include the quarterly (Q4) chart as well. For the foreseeable future, we will have a little extra work to do internally as we will be tracking two different groups of drives. One group will be the drives that “went through the wormhole,” so to speak, as they were present during the accelerated shard integrity checks. The other group will be those drives that were placed into production after the shard integrity check setting was reduced. We’ll compare these two datasets to see if there was indeed any effect of the increased shard integrity checks on the Q3 hard drive failure rates. We’ll let you know what we find in subsequent drive stats reports.
The Hard Drive Stats Data
The complete data set used to create this review is available on our Hard Drive Test Data web page. You can download and use this data for free for your own purposes. All we ask are three things: 1) You cite Backblaze as the source if you use the data, 2) You accept that you are solely responsible for how you use the data, and 3) You do not sell this data to anyone; it is free. Good luck and let us know what you find.
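As a starting point for your own analysis, here is a minimal Python sketch that computes per-model AFR from the daily snapshot CSV files. It assumes each file has one row per drive per day with model and failure columns (check the headers of the files you download). The two inline “files” below use made-up serial numbers and a placeholder model name; in practice you would pass open file handles for a full quarter of daily files, since a handful of drive days produces a meaningless rate.

```python
import csv
import io
from collections import defaultdict


def afr_by_model(csv_files):
    """Return {model: AFR %} from daily drive snapshot CSVs.

    Each row is one drive on one day, so counting rows gives drive days;
    the failure column is 1 on the day a drive failed and 0 otherwise.
    """
    drive_days = defaultdict(int)
    failures = defaultdict(int)
    for f in csv_files:
        for row in csv.DictReader(f):
            drive_days[row["model"]] += 1
            failures[row["model"]] += int(row["failure"])
    return {m: 100.0 * failures[m] / (drive_days[m] / 365) for m in drive_days}


# Made-up two-day sample; real files contain many more columns and rows.
header = "date,serial_number,model,capacity_bytes,failure\n"
day1 = header + "2019-09-29,SN001,MODEL_X,12000000000000,0\n2019-09-29,SN002,MODEL_X,12000000000000,0\n"
day2 = header + "2019-09-30,SN001,MODEL_X,12000000000000,0\n2019-09-30,SN002,MODEL_X,12000000000000,1\n"
print(afr_by_model([io.StringIO(day1), io.StringIO(day2)]))
```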
As always, we look forward to your thoughts and questions in the comments.
Editor’s Note: Since 2013, Backblaze has published statistics and insights based on the hard drives in our data centers. Why? Well, we like to be helpful, and we thought sharing would help others who rely on hard drives but don’t have reliable data on performance to make informed purchasing decisions. We also hoped the data might aid manufacturers in improving their products. Given the millions of people who’ve read our Hard Drive Stats posts and the increasingly collaborative relationships we have with manufacturers, it seems we might have been right.
But we don’t only share our take on the numbers, we also provide the raw data underlying our reports so that anyone who wants to can reproduce them or draw their own conclusions, and many have. We love it when people reframe our reports, question our logic (maybe even our sanity?), and provide their own take on what we should do next. That’s why we’re featuring Ryan Smith today.
Ryan has held a lot of different roles in tech, but lately he’s been dwelling in the world of storage as a product strategist for Hitachi. On a personal level, he explains that he has a “passion for data, finding insights from data, and helping others see how easy and rewarding it can be to look under the covers.” It shows.
A few months ago we happened on a post by Ryan with an appealing header featuring our logo with an EXPOSED stamp superimposed in red over our humble name. It looked like we had been caught in a sting operation. As a company that loves transparency, we were delighted. Reading on we found a lot to love and plenty to argue over, but more than anything, we appreciated how Ryan took data we use to analyze hard drive failure rates and extrapolated out all sorts of other gleanings about our business. As he puts it, “it’s not the value at the surface but the story that can be told by tying data together.” So, we thought we’d share his original post with you to (hopefully) incite some more arguments and some more tying together of data.
While we think his conclusions are reasonable based on the data available to him, the views and analysis below are entirely Ryan’s. We appreciate how he flagged some areas of uncertainty, but thought it most interesting to share his thoughts without rebuttal. If you’re curious about how he reached them, you can find his notes on process here. He doesn’t have the full story, but we think he did amazing work with the public data.
Our 2019 Q3 Hard Drive Stats post will be out in a few weeks, and we hope some of you will take Ryan’s lead and do your own deep dive into the reporting when it’s public. For those of you who can’t wait, we’re hoping this will tide you over for a little while.
If you’re interested in taking a look at the data yourselves, here’s our Hard Drive Data and Stats webpage that has links to all our past Hard Drive Stats posts and zip files of the raw data.
Ryan Smith Uses Backblaze’s SMART Stats to Illustrate the Power of Data
It is now common practice for end-customers to share telemetry (call home) data with their vendors. My analysis below shares some insights about your business that vendors might gain from seemingly innocent data that you are sending them every day.
On a daily basis, Backblaze (a cloud backup and storage provider) logs all its drive health data (aka SMART data) for over 100,000 of its hard drives. With 100K+ records a day, each year can produce over 30 million records. They share this raw data on their website, but most people probably don’t really dig into it much. I decided to see what this data could tell me and what I found was fascinating.
Rather than looking at nearly 100 million records, I decided to look at just over one million, consisting of the last day of every quarter from Q1’16 to Q1’19. This would give me enough granularity to see what is happening inside Backblaze’s cloud backup storage business. For those interested, I used MySQL to import and transform the data into something easy to work with (see more details on my SQL query); I then imported the data into Excel, where I could easily pivot the data and look for insights. Below are the results of this effort.
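Ryan did this sampling in MySQL; as an illustration only, here is the same “last day of each quarter” filter sketched in Python. The row format (dicts with a date string) mirrors the daily CSV files but is an assumption, not his actual query.

```python
from datetime import date, timedelta


def is_quarter_end(d: date) -> bool:
    """True for Mar 31, Jun 30, Sep 30 and Dec 31."""
    return d.month in (3, 6, 9, 12) and (d + timedelta(days=1)).day == 1


def quarter_end_rows(rows):
    """Keep only snapshot rows taken on the last day of a quarter."""
    return [r for r in rows if is_quarter_end(date.fromisoformat(r["date"]))]


# Quarter-end dates from Q1'16 through Q1'19.
start, end = date(2016, 1, 1), date(2019, 3, 31)
days = (start + timedelta(days=i) for i in range((end - start).days + 1))
quarter_ends = [d for d in days if is_quarter_end(d)]
print(len(quarter_ends), quarter_ends[0], quarter_ends[-1])  # 13 2016-03-31 2019-03-31
```

Thirteen daily snapshots of a fleet that grew to roughly 100,000 drives is in line with the “just over one million” records he analyzed.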
User Data vs Physical Capacity
I grabbed the publicly posted “Petabytes stored” figure that Backblaze claims on their website (“User Petabytes”) and compared it to the total capacity from the SMART data they log (“Physical Petabytes”) to see how much overhead or unused capacity they have. The Theoretical Max (green line) is based on the ECC protection scheme (13+2 and/or 17+3) they use to protect user data. If the “% User Petabytes” is below that max, then Backblaze either has unused capacity or hasn’t updated their website with the actual data stored.
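The Theoretical Max line follows from simple erasure-coding arithmetic: with d data shards and p parity shards, at most d / (d + p) of raw capacity can hold user data. Here is a quick check of the two schemes Ryan mentions, plus a utilization comparison using invented petabyte values.

```python
def max_user_fraction(data_shards: int, parity_shards: int) -> float:
    """Share of raw capacity available for user data under d+p erasure coding."""
    return data_shards / (data_shards + parity_shards)


for d, p in ((13, 2), (17, 3)):
    print(f"{d}+{p}: {max_user_fraction(d, p):.1%} of raw capacity")

# Hypothetical figures: compare a claimed user PB against physical PB.
user_pb, physical_pb = 700.0, 900.0
used = user_pb / physical_pb
print(f"user/physical = {used:.1%} vs 17+3 max {max_user_fraction(17, 3):.1%}")
```

So 13+2 caps user data at about 86.7% of raw capacity and 17+3 at 85%; anything below the applicable cap is headroom (or a stale website figure).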
Data Read/Written vs Capacity Growth
Looking at the last two years, by quarter, you can see a healthy amount of year-over-year growth in their write workload; roughly 80% over the last four quarters! This is good since writes likely correlate with new user data, which means broader adoption of their offering. For some reason their read workloads spiked in Q2’17 and have maintained a higher read workload since then (as indicated by the YoY spikes from Q2’17 to Q1’18, and then settling back to less than 50% YoY since); my guess is this was likely driven by a change to their internal workload rather than a migration because I didn’t see subsequent negative YoY reads.
Now let’s look at some performance insights. A quick note: Only Seagate hard drives track the needed information in their SMART data in order to get insights about performance. Fortunately, roughly 80% of Backblaze’s drive population (both capacity and units) are Seagate so it’s a large enough population to represent the overall drive population. Going forward, it does look like the new 12 TB WD HGST drive is starting to track bytes read/written.
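Seagate’s SMART 241 (Total LBAs Written) and SMART 242 (Total LBAs Read) are lifetime counters, so average throughput over a period comes from the difference between two snapshots. A minimal sketch, assuming the values come from the smart_241_raw / smart_242_raw columns and that one raw unit equals one 512-byte LBA (verify the unit per drive model before relying on it):

```python
SECTOR_BYTES = 512        # assumption: one raw unit = one 512-byte LBA
SECONDS_PER_DAY = 86_400


def avg_mb_per_s(lba_start: int, lba_end: int, days: float) -> float:
    """Average MB/s (decimal MB) between two cumulative LBA counter snapshots."""
    if lba_end < lba_start:
        raise ValueError("counter went backwards; drive may have been swapped")
    return (lba_end - lba_start) * SECTOR_BYTES / (days * SECONDS_PER_DAY) / 1e6


# Hypothetical quarter-end readings of smart_242_raw (LBAs read) for one drive.
reads = avg_mb_per_s(lba_start=9_000_000_000, lba_end=24_000_000_000, days=91)
print(f"average read rate: {reads:.2f} MB/s")
```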
Pod (Storage Enclosure) Performance
Looking at the power-on hours of each drive, I was able to calculate the vintage of each drive and the number of drives in each “pod” (the term Backblaze uses for its storage enclosures). This lets me calculate the number of pods that Backblaze has in its data centers. Their original pods stored 45 drives, and this improved to 60 drives in ~Q2’16 (according to past Backblaze blog posts). The power-on date allowed me to place each drive into the appropriate enclosure type and provide you with pod statistics like the Mbps per pod. This is definitely an educated guess, as some newer-vintage drives are replacement drives in older enclosures, but the overall percentage of drives that fail is low enough that these figures should be pretty accurate.
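SMART 9 records power-on hours, so subtracting it from the snapshot date approximates when a drive entered service (time spent powered off is not counted, so deployment can look later than it was). A minimal sketch of that step and the 45- vs 60-drive classification, assuming a hypothetical cutoff of April 1, 2016 for the switch to 60-drive pods:

```python
from datetime import date, timedelta

POD_60_START = date(2016, 4, 1)  # assumption: 60-drive pods from ~Q2'16


def deploy_date(snapshot: date, power_on_hours: int) -> date:
    """Approximate date a drive entered service from SMART 9 (power-on hours)."""
    return snapshot - timedelta(days=power_on_hours // 24)


def pod_type(snapshot: date, power_on_hours: int) -> int:
    """Guess 45- vs 60-drive enclosure from the deployment date.

    Replacement drives placed in older 45-drive pods will be misclassified.
    """
    return 60 if deploy_date(snapshot, power_on_hours) >= POD_60_START else 45


snap = date(2019, 3, 31)
print(deploy_date(snap, 24 * 365), pod_type(snap, 24 * 365))  # 2018-03-31 60
print(pod_type(snap, 24 * 1500))                              # 45
```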
Overall, Backblaze’s data centers are handling over 100 GB/s of throughput across all their pods which is quite an impressive figure. This number keeps climbing and is a result of new pods as well as overall higher performance per pod. From quick research, this is across three different data centers (Sacramento x 2, Phoenix x 1) and maybe a fourth on its way in Europe.
Hard Drive Performance
Since each pod holds between 45 and 60 drives, with an overall max pod performance of 1 Gbps, I wasn’t surprised to see such low average drive performance. You can see that Backblaze’s workload is read-heavy, with reads averaging less than 1 MB/s per drive and writes only a third of that. To put that in perspective, these drives can deliver over 100 MB/s, so Backblaze is not pushing the limits of these hard drives.
As discussed earlier, you can also see how the read workload changed significantly in Q2’17 and has not reverted back since.
As I expected, read and write performance is highly correlated with drive capacity. So, it appears that most of the growth in read/write performance per drive is really driven by the adoption of higher-density drives. This is very typical of public storage-as-a-service (STaaS) offerings, where it’s really about $/GB, IOPS/GB, MB/s per GB, etc.
As a side note, the black dashed lines (average between all densities) should correlate with the previous chart showing overall read/write performance per drive.
Switching gears, let’s look at Backblaze’s purchasing history. This will help suppliers look at trends within Backblaze to predict future purchasing activities. I used power-on-hours to calculate when a drive entered the drive population.
Hard Drives Purchased by Density, by Year
This chart helps you see how Backblaze standardized on 4 TB, 8 TB, and now 12 TB densities. The number of drives that Backblaze purchased every year had been climbing until 2018, when it saw its first decline in units. However, this is mainly due to the higher capacity per drive.
A question to ponder: Did 2018 reach a point where capacity growth per HDD surpassed the actual demand required to maintain unit growth of HDDs? Or is this trend limited to Backblaze?
Petabytes Purchased by Quarter
This looks at the number of drives purchased over the last five years, along with the amount of capacity added. It’s not quite regular enough to spot a trend, but you can quickly spot that the amount of capacity purchased over the last two years has grown dramatically compared to previous years.
HDD Vendor Market Share
[Figure: HDD vendor market share by quarter for Western Digital (WDC), Toshiba (TOSYY), and Seagate (STX)]
Seagate is definitely the preferred vendor, capturing almost 100% of the market share save for a few quarters where WD HGST won 50% of the business. This information could be used by Seagate or its competitors to understand where they stand within the account for future bids. However, with only three HDD vendors in the industry, it’s not hard to guess who won the business if a given vendor didn’t.
Drive Population by Quarter
This shows the total drive population over the past three years. Even though the number of drives being purchased has been falling lately, the overall drive population is still growing.
You can quickly see that 4 TB drives saw their peak population in Q1’17 and have rapidly declined since. In fact, let’s look at the same data but with a different type of chart.
That’s better. We can see that 12 TB drives really had a dramatic effect on both 4 TB and 8 TB adoption. In fact, Backblaze has been proactively retiring 4 TB drives. This is likely due to the desire to slow the growth of their data center footprint, which comes with costs (more on this later).
As a drive vendor, I could use the 4 TB trend to estimate how much drive replacement will occur next quarter, along with natural PB growth. I will look more into Backblaze’s drive/pod retirement later.
Current Drive Population, by Deployed Date
Be careful when interpreting this graph. What we are looking at here is the Q1’19 drive population, where the date on the x-axis is the date each drive entered the population. This shows that, of all the drives in Backblaze’s population today, the oldest are from 2015 (with the exception of a few stragglers).
This indicates that the useful life of drives within Backblaze’s data centers is ~4 years. A later chart looks at how drives/pods are phased out, by year.
Along the top of the chart, I noted when the 60-drive pods started entering the mix. The rack density is much more efficient with this design than with the 45-drive pod. Combining this with the 4 TB to 12 TB efficiency gain, Backblaze has been aggressively retiring its 4 TB/45-drive enclosures. A large population of these remains, so expect some further migration to occur.
Boot Drive Population
This is the overall boot drive population over time. You can see that it is currently dominated by 500 GB drives, with only a few smaller densities remaining in the population today. For some reason, Toshiba has been the preferred vendor, with Seagate only recently gaining some new business.
The boot drive population is also an interesting data point to use for verifying the number of pods in the population. For example, there were 1,909 boot drives in Q1’19 and my calculation of pods based on the 45/60-drive pod mix was 1,905. I was able to use the total boot drives each quarter to double check my mix of pods.
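The cross-check can be sketched in a few lines, assuming (as Ryan’s comparison implies) one boot drive per pod. The drive counts in the estimate_pods test are hypothetical; the 1,905 and 1,909 figures are his Q1’19 numbers.

```python
def estimate_pods(drives_in_45_pods: int, drives_in_60_pods: int) -> float:
    """Pods implied by the data-drive mix (one enclosure per 45 or 60 drives)."""
    return drives_in_45_pods / 45 + drives_in_60_pods / 60


def boot_drive_gap(estimated_pods: float, boot_drives: int) -> float:
    """Relative gap between the pod estimate and the boot drive count,
    assuming one boot drive per pod."""
    return abs(boot_drives - estimated_pods) / boot_drives


# Ryan's Q1'19 figures: 1,905 estimated pods vs 1,909 boot drives.
print(f"gap: {boot_drive_gap(1905, 1909):.2%}")
```

A gap of about 0.2% suggests the 45/60 pod mix derived from power-on dates is close to reality.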
Pods (Drive Enclosures)
As discussed earlier, pods are the drive enclosures that house all of Backblaze’s hard drives. Let’s take a look at a few more trends that show what’s going on within the walls of their data center.
Pods Population by Deployment Date
This one is interesting. Each line in the graph represents a particular snapshot in time of the total population, and the x-axis represents the vintage of the pods in that snapshot. Comparing snapshots shows how the population changes over time: namely, new pods being deployed and old pods being retired. To capture this, I looked at the last day of Q1 data for the last four years and calculated the date each drive entered the population. Using this “Power On Date,” I was able to deduce the type of pod (45 or 60 drive) it was deployed in.
Some insights from this chart:
From Q2’16 to Q1’17, they retired some pods from 2010-11
From Q2’17 to Q1’18, they retired a significant number of pods from 2011-14
From Q2’18 to Q1’19, they retired pods from 2013-2015
Pods that were deployed since late 2015 have been untouched (you can tell this by seeing the lines overlap with each other)
The most pods deployed in a quarter was 185 in Q2’16
Since Q2’16, the number of pods deployed has been declining, on average; this is due to the increase in the number of drives per pod and the density of each drive
There are still a significant number of 45-drive pods to retire
Totaling up all the new pods being deployed and retired makes it easier to see the yearly changes happening within Backblaze’s operation. Keep in mind that these are all calculations and may erroneously count drive replacements as new pods, but I don’t expect the results to vary significantly from what is shown here.
The data shows that any new pods that have been deployed in the past few years have mainly been driven by replacing older, less dense pods. In fact, the pod population has plateaued at around 1,900 pods.
Based on blog posts, Backblaze’s pods are all 4U (4 rack units), and pictures on their site indicate 10 pods fit in a rack; this equates to 40U racks. Using this information, along with the drive population and the power-on date, I was able to calculate the number of pods on any given date as well as the total number of racks. I did not include their networking racks, of which I believe they have two per row in their data center.
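Ryan’s rack math reduces to a ceiling division. A minimal sketch using his assumptions (4U pods, 40U of usable space per rack, networking racks excluded):

```python
import math

RACK_UNITS = 40   # usable RU per rack, per Ryan's estimate
POD_UNITS = 4     # every pod is 4U


def racks_needed(pods: int) -> int:
    """Storage racks for a pod count, excluding networking racks."""
    pods_per_rack = RACK_UNITS // POD_UNITS  # 10 pods per rack
    return math.ceil(pods / pods_per_rack)


print(racks_needed(1900), racks_needed(1905))  # 190 191
```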
You can quickly see that Backblaze has done a great job at slowing the growth of the racks in their data center. This all results in lower costs for their customers.
What interested me when looking at Backblaze’s SMART data was the fact that drives were being retired more than they were failing. This means the cost of failures is fairly insignificant in the scheme of things. It is actually efficiencies driven by technology improvements, such as drive and enclosure densities, that had the biggest effect on costs. However, the benefits must outweigh the costs. Since Backblaze uses Sungard AS for its data centers, let’s try to visualize the benefit of retiring drives/pods.
Colocation Costs, Assuming a Given Density
This shows the total capacity over time in Backblaze’s data centers, along with the colocation costs assuming all the drives were a given density. As you can see, in Q1’19 it would take $7.7M a year to pay for colocating 861 PB if all the drives were 4 TB in size. By moving the entire population to 12 TB drives, this can be reduced to $2.6M. So, just changing the drive density can have a significant impact on Backblaze’s operational costs. I assumed a colocation cost of $45/RU in the analysis, although their costs may be as low as $15/RU given the scale of their operation.
I threw in 32 TB densities to illustrate a hypothetical SSD-type density so you can see the colocation cost savings of moving to SSDs. Although colocation costs would be lower, SSD acquisition costs are far too high at the moment to justify a move.
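The colocation figures can be reproduced with a few lines of arithmetic. The sketch assumes the $45/RU figure is a monthly rate and that every drive sits in a 4U60 pod (15 drives per rack unit); that is our reading of Ryan’s assumptions, chosen because it yields his $7.7M and $2.6M numbers, and it ignores networking gear and partially filled racks.

```python
DRIVES_PER_RU = 60 / 4      # 4U60 pod: 15 drives per rack unit
COST_PER_RU_MONTH = 45.0    # assumption: $45 per rack unit per month


def annual_colo_cost(total_pb: float, drive_tb: float) -> float:
    """Yearly colocation cost if all capacity used drives of one size."""
    drives = total_pb * 1000 / drive_tb          # decimal PB -> TB -> drives
    rack_units = drives / DRIVES_PER_RU
    return rack_units * COST_PER_RU_MONTH * 12


for tb in (4, 8, 12, 32):
    print(f"{tb:>2} TB drives: ${annual_colo_cost(861, tb) / 1e6:.1f}M per year")
```

For 861 PB this gives about $7.7M with 4 TB drives and $2.6M with 12 TB drives; because the model is linear, a $15/RU rate would cut every figure to one third.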
Break-Even Analysis of Retiring Pods
This chart helps illustrate the math behind deciding to retire older drives/pods based on the break-even point.
Let’s break down how to read this chart:
This chart is looking at whether Backblaze should replace older drives with the newer 12 TB drives
Assuming a cost of $0.02/GB for a 12 TB drive, that is a $20/TB acquisition cost you see on the far left
Each line represents the cumulative cost over time (acquisition + operational costs)
The grey lines (4 TB and 8 TB) all assume they were already acquired so they only represent operational costs ($0 acquisition cost) since we are deciding on replacement costs
The operational costs (incremental yearly increase shown) are calculated from the $45 per RU colocation cost and how much of each drive/enclosure density fits in a rack unit. The more TBs you can cram into a rack unit, the lower your colocation costs are
Assuming you are still with me, this shows that the break-even point for retiring 4 TB 4U45 pods is just over two years! And for 4 TB 4U60 pods it’s about three years! It’s a no-brainer to kill the 4 TB enclosures and replace them with 12 TB drives. Remember that this assumes a $45/RU colocation cost, so the break-even point will shift to the right if the colocation costs are lower (which they surely are). You can see that the math to replace 8 TB drives with 12 TB doesn’t make as much sense, so we may see Backblaze’s retirement strategy slow down dramatically after it retires the 4 TB capacity points.
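The break-even points follow from comparing colocation cost per TB per year: keeping old drives costs old_rate × t, while replacing them costs $20/TB up front plus new_rate × t, so the lines cross at t = 20 / (old_rate − new_rate). A minimal linear sketch, assuming $45 per RU per month and 4U pods, and ignoring power, labor, failures, and resale value:

```python
COST_PER_RU_YEAR = 45.0 * 12   # assumption: $45 per rack unit per month
NEW_COST_PER_TB = 20.0         # $0.02/GB acquisition cost for 12 TB drives
POD_UNITS = 4                  # both 4U45 and 4U60 pods are 4U tall


def colo_cost_per_tb_year(drive_tb: float, drives_per_pod: int) -> float:
    """Yearly colocation cost attributable to each TB of raw capacity."""
    tb_per_ru = drive_tb * drives_per_pod / POD_UNITS
    return COST_PER_RU_YEAR / tb_per_ru


def break_even_years(old_tb: float, old_drives_per_pod: int) -> float:
    """Years until buying 12 TB drives in 4U60 pods beats keeping old hardware.

    Old hardware is treated as already paid for (zero acquisition cost).
    Returns infinity if the new setup never catches up.
    """
    old_rate = colo_cost_per_tb_year(old_tb, old_drives_per_pod)
    new_rate = colo_cost_per_tb_year(12, 60)
    savings_per_year = old_rate - new_rate
    if savings_per_year <= 0:
        return float("inf")
    return NEW_COST_PER_TB / savings_per_year


for label, tb, drives in (("4 TB 4U45", 4, 45), ("4 TB 4U60", 4, 60), ("8 TB 4U60", 8, 60)):
    print(f"{label}: break-even after {break_even_years(tb, drives):.1f} years")
```

This works out to about 2.2 years for 4 TB drives in 4U45 pods, about 3.3 years for 4 TB in 4U60, and over 13 years for 8 TB in 4U60, in line with the chart.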
As hard drive densities get larger and $/GB decreases, I expect the cumulative costs to start lower (less acquisition cost) and rise more slowly (less RU operational cost), making future drive retirements more attractive. Eyeballing it, that would happen once prices approach $0.01/GB to $0.015/GB.
Things Backblaze Should Look Into
Top of mind, Backblaze should look into these areas:
The architecture around performance is not balanced; investigate having a caching tier to handle bursts and put more drives behind each storage node to reduce “enclosure/slot tax” costs.
Look into designs like 5U84 from Seagate/Xyratex, providing 16.8 drives per RU versus the 15 achieved by Backblaze’s own 4U60 design; that’s another 12% efficiency!
5U allows for 8 pods to fit per rack versus 10 with 4U.
Look at when SSDs will be attractive to replace HDDs at a given $/GB, density, idle costs, # of drives that fit per RU (using 2.5” drives instead of 3.5”) so that they can stay on top of this trend [there is no rush on this one].
Performance and endurance of SSDs are irrelevant since the performance requirements are so low and the WPD is almost non-existent, making QLC and beyond a great candidate.
Look at allowing pods to be more flexible in handling different capacity drives, so drive failures can be handled more cost-efficiently without having to retire pods. Having concepts of “virtual pods” that don’t have physical limits will better accommodate a future in which Backblaze won’t be retiring pods as aggressively, yet still let them grow their pod densities seamlessly.
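The density comparison in the 5U84 recommendation is simple arithmetic; a sketch assuming 40U of usable space per rack, as in Ryan’s earlier rack estimate:

```python
RACK_UNITS = 40


def density(drives_per_enclosure: int, enclosure_units: int):
    """Return (drives per RU, enclosures per rack, drives per rack)."""
    per_rack = RACK_UNITS // enclosure_units
    return drives_per_enclosure / enclosure_units, per_rack, per_rack * drives_per_enclosure


for name, drives, units in (("4U60", 60, 4), ("5U84", 84, 5)):
    per_ru, enclosures, per_rack = density(drives, units)
    print(f"{name}: {per_ru:.1f} drives/RU, {enclosures} per rack, {per_rack} drives per rack")

gain = density(84, 5)[0] / density(60, 4)[0] - 1
print(f"5U84 gain: {gain:.0%}")
```

Eight 5U84 enclosures hold 672 drives per rack versus 600 for ten 4U60 pods, the same 12% gain whether measured per RU or per rack.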
It is kind of ironic that the reason Backblaze posted all their SMART data is to share insights around failures when I didn’t even analyze failures once! There is much more analysis that could be done around this data set which I may revisit as time permits.
As you can see, even simple health data from drives, along with a little help from other data sources, can help expose a lot more than you would initially think. I have long felt that people have yet to understand the full power of giving data freely to businesses (e.g. Facebook, Google Maps, LinkedIn, Mint, Personal Capital, News Feeds, Amazon). I often hear things like, “I have nothing to hide,” which indicates the lack of value they assign to their data. It’s not the value at its surface but the story that can be told by tying data together.
Until next time, Ryan Smith.
• • •
Ryan Smith is currently a product strategist at Hitachi Vantara. Previously, he served as the director of NAND product marketing at Samsung Semiconductor, Inc. He is extremely passionate about uncovering insights from just about any data set. He just likes to have fun by making a notable difference, influencing others, and working with smart people.
Backblaze likes to talk about hard drive failures — a lot. What we haven’t talked much about is how we deal with those failures: the daily dance of temp drives, replacement drives, and all the clones that it takes to keep over 100,000 drives healthy. Let’s go behind the scenes and take a look at that dance from the eyes of one Backblaze hard drive.
After sitting still for what seemed like forever, ZCH007BZ was on the move. ZCH007BZ, let’s call him Zach, is a Seagate 12 TB hard drive. For the last few weeks, Zach and over 6,000 friends were securely sealed inside their protective cases in the ready storage area of a Backblaze data center. Being a hard disk drive, Zach’s modest dream was to be installed in a system, spin merrily, and store data for many years to come. And now the wait was nearly over, or was it?
The Life of Zach
Zach was born in a factory in Singapore and shipped to the US, eventually finding his way to Backblaze, but he didn’t know that. He had sat sealed in the dark for weeks. Now Zach and boxes of other drives were removed from their protective cases and gently stacked on a cart. Zach was near the bottom of the pile, but even he could see endless columns of beautiful red boxes stacked seemingly to the sky. “Backblaze!” one of the drives on the cart whispered. All the other drives gasped with recognition. Thank goodness the noise-cancelling headphones worn by all Backblaze Data Center Techs covered the drives’ collective excitement.
While sitting in the dark, the drives had gossiped about where they were: a data center, a distribution warehouse, a Costco, or Best Buy. Backblaze came up a few times, but that was squashed — they couldn’t be that lucky. After all, Backblaze was the only place where a drive could be famous. Before Backblaze, hard drives labored in anonymity. Occasionally, one or two would be seen in a hard drive tear down article, but even that sort of exposure had died out a couple of years ago. But Backblaze publishes everything about their drives, their model numbers, their serial numbers, heck even their S.M.A.R.T. statistics. There was a rumor that hard drives worked extra hard at Backblaze because they knew they would be in the public eye. With red Backblaze Storage Pods as far as the eye could see, Zach and friends were about to find out.
The cart Zach and his friends were on glided to a stop at the production build facility. This is where storage pods are filled with drives and tested before being deployed. The cart stopped by the first of twenty V6.0 Backblaze Storage Pods that together would form a Backblaze Vault. At each Storage Pod station 60 drives were unloaded from the cart. The serial number of each drive was recorded along with the Storage Pod ID and drive location in the pod. Finally, each drive was fitted with a pair of drive guides and slid into its new home as a production drive in a Backblaze Storage Pod. “Spin long and prosper,” Zach said quietly each time the lid of a Storage Pod snapped in place covering the 60 giddy hard drives inside. The process was repeated for the remaining 19 Storage Pods, and when it was done Zach remained on the cart. He would not be installed in a production system today.
The Clone Room
Zach and the remaining drives on the cart were slowly wheeled down the hall. Bewildered, they were rolled into the clone room. “What’s a clone room?” Zach asked himself. The drives on the cart were divided into two groups, with one group being placed on the clone table, and the other being placed on the test table. Zach was on the test table.
Almost as soon as Zach was placed on the test table, the DC Tech picked him up again and placed him and several other drives into a machine. He was about to get formatted. The entire formatting process only took a few minutes for Zach, as it did for all of the other drives on the test table. Zach counted 25 drives, including himself.
Still confused and a little sore from the formatting, Zach and two other drives were picked up from the bench by a different DC Tech. She recorded their vitals — serial number, manufacturer, and model — and left the clone room with all three drives on a different cart.
Dreams of a Test Drive
The three drives were back on the data center floor with red Storage Pods all around. The DC Tech had maneuvered Luigi, the local Storage Pod lift unit, to hold a Storage Pod she was sliding from a data center rack. The lid was opened, the tech attached a grounding clip, and then removed one of the drives in the Storage Pod. She recorded the vitals of the removed drive. While she was doing so, Zach could hear the removed drive breathlessly mumble something about media errors, but before Zach could respond, the tech picked him up, attached drive guides to his frame, and gently slid him into the Storage Pod. The tech updated her records, closed the lid, and slid the pod back into place. A few seconds later, Zach felt a jolt of electricity pass through his circuits and he and 59 other drives spun to life. Zach was now part of a production Backblaze Storage Pod.
First, Zach was introduced to the other 19 members of his tome. There are 20 drives in a tome, with each living in a separate Storage Pod. Files are divided (sharded) across these 20 drives using Backblaze’s open-sourced erasure code algorithm.
Zach’s first task was to rebuild all of the files that were stored on the drive he replaced. He’d do this by asking for pieces (shards) of all the files from the 19 other drives in his tome. He only needed 17 of the pieces to rebuild a file, but he asked everyone in case there was a problem. Rebuilding was hard work, and the other drives were often busy with reading files, performing shard integrity checks, and so on. Depending on how busy the system was, and how full the drives were, it might take Zach a couple of weeks to rebuild the files and get him up to speed with his contemporaries.
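To illustrate the “any 17 of 20” property Zach relies on, here is a toy Reed-Solomon-style code in Python: the 17 data symbols are treated as values of a polynomial over the prime field GF(257), the 3 parity symbols are further values of that polynomial, and any 17 surviving shards recover the data by Lagrange interpolation. This is a deliberately simplified stand-in, not Backblaze’s open-sourced implementation (which uses byte-oriented GF(2^8) arithmetic); here each shard holds a single symbol, and parity values can reach 256, which would not fit in a byte.

```python
P = 257  # prime modulus; stands in for the GF(2^8) arithmetic real codecs use


def lagrange_eval(points, x, p=P):
    """Evaluate at x the unique polynomial of degree < len(points) through points, mod p."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p  # modular inverse (Python 3.8+)
    return total


def encode(data, n, p=P):
    """Turn k data symbols into n shards (x, y); the first k are the data itself."""
    k = len(data)
    points = [(x + 1, v) for x, v in enumerate(data)]
    parity = [(x, lagrange_eval(points, x, p)) for x in range(k + 1, n + 1)]
    return points + parity


def decode(shards, k, p=P):
    """Recover the k data symbols from any k surviving shards."""
    if len(shards) < k:
        raise ValueError("not enough shards to rebuild")
    pts = shards[:k]
    return [lagrange_eval(pts, x, p) for x in range(1, k + 1)]


data = list(b"seventeen bytes!!")   # 17 data symbols
shards = encode(data, n=20)         # 17 data + 3 parity shards
survivors = [s for i, s in enumerate(shards) if i not in (0, 5, 19)]  # lose 3 shards
print(bytes(decode(survivors, k=17)))
```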
Nightmares of a Test Drive
Little did he know that, at this point, he was still considered a temp replacement drive. The dysfunctional drive that he replaced was making its way back to the clone room, where a pair of cloning units, named Harold and Maude in this case, waited. The tech would attempt to clone the contents of the failed drive to a new drive assigned to the clone table. The primary reason for trying to clone a failed drive was recovery speed. A drive can be cloned in a couple of days, but as noted above, rebuilding a drive can take up to a couple of weeks, especially large drives on busy systems. In short, a successful clone would speed up the recovery process.
For nearly two days straight, Zach was rebuilding. He barely had time to meet his pod neighbors, Cheryl and Carlos. Since they were not rebuilding, they had plenty of time to marvel at how hard Zach was working. He was 25% done and going strong when the Storage Pod powered down. Moments later, the pod was slid out of the rack and the lid popped open. Zach assumed that another drive in the pod had failed, until he felt the spindly, cold fingers of the tech grab him and yank firmly. He was being replaced.
Zach had done nothing wrong. The clone had simply been successful: nearly all the files had been copied from the previous drive to the smiling clone drive, which was now wearing Zach’s drive guides and being gently inserted into Zach’s old slot. “Goodbye,” he managed to eke out as he was placed on the cart and watched the tech bring the Storage Pod back to life. Confused, angry, and mostly exhausted, Zach quickly fell asleep.
Zach woke up just in time to see he was in the formatting machine again. The data he had worked so hard to rebuild was being ripped from his platters and replaced randomly with ones and zeroes. This happened multiple times and just as Zach was ready to scream, it stopped, and he was removed from his torture and stacked neatly with a few other drives.
After a while he looked around, and once the lights went out the stories started. Zach wasn’t alone. Several of the other temp drives had pretty much the same story; they thought they had found a home, only to be replaced by some uppity clone drive. One of the temp drives, Lin, said she had been in three different systems only to be replaced each time by a clone drive. No one wanted to believe her, but no one knew what was next either.
The Day the Clone Died
Zach found out the truth a few days later when he was selected, inspected, and injected as a temp drive into another Storage Pod. Then three days later he was removed, wiped, reformatted, and placed back in the temp pool. He began to resign himself to life as a temp drive. Not exactly glamorous, but he did get his serial number in the Backblaze Drive Stats data tables while he was a temp. That was more than could be said for the millions of other drives in the world that would forever be unknown.
On his third temp drive stint, he was barely in the pod a day when the lid opened and he was unceremoniously removed. This was the life of a temp drive, and when the lid opened on the fourth day of his fourth temp drive shift, he just closed his eyes and waited for his dream to end again. Except, this time, the tech’s hand reached past him and grabbed a drive a few slots away. That unfortunate drive had died the night before in a full-fledged crash. Zach, like all the other drives nearby, had heard the screams.
Another temp drive Zach knew from the temp table replaced the dead drive, then the lid was closed, the pod slid back into place, and power was restored. With that, Zach doubled down on rebuilding: maybe if he could finish before the clone was done, he could stay. What Zach didn’t know was that the clone process for the drive he had replaced had failed. This happens about half the time. Zach was home free; he just didn’t know it.
In a couple of days, Zach finished rebuilding and became a real member of a production Backblaze Storage Pod. He now spends his days storing and retrieving data, getting his bits tested by shard integrity checks, and having his S.M.A.R.T. stats logged for the Backblaze Drive Stats. His hard drive life is better than he ever dreamed.
The only problem: both hosted storage (through existing cloud services) and purchased hardware (buying servers from Dell or Microsoft) were too expensive to hit this price point. Enter Tim Nufire, aka: The Podfather.
Tim led the effort to build what we at Backblaze call the Storage Pod: the physical hardware our company has relied on for data storage for more than a decade. On the occasion of the tenth anniversary of the open sourcing of our Storage Pod 1.0 design, we sat down with Tim to relive the twists and turns that led from a crew of backup enthusiasts in an apartment in Palo Alto to a company with four data centers spread across the world, holding 2,100 storage pods and closing in on an exabyte of storage.
✣ ✣ ✣
Editors: So Tim, it all started with the $5 price point. I know we did market research and that was the price at which most people shrugged and said they’d pay for backup. But it was so audacious! The tech didn’t exist to offer that price. Why did you start there?
Tim Nufire: It was the pricing given to us by the competitors; they didn’t give us a lot of choice. But it was never a question of whether we should do it, only of how we would do it. I had been managing my own backups for my entire career; I cared about backups. So it’s not like backup was new, or particularly hard. I mean, I firmly believe in Brian Wilson’s (Backblaze’s Chief Technical Officer) top line: You read a byte, you write a byte. You can read the byte more gently than other services so as to not impact the system someone is working on. You might be able to read a byte a little faster. But at the end of the day, it’s an execution game, not a technology game. We simply had to out-execute the competition.
E: Easy to say now, with a company of 113 employees and more than a decade of success behind us. But at that time, you were five guys crammed into a Palo Alto apartment with no funding and barely any budget and the competition — Dell, HP, Amazon, Google, and Microsoft — they were huge! How do you approach that?
TN: We always knew we could do it for less. We knew that the math worked. We knew what the cost of a 1 TB hard drive was, so we knew how much it should cost to store data. We knew what those markups were. We knew, looking at a Dell 2900, how much the margin was in that box. We knew they were overcharging. At that time, I could not build a desktop computer for less than Dell could build it. But I could build a server at half their cost.
I don’t think Dell or anyone else was being irrational. As long as they have customers willing to pay their hard margins, they can’t adjust for the potential market. They have to get to the point where they have no choice. We didn’t have that luxury.
So, at the beginning, we were reluctant hardware manufacturers. We were manufacturing because we couldn’t afford to pay what people were charging, not because we had any passion for hardware design.
E: Okay, so you came on at that point to build a cloud. Is that where your title comes from? Chief Cloud Officer? The pods were a little ways down the road, so Podfather couldn’t have been your name yet. …
TN: This was something like December 2007. Gleb (Budman, the Chief Executive Officer of Backblaze) and I went snowboarding up in Tahoe, and he talked me into joining the team. … My title at first was all wrong: I never became the VP of Engineering, in any sense of the word. That was never who I was. I held the title for maybe five or six years before we finally changed it. Chief Cloud Officer means nothing, but it fits better than anything else.
E: It does! You built the cloud for Backblaze with the Storage Pod as your water molecule (if we’re going to beat the cloud metaphor to death). But how does it all begin? Take us back to that moment: the podception.
TN: Well, the first pod, per se, was just a bunch of USB drives strapped to a shelf in the data center attached to two Dell 2900 towers. It didn’t last more than an hour in production. As soon as it got hit with load, it just collapsed. Seriously! We went live on this and it lasted an hour. It was a complete meltdown.
Two things happened. First, the bus was completely unstable, so the USB drives were unstable. Second, DRBD (Distributed Replicated Block Device), which is designed to protect your data by live-mirroring it between the two towers, immediately fell apart. You implement DRBD not because it works in a well-running situation, but because it covers you in the failure mode. And in failure mode it just unraveled, in an hour. It went into split-brain mode under the hardware failures that the USB drives were causing. A well-running DRBD setup is fully mirrored; split-brain mode is when the two sides simply give up and start acting autonomously because they don’t know what the other side is doing and they’re not sure who is boss. The data is essentially inconsistent at that point, because you can choose A or B, but the two sides are not in agreement.
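To make the split-brain failure mode concrete, here is a toy two-replica mirror in Python. It is a deliberate simplification and an assumption throughout: each side merely counts the writes it accepted while cut off from its peer, whereas DRBD itself tracks divergence with generation identifiers and offers configurable recovery policies. The `Replica`, `write`, and `reconnect` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Replica:
    """One side of a toy two-node mirror (not DRBD's real metadata)."""
    name: str
    data: dict = field(default_factory=dict)
    local_writes: int = 0  # writes accepted while cut off from the peer


def write(node: Replica, key: str, value: str, peer: Optional[Replica]) -> None:
    node.data[key] = value
    if peer is not None:
        peer.data[key] = value   # link up: mirror the write synchronously
    else:
        node.local_writes += 1   # link down: this side diverges on its own


def reconnect(a: Replica, b: Replica) -> str:
    """Decide how to re-join the mirror once the link comes back."""
    if a.local_writes and b.local_writes:
        # Both sides acted autonomously; neither copy is safely "the" data.
        return "split-brain"
    if a.local_writes or b.local_writes:
        src, dst = (a, b) if a.local_writes else (b, a)
        dst.data = dict(src.data)  # only one side changed: copy it over
        a.local_writes = b.local_writes = 0
        return f"resynced from {src.name}"
    return "in sync"
```

If only one tower kept writing during the outage, resyncing from it is safe. Once both towers have written independently, as when flaky USB drives keep knocking each side offline, there is no automatic way to pick a winner, which is the "choose A or B" problem described above.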
While the USB specs say you can connect something like 256 or 128 drives to a hub, we were never able to do more than like, five. After something like five or six, the drives just start dropping out. We never really figured it out because we abandoned the approach. I just took the drives out and shoved them inside of the Dells, and those two became pods number 0 and 1. The Dells had room for 10 or 8 drives apiece, and so we brought that system live.
That was what the first six years of this company were like: just a never-ending stream of those kinds of moments. Mostly not panic-inducing, mostly just: you put your head down and you start working through the problems. There’s a little bit of adrenaline, like the feeling of an impending moment before a big race. But you have to just keep going.
E: Wait, so this wasn’t in testing? You were running this live?
TN: Totally! We were in friends-and-family beta at the time. But the software was all written. We didn’t have a lot of customers, but we had launched, and we managed to recover the files: whatever was backed up. The system has always had self-healing built into the client.
E: So where do you go from there? What’s the next step?
TN: These were the early days. We were terrified of any commitments. So I think we had leased a half cabinet at the 365 Main facility in San Francisco, because that was the most we could imagine committing to in a contract: We committed to a year’s worth of this tiny little space.
We had those first two pods, the two Dell towers (0 and 1), which we eventually built out using external enclosures. So those guys had 40 or 45 drives by the end, with these little black boxes attached to them.
Pod number 2 was the plywood pod, which was another moment of sitting in the data center with a piece of hardware that just didn’t work out of the gate. This was Chris Robertson’s prototype. I credit him with the shape of the basic pod design, because he’s the one who came up with the top-loaded, 45-drive design. He mocked it up in his home woodshop (also known as a garage).
E: Wood in a data center? Come on, that’s crazy, right?
TN: It was what we had! We didn’t have a metal shop in our garage, we had a woodshop in our garage, so we built a prototype out of plywood, painted it white, and brought it to the data center. But when I went to deploy the system, I ended up having to recable and rewire and reconfigure it on the fly, sitting there on the floor of the data center, kinda similar to the first day.
The plywood pod was originally designed to hold 45 drives, top-loaded with port multipliers; we didn’t have backplanes. The port multipliers were these little cards that took one set of cables in and sent five cables out. They were cabled from the top. That design never worked. So what actually got launched was a 15-drive system with these little five-drive enclosures that we shoved into the face of the plywood pod. It came up as a 15-drive, traditional front-mounted design with no port multipliers. Nothing fancy there. Those boxes literally have five SATA connections on the back, just one-to-one cabling.
E: What happened to the plywood pod? Clearly it’s cast in bronze somewhere, right?
TN: That got thrown out in the trash in Palo Alto. I still defend the decision. We were in a small one-bedroom apartment in Palo Alto and all this was cruft.
E: Brutal! But I feel like this is indicative of how you were working. There was no looking back.
TN: We didn’t have time to ask the question of whether this was going to work. We just stayed ahead of the problems: Pods 0 and 1 continued to run, and pod 2 came up as a 15-drive chassis and ran.
The next three pods are the first where we worked with Protocase. These are the first run of metal — the ones where we forgot a hole for the power button, so you’ll see the pried open spots where we forced the button in. These are also the first three with the port-multiplier backplane. So we built a chassis around that, and we had horrible drive instability.
We were using Western Digital Green 1 TB drives. But we couldn’t keep them in the RAID. We wrote these little scripts so that in the middle of the night, every time a drive dropped out of the array, the script would put it back in. It was this constant motion and churn creating a very unstable system.
We suspected the problem was with power. So we made the octopus pod. We drilled holes in the bottom, and ran it off of three PSUs beneath it. We thought: “If we don’t have enough power, we’ll just hit it with a hammer.” Same thing on cooling: “What if it’s getting too hot?” So we put a box fan on top and blew a lot of air into it. We were just trying to figure out what it was that was causing trouble and grief. Interestingly, the array in the plywood pod was stable, but when we replaced the enclosure with steel, it became unstable as well!
We slowly circled in on vibration as the problem. That plywood pod had actual disk enclosures with caddies and good locking mechanisms, so we thought the lack of caddies and locking mechanisms could be the issue. I was working with Western Digital at the time, too, and they were telling me that they also suspected vibration as the culprit. And I kept telling them, ‘They are hard drives! They should work!’
At the time, Western Digital was pushing me to buy enterprise drives, and they finally just gave me a round of enterprise drives. They were worse than the consumer drives! So they came over to the office to pick up the drives, because they had accelerometers and a lot of other stuff to give us data on what was wrong, and we never heard from them again.
We learned later that when they showed up at an office in a one-bedroom apartment in Palo Alto with five guys and a dog, they decided that we weren’t serious. It was hard to get a call back from them after that … I’ll admit, I was probably very hard to deal with at the time. I was this ignorant wannabe hardware engineer on the phone yelling at them about their hard drives. In hindsight, they were right; the chassis needed work.
But I just didn’t believe that vibration was the problem. It’s just 45 drives in a chassis. I mean, I have a vibration app on my phone, and I stuck the phone on the chassis and there’s vibration, but it’s not like we’re trying to run this inside a race car doing multiple Gs around corners, it was a metal box on a desk with hard drives spinning at 5400 or 7200 rpm. This was not a seismic shake table!
The early hard drives were secured with EPDM rubber bands. It turns out that real rubber (latex) turns into powder in about two months in a chassis, probably from the heat. We discovered this very quickly after buying rubber bands at Staples that just completely disintegrated. We eventually got better bands, but they never really worked. The hope was that they would secure a hard drive so it couldn’t vibrate its neighbors, and yet we were still seeing drives dropping out.
At some point we started using clamp-down lids. We came to understand that we weren’t trying to isolate vibration between the drives; we were actually trying to mechanically hold the drives in place. It was less about vibration isolation, which is what I thought the rubber was going to do, and more about stabilizing the SATA connector on the back end, as in: You don’t want the drive moving around in the SATA connector. We were also getting early reports from Seagate at the time. They took our chassis and did vibration analysis, and over time we got better and better at stabilizing the drives.
We started to notice something else at this time: The Western Digital drives had these model numbers followed by extension numbers. We realized that drives that stayed in the array tended to have the same set of extensions. We began to suspect that those extensions were manufacturing codes, something to do with which backend factory they were built in. So there were subtle differences in manufacturing processes that dictated whether the drives were tolerant of vibration or not. Central Computer was our dominant source of hard drives at the time, and so we were very aggressively trying to get specific runs of hard drives. We only wanted drives with a certain extension. This was before the Thailand drive crisis, before we had a real sense of what the supply chain looked like. At that point we just knew some drives were better than others.
E: So you were iterating with inconsistent drives? Wasn’t that insanely frustrating?
TN: No, it just gave me a few more gray hairs. I didn’t really have time to dwell on it. We didn’t have a choice of whether or not to grow the storage pod. The only path was forward. There was no plan B. Our data was growing and we needed the pods to hold it. There was never a moment where everything was solved; it was a constant stream of working on whatever the problem was. It was just a string of problems to be solved, just “wheels on the bus.” If the wheels fall off, put them back on and keep driving.
E: So what did the next set of wheels look like then?
TN: We went ahead with a second small run of steel pods. These had a single Zippy power supply, with the boot drive hanging over the motherboard. This design worked until we went to 1.5TB drives and the chassis would not boot. Clearly a power issue, so Brian Wilson and I sat there and stared at the non-functioning chassis trying to figure out how to get more power in.
The issue with power was not that we were running out of power on the 12V rail. The 5V rail was the issue. All the high end, high-power PSUs give you more and more power on 12V because that’s what the gamers need — it’s what their CPUs and the graphics card need, so you can get a 1000W or a 1500W power supply and it gives you a ton of power on 12V, but still only 25 amps on 5V. As a result, it’s really hard to get more power on the 5V rail, and a hard drive takes 12V and 5V: 12V to spin the motor and 5V to power the circuit board. We were running out of the 5V.
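The 5 V bottleneck comes down to simple division. A minimal sketch in Python, using the 25 A figure from the interview and assuming the rail can be loaded right up to its rating with no headroom. The per-drive 5 V currents in the usage note (0.5 A and 0.7 A) are hypothetical values chosen for illustration, not measurements of the drives Backblaze used.

```python
import math


def drives_per_rail(rail_amps: float, amps_per_drive: float) -> int:
    """How many drives one supply's 5 V rail can carry at a given per-drive 5 V draw."""
    n = math.floor(rail_amps / amps_per_drive)
    if n == 0:
        raise ValueError("a single drive exceeds the rail's capacity")
    return n


def psus_needed(drives: int, rail_amps: float, amps_per_drive: float) -> int:
    """Power supplies needed if the 5 V rail, not total wattage, is the bottleneck."""
    return math.ceil(drives / drives_per_rail(rail_amps, amps_per_drive))


# 25 A on 5 V (from the interview) spread across 45 drives:
budget = 25 / 45  # about 0.56 A of 5 V current per drive
```

With a hypothetical 0.5 A per drive, one supply covers all 45 drives (`psus_needed(45, 25, 0.5)` is 1). At a hypothetical 0.7 A, it takes two (`psus_needed(45, 25, 0.7)` is 2), no matter how many watts the supply advertises on 12 V.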
So our solution was two power supplies, and Brian and I were sitting there trying to visualize where you could put another power supply. Where are you gonna put it? We could put it where the boot drive was, move the boot drive to the side, and just kind of hang the PSU up and over the motherboard. But the biggest consequence of this was, again, vibration. The side of a vibrating chassis isn’t the best place to mount a boot drive. So we had higher-than-normal boot drive failures in those nine.
So the next generation, after pod number 8, was the beginning of Storage Pod 1.0. We were still using rubber bands, but it had two power supplies, 45 drives, and we built 20 of them, total. Casey Jones, as our designer, also weighed in at this point to establish how they would look. He developed the faceplate design and doubled down on the deeper shade of red. But all of this was expensive and scary for us: We’re gonna spend $10 grand!? We don’t have much money. We had been two years without salary at this point.
We talked to Ken Raab from Sonic Manufacturing, and he convinced us that he could build our chassis, all in, for less than we were paying. He would take the task off my plate, I wouldn’t have to build the chassis, and he would build the whole thing for less than I would spend on parts … and it worked. He had better back-end supplier connections, so he could shave a little expense off of everything and still mark it up 20%.
We fixed the technology and the human processes. On the technology side, we were figuring out the hardware and hard drives, and we were getting more and more stable, which was required. We couldn’t have the same failure rates we were having on the first three pods. In order to reduce (or at least maintain) the total number of problems per day, you have to reduce the number of problems per chassis, because there are 32 of them now.
We were also learning how to adapt our procedures so that the humans could live. By “the Humans,” I mean me and Sean Harris, who joined me in 2010. There are physiological and psychological limits to what is sustainable, and we were nearing our wits’ end.… So, in addition to stabilizing the chassis design, we got better at limiting the type of issues that would wake us up in the middle of the night.
E: So you reached some semblance of stability in your prototype and in your business. You’d been sprinting with no pay for a few years to get to this point and then … you decide to give away all your work for free? You open sourced Storage Pod 1.0 on September 9th, 2009. Were you a nervous wreck that someone was going to run away with all your good work?
TN: Not at all. We were dying for press. We were ready to tell the world anything they would listen to. We had no shame. My only regret is that we didn’t do more. We open sourced our design before anyone was doing that, but we didn’t build a community around it or anything.
Remember, we didn’t want to be a manufacturer. We would have killed for someone to build our pods better and cheaper than we could. Our hope from the beginning was always that we would build our own platform until the major vendors did for the server market what they did in the personal computing market. Until Dell would sell me the box that I wanted at the price I could afford, I was going to continue to build my chassis. But I always assumed they would do it in less than a decade.
Supermicro tried to give us a complete chassis at one point, but their problem wasn’t high margin; they were targeting too high a level of performance. I needed two things: Someone to sell me a box and not make too much profit off of me, and I needed someone who would wrap hard drives in a minimum performance enclosure and not try to make it too redundant or high performance. Put in one RAID controller, not two; daisy chain all the drives; let us suffer a little! I don’t need any of the hardware that can support SSDs. But no matter how much we ask for barebones servers, no one’s been able to build them for us yet.
So we’ve continued to build our own. And the design has iterated and scaled with our business. So we’ll just keep iterating and scaling until someone can make something better than we can.
E: Which is exactly what we’ve done, leading from Storage Pod 1.0 to 2.0, 3.0, 4.0, 4.5, 5.0, to 6.0 (if you want to learn more about these generations, check out our Pod Museum), preparing the way for more than 800 petabytes of data in management.
✣ ✣ ✣
But while Tim is still waiting to pass along the official Podfather baton, he’s not alone. There was the early help from Brian Wilson, Casey Jones, Sean Harris, and a host of others, and then in 2014, Ariel Ellis came aboard to wrangle our supply chain. He grew in that role over time until he took over responsibility for charting the future of the Pod via Backblaze Labs, becoming the Podson, so to speak. Today, he’s sketching the future of Storage Pod 7.0, and, provided no one builds anything better in the meantime, he’ll tell you all about it on our blog.
This post is for all of the storage geeks out there who have followed the adventures of Backblaze and our Storage Pods over the years. The rest of you are welcome to come along for the ride.
It has been 10 years since Backblaze introduced our Storage Pod to the world. In September 2009, we announced our hulking, eye-catching, red 4U storage server equipped with 45 hard drives delivering 67 terabytes of storage for just $7,867 — that was about $0.11 a gigabyte. As part of that announcement, we open-sourced the design for what we dubbed Storage Pods, telling you and everyone like you how to build one, and many of you did.
Backblaze Storage Pod version 1 was announced on our blog with little fanfare. We thought it would be interesting to a handful of folks — readers like you. In fact, it wasn’t even called version 1, as no one had ever considered there would be a version 2, much less a version 3, 4, 4.5, 5, or 6. We were wrong. The Backblaze Storage Pod struck a chord with many IT and storage folks who were offended by having to pay a king’s ransom for a high density storage system. “I can build that for a tenth of the price,” you could almost hear them muttering to themselves. Mutter or not, we thought the same thing, and version 1 was born.
Tim, the “Podfather” as we know him, was the Backblaze lead in creating the first Storage Pod. He had design help from our friends at Protocase, who built the first three generations of Storage Pods for Backblaze and also spun out a company named 45 Drives to sell their own versions of the Storage Pod — that’s open source at its best. Before we decided on the version 1 design, there were a few experiments along the way:
The original Storage Pod was prototyped by building a wooden pod or two. We needed to test the software while the first metal pods were being constructed.
The Octopod was a quick and dirty response to receiving the wrong SATA cables — ones that were too long and glowed. Yes, there are holes drilled in the bottom of the pod.
The original faceplate shown above was used on about 10 pre-1.0 Storage Pods. It was updated to the three circle design just prior to Storage Pod 1.0.
Why are Storage Pods red? When we had the first ones built, the manufacturer had a batch of red paint left over that could be used on our pods, and it was free.
Back in 2007, when we started Backblaze, there weren’t a whole lot of affordable choices for storing large quantities of data. Our goal was to charge $5/month for unlimited data storage for one computer. We decided to build our own storage servers when it became apparent that, if we were to use the other solutions available, we’d have to charge a whole lot more money. Storage Pod 1.0 allowed us to store one petabyte of data for about $81,000. Today we’ve lowered that to about $35,000 with Storage Pod 6.0. When you take into account that the average amount of data per user has nearly tripled in that same time period and our price is now $6/month for unlimited storage, the math works out about the same today as it did in 2009.
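The "math works out about the same" claim can be checked with the figures quoted above. A minimal sketch in Python, assuming decimal units (1 PB = 1,000,000 GB), treating "nearly tripled" as exactly 3 (an upper bound), and considering only storage hardware cost, ignoring every other expense.

```python
GB_PER_PB = 1_000_000  # decimal units (an assumption)

# Figures quoted in the post.
cost_1_0 = 81_000 / GB_PER_PB   # Storage Pod 1.0: ~$81,000 per petabyte -> $/GB
cost_6_0 = 35_000 / GB_PER_PB   # Storage Pod 6.0: ~$35,000 per petabyte -> $/GB
data_growth = 3.0               # data per user "nearly tripled" (3.0 is an upper bound)
price_ratio = 6 / 5             # $5/month then, $6/month now

# Storage cost per user scales with (cost per GB) x (GB per user).
cost_per_user_ratio = (cost_6_0 / cost_1_0) * data_growth

# How storage's share of each subscription changed (1.0 would mean "identical").
share_ratio = cost_per_user_ratio / price_ratio

print(f"$/GB: {cost_1_0:.3f} -> {cost_6_0:.3f}")
print(f"storage cost per user x{cost_per_user_ratio:.2f}, "
      f"price x{price_ratio:.2f}, share x{share_ratio:.2f}")
```

Under these assumptions, storage cost per user grows by roughly 1.30x while the price grew by 1.20x, so storage's share of each subscription lands within about 8% of where it was in 2009. With data growth slightly under 3x, the gap shrinks further.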
We Must Have Done Something Right
The Backblaze Storage Pod was more than just affordable data storage. Version 1.0 introduced or popularized three fundamental changes to storage design: 1) You could build a system out of commodity parts and it would work, 2) You could mount hard drives vertically and they would still spin, and 3) You could use consumer hard drives in the system. It’s hard to determine which of these three features offended and/or excited more people. It is fair to say that ten years out, things worked out in our favor, as we currently have about 900 petabytes of storage in production on the platform.
Over the last 10 years, people have warmed up to our design, or at least elements of the design. Starting with 45 Drives, multitudes of companies have worked on and introduced various designs for high density storage systems ranging from 45 to 102 drives in a 4U chassis, so today the list of high-density storage systems that use vertically mounted drives is pretty impressive:
Exos AP 4U100
Thunder SX FA100-B7118
Viking Enterprise Solutions
Viking Enterprise Solutions
Viking Enterprise Solutions
Another driver in the development of some of these systems is the Open Compute Project (OCP). Formed in 2011, it gathers and shares ideas and designs for data storage, rack designs, and related technologies. The group is managed by the Open Compute Project Foundation, a 501(c)(6) organization, and counts many industry luminaries in the storage business as members.
What Have We Done Lately?
In technology land, 10 years of anything is a long time. What was exciting then is expected now. And the same thing has happened to our beloved Storage Pod. We have introduced updates and upgrades over the years twisting the usual dials: cost down, speed up, capacity up, vibration down, and so on. All good things. But, we can’t fool you, especially if you’ve read this far. You know that Storage Pod 6.0 was introduced in April 2016 and quite frankly it’s been crickets ever since as it relates to Storage Pods. Three-plus years of non-innovation. Why?
If it ain’t broke, don’t fix it. Storage Pod 6.0 is built in the US by Equus Compute Solutions, our contract manufacturer, and it works great. Production costs are well understood, performance is fine, and the new higher density drives perform quite well in the 6.0 chassis.
Disk migrations kept us busy. From Q2 2016 through Q2 2019 we migrated over 53,000 drives. We replaced 2, 3, and 4 terabyte drives with 8, 10, and 12 terabyte drives, doubling, tripling and sometimes quadrupling the storage density of a storage pod.
Lots of data kept us busy. In Q2 2016, we had 250 petabytes of data storage in production. Today, we have 900 petabytes. That’s a lot of data you folks gave us (thank you by the way) and a lot of new systems to deploy. The chart below shows the challenge our data center techs faced.
In other words, our data center folks were really, really busy, and not interested in shiny new things. Now that we’ve hired a bunch more DC techs, let’s talk about what’s next.
Storage Pod Version 7.0 — Almost
Yes, there is a Backblaze Storage Pod 7.0 on the drawing board. Here is a short list of some of the features we are looking at:
Updating the motherboard
Upgrading the CPU and considering an AMD CPU
Updating the power supply units, perhaps moving to one unit
Upgrading from 10GBASE-T to 10GbE SFP+ optical networking
Upgrading the SATA cards
Modifying the tool-less lid design
The timeframe is still being decided, but early 2020 is a good time to ask us about it.
“That’s nice,” you say out loud, but what you are really thinking is, “Is that it? Where’s the Backblaze in all this?” And that’s where you come in.
The Next Generation Backblaze Storage Pod
We are not out of ideas, but one of the things that we realized over the years is that many of you are really clever. From the moment we open sourced the Storage Pod design back in 2009, we’ve received countless interesting, well thought out, and occasionally odd ideas to improve the design. As we look to the future, we’d be stupid not to ask for your thoughts. Besides, you’ll tell us anyway on Reddit or Hacker News or wherever you’re reading this post, so let’s just cut to the chase.
Build or Buy
The two basic choices are: We design and build our own storage servers or we buy them from someone else. Here are some of the criteria as we think about this:
Cost: We’d like the cost of a storage server to be about $0.030 – $0.035 per gigabyte of storage (or less of course). That includes the server and the drives inside. For example, using off-the-shelf Seagate 12 TB drives (model: ST12000NM0007) in a 6.0 Storage Pod costs about $0.032-$0.034/gigabyte depending on the price of the drives on a given day.
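The cost criterion is easy to check for any candidate design. A minimal sketch in Python, assuming a 60-drive chassis like Storage Pod 6.0, decimal terabytes, and cost measured against raw capacity (before erasure-coding overhead). The $4,000 chassis and $320 drive prices below are hypothetical placeholders, not Backblaze's actual costs.

```python
def cost_per_gb(chassis: float, drive_price: float,
                drives: int = 60, tb_per_drive: float = 12) -> float:
    """All-in $/GB of raw capacity: chassis (without drives) plus all drives."""
    raw_gb = drives * tb_per_drive * 1000
    return (chassis + drives * drive_price) / raw_gb


def max_drive_price(target_per_gb: float, chassis: float,
                    drives: int = 60, tb_per_drive: float = 12) -> float:
    """Highest per-drive price that keeps the whole server at the target $/GB."""
    raw_gb = drives * tb_per_drive * 1000
    return (target_per_gb * raw_gb - chassis) / drives


# Hypothetical inputs: a $4,000 chassis and $320 12 TB drives.
print(f"{cost_per_gb(4000, 320):.4f} $/GB")
print(f"max drive price at $0.035/GB: ${max_drive_price(0.035, 4000):.2f}")
```

Rearranging the same formula into `max_drive_price` gives the ceiling on drive cost for a target $/GB. This is one quick way to compare a build-it-ourselves design against a vendor's quote for a complete unit.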
Maintenance: Things should be easy to fix or replace — especially the drives.
Commodity Parts: Wherever possible, the parts should be easy to purchase, ideally from multiple vendors.
Racks: We’d prefer to keep using 42” deep cabinets, but make a good case for something deeper and we’ll consider it.
Possible Today: No DNA drives or other wistful technologies. We need to store data today, not in the year 2061.
Scale: Nothing in the solution should limit the ability to scale the systems. For example, we should be able to upgrade drives to higher densities over the next 5-7 years.
Other than that, there are no limitations. Any of the following acronyms, words, and phrases could be part of your proposed solution and we won’t be offended: SAS, JBOD, IOPS, SSD, redundancy, compute node, 2U chassis, 3U chassis, horizontal mounted drives, direct wire, caching layers, appliance, edge storage units, PCIe, fibre channel, SDS, etc.
The solution does not have to be a Backblaze one. As the list from earlier in this post shows, Dell, HP, and many others make high density storage platforms we could leverage. Make a good case for any of those units, or any others you like, and we’ll take a look.
What Will We Do With All Your Input?
We’ve already started by cranking up Backblaze Labs again and have tried a few experiments. Over the coming months we’ll share with you what’s happening as we move this project forward. Maybe we’ll introduce Storage Pod X or perhaps take some of those Storage Pod knockoffs for a spin. Regardless, we’ll keep you posted. Thanks in advance for your ideas and thanks for all your support over the past ten years.
Prost! Skål! Cheers!
Celebrate with us as we travel to Amsterdam for IBC, the premier conference and expo for media and entertainment technology in Europe. The show gives us a chance to raise a glass with our partners, customers, and future customers across the pond. And we’re especially pleased that IBC coincides with the opening of our new European data center.
How will we celebrate? With the Backblaze Partner Crawl, a rolling series of parties on the show floor from 13-16 September. Four of our Europe-based integration partners have graciously invited us to co-host drinks and bites in their stands throughout the show.
If you can make the trip to IBC, you’re invited to toast us with a skål! with our Swedish friends at Cantemo on Friday, a prost! with our German friends at Archiware on Saturday, or a cheers! with UK-based friends at Ortana and GB Labs on Sunday or Monday, respectively. Or drop in every day and keep the Backblaze Partner Crawl rolling. And if you can’t make it to IBC this time, we encourage you to raise a glass and toast anyway.
Skål! on Friday With Cantemo
Cantemo’s iconik media management makes sharing and collaborating on media effortless, wherever you want to do business. Cantemo announced the integration of iconik with Backblaze’s B2 Cloud Storage last fall, and since then we’ve been amazed by customers like Everwell, who replaced all their on-premises storage with a fully cloud-based production workflow. For existing Backblaze customers, iconik can speed up your deployment by ingesting content already uploaded to B2 without having to download files and upload them again. You can also stop by the Cantemo booth anytime during IBC to see a live demo of iconik and Backblaze in action. Or schedule an appointment and we’ll have a special gift waiting for you.
Join us at Cantemo on Friday 13 September from 16:30-18:00 at Hall 7 — 7.D67
Prost! on Saturday With Archiware
With the latest release of their P5 Archive featuring B2 support, Archiware makes archiving to the cloud even easier. Archiware customers with large existing archives can use the Backblaze Fireball to rapidly import archived content directly to their B2 account. At IBC, we’re also unveiling our latest joint customer, Baron & Baron, a creative agency that turned to P5 and B2 to back up and archive their dazzling array of fashion and luxury brand content.
Join us at Archiware on Saturday 14 September from 16:30-18:00 at Hall 7 — 7.D35
Cheers! on Sunday With Ortana
Ortana integrated their Cubix media asset management and orchestration platform with B2 way back in 2016 during B2’s beta period, making them among our first media workflow partners. More recently, Ortana joined our Migrate or Die webinar and blog series, detailing strategies for how you can migrate archived content from legacy platforms before they go extinct.
Join us at Ortana on Sunday 15 September from 16:30-18:00 at Hall 7 — 7.C63
Cheers! on Monday With GB Labs
If you were at the NAB Show last April, you may have heard GB Labs was integrating their automation tools with B2. It’s official now, as detailed in their announcement in June. GB Labs’ automation allows you to streamline tasks that would otherwise require tedious and repetitive manual processes, and now supports moving files to and from your B2 account.
Join us at GB Labs on Monday 16 September from 17:00-18:00 at Hall 7 — 7.B26
Say Hello Anytime to Our Friends at CatDV
CatDV media asset management helps teams organize, communicate, and collaborate effectively, including archiving content to B2. CatDV has been integrated with B2 for over two years, allowing us to serve customers like UC Silicon Valley, who built an end-to-end collaborative workflow for a 22-member team creating online learning videos.
Stop by CatDV anytime at Hall 7 — 7.A51
But we’re not the only ones making a long trek to Amsterdam for IBC. While you’re roaming around Hall 7, be sure to stop by our other partners traveling from near and far to learn what our joint solutions can do for you:
EditShare (shared storage with MAM) Hall 7 — 7.A35
ProMax (shared storage with MAM) Hall 7 — 7.D55
StorageDNA (smart migration and storage) Hall 7 — 7.A32
FileCatalyst (large file transfer) Hall 7 — 7.D18
eMAM (web-based DAM) Hall 7 — 7.D27
Facilis Technology (shared storage) Hall 7 — 7.B48
GrayMeta (metadata extraction and insight) Hall 7 — 7.D25
Hedge (backup software) Hall 7 — 7.A56
axle ai (asset management) Hall 7 — 7.D33
Tiger Technology (tiered data management) Hall 7 — 7.B58
We’re hoping you’ll join us for one or more of our Partner Crawl parties. If you want a quieter place and time to discuss how B2 can streamline your workflow, please schedule an appointment with us so we can give you the attention you need.
Finally, if you can’t join us in Amsterdam, open a beer, pour a glass of wine or other drink, and toast to our new European data center, wherever you are, in whatever language you speak. As we say here in the States, Bottoms up!
Imagine a globe spinning (or simply look at the top of this blog post). When you start out on a data center search, you could consider almost any corner of the globe. For Backblaze, we knew we wanted to find an anchor location in the European Union. For a variety of reasons, we quickly narrowed in on Amsterdam, Brussels and Dublin as the most likely locations. While we were able to generate a list of 40 qualified locations, narrowed it down to ten for physical visits, and then narrowed it yet again to three finalists, the question remained: How would we choose our ultimate partner? Data center searches have changed a lot since 2012 when we circulated our RFP for a previous expansion.
The good news is we knew our top line requirements would be met. Thinking back to the 2×2 that our Chief Cloud Officer, Tim Nufire, had drawn on the board at the early stages of our search, we felt good that we had weighed the tradeoffs appropriately.
As with hiring an employee, after the screening and the interviews, one runs reference checks. In the case of data centers, that means both validating certain assertions and going into the gory details on certain operational capabilities. For example, in our second post in the EU DC series, we mentioned environmental risks. If one is looking to reduce the probability of catastrophe, making sure that your DC is outside of a flood zone is generally advisable. Of course, the best environmental risk factor reports are much more nuanced and account for changes in the environment.
To help us investigate those sorts of issues, we partnered with PTS Consulting. By engaging with third party experts, we get dispassionate, unbiased, thorough reporting about the locations we are considering. Based on PTS’s reporting, we eliminated one of our finalists. To be clear, there was nothing inherently wrong with the finalist, but it was unlikely that particular location would sustainably meet our long term requirements without significant infrastructure upgrades on their end.
In our prior posts, we mentioned another partner, UpStack. Their platform helped us with the sourcing and narrowing down to a list of finalists. Importantly, their advisory services were crucial in this final stage of diligence. Specifically, UpStack brought in electrical engineering expertise to give us a deep, detailed assessment of the electrical mechanical single line diagrams. For those less versed in the aspects of DC power, that means UpStack was able to go into incredible granularity in looking at the reliability and durability of the power sources of our DCs.
Ultimately, it came down to two finalists:
DC 3: Interxion Amsterdam
DC 4: The pre-trip favorite
DC four had a lot of things going for it. The pricing was the most affordable and the facility had more modern features and functionality. The biggest downsides were open issues around sourcing and training what would become our remote hands team.
Which gets us back to our matrix of tradeoffs. While more expensive than DC four, the Interxion facility graded out equally well during diligence. Ultimately, the people at Interxion and our confidence in the ability to build out a sturdy remote hands team made the choice of Interxion clear.
Looking back at Tim’s 2×2, DC four presented as financially more affordable, but operationally a little more risky (since we had questions about our ability to effectively operate on a day to day basis).
Interxion, while a little more financially expensive, reduced our operational risks. When thinking of our anchor location in Europe, that felt like the right tradeoff to be making.
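The 2×2 reasoning can be sketched as a small ranking: place each finalist by cost and operational risk, then prefer lower operational risk before lower cost, which is the tradeoff described above for an anchor location. The `Candidate` type and the strict risk-before-cost ordering are illustrative assumptions; the actual decision weighed far more than two coarse levels.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    LOW = "low"
    HIGH = "high"


@dataclass(frozen=True)
class Candidate:
    name: str
    cost: Level
    operational_risk: Level


def quadrant(c: Candidate) -> str:
    """Place a candidate in the cost x operational-risk 2x2."""
    return f"{c.cost.value} cost / {c.operational_risk.value} risk"


def pick_anchor(candidates: list[Candidate]) -> Candidate:
    """Prefer lower operational risk first, then lower cost.

    ASSUMPTION: a strict ordering that ranks risk ahead of cost, mirroring
    the tradeoff described for an anchor location.
    """
    rank = {Level.LOW: 0, Level.HIGH: 1}
    return min(candidates, key=lambda c: (rank[c.operational_risk], rank[c.cost]))


finalists = [
    Candidate("DC 3: Interxion Amsterdam", cost=Level.HIGH, operational_risk=Level.LOW),
    Candidate("DC 4: The pre-trip favorite", cost=Level.LOW, operational_risk=Level.HIGH),
]
for c in finalists:
    print(f"{c.name}: {quadrant(c)}")
print("Anchor:", pick_anchor(finalists).name)  # Anchor: DC 3: Interxion Amsterdam
```

Swapping the tuple order in the sort key would encode the opposite preference and select the cheaper site instead.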
Ready, Set, More Work!
The site selection only represented part of the journey. In parallel, our sourcing team has had to learn how to get pods and drives into Europe. Our Tech Ops & Engineering teams have worked through any number of issues around latency, performance, and functionality. Finance & Legal has worked through the implications of having a physical international footprint. And that’s just to name a few things.
If you’re in the EU, we’ll be at IBC 2019 in Amsterdam from September 13 to September 17. If you’re interested in making an appointment to chat further, use our form to reserve a time at IBC, or drop by stand 7.D67 at IBC (our friends from Cantemo are hosting us). Or, if you prefer, feel free to leave any questions in the comments below!
Ten locations, three countries, three days. Even the hardest working person in show business wouldn’t take on challenges like that. But for our COO, John Tran, and UpStack’s CEO, Chris Trapp, that’s exactly what they decided to do.
In yesterday’s post, we discussed the path to getting 40 bids from vendors that could meet our criteria for our new European data center (DC). This was a remarkable accomplishment in itself, but still only part way to our objective of actually opening a DC. We needed to narrow down the list.
With help from UpStack, we began to filter the list based on some qualitative characteristics: vendor reputation, vendor business focus, etc. Chris managed to get us down to a list of 10. The wonders of technology today, like the UpStack platform, help people get more information and cast wider nets than at any other time in human history. The downside of that is you get a lot of information on paper, but that is a poor substitute for what you can gather in person. If you’re looking for a good, long term partner then understanding things like how they operate and their company DNA is imperative to finding the right match. So, to find our newest partner, we needed to go for a trip.
Chris took the lead on booking appointments. The majority of the shortlist clustered in the Netherlands and Ireland. The others were in Belgium, and with the magic of Google Maps, one could begin to envision an efficient trip to all three countries. The feeling was it could all be done with just three days on the ground in Europe. Going in, they knew it would be a compressed schedule and that they would be on the move. As experienced travelers, they brought small bags that easily fit in the overhead and the right power adapters.
Hitting the Road
On July 23rd, 2018, John left San Francisco International Airport (SFO) at 7:40 a.m. on a non-stop to Amsterdam. Taking into account the 5,448 miles between the two cities and the time change, John landed at Amsterdam Airport Schiphol (AMS) at 7:35 a.m. on July 24th. He would land back home on July 27th at 6:45 p.m.
Tuesday (Day One)
The first day officially started when John’s redeye touched down in Amsterdam at 7:35 a.m. local. Thankfully, Chris’ flight from New York’s La Guardia was also on time. With both flights on time, they were able to meet at the airport: literally, for they had never met before.
Both adjourned to the airport men’s room to change out of their travel clothes and into their suits — choosing a data center is serious business, after all. While airport bathroom changes are best left for spy novels, John and Chris made short work of it and headed to the rental car area.
That day, they ended up touring four DCs. One of the biggest takeaways of the trip was that it turned out visiting data centers is similar to wine tasting. While some of the differences can be divined from the specs on paper, when trying to figure out the difference between A and B, it’s very helpful to compare side by side. Also similar to wine tasting, there’s a fine line between understanding nuances between multiple things and it all starting to blend together. In both cases, after a full day of doing it, you feel like you probably shouldn’t operate heavy machinery.
On day one, our team saw a wide range of options. The physical plant is itself one area of differentiation. While we have requirements for things like power, bandwidth, and security, there’s still a lot of room for tradeoffs among those DCs that exceed the requirement. And that’s just the physical space. The first phase of successful screening (discussed in our prior post) is being effective at examining non-emotional decision variables — specs, price, reputation — but not the people. Every DC is staffed by human beings and cultural fit is important with any partnership. Throughout the day, one of the biggest differences we noticed was the culture of each specific DC.
The third stop of the day was Interxion Amsterdam. While we didn’t know it at the time, they would end up being our partner of choice. On paper, it was clear that Interxion would be a contender. Its impressive facility met all our requirements and, by happenstance, had a footprint available that matched almost exactly what we were looking for. During our visit, the facility was impressive, as expected. But the connection we felt with the team there would prove to be the thing that would ultimately be the difference.
After leaving the last DC tour around 7pm, our team drove from Amsterdam to Brussels. Day 2 would be another morning start and, after arriving in Brussels a little after 9pm, they had earned some rest!
Insider Tip: Earlier in his career, John had spent a good amount of time in Europe and, specifically, Brussels. One of his favorite spots is the Grand Place (Brussels’ Central Market). If in the neighborhood, he recommends you go and enjoy a Belgian beer sitting at one of the restaurants in the market. The smart move is to take the advice. Chris, newer to Brussels, gave John’s tour a favorable TripAdvisor rating.
Wednesday (Day Two)
After getting a well-deserved couple hours of sleep, the day officially started with an 8:30 a.m. meeting for the first DC of the day. Major DC operators generally have multiple locations and DCs five and six are operated by companies that also operate sites visited on day one. It was remarkable, culturally, to compare the teams and operational variability across multiple locations. Even within the same company, teams at different locations have unique personalities and operating styles, which all serves to reinforce the need to physically visit your proposed partners before making a decision.
After two morning DC visits, John and Chris hustled to the Brussels airport to catch their flight to Dublin. At some point during the drive, it was realized that tickets to Dublin hadn’t actually been purchased. Smartphones and connectivity are transformative on road trips like this.
The flight itself was uneventful. When they landed, they got to the rental car area and their car was waiting for them. Oh, by the way, minor detail but the steering wheel was on the wrong side of the car! Chris buckled in tightly and John had flashbacks of driver’s ed, having never driven on the right side of the car. Shortly after leaving the airport, it was realized that one also drives on the left side of the road in Ireland. Smartphones and connectivity were not required for this discovery. Thankfully, the drive was uneventful and the hotel was reached without incident. After work and family check-ins, another day was put on the books.
Our team checked into their hotel and headed over to the Brazen Head for dinner. Ireland’s oldest pub is worth the visit. It was here that we came across our “it really is a small world” nomination for the trip. After starting a conversation with their neighbors at dinner, our team was asked what they were doing in Dublin. John introduced himself as Backblaze’s COO and the conversation seemed to cool a bit. Their neighbor, it turned out, was someone from another large cloud storage provider. Apparently, not all companies like sharing information as much as we do.
Thursday (Day Three)
The day again started with an 8:30 a.m. hotel departure. Bear in mind, during all of this, John and Chris both had their day jobs and families back home to stay in touch with. Today would feature four DC tours. One interesting note about the trip: operating a data center requires a fair amount of infrastructure. In a perfect world, power and bandwidth come in at multiple locations from multiple vendors. This often causes DCs to cluster around infrastructure hubs. Today’s first two DCs were across the street from one another. We’re assuming, but could not verify, a fierce inter-company football rivalry.
While walking across the street was interesting, in the case of the final two DCs, they literally shared the same space; the smaller provider subleasing space from the larger. Here, again, the operating personalities differentiated the companies. It’s not necessarily that one was worse than the other, it is a question of whom you think will be a better partnership match for your own style. In this case, the smaller of the two providers stood out because of the passion and enthusiasm we felt from the team there, and it didn’t hurt that they are long time Hard Drive Stats enthusiasts (flattery will get you everywhere!).
While the trip, and this post, were focused on finding our new DC location, opening up our first physical operations outside of the U.S. had any number of business ramifications. As such, John made sure to swing by the local office of our global accounting firm to take the opportunity to get to know them.
The meeting wrapped up just in time for Chris and John to make it to the Guinness factory by 6:15 p.m. Upon arrival, it was then realized that the last entry into the Guinness factory is 6 p.m. Smartphones and connectivity really can be transformative on road trips like this. All that said, without implicating any of the specific actors, our fearless travelers managed to finagle their way in and could file the report home that they were able to grab a pint or two at St. James’s Gate.
The team would leave for their respective homes early the next morning. John made it back to California in time for a (late) dinner with his family and a well earned weekend.
After a long, productive trip, we had our list of the three finalists. Tomorrow, we’ll discuss how we narrowed it down from three to one. Until then, sláinte (cheers)!