All posts by Patrick Thomas

A Guide to Clouds: Object, File, and Block

Post Syndicated from Patrick Thomas original

Types of Cloud Storage

“What is Cloud Storage?” is a series of posts for business leaders and entrepreneurs interested in using the cloud to scale their business without sinking millions of capital into infrastructure. Despite being relatively simple, information about “the Cloud” is overrun with frustratingly unclear jargon. These guides aim to cut through the hype and give you the information you need to convince stakeholders that scaling your business in the cloud is an essential next step. We hope you find them useful, and will let us know what additional insight you might need. –The Editors

What is Cloud Storage?

The term cloud storage is used in popular media as though everyone knows exactly what it means. But ask anyone to list and define the different types of cloud storage, and you’re likely to get some blank looks. This, despite the fact that understanding the different varieties of cloud storage is essential to choosing the solution that is right for your business. With that in mind, here is a quick and easy-to-use field guide to the three basic types of cloud storage in use today: object, file, and block storage.

Your business has certain needs. Maybe you need to share content with a number of contributors, producers, or editors based around the world. Or possibly you have a complex and huge database of sales metrics you need to process or manipulate that is stressing your on-site capabilities. Or you might simply have data you need to archive. Regardless, while people are quick to recommend the “cloud” for any business scenario involving data, you need to know which cloud is right for your scenario. Read on to learn more.

The Three Types of Cloud Storage

Object Storage


In cloud storage, the definition of an ‘object’ is pretty simple: an object is some assemblage of data paired with one unique identifier and a theoretically unlimited amount of metadata.

Maybe that doesn’t sound so simple after all. Let’s break it down into its components to try to make it clearer.

The Data

This data that makes up an object could be anything—an advertising jingle’s audio file, the photo album from your company party, a 300-page software manual, or simply a related grouping of bits and bytes.

The Identifier

When that data is added to object storage, it typically receives an identifier referred to as a Universally Unique Identifier (UUID) or a Globally Unique Identifier (GUID). These identifiers are 128-bit integers. In layman’s terms, this means that the identifier, the “name” of the object, is an enormously large number. The space of possible 128-bit values is so vast (2¹²⁸, or about 3.4 × 10³⁸) that a randomly generated identifier can safely be considered unique.
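To make the scale concrete, here is a short sketch using Python’s standard uuid module, which generates exactly these 128-bit identifiers:

```python
import uuid

# Generate a random (version 4) UUID: one of the 128-bit
# identifiers described above.
obj_id = uuid.uuid4()

print(obj_id)                   # e.g. 1b4e28ba-2fa1-41d2-883f-0016d3cca427
print(obj_id.int.bit_length())  # at most 128

# The identifier space holds 2**128 possible values, which is why a
# randomly generated identifier can safely be treated as unique.
print(2 ** 128)  # 340282366920938463463374607431768211456
```

Collisions are possible in principle, but with an identifier space this large they are vanishingly unlikely in practice.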

The Metadata

The third and final component of an object is its metadata—literally “the data about the data”—which can be any information that is used to classify or characterize the data in a particular object. This metadata could be the jingle’s name, a collection of the geographical coordinates where a set of digital pictures were taken, or the name of the author who wrote the user manual.
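Putting the three components together, an object might be sketched in Python like this (the layout and field names are hypothetical, not any particular vendor’s format):

```python
import uuid

# A hypothetical object: data, a 128-bit identifier, and metadata.
jingle = {
    "id": str(uuid.uuid4()),       # the unique identifier
    "data": b"...audio bytes...",  # the data itself
    "metadata": {                  # the data about the data
        "name": "Spring Campaign Jingle",
        "content_type": "audio/mpeg",
        "author": "Marketing Team",
    },
}

# Metadata can be extended at any time without touching the data:
jingle["metadata"]["campaign_year"] = 2020
print(sorted(jingle["metadata"]))
```

The key point is that the metadata is open-ended: new labels can be attached long after the object was written.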

The Advantages of Object Storage

The primary advantage of object storage, and the reason it’s used by the majority of cloud storage providers, is that it enables the storage of massive amounts of unstructured data while maintaining easy data accessibility. This scale is achieved thanks to its flat structure: by using GUIDs instead of the hierarchies characteristic of file storage or block storage, object storage allows for near-infinite scalability. In other words, by doing away with structure, there’s more room for data.

The higher level of accessibility is largely thanks to the metadata, which is infinitely customizable. Think of the metadata as a set of labels for your data. Because this metadata can be refined and rewritten and expanded infinitely, the objects in object storage can easily be reorganized and scaled, based on different metadata criteria.

This last point is what makes object storage so popular for backup and archiving functions. Metadata’s unrestricted nature allows storage administrators to easily implement their own policies for data preservation, retention, and deletion, and makes it easier to protect data and build “disaster recovery” strategies with durability techniques like Reed-Solomon erasure coding and vault architectures.
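To give a flavor of how erasure coding protects data, here is a toy single-parity sketch in Python. Real Reed-Solomon coding is far more general (it can survive multiple simultaneous losses); this XOR example survives only one lost shard, but the rebuild-from-survivors idea is the same:

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Two data shards plus one parity shard:
shard_a = b"hello wo"
shard_b = b"rld data"
parity = xor_bytes(shard_a, shard_b)

# If shard_b is lost, XOR-ing the survivors rebuilds it:
recovered = xor_bytes(parity, shard_a)
assert recovered == shard_b
print(recovered)  # b'rld data'
```

A production system spreads the shards across many drives or machines, so losing any one of them costs nothing.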

The Primary Uses of Object Storage

The main use cases for object storage include:

  • Storage of unstructured data like multimedia files
  • Storage of large data sets
  • Storage of large quantities of media assets like video footage as an archive in place of local tape drives

The prime use cases for object storage generally include storing large amounts of data that businesses need to access only periodically. For instance, if your business does a lot of production work in any medium, you probably need a lot of space to store your finished projects after their useful life is complete, but you probably also need access to the files in case you or a client need them again in the future.

This sort of accessible archive is perfect for object storage because the data doesn’t need to be highly structured. For example, KLRU, the Austin Public Television station responsible for broadcasting the famous musical showcase “Austin City Limits,” recently opted to migrate their 40+ year archive of footage into cloud storage. Object storage provided a cheap, but reliable, archive for all of their work. And their ability to organize the content with metadata meant they could easily distribute it to their network of licensees (or anyone else interested in using the content).

The scalability and flexibility of object storage has made it the go-to choice for many businesses transitioning to cloud solutions. That said, the relative complexity of the naming schema for the objects—that 128-bit identifier isn’t exactly user-friendly for most of us—and the metadata management approach can prove too complex or ill suited for certain use cases. For media production companies and agencies, this often leads to the use of third party software (including Media Asset Managers (MAM) and Digital Asset Managers (DAM)) that layers an organizational schema over the top of the object store. When this doesn’t work, many turn to file storage, which we’ll discuss next.

File Storage


For administrators in need of a friendlier user interface but smaller storage requirements—think millions of files, instead of billions—file storage might be the answer.

So what is file storage? In file storage, the data is stored in files. These files are, in turn, organized in folders, and these folders are then arranged into directories and subdirectories in a hierarchical fashion. To access a file, users or machines only need the path from directory to subdirectory to folder to file.
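In Python, for example, the standard pathlib module models exactly this directory-to-file path (the path below is made up for illustration):

```python
from pathlib import PurePosixPath

# The full path walks the hierarchy: directory -> subdirectory -> folder -> file.
path = PurePosixPath("/projects/2020/q1/ad-campaign/jingle.mp3")

print(path.parts)   # ('/', 'projects', '2020', 'q1', 'ad-campaign', 'jingle.mp3')
print(path.name)    # jingle.mp3
print(path.parent)  # /projects/2020/q1/ad-campaign
```

Every file is reachable by one such path, which is exactly the hierarchy described above.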

Because all the data stored in such a system is already organized in a hierarchical directory tree, like the files on your hard drive, it’s easy to name, delete, or otherwise manipulate files without any additional interface. If you have used practically any operating system (whether Windows, macOS, or anything else), then you’re likely already familiar with these kinds of file and folder trees and are more than capable of working within them.

The Advantages of File Storage

The approachability of file storage is often seen as its primary advantage. But, using file storage in the cloud adds one key element: sharing. In cloud file storage, like on an individual computer, an administrator can easily set access as well as editing permissions across files and trees such that security and version control are far easier to manage. This allows for easy access sharing and thereby easy collaboration.

The disadvantage of file storage systems, however, is that if you plan for your data to grow, there is a certain point at which the hierarchy and permissions will become complex enough to slow the system significantly.

The Use Cases for File Storage

Common use cases for file storage are:

  • Storage of files for an office or directory in a content repository
  • Storage of files in a small development or data center environment, as a cost-effective option for local archiving
  • Storage of data that requires data protection and easy deployment

Generally speaking, discrete amounts of structured data work well in file storage systems. If this describes your organization’s data profile, and you need robust sharing, cloud file storage could be right for you. Specific examples include businesses that run web-based applications. In this instance, a file storage system can give multiple users who need to manipulate files at the same time the access they need, while also clearly delineating who can make changes. Another example is data analytics operations, which often require multiple servers to modify multiple files at the same time, making file storage systems a good solution for that use case as well.

Now that you have a better idea of the differences between object and file storage, let’s take a look at block storage and its special use cases.

Block Storage


A lot of cloud-based enterprise workloads currently use block storage. In this type of system, data is broken up into pieces called blocks, and then stored across a system that can be physically distributed to maximize efficiency. Each block receives a unique identifier, which allows the storage system to put the blocks back together when the data they contain is needed.
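The split-and-reassemble cycle can be sketched in a few lines of Python (with a deliberately tiny block size for readability; real systems use sizes like 4 KiB or larger):

```python
import random

BLOCK_SIZE = 4  # deliberately tiny; real block sizes are much larger

data = b"The quick brown fox jumps over the lazy dog"

# Break the data into blocks, each tagged with an identifier (its index).
blocks = {i: data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)}

# The blocks can be stored in any order, on any device...
scattered = list(blocks.items())
random.shuffle(scattered)

# ...because the identifiers let the system put them back together.
reassembled = b"".join(chunk for _, chunk in sorted(scattered))
assert reassembled == data
```

Because each block stands alone, the system is free to place blocks wherever retrieval will be fastest.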

The Advantages of Block Storage

A block storage system in the cloud is used in scenarios where it’s important to be able to quickly retrieve and manipulate data, with an operating system accessing the blocks directly across block volumes.

Block storage also decouples data from user environments, allowing that data to be spread across multiple environments. This creates multiple paths to the data and allows the user to retrieve it quickly. When a user or application requests data from a block storage system, the underlying storage system reassembles the data blocks and presents the data to the user or application.

The primary disadvantages of block storage are its lack of metadata, which limits organizational flexibility, and its higher price and complexity—as compared to the other solutions we’ve discussed.

The Use Cases for Block Storage

Primary use cases for block storage are:

  • Storage of databases
  • Storage for RAID volumes
  • Storage of data for critical systems that impact business operations
  • Storage of data used as file systems by operating systems and virtualization software

The relatively fast, reliable performance of block storage systems makes them the preferred technology for databases. For the same reason block storage works well for databases, it also provides good support for enterprise applications: for transaction-based business applications, block storage ensures users are serviced quickly and reliably. Virtual machine file systems like VMware’s VMFS also tend to use block storage because of the way data is distributed across multiple volumes.

Making a Choice Between Different Types of Cloud Storage

So which cloud storage system is right for you? Block or file storage could be useful if you’re dealing with a lot of data that members of a team have to change frequently. You might find that block storage works best for you if you need to store an organized collection of data that you can access quickly. File storage has the advantage that the data is easy to manipulate directly without a custom-built interface. But if you need highly scalable storage units for relatively unstructured data, that is where object storage shines. Whatever path you decide, now you have a sense of the use cases, advantages, and disadvantages of different storage types to weigh your next step into the cloud storage ecosystem.

The post A Guide to Clouds: Object, File, and Block appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Defining an Exabyte

Post Syndicated from Patrick Thomas original


What is an Exabyte?

An exabyte is made up of bytes, which themselves are units of digital storage. A byte is made up of 8 bits. A bit (short for “binary digit”) is a single unit of data: namely, a 1 or a 0.

The International System of Units (SI) denotes “exa” as multiplication by the sixth power of 1,000, or 10¹⁸.

In other words, 1 exabyte (EB) = 10¹⁸ bytes = 1,000⁶ bytes = 1,000,000,000,000,000,000 bytes = 1,000 petabytes = 1 million terabytes = 1 billion gigabytes. Overwhelmed by numbers yet?
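The equivalences above can be checked directly in Python:

```python
EB = 10 ** 18  # one exabyte, in bytes

assert EB == 1000 ** 6                # the sixth power of 1,000
assert EB == 1_000 * 10 ** 15         # 1,000 petabytes
assert EB == 1_000_000 * 10 ** 12     # 1 million terabytes
assert EB == 1_000_000_000 * 10 ** 9  # 1 billion gigabytes

print(f"{EB:,}")  # 1,000,000,000,000,000,000
```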

Why don’t we give you some examples of what these numbers actually look like? We created this infographic to help put it in perspective.

How Big is an Exabyte?


Interested in learning more about how we got here? Check out the recent profile of Backblaze in Inc. magazine, free to our blog readers.

The Road to an Exabyte of Cloud Storage

So now that you know what an exabyte looks like, let’s look at how Backblaze got there.

Way back in 2010, we had 10 petabytes of customer data under management. It was a big deal for us: it took two years to accomplish, and, more importantly, it was a sign that thousands of customers trusted us with their data.

It meant a lot! But when we decided to tell the world about it, we had a hard time quantifying just how big 10 petabytes was, so naturally we made an infographic.

10 Petabytes Visualized

That’s a lot of hard drives. A Burj Khalifa of drives, in fact.

In what felt like the blink of an eye, it was two years later, and we had 75 petabytes of data. The Burj was out. And, because it was 2013, we quantified that amount of data like this…

At 3MB per song, Backblaze would store 25 billion songs.

Pop songs now average around 3:30 in length, which means if you tried to listen to this imaginary musical archive, it would take you about 167,000 years. And sadly, the total number of recorded songs is only in the tens to hundreds of millions, so you’d have some repeats.
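The back-of-the-envelope arithmetic works out like this:

```python
songs = 25_000_000_000       # 25 billion songs at ~3 MB each
seconds_per_song = 3.5 * 60  # an average length of about 3:30
seconds_per_year = 365.25 * 24 * 60 * 60

years = songs * seconds_per_song / seconds_per_year
print(f"{years:,.0f} years")  # roughly 166,000 years of listening
```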

That’s a lot of songs! But more importantly, our data under management had grown 7.5 times over! We could barely take time to enjoy it, because five months later we hit 100 petabytes, and we had to call it out. Stacking up to the Burj Khalifa was in the past! Now, we rivaled Mt. Shasta…

Stacked on end they would be 9,941 feet, about the same height as Mt. Shasta from the base.

But stacking drives was rapidly becoming less effective as a measurement. Simply put, the comparison was no longer apples to apples: the 3,000 drives we stacked up in 2010 each held only one terabyte of data. If you were to take those same 3,000 drives and use the average drive size we had in 2013, about 4 terabytes per drive, the size of the stack would stay the same, because hard drives had not physically grown, but the density of the storage inside them had grown by 400%.
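In brief, the density math (using the round numbers from the text):

```python
drives = 3_000
capacity_2010_tb = drives * 1  # at 1 TB per drive in 2010
capacity_2013_tb = drives * 4  # at ~4 TB per drive in 2013

# Same number of drives, same stack height, four times the data:
print(capacity_2010_tb, "TB vs", capacity_2013_tb, "TB")
print(f"{capacity_2013_tb / capacity_2010_tb:.0%}")  # 400%
```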

Regardless, the years went by, we launched an award-winning cloud storage service (Backblaze B2), and the incoming petabytes kept accelerating: 150 petabytes in early 2015, 200 before we reached 2016. Around then, we decided we needed to wait for the next big moment, and in February 2018, we hit 500 petabytes.

It took us two years to store 10 petabytes of data.

Over the next 7 years, by 2018, we stored another 500 petabytes.

And today, we reset the clock, because in the last two years, we’ve added another 500 petabytes. Which means we’re turning the clock back to 1…

1 exabyte.

Today, across 125,000 hard drives, Backblaze is managing an exabyte of customer data.

And what does that mean? Well, you should ask Ahin.

The post Defining an Exabyte appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

The Geography of Big Data Maintenance: Data Warehouses, Data Lakes, and Data Swamps

Post Syndicated from Patrick Thomas original

Big Data illustration

“What is Cloud Storage?” is a series of posts for business leaders and entrepreneurs interested in using the cloud to scale their business without sinking millions of capital into infrastructure. Despite being relatively simple, information about “the Cloud” is overrun with frustratingly unclear jargon. These guides aim to cut through the hype and give you the information you need to convince stakeholders that scaling your business in the cloud is an essential next step. We hope you find them useful, and will let us know what additional insight you might need. –The Editors


“Big Data” is a phrase people love to throw around in advertising and planning documents, despite the fact that the term itself is rarely defined the same way by any two businesses, even among industry leaders. However, everyone can agree about its rapidly growing importance—understanding Big Data and how to leverage it for the greatest value will be of critical organizational concern for the foreseeable future.

So then what does Big Data really mean? Who is it for? Where does it come from? Where is it stored? What makes it so big, anyway? Let’s bring Big Data down to size.

What is Big Data?

First things first, for purposes of this discussion, “Big” means any amount of data that exceeds the storage capacity of a single organization. “Data” refers to information stored or processed on a computer. Collectively, then, “Big Data” is a massive volume of structured data, unstructured data, or both, that is too large to effectively process using traditional relational database management systems or applications. In more general terms, when your infrastructure is too small to handle the data your business is generating—either because the volume of data is too large, it moves too fast, or it simply exceeds the current processing capacity of your systems—you’ve entered the realm of Big Data.

Let’s take a look at the defining characteristics.

Characteristics of Big Data

Current definitions of Big Data often reference a “triple (or in some cases quadruple) V” construct for detailing its characteristics. The “V”s stand for velocity, volume, variety, and variability. We’ll define them for you here:


Velocity

Velocity refers to the speed at which data is generated: the pace at which data flows in from sources like business processes, application logs, networks, social media sites, sensors, and mobile devices. This speed determines how rapidly data must be processed to meet business demands, which in turn determines the data’s real potential.


Volume

The term Big Data itself obviously references significant volume. But beyond just being “big,” the relative size of a data set is a fundamental factor in determining its value. The volume of data an organization stores determines the scalability, accessibility, and ease (or difficulty) of managing it. A few examples of high volume data sets are all of the credit card transactions in the United States on a given day; the entire collection of medical records in Europe; and every video uploaded to YouTube in an hour. A small to moderate volume might be the total number of credit card transactions in your business.


Variety

Variety refers to how many disparate data sources contribute to an organization’s Big Data, along with the intrinsic nature of the data coming from each source. This covers both structured and unstructured data. Years ago, spreadsheets and databases were the primary sources of data handled by the majority of applications. Today, data is generated in a multitude of formats, such as email, photos, videos, monitoring devices, PDFs, and audio, all of which demand different considerations in analysis applications. This variety of formats can create issues for storing, mining, and analyzing data.


Variability

Variability concerns any inconsistencies in the data coming from a single source. Where variety considers different inputs from different sources, variability considers different inputs from one data source. These differences can complicate the effective management of the data store. Variability may also refer to differences in the speed of the data flow into your storage systems: where velocity refers to the speed of all of your data, variability refers to how different data sets might move at different speeds. Variability can be a concern when the data itself has inconsistencies even though the architecture remains constant.

An example from the health sector would be the variances within influenza epidemics (when and where they happen, how they’re reported in different health systems) and vaccinations (where they are/aren’t available) from year to year.

Understanding the makeup of Big Data in terms of velocity, volume, variety, and variability is key when strategizing Big Data solutions. This fundamental terminology will help you communicate effectively with everyone involved in decision making when you bring Big Data solutions to your team or your wider business. Whether you’re pitching solutions, engaging consultants or vendors, or hearing out the proposals of the IT group, a shared vocabulary is crucial.


What is Big Data Used For?

Businesses use Big Data to try to predict future customer behavior based on past patterns and trends. Effective predictive analytics are the metaphorical crystal ball organizations seek: insight into what their customers want and when they want it. Theoretically, the more data collected, the more patterns and trends the business can identify. This information can potentially make all the difference for a successful strategy in customer acquisition and retention, and create loyal advocates for a business.

In this case, bigger is definitely better! But, the method an organization chooses to address its Big Data needs will be a pivotal marker for success in the coming years. Choosing your approach begins with understanding the sources of your data.

Sources of Big Data

Today’s world is incontestably digital: an endless array of gadgets and devices function as our trusted allies on a daily basis. While helpful, these constant companions are also responsible for generating more and more data every day. Smartphones, GPS technology, social media, surveillance cameras, and machine sensors (and the growing number of users behind them) are all producing reams of data on a moment-to-moment basis, and that output has grown exponentially, from 1 zettabyte of customer data produced in 2009 to more than 35 zettabytes in 2020.

If your business uses an app to receive and process orders for customers, or if you log extensive point-of-sale retail data, or if you have massive email marketing campaigns, you could have sources for untapped insight into your customers.

Once you understand the sources of your data, the next step is understanding the methods for housing and managing it. Data Warehouses and Data Lakes are two of the primary types of storage and maintenance systems that you should be familiar with.

illustration of multiple server stacks

Where Is Big Data Stored? Data Warehouses & Data Lakes

Although both Data Lakes and Data Warehouses are widely used for Big Data storage, they are not interchangeable terms.

A Data Warehouse is an electronic system used to organize information. It goes beyond a traditional relational database’s function of housing and organizing data generated from a single source.

How Do Data Warehouses Work?

A Data Warehouse is a repository for structured, filtered data that has already been processed for a specific purpose. A warehouse combines information from multiple sources into a single comprehensive database.

For example, in the retail world, a data warehouse may consolidate customer info from point-of-sale systems, the company website, consumer comment cards, and mailing lists. This information can then be used for distribution and marketing purposes: to track inventory movements and customer buying habits, manage promotions, and determine pricing policies.
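At a very high level, the consolidation step amounts to merging records from several sources on a shared key. This Python sketch is purely illustrative; the sources, field names, and customer ID are invented:

```python
# Hypothetical rows from three source systems, keyed by customer ID.
pos = {"cust-42": {"last_purchase": "2020-03-01", "total_spend": 412.50}}
web = {"cust-42": {"email": "pat@example.com", "newsletter": True}}
comments = {"cust-42": {"feedback": "Loved the spring sale"}}

# The warehouse merges every source into one comprehensive view.
warehouse = {}
for source in (pos, web, comments):
    for cust_id, fields in source.items():
        warehouse.setdefault(cust_id, {}).update(fields)

print(warehouse["cust-42"]["total_spend"])  # 412.5
print(sorted(warehouse["cust-42"]))
```

Real warehouses do this with ETL pipelines and SQL over many millions of rows, but the merge-on-a-shared-key idea is the same.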

Additionally, the Data Warehouse may also incorporate information about company employees, such as demographic data, salaries, schedules, and so on. This type of information can be used to inform hiring practices, set Human Resources policies, and help guide other internal practices.

Data Warehouses are fundamental in the efficiency of modern life. For instance:

Have a plane to catch?

Airline systems rely on Data Warehouses for many operational functions like route analysis, crew assignments, frequent flyer programs, and more.

Have a headache?

The healthcare sector uses Data Warehouses to aid organizational strategy, help predict patient outcomes, generate treatment reports, and cross-share information with insurance companies, medical aid services, and so forth.

Are you a solid citizen?

In the public sector, Data Warehouses are mainly used for gathering intelligence and assisting government agencies in maintaining and analyzing individual tax and health records.

Playing it safe?

In investment and insurance sectors, the warehouses are mainly used to detect and analyze data patterns reflecting customer trends, and to continuously track market fluctuations.

Have a call to make?

The telecommunications industry makes use of Data Warehouses for management of product promotions, to drive sales strategies, and to make distribution decisions.

Need a room for the night?

The hospitality industry utilizes Data Warehouse capabilities in the tailored design and cost-effective implementation of advertising and marketing programs targeted to reflect client feedback and travel habits.

Data Warehouses are integral in many aspects of the business of everyday life. That said, they aren’t capable of handling the inflow of data in its raw format, like object files or blobs. A Data Lake is the type of repository needed to make use of this raw data. Let’s examine Data Lakes next.

Data lake illustration

What is a Data Lake?

A Data Lake is a vast pool of raw data, the purpose of which is not yet defined. This data can be both structured and unstructured. The prime attributes of a Data Lake are secure, adaptable data storage and maintenance, distinguished by flexibility, agility, and ease of use.

If you’re considering a business approach that involves Data Lakes, you’ll want to look for solutions that have the following characteristics: they should retain all data and support all data types; they should easily adapt to change; and they should provide quick insights to as wide a range of users as you require.

Use Cases for Data Lakes

Data Lakes are most helpful when working with streaming data, like the sorts of information gathered from machine sensors, live event-based data streams, clickstream tracking, or product/server logs.

Deployments of Data Lakes typically address one or more of the following business use cases:

  • Business intelligence and analytics – analyzing streams of data to determine high-level trends and granular, record-level insights. A good example is the oil and gas industry, which has used the nearly 1.5 terabytes of data it generates on a daily basis to increase its efficiency.
  • Data science – unstructured data allows for more possibilities in analysis and exploration, enabling innovative applications of machine learning, advanced statistics and predictive algorithms. State, city, and federal governments around the world are using data science to dig more deeply into the massive amount of data they collect regarding traffic, utilities, and pedestrian behavior to design safer, smarter cities.
  • Data serving – Data Lakes are usually an integral part of high-performance architectures for applications that rely on fresh or real-time data, including recommender systems, predictive decision engines or fraud detection tools. A good example of this use case are the different Customer Data Platforms available that pull information from many behavioral and transactional data sources to highly refine and target marketing to individual customers.
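One way to picture the “raw now, structure later” approach behind these use cases (often called schema on read) is the following Python sketch; the record formats and field names are invented for illustration:

```python
import json

# A Data Lake keeps records raw; structure is applied only when the
# data is read for a purpose ("schema on read").
lake = [
    json.dumps({"type": "clickstream", "page": "/pricing", "ms": 5123}),
    json.dumps({"type": "sensor", "temp_c": 71.2, "rig": "A-7"}),
    json.dumps({"type": "clickstream", "page": "/signup", "ms": 830}),
]

# One consumer's "schema": only clickstream events, only the page field.
pages = [json.loads(r)["page"] for r in lake
         if json.loads(r)["type"] == "clickstream"]
print(pages)  # ['/pricing', '/signup']
```

A different consumer could read the same raw records with an entirely different schema, which is what gives Data Lakes their flexibility.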

When considered together, the different potential applications for Data Lakes in your business seem to promise an endless source of revolutionary insights. But the ongoing maintenance and technical upgrades required for these data sources to retain relevance and value are massive. If neglected or mismanaged, Data Lakes quickly devolve. As such, one of the biggest considerations to weigh when evaluating this approach is whether you have the financial and personnel capacity to manage Data Lakes over the long term.

What is a Data Swamp?

A Data Swamp, put simply, is a Data Lake that no one has cared to manage appropriately. Swamps arise when a Data Lake is treated as storage only, with no curation, management, retention and lifecycle policies, or metadata. And if you decide to work Data Lake-derived insights into your business planning and end up with a Swamp, you are going to be sorely disappointed: you’re paying the same amount to store all of your data, but returning zero effective intelligence to your bottom line.

Final Thoughts on Big Data Maintenance

Any business or organization considering entry into Big Data country will want to be careful and deliberate as they consider how they will store, maintain, and analyze their data. Making the right choices at the outset will ensure you’re able to traverse the developing digital landscape with strategic insights that enable informed decisions and keep you ahead of your competitors. We hope this primer on Big Data gives you the confidence to take the appropriate first steps.

The post The Geography of Big Data Maintenance: Data Warehouses, Data Lakes, and Data Swamps appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

To Buy, Or Not to Buy? CapEx Versus OpEx is the Question

Post Syndicated from Patrick Thomas original

Debating CapEx and OpEx: How to Make the Decision

When you work in a resource-constrained business or organization, committing to any kind of investment that doesn’t directly contribute to your mission, your bottom line, or both is incredibly hard. You only have so much cash, and if you’re not using it to build something that immediately returns value to your bottom line, you feel like you’re stealing from your own growth.

This is the reality for many nonprofits, media companies, foundations, production houses, and arts organizations. They often invest in cash-intensive products—music, video, literature, plays, service efforts, and/or historical or cultural preservation—that take months, if not years, to turn back into cash or grants the company can use for future efforts. Which means that, when it’s time to make investments in the company’s infrastructure, the decision making process can be brutal.

Do you invest in your office: the metaphorical leaking roof over your head? New technology that might (or might not) make your job easier? And how do you invest? Do you buy or rent? It all feels like a distraction from what you’re supposed to be working on, and possibly a misallocation of funds that could be supporting your core mission.

We want to help you with the decision.

A growing reality for businesses is the cost of protecting their legacy: All of the video, audio, text, and other files that their organization has created over the years came at great expense, so there’s no question that it needs to be appropriately stored and archived.

But, while the size of this data is growing, your budget to protect it most likely is not. You have a decision to make: try to manage this growth with an on-premises investment, or pay a cloud-based service to take care of it for you. In simpler terms, you’re making an age-old budgeting decision:

Whether to make an extensive capital investment, or to commit to an ongoing operational expense. In financial jargon, you’re debating “CapEx” vs. “OpEx”—and you’re not alone.

Let’s break down the decision one step at a time. First things first, let’s hammer out some terms in case you’re already lost. (If you’re roughly familiar with basic accounting principles, you can skip the following section. Otherwise, read on…)

Capital Expense (CapEx) vs. Operating Expense (OpEx)

NAS box

What is CapEx?

Capital Expense (CapEx) (or expenditure, if you want to be fancy) is the price of buying or fixing an asset that will hold its value over time—think: vehicles, buildings, equipment, land, or say… a network-attached storage (NAS) device or some other piece of technology. You’re buying something that you will use over some period of time (often called its “useful life”).

What’s important to note is that this “expense” does not immediately hit your operating budget. At first, this is entirely a balance sheet item: the cash you spent for it simply moves from “Current Assets” on your balance sheet to “Equipment” under your “Long-term Assets.”

Nice deal, right? No net change! My hardware is now “free” for me except for the upkeep and power expenses. Not so fast.

Every type of capital asset has a “depreciation schedule.” This is accountant lingo for: Things fall apart and aren’t worth what you paid for them. So, the depreciation schedule is an estimation for how quickly or slowly an item loses its value to your business.

Technology typically depreciates on a 3- or 5-year basis. What this means is, if you buy a $10,000 NAS on January 1st, and your auditor determines its useful life to be 3 years, then your depreciation schedule will recognize a loss of $3,333.33 in value for the asset at the end of each year for the next three years. Once you reach three years, the asset can be removed from your depreciation schedule and it isn’t “worth” anything on your balance sheet anymore (though it may still be quite functional and an essential part of your IT infrastructure).

But, as we know: The whole idea in a balance sheet is that it “balances” right? So if your assets are decreasing in value, you need to acknowledge that loss somewhere or your finances will be out of whack. So where does the decrease go? Into your operating budget! You have to acknowledge depreciation as an operating expense.

This is where your Capital Expense becomes an Operating Expense: The depreciation value has to go somewhere, and that somewhere is your bottom line. Accounting principles dictate that you have to incorporate the amount your assets have depreciated into your operating budget as an expense.

And that’s how Capital Expense works (at least, from a very simple perspective). You buy something—often pulling directly from your cash—and then you acknowledge its expense incrementally, over a set schedule, in your operating budget.

Side note: Some companies account for depreciation on a monthly basis, to avoid year-end surprises in heavy depreciation bills.
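To make the depreciation math concrete, here’s a minimal Python sketch of the straight-line schedule described above (a simplification; real schedules follow your auditor’s determinations and may use other methods):

```python
def straight_line_depreciation(cost, useful_life_years):
    """Return the depreciation expense recognized at the end of each year."""
    annual_expense = cost / useful_life_years
    return [round(annual_expense, 2) for _ in range(useful_life_years)]

# The $10,000 NAS from the example, with a three-year useful life:
print(straight_line_depreciation(10_000, 3))  # → [3333.33, 3333.33, 3333.33]
```

Divide by 12 inside the list if, like the companies in the side note, you’d rather recognize depreciation monthly.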

What is OpEx?

Operating Expense (OpEx) is typically easier for folks to wrap their head around. These are the ongoing costs you incur to run your business or organization. Think: rent, internet, office supplies, kibble for the office cat—that sort of thing. Unlike your capital expenses, the operating expenses hit your bottom line immediately, typically flowing through your general or administrative lines. No smoke and mirrors here—you spend the money, you get what you paid for, your monthly financials reflect the expense, and hopefully it contributes positively to your bottom line!

Backblaze cloud banner

Why Do CapEx and OpEx Matter for Data Storage?

We get it, the ins and outs of a balance sheet are tantalizing only to a small number of people (sheepishly, I count myself among them). BUT: The implications of the difference between CapEx and OpEx can be hugely impactful when it comes to fulfilling your organization’s mission or your business’s goals.

Let’s see if we can bring the implications into more useful perspective:

Let’s say that you’re an organization with a growing archive of data, and you have a reliable projection for your data growth rate over the coming years. Maybe you’re a music education nonprofit, and part of your mission is archiving the performance recordings of all of your students for their use in future school applications or tryouts.

You have around 200TB worth of data that you need to get onto a more reliable, accessible storage media ASAP (your volunteer librarian just retired and now nobody knows how she organized the tape archive… bummer). What do you do?

You call your freelance IT consultant, and they give you two options:

1) An On-Prem Storage Solution: The simplest way to describe an on-premises solution is: A machine that is visible. It’s in a space you own or rent, it is “bought and paid for,” and your auditor or bookkeeper flagged that it should be depreciated.

Data storage comes in “on-prem” shapes and sizes too. Whether it’s network-attached storage (NAS) or a storage area network (SAN), your IT consultant will quote out something for you to plug in onsite that will take care of your archiving issue. This type of storage is a capital expense.

2) A Cloud Storage Solution: The “cloud” is often described as “someone else’s computer.” It’s a pretty apt description in this case. In a cloud storage solution, you pay another company—typically on a monthly basis—to store, protect, and maintain your data. So if on-prem is a “visible” solution, this is an “invisible” solution. And because it’s a service, it’s a simple monthly operating expense. Cloud storage doesn’t depreciate because the company providing it is constantly paying to maintain it.

What is This Really Going to Cost?

A Question of Clouds

Let’s say you have roughly 200TB you need to get into a better archiving solution. This is about 25 years worth of content, so you’re expecting (with a little room for data inflation) to add around 30 Terabytes a year of data. What will your two options really cost you, once we pull apart the financial jargon?

On-Prem Storage Solution:

For your server hardware, your hard drives, and a reliable power supply unit, you’ll likely end up paying around $25,000. Let’s assume your auditor determines a useful life of three years for the server; that means you’ll recognize depreciation of about $694 per month.

Don’t forget, however, that you’ll need to pay for power and possibly cooling (estimated around $100 per month for this size of server), some IT assistance to help with upgrades and maintenance (let’s say $50 per month), and you have to pay for the space to hold it. We’ll leave the latter at zero—you probably have a closet somewhere.

So all in, you’ll need to lay out $25K in cash at the outset, and then you’ll be recognizing roughly $850 every month ($694 in depreciation plus about $150 in power and IT support) in your operating budget for the next three years.

Cloud Storage Solution:

The biggest expense on the front end of a cloud solution is the expense of ingesting data. There are a number of services you can use, but it’s easiest for us to quote out our own: B2 Cloud Storage. If you want to move quickly, you can rent our Fireballs, which we’ll ship to you at a price of $550 per month, and you can upload 70TB at a time and ship them back to us for upload. So, all together, you’d pay $1,650 for the trouble of moving the data at the speed of FedEx, which is typically far faster than the internet. (If you have the time to let your data upload over the internet for months, then you can do that for far less). We’ll spread this expense across 3 years to make our comparison more apples-to-apples. So let’s say $46 a month.

After this initial expense, you simply have to pay a monthly data storage bill. We’ll use Backblaze B2 for estimating. At 200TB, you’ll be paying $1,000 a month, right now, but that number will grow along with your data, at $5/TB.

So you’re laying out roughly $2,650 in your first month ($1,650 in Fireball rentals plus your first $1,000 storage bill), then $1,000 a month, growing at $5/TB per month.

Balance Sheet Implications and Monthly Operating Budget Implications tables

Side note: you can calculate your own costs using the B2 cloud storage calculator.
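If you’d rather see the two cash flows side by side, here’s a rough Python sketch using the figures above. It’s illustrative only: the on-prem side ignores the eventual cost of expanding capacity as your data grows, and your actual prices will differ.

```python
def on_prem_cost(months, hardware=25_000, power=100, it_support=50):
    """Total cash outlay for on-prem: hardware up front, plus monthly upkeep."""
    return hardware + months * (power + it_support)

def cloud_cost(months, start_tb=200, growth_tb_per_month=2.5,
               price_per_tb=5, ingest=1_650):
    """Total cash outlay for cloud: one-time ingest, plus a slowly growing monthly bill."""
    total, tb = ingest, start_tb
    for _ in range(months):
        total += tb * price_per_tb
        tb += growth_tb_per_month
    return total

for months in (12, 24, 36):
    print(f"{months} months: on-prem ${on_prem_cost(months):,.0f} "
          f"vs. cloud ${cloud_cost(months):,.0f}")
```

Note how the shapes differ: on-prem is almost all cash on day one, while the cloud total accrues gradually, which is exactly the cash-flow trade-off discussed in the next section.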

What’s the Real Difference? Or, What Are a Few Hundred Dollars Worth?

To emerge from the weeds for a moment: The simple difference between these two options is a few hundred dollars on a month to month basis. Let’s explore what those hundreds get you when it comes to your cash flow, long-term flexibility, maintenance, and your real estate bill.

Cash Flow

On-Prem Storage Solution: On day one working with your NAS or SAN, you’re out $25,000. And unless the hardware is defective, you will never be able to get that full value back.

Cloud Storage Solution: By the end of “month one” with cloud, you might be out $1,000 if you let your data upload over the internet, or closer to $2,650 if you rent Fireballs, depending on the upload service you use and the speeds you achieve.

So the question here is: How important is cash flow to you? What’s the opportunity cost of not being able to use that $22K for something else?

Long-Term Flexibility

How good a forecaster are you? It may sound like an odd question, but it’s actually critical to choosing the right solution. If you can accurately forecast your company’s data needs for the next five years, you should be able to pick an on-prem solution that matches those needs. On the other hand, if you’re off in either direction, you’ll end up spending money inefficiently. Let’s take a look at how both scenarios play out.

Overestimating Data Needs:

On-Prem Storage Solution: On day two working with your NAS or SAN, you’re locked into your investment of $25,000. You can’t return it, and you can’t make it smaller. And the decrease in cash is only the beginning of your commitment: On an annual or monthly basis going forward, you will need to recognize your investment’s depreciation, a fixed amount no matter how long the hardware is in operation.

Cloud Storage Solution: On day two working in the cloud, you could choose to deprecate some of your data or recategorize it, and decrease your monthly spend. Alternatively, you could change cloud services if your existing arrangement isn’t working. The important thing is, you’re not locked into the investment, and you can exit the arrangement at any time.

Underestimating Data Needs:

On-Prem Storage Solution: On day 425 you might realize that your NAS or SAN isn’t going to be enough storage for your operations. Most organizations make budgets for their operations, and most of them wouldn’t mind beating those projections. The problem is, if you’re able to achieve more in a given year, you will likely also generate more data in that year. If your business takes off and you’re locked into an on-prem solution, your only remedy will be to invest in higher capacity drives, or even to add an additional storage solution, both of which incur additional upfront cash outlays.

Cloud Storage Solution: If you reach day 425 and your organization is unexpectedly beating projections by a significant margin, your cloud storage service can easily scale to match your needs. Your monthly expense will increase but only at a fractional percentage since no new equipment will need to be purchased.

The questions here are: Are you ready to project how your data will scale over the next three to five years? And are you ready to cover the costs if you’re wrong?

Upkeep, Updates, and Repairs

On-Prem Storage Solution: On day 483, your NAS or SAN may not keep up with technology, or it may disagree with some other tech you need. But you’re stuck with it, and you’re stuck with the tab of paying to maintain and troubleshoot the aging technology. Whether you have your own IT staff, or hire consultants or an IT service, this could add significant cost to your overall on-prem storage budget. And it goes without saying that your server could simply cease functioning at any time.

Cloud Storage Solution: A cloud storage service will continue to run. It will always be improved. It will always be maintained. And in some cases, depending on how you’re uploading the data, it can even be self-healing. Some cloud storage solutions perform checks to ensure that the data they’re storing hasn’t degraded or been lost. These services are considered “self-healing” because, when they discover an inconsistency, they’ll ask users to re-upload the affected files or use their own built-in redundancies to repair or replace them.

And, of course, a cloud solution will have a staff of people—experts, often—working day in and day out to maintain and improve your storage. You don’t pay anything additional for their services. You don’t have to recruit, train, or manage them. They’re on the job entirely to ensure your service is never disrupted.

Real Estate, Energy, and Security

On-Prem Storage Solution: As your data grows, so will your need to maintain the space for it. You will need to devote increasingly large amounts of real estate to the footprint of your on-site storage. You’ll have to provide adequate cooling and energy, and you’ll need to think differently about your approach to security. If you don’t have the space on site, you’ll need to expand or find additional real estate elsewhere to make room for servers. Depending on the real estate market you’re in, the costs of such an expansion could be well more than the monthly depreciation on your hardware.

Cloud Storage Solution: Cloud storage services are in the business of providing real estate, climate control, and over-the-top security for your data. Most cloud storage service providers have multiple data centers in order to quickly scale to meet your growing storage needs.

Projected Costs Comparison: On-Prem Vs. Cloud

Final Thoughts on CapEx vs. OpEx

Viewed simply, setting up an on-prem archiving solution seems like a good deal. You’ve got your asset on site, the immediate budget implications are shunted off to your balance sheet, and everything else is IT’s problem. But, when you look past day one of your purchase, it might be less attractive: You have cash tied up in a vulnerable asset. You have a financial commitment that you can’t easily scale up or down. You have a tool that is only going to be maintained or improved when something goes very wrong. You have another presence in your office that is a space and energy hog.

On the flip side, with cloud storage, you have a monthly payment that takes care of all of this for you. Your cash is more available and flexible. You have zero concerns about scaling issues. Security, reliability, durability, upgrades, up-time, energy use, file maintenance—all of these are part of what you’re paying for on a monthly basis.

Maybe you love tinkering with hardware and being able to see where your data is resting. If that’s you, well, then CapEx is your jam. But if all of the concerns listed above are things you’d rather not worry about (I mean, you probably have more interesting things to work on, right?), then cloud storage is probably a better deal. CapEx vs. OpEx, that is the question—which is wiser? We’ll leave the answer up to you.

The post To Buy, Or Not to Buy? CapEx Versus OpEx is the Question appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

The Argument Against Commission: Why Backblaze Pays Sales a Full Salary

Post Syndicated from Patrick Thomas original

runners with the Backblaze logo racing towards a finish line

We’ve all been there: You enter some kind of retail environment—a clothing shop, an electronics big box store, or even a software company’s website—you get a few seconds of peace to look around, and then suddenly you’re being swarmed by extremely helpful sales people. That isn’t the worst thing in the world at face value, but somehow the “helpful” attitude feels like a pretty thin cover for the salesperson’s need to sell you something (ideally an expensive something) fast. It’s not a good feeling—you’ve gone from being a potential customer to acting as a data point on that employee’s pay stub.

Many of us have been on the other side, too: we’ve been hired to move product, but the only way we can reach a salary that will pay the bills is to push hard on the customer, even in situations where they don’t necessarily need what we’re selling. I’ve been on both sides, and it’s hard to say which feels worse.

It’s commission sales—and it’s the norm across the majority of businesses, but especially in tech, where it’s standard for salespeople to earn a base salary of just 50 – 60% of their on-target earnings (OTE). OTE is essentially HR-speak for what a person should be paid given their experience, the industry they are working in, and what they’re selling—in other words, their current “market rate.”

This means that most salespeople you speak to have to earn back 40 – 50% of their salary, every year. No matter the state of the economy, manufacturing issues, natural disasters, or any other unforeseeable challenges, sales teams working on commission still need to meet strict benchmarks—just to earn what they’re worth.

Well, that’s the norm pretty much everywhere else. But at Backblaze, our sales team is paid 100% of their salary. Sounds weird, right? Well, we don’t think so. Everyone else in our shop earns a salary based on the market for their skills—and we think our teammates on the sales team should too. Here’s why.

A Good Idea

Humans Are Not Coin Operated

Nilay Patel, Backblaze’s Vice President of Sales, got his start in tech as a Sales Engineer. “SEs,” as they’re often called, use technical expertise to help sales representatives explain how a product will work for a specific potential customer. The main difference between SEs and Sales Reps is that Sales Engineers typically are not commission-based. But they are on the front lines to witness the odd effects that commission-based compensation has on organizations and individuals.

In a recent interview, Nilay related how, at one of his early jobs, the VP of Sales often referred to “coin-operated” sales departments. What Nilay’s colleague meant, in short, was that if you wanted more sales, you simply needed either more draconian or more lucrative commission structures to incentivize them. Nilay was immediately struck by how disrespectful this phrasing was and what a manipulative managerial structure it suggested. As he sat in on management conversations and hiring panels, he saw the “coin-operated” philosophy invoked again and again, seemingly suggesting that, to motivate a sales team, you just had to dangle their paycheck a few feet ahead of them.

At the end of the day, these conversations weren’t about the mission, or serving customers, or achieving some greater business goal—they were solely about money. Over time, Nilay came to accept that this was just how sales worked across the industry, but he never felt particularly comfortable with the practice.

Teamwork in Sales

Commission Impossible

When Nilay returned to Backblaze in 2015 to lead sales of our new product, Backblaze B2 Cloud Storage, he opened up a conversation with Gleb Budman, Backblaze’s CEO, about how commission would work in his new role. He assumed that, because compensation for pretty much every other sales job in Silicon Valley was commission-based, this one would be too. But Gleb responded with a straightforward, “That sounds complicated, sure you want to go that route?” And the more Nilay thought about it, the more he realized he didn’t.

Hindsight is 20/20, but even at the beginning, Nilay knew some of the weaknesses that a commission-based sales force would create. The rest, he’s learned with time. And now, at the five year mark of running a no-commission shop, he’s convinced he made the right call. These are the reasons he believed, and continues to believe, it was the right one:

It’s Good for Trust Among the Backblaze Team

We pride ourselves on being a supportive, collaborative workplace, and we succeed: Last year, in a survey of all 133 employees, 99% rated Backblaze a great place to work. Partially, this is because we all trust that everyone is pulling in the same direction. When the sales team asks engineering for help with a customer issue, or they need materials from marketing, nobody wonders whether they’re asking for selfish reasons. Because we’re all paid standard salaries, we know that everyone’s working to serve the customer and ensure that all of Backblaze is successful.

“When I ask for help from engineering and marketing, I want folks to feel like my requests are rooted in wanting to help our customers—not to line my pockets.”
James Fleishman, Backblaze Partner Manager

It’s Good for Innovation

In a traditional sales operation, new products are something to fear rather than celebrate. Something new, unproven, and unknown to the customer could cut into a team’s sales success, and therefore their salary. As a result, innovation often encounters friction when it hits the marketplace. Not so at Backblaze, where the sales team is far more likely to get excited about how a new offering might enhance their customers’ experience.

It’s Good for Our Business

Committing to a cloud storage provider is a shockingly big deal. Companies aren’t just buying new software, they’re moving priceless intellectual property into a solution that they need to be able to trust to work, and to last. We’ve provided backup and storage for well over a decade now, and we manage nearly an exabyte of customer data, so there’s little question of our ability to stand the test of time.

But customers also need to know that we’ll aid them in the process of transitioning to our service and that we’ll stand by them as their needs grow and evolve over time. Because our sales force doesn’t have to worry about their paycheck or “closing” a sale, they’re unleashed to give customers the level of care they need to set their business up right and to build a long-lasting, productive working relationship.

When we’re dealing with prospects, we don’t look at them and immediately see dollar signs, but rather an opportunity to better the situation for both parties: they get a solution that works, and we continue to grow our business.
Daniel Lloyd Pias, Backblaze Sales Manager

It’s Good for the Sales Staff

One of Nilay’s early managers had previously worked for IBM on the typewriter sales team when the PC was launched. Nobody in the computer sales division wanted to risk their commission on “Personal” computers because they assumed the product would never succeed. So the typewriter sales team got the job… and they were wildly successful. For their first reporting period, they all got to take home a king’s ransom in commission. But once they showed what they could do, their commissions were restructured to ensure they never saw commission rates like that again.

The fact is, no company is going to intentionally create big commission payouts. And when big payouts do happen, they’re going to change the rules as fast as possible to ensure it doesn’t happen again. So, while conventional assumptions would suggest that commissions draw top talent, our results don’t support those assumptions. Nilay believed that smart sales staffers would understand that being able to count on 100% of a competitively indexed OTE is a much better deal than having to contend with unpredictable commission payouts. And the composition of our sales team bears out his belief.

Today, we have a highly collaborative team of people who work together to ensure that our customers receive functional, long-term solutions that work for them. But the collaboration doesn’t end with sales prospects. The team also works cross-functionally, with one another and across the full company, to learn and grow their professional capacity. All of this, and they’re also a hell of a lot of fun to spend time with!

It’s Good for the Customer

Working on commission can be very stressful for employees, especially when they are counting on hitting their OTE to make ends meet. This means the team is focused on their paycheck, rather than their customer’s challenges. According to James Fleishman, our Partner Manager, “the sales team appreciates working in an environment that encourages them to focus on meeting the customer’s needs and not their own.”

A no-commission policy carries many other benefits. It allows Sales employees to utilize the same benefits as other employees. I can go on a real paid vacation, or get an important medical procedure done, without losing 60% of my income to do so.
Crystal Matina, Backblaze Account Executive

It’s The Right Thing to Do

In the “Mission & Values” document every new hire at Backblaze receives, value #1 is “Fair & Good.” We try really hard to be the good guys in everything we do, and part of that is being equitable and fair in our pay structure. It isn’t any more complicated than that. Backblaze doesn’t do anything because “it’s just the way it’s done,” we do it because it’s the right way to do it. (And it just so happens that being “Cleverly Unconventional” is value #5).

Headed the Right Direction

Focusing on the Long View

Nilay freely admits that the no-commission model probably won’t work for everyone, but the leadership team at Backblaze is committed to it for the long term. Another one of our five company values is to build a sustainable business that is set up for long-term success. Sadly, this also does not always seem to be the norm in tech. But for us, fulfilling our values requires building a sales team that wants to work together to move our mission forward over the long term, and that means paying people what they’re worth, in full.

So the next time you land on the B2 Cloud Storage page and Victoria or Vincent strike up a chat with you; or when you’re looped into a call with Shaneika, James, Crystal, Mike M., or Alex; or you’re working through a solution with Mike F., Udara, or Pavithra, and you’re wondering—why are these people so kind and patient? Well, now you know.

And if you’re in the process of setting up your own business, maybe Nilay’s perspective will help you think differently about how your sales team’s compensation structure might affect your brand, and your bottom line.

The post The Argument Against Commission: Why Backblaze Pays Sales a Full Salary appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

A Sandbox in the Clouds: Software Testing and Development in Cloud Storage

Post Syndicated from Patrick Thomas original

A Sandbox full of App Icons in the Clouds

“What is Cloud Storage?” is a series of posts for business leaders and entrepreneurs interested in using the cloud to scale their business without wasting millions of capital on infrastructure. Despite being relatively simple, information about “the Cloud” is overrun with frustratingly unclear jargon. These guides aim to cut through the hype and give you the information you need to convince stakeholders that scaling your business in the cloud is an essential next step. We hope you find them useful, and will let us know what additional insight you might need.”
–The Editors

The words “testing and development” bring to mind engineers in white lab coats marking clipboards as they hover over a buzzing, whirring machine. The reality in app development is more often a DevOps team in a rented room virtually poking and prodding at something living on a server somewhere. But how does that really work?

Think of testing in the cloud like taking your app or software program to train at an Olympic-sized facility instead of your neighbor’s pool. In app development, building the infrastructure for testing in a local environment can be costly and time-consuming. Cloud-based software development, on the other hand, gives you the ability to scale up resources when needed without investing in the infrastructure to, say, simulate thousands of users.

But first things first…

What Is Cloud Software Testing?

Cloud software testing uses cloud environments and infrastructure to simulate realistic user traffic scenarios to measure software performance, functionality, and security. In cloud testing, someone else owns the hardware, runs the test, and delivers the test results. On-premise testing is limited by budgets, deadlines, and capacity, especially when that capacity may not be needed in the future.

An App Waiting in a Cloud Storage Sandbox

Types of Software Testing

Any software testing done in a local test environment can be done in the cloud, some much more efficiently. The cloud is a big sandbox, and software testing tools are the shovels and rakes and little toy dump trucks you need to create a well-functioning app. Here are a few examples of how to test software. Keep in mind, this is by no means an exhaustive list.

Stress Testing

Stress tests measure how software responds under heavy traffic. They show what happens when traffic spikes (a spike test) or when high traffic lasts a long time (a soak test). Imagine putting your app on a treadmill to run a marathon with no training, then forcing it to sprint for the finish. Stress testing software in an on-premise environment involves a significant amount of capital build-out—servers, software, dedicated networks. Cloud testing is a cost-effective and scalable way to truly push your app to the limit. Companies that deal with big spikes in traffic find stress testing particularly useful. After experiencing ticketing issues, the Royal Opera House in London turned to cloud stress testing to prepare for ticket releases when traffic can spike to 3,000 concurrent users. Stress testing in the cloud enables them to make sure their website and ticketing app can handle the traffic on sales days.

Load Testing

If stress testing is a treadmill, load testing is a bench press. Like stress testing, load testing measures performance. Unlike stress testing, where the software is tested beyond the breaking point, load testing finds that breaking point by steadily increasing demands on the system until it reaches a limit. You keep adding weight until your app can’t possibly do another rep. Blue Ridge Networks, a cybersecurity solutions provider based in Virginia, needed a way to test one of their products against traffic in the millions. They could already load test in the hundreds of thousands but looked to the cloud to scale up. With cloud testing, they found that their product could handle up to 25 million new users and up to 80 million updates per hour.
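As a sketch of what a load test harness does under the hood, here’s a toy Python version (the worker just sleeps to stand in for real work; an actual test would call your app’s endpoint, and the cloud services mentioned above do this at vastly larger scale):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request():
    """Stand-in for the system under test; a real harness would make an HTTP call here."""
    time.sleep(0.001)  # simulate 1ms of server work

def measure(concurrent_users, requests_per_user=10):
    """Fire a burst of traffic and return the average seconds per request."""
    total_requests = concurrent_users * requests_per_user
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        futures = [pool.submit(handle_request) for _ in range(total_requests)]
        for future in futures:
            future.result()  # propagate any errors raised by the workers
    return (time.perf_counter() - start) / total_requests

# Ramp up load step by step, watching for the point where latency degrades.
for users in (1, 5, 25):
    print(f"{users:>2} concurrent users: {measure(users) * 1000:.2f} ms/request")
```

The ramp loop is the essence of load testing: keep turning up the concurrency until average response time, error rate, or both cross your acceptable threshold.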

Performance Testing

Stress and load tests are subsets of software performance testing—the physical fitness of the testing world. The goal of performance testing is not to find bugs or defects, but rather to set benchmarks for functionality (i.e., load speed, response time, data throughput, and breaking points). Cloud testing is particularly well-suited to software performance testing because it allows testers to create high-traffic simulations without building the infrastructure to do so from scratch. Piksel, a video asset management company serving the broadcast media industry, runs performance tests each night and for every new release of their software. By testing in the cloud, they can simulate higher loads and more concurrent users than they could on-premise to ensure stability.

Latency Testing

If stress testing is like training on a treadmill, latency testing is race day. It measures the time it takes an app to perform specific functions under different operating conditions. For example, how long it takes to load a page under different connection speeds. You want your app to be first across the finish line, even under less than ideal conditions. The American Red Cross relies on its websites to get critical information to relief workers on the ground in emergencies. They need to know those sites are responsive, especially in places where connection speeds may not be very fast. They employ a cloud-based monitoring system to notify them when latency lags.

Functional Testing

If performance testing is like physical training, functional testing is like a routine physical. It checks to see if things are working as expected. When a user logs in, functional testing makes sure their account is displayed correctly, for example. It focuses on user experience and business requirements. Healthcare software provider Care Logistics employs automated functional testing to test the functionality of their software whenever updates are rolled out. By moving to the cloud and automating their testing, they reduced their testing time by 50 percent. Functional testing in the cloud is especially useful when business requirements change frequently because the resources to run new tests are instantly available.
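In code, functional tests read like a checklist of business requirements. The sketch below uses a made-up `login` function and account data purely for illustration:

```python
# Hypothetical application code under test.
ACCOUNTS = {"ada": {"name": "Ada Lovelace", "plan": "pro"}}

def login(username):
    """Return the account page data for a known user, or None if unknown."""
    account = ACCOUNTS.get(username)
    if account is None:
        return None
    return {"title": f"Welcome, {account['name']}", "plan": account["plan"]}

# Functional tests: check business requirements, not speed or throughput.
def test_known_user_sees_their_account():
    page = login("ada")
    assert page["title"] == "Welcome, Ada Lovelace"
    assert page["plan"] == "pro"

def test_unknown_user_is_rejected():
    assert login("nobody") is None

test_known_user_sees_their_account()
test_unknown_user_is_rejected()
print("functional tests passed")
```

When requirements change, you add or rewrite checks like these, and the cloud provides the resources to run the new suite immediately.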

Compatibility Testing

Compatibility testing checks to see if software works across different operating systems and browsers. In cloud testing, as opposed to on-premise testing, you can simulate more browsers and operating systems to ensure your app works no matter who uses it. Mobile meeting provider LogMeIn uses the cloud to test its GoToMeeting app on 60 different kinds of mobile devices and to test its web-based apps daily across multiple browsers.

Smoke Testing

In the early days of technology development, a piece of hardware passed the smoke test if it didn’t catch on fire (hence, smoke). Today, smoke testing in software testing makes sure the most critical functions of an app work before moving on to more specific testing. The grocery chain Supervalu turned to cloud testing to reduce the time they spent smoke testing by 93 percent. And event management platform Eventbrite uses the cloud to run 20 smoke tests on every software build before running an additional 700 automated tests.
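A smoke suite is just a short list of critical checks that run before everything else. A minimal sketch, with hypothetical checks standing in for real ones:

```python
# Hypothetical critical-path checks; real ones would exercise your app.
def app_boots():      return True
def homepage_loads(): return True
def login_works():    return True

SMOKE_TESTS = [app_boots, homepage_loads, login_works]

def run_smoke_tests():
    """Run only the critical checks; abort the pipeline on the first failure."""
    for check in SMOKE_TESTS:
        if not check():
            print(f"SMOKE FAILED: {check.__name__} -- skipping the full suite")
            return False
    print("Smoke tests passed -- safe to run the full suite")
    return True

run_smoke_tests()
```

The point is ordering: a handful of fast checks gate the hundreds of slower automated tests that follow, which is exactly the pattern in the Eventbrite example above.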

An App Waiting in a Cloud Storage Sandbox

Advantages of Cloud Development vs. Traditional Software Development (and Some Drawbacks)

  • Savings – Only pay for the resources you need rather than investing in infrastructure build-out and maintenance, saving money and time spent developing a local test environment.
  • Scope – Broaden the number of different scenarios you can test — more browsers, more operating systems — to make sure your software works for as many users as possible.
  • Scalability – Effortlessly scale your resources up or down based on testing needs from initial smoke testing to enterprise software development in the cloud.
  • Speed – Test software on different operating systems, platforms, browsers, and devices simultaneously, reducing testing time.
  • Automation – Easily employ automated software testing tools rather than dedicating an employee or team to test software manually.
  • Collaboration – As more and more companies abandon waterfall in favor of agile software development, the roles of development, operations, and QA continue to blend. In the cloud, developers can push out new configurations or features, and QA can run tests against them right away, making agile development more manageable. For example, cloud testing allowed the Georgia Lottery System to transition from releasing one to two software updates per year with waterfall development to 10+ releases each quarter with agile.

Moving your testing to the cloud is not without some drawbacks. Before you make the move, consider the following:

  • Outages – In March of 2019, Amazon Web Services (AWS) suffered an outage at one of their data centers in Virginia. The blackout affected major companies like Slack, Atlassian, and Capital One. For a few hours, not only were their services affected, but those companies also couldn’t test any web properties or apps running on AWS.
  • Access – The nature of cloud services means that companies pay for the access they need. This is an advantage over building infrastructure on-site, but it puts the onus on companies to determine who needs access to the testing environments housed in the cloud and what level of access they need to keep cloud testing affordable.
  • Lack of universal processes – Because each cloud provider develops its own infrastructure and systems (and most are very hush-hush about it), companies that want to switch providers face the burden of reconfiguring their internal systems and data to meet new provider requirements.

An App Waiting in a Cloud Storage Sandbox

What Does Cloud Testing Cost?

Most cloud service providers offer a tiered pricing structure. Providers might charge per device minute (in mobile testing) or a flat fee for unlimited testing. Flat fees range from around $100 per month to $500 per month or more. Many also offer private testing at a higher rate. Start by determining what kind of testing you need and what tier makes the most sense for you.

Who Uses the Cloud for Software Testing?

As shown in the examples above, organizations that use the cloud for testing are as varied as they come. From nonprofits to grocery chains to state lottery systems, any company that wants to provide a software application to improve customer service or user experience can benefit from testing in the cloud.

No longer limited to tech start-ups and industry insiders, testing in the cloud makes good business sense for more and more companies working to bring their apps and software solutions to the world.

The post A Sandbox in the Clouds: Software Testing and Development in Cloud Storage appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Backing Up the Death Star: How Cloud Storage Explains the Rise of Skywalker

Post Syndicated from Patrick Thomas original

Backing up the Death Star

It’s come to our attention here at Backblaze that there’s a movie coming out later this week that some of you are excited about. A few of us around the office might be looking forward to it, too, and it just so happens that we have some special insight into key plot elements.

For instance, did you know that George Lucas was actually a data backup and cloud storage enthusiast? It’s true, and once you start to look, you can see it everywhere in the Star Wars storyline. If you aren’t yet aware of this deeper narrative thread, we’d encourage you to consider the following lessons to ensure you don’t suffer the same disruptions that Darth Sidious (AKA the Emperor, AKA Sheev Palpatine) and the Skywalkers have struggled with over the past 60 years of their adventures.

Because, whether you run a small business, an enterprise, the First Order, or the Rebel Alliance, your data—how you work with it, secure it, and back it up—can be the difference between galactic domination and having your precious battle station scattered into a million pieces across the cold, dark void of space.

Spoiler Alert: If you haven’t seen any of the movies we’ll reference below, well, you’ve got some work to do: about 22 hours and 30 minutes of movies, somewhere around 75 hours of animated and live action series, a few video games, and more novels than we can list here (don’t even start with the Canon and Legends division)… If you’d like to try, however, now is the time to close this tab.

Though we all know the old adage about “trying”…


Any good backup strategy begins with a solid approach to data security. If you have that in place, you significantly lower your chance of ever having to rely on your backups. Unfortunately, the simplest forms of security were often overlooked during the first eight installments of the Star Wars story…

Impossible. Perhaps the archives are incomplete.

“Lost a planet, Master Obi-Wan has. How embarrassing!”
–Master Yoda

The history of the Jedi Council is rife with infosec issues, but possibly the most egregious is called out when Obi-Wan looks into the origins of a Kamino saberdart. Looking for the location of the planet Kamino itself within the Jedi Archives, he finds nothing but empty space. Having evidently failed out of physics at the Jedi Academy, Master Kenobi needs Yoda to point out that if there’s a gravity well suggesting the presence of a planet, then the planet has likely been improperly deleted from the archives. And indeed that seems to have been the case.

How does the galactic peacekeeping force stand a chance against the Sith when they can’t even keep their own library safe?

Some might argue that, since the Force is required to manipulate the Jedi Archives, then Jedi training was a certain type of password protection. But there were thousands of trained Jedi in the galaxy at that time, not to mention the fact that their sworn enemies were force users. This would be like Google and Amazon corporate offices sharing the same keycards—not exactly secure! So, at their most powerful, the Jedi had weak password protection with no permissions management. And what happened to them? Well, as we now know, even the Younglings didn’t make it… That’s on the Jedi Archivists, who evidently thought they were too good for IT.

The Destruction of Jedha

“Most unfortunate about the security breach on Jedha, Director Krennic.”
—Grand Moff Tarkin

Of course, while the Jedi may have stumbled, the Empire certainly didn’t seem to learn from their mistakes. At first glance, the Imperial databank on Scarif was head-and-shoulders above the Jedi Archives. As we’ve noted before, that Shield Gate was one heck of a firewall! But Jyn Erso and Cassian Andor exploited a consistent issue in the Empire’s systems: Imperial Clearance Codes. I mean, did anyone in the galaxy not have a set of Clearance Codes on hand? It seems like every rebel ship had a few lying around. If only they had better password management, all of those contractors working on Death Star II might still be pulling in a solid paycheck.

To avoid bad actors poking around your archives or databanks, you should conduct regular reviews of your data security strategies to make sure you’re not leaving any glaring holes open for someone else to take advantage of. Regularly change passwords. Use two factor authentication. Use encryption. Here’s more on how we use encryption, and a little advice about ransomware.

3-2-1 Backup

But of course, we’ve seen that data security can fail, in huge ways. By our count, insufficient security management on both sides of this conflict has led to the destruction of six planets, the pretty brutal maiming of two others, a couple of stars being sucked dry (which surely led to other planets’ destruction), and the obliteration of a handful of superweapons. There is a right way, folks, and what we’re learning here is that they didn’t know it a long time ago in a galaxy far, far away. But even when your security is set up perfectly, disaster can strike. That’s why backups are an essential accompaniment to any security.

The best approach is a 3-2-1 backup strategy: For every piece of data, you have the data itself (typically on your computer), a backup copy on site (on a NAS or simply an external hard drive), and one copy in the cloud. It’s the most reasonable approach for most average use cases. Let’s see how the Empire managed their use case, when the stakes (the fate of much of existence) couldn’t have been higher:

Dooku's Death Star Plans

“I will take the designs with me to Coruscant. They will be much safer there with my master.”—Count Dooku

We first see the plans for the “super weapon based on Geonosian designs” when Count Dooku, before departing Geonosis, decides that they would be safer housed on Coruscant with Darth Sidious. How wrong he was! He was thinking about securing his files, but it seems he stumbled en route to actually doing so.

By the time Jyn Erso learns of the “Stardust” version of the plans for the Death Star, it seems that Scarif is the only place in the Galaxy, other than on the Death Star itself, presumably, that a person could find a copy of the plans… Seriously? Technically, the copy on Scarif functioned as the Empire’s “copy in the cloud,” but it’s not like the Death Star had an external hard drive trailing it through space with another copy of the plans.

If you only have one backup, it’s better than nothing—but not by much. When your use case involves even a remote chance that Grand Moff Tarkin might use your data center for target practice, you probably need to be extra careful about redundancy in your approach. If the Rebel Alliance, or just extremely competitive corporate leaders, are a potential threat to your business, definitely ensure that you follow 3-2-1, but also consider a multi-cloud approach with backups distributed in different geographic regions. (For the Empire, we’d recommend different planets…)
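For the sysadmins in the audience, the 3-2-1 idea reduces to a few lines of code: keep the original, copy it to a second local device, and push a third copy offsite. In this Python sketch, the `upload` callable is a stand-in for any real cloud SDK call (B2, S3, or otherwise):

```python
import shutil
import tempfile
from pathlib import Path

def backup_321(source: Path, local_backup_dir: Path, upload):
    """Keep three copies: the original, one local backup, one offsite copy.

    `upload` stands in for a cloud client's put-object call; any real
    storage SDK would slot in here.
    """
    local_backup_dir.mkdir(parents=True, exist_ok=True)
    local_copy = local_backup_dir / source.name
    shutil.copy2(source, local_copy)   # copy 2: on-site, on a second device
    upload(source)                     # copy 3: off-site, in the cloud
    return local_copy

# Demo with a fake uploader and a temporary file:
tmp = Path(tempfile.mkdtemp())
original = tmp / "ledger.db"
original.write_text("important data")
uploaded = []
backup_321(original, tmp / "backup", lambda p: uploaded.append(p.name))
print(sorted(p.name for p in (tmp / "backup").iterdir()), uploaded)
```

The multi-cloud variant is the same loop with more than one `upload` target, ideally in different geographic regions (or, for the Empire, different planets).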

Version Control

There’s being backed up, and then there’s being sure you have the right thing backed up. One thing we learn from the plans used to defeat the first Death Star is that the Empire didn’t manage version control very well. Take a close look at the Death Star schematic that Jyn and Cassian absconded with. Notice anything…off?

Yeah, that’s right. The focus lens for the superlaser is equatorial. Now, everyone knows that the Death Star’s superlaser is actually on the northern hemisphere. Which goes to show you that this backup was not even up to date! A good backup solution will run on a daily basis, or even more frequently depending on use cases. It’s clear that whatever backup strategy the Death Star team had, it had gone awry some time ago.

Death Star II Plans

“The rebels managed to destroy the first Death Star. By rebuilding the Death Star, and using it as many times as necessary to restore order, we prove that their luck only goes so far. We prove that we are the only galactic authority and always will be.”―Lieutenant Nash Windrider

We can only imagine that the architects who were tasked with quickly recreating the Death Star immediately contacted the Records Department to obtain the most recent version of the original plans. Imagine their surprise when they learned that Tarkin had destroyed the databank and they needed to work from memory. Given the Empire’s legendarily bad personnel management strategies—force-choking is a rough approach to motivation, after all—it’s easy to assume that there were corners cut to get the job done on the Emperor’s schedule.

Of course, it’s not always the case that the most recent version of a file will be the most useful. This is where Version History comes into the picture. Version History allows users to maintain multiple versions of a file over extended periods of time (including forever). If the design team from the Empire had set up Version History before bringing Galen Erso back on board, they could have reverted to the pre-final plans that didn’t have an “Insert Proton Torpedo Here To Destroy” sign on them.
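The mechanics of version history are simple enough to sketch: every save is kept, and a revert is just re-saving an older version as the newest. A toy Python illustration (the Death Star data is, of course, made up):

```python
import copy

class VersionHistory:
    """Minimal sketch: keep every saved version of a document forever."""
    def __init__(self):
        self._versions = []

    def save(self, content):
        """Store a snapshot and return its version number."""
        self._versions.append(copy.deepcopy(content))
        return len(self._versions) - 1

    def latest(self):
        return self._versions[-1]

    def revert_to(self, version):
        """Roll back by re-saving an earlier version as the newest."""
        return self.save(self._versions[version])

plans = VersionHistory()
v0 = plans.save({"exhaust_port": None})            # pre-final design
v1 = plans.save({"exhaust_port": "2m, thermal"})   # the fatal revision
plans.revert_to(v0)
print(plans.latest())  # → {'exhaust_port': None}
```

Real version-history systems add retention policies and deduplication on top, but the revert operation works just like this.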

To their credit, the Death Star II designers did avoid the two-meter-wide thermal exhaust port exploited by Luke Skywalker at the Battle of Yavin. Instead, they incorporated millions of millimeter-sized heat-dispersion tubes. Great idea! And yet, someone seemed to think it was okay to incorporate Millennium Falcon-sized access tunnels to their shockingly fragile reactor core? This oversight seems to be either a sign of an architectural team clearly stressed by the lack of reliable planning materials, or possibly it was their quiet protest at the number of their coworkers whom Darth Vader tossed around during one of his emotional outbursts.

Cloud Storage Among the Power (Force) Users

At this point it is more than clear that the rank-and-file of pretty much every major power during this era of galactic strife was terrible at data security and backup. What about the authorities, though? How do they rank? And how does their approach to backup potentially affect what we’ll learn about the future of the Galaxy in the concluding chapter of the Star Wars saga, “The Rise of Skywalker”?

There are plenty of moderately talented Jedi out there, but only a few with the kind of power marshaled by Yoda, Obi-Wan, and Luke. Just so, there are some of us for whom computer backup is about the deepest we’ll ever dive into the technology that Backblaze offers. For the more ambitious, however, there’s B2 Cloud Storage. Bear with us here, but is it possible that these Jedi Masters could be similar to the sysadmins and developers who so masterfully manipulate B2 to create archives, backups, compute projects, and more, in the cloud? Have the Jedi Masters manipulated the Force in a similar way, using it as a sort of cloud storage for their consciousness?

Force Ghosts

“If you strike me down, I shall become more powerful than you can possibly imagine.”—Obi-Wan Kenobi

Over many years, we’ve watched as force ghosts accumulate on the sidelines: First Obi-Wan, then Yoda, Anakin Skywalker, and, presumably, Luke Skywalker himself at the end of “The Last Jedi.” (Even Qui-Gon Jinn evidently figured it out after some post-mortem education.) If our base-level theory is right, and Star Wars is actually an extended metaphor for the importance of a good backup strategy, then who better to redeem the atrocious backup track record so far than the strongest Jedi the galaxy has ever known? In backing themselves up to the cloud, does “Team Force Ghost” actually present a viable recovery strategy from Darth Sidious’ unbalancing of the Force? If so, we could be witnessing one of the greatest arguments for cloud storage and computing ever imagined!

“Long have I waited…”—Darth Sidious

Of course, there’s a flip-side to this argument. If our favorite Jedi Masters were expert practitioners of cloud storage solutions, then how the heck did someone as evil as Darth Sidious find himself alive after falling to his death in the second Death Star’s reactor core? Well, there is precedent for Sith Masters’ improbable survival after falling down lengthy access shafts. Darth Maul survived being tossed down a well and being cut in half by Obi-Wan when Darth Vader was just a glimmer in Anakin Skywalker’s eye. But that was clearly a case of conveniently cauterized wounds and some amazing triage work. No, given the Imperial Fleet’s response to Darth Sidious’ death, the man was not alive at the end of the Battle of Endor by any conventional definition.

One thing we do know, thanks to Qui-Gon’s conversations with Yoda after his death, is that Dark Siders can’t become force ghosts. In short, to make the transition, one has to give in to the will of the Force—something that practitioners of the Dark Side just can’t abide.

Most theories point to the idea that the Sith can bind themselves to objects or even people during death as a means of lingering among the living. And of course there is the scene in “Revenge of the Sith” wherein Darth Sidious (disguised as Sheev Palpatine) explains how Darth Plagueis the Wise learned to cheat death. How, exactly, this was achieved is unclear, but it’s possible that his method was similar to other Sith. This is why, many speculate, we see our intrepid heroes gathering at the wreckage of the second Death Star: Because Darth Sidious’ body is tied, somehow, to the wreckage. Classic! Leave it up to old Sidious to count on a simple physical backup, in the belief that he can’t trust the cloud…

Frustrated Darth Sidious
That feeling when you realize you’re not backed up to the cloud…

You Are One With The Force, And The Force Is With You

Are we certain how the final battle of the Star Wars story will shape up? Will Light Side force wielders use cloud storage to restore their former power, aid Rey and the rest of our intrepid heroes, and defeat the Sith, who have foolishly relied on on-prem storage? No, we’re not. But from our perspective it seems likely that, when the torch was passed, George Lucas sat J.J. Abrams down and said, “J.J., let me tell you what Star Wars is really all about… data storage.”

We are certain, however, that data security and backup doesn’t need to be a battle. Develop a strategy that works for you, make sure your data is safe and sound, and check it once in a while to make sure it’s up to date and complete. That way, just like the Force, your data will be with you, always.

The post Backing Up the Death Star: How Cloud Storage Explains the Rise of Skywalker appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

How Backblaze Buys Hard Drives

Post Syndicated from Patrick Thomas original

A hand holding hard drives up.

Backblaze’s data centers may not be the biggest in the world of data storage, but thanks to some chutzpah, transparency, and wily employees, we’re able to punch well above our weight when it comes to purchasing hard drives. No one knows this better than our Director of Supply Chain, Ariel Ellis.

As the person on staff ultimately responsible for sourcing the drives our data centers need to run—some 117,658 by his last count—Ariel knows a thing or two about purchasing petabytes-worth of storage. So we asked him to share his insights on the evaluation and purchasing process here at Backblaze. While we’re buying at a slightly larger volume than some of you might be, we hope you find Ariel’s approach useful and that you’ll share your own drive purchasing philosophies in the comments below.

An Interview with Ariel Ellis, Director of Supply Chain at Backblaze

Sourcing and Purchasing Drives

Backblaze: Thanks for making time, Ariel—we know staying ahead of the burn rate always keeps you busy. Let’s start with the basics: What kinds of hard drives do we use in our data centers, and where do we buy them?

Ariel: In the past, we purchased both consumer and enterprise hard drives. We bought the drives that gave us the best performance and longevity for the price, and we discovered that, in many cases, those were consumer drives.

Today, our purchasing volume is large enough that consumer drives are no longer an option. We simply can’t get enough. High capacity drives in high volume are only available to us in enterprise models. But, by sourcing large volume and negotiating prices directly with each manufacturer, we are able to achieve lower costs and better performance than we could when we were only buying in the consumer channel. Additionally, buying directly gives us five year warranties on the drives, which is essential for our use case.

We began to purchase direct around the launch of our Vault architecture, in 2015. Each Vault contains 1,200 drives and we have been deploying two to four, or more, Vaults each month. 4,800 drives are just not available through consumer distribution. So we now purchase drives from all three hard drive manufacturers: Western Digital, Toshiba, and Seagate.

Backblaze: Of the drives we’re purchasing, are they all 7200 RPM and 3.5” form factor? Is there any reason we’d consider slower drives or 2.5” drives?

Ariel: We use drives with varying speeds, though some power-conserving drives don’t disclose their drive speed. Power draw is a very important metric for us and the high speed enterprise drives are expensive in terms of power cost. We now total around 1.5 megawatts in power consumption in our centers, and I can tell you that every watt matters for reducing costs.

As far as 2.5″ drives go, I’ve run the math and they’re not more cost-effective than 3.5″ drives, so there’s no incentive for us to use them.

Backblaze: What about other drive types and modifications, like SSD, or helium enclosures, or SMR drives? What are we using and what have we tried beyond the old standards?

Ariel: When I started at Backblaze, SSDs were more than ten times the cost of conventional hard drives. Now they’re about three times the cost. But for Backblaze’s business, three times the cost is not viable for the pricing targets we have to meet. We do use some SSDs as boot drives, as well as in our backend systems, where they are used to speed up caching and boot times, but there are currently no flash drives in our Storage Pods—not in HDD or M.2 formats. We’ve looked at flash as a way to manage higher densities of drives in the future and we’ll continue to evaluate their usefulness to us.

Helium has its benefits, primarily lower power draw, but it makes drive service difficult when that’s necessary. That said, all the drives we have purchased that are larger than 8 TB have been helium—they’re just part of the picture for us. Higher capacity drives, sealed helium drives, and other new technologies that increase the density of the drives are essential to work with as we grow our data centers, but they also increase drive fragility, which is something we have to manage.

SMR would give us a 10-15% capacity-to-dollar boost, but it also requires host-level management of sequential data writing. Additionally, these new archive-type drives require a flash-based caching layer. Both of these requirements would mean significant increases in engineering resources to support, and thereby even more investment. So, all in all, SMR isn’t cost-effective in our system.

Soon we’ll be dealing with MAMR and HAMR drives as well. We plan to test both technologies in 2020. We’re also testing interesting new tech like Seagate’s MACH.2 Multi Actuator, which allows the host to request and receive data simultaneously from two areas of the drive in parallel, potentially doubling the input/output operations per second (IOPS) performance of each individual hard drive. This offsets issues of reduced data availability that would otherwise arise with higher drive capacities. The drive also can present itself as two independent drives. For example, a 16 TB drive can appear as two independent 8 TB drives. A Vault using 60 drives per pod could present as 120 drives per pod. That offers some interesting possibilities.

Backblaze: What does it take to deploy a full vault, financially speaking? Can you share the cost?

Ariel: The cost to deploy a single vault varies between $350,000 to $500,000, depending on the drive capacities being used. This is just the purchase price though. There is also the cost of data center space, power to house and run the hardware, the staff time to install everything, and the bandwidth used to fill it. All of that should be included in the total cost of filling a vault.

Data center cold aisle
These racks don’t fill themselves.

Evaluating and Testing New Drive Models

Backblaze: Okay, so when you get to the point where the tech seems like it will work in the data center, how do you evaluate new drive models to include in the Vaults?

Ariel: First, we select drives that fit our cost targets. These are usually high capacity drives being produced in large volumes for the cloud market. We always start with test batches that are separate from our production data storage. We don’t put customers’ data on the test drives. We evaluate read/write performance, power draw, and generally try to understand how the drives will behave in our application. Once we are comfortable with the drive’s performance, we start adding small amounts to production vaults, spread across tomes in a way that does not sacrifice parity. As drive capacities increase, we are putting more and more effort into this qualification process.

We used to be able to qualify new drive models in thirty days. Now we typically take several months. On one hand, this is because we’ve added more steps to pre- and post-production testing. As we scale up, we need to scale up our care, because the effect of any issues with drives increases in line with bigger and bigger implementations. Additionally, from a simple physics perspective, a vault that uses high capacity drives takes longer to fill and we want to monitor the new drive’s performance throughout the entire fill period.

Backblaze: When it comes to the evaluation of the cost, is there a formula for $/terabyte that you follow?

Ariel: My goal is to reduce cost per terabyte on a quarterly basis—in fact, it’s a part of how my job performance is evaluated. Ideally, I can achieve a 5-10% cost reduction per terabyte per quarter, which is a number based on historical price trends and our performance for the past 10 years. That savings is achieved in three primary ways: 1) lowering the actual cost of drives by negotiating with vendors, 2) occasionally moving to higher drive densities, and 3) increasing the slot density of pod chassis. (We moved from 45 drives to 60 drives in 2016, and as we look toward our next Storage Pod version we’ll consider adding more slots per chassis).
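That quarterly target compounds quickly. A quick back-of-the-envelope calculation (the $25/TB starting figure is purely illustrative, not an actual Backblaze number):

```python
def cost_after_quarters(start_cost_per_tb, quarterly_reduction, quarters):
    """Compound a per-quarter cost reduction, e.g. 7% per quarter for 8 quarters."""
    return start_cost_per_tb * (1 - quarterly_reduction) ** quarters

# Two years (8 quarters) at the low and high ends of the 5-10% target range:
for rate in (0.05, 0.10):
    cost = cost_after_quarters(25.0, rate, 8)
    print(f"{rate:.0%}/quarter for 2 years: ${cost:.2f}/TB")
```

At the 10% end, the cost per terabyte falls by more than half over two years, which shows why the three levers (vendor pricing, drive density, and slot density) all have to keep moving.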

Backblaze Director of Supply Chain holding World's Largest SSD Nimbus Data ExaDrive DC100 100TB
Backblaze Director of Supply Chain, considering the future…

Meeting Storage Demand

Backblaze: When it comes to how this actually works in our operating environment, how do you stay ahead of the demand for storage capacity?

Ariel: We maintain several months of the drive space that we would need to meet capacity based on predicted demand from current customers as well as projected new customers. Those buffers are tied to what we expect will be the fill-time of our Vaults. As conditions change, we could decide to extend those buffers. Demand could increase unexpectedly, of course, so our goal is to reduce the fill-time for Vaults so we can bring more storage online as quickly as possible, if it’s needed.

Backblaze: Obviously we don’t operate in a vacuum, so do you worry about how trade challenges, weather, and other factors might affect your ability to obtain drives?

Ariel: (Laughs) Sure, I’ve got plenty to worry about. But we’ve proved to be pretty resourceful in the past when we’re challenged. For example: During the worldwide drive shortage, due to flooding in Southeast Asia, we recruited an army of family and friends to buy drives all over and send them to us. That kept us going during the shortage.

We are vulnerable, of course, if there’s a drive production shortage. Some data center hardware is manufactured in China, and I know that some of those prices have gone up. That said, all of our drives are manufactured in Thailand or Taiwan. Our Storage Pod chassis are made in the U.S.A. Big picture, we try to anticipate any shortages and plan accordingly if we can.

A pile of consumer hard drives still in packaging
A Hard Drive Farming Harvest.

Data Durability

Backblaze: Time for a personal question… What does data durability mean to you? What do you do to help boost data durability, and spread drive hardware risk and exposure?

Ariel: That is personal. (Laughs). But also a good question, and not really personal at all: Everyone at Backblaze contributes to our data durability in different ways.

My role in maintaining eleven nines of durability is, first and foremost: Never running out of space. I achieve this by maintaining close relationships with manufacturers to ensure production supply isn’t interrupted; by improving our testing and qualification processes to catch problems before drives ever enter production; and finally by monitoring performance and replacing drives before they fail. Otherwise it’s just monitoring the company’s burn rates and managing the buffer between our drive capacity and our data under management.

When we are in a good state for space considerations, then I need to look to the future to ensure I’m providing for more long-term issues. This is where iterating on and improving our Storage Pod design comes in. I don’t think that gets factored into our durability calculus, but designing for the future is as important as anything else. We need to be prepared with hardware that can support ever-increasing hard drive capacities—and the fill- and rebuild times that come with those increases—effectively.

Backblaze: That begs the next question: As drive sizes get larger, rebuild times get longer when it’s necessary to recover data on a drive. Is that still a factor, given Backblaze’s durability architecture?

Ariel: We attempt to identify and replace problematic drives before they actually fail. When a drive starts failing, or is identified for replacement, the team always attempts to restore as much data as possible off of it because that ensures we have the most options for maintaining data durability. The rebuild times for larger drives are challenging, especially as we move to 16 TB and beyond. We are looking to improve the throughput of our Pods before making the move to 20 TB in order to maintain fast enough rebuild times.

And then, supporting all of this is our Vault architecture, which ensures that data will be intact even if individual drives fail. That’s the value of the architecture.

Longer term, one thing we’re looking toward is phasing out the SATA controller/port multiplier combo. This might be more technical than some of our readers want to go, but: SAS controllers are a more commonly used method in dense storage servers. Using SATA drives with SAS controllers can provide as much as a 2x improvement in system throughput versus SATA port multipliers, which is important to me, even though serial ATA (SATA) port multipliers are slightly less expensive. When we started our Storage Pod construction, using the SATA controller/port multiplier combo was a great way to keep costs down. But since then, the cost of using SAS controllers and backplanes has come down significantly.

But now we’re preparing for how we’ll handle 18 and 20 TB drives, and improving system throughput will be extremely important to manage that density. We may even consider using SAS drives even though they are slightly more expensive. We need to consider all options in order to meet our scaling, durability and cost targets.

Hard drives in wrappers
A Vault in the Making.

Backblaze’s Relationship with Drive Manufacturers

Backblaze: So, there’s an elephant in the room when it comes to Backblaze and hard drives: Our quarterly Hard Drive Stats reports. We’re the only company sharing that kind of data openly. How have the Drive Stats blog posts affected your purchasing relationship with the drive manufacturers?

Ariel: Due to the quantities we need and the visibility of the posts, drive manufacturers are motivated to give us their best possible product. We have a great purchasing relationship with all three companies and they update us on their plans and new drive models coming down the road.

Backblaze: Do you have any sense for what the hard drive manufacturers think of our Drive Stats blog posts?

Ariel: I know that every drive manufacturer reads our Drive Stats reports, including very senior management. I’ve heard stories of company management learning of the release of a new Drive Stats post and gathering together in a conference room to read it. I think that’s great.

Ultimately, we believe that Drive Stats is good for consumers. We wish more companies with large data centers did this. We believe it helps keep everyone open and honest. The adage is that competition is ultimately good for everyone, right?

It’s true that Western Digital, at one time, was put off by the visibility Drive Stats gave into how their models performed in our data centers (which we’ve always said is a lot different from how drives are used in homes and most businesses). Then they realized the marketing value for them—they get a lot of exposure in the blog posts—and they came around.

Backblaze: So, do you believe that the Drive Stats posts give Backblaze more influence with drive manufacturers?

Ariel: The truth is that most hard drives go directly into tier-one and -two data centers, and not into smaller data centers, homes, or businesses. The manufacturers are stamping out drives in exabyte chunks. A single tier-one data center consumes maybe 500,000 times what Backblaze does in drives. We can’t compare in purchasing power to those guys, but Drive Stats does give us visibility and some influence with the manufacturers. We have close communications with the manufacturers and we get early versions of new drives to evaluate and test. We’re on their radar and I believe they value their relationship with us, as we do with them.

Backblaze: A final question. In your opinion, are hard drives getting better?

Ariel: Yes. Drives are amazingly durable for how hard they’re used. Just think of the forces inside a hard drive, how hard they spin, and how much engineering it takes to write and read the data on the platters. I came from a background in precision optics, which requires incredibly precise tolerances, and was shocked to learn that hard drives are designed in an equally precise tolerance range, yet are made in the millions and sold as a commodity. Despite all that, they have only about a 2% annual failure rate in our centers. That’s pretty good, I think.

Thanks, Ariel. Here’s hoping the way we source petabytes of storage has been useful for your own terabyte, petabyte, or… exabyte storage needs? If you’re working on the latter, or anything between, we’d love to hear about what you’re up to in the comments.

The post How Backblaze Buys Hard Drives appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

The Entrepreneur’s Guide to Backing Up: Startup Edition

Post Syndicated from Patrick Thomas original

a vault with Backblaze Storage Pods inside

Editor’s Note: At Backblaze, the entrepreneurial spirit is in our DNA: Our founders started out in a cramped Palo Alto apartment, worked two years without pay, and bootstrapped their way to the stable, thriving startup we are today. We’ve even written a series about some of the lessons we learned along the way.

But Backblaze is a sample size of one, so we periodically reach out to other experts to expand our startup test cases and offer more insights for you. In that regard, we’re happy to invite Lars Lofgren to the blog.

As the CEO of Quick Sprout—a business focused on helping founders launch and optimize their businesses—Lars is a wellspring of case studies on how businesses both do and do not succeed. We asked Lars for advice on one subject near and dear to our hearts: Business Backup. He’s boiled down his learnings into a quick guide for backing up your business today. We hope you find Lars’ guidance and stories useful—if you have any tips or experience with business backup please share them in the comments below.

Proof of Backblaze's start in a Palo Alto Apartment
Backblaze’s first office. All computers pictured were definitely backed up.

How to Make Your Business Unbreakable

by Lars Lofgren, CEO, Quick Sprout

Launching a new business is thrilling. As someone who has been in on the ground floor of several startups, there’s nothing else like it: You’re eager to get started and growth seemingly can’t come soon enough.

But in the midst of all of this excitement, there’s a to-do list that’s one-thousand tasks deep. You’ve already gone through the tedious process of registering your business, applying for an EIN, opening new bank accounts, and launching your website. So many entrepreneurs want to dive right into generating revenue, but there’s still a ton to do.

Backing anything up is usually the last thing anyone wants to think about. But backups need to be a priority for every new business owner, because losing precious data or records could be beyond detrimental to your success. A computer accident, office fire, flood, ransomware attack, or some other unforeseen calamity could set you back months, years, or in many cases, end your business entirely. Earlier this week, I watched a fire in my neighborhood completely engulf a building. Three businesses went up in smoke and the fire department declared a total loss for all three. I hope they had backups.

Spending a little time on backups in the early stages of your launch won’t just save your company from disaster, it will make your business unbreakable.

Even if your company has been in business for a while, there’s still time for you to implement a data backup plan before it’s too late. And knowing what to back up will save you time, money, and countless headaches. Here are six simple steps to guide you:

Backing Up Hard Drives

Hard drives are like lightbulbs. It’s not a matter of if they will go out, it’s a matter of when.

Drives Have 3 Distinct Failure Rates

As more time passes, there is a greater chance that your hard drives will fail. For those of you who are interested in learning more about hard drive failures, Andy Klein, Backblaze’s Director of Compliance, recently published the latest hard drive statistics here.

Take a moment to think about all of the crucial information that’s been compiled on your hard drive over the last few months. Now, imagine that information getting wiped clean. One morning you wake up and it’s just gone without a trace. In the blink of an eye, you’re starting from nothing. It’s a scary thought and I’ve seen it happen to too many people. Losing files at the wrong moment could cause you to miss out on a critical deal or delay major projects indefinitely. Timing is everything.

So when it comes to your hard drives, you need to set up some type of daily backups as soon as possible. Whatever backup tool you decide to go with, just make sure you’re fully covered and prepared for the worst. The goal is to be able to fully recover a hard drive at a moment’s notice.

Once you’ve covered that first step, consider adding a cloud backup solution. Cloud storage is much more reliable than a series of physical backup drives.

Backing Up Email

I would be lost without email.

For me, this might actually be the most important part of my business to back up. My email includes all of my contacts, my entire work history, and the logins for all of my accounts. Everything I do on a day-to-day basis stems from my email. You might not rely on it as heavily as I do, but I’m sure that email still plays a crucial role in your business.

Today, most of us are already using cloud services, like G Suite, so we rarely think about backing up our email. And it’s true that your email won’t be lost if your computer gets damaged or your hard drive fails. But if you lost access to your login or your email account was corrupted, it would be devastating.

And it does happen. I’ve come across a few folks who were locked out of their email accounts by Google with no explanation. I’m sure there are bad actors out there abusing Google’s tools, but it’s also very possible for accounts to be accidentally shut down, too.

Even normal business operations result in lost email and documents. If your business has employees, put this at the top of your priority list. Any turnover usually results in losing that employee’s email history. For the most part, their emails will be deleted when the user is removed from your system, but there’s a good chance that you’re going to need access to those emails. Just because that employee is gone, it doesn’t mean that their responsibilities disappear.

While it’s possible to export your G Suite data, you’d then be on the hook for doing this regularly and storing your exports securely. In my opinion, this requires too much manual work and leaves room for error.

I’d recommend going through the G Suite Marketplace to find an app that can handle all of your backups automatically in the cloud. (Editor’s note: For the easiest, most reliable solution, we recommend Google Vault.) Once you set this up, you’ll never have to worry about your G Suite data again. Even if it somehow gets corrupted, you’ll always be able to restore it quickly.

What about Office 365 and Outlook? It’s easy to back up Outlook manually by exporting your entire inbox. There are also ways to back up your company’s email with Exchange Online. The best method will depend on your exact implementation of Outlook at your company.

For those of you managing email on your own network who don’t plan to move to a cloud-based email service, just ensure your existing backups cover your email or find a way to ensure they do as soon as possible.

Backing Up Your Website

If your website goes down, or, even worse, you become a victim of malware, you’ll lose the lifeblood of your business: new customers.

People hack websites all the time in order to spread viruses and malware. Small businesses and startups are an easy target for cybercriminals because their sites often aren’t protected as well as those of larger companies. If something horrible like this happens, you’ll need to reset your entire site to protect your business, customers, and website visitors.

This process is a whole lot easier when you have website backups. So, when you create your website, make daily backups a priority from the outset. Start with your web host. Contact them to see what kind of backups they offer. Their answer could ultimately sway you to use one web host over another. For those of you who are using WordPress, there are lots of different plugins that offer regular backups. I’ve reviewed the best options and covered this topic more extensively here.

Generally speaking, website backups will not be free. But paying for a high-quality backup solution is well worth the cost, and far less expensive than the price of recovering from a total loss without backups.

This will also protect you and your employees from the fallout of launching a bug that accidentally brings the whole website down. Unfortunately, this happens more often than any of us would like to admit. Backups make this an embarrassing error, rather than a fatal one.

Backing Up Paperwork

Being 100% paper-free isn’t always an option. Even though the vast majority of documentation has transitioned to digital, there are still some forms that stubbornly remain in paper. No matter how hard I try, I still get stuck with paper documents for physical receipts, some tax filings, and some government paperwork.

When you launch your business, you will generate a batch of paper records that will slowly grow over time. Keeping these papers neatly organized in a filing cabinet is important, but this only helps with storage. Paper documents are still vulnerable to theft, flooding, fire, and other physical damage. So why not just digitize them quickly and be done with it? Not only will this free up extra space around the office, but it will also give you peace of mind about losing your files in a catastrophe.

The easiest way to back up your paperwork is to get a scanner, scan your documents, and then upload them to the cloud with the rest of your files. You can forget about them until they’re necessary.

It’s in your best interest to do this with your existing paper files immediately. Then make it part of your process whenever you get physical paperwork. If you wait too long, not only are you susceptible to losing important files, but the task will only grow more tedious and time-consuming.

Backing Up Processes

Not many companies think about it, but failing to back up processes has easily caused me the most grief of any item in this post. In a perfect world, all of your staff will give you plenty of notice before they leave. This will give you time to fill the position and have that employee train the next person in their remaining weeks. But you and I both know that the world isn’t perfect.

Things happen. Employees leave on the spot or do something egregious that results in an immediate firing. Not everyone leaving your business will end on good terms, so you can’t bank on them being helpful during a transitional period. And when people leave your company, their knowledge is lost forever.

If those processes aren’t written down, training someone else can be extremely difficult, and nearly impossible if a top-tier employee leaves. The only way to prevent this is by turning all processes into standard operating procedures, better known as SOPs. Then you just need to store these SOPs somewhere that is also backed up, whether that is your hard drive (as mentioned above) or in a project management tool like Confluence, Notion, or even a folder in your Google Drive. As long as you have your SOPs saved on some sort of cloud backup solution, they’ll always be there when you need to access them.

Backing Up Software Databases

If you run a software business or use software for any internal tools, you need to get backups set up for all of your databases. Similar to your hard drives, sooner or later one of them will go down.

When I was at KISSmetrics, we had an engineer shut down our core database for the entire product by accident. When someone makes a mistake like that they don’t always act rationally. Instead of notifying management immediately, this engineer walked away and went to bed. The database was down overnight until the following morning. While we had some backups, we still lost about twelve hours worth of customer data. Without those backups, it would have been even worse.

The more critical the database, the more robust the backup solution needs to be. As I said before, you need to plan for the worst. Sometimes a daily backup might not be good enough if the database is super critical. If you can’t afford to lose 24 hours worth of information, then you’ll need a solution that backs up at the frequency your business requires.

Work with your engineering team to make sure all core functionality is completely redundant. Customers can tolerate their login page being down for a short period, but they won’t tolerate permanent data loss.

Final Thoughts on Business Backup

I know, your list of things to do when you start a new business just got longer! But backing up your data, files, and other important information is crucial for every business across all industries. You can’t operate under the assumption that you’re immune from these pitfalls. Sooner or later, they happen to all of us. Whether it be physical damage to a hard drive, theft to a computer, human error, or a malicious attack against your website, you must limit your exposure.

But good news: Once your backups are in place, your business will be unbreakable.

Editor’s Note: If you’ve read this far, you’re likely very serious about backing up your business—or maybe you’re just passionate about the process? Either way, Lars has outlined a lot of the “whys” and plenty of good “hows” for you here, but we’d love to help you tick a few things off of your list. Here are a few notes for how you can implement Lars’ advice using Backblaze:

Backing up your…

…Hard Drives:

This is an easy one: Backup is the core of what we do, and backing up your computers and hard drives is the easiest first step you’ll take. And now, if you opt for Forever Version History, you only need to hook up your older drives once.

…Email… and Paperwork, Process, and Database:

If your email is already with a cloud service, you’ve got one backup, but if you are using Outlook, Apple Mail, or other applications storing email locally on your computer, Backblaze will automatically back those up.


…Website:

As Lars mentioned, a lot of hosting services offer backup options. But especially if you’re looking for WordPress backups, we have you covered.

Another option to consider is using Cloudflare or other caching services to prevent “soft downtime.” If you’ve engaged with Backblaze, we have a partnership with Cloudflare to make this solution easier.

Now that you’re all backed up and have some extra time and peace of mind, we’d love to hear more about your business: How does your infrastructure help you succeed?

The post The Entrepreneur’s Guide to Backing Up: Startup Edition appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

SMART Stats Exposed — a Drive Stats Remix

Post Syndicated from Patrick Thomas original

SMART Stats On Trial

Editor’s Note:  Since 2013, Backblaze has published statistics and insights based on the hard drives in our data centers. Why? Well, we like to be helpful, and we thought sharing would help others who rely on hard drives but don’t have reliable data on performance to make informed purchasing decisions. We also hoped the data might aid manufacturers in improving their products. Given the millions of people who’ve read our Hard Drive Stats posts and the increasingly collaborative relationships we have with manufacturers, it seems we might have been right.

But we don’t only share our take on the numbers, we also provide the raw data underlying our reports so that anyone who wants to can reproduce them or draw their own conclusions, and many have. We love it when people reframe our reports, question our logic (maybe even our sanity?), and provide their own take on what we should do next. That’s why we’re featuring Ryan Smith today.

Ryan has held a lot of different roles in tech, but lately he’s been dwelling in the world of storage as a product strategist for Hitachi. On a personal level, he explains that he has a “passion for data, finding insights from data, and helping others see how easy and rewarding it can be to look under the covers.” It shows.

A few months ago we happened on a post by Ryan with an appealing header featuring our logo with an EXPOSED stamp superimposed in red over our humble name. It looked like we had been caught in a sting operation. As a company that loves transparency, we were delighted. Reading on we found a lot to love and plenty to argue over, but more than anything, we appreciated how Ryan took data we use to analyze hard drive failure rates and extrapolated out all sorts of other gleanings about our business. As he puts it, “it’s not the value at the surface but the story that can be told by tying data together.” So, we thought we’d share his original post with you to (hopefully) incite some more arguments and some more tying together of data.

While we think his conclusions are reasonable based on the data available to him, the views and analysis below are entirely Ryan’s. We appreciate how he flagged some areas of uncertainty, but thought it most interesting to share his thoughts without rebuttal. If you’re curious about how he reached them, you can find his notes on process here. He doesn’t have the full story, but we think he did amazing work with the public data.

Our 2019 Q3 Hard Drive Stats post will be out in a few weeks, and we hope some of you will take Ryan’s lead and do your own deep dive into the reporting when it’s public. For those of you who can’t wait, we’re hoping this will tide you over for a little while.

If you’re interested in taking a look at the data yourselves, here’s our Hard Drive Data and Stats webpage that has links to all our past Hard Drive Stats posts and zip files of the raw data.

Ryan Smith Uses Backblaze’s SMART Stats to Illustrate the Power of Data

(Originally published July 8, 2019 on

It is now common practice for end-customers to share telemetry (call home) data with their vendors. My analysis below shares some insights about your business that vendors might gain from seemingly innocent data that you are sending them every day.

On a daily basis, Backblaze (a cloud backup and storage provider) logs all its drive health data (aka SMART data) for over 100,000 of its hard drives. With 100K+ records a day, each year can produce over 30 million records. They share this raw data on their website, but most people probably don’t really dig into it much. I decided to see what this data could tell me and what I found was fascinating.

Rather than looking at nearly 100 million records, I decided to only look at just over one million which consisted of the last day of every quarter from Q1’16 to Q1’19. This would give me enough granularity to see what is happening inside Backblaze’s cloud backup storage business. For those interested, I used MySQL to import and transform the data into something easy to work with (see more details on my SQL query); I then imported the data into Excel where I could easily pivot the data and look for insights. Below are the results of this effort.
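The filtering step above can be sketched in a few lines of pandas as well as MySQL. The tiny frame below stands in for the real daily CSVs (the column names follow the public dataset; everything else here is illustrative):

```python
import pandas as pd

# Keep only the last day of each quarter from the daily Drive Stats
# records (one row per drive per day). This toy frame stands in for the
# concatenated public CSVs.
df = pd.DataFrame({
    "date": pd.to_datetime(["2016-03-30", "2016-03-31", "2016-06-30"]),
    "serial_number": ["A1", "A1", "A1"],
    "capacity_bytes": [4_000_787_030_016] * 3,
})

quarter_ends = pd.date_range("2016-03-31", "2019-03-31", freq="Q")
snapshots = df[df["date"].isin(quarter_ends)]

# Only the quarter-end rows survive; at ~100K drives/day, 13 snapshots
# is the "just over one million" records described above.
print(len(snapshots))  # 2
```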

Capacity Growth

User Data vs Physical Capacity

User Data Stored vs Physical Capacity

I grabbed the publicly posted “Petabytes stored” figure that Backblaze claims on their website (“User Petabytes”) and compared it to the total capacity from the SMART data they log (“Physical Petabytes”) to see how much overhead or unused capacity they have. The Theoretical Max (green line) is based on the ECC protection scheme (13+2 and/or 17+3) that they use to protect user data. If the “% User Petabytes” is below that max, then Backblaze either has unused capacity or they didn’t update their website with the actual data stored.
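The theoretical max follows directly from the shard arithmetic of the two protection schemes (13 data + 2 parity and 17 data + 3 parity): only the data shards hold user bytes. A quick check:

```python
# User-data fraction under the two erasure-coding layouts mentioned above:
# 13 data + 2 parity shards, and 17 data + 3 parity shards per stripe.
def user_fraction(data_shards: int, parity_shards: int) -> float:
    return data_shards / (data_shards + parity_shards)

print(round(user_fraction(13, 2), 4))  # 0.8667 -> ~86.7% of raw capacity
print(round(user_fraction(17, 3), 4))  # 0.85   -> 85% of raw capacity
```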

Data Read/Written vs Capacity Growth

Reads/Writes versus Capacity Growth (Year-over-Year)

Looking at the last two years, by quarter, you can see a healthy amount of year-over-year growth in their write workload; roughly 80% over the last four quarters! This is good, since writes likely correlate with new user data, which means broader adoption of their offering. For some reason their read workloads spiked in Q2’17 and have maintained a higher level since then (as indicated by the YoY spikes from Q2’17 to Q1’18, then settling back to less than 50% YoY since). My guess is this was likely driven by a change to their internal workload rather than a migration, because I didn’t see subsequent negative YoY reads.


Now let’s look at some performance insights. A quick note: Only Seagate hard drives track the needed information in their SMART data in order to get insights about performance. Fortunately, roughly 80% of Backblaze’s drive population (both capacity and units) are Seagate so it’s a large enough population to represent the overall drive population. Going forward, it does look like the new 12 TB WD HGST drive is starting to track bytes read/written.

Pod (Storage Enclosure) Performance

Pod (Hard Drive Enclosure) Performance

Looking at the power-on hours of each drive, I was able to calculate the vintage of each drive and the number of drives in each “pod” (the terminology Backblaze uses for its storage enclosures). This lets me calculate the number of pods that Backblaze has in its data centers. Their original pods stored 45 drives, which improved to 60 drives in ~Q2’16 (according to past blog posts by Backblaze). The power-on date allowed me to place each drive into the appropriate enclosure type and provide you with pod statistics like Mbps per pod. This is definitely an educated guess, as some newer-vintage drives are replacements placed into older enclosures, but the overall percentage of drives that fail is low enough that these figures should be pretty accurate.
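The power-on-hours back-calculation looks like this (a sketch: the exact Q2’16 cutover date and the assumption that drives run 24/7 from deployment are approximations):

```python
from datetime import datetime, timedelta

# Back-calculate a drive's deployment date from SMART power-on hours,
# then bucket it into the 45- or 60-drive pod era (cutover ~Q2'16,
# per past Backblaze blog posts). Assumes drives run 24/7 once racked.
def deployed_date(snapshot: datetime, power_on_hours: int) -> datetime:
    return snapshot - timedelta(hours=power_on_hours)

def pod_type(deploy: datetime) -> int:
    return 60 if deploy >= datetime(2016, 4, 1) else 45

snap = datetime(2019, 3, 31)
d = deployed_date(snap, 30_000)  # ~3.4 years of power-on time
print(d.year, pod_type(d))       # 2015 45
```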

Backblaze has stated that they can achieve up to 1 Gbps per pod, but as you can see they are only reaching an average throughput of 521 Mbps. I have to admit I was surprised to see such a low performance figure, since I believe their storage servers are equipped with 10 Gbps Ethernet.

Overall, Backblaze’s data centers are handling over 100 GB/s of throughput across all their pods which is quite an impressive figure. This number keeps climbing and is a result of new pods as well as overall higher performance per pod. From quick research, this is across three different data centers (Sacramento x 2, Phoenix x 1) and maybe a fourth on its way in Europe.

Hard Drive Performance

Hard Drive Read/Write Performance

Since each pod holds between 45 and 60 drives, with an overall max pod performance of 1 Gbps, I wasn’t surprised to see such low average per-drive performance. You can see that Backblaze’s workload is read-heavy, with reads at less than 1 MB/s and writes at only a third of that. Just to put that in perspective, these drives can deliver over 100 MB/s, so Backblaze is not pushing the limits of these hard drives.

As discussed earlier, you can also see how the read workload changed significantly in Q2’17 and has not reverted back since.

Seagate Hard Drive Read/Write Performance, by Density

As I expected, the read and write performance is highly correlated to the drive capacity point. So, it appears that most of the growth in read/write performance per drive is really driven by the adoption of higher density drives. This is very typical of public storage-as-a-service (STaaS) offerings where it’s really about $/GB, IOPS/GB, MBs/GB, etc.

As a side note, the black dashed lines (average between all densities) should correlate with the previous chart showing overall read/write performance per drive.


Switching gears, let’s look at Backblaze’s purchasing history. This will help suppliers look at trends within Backblaze to predict future purchasing activities. I used power-on-hours to calculate when a drive entered the drive population.

Hard Drives Purchased by Density, by Year

Hard Drives Purchased by Capacity

This chart helps you see how Backblaze standardized on 4 TB, 8 TB, and now 12 TB densities. The number of drives that Backblaze purchases every year had been climbing until 2018, which saw the first decline in units. However, this is mainly due to the gains in capacity per drive.

A question to ponder: Did 2018 reach a point where capacity growth per HDD surpassed the actual demand required to maintain unit growth of HDDs? Or is this trend limited to Backblaze?

Petabytes Purchased by Quarter

Drives/Petabytes Purchased, by Quarter

This looks at the number of drives purchased over the last five years, along with the amount of capacity added. It’s not quite regular enough to spot a trend, but you can quickly spot that the amount of capacity purchased over the last two years has grown dramatically compared to previous years.

HDD Vendor Market Share

Hard Drive Supplier Market Share

Western Digital/WDC, Toshiba/TOSYY, Seagate/STX

Seagate is definitely the preferred vendor, capturing almost 100% of the market share save for a few quarters where WD HGST wins 50% of the business. This information could be used by Seagate or its competitors to understand where each stands within the account for future bids. However, the industry is an oligopoly, so it’s not hard to guess who won the business if a given HDD vendor didn’t.


Drive Population by Quarter

Total Drive Population, by Quarter

This shows the total drive population over the past three years. Even though the number of drives being purchased has been falling lately, the overall drive population is still growing.

You can quickly see that the 4 TB drive population peaked in Q1’17 and has rapidly declined since. In fact, let’s look at the same data but with a different type of chart.

Total Drive Population, by Quarter

That’s better. We can see that 12 TBs really had a dramatic effect on both 4 TB and 8 TB adoption. In fact, Backblaze has been proactively retiring 4 TB drives. This is likely due to the desire to slow the growth of their data center footprint which comes with costs (more on this later).

As a drive vendor, I could use the 4 TB trend to calculate how much drive replacement will occur next quarter, along with natural PB growth. I will look more into Backblaze’s drive/pod retirement later.

Current Drive Population, by Deployed Date

Q1'2019 Drive Population, by Deployed Date

Be careful when interpreting this graph. What we are looking at here is the Q1’19 drive population, where the date on the x-axis is the date the drive entered the population. This helps you see that, of all the drives in Backblaze’s population today, the oldest are from 2015 (with the exception of a few stragglers).

This indicates that the useful life of drives within Backblaze’s data centers is ~4 years. In fact, a later chart will look at how drives/pods are phased out, by year.

Along the top of the chart, I noted when the 60-drive pods started entering the mix. The rack density is much more efficient with this design (rather than the 45-drive pod). Combine this with the 4 TB to 12 TB density gains, and you can see why Backblaze has been aggressively retiring its 4 TB/45-drive enclosures. There is still a large population of these remaining, so expect some further migration to occur.

Boot Drive Population

Total Boot Drive Population, by Quarter

This is the overall boot drive population over time. You can see that it is currently dominated by the 500 GB model, with only a few of the smaller densities remaining in the population today. For some reason, Toshiba has been the preferred vendor, with Seagate only recently gaining some new business.

The boot drive population is also an interesting data point to use for verifying the number of pods in the population. For example, there were 1,909 boot drives in Q1’19 and my calculation of pods based on the 45/60-drive pod mix was 1,905. I was able to use the total boot drives each quarter to double check my mix of pods.
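Assuming one boot drive per pod, which is what makes this cross-check work, the 45/60 mix can be solved from just two totals. A sketch with illustrative numbers close to the ones above:

```python
# Solve for the 45- vs 60-drive pod mix from two observed totals:
# total data drives, and total pods (one boot drive per pod).
# The totals below are illustrative, not exact figures from the dataset.
def pod_mix(total_data_drives: int, total_pods: int):
    # n45 + n60 = total_pods;  45*n45 + 60*n60 = total_data_drives
    n60 = (total_data_drives - 45 * total_pods) // 15
    n45 = total_pods - n60
    return n45, n60

print(pod_mix(102_300, 1_905))  # (800, 1105)
```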

Pods (Drive Enclosures)

As discussed earlier, pods are the drive enclosures that house all of Backblaze’s hard drives. Let’s take a look at a few more trends that show what’s going on within the walls of their data center.

Pods Population by Deployment Date

Pods (HDD Enclosure) Population by Deployment Date

This one is interesting. Each line in the graph is a snapshot in time of the total population, and the x-axis represents the vintage of the pods in that snapshot. Comparing snapshots lets you see changes to the population over time: namely, new pods being deployed and old ones being retired. To capture this, I looked at the last day of Q1 data for each of the last four years and calculated the date each drive entered the population. Using the “Power On Date,” I was able to deduce the type of pod (45- or 60-drive) it was deployed in.
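The snapshot-by-vintage method described above can be sketched as follows. The records here are made up for illustration; each snapshot is just a list of power-on dates for the drives present at that moment, and grouping by power-on quarter gives one line of the chart:

```python
# Sketch of the snapshot-by-vintage method: bucket a snapshot's drives by
# the quarter they entered the population. Dates below are made up.
from collections import Counter
from datetime import date

def quarter(d: date) -> str:
    """Format a date as its calendar quarter, e.g. '2016Q2'."""
    return f"{d.year}Q{(d.month - 1) // 3 + 1}"

def vintage_histogram(power_on_dates: list[date]) -> Counter:
    """Count drives in a snapshot by the quarter they entered the population."""
    return Counter(quarter(d) for d in power_on_dates)

# One (tiny, hypothetical) Q1'19 snapshot:
snapshot_q1_19 = [date(2015, 7, 1), date(2016, 5, 12), date(2016, 6, 3),
                  date(2018, 11, 20)]
print(vintage_histogram(snapshot_q1_19))

# Diffing histograms from successive Q1 snapshots shows which vintages
# were retired in between.
```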

Some insights from this chart:

  • From Q2’16 to Q1’17, they retired some pods from 2010-11
  • From Q2’17 to Q1’18, they retired a significant number of pods from 2011-14
  • From Q2’18 to Q1’19, they retired pods from 2013-2015
  • Pods that were deployed since late 2015 have been untouched (you can tell this by seeing the lines overlap with each other)
  • The most pods deployed in a quarter was 185 in Q2’16
  • Since Q2’16, the number of pods deployed has been declining, on average; this is due to the increase in # of drives per pod and density of each drive
  • There are still a significant number of 45-drive pods to retire

Pods Deployed/Retired

Total Pods (HDD Enclosure) Population

Totaling up all the pods being deployed and retired makes it easier to see the yearly changes happening within Backblaze’s operation. Keep in mind that these are all calculations and may erroneously count drive replacements as new pods, but I don’t expect the result to vary significantly from what is shown here.

The data shows that any new pods that have been deployed in the past few years have mainly been driven by replacing older, less dense pods. In fact, the pod population has plateaued at around 1,900 pods.

Total Racks

Total Racks

Based on blog posts, Backblaze’s pods are all designed at 4U (four rack units), and pictures on their site indicate 10 pods fit in a rack; this equates to 40U racks. Using this information, along with the drive population and the power-on date, I was able to calculate the number of pods on any given date as well as the total number of racks. I did not include their networking racks, of which I believe there are two per row in their data center.
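The rack math above is simple enough to sketch directly (the pod count is the roughly 1,900 estimated earlier; networking racks are excluded, as in the chart):

```python
# Back-of-the-envelope rack math from the assumptions above:
# 4U pods, 10 pods per rack (40U of pod space per rack).
POD_HEIGHT_RU = 4
PODS_PER_RACK = 10

def racks_needed(total_pods: int) -> int:
    """Round up: a partially filled rack is still a rack."""
    return -(-total_pods // PODS_PER_RACK)  # ceiling division

# With the ~1,900 pods estimated earlier, excluding networking racks:
print(racks_needed(1_900))  # -> 190 racks
```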

You can quickly see that Backblaze has done a great job at slowing the growth of the racks in their data center. This all results in lower costs for their customers.

Retiring Pods

What interested me when looking at Backblaze’s SMART data was the fact that drives were being retired more often than they were failing, meaning the cost of failures is fairly insignificant in the scheme of things. It is actually efficiencies driven by technology improvements, such as drive and enclosure densities, that drive most of the replacement activity. However, the benefits must outweigh the costs. Since Backblaze uses Sungard AS for its data centers, let’s try to visualize the benefit of retiring drives/pods.

Colocation Costs, Assuming a Given Density

Yearly Colocation Costs, Assuming One Drive Density

This shows the total capacity over time in Backblaze’s data centers, along with the colocation costs assuming all the drives were a given density. As you can see, in Q1’19 it would take $7.7M a year to cover the colocation costs of 861 PB if all the drives were 4 TB in size. Moving the entire population to 12 TB reduces this to $2.6M. So, just changing the drive density can have a significant impact on Backblaze’s operational costs. I assumed $45/RU in the analysis, though their actual costs may be as low as $15/RU given the scale of their operation.
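The chart’s math can be sketched as follows. I’m assuming here that the $45/RU rate is a monthly rate and that every drive sits in a 4U, 60-drive pod; under those assumptions the calculation reproduces the $7.7M and $2.6M figures above:

```python
# Yearly colocation cost if the entire 861 PB fleet were a single drive
# density. Assumes 4U, 60-drive pods and treats the $45/RU rate as
# monthly; these assumptions reproduce the article's $7.7M/$2.6M figures.
RU_PER_POD = 4
DRIVES_PER_POD = 60
COST_PER_RU_MONTH = 45.0

def yearly_colo_cost(total_pb: float, drive_tb: float) -> float:
    """Dollars per year to house total_pb of data on drives of drive_tb each."""
    drives = total_pb * 1000 / drive_tb
    pods = drives / DRIVES_PER_POD
    return pods * RU_PER_POD * COST_PER_RU_MONTH * 12

for tb in (4, 8, 12, 32):
    print(f"{tb:>2} TB: ${yearly_colo_cost(861, tb) / 1e6:.1f}M/yr")
# -> 4 TB: $7.7M/yr ... 12 TB: $2.6M/yr
```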

I threw in a 32 TB density to illustrate a hypothetical SSD-class capacity so you can see the colocation cost savings of moving to SSDs. Although the colocation costs are lower, the acquisition costs are far too high at the moment to justify the move.

Break-Even Analysis of Retiring Pods

Break-Even Analysis of Retiring Older Pods/Drives

This chart helps illustrate the math behind deciding to retire older drives/pods based on the break-even point.

Let’s break down how to read this chart:

  • This chart is looking at whether Backblaze should replace older drives with the newer 12 TB drives
  • Assuming a cost of $0.02/GB for a 12 TB drive, that is a $20/TB acquisition cost you see on the far left
  • Each line represents the cumulative cost over time (acquisition + operational costs)
  • The grey lines (4 TB and 8 TB) all assume they were already acquired so they only represent operational costs ($0 acquisition cost) since we are deciding on replacement costs
  • The operational costs (shown as the incremental yearly increase) are calculated from the $45 per RU colocation cost and how many drives of a given drive/enclosure density fit per rack unit. The more TBs you can cram into a rack unit, the lower your colocation costs

Assuming you are still with me, this shows that the break-even point for retiring 4 TB 4U45 pods is just over two years, and for 4 TB 4U60 pods, three years! It’s a no-brainer to kill the 4 TB enclosures and replace them with 12 TB drives. Remember that this assumes a $45/RU colocation cost, so the break-even point will shift to the right if the colocation costs are lower (which they surely are).

You can see that the math to replace 8 TB drives with 12 TB doesn’t make as much sense, so we may see Backblaze’s retirement strategy slow down dramatically after it retires the 4 TB capacity points.
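The break-even arithmetic behind the chart can be sketched as follows, using the $20/TB acquisition cost and the $45/RU colocation rate (treated as monthly) from the chart assumptions:

```python
# Break-even sketch: cumulative $/TB of keeping an existing drive vs.
# buying a 12 TB replacement. Opex per TB per year falls out of the
# $45/RU monthly rate and how many TB fit per rack unit.
COST_PER_RU_YEAR = 45.0 * 12

def opex_per_tb_year(drive_tb: float, drives_per_pod: int, pod_ru: int) -> float:
    """Yearly colocation cost per TB for a given drive/enclosure density."""
    tb_per_ru = drive_tb * drives_per_pod / pod_ru
    return COST_PER_RU_YEAR / tb_per_ru

def break_even_years(old_opex: float, new_opex: float,
                     acq_per_tb: float = 20.0) -> float:
    """Years t until acq_per_tb + new_opex * t == old_opex * t."""
    return acq_per_tb / (old_opex - new_opex)

new = opex_per_tb_year(12, 60, 4)  # 12 TB in a 4U60 pod: $3/TB/yr
print(break_even_years(opex_per_tb_year(4, 45, 4), new))  # 4 TB 4U45: ~2.2 yrs
print(break_even_years(opex_per_tb_year(4, 60, 4), new))  # 4 TB 4U60: ~3.3 yrs
print(break_even_years(opex_per_tb_year(8, 60, 4), new))  # 8 TB 4U60: ~13 yrs
```

The ~13-year break-even for 8 TB drives is why replacing them with 12 TB doesn’t pencil out the way the 4 TB replacements do.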

As hard drive densities get larger and $/GB decreases, I expect the cumulative cost curves to start lower (less acquisition cost) and rise more slowly (lower RU operational costs), making future drive retirements more attractive. Eyeballing it, that happens once drive prices approach $0.01/GB to $0.015/GB.

Things Backblaze Should Look Into

Top of mind, Backblaze should look into these areas:

  • The architecture around performance is not balanced; investigate adding a caching tier to handle bursts and putting more drives behind each storage node to reduce “enclosure/slot tax” costs.
  • Look into designs like the 5U84 from Seagate/Xyratex, which provides 16.8 drives per RU versus the 15 achieved by Backblaze’s own 4U60 design: another 12% efficiency!
    • The 5U height does mean only 8 pods fit per rack versus 10.
  • Look at when SSDs will be attractive to replace HDDs at a given $/GB, density, idle costs, # of drives that fit per RU (using 2.5” drives instead of 3.5”) so that they can stay on top of this trend [there is no rush on this one].
    • Performance and endurance of SSDs are largely irrelevant since the performance requirements are so low and the writes per day (WPD) are almost nonexistent, making QLC and beyond a great candidate.
  • Look at allowing pods to be more flexible in handling different capacity drives, so that drive failures can be handled more cost efficiently without having to retire pods. A concept of “virtual pods” without physical limits would better accommodate a future in which Backblaze isn’t retiring pods as aggressively, yet still let them grow their pod densities seamlessly.

In Closing

It is kind of ironic that Backblaze posted all this SMART data to share insights about failures, and I didn’t analyze failures even once! There is much more analysis that could be done on this data set, which I may revisit as time permits.

As you can see, even simple health data from drives, along with a little help from other data sources, can expose a lot more than you would initially think. I have long felt that people have yet to understand the full power of the data they freely give to businesses (e.g. Facebook, Google Maps, LinkedIn, Mint, Personal Capital, news feeds, Amazon). I often hear things like, “I have nothing to hide,” which indicates the lack of value they assign to their data. It’s not the value at the surface but the story that can be told by tying data together.

Until next time, Ryan Smith.

•   •   •

Ryan Smith is currently a product strategist at Hitachi Vantara. Previously, he served as the director of NAND product marketing at Samsung Semiconductor, Inc. He is extremely passionate about uncovering insights from just about any data set. He just likes to have fun by making a notable difference, influencing others, and working with smart people.

Tell us what you think about Ryan’s take on data, or better yet, give us your own! You can find all the data you would ever need on Backblaze’s Hard Drive Data and Stats webpage. Share your thoughts in the comments below or email us at

The post SMART Stats Exposed — a Drive Stats Remix appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Interview With the Podfather: Tim Nufire on the Origins of the Pod

Post Syndicated from Patrick Thomas original

Tim, the Podfather, working in data center

It’s early 2008, the concept of cloud computing is only just breaking into public awareness, and the economy is in the tank. Despite this less-than-kind business environment, five intrepid Silicon Valley veterans quit their jobs and pooled together $75K to launch a company with a simple goal: provide an easy-to-use, cloud-based backup service at a price that no one in the market could beat — $5 per month per computer.

The only problem: both hosted storage (through existing cloud services) and purchased hardware (buying servers from Dell or Microsoft) were too expensive to hit this price point. Enter Tim Nufire, aka: The Podfather.

Tim led the effort to build what we at Backblaze call the Storage Pod: the physical hardware our company has relied on for data storage for more than a decade. On the occasion of the tenth anniversary of the open sourcing of our Storage Pod 1.0 design, we sat down with Tim to relive the twists and turns that led from a crew of backup enthusiasts in an apartment in Palo Alto to a company with four data centers spread across the world, holding 2,100 Storage Pods and closing in on an exabyte of storage.

✣   ✣   ✣

Editors: So Tim, it all started with the $5 price point. I know we did market research and that was the price at which most people shrugged and said they’d pay for backup. But it was so audacious! The tech didn’t exist to offer that price. Why do you start there?

Tim Nufire: It was the pricing given to us by the competitors, they didn’t give us a lot of choice. But it was never a challenge of if we should do it, but how we would do it. I had been managing my own backups for my entire career; I cared about backups. So it’s not like backup was new, or particularly hard. I mean, I firmly believe Brian Wilson’s (Backblaze’s Chief Technical Officer) top line: You read a byte, you write a byte. You can read the byte more gently than other services so as to not impact the system someone is working on. You might be able to read a byte a little faster. But at the end of the day, it’s an execution game not a technology game. We simply had to out execute the competition.

E: Easy to say now, with a company of 113 employees and more than a decade of success behind us. But at that time, you were five guys crammed into a Palo Alto apartment with no funding and barely any budget and the competition — Dell, HP, Amazon, Google, and Microsoft — they were huge! How do you approach that?

TN: We always knew we could do it for less. We knew that the math worked. We knew what the cost of a 1 TB hard drive was, so we knew how much it should cost to store data. We knew what those markups were. We knew, looking at a Dell 2900, how much the margin was in that box. We knew they were overcharging. At that time, I could not build a desktop computer for less than Dell could build it. But I could build a server at half their cost.

I don’t think Dell or anyone else was being irrational. As long as they have customers willing to pay their hard margins, they can’t adjust for the potential market. They have to get to the point where they have no choice. We didn’t have that luxury.

So, at the beginning, we were reluctant hardware manufacturers. We were manufacturing because we couldn’t afford to pay what people were charging, not because we had any passion for hardware design.

E: Okay, so you came on at that point to build a cloud. Is that where your title comes from? Chief Cloud Officer? The pods were a little ways down the road, so Podfather couldn’t have been your name yet. …

TN: This was something like December, 2007. Gleb (Budman, the Chief Executive Officer of Backblaze) and I went snowboarding up in Tahoe, and he talked me into joining the team. … My title at first was all wrong, I never became the VP of Engineering, in any sense of the word. That was never who I was. I held the title for maybe five years, six years before we finally changed it. Chief Cloud Officer means nothing, but it fits better than anything else.

E: It does! You built the cloud for Backblaze with the Storage Pod as your water molecule (if we’re going to beat the cloud metaphor to death). But how does it all begin? Take us back to that moment: the podception.

TN: Well, the first pod, per se, was just a bunch of USB drives strapped to a shelf in the data center attached to two Dell 2900 towers. It didn’t last more than an hour in production. As soon as it got hit with load, it just collapsed. Seriously! We went live on this and it lasted an hour. It was a complete meltdown.

Two things happened: The bus was completely unstable, so the USB drives were unstable. Second, the DRBD (Distributed Replicated Block Device) — which is designed to protect your data by live mirroring it between the two towers — immediately fell apart. You implement a DRBD not because it works in a well-running situation, but because it covers you in the failure mode. And in failure mode it just unraveled — in an hour. It went into a split-brain mode under the hardware failures that the USB drives were causing. A well-running DRBD is fully mirrored, and split-brain mode is when the two sides simply give up and start acting autonomously because they don’t know what the other side is doing and they’re not sure who is boss. The data is essentially inconsistent at that point because you can choose A or B but the two sides are not in agreement.

While the USB specs say you can connect something like 256 or 128 drives to a hub, we were never able to do more than like, five. After something like five or six, the drives just start dropping out. We never really figured it out because we abandoned the approach. I just took the drives out and shoved them inside of the Dells, and those two became pods number 0 and 1. The Dells had room for 10 or 8 drives apiece, and so we brought that system live.

That was what the first six years of this company was like, just a never-ending stream of those kind of moments — mostly not panic inducing, mostly just: you put your head down and you start working through the problems. There’s a little bit of adrenaline, that feeling before a big race of an impending moment. But you have to just keep going.

Tim working on Storage Pod
Tim welcoming another Storage Pod to the family.

E: Wait, so this wasn’t in testing? You were running this live?

TN: Totally! We were in friends-and-family beta at the time. But the software was all written. We didn’t have a lot of customers, but we had launched, and we managed to recover the files: whatever was backed up. The system has always had self healing built into the client.

E: So where do you go from there? What’s the next step?

TN: These were the early days. We were terrified of any commitments. So I think we had leased a half cabinet at the 365 Main facility in San Francisco, because that was the most we could imagine committing to in a contract: We committed to a year’s worth of this tiny little space.

We had those first two pods — the two Dell Towers (0 and 1) — which we eventually built out using external enclosures. So those guys had 40 or 45 drives by the end, with these little black boxes attached to them.

Pod number 2 was the plywood pod, which was another moment of sitting in the data center with a piece of hardware that just didn’t work out of the gate. This was Chris Robertson’s prototype. I credit him with the shape of the basic pod design, because he’s the one that came up with the top loaded 45 drives design. He mocked it up in his home woodshop (also known as a garage).

E: Wood in a data center? Come on, that’s crazy, right?

TN: It was what we had! We didn’t have a metal shop in our garage, we had a woodshop in our garage, so we built a prototype out of plywood, painted it white, and brought it to the data center. But when I went to deploy the system, I ended up having to recable and rewire and reconfigure it on the fly, sitting there on the floor of the data center, kinda similar to the first day.

The plywood pod was originally designed to be 45 drives, top loaded with port multipliers — we didn’t have backplanes. The port multipliers were these little cards that took one set of cables in and five cables out. They were cabled from the top. That design never worked. So what actually got launched was a fifteen drive system that had these little five drive enclosures that we shoved into the face of the plywood pod. It came up as a 15 drive, traditionally front-mounted design with no port multipliers. Nothing fancy there. Those boxes literally have five SATA connections on the back, just a one-to-one cabling.

E: What happened to the plywood pod? Clearly it’s cast in bronze somewhere, right?

TN: That got thrown out in the trash in Palo Alto. I still defend the decision. We were in a small one-bedroom apartment in Palo Alto and all this was cruft.

Wooden pod
The plywood pod, RIP.

E: Brutal! But I feel like this is indicative of how you were working. There was no looking back.

TN: We didn’t have time to ask the question of whether this was going to work. We just stayed ahead of the problems: Pods 0 and 1 continued to run, pod 2 came up as a 15 drive chassis, and runs.

The next three pods are the first where we worked with Protocase. These are the first run of metal — the ones where we forgot a hole for the power button, so you’ll see the pried open spots where we forced the button in. These are also the first three with the port-multiplier backplane. So we built a chassis around that, and we had horrible drive instability.

We were using the Western Digital Green, 1 TB drives. But we couldn’t keep them in the RAID. We wrote these little scripts so that in the middle of the night, every time a drive dropped out of the array, the script would put it back in. It was this constant motion and churn creating a very unstable system.

We suspected the problem was with power. So we made the octopus pod. We drilled holes in the bottom, and ran it off of three PSUs beneath it. We thought: “If we don’t have enough power, we’ll just hit it with a hammer.” Same thing on cooling: “What if it’s getting too hot?” So we put a box fan on top and blew a lot of air into it. We were just trying to figure out what it was that was causing trouble and grief. Interestingly, the array in the plywood pod was stable, but when we replaced the enclosure with steel, it became unstable as well!

Storage Pod with fan
Early experiments in pod cooling.

We slowly circled in on vibration as the problem. That plywood pod had actual disk enclosures with caddies and good locking mechanisms, so we thought the lack of caddies and locking mechanisms could be the issue. I was working with Western Digital at the time, too, and they were telling me that they also suspected vibration as the culprit. And I kept telling them, ‘They are hard drives! They should work!’

At the time, Western Digital was pushing me to buy enterprise drives, and they finally just gave me a round of enterprise drives. They were worse than the consumer drives! So they came over to the office to pick up the drives because they had accelerometers and lot of other stuff to give us data on what was wrong, and we never heard from them again.

We learned later that, when they showed up in an office in a one bedroom apartment in Palo Alto with five guys and a dog, they decided that we weren’t serious. It was hard to get a call back from them after that … I’ll admit, I was probably very hard to deal with at the time. I was this ignorant wannabe hardware engineer on the phone yelling at them about their hard drives. In hindsight, they were right; the chassis needed work.

But I just didn’t believe that vibration was the problem. It’s just 45 drives in a chassis. I mean, I have a vibration app on my phone, and I stuck the phone on the chassis and there’s vibration, but it’s not like we’re trying to run this inside a race car doing multiple Gs around corners, it was a metal box on a desk with hard drives spinning at 5400 or 7200 rpm. This was not a seismic shake table!

The early hard drives were secured with rubber bands. It turns out that real rubber (latex) turns into powder in about two months in a chassis, probably from the heat. We discovered this very quickly after buying rubber bands at Staples that just completely disintegrated. We eventually got better EPDM bands, but they never really worked. The hope was that they would secure a hard drive so it couldn’t vibrate its neighbors, and yet we were still seeing drives dropping out.

At some point we started using clamp down lids. We came to understand that we weren’t trying to isolate vibration between the drives, but we were actually trying to mechanically hold the drives in place. It was less about vibration isolation, which is what I thought the rubber was going to do, and more about stabilizing the SATA connector on the backend, as in: You don’t want the drive moving around in the SATA connector. We were also getting early reports from Seagate at the time. They took our chassis and did vibration analysis and, over time, we got better and better at stabilizing the drives.

We started to notice something else at this time: The Western Digital drives had these model numbers followed by extension numbers. We realized that drives that stayed in the array tended to have the same set of extensions. We began to suspect that those extensions were manufacturing codes, something to do with which backend factory they were built in. So there were subtle differences in manufacturing processes that dictated whether the drives were tolerant of vibration or not. Central Computer was our dominant source of hard drives at the time, and so we were very aggressively trying to get specific runs of hard drives. We only wanted drives with a certain extension. This was before the Thailand drive crisis, before we had a real sense of what the supply chain looked like. At that point we just knew some drives were better than others.

E: So you were iterating with inconsistent drives? Wasn’t that insanely frustrating?

TN: No, just gave me a few more gray hairs. I didn’t really have time to dwell on it. We didn’t have a choice of whether or not to grow the storage pod. The only path was forward. There was no plan B. Our data was growing and we needed the pods to hold it. There was never a moment where everything was solved, it was a constant stream of working on whatever the problem was. It was just a string of problems to be solved, just “wheels on the bus.” If the wheels fall off, put them back on and keep driving.

E: So what did the next set of wheels look like then?

TN: We went ahead with a second small run of steel pods. These had a single Zippy power supply, with the boot drive hanging over the motherboard. This design worked until we went to 1.5 TB drives and the chassis would not boot. Clearly a power issue, so Brian Wilson and I sat there and stared at the non-functioning chassis trying to figure out how to get more power in.

The issue with power was not that we were running out of power on the 12V rail. The 5V rail was the issue. All the high end, high-power PSUs give you more and more power on 12V because that’s what the gamers need — it’s what their CPUs and the graphics card need, so you can get a 1000W or a 1500W power supply and it gives you a ton of power on 12V, but still only 25 amps on 5V. As a result, it’s really hard to get more power on the 5V rail, and a hard drive takes 12V and 5V: 12V to spin the motor and 5V to power the circuit board. We were running out of the 5V.

So our solution was two power supplies, and Brian and I were sitting there trying to visually imagine where you could put another power supply. Where are you gonna put it? We can put it where the boot drive is, and move the boot drive to the side, and just kind of hang the PSU up and over the motherboard. But the biggest consequence of this was, again, vibration. Mounting the boot drive to the side of a vibrating chassis isn’t the best place for a boot drive. So we had higher than normal boot drive failures in those nine.

Storage Pod power supply
Tim holding the second power supply in place to show where it should go.

So the next generation, after pod number 8, was the beginning of Storage Pod 1.0. We were still using rubber bands, but it had two power supplies, 45 drives, and we built 20 of them, total. Casey Jones, as our designer, also weighed in at this point to establish how they would look. He developed the faceplate design and doubled down on the deeper shade of red. But all of this was expensive and scary for us: We’re gonna spend $10 grand!? We don’t have much money. We had been two years without salary at this point.

Storage Pod faceplates
Casey Jones’ faceplate vent design versions, with the final first generation below.

We talked to Ken Raab from Sonic Manufacturing, and he convinced us that he could build our chassis, all in, for less than we were paying. He would take the task off my plate, I wouldn’t have to build the chassis, and he would build the whole thing for less than I would spend on parts … and it worked. He had better backend supplier connections, so he could shave a little expense off of everything and was able to mark up 20%.

We fixed the technology and the human processes. On the technology side, we were figuring out the hardware and hard drives, we were getting more and more stable. Which was required. We couldn’t have the same failure rates we were having on the first three pods. In order to reduce (or at least maintain) the total number of problems per day, you have to reduce the number of problems per chassis, because there’s 32 of them now.

We were also learning how to adapt our procedures so that the humans could live. By “the Humans,” I mean me and Sean Harris, who joined me in 2010. There are physiological and psychological limits to what is sustainable, and we were nearing our wits’ end… So, in addition to stabilizing the chassis design, we got better at limiting the type of issues that would wake us up in the middle of the night.

E: So you reached some semblance of stability in your prototype and in your business. You’d been sprinting with no pay for a few years to get to this point and then … you decide to give away all your work for free? You open sourced Storage Pod 1.0 on September 9th, 2009. Were you a nervous wreck that someone was going to run away with all your good work?

TN: Not at all. We were dying for press. We were ready to tell the world anything they would listen to. We had no shame. My only regret is that we didn’t do more. We open sourced our design before anyone was doing that, but we didn’t build a community around it or anything.

Remember, we didn’t want to be a manufacturer. We would have killed for someone to build our pods better and cheaper than we could. Our hope from the beginning was always that we would build our own platform until the major vendors did for the server market what they did in the personal computing market. Until Dell would sell me the box that I wanted at the price I could afford, I was going to continue to build my chassis. But I always assumed they would do it faster than a decade.

Supermicro tried to give us a complete chassis at one point, but their problem wasn’t high margin; they were targeting too high of a performance point. I needed two things: someone to sell me a box and not make too much profit off of me, and someone who would wrap hard drives in a minimum-performance enclosure and not try to make it too redundant or high performance. Put in one RAID controller, not two; daisy-chain all the drives; let us suffer a little! I don’t need any of the hardware that can support SSDs. But no matter how much we ask for barebones servers, no one’s been able to build them for us yet.

So we’ve continued to build our own. And the design has iterated and scaled with our business. So we’ll just keep iterating and scaling until someone can make something better than we can.

E: Which is exactly what we’ve done, leading from Storage Pod 1.0 to 2.0, 3.0, 4.0, 4.5, 5.0, to 6.0 (if you want to learn more about these generations, check out our Pod Museum), preparing the way for more than 800 petabytes of data in management.

Storage Pod Museum
The Backblaze Storage Pod Museum in San Mateo, California

✣   ✣   ✣

But while Tim is still waiting to pass along the official Podfather baton, he’s not alone. There was the early help from Brian Wilson, Casey Jones, Sean Harris, and a host of others, and then in 2014, Ariel Ellis came aboard to wrangle our supply chain. He grew in that role over time until he took over responsibility for charting the future of the Pod via Backblaze Labs, becoming the Podson, so to speak. Today, he’s sketching the future of Storage Pod 7.0, and — provided no one builds anything better in the meantime — he’ll tell you all about it on our blog.

The post Interview With the Podfather: Tim Nufire on the Origins of the Pod appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.