If you’ve ever wondered about the science behind some of your favorite recipes, then you may have come across Alton Brown and his cooking show, “Good Eats.” Equal parts smart and sardonic, “Good Eats” showed its viewers how to whip up an excellent dish all while teaching about the history and science of the recipes through wacky sketches.
After the popular show ended in 2012, Brown would occasionally tease a possible comeback on social media. In a moment of serendipity, Eric Bigman, a seasoned video editor and long-time fan of “Good Eats,” found Brown’s business card in a stationery shop in New York City. The next time Brown hinted at reviving the show, Bigman took a chance and emailed him directly. He got in touch at just the right time: Brown would go on to hire Bigman to help update classic episodes as “Good Eats: Reloaded” for the Cooking Channel, and to create fresh episodes as “Good Eats: The Return” for the Food Network.
We’re sharing the story of why Bigman and the team chose to transition from Amazon S3 to Backblaze B2 Cloud Storage as a key ingredient in their infrastructure for “Good Eats: The Return” and “Good Eats: Reloaded,” how this move cut backup time by a factor of 100, and how it eliminated failures worse than any overcooked egg or burnt cake.
Perfecting Recipes and Workflows
To refresh the classic episodes for “Good Eats: Reloaded,” Bigman had to blend the old footage, which had degraded during retrieval, with new widescreen, high-definition footage. Since he was adjusting to a new team and process, Bigman waited until the end of that first season to archive his data to Amazon S3 while simultaneously trying to finish post-production. He knew that uploading to S3 would take a long time—time he couldn’t spare while trying to deliver 13 episodes on a deadline.
When Bigman’s team then started production for “Good Eats: The Return,” he wanted to work toward a more fluid, integrated process. He started backing up every other week. But he was still facing the same problem as before: The data backup process to Amazon S3 was too time-consuming. What’s more, Alton Brown was worried about the backups, too. The show meant a lot to him, and he didn’t want his team’s hard work to disappear in some overnight mishap.
From then on, Bigman tried to back up every night, but he was growing more and more frustrated with Amazon S3. It seemed to him like nine times out of 10, he walked into a pipeline failure in the morning. Babysitting the backup process took up valuable production time during his day, and he felt he couldn’t trust AWS to complete the backup overnight.
Real-Time Solutions Are Essential in Backups, and Cooking
Bigman turned to media solutions integrator CineSys-Oceana for a better backup and archiving solution. They suggested Backblaze B2. Bigman also chose Cyberduck, a libre server and cloud storage browser that integrates with Backblaze B2, to upload data. Now, when they’re in production, Bigman keeps the Cyberduck browser open and continuously uploads to B2 Cloud Storage.
Bigman originally intended to move the show’s archives from Amazon S3 to Glacier to reduce costs. But with Backblaze B2, he doesn’t have to balance access to footage against his budget. That instant access is critical during filming. Bigman often pulls footage for continuity checks. The real-time workflow and Backblaze’s simplicity ensure that he can always access footage he needs.
Backblaze B2 Makes Remote Work Possible
When they started production on the second season of “Good Eats: Reloaded,” Bigman moved to working from home in New Jersey while the rest of the team continued work in Atlanta. The fully cloud-based setup ensured they could keep their post-production process going without any problems. Bigman’s assistant in Atlanta easily accesses data through the Backblaze website. If Bigman needs a file quickly, his assistant logs in to the Backblaze website and uses the web GUI to drag and drop. It’s quick, easy, and helps spread the work out.
“Good Eats”: A Fine Example of Good Storage
With seamless workflows, instant access, cost-effective storage rates, and virtually zero upload failures, Backblaze saves Bigman the most critical resource on a quick-turnaround production—time. He says that “Time saved is by a factor of 100. I just don’t have to think about it anymore. It’s done, and that means I’m done.”
Read the full case study about how Alton Brown’s post-production team unlocked a seamless, remote workflow and quick backups that let them focus on producing a beloved show.
At Backblaze, we are fortunate to serve hundreds of thousands of customers in more than 150 countries. To make this possible, we have a Solutions Engineering team whose main goal is to help existing and potential customers succeed with the technical implementation of their cloud-based workflows. Keep reading to learn how this team of four forms the technical backbone of the Sales team, and how they can help you with challenges you might experience while enabling cloud solutions at your business.
What Makes Backblaze Solutions Engineers Different
In a traditional sales environment, there is a pre- and post-sales engineer. (Want to know more about the difference between sales and solutions engineers? More on that later.) The pre-sales engineer addresses technical questions that a customer may have prior to purchasing the product, while a post-sales engineer assists them with setting up the software and its integrations.
The first thing that sets our solutions engineers apart is that our customers at Backblaze get to work with the same team of people on both sides of the transaction. Their journey starts with an introductory call with a business development representative (BDR), who tries to get a better understanding of the client’s needs and concerns. Once the BDR qualifies the customer, they transfer them to an account executive (AE), who manages the customer’s account.
AEs work closely with solutions engineers (SEs) and ask them to step in when clients have unique technical requests. SEs begin by asking the customer questions to understand the full scope of the problem and offer the best solution. Queries range from writing scripts to partner integrations (more on these subjects later).
But once the problem is solved, and the customer is up and running, they aren’t passed off to another team, as is often the case in other operations. The SEs remain in conversation with customers to ensure they continue to get what they need from our products.
Another thing that sets our SE team apart is that they do not have quotas—their primary goal is to help the customer find the best solution to their needs, not to optimize their potential value for Backblaze. Since solving problems is their sole objective, their titles changed from “sales engineer” to “solutions engineer” in 2018.
Our Solutions Engineering team is built differently because we have a unique approach to business. The SEs work to create long-term relationships with customers because they want to help them succeed long into the future.
Our Solutions Engineering team is made up of a solutions engineer manager and three solutions engineers. While all four of them come from diverse backgrounds, they are all well-versed in every aspect related to being an SE (such as writing to our API, testing integrations, and providing workflow recommendations to users), so each of them can step into another person’s role at any given time if needed.
Troy Liljedahl, the Solutions Engineer Manager, started his career at Backblaze as a support technician. He transitioned over to the Solutions Engineering team because he wanted to pursue his passion for helping people, but in a more technical manner. He explained, “The solutions engineer job is the best of both worlds—not only am I getting to help customers succeed, but I am also able to do that by being a technical resource for them.”
Troy trains and manages the three other solutions engineers. Udara Gunawardena, one of the SEs, worked in a technical role at an Apple store during his college days. After graduating and becoming a full stack engineer, he realized that he missed interacting with users. He discovered solutions engineering and quickly made a career switch. Now, his programming background helps him with a number of tasks such as assisting customers who are trying to write to our API.
Another one of our solutions engineers joined the team around the same time as Udara. She completed her master’s in engineering management and worked in sales at a SaaS company. Now, her sales expertise helps her communicate with customers, solve their technical issues, and build strong relationships with them.
Rounding out the team is Mike Farace, who unlike his peers, works with us remotely, stationed all the way over in Ukraine. Although he may be physically distant, his relationship with Backblaze is one of the oldest. He worked with the founders of Backblaze from 2003 to 2006 while working as an IT architect at MailFrontier. He was the first contractor for Backblaze and set up the VPN so that the five founders could remotely access the testing servers that they were using in Brian Wilson’s apartment. Throughout the years, he was involved in some Backblaze projects and when the right fit came along, he moved into a full-time role as a solutions engineer.
Finding the Best Solutions for You
Broadly speaking, solutions engineers are responsible for understanding how our product works on its own and in integrations with other products, making recommendations based on that knowledge, and helping customers implement personalized workflows. Because Backblaze B2 Cloud Storage works with hundreds of different integrations, our SEs have extensive problem solving capacity. For example, they helped AK Productions use Flexify.IO to transfer their data from Google Cloud to Backblaze B2.
The SEs also have a wide understanding of the best solutions that work for different industries—the right solution for a media and entertainment customer may be different than the best one for an IT professional.
When teams begin working with new customers, they work to educate them on the different tools that are available. If a customer has a particular integration in mind, SEs can tell them about the benefits/disadvantages of using one integration over another and help them find the solution that matches their needs. Some topics that SEs often address are lifecycle rule settings, encryption, and SSO. They also help customers think through potential issues that they may not have yet considered.
One of the more technical aspects of the SE role is helping customers write to our API. Udara explained that this is his favorite part of the job: “The really creative stuff happens when people are trying to write to our API. Some people might be trying to stream video from surveillance cameras dynamically to Backblaze B2 whereas another company will have game streamers trying to save their video captures onto our cloud storage product. There’s a gamut of ways in which people use Backblaze B2 and that makes it really exciting.”
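To give a flavor of the kind of API work Udara describes, here is a minimal, hypothetical sketch of pushing a video capture to B2 with Backblaze’s official Python SDK, b2sdk. The credentials, bucket name, and file names below are placeholders for illustration, not anything from a real customer workflow.

```python
# A rough sketch (not production code) of uploading a capture to B2.
# KEY_ID, APP_KEY, and the bucket/file names are placeholder values.

def remote_key(prefix: str, filename: str) -> str:
    """Build the object name ("key") a capture will be stored under."""
    return f"{prefix.rstrip('/')}/{filename}"

def upload_capture(key_id: str, app_key: str, bucket_name: str,
                   local_path: str, remote_name: str):
    """Authorize against B2 and upload one local file.

    Requires `pip install b2sdk`; imported inside the function so the
    helper above works even without it installed.
    """
    from b2sdk.v2 import B2Api, InMemoryAccountInfo
    api = B2Api(InMemoryAccountInfo())
    api.authorize_account("production", key_id, app_key)
    bucket = api.get_bucket_by_name(bucket_name)
    return bucket.upload_local_file(local_file=local_path,
                                    file_name=remote_name)

# e.g. upload_capture("KEY_ID", "APP_KEY", "stream-archive",
#                     "capture-001.mp4", remote_key("streams", "capture-001.mp4"))
```

A game streamer saving captures, as in Udara’s example, could run something like this on a schedule after each session.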
SEs also make sure they understand the full scope of a customer’s needs. They ask questions, even if they think they know the answer. Troy explained, “Although there will be patterns between customers, we truly look at every customer as unique and every setup as unique.” The team also does a great job of speaking the language of the customers—while some clients may have a technical background, others may not. Regardless of the customer’s technical expertise, SEs can give them a high-level overview of the solution without making them feel overwhelmed.
Another important aspect of the SE’s role is their ability to work cross-functionally with the other teams at Backblaze. Apart from the Sales team, they work closely with the Engineering team, especially when helping major clients. Udara said, “We have to communicate and collaborate effectively with the Engineering team to increase performance and to solve the issues that our clients may have.”
On the other hand, when SEs are working with smaller customers, they collaborate with the Customer Support team. Udara further explained, “The Customer Support team has seen the entire gamut of use cases, especially for Backblaze Computer Backup. They’re a great resource to us because they have the answers to even the smallest issues.”
The SEs are as passionate as they are proficient. Mike particularly loves the problem-solving aspect of the job because it feels like a puzzle. He explained, “When you have a puzzle, you have to figure out what it looks like now, what it’s supposed to look like, and how to get it there. On top of that, when helping users, you’re dealing with different constraints like the customer’s budget and technology. It’s always a different puzzle to solve, which makes the job exciting.”
Testing Partner Integrations
When SEs have free time between talking to customers, they act as quality assurance for the Sales team by testing integrations—both those that could potentially work with Backblaze B2 and those that are currently being used. While doing so, they simultaneously document every step and when the integration testing is complete, they work with Customer Support to write a knowledge base article. These articles are available online for customers as a help guide so that they can learn how to use Backblaze B2 with our integration partners.
Back in October 2019, one of our customers wanted to use Backblaze B2 with EditShare Flow, which is a media asset management (MAM) all-in-one workflow solution that allows video editors to collaborate in the cloud. One of our solutions engineers, who had just joined the team, took on the responsibility of testing the integration. This partnership could potentially allow creative professionals to edit their videos on EditShare Flow and store their content on Backblaze B2.
Since the SE was new to the team and still learning our products, she worked with her teammates to learn the different terms involved, such as “MAM” and “metadata.” Once she learned more about EditShare Flow and its intricacies, she was ready to start the integration testing. She pulled the metadata from a NAS device, then pushed the data from that device to Backblaze B2. She also tested the upload speed and ensured that the interface was user-friendly.
While testing the integration, she noted each setup step—notes that would later help her write a knowledge base article about the integration. After the testing was complete, she sent the customer all the documentation he needed to use Backblaze B2 with EditShare Flow. The customer was happy, and he knew he could always reach out to the SE with any further questions.
When a company applies to be featured as an integration partner on our website, Mike tests the integration to ensure that the product can be paired successfully with Backblaze B2. Sometimes, companies that are already on the website make changes to their software, so Mike tests those products as well. He and the rest of the team also proactively test current integrations to verify that they are working and to identify any areas of improvement.
SEs have a demo environment to show prospective customers how different features within Backblaze Computer Backup can play out. Mike maintains this demo environment, which is set up with pre-created accounts and consists of a large VMware server with over 140 machines running. He creates different demos ranging from ones that show how to add a new computer to those showing how to restore files on behalf of a user. There are even some computers in the environment that are in an error state, so that customers can see what that might look like, as well.
How Can You Become a Solutions Engineer?
If becoming a solutions engineer sounds like your ideal career, Troy offered the following advice, “The biggest skill that I look for when hiring a solutions engineer is the ability to ask the right questions. We need to identify a customer’s problem in order to offer them the right solution. Another quality I look for is the ability to talk in the same language as a customer, whether that means giving someone a high-level technical overview of the product or simplifying concepts so that it’s easier for a customer to understand.”
Troy continued, saying that although candidates need not be programmers, they should feel comfortable reading code and potentially writing scripts. However, this varies between different companies. At Backblaze, the role of an SE strikes a balance between being technical and customer service-oriented, but at other companies, the role may be more technical or more customer support-focused.
If you are interested in learning more about how you could use Backblaze for your organization, click here to contact Sales and begin your journey with Backblaze! We look forward to hearing from you.
When the COVID-19 pandemic forced many of us to shelter in place at home, it seemed like a good opportunity to learn new things: Baking sourdough bread, sure! Tackling 2,000-piece puzzles, great. Hosting virtual birthday parties on Zoom for three dozen family members? Got it! It was fun (mostly), but it quickly became clear that what seemed like a pause was actually the beginning of a whole new way of living and doing business.
And for anyone responsible for media workflows, turning to online learning during this time quickly became a clear requirement. The NAB Show, the media and entertainment industry’s huge show every April, cancelled their live event in Las Vegas and rebirthed it as an online experience a month later.
During normal times, this show was everyone in the industry’s opportunity to learn what the newest essential solutions for their workflow might be. In 2020, this wasn’t possible, despite the fact that learning how to use cloud-based tools to enable work is no longer a long-term strategic “hope for”—it’s a must-have.
After the cancellation of trade shows and other in-person events, our marketing and sales teams got together to decide how they could reach our NAB audience in the absence of face-to-face discussions at our booth on the NAB Show floor. The answer was Cloud University: an ongoing series of webinars highlighting powerful, cloud-enabled workflow solutions built with over a dozen partners, including ProMAX, CatDV, Cloudflare, iconik, and more. This series of free courses features live demos, tips, and best practices on topics like remote collaboration, content delivery, cloud migration, and workflow automation, with more to come each week.
You can always head to our Cloud University page to stay up to date with the latest classes, but we’ll include digests of past classes here, as well as callouts to additional information that might be useful to you, and some key takeaways to scan so you can tell what webinar might be most effective for you.
We hope to see you at our next online class, and of course, if there’s anything you’d like us to cover at Cloud University, don’t hesitate to say so in the comments!
DEMO LABORATORY: Turnkey Cloud Backup, Sync, and Restore with QNAP NAS
Are you ready to take your content worldwide? You, too, can have a simple, fast, incredibly cost effective solution for your website or content service up and running in minutes. Attend this event to learn how to:
Improve end-user experience with global caching and optimization
Eliminate egress fees for content movement
Compare cost savings of Backblaze and Cloudflare vs. S3 and CloudFront
Quickly build a pilot workflow at no cost
CONTENT AGILITY 101: Unlocking Immediate Content Value for Remote Teams & Partners with Imagen
Do you need rock-solid backup and synchronization of your critical systems and high-value folders, but aren’t sure where to start? Watch this class to learn how to set up reliable backup and sync in minutes, and pick up best practices along the way. Topics covered include:
GoodSync overview for Mac, Windows, Linux, and NAS systems
One-minute setup—configuring GoodSync with Backblaze B2 Cloud Storage
Five different ways to initiate backup and synchronization jobs
Best practices with real-world examples from our customers
DEMO LABORATORY: Cloud-to-Cloud Data Migration with Flexify.IO
Want your cloud storage spending to go farther for you, but concerned about the cost or complexity of moving between clouds? Fear not. In this class, you’ll learn how Flexify.IO and Backblaze can help you:
Easily and inexpensively transfer data from one cloud provider to another
Eliminate downtime during cloud-to-cloud migration
Choose the right cloud storage to meet your workflow needs
DEMO LABORATORY: A Hybrid Cloud Approach to Media Management with iconik
In their move toward cloud workflows, content owners are looking for solutions that manage content stored on-premises seamlessly with content in the cloud. Backblaze partner iconik built their smart media management system with this hybrid cloud approach in mind. With iconik, you don’t have to wait for all your content to be in the cloud before you and your creative team can take advantage of their cloud-based platform. In this class, iconik expert Mike Szumlinski details how you can:
Get started with cloud-based media management without migrating any content
Search and preview all your content from any device, anywhere
Add collaborators on the fly, including view-only licenses at no charge
Instantly ingest content stored in Backblaze B2 Cloud Storage into iconik
iconik’s report is a fascinating look inside a disruptive business that is a major driver of growth for Backblaze B2 Cloud Storage. With that in mind, we wanted to share our top takeaways and highlight key trends that will dramatically impact businesses soon—if they haven’t already.
Takeaway 1: Workflow Applications in the Cloud Unlock Accelerated Growth
Traditional workflow apps thrive in the cloud when paired with active, object storage.
We’ve had many customers adopt iconik with Backblaze B2, including Everwell, Fin Films, and Complex Networks, among several others. Each of these customers not only quickly converted to an agile, cloud-enabled workflow, they also immediately grew their use of cloud storage as the capacities it unlocked fueled new business. As such, it’s no surprise that iconik is growing fast, doubling all assets in Q4 2019 alone.
iconik is a prime example of an application that was traditionally installed on physical servers and storage in a facility. A longtime frustration with such systems is trying to ‘right-size’ the amount of server horsepower and storage to allocate to the system. Given how quickly content grows, making the wrong storage choice could be incredibly costly, or incredibly disruptive to your users as the system ‘hits the wall’ of capacity and the storage needs to be expanded frequently.
By moving the entire application to the cloud, users get the best of all worlds: a responsive and immersive application that keeps them focused on collaboration and production tasks, protection for the entire content library while keeping it immediately retrievable, and seamless growth to any size needed without any disruptions.
And these are only the benefits of moving your storage solution to the cloud. Almost every other application in your workflow that traditionally needs on-site servers and storage can be similarly shifted to the cloud, lending benefits like “pay-as-you-use-it” cost models, access from everywhere, and the ability to extend features with other cloud delivered services like transcoding, machine learning, AI services, and more. (Our own B2 Cloud Storage service just launched S3 Compatible APIs, which allows infinitely more solutions for diverse workflows.)
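To sketch what that S3 compatibility looks like in practice, here is a minimal, hypothetical example of pointing a standard S3 client at B2. The region, key ID, and application key are placeholders to be replaced with your own account’s values.

```python
# A hypothetical sketch of using B2's S3 Compatible API from boto3.
# The region and credential values are placeholders.

def b2_s3_endpoint(region: str) -> str:
    """B2's S3 compatible endpoints follow this pattern."""
    return f"https://s3.{region}.backblazeb2.com"

def make_b2_client(region: str, key_id: str, app_key: str):
    """Build a boto3 S3 client aimed at B2 (requires `pip install boto3`)."""
    import boto3  # deferred so b2_s3_endpoint() works without boto3 installed
    return boto3.client(
        "s3",
        endpoint_url=b2_s3_endpoint(region),
        aws_access_key_id=key_id,
        aws_secret_access_key=app_key,
    )

# e.g. make_b2_client("us-west-002", "KEY_ID", "APP_KEY").list_buckets()
```

Because only the endpoint and credentials change, tools already written against S3 can often be repointed at B2 with no other code changes.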
Takeaway 2: Now, Every Company Is a Media Company
Every company benefits by leveraging the power of collaboration and content management in their business.
Every company generates massive amounts of rich content, including graphics, video, product and sales literature, training videos, social media clips, and more. And every company fights ‘content sprawl’ as documents are duplicated, stored on different departments’ servers, and different versions crop up. Keeping that content organized and ensuring that your entire organization has instant access to the up-to-the-minute version of all of it is easily done in iconik, and this use case now accounts for 41% of iconik’s customers.
Even if your company is not an ad agency, or involved in film and television production, thinking and moving like a content producer and organizing around efficient and collaborative storytelling can transform your business. By doing so, you will immediately improve how your company creates, organizes, and updates the content that carries your image and story to your end users and customers. The end result is faster, more responsive, and cleaner messaging to your end users.
Takeaway 3: Solve For Video First
Make sure your workflow tools and storage are optimized for video first to head off future scaling challenges.
Despite being a small proportion of content in iconik’s system, video takes up the most storage. While most customers have large libraries of HD or even SD content now, 4K size video is rapidly gaining ground as it becomes the default resolution.
Video files have traditionally been the hardest element of a workflow to balance. Most shared storage systems can serve several editors working on HD streams, but only one or two 4K editors. So a system that proves that it can handle larger video files seamlessly will be able to scale as these resolution sizes continue to grow.
If you’re evaluating changes in your content production workflow, make sure that it can handle 4K video sizes and above, even if you’re predominantly managing HD content today.
Takeaway 4: Hybrid Cloud Needs to Be Transparent
Great solutions transparently bridge on-site and cloud storage, giving you the best features of each.
iconik’s report calls out the split of the storage location for assets it stores—whether on-site, or in the cloud. But the story behind the numbers reveals a deeper message.
Where assets are stored as part of a hybrid-cloud solution is a bit more complex. Assets in heavy use may exist locally only, while others might be stored on both local storage and the cloud, and the least often used assets might exist only in the cloud. And then, many customers choose to forego local storage completely and only work with content stored in the cloud.
While that may sound complex, the power of iconik’s implementation is that users don’t need to know about all that complexity, and shouldn’t have to. iconik keeps a single reference to the asset no matter how many copies there are, or where they are stored. Creative users simply use the solution as their interface as they move their content through production, internal approval, and handoff.
Meanwhile, admin users can easily make decisions about shifting content to the cloud, or move content back from cloud storage to local storage. This means that current projects are quickly retrieved from local storage, then when the project is finished the files can move to the cloud, freeing up space on local storage for other active projects.
For customers working with Backblaze B2, the cloud storage expands to whatever size needed on a simple, transparent pricing model. And it is fully active, or in other words, it’s immediately retrievable within the iconik interface. In this way it functions as a “live” archive as opposed to offline content archives like LTO tape libraries, or a cold storage cloud which could require days for file retrieval. As such, using ‘active’ cloud storage like Backblaze B2 eases the admin’s decision-making process about what to keep, and where to keep it. With transparent cloud storage, they have the insight needed to effectively scale their data.
Looking into Your (Business) Future
iconik’s report confirms a number of trends we’ve been seeing as every business comes to terms with the full potential and benefits of adopting cloud-based solutions:
The dominance of video content.
The need for transparent reporting and visibility of the location of data.
The fact that we’re all in the media business now.
And that cloud storage will unlock unanticipated growth.
Given all we can glean from this first report, we can’t wait for the next one.
But don’t take our word for it: dig into their numbers and let us and iconik know what you think. Tell us how these takeaways might help your business in the coming year, or where we might have missed something. We hope to see you in the comments.
As of March 31, 2020, Backblaze had 132,339 spinning hard drives in our cloud storage ecosystem spread across four data centers. Of that number, there were 2,380 boot drives and 129,959 data drives. This review looks at the Q1 2020 and lifetime hard drive failure rates of the data drive models currently in operation in our data centers and provides a handful of insights and observations along the way. In addition, near the end of the post, we review a few 2019 predictions we posed a year ago. As always, we look forward to your comments.
Hard Drive Failure Stats for Q1 2020
At the end of Q1 2020, Backblaze was using 129,959 hard drives to store customer data. For our evaluation we remove from consideration those drives that were used for testing purposes and those drive models for which we did not have at least 60 drives (see why below). This leaves us with 129,764 hard drives. The table below covers what happened in Q1 2020.
Notes and Observations
The Annualized Failure Rate (AFR) for Q1 2020 was 1.07%. That is the lowest AFR for any quarter since we started keeping track in 2013. In addition, the Q1 2020 AFR is significantly lower than the Q1 2019 AFR which was 1.56%.
During this quarter, four drive models from three manufacturers had zero drive failures. None of the Toshiba 4TB or Seagate 16TB drives failed in Q1, but both models logged fewer than 10,000 drive days during the quarter, so a small change in drive failures would swing the AFR widely. For example, if just one Seagate 16TB drive had failed, the AFR would be 7.25% for the quarter. Similarly, the Toshiba 4TB drive AFR would be 4.05% with just one failure in the quarter.
In contrast, both of the HGST drive models with zero failures in the quarter have a reasonable number of drive days, so their AFRs are less volatile. If the 8TB model had one failure in the quarter, its AFR would only be 0.40%, and the 12TB model would have an AFR of just 0.26% with one failure for the quarter. In both cases, the 0% AFR for the quarter is impressive.
There were 195 drives (129,959 minus 129,764) that were not included in the list above because they were used as testing drives or we did not have at least 60 drives of a given model. For example, we have: 20 Toshiba 16TB drives (model: MG08ACA16TA), 20 HGST 10TB drives (model: HUH721010ALE600), and 20 Toshiba 8TB drives (model: HDWF180). When we report quarterly, yearly, or lifetime drive statistics, those models with fewer than 60 drives are not included in the calculations or graphs. We use 60 drives as the minimum because there are 60 drives in every newly deployed Storage Pod.
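If you crunch the published data yourself, the same 60-drive cutoff is easy to apply. Here is a minimal sketch over a made-up toy snapshot; the real per-drive-per-day CSVs include a model column like the one used below.

```python
# A minimal sketch of applying the 60-drive reporting cutoff to a fleet
# snapshot. The records here are a fabricated toy sample for illustration.
from collections import Counter

def eligible_models(drive_records, min_drives=60):
    """Return the set of models with at least `min_drives` drives."""
    counts = Counter(rec["model"] for rec in drive_records)
    return {model for model, n in counts.items() if n >= min_drives}

# 61 drives of one (made-up) model, plus a 20-drive fleet like the
# Toshiba MG08ACA16TA: only the first model makes the cut.
snapshot = ([{"model": "ModelA"}] * 61) + ([{"model": "MG08ACA16TA"}] * 20)
print(sorted(eligible_models(snapshot)))  # ['ModelA']
```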
That said, all the data from all of the drive models, including boot drives, is included in the files which can be accessed and downloaded on our Hard Drive Test Data webpage.
Computing the Annualized Failure Rate
Throughout our reports we use the term Annualized Failure Rate (AFR). The word “annualized” here means that regardless of the period of observation (month, quarter, etc.) the failure rate will be transformed into being an annual measurement. For a given group of drives (i.e. model, manufacturer, etc.) we compute the AFR for a period of observation as follows:
AFR = (Drive Failures / (Drive Days / 366)) * 100
Drive Failures is the number of drives that failed during the period of observation.
Drive Days is the number of days all of the drives being observed were operational during the period of observation.
There are 366 days in 2020; in non-leap years we use 365.
Example: Compute the AFR for drive model BB007 for the last six months, given:
There were 28 drive failures during the period of observation (six months).
There were 6,000 hard drives at the end of the period of observation.
The total number of days all of the drives of drive model BB007 were in operation during the period of observation (6 months) totaled 878,400 days.
Using the drive days method, model BB007 had an AFR of 1.17%: (28 / (878,400 / 366)) * 100. Using the drive count method, model BB007 had a failure rate of 0.93%. The reason for the difference is that Backblaze is constantly adding and subtracting drives. New Backblaze Vaults come online every month; new features like S3 compatibility rapidly increase demand; migration replaces old, low capacity drives with new, higher capacity drives; and sometimes there are cloned and temp drives in the mix. The environment is very dynamic. The drive count on any given day over the period of observation will vary. When using the drive count method, the failure rate is based on the day the drives were counted; in this case, the last day of the period of observation. Using the drive days method, the failure rate is based on the entire period of observation.
In our example, the following table shows the drive count as we added drives over the six month period of observation:
When you total up the number of drive days, you get 878,400, but the drive count at the end of the period of observation is 6,000. The drive days formula responds to the change in the number of drives over the period of observation, while the drive count formula responds only to the count at the end.
The failure rate of 0.93% from the drive count formula is significantly lower, which is nice if you are a drive manufacturer, but not correct for how drives are actually integrated and used in our environment. That’s why Backblaze chooses to use the drive days method as it better fits the reality of how our business operates.
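To make the difference concrete, here is a small Python sketch (not Backblaze's actual reporting code) that computes the AFR both ways for the BB007 example; the 183-day observation window assumes six months of a 366-day year:

```python
def afr_drive_days(failures, drive_days, days_in_year=366):
    """Annualized failure rate from total drive days (the method Backblaze uses)."""
    return failures / (drive_days / days_in_year) * 100

def afr_drive_count(failures, drive_count, observation_days, days_in_year=366):
    """Annualized failure rate from the drive count at the end of the period."""
    return failures / drive_count * (days_in_year / observation_days) * 100

# Model BB007: 28 failures, 878,400 drive days, 6,000 drives at period end.
print(round(afr_drive_days(28, 878_400), 2))      # 1.17
print(round(afr_drive_count(28, 6_000, 183), 2))  # 0.93
```

The drive days method yields 1.17% while the drive count method yields 0.93%, matching the figures above; the gap comes entirely from drives that were added partway through the period.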
Predictions from Q1 2019
In the Q1 2019 Hard Drive Stats review we made a few hard drive-related predictions of things that would happen by the end of 2019. Let’s see how we did.
Prediction: Backblaze will continue to migrate out 4TB drives and will have fewer than 15,000 by the end of 2019: we currently have about 35,000.
Reality: 4TB drive count as of December 31, 2019: 34,908.
Review: We were too busy adding drives to migrate any.
Prediction: We will have installed at least twenty 20TB drives for testing purposes.
Reality: We have zero 20TB drives.
Review: We have not been offered any 20TB drives to test or otherwise.
Prediction: Backblaze will go over one exabyte (1,000 petabytes) of available cloud storage. We are currently at about 850 petabytes of available storage.
Review: To quote Maxwell Smart, “Missed it by that much.”
Prediction: We will have installed, for testing purposes, at least 1 HAMR based drive from Seagate and/or 1 MAMR drive from Western Digital.
Reality: Not a sniff of HAMR or MAMR drives.
Review: Hopefully by the end of 2020.
In summary, I think I’ll go back to my hard drive statistics and leave the prognosticating to soothsayers and divining rods.
Lifetime Hard Drive Stats
The table below shows the lifetime failure rates for the hard drive models we had in service as of March 31, 2020. The reporting period is from April 2013 through March 31, 2020. All of the drives listed were installed during this timeframe.
The Hard Drive Stats Data
The complete data set used to create the information used in this review is available on our Hard Drive Test Data webpage. You can download and use this data for free for your own purpose. All we ask are three things: 1) You cite Backblaze as the source if you use the data, 2) You accept that you are solely responsible for how you use the data, and 3) You do not sell this data to anyone—it is free.
If you just want the summarized data used to create the tables and charts in this blog post you can download the ZIP file containing the MS Excel spreadsheet.
Good luck and let us know if you find anything interesting.
In 2015, we kept hearing the same request. It went something like: “I love your computer backup service, but I also need a place to store data for other reasons—backing up servers, hosting files, and building applications. Can you give me direct access to your storage?” We listened, and we built Backblaze B2 Cloud Storage.
To build on my own words from the time, “It. Was. HUGE.” B2 Cloud Storage fundamentally changed the trajectory of our company. Over just the past two years, we’ve added more customer data than we did in our entire first decade. We now have customers in over 160 countries and they’ve entrusted us with more than an exabyte of data.
Brands like American Public Television, Patagonia, and Verizon’s Complex Networks—alongside more than 100,000 other customers—use Backblaze B2 to back up and archive their data; host their files online; offload their NAS, SAN, and other storage systems; replace their aging tape infrastructure; and store the data behind the applications they’ve built. Many of them have told us that the low cost enabled them to do what they otherwise couldn’t have, and that the simplicity let them do it quickly.
I’m proud that we’ve been able to help customers by offering the most affordable download prices in the industry, making it easy to migrate from the cloud oligarchy, and offering unprecedented choice in how our customers use their data.
Today, I’m thrilled to announce the public beta launch of our most requested feature: S3 Compatible APIs for B2 Cloud Storage.
If you have wanted to use B2 Cloud Storage but your application or device didn’t support it, or you wanted to move to cloud storage but couldn’t afford it, you can now start instantly with the new S3 Compatible APIs for Backblaze B2.
Welcome to Backblaze B2 via our S3 Compatible APIs
In practical terms, this launch means that you can now instantly plug into the Backblaze B2 Cloud Storage service by doing little more than pointing your data to a new destination. There’s no need to write new code, no change in workflow, and no downtime. Our S3 Compatible APIs are ready to use, and our solutions engineering team is ready to help you get started.
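As an illustration, switching an existing S3 integration over can be as small as changing the endpoint your client points at. The region name and URL pattern below are assumptions for illustration only; check your bucket details in the Backblaze console for the actual endpoint:

```python
# Hypothetical helper: build the S3-compatible endpoint URL for a B2 region.
def b2_s3_endpoint(region):
    return f"https://s3.{region}.backblazeb2.com"

# An existing S3 client typically only needs its endpoint re-pointed, e.g.
# with boto3 (shown as a comment so this sketch stays dependency-free):
#   s3 = boto3.client("s3",
#                     endpoint_url=b2_s3_endpoint("us-west-002"),
#                     aws_access_key_id=KEY_ID,
#                     aws_secret_access_key=APP_KEY)

print(b2_s3_endpoint("us-west-002"))  # https://s3.us-west-002.backblazeb2.com
```

The rest of the application code (uploads, downloads, bucket listings) would stay as-is, which is what "no need to write new code" means in practice.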
But they’re not alone. We have a number of launch partners leveraging our S3 Compatible APIs so you can use B2 Cloud Storage. These partners include IBM Aspera, Quantum, and Veeam.
Official Launch Partners
Cinnafilm, IBM Aspera, Igneous, LucidLink, Marquis, Masstech, Primestream, Quantum, Scale Logic, Storage Made Easy, Studio Network Solutions, Veeam, Venera Technologies, Vidispine, Xendata. These companies join a list of more than 100 other software, hardware, and cloud companies already offering Backblaze B2 to support their customers’ cloud storage needs.
Challenging the Cloud Oligarchy
For too long, cloud storage has been an oligarchy, leaving customers with choices that all look the same: expensive, opaque, complicated. We pride ourselves on being simple, reliable, and affordable. As Inc. magazine put it recently, B2 Cloud Storage is “Everything Amazon’s AWS Isn’t.”
With B2 Cloud Storage S3 Compatible APIs, we’re making the choice for something different a no-brainer. While there are all new ways to use Backblaze B2, our pricing is unchanged. We remain the affordability leader, and our prices have nothing to hide. With Backblaze, customers don’t need to choose what data they actually use—all data is instantly available.
While we’re opening this new pathway to B2 Cloud Storage, it doesn’t mean we’re changing anything about the service you already love. Our Native APIs are still performant, elegant, and supported. We’re just paving the way for more of you to see what it feels like when a cloud storage solution works for you.
Backblaze is responsible for a huge amount of customer, company, and employee data—in fact, we recently announced that we have more than an exabyte under management. With a huge amount of data, however, comes a huge amount of responsibility to protect that data.
This is why our security team works tirelessly to protect our systems. One of the ways in which they safeguard the data we’ve been entrusted with is by working alongside hackers. But these aren’t just any hackers…
Sometimes Hackers Can Be Good
Although it may sound odd at first, hackers have helped us discover and resolve a number of issues in our systems. This is thanks to Backblaze’s collaboration with HackerOne, a platform that crowdsources white hat hackers to test products and alert companies to security issues when they find them. In return, companies award the hacker a bounty, with amounts outlined in advance according to the severity of the vulnerability discovered.
Backblaze + HackerOne
Tim Nufire, Chief Security and Cloud Officer at Backblaze, created the company’s HackerOne program back in March 2015. One of the best things a company of our size can do is incentivize hackers around the world to look at our site to help ensure a secure environment. We can’t afford to onboard several hundred hackers as full-time employees, so by running a program like this, we are leveraging the talent of a very broad, diverse group of researchers, all of whom believe in security and are willing to test our systems in an effort to earn a bounty.
How HackerOne Works
When a hacker finds an issue on our site, backup clients, or any other public endpoint, they file a ticket with our security team. The team reviews the ticket and once they have triaged and confirmed that it is a real issue, they pay the hacker a bounty which depends on the severity of the find. The team then files an internal ticket with the engineering team. Once the engineers fix the issue, the security team will check to make sure that the problem was resolved.
To be extra cautious, the team gets a second set of eyes on the issue by asking the hacker to ensure that the vulnerability no longer exists. Once they agree everything is correct and give us the green light, the issue is closed. If you’re interested in learning even more about this process, check out Backblaze’s public bounty page, which offers even more information on our response efficiency, program statistics, policies, and bounty structure.
Moving from Private to Public
Initially, our program was private, which meant that we only worked with hackers we invited into our program. But in April 2019, our team took the program public. This meant that anyone could join our HackerOne program, find security issues, and earn a bounty.
The reasoning behind our decision to make the program public was simple: the more people we encourage to hack our site, the faster we can find and fix problems. And that’s exactly what happened at Backblaze. Thanks to the good guys on HackerOne we are one step ahead of the bad guys.
Some Issues We Resolve, Some We Contest
Let’s take a look at some examples as we work through two ‘classes’ of bugs typically reported by hackers.
One class of bugs that hackers find is Cross-Site Request Forgery (CSRF) attacks. CSRF attacks attempt to trick users into making unwanted changes, such as disabling security features, on a website they’re logged into. These attacks specifically target a user’s settings, not their data, since the attacker has no way to see the response to the malicious request. To resolve issues like this, we make changes like setting the SameSite attribute on our cookies, among other techniques. Problem solved!
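For example, here is a minimal sketch, using Python's standard http.cookies module rather than Backblaze's actual code, of marking a session cookie SameSite so that browsers withhold it from cross-site requests (the cookie name and value are illustrative):

```python
from http.cookies import SimpleCookie

# Mark a session cookie SameSite=Strict: the browser will refuse to attach it
# to requests initiated from other sites, which blunts CSRF attempts.
cookie = SimpleCookie()
cookie["session_id"] = "abc123"
cookie["session_id"]["samesite"] = "Strict"  # requires Python 3.8+
cookie["session_id"]["httponly"] = True      # hide from page JavaScript
cookie["session_id"]["secure"] = True        # only send over HTTPS

header = cookie["session_id"].OutputString()
print(header)
```

A server would emit this string as a Set-Cookie response header; with SameSite in place, a forged request from a malicious page arrives without the session cookie and is rejected.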
But sometimes making changes on our end isn’t the right response. Another class of “vulnerabilities” that hackers are quick to point out is “Information Disclosure” related to software versions or other system components. However, Backblaze does not see this as a vulnerability. “Security through obscurity” is not good security, so we intentionally make information like this visible and encourage hackers to use it to find holes in our defenses. It’s our belief that, by being as open with our community as possible, we’re more secure than we would be by hiding details about how our systems are configured.
We call attention to these two examples specifically because they underline one of the most interesting aspects of working with HackerOne: deciding when something is truly an issue that needs fixing, and when it is not.
Help Us Decide!
HackerOne has proven to be a great resource to scale our security efforts, but we’re missing one thing: a capable new team member to lead this program at Backblaze! Yes, we are hiring an Application Security Manager.
Among other interesting tasks, whoever fills the role will be responsible for triaging and prioritizing the issues identified through the HackerOne platform. This is a new role for us which was identified as a must-have by our security team because Backblaze is growing quickly.
Security has been our top priority since day one, but as our company scales and the amount of data that we store increases, we need someone who can help us navigate that growth. As Tim Nufire points out, “Growth makes us a bigger target, so we need a stronger defense.”
The Application Security Manager will not only have the opportunity to apply their security knowledge, but they will also have the unique chance to shape a security team at a growing, successful, and sustainable company. We think that sounds pretty exciting.
Who We Are Looking For
If you are someone who has years of experience in the security field, but hasn’t had the chance to take charge and lead a team, then this is your opportunity! We are looking for someone who is an expert in application layer security and is willing to teach us what we don’t know.
We need someone who is not afraid to roll up their sleeves and get to work even when there is no clear direction given. There are a couple of technical skills that we hope the new hire would have (like Burp Suite), but the most important qualities are being hands-on and having organizational management skills. This is because the Application Security Manager will formulate strategy and build a roadmap for the team moving forward. If that excites you as much as it excites us, feel free to send your resume to email@example.com or apply online here.
And of course, we are always looking for more white hat hackers to test our site. If you can’t join us in the office, then join us on HackerOne to help discover and resolve potential vulnerabilities. We look forward to hearing from you!
Software engineering interviews are carried out differently across every tech company, and—unsurprisingly—their merits are hotly debated and often ridiculed. At Backblaze, we have our own way of hiring software engineers, which we like to think is pretty fair. In fact, because we value transparency, we wanted to take some time on the blog to share exactly what our interview process looks like, along with some tips and tricks for any of you software engineers out there who might be interested in acing the interview (we are hiring, after all).
Whether you’re looking for a new gig, or you’re just curious about how software engineers share what they’re capable of, we hope you enjoy the window into our hiring process.
Interviewing During COVID-19: If you’ve read the statement by our CEO about our response to COVID-19, you know that most of our staff is working from home. But that doesn’t mean we’ve stopped interviewing folks. For the most part, our process has remained the same, but we’ll include some notes below about how we’ve worked to replace in-person interviews in the era of “social distancing.”
The Software Engineering Interview Process at Backblaze
Our interviews begin like most, with a 30-minute chat over the phone between the recruiter and candidate. The phone interview begins with the recruiter giving the candidate a brief outline of the company history, an overview of products and cultural environment, as well as the logistics of the interview process.
The recruiter also learns more about the candidate to see if they are a good fit for the role. What our recruiters are especially keen to understand is: will the candidate be a good cultural fit? We look for the typical characteristics like ability to collaborate, flexibility, and friendliness, but we also look for some aspects that are more specific to the Backblaze culture.
At Backblaze, we value the opinions of all teammates and we try to avoid rigid staff hierarchies. Because of that, our recruiters want to see that a candidate has the ability to voice their own opinion. Additionally, our recruiters tend to screen out candidates who are “order takers” used to working in a highly structured environment. We’re a startup, and we are still building out our processes and procedures. As a result, people who prefer following stringent guidelines may not succeed here.
Another characteristic that our recruiters look for is a candidate’s ability to interact cross-functionally. It’s not uncommon for our employees to work on projects with people on other teams. Whereas larger companies divide each department into separate buildings, all onsite employees at Backblaze work in the same space. As a software engineer at Backblaze, you will find yourself sitting across from a sales manager at lunch or working with customer support on a bug fix. To succeed at Backblaze, it’s crucial that candidates are willing and able to communicate with people from different departments.
Lastly, we want software engineers who are passionate about their jobs and the work that they will get to do with us. Expressing your excitement about what we do from the beginning of the interview process will help you to stand out from the crowd. This also means being unafraid to ask questions about the company and/or the industry. Most of our employees do not actually come from the cloud storage or backup industry, so our recruiters appreciate when candidates are willing to learn.
Interested in getting a broad sense of how our business has evolved over the years? We recently compiled a primer of blog posts that we ask most new hires to read. Read it here to be a step ahead of the game when it comes to understanding our business.
Finally, since our work environment is very collaborative, team-oriented, and fast-paced, we prefer to have our engineers work onsite. However, our senior engineering roles have more flexibility—they can work 100% remotely and even those who work onsite can work remotely two days of the week. Candidates who wish to work remotely would typically discuss that with the recruiter during the phone interview.
After assessing a candidate’s cultural fit, if the recruiter decides that we should move forward, they send the candidate a take-home assignment. This assessment involves writing code using the B2 API, which is publicly available. If the candidate does well at this point, they’re invited to take the next step.
The Online Coding Exercise
Once candidates clear the first steps of interviewing, they’re asked to do an online coding exercise alongside one of our lead engineers, such as Brian Beach. He works remotely from Lahaina, Hawaii and has decades of engineering experience, including as VP and Principal Engineer at TiVo. He earned an undergraduate degree and a Ph.D. from the University of California, Santa Cruz. At Backblaze, he has used his knowledge and expertise to design the vault storage system, and now he’s an architect on the engineering team.
Most companies conduct technical interviews for engineering roles, and so does Backblaze. The difference in our process lies in the format of the interview. Many companies ask candidates to write code on a whiteboard and the interviewer marks their answer as either right or wrong.
At Backblaze, our software engineers use a tool called CoderPad which replaces the traditional whiteboarding interview. Using this tool, candidates write their code in collaboration with Brian, while talking to each other about the work.
Technical interviews at Backblaze allow for a discussion between the interviewer and candidate. Brian isn’t trying to catch the candidate in a “gotcha” moment. His goal is to get a better understanding of an engineer’s interpersonal skills and to understand the candidate’s thought process. He wants to see how the candidate will work through problems on a team, not alone.
“It’s trying to simulate how we would treat each other if I was a co-worker sitting next to him and I asked him for help on a problem,” said John Shimek, Senior Software Engineer at Backblaze, while describing his experience of the interview process. “It should feel comfortable, and it did.”
Brian explained, “A lot of people don’t like coding exercises in interviews and that’s understandable because at many other companies, it’s high pressure. If you get something wrong, you fail the interview. But at Backblaze, we take a collaborative approach by working on a problem together, bouncing ideas off of each other, and coming up with a solution that way.”
During the coding exercise, candidates work on four problems and each one targets specific areas in which the team needs expertise. Brian said that in terms of technical skills, he assesses the candidate’s ability to write Java code, get answers right, collaborate through tricky problems, and write tests to check that the code works.
Kyle Wood, a Software Engineer on the Data Engineering team, found that using CoderPad was logistically easier than using a whiteboard. He was able to type out the code on a computer—which, after all, is where the majority of coding happens—rather than write it out by hand on a whiteboard. “By doing it on a computer, that back and forth discussion was able to happen much quicker and much more naturally than a traditional whiteboard interview,” Kyle commented.
The Onsite Interview
Once you’ve made it past the phone screening, take-home assignment, and online coding exercise, you’ve completed more than half of the journey. The last part of the process is an onsite interview, which consists of meeting with various people from the team, including three additional CoderPad/whiteboarding sessions with senior engineers focused on coding and design, as well as determining whether the candidate is a cultural fit.
Editor’s Note: During COVID-19, our engineers are using Google Hangouts for the onsite interview. The technical portion of the interview is done through CoderPad and Jamboard. These are tools that allow the interviewer and the candidate to work through a problem together.
John walked us through his onsite interview experience at Backblaze. Prior to his visit, our recruiting team reached out to him to coordinate his travel from Minneapolis, MN to San Mateo, CA. Once the logistics were finalized, John came to San Mateo and met with ten of his potential teammates. A few of these people include Tina Cessna (VP of Engineering), Yev Pusin (Director of Marketing), and Kim Truong (Senior Engineering Manager).
The entire onsite interview process was six hours long with a five to ten minute break in between each interview. John explained that for the technical portion of the interview, he was asked to do some whiteboarding exercises. But John said, “It didn’t matter if I accidentally got something wrong as long as the intent of what I was trying to do was there.”
But not all interviews were conducted inside a meeting room. Adam Feder, Principal Engineer at Backblaze, took John on a walk-and-talk interview, which John enjoyed because he was able to get some fresh air for a bit. He also had a lunch interview with Billy Ng (Software Engineer) and LeAnn Sucht (Quality Assurance Engineer). These conversations were focused on John’s technical processes and his engineering experience.
Advice for Future Candidates
John is now on the other end of the interview process: he is responsible for interviewing candidates and facilitating the coding exercise. He talked to us about how candidates can stand out.
“Candidates worry so much about being perfect,” he said, “but in reality, we just want to see that you are thinking through the problem.” He also advises candidates to ask for clarification if they don’t understand the question.
John highly recommends that candidates be proficient in Java when interviewing for general or backend engineering roles. Other roles require different technical skills: those who work on client software should know C++ or Objective-C, while those who work on Android should know Java and Kotlin. Python is a bonus, but not a requirement, because someone who knows Java can easily learn Python. That said, he explained, “Technical skills are important, but knowing how to solve a problem is more important.”
John gave his last piece of advice, “I’d rather work with someone who knows their faults and is willing to ask for help. We can teach a skill, but we can’t teach someone how to ask for help.”
We are always hiring more engineers. If you are interested in being part of a world-class engineering team at an established startup, check out our Career Center. See a position that you’re interested in? Send your resume to firstname.lastname@example.org. We look forward to hearing from you!
“What is Cloud Storage?” is a series of posts for business leaders and entrepreneurs interested in using the cloud to scale their business without wasting millions in capital on infrastructure. Despite being relatively simple, information about “the cloud” is overrun with frustratingly unclear jargon. These guides aim to cut through the hype and give you the information you need to convince stakeholders that scaling your business in the cloud is an essential next step. We hope you find them useful, and will let us know what additional insight you might need. –The Editors
The term cloud storage is used in popular media as though everyone knows exactly what it means. But ask anyone to list and define the different types of cloud storage, and you’re likely to get some blank looks. And yet understanding the different varieties of cloud is essential to deciding on the storage solution that is right for your business. With that in mind, below you’ll find a quick and easy-to-use field guide to the three basic types of cloud storage in use today: object, file, and block storage.
Your business has certain needs. Maybe you need to share content with a number of contributors, producers, or editors based around the world. Or possibly you have a complex and huge database of sales metrics you need to process or manipulate that is stressing your on-site capabilities. Or you might simply have data you need to archive. Regardless, while people are quick to recommend the “cloud” for any business scenario involving data, you need to know which cloud is right for your scenario. Read on to learn more.
The Three Types of Cloud Storage
Object Storage
In cloud storage, the definition of an ‘object’ is pretty simple. Object storage is literally some assemblage of data with one unique identifier and an infinite amount of metadata.
Maybe that doesn’t sound so simple after all. Let’s break it down into its components to try to make it clearer.
This data that makes up an object could be anything—an advertising jingle’s audio file, the photo album from your company party, a 300-page software manual, or simply a related grouping of bits and bytes.
When that data is added to object storage, it typically receives an identifier known as a Universally Unique Identifier (UUID) or a Globally Unique Identifier (GUID). These identifiers are 128-bit integers: in layman’s terms, very long numbers. The space of possible values is so vast that the chance of two objects ever receiving the same identifier is negligible, which is why each identifier can be treated as unique.
The third and final component of an object is its metadata—literally “the data about the data”—which can be any information that is used to classify or characterize the data in a particular object. This metadata could be the jingle’s name, a collection of the geographical coordinates where a set of digital pictures were taken, or the name of the author who wrote the user manual.
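Putting the three components together, an object might be sketched like this in Python (the field names and values here are invented for illustration and don't reflect any particular provider's format):

```python
import uuid

# Illustrative sketch of an object's three parts: the data itself, a 128-bit
# unique identifier, and free-form metadata describing the data.
jingle = {
    "id": str(uuid.uuid4()),            # random 128-bit UUID in text form
    "data": b"RIFF...",                 # stand-in for the raw audio bytes
    "metadata": {                       # any labels that describe the data
        "name": "Spring Campaign Jingle",
        "format": "wav",
        "duration_seconds": 30,
    },
}

print(len(jingle["id"]))  # 36: the canonical text form of a UUID
```

Note there is no folder or path anywhere in the object: the identifier alone locates it, and the metadata alone describes it.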
The Advantages of Object Storage
The primary advantage of object storage—and the reason it’s used by the majority of cloud storage providers—is that it enables the storage of massive amounts of unstructured data while still maintaining easy data accessibility. The greater amount of storage is achieved thanks to its flat structure: by using GUIDs instead of the hierarchies characteristic of file storage or block storage, object storage allows for near-infinite scalability. In other words, by doing away with structure, there’s more room for data.
The higher level of accessibility is largely thanks to the metadata, which is infinitely customizable. Think of the metadata as a set of labels for your data. Because this metadata can be refined and rewritten and expanded infinitely, the objects in object storage can easily be reorganized and scaled, based on different metadata criteria.
This last point is what makes object storage so popular for backup and archiving functions. Metadata’s unrestricted nature allows storage administrators to easily implement their own policies for data preservation, retention, and deletion, making it easier to protect data and to build “disaster recovery” strategies on top of techniques like Reed-Solomon erasure coding and Backblaze’s Vault architecture.
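As a rough sketch of what metadata-driven organization looks like in practice, the snippet below keeps objects in a flat store keyed only by ID and derives any grouping from metadata on demand (the store contents and criteria are invented for illustration):

```python
# A flat object store: no folders, just IDs mapped to objects with metadata.
objects = {
    "id-1": {"metadata": {"type": "photo", "year": 2019}},
    "id-2": {"metadata": {"type": "audio", "year": 2020}},
    "id-3": {"metadata": {"type": "photo", "year": 2020}},
}

def select(store, **criteria):
    """Return sorted IDs of objects whose metadata matches every criterion."""
    return sorted(
        oid for oid, obj in store.items()
        if all(obj["metadata"].get(k) == v for k, v in criteria.items())
    )

print(select(objects, type="photo"))             # ['id-1', 'id-3']
print(select(objects, type="photo", year=2020))  # ['id-3']
```

Because groupings are computed from metadata rather than baked into a directory tree, "reorganizing" the archive is just a matter of querying with different criteria.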
The Primary Uses of Object Storage
The main use cases for object storage include:
Storage of unstructured data like multimedia files
Storage of large data sets
Storage of large quantities of media assets like video footage as an archive in place of local tape drives
The prime use cases for object storage generally include storing large amounts of data that businesses need to access only periodically. For instance, if your business does a lot of production work in any medium, you probably need a lot of space to store your finished projects after their useful life is complete, but you probably also need access to the files in case you or a client need them again in the future.
This sort of accessible archive is perfect for object storage because the data doesn’t need to be highly structured. For example, KLRU, the Austin Public Television station responsible for broadcasting the famous musical showcase “Austin City Limits,” recently opted to migrate their 40+ year archive of footage into cloud storage. Object storage provided a cheap, but reliable, archive for all of their work. And their ability to organize the content with metadata meant they could easily distribute it to their network of licensees (or anyone else interested in using the content).
The scalability and flexibility of object storage have made it the go-to choice for many businesses transitioning to cloud solutions. That said, the relative complexity of the naming schema for the objects—that 128-bit identifier isn’t exactly user-friendly for most of us—and the metadata management approach can prove too complex or ill-suited for certain use cases. For media production companies and agencies, this often leads to the use of third-party software, including media asset managers (MAM) and digital asset managers (DAM), that layers an organizational schema over the top of the object store. When this doesn’t work, many turn to file storage, which we’ll discuss next.
File Storage
For administrators in need of a friendlier user interface but with smaller storage requirements—think millions of files, instead of billions—file storage might be the answer.
So what is file storage? In file storage, the data is stored in files. These files are, in turn, organized in folders, and these folders are then arranged into directories and subdirectories in a hierarchical fashion. To access a file, users or machines only need the path from directory to subdirectory to folder to file.
Because all the data stored in such a system is already organized in a hierarchical directory tree, like the files on your hard drive, it’s easy to name, delete, or otherwise manipulate files without any additional interface. If you have used practically any operating system (whether Windows or macOS, or whatever else), then you’re likely already familiar with these types of file and folder trees and are more than capable of working within them.
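This path-to-file access pattern is simple to demonstrate. Here is a small sketch using Python’s standard pathlib module; the directory names are invented for illustration:

```python
# Sketch: file storage is navigated by path through a directory tree.
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())             # stand-in for a storage root
project = root / "clients" / "acme" / "2020"
project.mkdir(parents=True)                 # directories and subdirectories
(project / "invoice.txt").write_text("Q1 totals")

# To reach a file, all you need is its path from directory to file.
print((root / "clients" / "acme" / "2020" / "invoice.txt").read_text())
```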
The Advantages of File Storage
The approachability of file storage is often seen as its primary advantage. But, using file storage in the cloud adds one key element: sharing. In cloud file storage, like on an individual computer, an administrator can easily set access as well as editing permissions across files and trees such that security and version control are far easier to manage. This allows for easy access sharing and thereby easy collaboration.
The disadvantage of file storage systems, however, is that as your data grows, there comes a point at which the hierarchy and permissions become complex enough to slow the system significantly.
The Use Cases for File Storage
Common use cases for file storage are:
Storage of files for an office or directory in a content repository
Storage of files in a small development or data center environment as a cost-effective option for local archiving
Storage of data that requires data protection and easy deployment
Generally speaking, discrete amounts of structured data work well in file storage systems. If this describes your organization’s data profile, and you need robust sharing, cloud file storage could be right for you. Specific examples would include businesses that require web-based applications. In this instance, a file storage system can allow multiple users who need to manipulate files at the same time the access they need, while also clearly delineating who can make changes. Another example is data analytics operations, which often require multiple servers to modify multiple files at the same time. These requirements make file storage systems a good solution for that use case as well.
Now that you have a better idea of the differences between object and file storage, let’s take a look at block storage and its special use cases.
A lot of cloud-based enterprise workloads currently use block storage. In this type of system, data is broken up into pieces called blocks, and then stored across a system that can be physically distributed to maximize efficiency. Each block receives a unique identifier, which allows the storage system to put the blocks back together when the data they contain is needed.
The Advantages of Block Storage
A block storage system in the cloud is used in scenarios where it’s important to be able to quickly retrieve and manipulate data, with an operating system accessing these data points directly across block volumes.
Block storage also decouples data from user environments, allowing that data to be spread across multiple environments. This creates multiple paths to the data and allows the user to retrieve it quickly. When a user or application requests data from a block storage system, the underlying storage system reassembles the data blocks and presents the data to the user or application.
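The split-and-reassemble mechanics described above can be illustrated with a simplified sketch: break data into fixed-size blocks, give each an identifier, and rebuild the original on demand. (Real systems use block sizes like 512 bytes to 64 KB and far more sophisticated addressing; this is only a toy model.)

```python
# Sketch: split data into fixed-size blocks, each with a unique ID,
# then reassemble the original data from the blocks on demand.
BLOCK_SIZE = 4  # toy value; real systems use e.g. 512 bytes to 64 KB

def split_into_blocks(data: bytes) -> dict:
    # Each block gets a sequential ID; blocks could then be stored
    # anywhere across a distributed system.
    return {i: data[off:off + BLOCK_SIZE]
            for i, off in enumerate(range(0, len(data), BLOCK_SIZE))}

def reassemble(blocks: dict) -> bytes:
    # The IDs are enough to restore the original order.
    return b"".join(blocks[i] for i in sorted(blocks))

blocks = split_into_blocks(b"block storage demo")
assert reassemble(blocks) == b"block storage demo"
```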
The primary disadvantages of block storage are its lack of metadata, which limits organizational flexibility, and its higher price and complexity—as compared to the other solutions we’ve discussed.
The Use Cases for Block Storage
Primary use cases for block storage are:
Storage of databases
Storage for RAID volumes
Storage of data for critical systems that impact business operations
Storage of file systems for operating systems running under virtualization software
The relatively fast, reliable performance of block storage systems makes them the preferred technology for databases. For the same reason block storage works well for databases, it also provides good support for enterprise applications: for transaction-based business applications, block storage ensures users are serviced quickly and reliably. Virtual machine file systems, such as VMware’s VMFS, also tend to use block storage because of the way data is distributed across multiple volumes.
Making a Choice Between Different Types of Cloud Storage
So which cloud storage system is right for you? Block or file storage could be useful if you’re dealing with a lot of data that members of a team have to change frequently. You might find that block storage works best for you if you need to store an organized collection of data that you can access quickly. File storage has the advantage that the data is easy to manipulate directly without a custom-built interface. But if you need highly scalable storage units for relatively unstructured data, that is where object storage shines. Whatever path you decide, now you have a sense of the use cases, advantages, and disadvantages of different storage types to weigh your next step into the cloud storage ecosystem.
With the impact of coronavirus on all of our lives, it’s been a struggle to find pieces of good news to share. But we wanted to take a break from the usual programming and share a milestone we’re excited about, one that’s more than 12 years in the making.
Since the beginning of Backblaze—back in 2007, when our five co-founders were working out of Brian Wilson’s apartment in Palo Alto—watching the business grow has always been profoundly exciting.
Our team has grown. From five, way back in the Palo Alto days, to 145 today. Our customer base has grown. Today, we have customers in over 160 countries… it’s not so long ago that we were excited about having our 160th customer.
More than anything else, the data we manage for our customers has grown.
By 2014, we were storing 100 petabytes—the equivalent of 11,415 years of HD video.
Years passed, our team grew, the number of customers grew, and—especially after we launched B2 Cloud Storage in 2015—the data grew. At some scale it got harder to contextualize what hundreds and hundreds of petabytes really meant. We like to remember that each byte is part of some individual’s beloved family photos or some organization’s critical data that they’ve entrusted us to protect.
That belief is part of every single Backblaze job description. Here’s how we put it in that context:
“Our customers use our services so they can pursue dreams like curing cancer (genome mapping is data-intensive), archiving the work of some of the greatest artists on the planet (learn more about how Austin City Limits uses B2), or simply sleeping well at night (anyone that’s spilled a cup of coffee on a laptop knows the relief that comes with complete, secure backups).”
It’s critically important for us that we achieved this growth by staying the same in the most important ways: being open & transparent, building a sustainable business, and caring about being good to our customers, partners, community, and team. That’s why I’m excited to announce a huge milestone today—our biggest growth number yet.
We’ve reached 1.
Or, by another measurement, we’ve reached 1,000,000,000,000,000,000.
Yes, today, we’re announcing that we are storing 1 exabyte of customer data.
What does it all mean? Well. If you ask our engineers, not much. They’ve already rocketed past this number mentally and are considering how long it will take to get to a zettabyte (1,000,000,000,000,000,000,000 bytes).
But, while it’s great to keep our eyes on the future, it’s also important to celebrate what milestones mean. Yes, crossing an exabyte of data is another validation of our technology and our sustainably independent business model. But I think it really means that we’re providing value and earning the trust of our customers.
Thank you for putting your trust in us by keeping some of your bytes with us. Particularly in times like these, we know that being able to count on your infrastructure is essential. We’re proud to serve you.
As the world grapples with a pandemic, celebrations seem inappropriate. But we did want to take a moment and share this milestone with you, both for those of you who have been with us over the long haul and in the hopes that it provides a welcome distraction. To that end, we’ve been working on a few things that we’d planned to launch in the coming weeks. We’ve made the decision to push forward with those launches in hopes that the tools may be of some use for you (and, if nothing else, to try to do our part to provide a little entertainment). For today, here’s to our biggest 1 yet. And many more to come.
So now that you know what an exabyte looks like, let’s look at how Backblaze got there.
Way back in 2010, we had 10 petabytes of customer data under management. It was a big deal for us, it took us two years to accomplish and, more importantly, it was a sign that thousands of customers trusted us with their data.
It meant a lot! But when we decided to tell the world about it, we had a hard time quantifying just how big 10 petabytes were, so naturally we made an infographic.
In what felt like the blink of an eye, it was two years later, and we had 75 petabytes of data. The Burj was out. And, because it was 2013, we quantified that amount of data like this…
Pop songs now average around 3:30 in length, which means if you tried to listen to this imaginary musical archive, it would take you 167,000 years. And sadly, the total number of recorded songs is only in the tens to hundreds of millions, so you’d have some repeats.
That’s a lot of songs! But more importantly, our data under management had grown by 750%! But we could barely take time to enjoy it because five months later we hit 100 petabytes, and we had to call it out. Stacking up to the Burj Khalifa was in the past! Now, we rivaled Mt. Shasta…
But stacking drives was rapidly becoming less effective as a measurement. Simply put, the comparison was no longer apples to apples: the 3,000 drives we stacked up in 2010 each held only one terabyte of data. If you were to take those same 3,000 drives at the average drive size we had in 2013 (about 4 terabytes per drive), the stack would be the same height, because hard drives hadn’t physically grown, but the density of the storage inside them had quadrupled.
The thought of migrating petabytes or even terabytes of data from your existing cloud provider to another may seem impossible. Naturally, as data sets grow, they become harder to move, and many times the inertia of that growing data means that it stays put, even in scenarios in which the storage solution is poorly suited to the data’s use case.
Even worse are scenarios in which major cloud storage providers lock customers into their ecosystems by design, making it difficult to diversify one’s data storage or to detach from a given service completely. These providers are like a digital Hotel California: you can check out any time you like, but you can never leave, because the egress fees, downtime, and operational overhead will kill you.
For Additional Information: This post is one of a series in lieu of this year’s NAB conference, which was recently cancelled. The content here accompanies a series of webinars outlining cloud-storage based solutions for data-heavy workflows. If you’d like to learn more about seamless data migration using Flexify.IO, please join us for our upcoming webinar about this solution on April 7.
Backblaze and Flexify.IO Offer a New Path for Cloud Migration
Fortunately, there is a way of avoiding these pitfalls and adopting a seamless data migration strategy, thanks to our integration partner, Flexify.IO. They help businesses migrate data between clouds in a reliable, secure, and predictable manner. Oh, and they are blazing fast—but more on that later.
Before we dive into how Flexify.IO works, it would be helpful for you to understand why and when a business may want to consider a cloud migration strategy. If you are like most businesses, you researched the cloud storage space and landed on a provider that served your needs at some point in time. But then, as your business grew and your needs evolved, your storage solution became difficult and expensive to scale with your growing storage needs, and you lost the control and flexibility you once had.
For Aiden Korotkin of AK Productions, it was exactly this operational overhead that prompted him to migrate his data from Google Cloud Platform into Backblaze B2 Cloud Storage. As Korotkin transitioned from freelancing to running a full-service video production company, the cost and complexity of data management began to weigh him down. Simple tasks took several hours, and he could no longer justify the cost of doing business on Google Cloud. He needed to migrate his data into an easy-to-use, cost-effective storage platform, without causing any disruption to business workflows.
Korotkin discovered Backblaze B2 while researching secure cloud storage platforms on a user privacy and security blog. After digging more deeply into Backblaze’s services, he felt that we might be the right solution to his challenge. But the hurdle of transferring all of his data was nearly paralyzing. Ordinarily, moving several terabytes of data into Backblaze B2 via standard business connections would take weeks, months, or longer. But when Korotkin described his misgivings to the Solutions Engineers at Backblaze, they realized AK Productions might benefit from one of our newest partnerships. And sure enough, Flexify.IO accomplished the transfer in just 12 hours, with no downtime.
How Does Flexify.IO Do It?
With Flexify.IO, there is no need for users to manually offload their static and production data from a cloud storage provider and reupload it into Backblaze B2. Flexify.IO reads the data from the source storage (Amazon S3, for instance) and writes it to the destination storage (Backblaze B2) over inter-cloud bandwidth. Since the data transfer happens entirely within the cloud environment (see Figure 1), Flexify.IO achieves fast and secure data migration at cloud-native speeds. This process is considerably quicker than moving data through your local internet connection.
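Conceptually, the transfer is a read-from-source, write-to-destination loop run on infrastructure with high-bandwidth links to both clouds. The sketch below is our own simplification, not Flexify.IO’s actual engine; src and dst stand for two S3-compatible clients (for instance, boto3 clients pointed at the source and destination providers’ S3-compatible endpoints):

```python
# Conceptual sketch of a cloud-to-cloud copy loop. `src` and `dst` are
# assumed to be S3-compatible clients; this is a simplification, not
# Flexify.IO's actual migration engine.
def copy_bucket(src, src_bucket, dst, dst_bucket):
    token = None
    while True:
        kwargs = {"Bucket": src_bucket}
        if token:
            kwargs["ContinuationToken"] = token
        page = src.list_objects_v2(**kwargs)  # one page of object listings
        for obj in page.get("Contents", []):
            # Read from the source and write to the destination without
            # ever routing the bytes through a local machine.
            body = src.get_object(Bucket=src_bucket, Key=obj["Key"])["Body"]
            dst.put_object(Bucket=dst_bucket, Key=obj["Key"], Body=body.read())
        if not page.get("IsTruncated"):
            return
        token = page["NextContinuationToken"]
```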
Speed and Data Security
A do-it-yourself (DIY) migration approach may work for you if you are migrating less than one terabyte or one million objects, but it will become more challenging as your data set grows. When you need to migrate millions of objects, fine-tuning a DIY script is time-consuming and can be extremely costly if something goes wrong.
Flexify.IO is designed and tested for reliability, with proper error handling and “retries” in place to ensure a successful migration. What this means is that they periodically check to make sure the data has been moved to the destination it was intended for.
They also check to make sure your data is not corrupted during the process by comparing hashes and checksums at every stage. A checksum is a value used to verify the integrity of a file or a data transfer. It is a sum derived from a file or other data object that a system can “check” by comparing the sum against a record of what the sum should be. Checksums are typically used to compare two sets of data to make sure they are the same.
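Checksums like these are easy to compute with standard tools. For example, Python’s standard hashlib module can derive a SHA-256 digest; comparing the digest of the source bytes with the digest of what arrived is enough to detect corruption (the file contents here are, of course, made up):

```python
# Verify a transfer by comparing checksums of the source and destination
# copies of the data.
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

source = b"raw footage, episode 101"
transferred = b"raw footage, episode 101"  # what landed at the destination
corrupted = b"raw footage, episode 1O1"    # a single flipped character

assert checksum(source) == checksum(transferred)  # transfer verified
assert checksum(source) != checksum(corrupted)    # corruption detected
```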
And with an advanced migration algorithm (see Figure 3) that allows for high speed transfers, Flexify.IO migrates incredibly large amounts of data from Amazon S3, Google Cloud, Azure, or any cloud storage provider into Backblaze B2, remarkably fast.
One of the biggest challenges when migrating data to new platforms is downtime, or being unable to access your data for a certain amount of time. Via traditional migration methods that span days, weeks, or months, lengthy downtimes can put your business at risk. And if data is changed or modified during migration, this may result in data loss or gaps and further delay your migration.
Flexify.IO reduces downtime by implementing an “incremental migration process.” They do several passes to scan your source and destination data, looking for changes between the two. If there are any differences, only those get migrated over. No manual checks or scans are necessary, as they ensure a seamless migration.
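One way to picture the incremental pass is as a set difference over object listings. The sketch below compares keys and content hashes; this is a common approach to incremental sync, though not necessarily Flexify.IO’s exact algorithm:

```python
# Sketch: given listings of source and destination (key -> content hash),
# return only the keys that still need to be migrated.
def incremental_plan(source: dict, destination: dict) -> set:
    return {key for key, digest in source.items()
            if destination.get(key) != digest}  # new or changed objects only

src = {"a.mov": "h1", "b.mov": "h2", "c.mov": "h3"}
dst = {"a.mov": "h1", "b.mov": "old"}
print(sorted(incremental_plan(src, dst)))  # ['b.mov', 'c.mov']
```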
Cost and Support
The number one reason businesses stay locked in to their cloud storage provider is the high egress fees required to free their data. Since partnering with Flexify.IO, we have been able to significantly reduce these egress costs for customers who wish to migrate off their existing cloud storage platforms. You pay a flat fee per GB, which includes egress fees, infrastructure costs, setup, and planning.
For an Amazon S3 to Backblaze B2 migration in the US and Canada regions, this reduced fee comes out to $0.04/GB, which is less than half the price you would usually pay to retrieve your data from S3. And at $5/TB of storage, Backblaze B2 is a fraction of the cost of major cloud storage providers like Amazon S3, Google Cloud, or Azure. Customers like AK Productions begin to realize significant cost savings within a few months of switching to a more affordable platform like Backblaze B2.
An Example of ROI: Moving Data from S3 to Backblaze B2 Using Flexify.IO
If you currently store 100 terabytes of data in Amazon S3, you are now paying $2,300 per month. Migrating this data to Backblaze B2 will reduce your storage cost to $500 per month. If you attempt this move without Flexify.IO, the egress alone will cost you around $9,000 and will require significant personnel time. Migration with Flexify.IO to Backblaze B2 is only $4,000. At $1,800 per month in savings, the migration pays for itself in just over two months! To learn more about what you could save, contact our sales team.
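The arithmetic in this example is easy to check. The sketch below reproduces it using the per-GB rates implied by the quoted figures (about $0.023/GB/month for S3 storage, $0.005/GB/month for Backblaze B2, $0.09/GB for do-it-yourself egress, and the $0.04/GB Flexify.IO flat fee):

```python
# Reproduce the ROI example: 100 TB moved from Amazon S3 to Backblaze B2.
TB = 1_000  # GB per TB
data_gb = 100 * TB

s3_monthly = data_gb * 0.023   # $2,300/month storing on S3
b2_monthly = data_gb * 0.005   # $500/month storing on B2
monthly_savings = s3_monthly - b2_monthly  # $1,800/month

diy_egress = data_gb * 0.09    # ~$9,000 to pull the data out yourself
flexify_fee = data_gb * 0.04   # $4,000 flat fee, egress included

payback_months = flexify_fee / monthly_savings  # time to break even
print(round(payback_months, 1))  # -> 2.2
```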
Additionally, Flexify.IO offers a transparent and upfront pricing structure for several cloud-to-cloud as well as on-premises to cloud migration options, which you can find on the pricing table on their website. We recommend using their managed service plan, which gives you access to Flexify.IO’s stellar support team who ensure a seamless end-to-end migration.
“Flexify.IO cared that my data transferred correctly. They had no idea that some files were the very last copies I had, but they treated them that way regardless,” says Korotkin of AK Productions.
Get Started with Flexify.IO
The recent trend in the commoditization of object storage has unlocked better alternatives for businesses looking to find a new home for their data. You now have the option of choosing more affordable and easy-to-use platforms, like Backblaze B2, without paying unnecessarily high storage bills or egress fees. And with the help of Flexify.IO’s migration solution, cloud-to-cloud migrations have never been more straightforward.
It only takes a few minutes to set up a Flexify.IO account. Once you click that big “Start Migration” button inside the platform, Flexify.IO handles the rest. To get started today, check out our Quick Start Guide on how to transfer your data from your existing cloud storage provider into Backblaze B2.
For Additional Information: This post is one of a series focusing on solutions for professionals in media and entertainment. If you’d like to learn more, we’re hosting a series of webinars about these solutions over the coming months. Please join us for our first on March 26 with iconik!
Jeff Nicosia, owner and founder of Industrious Films, will be the first to tell you that his company of creatives is typical of the new, modern creative agency. They make their name and reputation every day by delivering big advertising agency creative services directly to clients without all the extra expense of big advertising agency overhead and excess.
Part of this lighter approach includes taking a more flexible attitude towards building their team. With decades of experience at some of the best-known agencies in LA and New York, Industrious Films knows that the best people for their projects are spread out all over the country, so they employ a distributed set of creatives on almost every job.
But with entire teams working remotely, they need tools that boost collaboration and reduce the inefficiency of sending huge media files back and forth as everyone pushes to meet client delivery deadlines.
Backblaze hired Industrious Films to produce videos for our NAB booth presence last year, and during our collaborative process we introduced Backblaze B2 Cloud Storage to their team. For this group of road-tested veterans, the potential for our cloud storage project to help their process was eye-opening indeed.
How Cloud Storage Has Impacted Industrious Films Workflow
As we re-engaged with Industrious Films to work on new projects this year, we wanted to hear Jeff’s thoughts on what cloud storage has accomplished for his team, and what it was like before they started using B2.
Skip Levens: Jeff, can you tell me about the niche that Industrious Films has carved out, and what your team is like?
Jeff Nicosia: Industrious Films brings the best of advertising agency video production directly to companies by eliminating the middleman. We tell customer and company stories really well, with craft, and have found a way to do it at a price that lets companies do a lot more video in their marketing vs. a once-a-year luxury. We’ve really found our niche in telling company stories and customer videos, working for companies like Quantum, Backblaze (of course), DDN, Thanx, Unisan, ExtraHop and tons more.
We’re all creatives that worked at ad agencies, design studios, post houses, etc. We’re spread out but come together for projects all over the country, and actually the world. Right now I’m in Manhattan Beach (Los Angeles, CA) while our main editor is on the other side of LA—25 minutes or 2 hours by car away depending on time of day—and our main graphics editor is in Iowa. Oh, and our colorist is either in Los Angeles or Brazil, depending on the time of year.
As for shooting, we use sound guys, shooters, PA’s, etc., either from LA, or we hire locally wherever we’re shooting the video. We have crews we have collaborated with on multiple occasions in LA, Seattle, New York, London, and San Francisco. I actually shot a timelapse of a fairly typical shoot day: “A 14-Hour Shoot in 60s” to give you an idea of what it’s like.
SL: Jeff, before we talk about how you adopted Backblaze B2 and cloud storage in general, can you paint a picture of what it’s usually like to shoot and deliver a large video project like the one you created for us?
JN: It’s a never-ending exchange of hard drives and bouncing between Dropbox, Box, Google Drive, and what have you, as everyone is swapping files and sending updates. We’re also chasing customers and asking, “Did you get the file?” Or, “Did you send the file?” All of this was hard enough when video size was HD—now, when everything’s 4K or higher it just doesn’t work at all. A single 4K RAW file of 3-4GB might take up an entire Google Drive allowance, and it gets very expensive to save to Google Drive beyond that size. We’ve spent an entire day trying to upload a single critical file that looks like it’s uploading, then have it crap out hours later. At that point, we’ve just wasted a day and we’re back to slapping files on a hard drive and trying to make the FedEx cutoff.
“Any small business or creative endeavor has to be remote nowadays. You want to hire the best people, no matter where they are. When people live where they want and work out of a home office they charge less for their rates—that’s how we deliver a full-service ad agency and video production service at the prices we can.”
SL: I remember, from working together on other projects, that we were constantly swapping hard drives and saying, “Is this one yours?” Or finally seeing you again years later, and handing you back your hard drive.
JN: Right! It’s so common. And you can’t just put files on a hard drive and ship it. We’ve had overnight services lose drives on us enough times that we’ve learned to always make extra copies on entirely new hard drives before sending a drive out. It’s always a time crunch and you have to make sure you have a spare drive and that it’s big enough. And you just know that when you send it to a client you’re never going to see that drive again. It’s a cost of business, and hundreds of dollars a month just gone—or at least it used to be. I’ve spent way too much time stalking Best Buy buying extra hard drives when there’s a sale because we were constantly buying drives.
SL: So that was the mindset when we kicked off our NAB Video Project last year (for NAB 2019) and I said, instead of handing you a hard drive with all of our B-roll, logos, etc., let’s use Backblaze B2.
Technical Note: I helped Industrious Films set up three buckets: a private bucket that I issued write-only app keys for (basically a ‘drop bucket’ for any incoming content); a private bucket for everyone on the project to access; and a public bucket for sharing review files directly from Backblaze if needed.
Next, I cut app keys for Industrious Films that spanned all three buckets so that they could easily organize and rearrange content as needed. I entered the app key into a bookmark in Cyberduck, and gave Industrious Films the bookmark to drop into Cyberduck.
JN: Well, we work for technical clients, but I’m not really a technical guy. And my team are all creatives, not techies, so anything we use has to be incredibly simple. I wasn’t sure how it was going to work. Most of us were familiar with FTP clients, and this interface looks like the files and folders we’d see on a typical shared storage server, so it was very easy to adapt.
“Even though I have a background in tech, I’ve worked in technology companies, and my customers are tech companies, I’m not a tech savvy guy at all and I don’t want to be. So the tools I use have to be simple and let me get on with telling my customer’s story.”
Everyone on my team works out of their home offices, or shared workspaces. I’ve got a 100 Megabit connection, up and down, and our graphics guy has the same—and he’s in the middle of Iowa. We each started uploading files in Cyberduck, then we jumped on a Skype call together and watched 6GB files fly across and we were just blown away. We just couldn’t believe that this was cloud storage, and it seemed like the more we put in, the faster it got. Our graphics guy was just raving about it, trying out bigger and bigger file uploads. He was freaking out—he kept saying, “What kind of secret sauce do these guys have!?”
SL: Can you tell me how the team adjusted to using a shared bucket? What did collaboration look like?
JN: First of all, since we had a files and folders interface, I jumped right in and did the usual organization of assets. One folder for Backblaze customer video reels, one for Backblaze B-roll, one for logos, one for audio, one for storyboards, motion graphics templates, etc. Then everyone downloads what they need from the folder to work locally, and puts changed and finished files back up in the shared bucket for everyone to see. That way we can review on the fly.
I sync everything to a local RAID array, but most of the time my focus is only on the shared bucket with the team. I don’t use an asset manager or project manager solution—I can always drop in something like iconik later if we’re doing overlapping large projects simultaneously. This works for our team for now and is exactly what we need.
“My graphics lead moved from North Hollywood to Iowa. And whether he’s 25 miles away from me or 2000, if we’re not in the same room, we need a way to send files to each other quickly to work together. So if the tools are good enough, it doesn’t matter where the team is anymore.”
SL: I seem to remember we needed some of those files for tweaks and changes as we were deploying on the NAB show floor?
JN: Right, since we had the entire project and all the source files online, in all the chaos of NAB booth building before the show opened, as we played our video on the huge screens—we realized we could still swap in a better graphic. So, we just pulled from the Backblaze web interface and dropped it in right there. Otherwise, we’d have had to track down the new file and have someone deliver it to us, or more likely not make the change at all.
“Speed is collaboration for us as a small team. When uploads and downloads are fast and we’re not fighting to move files around, then we can try new ideas quickly, zero in on the best approach, and build and turn projects faster. It’s how we punch way, way above our weight as a small shop and compete with the biggest agencies.”
SL: What advice would you give creatives who want to try to rely less on dragging hard drives around? Any final thoughts?
JN: Well, first of all, hard drives are never totally going away. At least not until something very simple and very cheap comes along. I might work for technical customers, but sometimes their marketing leads will hand me hard drives, or when I want to deliver a file or have them review a file they’ll ask me to put it on a private YouTube or Vimeo link. They want to review on their phone or at lunch, so it needs to be simple for them, too. But at least we can organize everything we do on Backblaze and there’s a lot fewer hard drives in my life at least.
One of the biggest revelations I’ve had is not just for editors and producers working on projects like we did, but for shooters too. On a shoot, everyone takes a copy of the raw files and no one leaves the shoot until there are two copies. If there’s a problem with the camera cards (storage media), this whole process can be agonizingly slow. If only more people knew they could upload a copy to something like Backblaze that would not only function as a shared copy but also allow everyone to start reviewing and editing files right away instead of waiting until they got back to the shop.
And finally, everyone can do what we’ve done. The way we’ve thrived and how creatives find their niche and thrive in a gig economy is to use simple, easy to use tools that let you tell those stories, offer better service, and compete with bigger agencies with higher overhead. We did it, anyone else can too.
SL: Absolutely! Thanks Jeff, for taking time out to talk to us. We really appreciate your team’s work and look forward to working together on our next project!
“What is Cloud Storage?” is a series of posts for business leaders and entrepreneurs interested in using the cloud to scale their business without wasting millions of capital on infrastructure. Despite being relatively simple, information about “the Cloud” is overrun with frustratingly unclear jargon. These guides aim to cut through the hype and give you the information you need to convince stakeholders that scaling your business in the cloud is an essential next step. We hope you find them useful, and will let us know what additional insight you might need. –The Editors
“Big Data” is a phrase people love to throw around in advertising and planning documents, despite the fact that the term itself is rarely defined the same way by any two businesses, even among industry leaders. However, everyone can agree about its rapidly growing importance—understanding Big Data and how to leverage it for the greatest value will be of critical organizational concern for the foreseeable future.
So then what does Big Data really mean? Who is it for? Where does it come from? Where is it stored? What makes it so big, anyway? Let’s bring Big Data down to size.
What is Big Data?
First things first, for purposes of this discussion, “Big” means any amount of data that exceeds the storage capacity of a single organization. “Data” refers to information stored or processed on a computer. Collectively, then, “Big Data” is a massive volume of structured data, unstructured data, or both that is too large to effectively process using traditional relational database management systems or applications. In more general terms, when your infrastructure is too small to handle the data your business is generating—either because the volume of data is too large, it moves too fast, or it simply exceeds the current processing capacity of your systems—you’ve entered the realm of Big Data.
Let’s take a look at the defining characteristics.
Characteristics of Big Data
Current definitions of Big Data often reference a “three (or in some cases four) V” construct for detailing its characteristics. The “V”s reference velocity, volume, variety, and variability. We’ll define them for you here:
Velocity refers to the speed of generation of the data—the pace at which data flows in from sources like business processes, application logs, networks, social media sites, sensors, mobile devices, etc. This speed dictates how rapidly data must be processed to meet business demands, which in turn determines the real potential of the data.
Volume is, of course, referenced by the term Big Data itself. But beyond just being “big,” the relative size of a data set is a fundamental factor in determining its value. The volume of data stored by an organization determines its requirements for scalability, accessibility, and ease or difficulty of management. A few examples of high volume data sets are all of the credit card transactions in the United States on a given day; the entire collection of medical records in Europe; and every video uploaded to YouTube in an hour. A small to moderate volume might be the total number of credit card transactions in your business.
Variety refers to how many disparate or separate data sources contribute to an organization’s Big Data, along with the intrinsic nature of the data coming from each source. This relates to both structured and unstructured data. Years ago, spreadsheets and databases were the primary sources of data handled by the majority of applications. Today, data is generated in a multitude of formats such as email, photos, videos, monitoring devices, PDFs, audio, etc.—all of which demand different considerations in analysis applications. This variety of formats can potentially create issues for storing, mining, and analyzing data.
Variability concerns any inconsistencies in the data formats coming from any one source. Where variety considers different inputs from different sources, variability considers different inputs from a single data source. These differences can complicate the effective management of the data store. Variability may also refer to differences in the speed of the data flow into your storage systems: where velocity refers to the speed of all of your data, variability refers to how different data sets might move at different speeds. Variability can be a concern when the data itself has inconsistencies despite the architecture remaining constant.
An example from the health sector would be the variances within influenza epidemics (when and where they happen, how they’re reported in different health systems) and vaccinations (where they are/aren’t available) from year to year.
Understanding the makeup of Big Data in terms of Velocity, Volume, Variety, and Variability is key when strategizing big data solutions. This fundamental terminology will help you to effectively communicate among all players involved in decision making when you bring Big Data solutions to your team or your wider business. Whether pitching solutions, engaging consultants or vendors, or hearing out the proposals of the IT group, a shared terminology is crucial.
What is Big Data Used For?
Businesses use Big Data to try to predict future customer behavior based on past patterns and trends. Effective predictive analytics are the metaphorical crystal ball that organizations seek about what their customers want and when they want it. Theoretically, the more data collected, the more patterns and trends the business can identify. This information can potentially make all the difference for a successful strategy in customer acquisition and retention, and create loyal advocates for a business.
In this case, bigger is definitely better! But, the method an organization chooses to address its Big Data needs will be a pivotal marker for success in the coming years. Choosing your approach begins with understanding the sources of your data.
Sources of Big Data
Today’s world is incontestably digital: an endless array of gadgets and devices serves as our trusted allies on a daily basis. While helpful, these constant companions are also responsible for generating more and more data every day. Smartphones, GPS technology, social media, surveillance cameras, and machine sensors (and the growing number of users behind them) all produce reams of data on a moment-to-moment basis. That output has increased exponentially, from one zettabyte of customer data produced in 2009 to more than 35 zettabytes in 2020.
If your business uses an app to receive and process orders for customers, or if you log extensive point-of-sale retail data, or if you have massive email marketing campaigns, you could have sources for untapped insight into your customers.
Once you understand the sources of your data, the next step is understanding the methods for housing and managing it. Data Warehouses and Data Lakes are two of the primary types of storage and maintenance systems that you should be familiar with.
Where Is Big Data Stored? Data Warehouses & Data Lakes
Although both Data Lakes and Data Warehouses are widely used for Big Data storage, they are not interchangeable terms.
A Data Warehouse is an electronic system used to organize information. It goes beyond a traditional relational database’s function of housing and organizing data generated from a single source.
How Do Data Warehouses Work?
A Data Warehouse is a repository for structured, filtered data that has already been processed for a specific purpose. A warehouse combines information from multiple sources into a single comprehensive database.
For example, in the retail world, a data warehouse may consolidate customer info from point-of-sale systems, the company website, consumer comment cards, and mailing lists. This information can then be used for distribution and marketing purposes: to track inventory movements and customer buying habits, manage promotions, and determine pricing policies.
Additionally, the Data Warehouse may also incorporate information about company employees such as demographic data, salaries, schedules, and so on. This type of information can be used to inform hiring practices, set Human Resources policies and help guide other internal practices.
Data Warehouses are fundamental in the efficiency of modern life. For instance:
Have a plane to catch?
Airline systems rely on Data Warehouses for many operational functions like route analysis, crew assignments, frequent flyer programs, and more.
Have a headache?
The healthcare sector uses Data Warehouses to aid organizational strategy, help predict patient outcomes, generate treatment reports, and cross-share information with insurance companies, medical aid services, and so forth.
Are you a solid citizen?
In the public sector, Data Warehouses are mainly used for gathering intelligence and assisting government agencies in maintaining and analyzing individual tax and health records.
Playing it safe?
In investment and insurance sectors, the warehouses are mainly used to detect and analyze data patterns reflecting customer trends, and to continuously track market fluctuations.
Have a call to make?
The telecommunications industry makes use of Data Warehouses for management of product promotions, to drive sales strategies, and to make distribution decisions.
Need a room for the night?
The hospitality industry utilizes Data Warehouse capabilities in the tailored design and cost-effective implementation of advertising and marketing programs targeted to reflect client feedback and travel habits.
Data Warehouses are integral in many aspects of the business of everyday life. That said, they aren’t capable of handling the inflow of data in its raw format, like object files or blobs. A Data Lake is the type of repository needed to make use of this raw data. Let’s examine Data Lakes next.
What is a Data Lake?
A Data Lake is a vast pool of raw data whose purpose is not yet defined. This data can be both structured and unstructured. The prime attributes of a Data Lake are flexibility, agility, and ease of use within a secure, adaptable storage and maintenance system.
If you’re considering a business approach that involves Data Lakes, you’ll want to look for solutions that have the following characteristics: they should retain all data and support all data types; they should easily adapt to change; and they should provide quick insights to as wide a range of users as you require.
Use Cases for Data Lakes
Data Lakes are most helpful when working with streaming data, like the sorts of information gathered from machine sensors, live event-based data streams, clickstream tracking, or product/server logs.
Deployments of Data Lakes typically address one or more of the following business use cases:
Business intelligence and analytics – analyzing streams of data to determine high-level trends and granular, record-level insights. A good example of this is the oil and gas industry, which has used the nearly 1.5 Terabytes of data they generate on a daily basis to increase their efficiency.
Data science – unstructured data allows for more possibilities in analysis and exploration, enabling innovative applications of machine learning, advanced statistics and predictive algorithms. State, city, and federal governments around the world are using data science to dig more deeply into the massive amount of data they collect regarding traffic, utilities, and pedestrian behavior to design safer, smarter cities.
Data serving – Data Lakes are usually an integral part of high-performance architectures for applications that rely on fresh or real-time data, including recommender systems, predictive decision engines, and fraud detection tools. A good example of this use case is the range of Customer Data Platforms available that pull information from many behavioral and transactional data sources to highly refine and target marketing to individual customers.
When considered together, the different potential applications for Data Lakes in your business seem to promise an endless source of revolutionary insights. But the ongoing maintenance and technical upgrades required for these data sources to retain relevance and value are massive. If neglected or mismanaged, Data Lakes quickly devolve. As such, one of the biggest questions to weigh when considering this approach is whether you have the financial and personnel capacity to manage Data Lakes over the long term.
What is a Data Swamp?
A Data Swamp, put simply, is a Data Lake that no one cared to manage appropriately. Swamps arise when a Data Lake is treated as storage only, with no curation, management, retention and lifecycle policies, or metadata. And if you’ve decided to work Data Lake-derived insights into your business planning and end up with a Swamp, you are going to be sorely disappointed. You’re paying the same amount to store all of your data, but returning zero effective intelligence to your bottom line.
Final Thoughts on Big Data Maintenance
Any business or organization considering entry into Big Data country will want to be careful and deliberate as they consider how they will store, maintain, and analyze their data. Making the right choices at the outset will ensure you’re able to traverse the developing digital landscape with strategic insights that enable informed decisions to keep you ahead of your competitors. We hope this primer on Big Data gives you the confidence to take the appropriate first steps.
Getting started with cloud storage is easy. You can sign up for an account in seconds, and in minutes you can have directories full of files from your latest project in your cloud account, accessible to you and anyone you want to share the files with. If you have dozens or even hundreds of terabytes of data already, however, uploading it all into your cloud storage bucket, or transferring it there from another cloud service will take a bit of careful planning.
Thankfully, whether you’re looking to transfer a significant amount of data from your on-premises solution to the cloud, or you’d like to escape the grips of a cloud storage provider like Amazon S3 without breaking your budget, Backblaze has worked hard to ensure you have any number of pathways to, from, and around the cloud. We’ve gathered six of our favorite services and partnerships for transferring or migrating your data.
For Additional Information: This post is one of a series leading up to the annual NAB conference in Las Vegas from April 18-22. If you’re attending, please join us at booth SL8716 to learn more about the solutions outlined below. For all of our readers, we’re hosting a series of webinars about these solutions over the coming months. Please join us for our first on March 26 with iconik!
Quick and Affordable Uploads to the Cloud with the Fireball
Many of our customers have used our Backblaze Fireball rapid ingest service to migrate large data sets from their on-premises environments into Backblaze cloud storage quickly, affordably, and securely.
How it works: we send you a Backblaze Fireball, a 70 TB hard drive array. You copy files to the Fireball directly or through a data transfer tool of your choice. (Backup, sync, and archive tools are great for that.) Once you’re done, return the Fireball to Backblaze and we’ll securely upload the files to your Backblaze B2 Cloud Storage account inside one of our data centers. Fireball service is priced at $550 per 30-day rental, which gives you a comfortable window of time to load your data.
This service has proved to be a customer favorite because even with high speed internet connections, it can take months to transfer data sets to the cloud. Fireball can get you up and running in weeks. For example, after KLRU—the PBS affiliate in Austin, Texas—completed their digital restoration project of more than four decades of Austin City Limits shows, they used the Fireball to efficiently load the entire 40 terabyte video library into B2 Cloud Storage.
Similarly, creative agency Baron & Baron jump-started their entrance into cloud backups with the Fireball. Using Archiware, they created full backups of all of their servers onto Fireballs, which were then uploaded securely into their B2 account far more quickly than if they’d backed up over their internet connection.
As popular as the Fireball is, not all of our customers use it to get started. Some don’t have dozens of terabytes to upload at once. Others have access to high-speed internet connections, or their content wasn’t organized enough to justify renting a Fireball for a single bulk upload to the cloud. These customers opted for other pathways to migrate their data into the cloud that we’ll outline below.
No Rush? Try Internet Transfers with Cyberduck and Transmit 5
If you don’t have a huge digital library to upload or don’t generate terabytes of data every day, your existing internet connection may be sufficient for your transfer. And with the right tools you won’t have to wait too long. B2 Cloud Storage is integrated with two tools—Cyberduck and Transmit 5—that use multi-threading to transfer large files much faster.
Cyberduck runs on both macOS and Windows. It’s open-source, free software, but their team of volunteers will gladly accept donations to help them develop, maintain, and distribute it.
Transmit 5 by Panic runs on macOS. You can try it out for free for seven days or keep it forever for $45 per license. Volume discounts are available.
If you want to learn more about how multi-threading makes this speed boost possible, you can read more about the basics of threading here. The short version is that before transmission, large files are broken into chunks that are transferred simultaneously across multiple threads, then reassembled after transmission is complete. If you’re transferring video, high-resolution photos, or other large files, these file transfer tools are well worth the effort of installing, and more than worth their modest cost.
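The chunk-and-reassemble pattern can be sketched in a few lines of Python. This is a simplified illustration of the general technique, not the actual implementation inside Cyberduck or Transmit 5; the chunk size and function names here are invented for the example:

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(data: bytes, chunk_size: int):
    """Break a file's bytes into fixed-size, independently transferable parts."""
    return [(i, data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]

def transfer_chunk(part):
    """Stand-in for uploading one part over its own thread/connection."""
    index, payload = part
    return index, payload  # a real tool would PUT this part and record its receipt

def parallel_transfer(data: bytes, chunk_size: int = 4, workers: int = 4) -> bytes:
    chunks = split_into_chunks(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(transfer_chunk, chunks))
    # Parts may complete out of order, so reassemble by byte offset.
    results.sort(key=lambda pair: pair[0])
    return b"".join(payload for _, payload in results)

original = b"large video file contents"
assert parallel_transfer(original) == original
```

Real transfer tools use multi-megabyte parts rather than the tiny chunk size shown here, and each part moves over its own network connection, which is where the speedup comes from.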
One Backblaze media customer took this approach for migrating their decades-old digital archive to the cloud. They started by backing up their active projects to the cloud to protect against all-too-frequent accidental deletions. Then, in the background, they gradually copied their archive of finished projects and raw footage from on-premises storage to the cloud. The complete archive migration took months to finish, but that was acceptable given their ability to maintain their copies on existing on-premises storage during the process.
Boost Large File Transfers with FileCatalyst
If your existing internet bandwidth is sufficient for your day-to-day needs, but you occasionally need to transfer high volumes of data quickly—for example, after videotaping a weekend conference—look to fast file transport solutions like FileCatalyst.
FileCatalyst accelerates file transfers between remote locations even when there is high network latency or packet loss—in other words, when the connection is unreliable and weak—allowing you to send at speeds up to 10 Gbps. Optimized for large data sets, its proprietary technologies include a UDP-based protocol that’s much faster than standard TCP (the other main protocol computers use to communicate over the internet).
As a software-only solution, FileCatalyst doesn’t require customers to add special hardware or make bandwidth upgrades. Service starts at $300/month, and is available as a month-to-month subscription with no additional data transfer charges or bandwidth caps. Consumption pricing and perpetual license pricing are also available.
FileCatalyst has been integrated with Backblaze cloud storage for two years and is a solution we recommend to customers who need to transfer data on the order of 10 terabytes or more per day. With this solution you can get all the footage from that weekend conference transferred and available to editors come Monday morning.
Pay-As-You-Go with MASV Fast File Transfer
MASV is a recent Backblaze integration that also offers fast file transfer services, but on a pay-as-you-go basis. MASV runs in your web browser, so you don’t need to download and install any software. This simplicity makes MASV perfect for delivering content to and from clients who would rather log in and drag and drop files than download software and train their team.
Have you ever struggled with keeping content for collaborative teams organized? With MASV, you can provide a simple Portals page where contributors can upload huge media files and share them across the team without requiring direct access to the project’s cloud storage buckets. MASV not only moves large files faster, but also makes it easier for contributors to get started, and reduces the risk of disruption. And MASV’s pay-as-you-go 25 cents/GB pricing means you only pay for what the team actually uses. A free 7-day trial lets you test it out.
Smart Applications for a “Hybrid Cloud” Approach with iconik
Knowing that their customers are increasingly keeping their data on different types of storage simultaneously, some vendors are building their applications to access and manage data wherever it’s stored, whether on-premises or in the cloud.
B2 Cloud Storage is integrated with iconik, a media asset management (MAM) platform that was designed with this hybrid cloud approach in mind. With iconik, all your assets appear in the same visual interface, regardless of where they’re stored. The cloud-based MAM generates and stores proxies or thumbnails in the cloud, but keeps full-resolution files in their original location. iconik downloads the full-resolution files to the user only when needed.
This hybrid cloud approach is great when you want the flexibility of the cloud, but want to migrate there in stages. As we’ve noted above, it can take time to move everything to the cloud and it’s easier for people to do it on their own schedule. According to their recent Media Stats Report, iconik customers are making good use of this flexibility, with 53% of iconik-managed assets stored in the cloud and 47% stored on-premises.
Backblaze customer Fin Films took this approach when they moved to the cloud. Owner Chris Aguilar was painfully aware of how vulnerable content on aging hard drives can be; he had previously had to pay a drive reconstruction team to salvage footage. So, Fin Films uploaded their most irreplaceable content to B2 Cloud Storage first, then began gradually moving other content.
As they were migrating content to their new cloud archive, Fin Films rolled out iconik to manage their assets better. iconik’s “bring your own storage” approach meant that they didn’t have to pull down their content from Backblaze and upload it to iconik. And iconik’s pay-as-you-use-it pricing allowed Chris Aguilar to add collaborators on the fly, and share his content quickly by adding view-only licenses at no charge. Pricing for iconik starts at $250/month for a small team.
Move from Cloud to Cloud with Flexify.IO
Sometimes the difficulty isn’t getting data into the cloud; it’s moving it from one cloud service to another, which isn’t as easy or inexpensive as you might hope. Cloud providers offer different capabilities and, more significantly, dramatically different pricing. As is true with moving your household, hiring experts to do the job can save you a lot of time and stress. Enter Flexify.IO.
Flexify.IO offers cloud data migration services that simplify moving or copying your data between cloud storage providers. The service ensures maximum throughput by using cloud internet connections, as opposed to your local internet bandwidth, and eliminates downtime during migration. And it’s now fully integrated with Backblaze cloud storage, with special pricing for migrating data from AWS to Backblaze.
Video production company AK Productions recently used Flexify.IO to move their video content to B2 Cloud Storage from Google Cloud Storage. They had found their Google cloud service took hours to manage—time they couldn’t afford to spend while growing their business. They were also worried about the privacy of their clients’ important data on Google.
Partnering with Flexify.IO, Backblaze successfully migrated 12 terabytes of data stored on Google Cloud to B2 Cloud Storage in 12 hours. Transferring that amount of data via standard business connections can take weeks, months, or even longer. AK Productions was able to achieve all of this with no disruption to their project workflows. Even with the cost of egress fees from Google, the business will break even and start realizing significant cost savings in approximately six months.
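A quick back-of-the-envelope calculation shows why that transfer rate is notable: 12 terabytes in 12 hours works out to a sustained rate of roughly 2.2 Gbps, while the same transfer over a typical 100 Mbps business connection would take around 11 days. The figures below are illustrative arithmetic based on the numbers in this post, not measured values:

```python
bits_total = 12 * 10**12 * 8          # 12 TB expressed in bits (decimal units)
seconds = 12 * 3600                   # the 12-hour migration window

sustained_gbps = bits_total / seconds / 10**9
print(f"{sustained_gbps:.2f} Gbps")   # about 2.22 Gbps sustained

# The same transfer over a 100 Mbps business connection:
days_at_100mbps = bits_total / (100 * 10**6) / 86_400
print(f"{days_at_100mbps:.1f} days")  # about 11.1 days
```

In practice a 100 Mbps link rarely sustains its full rated speed, so the real-world gap would be even wider than this idealized comparison suggests.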
So Many Choices for So Many Pathways to the Cloud
We know choosing the right path to the cloud for your organization when there are so many choices can be difficult. For that reason, we’ll be hosting “how-to” webinars that go deeper on the process and visually show you the steps to take. Our first webinar is March 26 with iconik. To stay in the loop for future webinars after NAB 2020, follow our webinar channel on BrightTalk. And we’ll have a team of Solution Engineers at NAB to demonstrate the tools and offer expert guidance. Sign up now to meet with us at NAB.
As of December 31, 2019, Backblaze had 124,956 spinning hard drives. Of that number, there were 2,229 boot drives and 122,658 data drives. This review looks at the hard drive failure rates for the data drive models in operation in our data centers. In addition, we’ll take a look at how our 12 and 14 TB drives are doing and get a look at the new 16 TB drives we started using in Q4. Along the way we’ll share observations and insights on the data presented and we look forward to you doing the same in the comments.
2019 Hard Drive Failure Rates
At the end of 2019 Backblaze was monitoring 122,658 hard drives used to store data. For our evaluation we remove from consideration those drives that were used for testing purposes and those drive models for which we did not have at least 5,000 drive days during Q4 (see notes and observations for why). This leaves us with 122,507 hard drives. The table below covers what happened in 2019.
Notes and Observations
There were 151 drives (122,658 minus 122,507) that were not included in the list above. These drives were either used for testing or did not have at least 5,000 drive days during Q4 of 2019. The 5,000 drive-day limit removes those drive models where we only have a limited number of drives working a limited number of days during the period of observation. NOTE: The data for all drives, data drives, boot drives, etc., is available for download on the Hard Drive Test Data webpage.
The only drive model not to have a failure during 2019 was the 4 TB Toshiba, model: MD04ABA400V. That’s very good, but the data sample is still somewhat small. For example, if there had been just 1 (one) drive failure during the year, the Annualized Failure Rate (AFR) for that Toshiba model would be 0.92%—still excellent, not 0%.
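The arithmetic behind that example is easy to reproduce: annualized failure rate is failures divided by drive-days, scaled to a 365-day year and expressed as a percentage. Here is a minimal sketch, where the 39,700 drive-day figure is a hypothetical value chosen only to match the roughly 0.92% cited above:

```python
def annualized_failure_rate(failures: int, drive_days: float) -> float:
    """AFR as a percentage: (failures / drive_days) * 365 * 100."""
    return failures / drive_days * 365 * 100

# A model with zero failures always shows a 0% AFR, however small the sample...
print(annualized_failure_rate(0, 39_700))            # 0.0

# ...but a single failure over the same exposure would not be 0%:
print(round(annualized_failure_rate(1, 39_700), 2))  # 0.92
```

This is why a small drive-day sample deserves caution: with limited exposure, one failure swings the AFR dramatically, while zero failures can make a model look misleadingly perfect.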
The Toshiba 14 TB drive, model MG07ACA14TA, is performing very well at a 0.65% AFR, similar to the rates put up by the HGST drives. For their part, the Seagate 6 TB and 10 TB drives continue to be solid performers with annualized failure rates of 0.96% and 1.00%, respectively.
The AFR for 2019 for all drive models was 1.89%, which is much higher than in 2018. We’ll discuss that later in this review.
Beyond the 2019 Chart—“Hidden” Drive Models
There are a handful of drive models that didn’t make it to the 2019 chart because they hadn’t recorded enough drive-days in operation. We wanted to take a few minutes to shed some light on these drive models and where they are going in our environment.
Seagate 16 TB Drives
In Q4 2019 we started qualifying Seagate 16 TB drives, model: ST16000NM001G. As of the end of Q4 we had 40 (forty) drives in operation, with a total of 1,440 drive days—well below our 5,000 drive day threshold for Q4, so they didn’t make the 2019 chart. There have been 0 (zero) failures through Q4, making the AFR 0%, a good start for any drive. Assuming they continue to pass our drive qualification process, they will be used in the 12 TB migration project and to add capacity as needed in 2020.
Toshiba 8 TB Drives
In Q4 2019 there were 20 (twenty) Toshiba 8 TB drives, model: HDWF180. These drives have been installed for nearly two years. In Q4, they only had 1,840 drive days, below the reporting threshold, but lifetime they do have 13,994 drive days with only 1 drive failure, giving us an AFR of 2.6%. We like these drives, but by the time they were available to us in quantity, we could buy 12 TB drives at the same cost per TB. More density, same price. Given we are moving to 16 TB drives and beyond, we most likely will not be buying any of these drives in the future.
HGST 10 TB Drives
There are 20 (twenty) HGST 10 TB drives, model: HUH721010ALE600, in operation. These drives have been in service a little over one year. They reside in the same Backblaze Vault as the Seagate 10 TB drives. The HGST drives recorded only 1,840 drive days in Q4 and a total of 8,042 since being installed. There have been 0 (zero) failures. As with the Toshiba 8 TB, purchasing more of these 10 TB drives is unlikely.
Toshiba 16 TB Drives
You won’t find these in the Q4 stats, but in Q1 2020 we added 20 (twenty) Toshiba 16 TB drives, model: MG08ACA16TA. They have logged a total of 100 drive days, so it is way too early to say anything other than more to come in the Q1 2020 report.
Comparing Hard Drive Stats for 2017, 2018, and 2019
The chart below compares the Annualized Failure Rates (AFR) for each of the last three years. The data for each year is inclusive of that year only and for the drive models present at the end of each year.
The Rising AFR in 2019
The total AFR rose significantly in 2019. About 75% of the different drive models experienced a rise in AFR from 2018 to 2019. There are two primary drivers behind this rise. First, the 8 TB drives as a group seem to be having a mid-life crisis as they get older, with each model exhibiting its highest recorded failure rate. While none of the rates is cause for worry, these drives contribute roughly one fourth (1/4) of the drive days to the total, so any rise in their failure rate affects the overall number. The second factor is the Seagate 12 TB drives; this issue is being aggressively addressed by the 12 TB migration project reported on previously.
The Migration Slows, but Growth Doesn’t
In 2019 we added 17,729 net new drives. In 2018, a majority of the 14,255 drives added were due to migration; in 2019, less than half of the new drives were for migration, with the rest being used for new systems. In 2019 we decommissioned 8,800 drives totaling 37 Petabytes of storage and replaced them with 8,800 drives, all 12 TB, totaling about 105 Petabytes of storage. We then added a further 181 Petabytes of storage in 2019 using 12 TB and 14 TB drives.
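Those replacement figures check out with simple decimal-unit arithmetic (1,000 TB per Petabyte). This is just a sanity check of the numbers in the text, not additional data:

```python
# 8,800 replacement drives at 12 TB each, expressed in decimal petabytes
replaced_pb = 8_800 * 12 / 1_000
print(replaced_pb)                      # 105.6, i.e. "about 105 Petabytes"

# Net capacity gained from the swap alone, before the additional 181 PB
decommissioned_pb = 37
print(replaced_pb - decommissioned_pb)  # 68.6 PB gained in the same rack space
```

Replacing like-for-like drive counts while nearly tripling the capacity of those slots is the core economics of a migration project like this.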
Manufacturer diversity across drive brands increased slightly in 2019. In 2018, Seagate drives were 78.15% of the drives in operation; by the end of 2019 that percentage had decreased to 73.28%. HGST went from 20.77% in 2018 to 23.69% in 2019, and Toshiba increased from 1.34% in 2018 to 3.03% in 2019. There were no Western Digital branded drives in the data center in 2019, but as WDC rebrands the newer large-capacity HGST drives, we’ll adjust our numbers accordingly.
Lifetime Hard Drive Stats
While comparing the annual failure rates of hard drives over multiple years is a great way to spot trends, we also look at the lifetime annualized failure rates of our hard drives. The chart below shows the annualized failure rates of all of the drive models in production as of 12/31/2019.
The Hard Drive Stats Data
The complete data set used to create the information used in this review is available on our Hard Drive Test Data page. You can download and use this data for free for your own purposes. All we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone; it is free.
If you just want the summarized data used to create the tables and charts in this blog post, you can download the ZIP file containing the CSV files for each chart.
Good luck and let us know if you find anything interesting.
Regular Hard Drive Stats readers will recall that our blog post about Q3 2019 explained that we planned to take a closer look at some drive failures we were seeing at the time and report back when we knew more. Well, we’ve been monitoring the situation since then and wanted to update you on where things stand. Despite the fact that Hard Drive Stats for 2019 are just around the corner, we decided to share this information with you as soon as we could, rather than waiting for the next post. In summary, this year (and going into the next year) we expect to see higher failure rates in some of our hard drives and we will be migrating some drives to newer models. Below, we’ll discuss what’s going on, what we’re doing about it, and why customers shouldn’t worry.
So What’s Up?
In a recent blog post, we interviewed our Director of Supply Chain, Ariel Ellis, about how we purchase and qualify hard drives to be deployed in our data centers. The TL;DR is that our qualification process is robust. Nevertheless, for all providers of scale in the cloud storage industry, trends that are hard to project during testing can emerge over time after drives are used in production batches of dozens of petabytes, or more, at a time.
What we’re seeing in our fleet right now is a higher-than-typical failure rate among some of our 12 TB Seagate drives. It’s customary for hard drive manufacturers like Seagate, when working with data centers and cloud service providers, to help ensure successful deployment of large-scale drive fleets, and as such we’re working closely with them to analyze the drives and their performance. This analysis usually includes things like testing new drive platforms in real workload environments, providing telemetry tools to predict failures, performing ongoing custom adjustments, developing firmware, and supplying replacement units (RMAs). Customer data durability is paramount for both Backblaze and Seagate, so as we analyze root causes and implications we’re also working together on a migration effort to replace these particular drives in our data centers. In the short term, failure rates for a subset of our drives may increase, but we have processes in place to adjust for that fluctuation.
Running a cloud business is complex, so it’s very helpful to have a partner like Seagate who can help us to react quickly and bring their expertise in drive deployment to bear in aiding our migration efforts. It’s worth noting that situations like this are not uncommon in our industry and often go unnoticed by the end-users of the services, as most cloud providers do not inform customers or the public when they experience issues like what we’re describing. Backblaze, on the other hand, is a bit more open than most companies in the industry.
We’re in a unique position because of the Hard Drive Stats that we publish, which is why we felt it was important to let folks know about the upcoming changes ahead of time. At the end of the day, we think this openness is helpful for everyone, especially our customers.
In the near term, we expect to see moderately increased failure rates for this specific subset of 12TB drives, but as we complete the drive migration, we project our fleet’s failure rates will restore to historical norms. Meanwhile, it will be business as usual. We’ll continue to provide the most reliable, affordable, and easy-to-use cloud storage and computer backup available, and we’ll continue to provide our Hard Drive Stats for you every quarter.
For those who follow Backblaze, you’ll know that QNAP was an early integrator for our B2 Cloud Storage service. The popular storage company sells solutions for almost any use case where local storage is needed, and with their Hybrid Backup Sync software, you can easily sync that data to the cloud. For years, we’ve helped QNAP users like Yoga International and SoCo Systems back up and archive their data to B2. But QNAP never stops innovating, so we wanted to share some recent updates that will have both current and potential users excited about the future of our integrations.
Hybrid Backup Sync 3.0
Current QNAP and B2 users are used to having Hybrid Backup Sync (HBS) quickly and reliably sync their data to the cloud. With the HBS 3.0 update, the feature has become far more powerful. The latest update adds true backup capability for B2 users with features like version control, client-side encryption, and block-level deduplication. QNAP's operating system, QTS, continues to innovate and add new features as well. With the QTS 4.4.1 update, you can also preview backed-up files using the QuDedup Extract Tool, allowing QNAP users to save on bandwidth costs.
The QTS 4.4.1 update is now available (you can download it here) and the HBS 3.0 update is currently available in the App Center on your QNAP device.
Hybrid Mount and VJBOD Cloud
The new Hybrid Mount and VJBOD Cloud apps allow QNAP users to designate a drive in their system to function as a cache while accessing their B2 Cloud Storage. This lets users interact with B2 just as they would with a folder on their QNAP device while using B2 as an active storage location.
Hybrid Mount and VJBOD Cloud are both included in the QTS 4.4.1 update and function as storage gateways on a file-based or block-based level, respectively. Hybrid Mount enables B2 to be used as a file server and is ideal for online collaboration and file-level data analysis. VJBOD Cloud is ideal for a large number of small files or single, massive files (think databases!) since it can update and change files on a block level. Both apps can connect to B2 via popular protocols to fit any environment, including SMB, AFP, NFS, FTP, and WebDAV.
QuDedup introduces client-side deduplication to the QNAP ecosystem. This helps users at all levels to save on space on their NAS by avoiding redundant copies in storage. B2 users have something to look forward to as well since these savings carry over to cloud storage via the HBS 3.0 update.
QNAP continues to innovate and unlock the potential of B2 in the NAS ecosystem. We’re huge fans of these new updates and whatever else may come down the pipeline in the future. We’ll be sure to highlight any other exciting updates as they become available.
Backblaze’s data centers may not be the biggest in the world of data storage, but thanks to some chutzpah, transparency, and wily employees, we’re able to punch well above our weight when it comes to purchasing hard drives. No one knows this better than our Director of Supply Chain, Ariel Ellis.
As the person on staff ultimately responsible for sourcing the drives our data centers need to run—some 117,658 by his last count—Ariel knows a thing or two about purchasing petabytes-worth of storage. So we asked him to share his insights on the evaluation and purchasing process here at Backblaze. While we’re buying at a slightly larger volume than some of you might be, we hope you find Ariel’s approach useful and that you’ll share your own drive purchasing philosophies in the comments below.
An Interview with Ariel Ellis, Director of Supply Chain at Backblaze
Sourcing and Purchasing Drives
Backblaze: Thanks for making time, Ariel—we know staying ahead of the burn rate always keeps you busy. Let’s start with the basics: What kinds of hard drives do we use in our data centers, and where do we buy them?
Ariel: In the past, we purchased both consumer and enterprise hard drives. We bought the drives that gave us the best performance and longevity for the price, and we discovered that, in many cases, those were consumer drives.
Today, our purchasing volume is large enough that consumer drives are no longer an option. We simply can’t get enough. High capacity drives in high volume are only available to us in enterprise models. But, by sourcing large volume and negotiating prices directly with each manufacturer, we are able to achieve lower costs and better performance than we could when we were only buying in the consumer channel. Additionally, buying directly gives us five year warranties on the drives, which is essential for our use case.
We began purchasing directly around the launch of our Vault architecture in 2015. Each Vault contains 1,200 drives, and we have been deploying two to four (or more) Vaults each month. That many drives are simply not available through consumer distribution, so we now purchase drives from all three hard drive manufacturers: Western Digital, Toshiba, and Seagate.
Backblaze: Of the drives we’re purchasing, are they all 7200 RPM and 3.5” form factor? Is there any reason we’d consider slower drives or 2.5” drives?
Ariel: We use drives with varying speeds, though some power-conserving drives don’t disclose their drive speed. Power draw is a very important metric for us and the high speed enterprise drives are expensive in terms of power cost. We now total around 1.5 megawatts in power consumption in our centers, and I can tell you that every watt matters for reducing costs.
As far as 2.5″ drives, I’ve run the math and they’re not more cost effective than 3.5″ drives, so there’s no incentive for us to use them.
Backblaze: What about other drive types and modifications, like SSD, or helium enclosures, or SMR drives? What are we using and what have we tried beyond the old standards?
Ariel: When I started at Backblaze, SSDs were more than ten times the cost of conventional hard drives. Now they're about three times the cost. But for Backblaze's business, three times the cost is not viable for the pricing targets we have to meet. We do use some SSDs as boot drives, as well as in our backend systems, where they speed up caching and boot times, but there is currently no flash storage in our Storage Pods—neither 2.5″ SSDs nor M.2 modules. We've looked at flash as a way to manage higher densities of drives in the future and we'll continue to evaluate its usefulness to us.
Helium has its benefits, primarily lower power draw, but it makes drive service difficult when that’s necessary. That said, all the drives we have purchased that are larger than 8 TB have been helium—they’re just part of the picture for us. Higher capacity drives, sealed helium drives, and other new technologies that increase the density of the drives are essential to work with as we grow our data centers, but they also increase drive fragility, which is something we have to manage.
SMR would give us a 10-15% capacity-to-dollar boost, but it also requires host-level management of sequential data writing. Additionally, these new archive-type drives require a flash-based caching layer. Both of these requirements would mean significant increases in engineering resources to support, and thus even more investment. So, all in all, SMR isn't cost-effective in our system.
Soon we’ll be dealing with MAMR and HAMR drives as well. We plan to test both technologies in 2020. We’re also testing interesting new tech like Seagate’s MACH.2 Multi Actuator, which allows the host to request and receive data simultaneously from two areas of the drive in parallel, potentially doubling the input/output operations per second (IOPS) performance of each individual hard drive. This offsets issues of reduced data availability that would otherwise arise with higher drive capacities. The drive also can present itself as two independent drives. For example, a 16 TB drive can appear as two independent 8 TB drives. A Vault using 60 drives per pod could present as 120 drives per pod. That offers some interesting possibilities.
Backblaze: What does it take to deploy a full vault, financially speaking? Can you share the cost?
Ariel: The cost to deploy a single Vault varies between $350,000 and $500,000, depending on the drive capacities being used. That's just the purchase price, though. There is also the cost of data center space, the power to house and run the hardware, the staff time to install everything, and the bandwidth used to fill it. All of that should be included in the total cost of filling a Vault.
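To put those figures in perspective, a quick back-of-the-envelope calculation shows the raw hardware cost per terabyte implied by the interview's numbers (the 12 TB drive capacity below is an illustrative assumption, and this ignores the data center, power, staff, and bandwidth costs Ariel mentions):

```python
DRIVES_PER_VAULT = 1200  # from the interview: each Vault contains 1,200 drives

def cost_per_tb(vault_cost: float, drive_capacity_tb: float) -> float:
    """Raw drive cost per terabyte for one Vault (hardware purchase price only)."""
    raw_tb = DRIVES_PER_VAULT * drive_capacity_tb
    return vault_cost / raw_tb

# A hypothetical 12 TB-drive Vault at the low end of the $350K-$500K range:
print(f"${cost_per_tb(350_000, 12):.2f}/TB raw")  # $24.31/TB raw
```

Note this is raw capacity before parity overhead, so the effective cost per usable terabyte is somewhat higher.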
Evaluating and Testing New Drive Models
Backblaze: Okay, so when you get to the point where the tech seems like it will work in the data center, how do you evaluate new drive models to include in the Vaults?
Ariel: First, we select drives that fit our cost targets. These are usually high capacity drives being produced in large volumes for the cloud market. We always start with test batches that are separate from our production data storage. We don’t put customers’ data on the test drives. We evaluate read/write performance, power draw, and generally try to understand how the drives will behave in our application. Once we are comfortable with the drive’s performance, we start adding small amounts to production vaults, spread across tomes in a way that does not sacrifice parity. As drive capacities increase, we are putting more and more effort into this qualification process.
We used to be able to qualify new drive models in thirty days. Now we typically take several months. In part, this is because we've added more steps to pre- and post-production testing. As we scale up, we need to scale up our care, because the impact of any issue with a drive model grows with bigger and bigger deployments. Additionally, from a simple physics perspective, a Vault that uses high capacity drives takes longer to fill, and we want to monitor a new drive's performance throughout the entire fill period.
Backblaze: When it comes to the evaluation of the cost, is there a formula for $/terabyte that you follow?
Ariel: My goal is to reduce cost per terabyte on a quarterly basis—in fact, it’s a part of how my job performance is evaluated. Ideally, I can achieve a 5-10% cost reduction per terabyte per quarter, which is a number based on historical price trends and our performance for the past 10 years. That savings is achieved in three primary ways: 1) lowering the actual cost of drives by negotiating with vendors, 2) occasionally moving to higher drive densities, and 3) increasing the slot density of pod chassis. (We moved from 45 drives to 60 drives in 2016, and as we look toward our next Storage Pod version we’ll consider adding more slots per chassis).
Meeting Storage Demand
Backblaze: When it comes to how this actually works in our operating environment, how do you stay ahead of the demand for storage capacity?
Ariel: We maintain a buffer of several months' worth of drive capacity, based on predicted demand from current customers as well as projected new customers. Those buffers are tied to what we expect the fill-time of our Vaults to be. As conditions change, we can decide to extend those buffers. Demand could increase unexpectedly, of course, so our goal is to reduce the fill-time for Vaults so we can bring more storage online as quickly as possible if it's needed.
Backblaze: Obviously we don’t operate in a vacuum, so do you worry about how trade challenges, weather, and other factors might affect your ability to obtain drives?
Ariel: (Laughs) Sure, I’ve got plenty to worry about. But we’ve proved to be pretty resourceful in the past when we’re challenged. For example: During the worldwide drive shortage, due to flooding in Southeast Asia, we recruited an army of family and friends to buy drives all over and send them to us. That kept us going during the shortage.
We are vulnerable, of course, if there’s a drive production shortage. Some data center hardware is manufactured in China, and I know that some of those prices have gone up. That said, all of our drives are manufactured in Thailand or Taiwan. Our Storage Pod chassis are made in the U.S.A. Big picture, we try to anticipate any shortages and plan accordingly if we can.
Backblaze: Time for a personal question… What does data durability mean to you? What do you do to help boost data durability, and spread drive hardware risk and exposure?
Ariel: That is personal. (Laughs). But also a good question, and not really personal at all: Everyone at Backblaze contributes to our data durability in different ways.
My role in maintaining eleven nines of durability is, first and foremost: Never running out of space. I achieve this by maintaining close relationships with manufacturers to ensure production supply isn’t interrupted; by improving our testing and qualification processes to catch problems before drives ever enter production; and finally by monitoring performance and replacing drives before they fail. Otherwise it’s just monitoring the company’s burn rates and managing the buffer between our drive capacity and our data under management.
When we are in a good state for space considerations, then I need to look to the future to ensure I’m providing for more long-term issues. This is where iterating on and improving our Storage Pod design comes in. I don’t think that gets factored into our durability calculus, but designing for the future is as important as anything else. We need to be prepared with hardware that can support ever-increasing hard drive capacities—and the fill- and rebuild times that come with those increases—effectively.
Backblaze: That raises the next question: As drive sizes get larger, rebuild times get longer when it's necessary to recover data on a drive. Is that still a factor, given Backblaze's durability architecture?
Ariel: We attempt to identify and replace problematic drives before they actually fail. When a drive starts failing, or is identified for replacement, the team always attempts to restore as much data as possible off of it because that ensures we have the most options for maintaining data durability. The rebuild times for larger drives are challenging, especially as we move to 16TB and beyond. We are looking to improve the throughput of our Pods before making the move to 20TB in order to maintain fast enough rebuild times.
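To see why throughput matters so much here, a rough lower bound on rebuild time is simply drive capacity divided by sustained write throughput (the 150 MB/s figure below is a hypothetical sustained rate for illustration, not a measured Backblaze number, and real rebuilds involve reading from peer drives as well):

```python
def rebuild_hours(capacity_tb: float, throughput_mb_s: float) -> float:
    """Hours required to write one full drive at a sustained throughput.
    Uses decimal units: 1 TB = 1,000,000 MB."""
    capacity_mb = capacity_tb * 1_000_000
    return capacity_mb / throughput_mb_s / 3600

print(f"{rebuild_hours(16, 150):.1f} h")  # 29.6 h for a 16 TB drive at 150 MB/s
print(f"{rebuild_hours(20, 150):.1f} h")  # 37.0 h for 20 TB at the same rate
```

Each capacity jump stretches the window during which a Vault runs with reduced parity, which is why system throughput has to improve before moving to 20TB drives.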
And then, supporting all of this is our Vault architecture, which ensures that data will be intact even if individual drives fail. That’s the value of the architecture.
Longer term, one thing we're looking toward is phasing out the SATA controller/port multiplier combo. This might be more technical than some of our readers want to go, but: SAS controllers are a more commonly used method in dense storage servers, and using SATA drives with SAS controllers can provide as much as a 2x improvement in system throughput over SATA port multipliers, which is important to me, even though the port multipliers are slightly less expensive. When we started our Storage Pod construction, the SATA controller/port multiplier combo was a great way to keep costs down. But since then, the cost of SAS controllers and backplanes has come down significantly.
But now we’re preparing for how we’ll handle 18 and 20 TB drives, and improving system throughput will be extremely important to manage that density. We may even consider using SAS drives even though they are slightly more expensive. We need to consider all options in order to meet our scaling, durability and cost targets.
Backblaze’s Relationship with Drive Manufacturers
Backblaze: So, there’s an elephant in the room when it comes to Backblaze and hard drives: Our quarterly Hard Drive Stats reports. We’re the only company sharing that kind of data openly. How have the Drive Stats blog posts affected your purchasing relationship with the drive manufacturers?
Ariel: Due to the quantities we need and the visibility of the posts, drive manufacturers are motivated to give us their best possible product. We have a great purchasing relationship with all three companies and they update us on their plans and new drive models coming down the road.
Backblaze: Do you have any sense for what the hard drive manufacturers think of our Drive Stats blog posts?
Ariel: I know that every drive manufacturer reads our Drive Stats reports, including very senior management. I’ve heard stories of company management learning of the release of a new Drive Stats post and gathering together in a conference room to read it. I think that’s great.
Ultimately, we believe that Drive Stats is good for consumers. We wish more companies with large data centers did this. We believe it helps keep everyone open and honest. The adage is that competition is ultimately good for everyone, right?
It’s true that Western Digital, at one time, was put off by the visibility Drive Stats gave into how their models performed in our data centers (which we’ve always said is a lot different from how drives are used in homes and most businesses). Then they realized the marketing value for them—they get a lot of exposure in the blog posts—and they came around.
Backblaze: So, do you believe that the Drive Stats posts give Backblaze more influence with drive manufacturers?
Ariel: The truth is that most hard drives go directly into tier-one and -two data centers, not into smaller data centers, homes, or businesses. The manufacturers are stamping out drives in exabyte chunks. A single tier-one data center might consume 500,000 drives—several times what Backblaze does. We can't compare in purchasing power to those guys, but Drive Stats does give us visibility and some influence with the manufacturers. We have close communications with the manufacturers and we get early versions of new drives to evaluate and test. We're on their radar and I believe they value their relationship with us, as we do with them.
Backblaze: A final question. In your opinion, are hard drives getting better?
Ariel: Yes. Drives are amazingly durable for how hard they’re used. Just think of the forces inside a hard drive, how hard they spin, and how much engineering it takes to write and read the data on the platters. I came from a background in precision optics, which requires incredibly precise tolerances, and was shocked to learn that hard drives are designed in an equally precise tolerance range, yet are made in the millions and sold as a commodity. Despite all that, they have only about a 2% annual failure rate in our centers. That’s pretty good, I think.
Thanks, Ariel. Here’s hoping the way we source petabytes of storage has been useful for your own terabyte, petabyte, or… exabyte storage needs? If you’re working on the latter, or anything between, we’d love to hear about what you’re up to in the comments.
In this blog series, we explore how you can master the nomadic life—whether for a long weekend, an extended working vacation, or maybe even the rest of your career. We profile professionals we’ve met who are stretching the boundaries of what (and where) an office can be, and glean lessons along the way to help you to follow in their footsteps. In our first post in the series, we provided practical tips for working on the road. In this edition, we profile Chris Aguilar, Amphibious Filmmaker.
There are people who do remote filming assignments, and then there’s Chris, the Producer/Director of Fin Films. For him, a normal day might begin with gathering all the equipment he’ll need—camera, lenses, gear, cases, batteries, digital storage—and securing it in a waterproof Pelican case which he’ll then strap to a paddleboard for a long swim to a race boat far out on the open ocean.
This is because Chris, a one-man team, is the preeminent cinematographer of professional paddleboard racing. When your work day involves operating from a beachside hotel, and being on location means bouncing up and down in a dinghy some 16 miles from shore, how do you succeed? We interviewed Chris to find out.
Getting Ready for a Long Shoot
To save time in the field, Chris does as much prep work as he can. Knowing that he needs to be completely self-sufficient all day—he can’t connect to power or get additional equipment—he gathers and tests all of the cameras he’ll need for all the possible shots that might come up, packs enough SD camera cards, and grabs an SSD external drive large enough to store an entire day’s footage.
Chris edits in Adobe Premiere, so he preloads a template on his MacBook Pro to hold the day’s shots and orders everything by event so that he can drop his content in and start editing it down as quickly as possible. Typically, he chooses a compatible format that can hold all of the different content he’ll shoot. He builds a 4K timeline at 60 frames per second that can take clips from multiple cameras yet can export to other sizes and speeds as needed for delivery.
Days in the Life
Despite being in one of the most exotic and glamorous locations in the world (Hawaii), covering a 32-mile open-ocean race is grueling. Chris’s days start as early as 5AM with him grabbing shots as contestants gather, then filming as many as 35 interviews on race-day eve. He does quick edits of these to push content out as quickly as possible for avid fans all over the world.
The next morning, before race time, he double-checks all the equipment in his Pelican case, and, when there’s no dock, he swims out to the race- or camera boat. After that, Chris shoots as the race unfolds, constantly swapping out SD cards. When he’s back on dry land his first order of business is copying over all of the content to his external SSD drive.
Even after filming the race’s finish, awards ceremonies, and wrap-up interviews, he’s still not done: By 10PM he’s back at the hotel to cut a highlight reel of the day’s events and put together packages that sports press can use, including the Australian press that needs content for their morning sports shows.
For streaming content in the field, Chris relies on Google Fi through his phone because it can piggyback off of a diverse range of carriers. His backup network solution is a Verizon hotspot that usually covers him where Google Fi cannot. For editing and uploading, he’s found that he can usually rely on his hotel’s network. When that doesn’t work, he defaults to his hotspot, or a coffee shop. (His pro tip is that, for whatever reason, the Starbucks in Hawaii typically have great internet.)
Building a Case
After years of shooting open-ocean events, Chris has settled on a tried and true combination of gear—and it all fits in a single, waterproof Pelican 1510 case. His kit has evolved to be as simple and flexible as possible, allowing him to cover multiple shooting roles in a hostile environment including sand, extreme sun-glare on the water, haze, fog, and of course, the ever-present ocean water.
At the same time, his gear needs to accommodate widely varied shooting styles: Chris needs to be ready to capture up close and personal interviews; wide, dramatic shots of the pre-race ceremonies; as well as a combination of medium shots of several racers on the ocean and long, telephoto shots of individuals—all from a moving boat bobbing on the ocean. Here’s his “Waterproof Kit List”:
The Case: Pelican 1510
Chris likes compact, rugged camcorders from Panasonic. They have extremely long battery life, and the latest generation has large sensors, wide dynamic range, and even built-in ND filter wheels to compensate for the glare on the water. He'll also bring other cameras for special shots, like an 8mm film camera for artistic footage, or a GoPro for the classic "from under the sea to the waterline" shot.
Primary Interview Camera
Panasonic EVA1 5.7K Compact Cinema Camcorder, 4K 10-bit 4:2:2, with EF lens mount (with rotating lens kit depending on the event)
Action Camera and B-Roll
Panasonic AG-CX350 (or EVA1 kitted out similarly if the CX350 isn’t available)
Stills and Video
Panasonic GH5 20.3MP mirrorless ILC camera, 4K 60fps 4:2:2 10-bit
Special Purpose and B-Roll Shots
Eumig Nautica Super 8 film self-sealed waterproof camera
4K GoPro in a waterproof dome housing
As a one-person show, Chris invests in enough SD cards to cover the entire day's shooting without having to reuse cards. He then copies all of those cards' content to a bus-powered SSD drive.
8-12 64GB or 128GB SD cards
1 TB SSD Glyph or G-Tech SSD drive
Multiple neutral density (ND) filters. These filters reduce the intensity of all wavelengths without affecting color. With ND filters, the operator can dial in combinations of aperture, exposure time, and sensor sensitivity without overexposing, delivering more "filmic" looks: setting the aperture to a low value for sharper images, or wide open for a shallow depth of field.
Extra batteries. Needless to say, having extra batteries for his cameras and his phone is critical when he may not be able to recharge for 12 hours or more.
Now, The Real Work Begins
When wrapping up an event’s coverage, all of the content captured needs to be stored and managed. Chris’s previous workflow required transferring the raw and finished files to external drives for storage. That added up to a lot of drives. Chris estimates that over the years he had stored about 20 terabytes of footage on paddleboarding alone.
Managing all those drives proved to be too big of a task for someone who is rarely in his production office. Chris needed access to his files from wherever he was, and a way to view, catalog, and share the content with collaborators.
As he dialed in his approach to remote broadband speeds, storage drive wrangling, inexpensive cloud storage, and cloud-based digital asset management systems, putting all of his content into the cloud became an option for Chris. Using Backblaze's B2 Cloud Storage along with iconik content management software, what used to take several days in the office searching through hard drives for specific footage to edit or share with a collaborator now takes just a few keyword searches and a matter of minutes to share via iconik.
For a digital media nomad like Chris, digitally native solutions based in the cloud make a lot of sense. Plus, Chris knows that the content is safely and securely stored, and not exposed to transport challenges, accidents (including those involving water), and other difficulties that could spoil both his day and that of his clients.
Learn More About How Chris Works Remotely
You can learn more about Chris, Fin Film Company, and how he works from the road in our case study on Fin Films. We’ve also linked to Chris’s Kit Page for those of you who just can’t get enough of this gear…
We’d Love to Hear Your Digital Nomad Stories
If you consider yourself a digital nomad and have an interesting story about using Backblaze Cloud Backup or B2 Cloud Storage from the road (or wherever), we’d love to hear about it, and perhaps feature your story on the blog. Tell us what you’ve been doing on the road at email@example.com.
You can view all the posts in this series on the Digital Nomads page in our Blog Archives.